NeuralCrawl

Bar-Ilan University / robots.txt snapshot

fetched 2026-06-26T23:05:56Z · HTTP 200 · 3363 bytes · sha256 1b3eba58b0640c34

final URL: https://www.biu.ac.il/robots.txt

#
# robots.txt
#
# This file is to prevent the crawling and indexing of certain parts
# of your site by web crawlers and spiders run by sites like Yahoo!
# and Google. By telling these "robots" where not to go on your site,
# you save bandwidth and server resources.
#
# This file will be ignored unless it is at the root of your host:
# Used: http://example.com/robots.txt
# Ignored: http://example.com/site/robots.txt
#
# For more information about the robots.txt standard, see:
# http://www.robotstxt.org/robotstxt.html

# Allow AI crawlers full access

User-agent: AI2Bot
Allow: /

User-agent: Ai2Bot-Dolma
Allow: /

User-agent: Amazonbot
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: Applebot
Allow: /

User-agent: Applebot-Extended
Allow: /

User-agent: Brightbot 1.0
Allow: /

User-agent: Bytespider
Allow: /

User-agent: CCBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: Claude-Web
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: cohere-ai
Allow: /

User-agent: cohere-training-data-crawler
Allow: /

User-agent: Crawlspace
Allow: /

User-agent: Diffbot
Allow: /

User-agent: DuckAssistBot
Allow: /

User-agent: FacebookBot
Allow: /

User-agent: FriendlyCrawler
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: GoogleOther
Allow: /

User-agent: GoogleOther-Image
Allow: /

User-agent: GoogleOther-Video
Allow: /

User-agent: GPTBot
Allow: /

User-agent: iaskspider/2.0
Allow: /

User-agent: ICC-Crawler
Allow: /

User-agent: ImagesiftBot
Allow: /

User-agent: img2dataset
Allow: /

User-agent: ISSCyberRiskCrawler
Allow: /

User-agent: Kangaroo Bot
Allow: /

User-agent: Meta-ExternalAgent
Allow: /

User-agent: Meta-ExternalFetcher
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: omgili
Allow: /

User-agent: omgilibot
Allow: /

User-agent: PanguBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: PetalBot
Allow: /

User-agent: Scrapy
Allow: /

User-agent: SemrushBot-OCOB
Allow: /

User-agent: SemrushBot-SWA
Allow: /

User-agent: Sidetrade indexer bot
Allow: /

User-agent: Timpibot
Allow: /

User-agent: VelenPublicWebCrawler
Allow: /

User-agent: Webzio-Extended
Allow: /

User-agent: YouBot
Allow: /


# General rules for all other bots
User-agent: *
# Disallow: /
# CSS, JS, Images
Allow: /core/*.css$
Allow: /core/*.css?
Allow: /core/*.js$
Allow: /core/*.js?
Allow: /core/*.gif
Allow: /core/*.jpg
Allow: /core/*.jpeg
Allow: /core/*.png
Allow: /core/*.svg
Allow: /profiles/*.css$
Allow: /profiles/*.css?
Allow: /profiles/*.js$
Allow: /profiles/*.js?
Allow: /profiles/*.gif
Allow: /profiles/*.jpg
Allow: /profiles/*.jpeg
Allow: /profiles/*.png
Allow: /profiles/*.svg
# Directories
Disallow: /core/
Disallow: /profiles/
# Files
Disallow: /README.txt
Disallow: /web.config
# Paths (clean URLs)
Disallow: /admin/
Disallow: /comment/reply/
Disallow: /filter/tips
Disallow: /node/add/
Disallow: /search/
Disallow: /user/register/
Disallow: /user/password/
Disallow: /user/login/
Disallow: /user/logout/
# Paths (no clean URLs)
Disallow: /index.php/admin/
Disallow: /index.php/comment/reply/
Disallow: /index.php/filter/tips
Disallow: /index.php/node/add/
Disallow: /index.php/search/
Disallow: /index.php/user/password/
Disallow: /index.php/user/register/
Disallow: /index.php/user/login/
Disallow: /index.php/user/logout/
Disallow: /taxonomy/*

Sitemap: https://www.biu.ac.il/sitemap.xml
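
The "User-agent: *" group above mixes Allow patterns with broader Disallow prefixes, so the outcome for a given path depends on how a crawler resolves overlapping rules. The following is a minimal Python sketch of one common resolution strategy, assuming the longest-match precedence described in RFC 9309 (the longest matching pattern wins, Allow wins ties, `*` matches any run of characters, a trailing `$` anchors the end of the path). The names `RULES`, `AI_RULES`, `pattern_to_regex`, and `is_allowed` are illustrative, only a subset of the rules is copied in, and real crawlers may resolve rules differently.

```python
import re

# Illustrative subset of the "User-agent: *" group from the snapshot above.
RULES = [
    ("allow", "/core/*.css$"),
    ("allow", "/core/*.js$"),
    ("allow", "/core/*.png"),
    ("disallow", "/core/"),
    ("disallow", "/admin/"),
    ("disallow", "/search/"),
    ("disallow", "/taxonomy/*"),
]

# Every named AI crawler group in the snapshot carries this single rule.
AI_RULES = [("allow", "/")]


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' becomes '.*'; a trailing '$' anchors the end of the path;
    every other character is matched literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(rules, path):
    """Assumed precedence: longest pattern wins, Allow wins ties.

    A path that matches no rule is treated as allowed.
    """
    best_len, allowed = -1, True
    for directive, pattern in rules:
        # re.match anchors at the start of the path, as robots.txt rules do.
        if pattern_to_regex(pattern).match(path):
            is_allow = directive == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed


if __name__ == "__main__":
    for p in ["/core/misc/drupal.js", "/core/install.php", "/about", "/taxonomy/term/5"]:
        print(p, is_allowed(RULES, p))
```

Under these assumptions, `/core/misc/drupal.js` is allowed because `/core/*.js$` is a longer pattern than `/core/`, while `/core/install.php` matches only the Disallow and is blocked. A crawler that matches one of the named groups uses that group alone, so `is_allowed(AI_RULES, "/admin/")` is also true: the Disallow lines under `User-agent: *` do not apply to it.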