NeuralCrawl

Al Jazeera / robots.txt snapshot

fetched 2026-06-20T01:10:30Z · HTTP 200 · 1629 bytes · sha256 ea665a97f2241b4e

final URL: https://www.aljazeera.com/robots.txt

# Al Jazeera Media Network content is made available for your personal, non-commercial
# use subject to our Terms and Conditions:
# https://www.aljazeera.com/terms-and-conditions/
# Any other uses are not permitted, including but not limited to:
# (1) the development of any software, machine learning, artificial intelligence (AI),
# and/or large language models (LLMs);
# (2) text and data mining activities;
# (3) creating or providing archived or cached data sets containing our content to others; and/or
# (4) any commercial purposes.
# Use of any device, tool, or process designed to data mine or scrape the content
# using automated means is prohibited without prior written permission from
# Al Jazeera Media Network. Contact https://network.aljazeera.net/en/contact for assistance.

User-agent: *
Disallow: /api
Disallow: /asset-manifest.json
Allow: /search/$
Disallow: /search/
Disallow: /home/search?q=
Disallow: /*/liveblog/2026/6/*?*update=*

# Allow Rules

User-agent: AmazonAdBot
Allow: /

# Disallow Rules

User-agent: anthropic-ai
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Bytespider
Disallow: /


# Sitemaps

Sitemap: https://www.aljazeera.com/news-sitemap.xml
Sitemap: https://www.aljazeera.com/sitemaps/article-archive.xml
Sitemap: https://www.aljazeera.com/sitemaps/article-new.xml
Sitemap: https://www.aljazeera.com/sitemaps/video-archive.xml
Sitemap: https://www.aljazeera.com/sitemaps/video-new.xml
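
The rules above can be checked programmatically. The following is a minimal sketch, not part of the snapshot, that uses Python's standard `urllib.robotparser` on an excerpt of the file. It assumes only the plain-prefix rules: the standard-library parser does not interpret the `*` and `$` wildcards, so the `/search/$`, `/home/search?q=`, and liveblog patterns are left out. The helper name `allowed`, the generic agent name `ExampleBrowser`, and the sample paths are hypothetical and chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

# Excerpt of the rules listed above. Wildcard rules are omitted on purpose,
# because urllib.robotparser only does plain prefix matching.
ROBOTS_TXT = """\
User-agent: *
Disallow: /api
Disallow: /asset-manifest.json
Disallow: /search/

User-agent: AmazonAdBot
Allow: /

User-agent: ClaudeBot
Disallow: /

User-agent: GPTBot
Disallow: /
"""

BASE = "https://www.aljazeera.com"

# Parse the excerpt from memory instead of fetching it over the network.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under the excerpted rules."""
    return parser.can_fetch(agent, BASE + path)


if __name__ == "__main__":
    # Hypothetical sample paths; "ExampleBrowser" stands in for any agent
    # that has no dedicated group and so falls back to "User-agent: *".
    for agent, path in [
        ("ClaudeBot", "/news/"),
        ("GPTBot", "/news/"),
        ("AmazonAdBot", "/news/"),
        ("ExampleBrowser", "/news/"),
        ("ExampleBrowser", "/api/v1"),
        ("ExampleBrowser", "/search/economy"),
    ]:
        print(f"{agent:15} {path:18} allowed={allowed(agent, path)}")
```

An agent with its own group (for example `ClaudeBot`) is evaluated against that group only, while any other agent falls back to the `User-agent: *` group. A crawler that needs the wildcard rules would require a parser that implements them, which this sketch does not attempt.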