NeuralCrawl

Sky News / robots.txt snapshot

fetched 2026-06-20T01:10:30Z (18h ago) · HTTP 200 · 1210 bytes · sha256 bde47c8d5ff5922e

final URL: https://news.sky.com/robots.txt

Sitemap: https://news.sky.com/sitemap.xml

User-Agent: *
Disallow: /preview/*

# Disallow AI Model Training Crawlers
User-agent: AI2Bot
User-agent: AmazonBot
User-agent: anthropic-ai
User-agent: Applebot-Extended
User-agent: AwarioRssBot
User-agent: AwarioSmartBot
User-agent: Bytespider
User-agent: CCBot
User-agent: ClaudeBot
User-agent: cohere-ai
User-agent: Diffbot
User-agent: FacebookBot
User-agent: Google-Extended
User-agent: GPTBot
User-agent: magpie-crawler
User-agent: Meta-ExternalAgent
User-agent: omgili
User-agent: omgilibot
User-agent: PanguBot
User-agent: PerplexityBot
User-agent: Scrapy
User-agent: TurnitinBot
User-agent: Webzio-Extended
Disallow: /
Allow: /info/policies-and-standards
Allow: /info/library-sales

# Allow user initiated AI actions / searches
User-agent: ChatGPT-User
User-agent: Claude-Web
User-agent: Claude-User
User-agent: Claude-SearchBot
User-agent: MistralAI-User
User-agent: OAI-SearchBot
User-agent: Perplexity-User
Disallow: /preview/*

# Disallow news aggregators except on RSS.
User-agent: NewsNow
User-agent: news-please
Disallow: /
Allow: /info/policies-and-standards
Allow: /info/library-sales
Allow: /info/rss

User-agent: DataForSeoBot
Disallow: /preview/*
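
The groups in this file combine a blanket "Disallow: /" with narrower "Allow:" lines, so whether a path is fetchable depends on how a crawler resolves overlapping rules. The sketch below assumes the longest-match precedence described in RFC 9309 (the rule with the longest pattern wins, and Allow wins a tie); individual crawlers may behave differently. The names pattern_to_regex, is_allowed, AI_TRAINING_RULES and PREVIEW_RULES are hypothetical and exist only for illustration, and the rule lists are copied by hand from the snapshot rather than parsed from it.

```python
import re

# Rules for the "AI Model Training Crawlers" group, copied from the snapshot.
AI_TRAINING_RULES = [
    ("disallow", "/"),
    ("allow", "/info/policies-and-standards"),
    ("allow", "/info/library-sales"),
]

# Rules shared by the "*" group and the user-initiated AI agents group.
PREVIEW_RULES = [
    ("disallow", "/preview/*"),
]


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters and a trailing '$' anchors the end
    of the path; everything else is matched literally as a prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path, rules):
    """Return True if `path` may be fetched under `rules`.

    Assumes longest-match precedence: the matching rule with the longest
    pattern wins, and an Allow rule wins when lengths are equal. A path
    that matches no rule is allowed.
    """
    best_len = -1
    allowed = True
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            is_allow = kind == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len = len(pattern)
                allowed = is_allow
    return allowed


if __name__ == "__main__":
    for path in ("/story/some-article", "/info/library-sales/archive"):
        print(path, is_allowed(path, AI_TRAINING_RULES))
    for path in ("/preview/draft-1", "/uk"):
        print(path, is_allowed(path, PREVIEW_RULES))
```

Under these assumptions, a listed training crawler is blocked from article paths but may still fetch the two "/info/" pages, while agents in the user-initiated group are blocked only under "/preview/". A parser that applies the first matching rule in file order instead of the longest one would reach a different answer for the "/info/" paths, because "Disallow: /" appears before the Allow lines.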