The Guardian / robots.txt snapshot
fetched 2026-06-20T01:10:30Z · HTTP 200 · 2844 bytes · sha256 86ff4f30943f492a
final URL: https://www.theguardian.com/robots.txt
# This is the robots.txt file for theguardian.com

# Guardian content is made available under our terms and conditions of use.
# Any other uses are not permitted, incl. but not limited to: for large language
# models (LLMs), machine learning and/or artificial intelligence-related
# purposes; with any of the aforementioned technologies; and/or for any
# commercial purposes. Contact [email protected] for assistance

User-agent: *
Disallow: /sendarticle/
Disallow: /Users/
Disallow: /users/
Disallow: /*/print$
Disallow: /email/
Disallow: /contactus/
Disallow: /share/
Disallow: /websearch
Disallow: /*?commentpage=
Disallow: /whsmiths/
Disallow: /external/overture/
Disallow: /discussion/report-abuse/*
Disallow: /discussion/report-abuse-ajax/*
Disallow: /discussion/comment-permalink/*
Disallow: /discussion/report-abuse/*
Disallow: /discussion/user-report-abuse/*
Disallow: /discussion/handlers/*
Disallow: /discussion/your-profile
Disallow: /discussion/your-comments
Disallow: /discussion/edit-profile
Disallow: /discussion/search/comments
Disallow: /discussion/*
Disallow: /search
Disallow: /music/artist/*
Disallow: /music/album/*
Disallow: /books/data/*
Disallow: /settings/
Disallow: /embed/
Disallow: /*styles/js-on.css$
Disallow: /sport/olympics/2008/events/*
Disallow: /sport/olympics/2008/medals/*
Disallow: /f/healthcheck
Disallow: /sections
Disallow: /top-stories
Disallow: /most-read/sport
Disallow: /articles
Disallow: /global$
Disallow: /*/feedarticle/*
Disallow: /travel/2013/aug/22/been-there-readers-competition?*
Disallow: /preference/*
Disallow: /59666047/
Disallow: /print/
Disallow: /info/tech-feedback
Disallow: /production-monitoring/
Disallow: *.emailjson
Disallow: *.emailtxt
Disallow: /headline.txt
Disallow: *?*dcr=apps*

User-agent: Mediapartners-Google
Disallow:

Sitemap: http://www.theguardian.com/sitemaps/news.xml
Sitemap: http://www.theguardian.com/sitemaps/video.xml

User-agent: NewsNow
User-agent: CCBot
User-agent: TurnitinBot
User-agent: PetalBot
User-agent: MoodleBot
User-agent: FacebookBot
User-agent: Bytespider
User-agent: Mojeek
User-agent: JenkersBot
User-agent: Seekr
User-agent: YouBot
User-agent: Arquivo-web-crawler
User-agent: coccocbot-web
User-agent: SeznamBot
User-Agent: PerplexityBot
User-Agent: yacy
User-agent: anthropic-ai
User-agent: ClaudeBot
User-agent: Claude-SearchBot
User-agent: Claude-User
User-agent: AwarioRssBot
User-agent: AwarioSmartBot
User-agent: SentiOne
User-agent: ImageSift
User-agent: Applebot-Extended
User-agent: YandexAdditional
User-agent: YandexAdditionalBot
User-agent: scalepostAI
User-agent: Buck
User-agent: meta-externalagent
User-agent: Amazonbot
User-agent: DuckAssistBot
User-agent: Google-CloudVertexBot
User-agent: Amzn-SearchBot
User-agent: AhrefsBot
User-agent: AhrefsSiteAudit
Disallow: /
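
Reading note (not part of the fetched file): the snapshot relies on three robots.txt features that are easy to misread. Stacked User-agent lines share the one rule that follows them, an empty Disallow permits everything, and some paths use the `*` and `$` wildcards. The Python sketch below illustrates each on an abbreviated excerpt. It is only an illustration under stated assumptions: `EXCERPT`, `BASE`, `may_fetch` and `rule_matches` are names invented here, and `rule_matches` is a hand-written approximation of the wildcard matching described in RFC 9309, added because the standard-library `urllib.robotparser` appears to match plain path prefixes and may not interpret wildcards.

```python
import re
from urllib.robotparser import RobotFileParser

# Abbreviated excerpt of the snapshot (assumption: a hand-picked subset).
# Wildcard rules are left out here and handled separately by rule_matches().
EXCERPT = """\
User-agent: *
Disallow: /search
Disallow: /email/

User-agent: Mediapartners-Google
Disallow:

User-agent: CCBot
User-agent: ClaudeBot
User-agent: AhrefsBot
Disallow: /
"""

BASE = "https://www.theguardian.com"

parser = RobotFileParser()
parser.parse(EXCERPT.splitlines())


def may_fetch(agent: str, path: str) -> bool:
    """Hypothetical wrapper: ask the stdlib parser about one agent and path."""
    return parser.can_fetch(agent, BASE + path)


def rule_matches(pattern: str, path: str) -> bool:
    """Hypothetical helper approximating RFC 9309 path matching.

    '*' stands for any run of characters, a trailing '$' anchors the end of
    the path, and anything else is matched as a literal prefix.
    """
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None


if __name__ == "__main__":
    # Stacked User-agent lines: every listed bot inherits "Disallow: /".
    print("ClaudeBot /world:", may_fetch("ClaudeBot", "/world"))
    # Unlisted agents fall back to the "*" group.
    print("Googlebot /world:", may_fetch("Googlebot", "/world"))
    print("Googlebot /search:", may_fetch("Googlebot", "/search"))
    # An empty Disallow places no restriction on that agent.
    print("Mediapartners-Google /search:", may_fetch("Mediapartners-Google", "/search"))
    # "$" limits "/global$" to the exact path "/global".
    print("/global$ vs /global:", rule_matches("/global$", "/global"))
    print("/global$ vs /global-development:", rule_matches("/global$", "/global-development"))
    # "*" spans any number of path segments.
    print("/*/print$ vs /world/2024/print:", rule_matches("/*/print$", "/world/2024/print"))
```

The two halves are kept separate on purpose: group selection is delegated to the standard library, while the wildcard helper only decides whether a single pattern matches a single path and does not implement precedence between Allow and Disallow rules.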