United Kingdom (GOV.UK) / robots.txt snapshot
fetched 2026-06-20T01:10:30Z · HTTP 200 · 1098 bytes · sha256 2d091dfdc782adef
final URL: https://www.gov.uk/robots.txt
User-agent: *
Disallow: /*/print$
# Don't allow indexing of site search
Disallow: /search/all*
Sitemap: https://www.gov.uk/sitemap.xml

# The Meta-ExternalAgent crawler crawls the web for use cases such as training foundation AI models.
# It results in timeouts from Vertex that back up requests from users making genuine searches
User-agent: meta-externalagent
Disallow: /search/all*

# https://ahrefs.com/robot/ crawls the site frequently
User-agent: AhrefsBot
Crawl-delay: 10

# https://www.deepcrawl.com/bot/ makes lots of requests. Ideally we'd slow it
# down rather than blocking it but it doesn't mention whether or not it
# supports crawl-delay.
User-agent: deepcrawl
Disallow: /

# Complaints of 429 'Too many requests' seem to be coming from SharePoint servers
# (https://social.msdn.microsoft.com/Forums/en-US/3ea268ed-58a6-4166-ab40-d3f4fc55fef4)
# The robot doesn't recognise its User-Agent string, see the MS support article:
# https://support.microsoft.com/en-us/help/3019711/the-sharepoint-server-crawler-ignores-directives-in-robots-txt
User-agent: MS Search 6.0 Robot
Disallow: /
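
The rules above use the `*` wildcard and the `$` end-of-path anchor, so a plain prefix check is not enough to evaluate them. The following is a minimal sketch of how a crawler might interpret this file, assuming RFC 9309 style matching. The helper names (`parse_groups`, `pattern_to_regex`, `is_allowed`) and the sample paths such as `/vat-rates/print` are hypothetical and chosen only for illustration. The sketch handles just the directives present in this snapshot (`User-agent`, `Disallow`, `Crawl-delay`) and looks up the user agent by exact, case-insensitive name, which is a simplification.

```python
import re

# Directives copied from the snapshot above, with comment lines omitted.
ROBOTS_TXT = """\
User-agent: *
Disallow: /*/print$
Disallow: /search/all*
Sitemap: https://www.gov.uk/sitemap.xml

User-agent: meta-externalagent
Disallow: /search/all*

User-agent: AhrefsBot
Crawl-delay: 10

User-agent: deepcrawl
Disallow: /

User-agent: MS Search 6.0 Robot
Disallow: /
"""


def parse_groups(text):
    """Map each lowercased user-agent to its Disallow patterns and Crawl-delay."""
    groups = {}
    current = []      # groups that the next rule lines belong to
    in_rules = False  # True once a rule line has followed the User-agent lines
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                current, in_rules = [], False
            group = groups.setdefault(
                value.lower(), {"disallow": [], "crawl_delay": None}
            )
            current.append(group)
        elif field == "disallow":
            in_rules = True
            if value:  # an empty Disallow blocks nothing
                for group in current:
                    group["disallow"].append(value)
        elif field == "crawl-delay":
            in_rules = True
            if value.isdigit():
                for group in current:
                    group["crawl_delay"] = int(value)
        # Other fields (for example Sitemap) are ignored in this sketch.
    return groups


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern: '*' is any run, trailing '$' is end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(groups, user_agent, path):
    """Use the agent's own group if one exists, otherwise the '*' group."""
    group = groups.get(user_agent.lower(), groups.get("*"))
    if group is None:
        return True
    # Patterns are matched from the start of the path (re.match).
    return not any(pattern_to_regex(p).match(path) for p in group["disallow"])


groups = parse_groups(ROBOTS_TXT)
print(is_allowed(groups, "ExampleBot", "/vat-rates/print"))
print(is_allowed(groups, "ExampleBot", "/search/all?keywords=tax"))
print(is_allowed(groups, "deepcrawl", "/vat-rates"))
print(groups["ahrefsbot"]["crawl_delay"])
```

One consequence of this reading is that a crawler with its own group follows only that group. Under that assumption, AhrefsBot's group contains no `Disallow` line, so the `/search/all*` rule from the `*` group would not apply to it. Whether a given crawler actually behaves this way depends on its implementation.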