NeuralCrawl

United Kingdom (GOV.UK) / robots.txt snapshot

fetched 2026-06-20T01:10:30Z · HTTP 200 · 1098 bytes · sha256 2d091dfdc782adef

final URL: https://www.gov.uk/robots.txt
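The page records the response size and a SHA-256 value, so a re-fetched copy of the final URL can be compared with this snapshot. This is a minimal sketch. It assumes that the 16 hex characters shown are the leading characters of the full lower-case hex digest of the raw body, which the page does not state. `matches_snapshot` is a hypothetical helper, and fetching the file is left to the caller.

```python
import hashlib


def matches_snapshot(body: bytes, expected_size: int, digest_prefix: str) -> bool:
    """Compare raw robots.txt bytes with a snapshot's size and truncated SHA-256.

    Assumption: the snapshot displays a prefix of the lower-case hex digest.
    """
    if len(body) != expected_size:
        return False
    return hashlib.sha256(body).hexdigest().startswith(digest_prefix.lower())


if __name__ == "__main__":
    # Illustrative input only; in practice `body` is the fetched response bytes.
    print(matches_snapshot(b"abc", 3, "ba7816bf8f01cfea"))  # True
```

For this snapshot, the check would be `matches_snapshot(body, 1098, "2d091dfdc782adef")`.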

User-agent: *
Disallow: /*/print$
# Don't allow indexing of site search
Disallow: /search/all*
Sitemap: https://www.gov.uk/sitemap.xml

# The Meta-ExternalAgent crawler crawls the web for use cases such as training foundation AI models.
# It results in timeouts from Vertex that back up requests from users making genuine searches
User-agent: meta-externalagent
Disallow: /search/all*

# https://ahrefs.com/robot/ crawls the site frequently
User-agent: AhrefsBot
Crawl-delay: 10

# https://www.deepcrawl.com/bot/ makes lots of requests. Ideally we'd slow it
# down rather than blocking it but it doesn't mention whether or not it
# supports crawl-delay.
User-agent: deepcrawl
Disallow: /

# Complaints of 429 'Too many requests' seem to be coming from SharePoint servers
# (https://social.msdn.microsoft.com/Forums/en-US/3ea268ed-58a6-4166-ab40-d3f4fc55fef4)
# The robot doesn't recognise its User-Agent string, see the MS support article:
# https://support.microsoft.com/en-us/help/3019711/the-sharepoint-server-crawler-ignores-directives-in-robots-txt
User-agent: MS Search 6.0 Robot
Disallow: /
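Under RFC 9309 (the Robots Exclusion Protocol), a crawler obeys only the group whose user-agent matches its product token. It falls back to the `*` group only when no such group exists. In a path rule, `*` matches any run of characters and a trailing `$` anchors the end of the URL. The sketch below is a minimal parser and matcher written for illustration. It is not a reference implementation, and the names `Group`, `parse_robots`, `is_allowed` and the crawler `ExampleBot` are hypothetical. It treats `Crawl-delay`, which RFC 9309 does not define, as an optional extension, and it skips percent-encoding normalisation.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Group:
    """Rules for one user-agent, in file order (hypothetical structure)."""
    rules: List[Tuple[bool, str]] = field(default_factory=list)  # (is_allow, pattern)
    crawl_delay: Optional[float] = None  # non-standard; RFC 9309 does not define it


def parse_robots(text: str) -> Tuple[Dict[str, Group], List[str]]:
    """Split a robots.txt body into groups keyed by lower-cased user-agent."""
    groups: Dict[str, Group] = {}
    sitemaps: List[str] = []
    agents: List[str] = []  # user-agent lines of the group being read
    in_body = False  # True once the current group has a non-user-agent line
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if ":" not in line:
            continue  # blank or malformed line
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "sitemap":
            sitemaps.append(value)  # sitemaps are not tied to any group
        elif key == "user-agent":
            if in_body:  # a user-agent line after other records starts a new group
                agents, in_body = [], False
            agents.append(value.lower())
            groups.setdefault(value.lower(), Group())  # repeated agents share one group
        elif agents:
            in_body = True
            for agent in agents:
                if key in ("allow", "disallow"):
                    groups[agent].rules.append((key == "allow", value))
                elif key == "crawl-delay":
                    try:
                        groups[agent].crawl_delay = float(value)
                    except ValueError:
                        pass  # ignore an unparseable delay
    return groups, sitemaps


def _pattern_regex(pattern: str):
    """Compile a rule: '*' matches any run of characters, a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(groups: Dict[str, Group], product_token: str, path: str) -> bool:
    """Decide whether `path` (URL path plus any query string) may be crawled."""
    # Use the crawler's own group if present, otherwise fall back to '*'.
    group = groups.get(product_token.lower(), groups.get("*"))
    if group is None:
        return True  # no applicable group: nothing is disallowed
    best_len, allowed = -1, True
    for is_allow, pattern in group.rules:
        if not pattern:
            continue  # an empty Disallow blocks nothing
        if _pattern_regex(pattern).match(path):  # match() anchors at the start
            # Longest matching pattern wins; on a tie, Allow beats Disallow.
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed


# The directives from the snapshot above, with comments omitted.
ROBOTS_TXT = """\
User-agent: *
Disallow: /*/print$
Disallow: /search/all*
Sitemap: https://www.gov.uk/sitemap.xml

User-agent: meta-externalagent
Disallow: /search/all*

User-agent: AhrefsBot
Crawl-delay: 10

User-agent: deepcrawl
Disallow: /

User-agent: MS Search 6.0 Robot
Disallow: /
"""

if __name__ == "__main__":
    groups, sitemaps = parse_robots(ROBOTS_TXT)
    print(sitemaps)  # ['https://www.gov.uk/sitemap.xml']
    print(is_allowed(groups, "ExampleBot", "/guidance/print"))  # False: matches /*/print$
    print(is_allowed(groups, "meta-externalagent", "/guidance/print"))  # True: own group only
    print(is_allowed(groups, "AhrefsBot", "/search/all"))  # True: no Disallow in its group
    print(groups["ahrefsbot"].crawl_delay)  # 10.0
```

Applied to this file, the sketch suggests that groups do not inherit rules from `*`. Under that reading, `meta-externalagent` is kept out of `/search/all` but not out of print pages. The `AhrefsBot` group contains only a `Crawl-delay`, so it leaves every path allowed for that crawler. Python's standard `urllib.robotparser` offers `crawl_delay()` and `site_maps()`. As far as we know, though, it matches paths by plain prefix comparison, so it would not honour the `*` and `$` patterns used here.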