United Kingdom (GOV.UK) / robots.txt snapshot

fetched 2026-06-20T01:10:30Z · HTTP 200 · 1098 bytes · sha256 2d091dfdc782adef

final URL: https://www.gov.uk/robots.txt

User-agent: *
Disallow: /*/print$
# Don't allow indexing of site search
Disallow: /search/all*
Sitemap: https://www.gov.uk/sitemap.xml

# The Meta-ExternalAgent crawler crawls the web for use cases such as training foundation AI models.
# It results in timeouts from Vertex that back up requests from users making genuine searches
User-agent: meta-externalagent
Disallow: /search/all*

# https://ahrefs.com/robot/ crawls the site frequently
User-agent: AhrefsBot
Crawl-delay: 10

# https://www.deepcrawl.com/bot/ makes lots of requests. Ideally we'd slow it
# down rather than blocking it but it doesn't mention whether or not it
# supports crawl-delay.
User-agent: deepcrawl
Disallow: /

# Complaints of 429 'Too many requests' seem to be coming from SharePoint servers
# (https://social.msdn.microsoft.com/Forums/en-US/3ea268ed-58a6-4166-ab40-d3f4fc55fef4)
# The robot doesn't recognise its User-Agent string, see the MS support article:
# https://support.microsoft.com/en-us/help/3019711/the-sharepoint-server-crawler-ignores-directives-in-robots-txt
User-agent: MS Search 6.0 Robot
Disallow: /
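The groups in this file interact in ways that are easy to misread. Under RFC 9309, a crawler obeys the group (or merged groups) whose User-agent line names its product token, and it falls back to the `User-agent: *` group only when no such group exists. Within a group, `*` in a path matches any run of characters, a trailing `$` anchors the end of the path, and the longest matching rule wins, with Allow preferred on a tie. Read that way, meta-externalagent is barred only from `/search/all*` (not from `/*/print$`), and AhrefsBot has no Disallow rules at all, only the delay. The Python 3 sketch below models those rules under stated simplifications. The helper names (`parse_robots`, `rules_for`, `matches`, `is_allowed`), the `ExampleBot` token and the sample paths are hypothetical. Product tokens are compared by exact case-insensitive equality, and percent-encoding normalization is skipped. `Crawl-delay` is not defined by RFC 9309, so the sketch only records the value; whether a crawler honors it is up to that crawler.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Directives from the snapshot above; all helper names below are hypothetical.
ROBOTS_TXT = """\
User-agent: *
Disallow: /*/print$
Disallow: /search/all*
Sitemap: https://www.gov.uk/sitemap.xml

User-agent: meta-externalagent
Disallow: /search/all*

User-agent: AhrefsBot
Crawl-delay: 10

User-agent: deepcrawl
Disallow: /

User-agent: MS Search 6.0 Robot
Disallow: /
"""


@dataclass
class Group:
    agents: list[str] = field(default_factory=list)  # lower-cased User-agent values
    rules: list[tuple[bool, str]] = field(default_factory=list)  # (is_allow, pattern)
    crawl_delay: float | None = None  # non-standard directive, not in RFC 9309


def parse_robots(text: str) -> list[Group]:
    """Split the file into groups; consecutive User-agent lines share one group."""
    groups: list[Group] = []
    current: Group | None = None
    prev_was_agent = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if ":" not in line:
            continue  # blank and comment-only lines do not end a group
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if current is None or not prev_was_agent:
                current = Group()
                groups.append(current)
            current.agents.append(value.lower())
        elif current is not None and value:  # Sitemap etc. fall through untouched
            if key in ("allow", "disallow"):
                current.rules.append((key == "allow", value))
            elif key == "crawl-delay":
                current.crawl_delay = float(value)
        prev_was_agent = key == "user-agent"
    return groups


def rules_for(groups: list[Group], token: str) -> tuple[list[tuple[bool, str]], float | None]:
    """Merge every group naming this crawler; fall back to '*' only if none does."""
    token = token.lower()
    matched = [g for g in groups if token in g.agents]
    if not matched:
        matched = [g for g in groups if "*" in g.agents]
    rules = [rule for g in matched for rule in g.rules]
    delays = [g.crawl_delay for g in matched if g.crawl_delay is not None]
    return rules, (delays[0] if delays else None)


def matches(pattern: str, path: str) -> bool:
    """'*' matches any run of characters; a trailing '$' anchors the end of the path."""
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in core.split("*"))
    found = re.fullmatch(regex, path) if anchored else re.match(regex, path)
    return found is not None


def is_allowed(rules: list[tuple[bool, str]], path: str) -> bool:
    """Longest matching pattern wins; Allow wins a tie; no match means allowed."""
    best_len, allowed = -1, True
    for is_allow, pattern in rules:
        n = len(pattern)
        if matches(pattern, path) and (n > best_len or (n == best_len and is_allow)):
            best_len, allowed = n, is_allow
    return allowed


if __name__ == "__main__":
    groups = parse_robots(ROBOTS_TXT)
    for token, path in [
        ("ExampleBot", "/government/print"),          # made-up bot; False via /*/print$
        ("ExampleBot", "/government/print/extra"),    # True: '$' anchors the end
        ("meta-externalagent", "/government/print"),  # True: '*' group not inherited
        ("AhrefsBot", "/search/all"),                 # True, delay 10.0
        ("deepcrawl", "/"),                           # False
    ]:
        rules, delay = rules_for(groups, token)
        print(token, path, is_allowed(rules, path), delay)
```

Run as a script, it prints one line per check, for example `ExampleBot /government/print False None`. Matching on the rule's length, rather than on file order, is what RFC 9309 prescribes when several rules in a group match the same path.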