Common Crawl / robots.txt snapshot

fetched 2026-06-20T14:56:27Z · HTTP 200 · 127 bytes · sha256 7e85cc070dd4aff5

final URL: https://commoncrawl.org/robots.txt

# All robots are explicitly allowed!

User-agent: *
Allow: /
Disallow: /search?*

Sitemap: https://commoncrawl.org/sitemap.xml
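
The opening comment says all robots are allowed, yet the same group also carries a Disallow line, so it may help to see how a crawler could resolve the two rules. The following is a minimal sketch, assuming the longest-match precedence described in RFC 9309 (the most specific pattern wins, and Allow wins a tie). The names RULES, pattern_to_regex, and is_allowed are hypothetical helpers for illustration, not part of any Common Crawl tooling.

```python
import re

# Rules copied from the "User-agent: *" group in the snapshot above.
RULES = [
    ("allow", "/"),
    ("disallow", "/search?*"),
]


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    Everything else (including '?') is treated literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path: str, rules=RULES) -> bool:
    """Assumed precedence: longest matching pattern wins, Allow wins ties.

    A path that matches no rule is treated as allowed.
    """
    best_len, best_allow = -1, True
    for kind, pattern in rules:
        # re.match anchors at the start of the path, like a prefix match.
        if pattern_to_regex(pattern).match(path):
            allow = kind == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and allow):
                best_len, best_allow = len(pattern), allow
    return best_allow


if __name__ == "__main__":
    for p in ["/", "/blog/", "/search", "/search?q=crawl"]:
        print(p, is_allowed(p))
```

Under these assumptions, only paths that begin with /search? are blocked; /search without a query string still falls under Allow: /. Parsers that apply the first matching rule instead of the longest one may reach a different result for the same file.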