Common Crawl / robots.txt snapshot
fetched 2026-06-20T14:56:27Z · HTTP 200 · 127 bytes · sha256 7e85cc070dd4aff5
final URL: https://commoncrawl.org/robots.txt
    # All robots are explicitly allowed!

    User-agent: *
    Allow: /
    Disallow: /search?*

    Sitemap: https://commoncrawl.org/sitemap.xml
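The file pairs a broad `Allow: /` with a narrower `Disallow: /search?*`. Under RFC 9309 (the Robots Exclusion Protocol), the most specific matching rule, meaning the one with the longest pattern, wins, and `*` matches any run of characters. As a result, search URLs that carry a query string are excluded, while everything else, including a bare `/search`, stays crawlable. The sketch below illustrates that evaluation. It is a minimal, simplified illustration and not a complete parser: the function names are hypothetical, only `User-agent: *` groups are read, percent-encoding normalization is skipped, and the caller must pass a path plus query rather than a full URL.

```python
import re

# The robots.txt body from this snapshot.
ROBOTS_TXT = """# All robots are explicitly allowed!

User-agent: *
Allow: /
Disallow: /search?*

Sitemap: https://commoncrawl.org/sitemap.xml
"""


def parse_star_rules(text):
    """Hypothetical helper: collect (is_allow, pattern) pairs from 'User-agent: *' groups."""
    rules = []
    applies = False    # does the current group include 'User-agent: *'?
    seen_rule = False  # has the current group already had an Allow/Disallow line?
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if seen_rule:  # a User-agent line after rules opens a new group
                applies, seen_rule = False, False
            applies = applies or value == "*"
        elif key in ("allow", "disallow"):
            seen_rule = True
            if applies and value:  # an empty Disallow means "nothing disallowed"
                rules.append((key == "allow", value))
        # Other keys (e.g. Sitemap) are not part of any group and are ignored here.
    return rules


def pattern_to_regex(pattern):
    """'*' matches any run of characters; a trailing '$' anchors the end; all else is literal."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path, rules):
    """Longest matching pattern wins; Allow wins a tie; no match at all means allowed."""
    best_len, best_allow = -1, True
    for allow, pattern in rules:
        if pattern_to_regex(pattern).match(path):  # re.match anchors at the path start
            if len(pattern) > best_len or (len(pattern) == best_len and allow):
                best_len, best_allow = len(pattern), allow
    return best_allow


if __name__ == "__main__":
    rules = parse_star_rules(ROBOTS_TXT)
    for path in ("/", "/search", "/search?q=crawl"):
        print(path, "allowed" if is_allowed(path, rules) else "disallowed")
    # prints:
    # / allowed
    # /search allowed
    # /search?q=crawl disallowed
```

The longest-match rule is why the order of the `Allow` and `Disallow` lines does not matter to an RFC 9309 crawler. Python's standard `urllib.robotparser`, by contrast, returns the first rule in file order whose path is a prefix of the URL and does not expand `*`. Because `Allow: /` comes first here, it may report the search URLs as allowed.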