NeuralCrawl

🇺🇸 Read the Docs

readthedocs.org · Top 1000 websites · rank #994 · Programming and Developer Software

AI crawler access (latest snapshot, 13h ago)

Legend: blocked, restricted, allowed; faded = inherited from the * wildcard group

GPTBot
ChatGPT-User
OAI-SearchBot
ClaudeBot
Claude-User
Claude-SearchBot
anthropic-ai
Claude-Web
CCBot
Google-Extended
Applebot-Extended
PerplexityBot
Perplexity-User
Bytespider
Amazonbot
FacebookBot
meta-externalagent
meta-externalfetcher
cohere-ai
AI2Bot
Diffbot
omgili
YouBot
DuckAssistBot
MistralAI-User
PanguBot
Timpibot

Current robots.txt · 641 bytes · sha256 e487fb28ff96

# Guidelines for automated access and crawling Read the Docs: https://docs.readthedocs.com/platform/stable/automated-access.html
#
# Bots should not be scraping from the Read the Docs maintainer dashboard in bulk.
# Automated access to documentation sites is allowed in accordance with the policies above.
# For everything else, use our API: https://docs.readthedocs.com/platform/stable/api/index.html

User-agent: *
Disallow: /accounts/
Disallow: /api/
Disallow: /profiles/
Disallow: /projects/*/builds/
Disallow: /projects/tags/
Disallow: /search/

# This was hitting our site and causing a lot of issues
User-agent: AhrefsBot
Disallow: /
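
The rules above can be checked programmatically. The following is a minimal sketch, assuming Python's standard-library `urllib.robotparser`; the helper name `is_allowed` and the sample URLs are illustrative and do not come from the snapshot. It shows how a crawler with no group of its own (GPTBot here) falls back to the `*` group, while AhrefsBot is blocked everywhere. Parsers differ in how they treat `*` inside a path, and the standard-library parser does plain prefix matching, so the `/projects/*/builds/` rule is deliberately not exercised.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the snapshot above (comment lines omitted).
ROBOTS_TXT = """\
User-agent: *
Disallow: /accounts/
Disallow: /api/
Disallow: /profiles/
Disallow: /projects/*/builds/
Disallow: /projects/tags/
Disallow: /search/

User-agent: AhrefsBot
Disallow: /
"""

# Parse the rules from memory instead of fetching the live file.
_parser = RobotFileParser()
_parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(user_agent: str, url: str) -> bool:
    """Hypothetical helper: True if the snapshot's rules permit the fetch."""
    return _parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Sample URLs are assumptions chosen to hit one rule each.
    checks = [
        ("GPTBot", "https://readthedocs.org/api/v3/projects/"),
        ("GPTBot", "https://readthedocs.org/projects/"),
        ("AhrefsBot", "https://readthedocs.org/projects/"),
    ]
    for agent, url in checks:
        verdict = "allowed" if is_allowed(agent, url) else "disallowed"
        print(f"{agent:10s} {url} -> {verdict}")
```

Parsing an in-memory copy keeps the check tied to this archived snapshot; a live check would instead call `set_url()` and `read()` on the parser, and its results could differ if the file has changed since.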

Change history

  1. initial snapshot
    • First snapshot of robots.txt archived