NeuralCrawl

AI & search crawlers

Every crawler that NeuralCrawl tracks in robots.txt files, with its vendor, its purpose, and how many monitored sites block it by name.
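The exact rule NeuralCrawl uses to decide that a site blocks a bot "by name" is not stated here. The sketch below assumes one plausible rule: a robots.txt group lists the bot's token on a `User-agent` line (matched case-insensitively) and contains `Disallow: /`, while wildcard (`*`) groups do not count. The helpers `parse_groups`, `blocks_by_name` and `tally`, and the three sample sites, are hypothetical and only illustrate how a "Sites blocking" cell such as `93 (8.7%)` could be computed.

```python
# Hypothetical sketch, not NeuralCrawl's implementation. Assumed rule: a site
# blocks a bot "by name" when a robots.txt group lists the bot's token on a
# User-agent line and contains "Disallow: /". Wildcard (*) groups are ignored.


def parse_groups(robots_txt: str) -> list[tuple[set[str], list[str]]]:
    """Split robots.txt into (lower-cased user agents, disallow paths) groups."""
    groups: list[tuple[set[str], list[str]]] = []
    agents: set[str] = set()
    disallows: list[str] = []
    saw_rule = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if saw_rule:  # a User-agent line after rule lines starts a new group
                groups.append((agents, disallows))
                agents, disallows, saw_rule = set(), [], False
            agents.add(value.lower())
        elif field in ("allow", "disallow"):
            saw_rule = True
            if field == "disallow":
                disallows.append(value)
    if agents:
        groups.append((agents, disallows))
    return groups


def blocks_by_name(robots_txt: str, token: str) -> bool:
    """True if some group naming `token` (case-insensitively) disallows '/'."""
    token = token.lower()
    return any(
        token in agents and "/" in disallows
        for agents, disallows in parse_groups(robots_txt)
    )


def tally(robots_by_site: dict[str, str], token: str) -> str:
    """Format a 'Sites blocking' cell as 'count (share of monitored sites)'."""
    count = sum(blocks_by_name(txt, token) for txt in robots_by_site.values())
    return f"{count} ({count / len(robots_by_site):.1%})"


# Hypothetical monitored sites.
sites = {
    "a.example": "User-agent: GPTBot\nUser-agent: ClaudeBot\nDisallow: /\n",
    "b.example": "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nDisallow: /private/\n",
    "c.example": "User-agent: *\nDisallow: /\n",  # blocks everyone, but not by name
}
print(tally(sites, "GPTBot"))     # 2 (66.7%)
print(tally(sites, "ClaudeBot"))  # 1 (33.3%)
print(tally(sites, "Googlebot"))  # 0 (0.0%)
```

Under this assumed rule, c.example's blanket `User-agent: *` block is not counted for any bot, which is the distinction "by name" draws.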

AI crawlers (27 bots)

| Bot | Vendor | Purpose | Sites blocking (% of monitored sites) |
|---|---|---|---|
| CCBot | Common Crawl | Open web corpus (used for LLM training) | 93 (8.7%) |
| GPTBot | OpenAI | LLM training crawler | 89 (8.3%) |
| ClaudeBot | Anthropic | LLM training crawler | 87 (8.1%) |
| Bytespider | ByteDance | LLM training crawler | 84 (7.8%) |
| Google-Extended | Google | Gemini training opt-out token | 75 (7.0%) |
| meta-externalagent | Meta | Meta AI training crawler | 72 (6.7%) |
| Applebot-Extended | Apple | Apple Intelligence training opt-out | 69 (6.4%) |
| Amazonbot | Amazon | Alexa / LLM crawler | 63 (5.9%) |
| anthropic-ai | Anthropic | Legacy crawler token | 60 (5.6%) |
| Diffbot | Diffbot | Structured data extraction | 60 (5.6%) |
| omgili | Webz.io | Data feeds resold for AI training | 56 (5.2%) |
| PerplexityBot | Perplexity | Answer-engine indexing | 55 (5.1%) |
| cohere-ai | Cohere | LLM training crawler | 51 (4.7%) |
| FacebookBot | Meta | Meta AI crawler (legacy) | 50 (4.7%) |
| Claude-Web | Anthropic | Legacy crawler token | 45 (4.2%) |
| ChatGPT-User | OpenAI | User-triggered browsing | 42 (3.9%) |
| Timpibot | Timpi | Decentralised index crawler | 37 (3.4%) |
| Claude-User | Anthropic | User-triggered browsing | 32 (3.0%) |
| Claude-SearchBot | Anthropic | Search indexing | 30 (2.8%) |
| Perplexity-User | Perplexity | User-triggered browsing | 29 (2.7%) |
| YouBot | You.com | Answer-engine indexing | 29 (2.7%) |
| DuckAssistBot | DuckDuckGo | DuckAssist answers | 28 (2.6%) |
| meta-externalfetcher | Meta | User-triggered fetcher | 28 (2.6%) |
| AI2Bot | Allen Institute for AI | Research crawler | 27 (2.5%) |
| OAI-SearchBot | OpenAI | Search indexing | 26 (2.4%) |
| MistralAI-User | Mistral AI | User-triggered browsing | 22 (2.0%) |
| PanguBot | Huawei | LLM training crawler | 22 (2.0%) |

Search-engine crawlers (12 bots)

| Bot | Vendor | Purpose | Sites blocking (% of monitored sites) |
|---|---|---|---|
| PetalBot | Huawei | Petal Search index | 46 (4.3%) |
| Baiduspider | Baidu | Baidu Search index | 25 (2.3%) |
| SeznamBot | Seznam | Seznam Search index (Czechia) | 7 (0.7%) |
| Applebot | Apple | Siri & Spotlight Search index | 5 (0.5%) |
| YandexBot | Yandex | Yandex Search index | 5 (0.5%) |
| Slurp | Yahoo | Yahoo Search index | 4 (0.4%) |
| Sogou | Sogou | Sogou Search index (China) | 3 (0.3%) |
| Googlebot-News | Google | Google News index | 1 (0.1%) |
| Bingbot | Microsoft | Bing Search index | 0 (0.0%) |
| DuckDuckBot | DuckDuckGo | DuckDuckGo Search index | 0 (0.0%) |
| Googlebot | Google | Google Search index | 0 (0.0%) |
| Googlebot-Image | Google | Google Images index | 0 (0.0%) |
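Google-Extended and Applebot-Extended are robots.txt tokens that govern how crawled content may be used for training; they are not separate fetchers, since Googlebot and Applebot do the crawling. A site can therefore disallow these tokens, along with dedicated training crawlers such as GPTBot, ClaudeBot and CCBot, while leaving the search crawlers in the table above alone. The minimal sketch below assumes an illustrative robots.txt (not taken from any monitored site) and checks it with Python's standard-library `urllib.robotparser`. That parser matches user agents with a simpler substring rule than RFC 9309, so real crawlers may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt (an assumption, not from a monitored site): opt out
# of several AI-training tokens while allowing every other crawler.
ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: CCBot
User-agent: Google-Extended
User-agent: Applebot-Extended
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse from a string; no network fetch

url = "https://example.com/article"
for agent in ["GPTBot", "Google-Extended", "Googlebot", "Applebot"]:
    print(f"{agent}: {'allowed' if parser.can_fetch(agent, url) else 'blocked'}")
# GPTBot: blocked
# Google-Extended: blocked
# Googlebot: allowed
# Applebot: allowed
```

Naming each token in one shared group keeps the opt-out list in a single place. The trailing `User-agent: *` group is explicit rather than required, since agents with no matching group are allowed by default.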

See also: The Wall, a site-by-site matrix of blocking status across the most newsworthy crawlers.