NeuralCrawl

Trends

How the web's Robots Exclusion Protocol policy is evolving (especially towards AI crawlers) across every dataset I monitor.

For journalists & analysts · shareable, citeable statistics, updated every crawl

9.3% OpenAI

“Across the web's 1,000 most-visited sites, 13.2% block an AI crawler (and 9.3% name OpenAI).” Elie Berreby

≥1 AI 13.2%
OpenAI 9.3%
GPTBot 9.1%
Google-Ext 7.9%
Top 1000 websites · n=910 · June 26, 2026

2.4% OpenAI

“Corporate America is still largely open: 5.1% of the US 500 websites block at least one AI crawler (only 2.4% name GPTBot).” Elie Berreby

≥1 AI 5.1%
OpenAI 2.4%
GPTBot 2.4%
Google-Ext 2%
US 500 · n=455 · June 26, 2026

2.2% OpenAI

“The tech-heavy Nasdaq-100 is split on AI scraping: 2.2% block OpenAI's crawlers; 1.1% opt out of Gemini training via Google-Extended.” Elie Berreby

≥1 AI 7.5%
OpenAI 2.2%
GPTBot 2.2%
Google-Ext 1.1%
Nasdaq 100 · n=93 · June 26, 2026

5.4% OpenAI

“Even AI's own builders sometimes pull up the ladder: 8.1% of AI labs block at least one crawler (only 5.4% block GPTBot).” Elie Berreby

≥1 AI 8.1%
OpenAI 5.4%
GPTBot 5.4%
Google-Ext 8.1%
AI labs · n=37 · June 26, 2026

0% OpenAI

“For all their talk of web threats, only 7.3% of cybersecurity vendors block at least one AI crawler (none name GPTBot).” Elie Berreby

≥1 AI 7.3%
OpenAI 0%
GPTBot 0%
Google-Ext
Cybersecurity · n=55 · June 26, 2026

1.9% OpenAI

“The crawlers that index the web write their own robots.txt rules: 1.9% block at least one AI crawler; 1.9% block OpenAI.” Elie Berreby

≥1 AI 1.9%
OpenAI 1.9%
GPTBot 1.9%
Google-Ext 1.9%
SEO & AI search · n=52 · June 26, 2026

25.7% OpenAI

“Creative rights-holders are among the most defensive: 25.7% block an AI crawler, and each names GPTBot.” Elie Berreby

≥1 AI 25.7%
OpenAI 25.7%
GPTBot 25.7%
Google-Ext 14.3%
Creative rights-holders · n=35 · June 26, 2026

28.2% OpenAI

“The platforms whose content trains the models are pushing back: 41% of knowledge & UGC sites block an AI crawler (only 28.2% name OpenAI).” Elie Berreby

≥1 AI 41%
OpenAI 28.2%
GPTBot 28.2%
Google-Ext 12.8%
Knowledge & UGC · n=39 · June 26, 2026

3.3% OpenAI

“From the FTSE to the Nikkei, AI blocking is uneven: 5% of blue-chip index constituents block an AI crawler (only 3.3% name OpenAI).” Elie Berreby

≥1 AI 5%
OpenAI 3.3%
GPTBot 3.3%
Google-Ext 2.5%
National indices · n=121 · June 26, 2026

0% OpenAI

“Banks and fintechs guard customer data, but their websites are open to AI crawlers: only 2.2% block at least one AI crawler (none name OpenAI).” Elie Berreby

≥1 AI 2.2%
OpenAI 0%
GPTBot 0%
Google-Ext
Banks & fintech · n=46 · June 26, 2026

20% OpenAI

“Retailers are slowly fencing their catalogs: 25% of major e-commerce sites block an AI crawler (and 20% name OpenAI).” Elie Berreby

≥1 AI 25%
OpenAI 20%
GPTBot 20%
Google-Ext 10%
E-commerce · n=20 · June 26, 2026

1.9% OpenAI

“European corporate giants remain mostly welcoming to AI crawlers (1.9% block OpenAI, 3.2% block at least one crawler).” Elie Berreby

≥1 AI 3.2%
OpenAI 1.9%
GPTBot 1.9%
Google-Ext 1.3%
European companies · n=154 · June 26, 2026

48% OpenAI

“The platforms sitting on the most user data are the most guarded: 52% of major social networks block an AI crawler (only 48% name GPTBot).” Elie Berreby

≥1 AI 52%
OpenAI 48%
GPTBot 48%
Google-Ext 36%
Social Networks · n=25 · June 26, 2026

7.5% OpenAI

“Government websites stay wide open: only 7.5% of national governments restrict at least one AI crawler.” Elie Berreby

≥1 AI 7.5%
OpenAI 7.5%
GPTBot 7.5%
Google-Ext 7.5%
Governments · n=53 · June 26, 2026

57.8% OpenAI

“Publishers protect their content: 54.7% block GPTBot, and 78.1% block at least one AI crawler.” Elie Berreby

≥1 AI 78.1%
OpenAI 57.8%
GPTBot 54.7%
Google-Ext 56.2%
Publishers · n=64 · June 26, 2026

AI bots rejection timeline · weekly blocking % · US 500 default cohort

[Chart: two series, "Block ≥1 AI crawler" and "Block GPTBot"; x-axis from 2026-06-20 to 2026-06-25]

20 earlier weeks omitted (archive building) · 6 weekly snapshots with reliable coverage

Block ≥1 AI crawler by dataset · share of analysed sites

Publishers 78.1% (50/64)
Social Networks 52.0% (13/25)
Knowledge & UGC 41.0% (16/39)
E-commerce 25.0% (5/20)
Top 1000 websites 13.2% (120/910)
AI labs 8.1% (3/37)
Nasdaq 100 7.5% (7/93)
Governments 7.5% (4/53)
Cybersecurity 7.3% (4/55)
US 500 5.1% (23/455)
National indices 5.0% (6/121)
European companies 3.2% (5/154)
Banks & fintech 2.2% (1/46)
SEO & AI search 1.9% (1/52)

Block OpenAI crawlers by dataset · GPTBot / OAI-SearchBot

Publishers 57.8% (37/64)
Social Networks 48.0% (12/25)
Knowledge & UGC 28.2% (11/39)
E-commerce 20.0% (4/20)
Top 1000 websites 9.3% (85/910)
Governments 7.5% (4/53)
AI labs 5.4% (2/37)
National indices 3.3% (4/121)
US 500 2.4% (11/455)
Nasdaq 100 2.2% (2/93)
SEO & AI search 1.9% (1/52)
European companies 1.9% (3/154)
Cybersecurity 0.0% (0/55)
Banks & fintech 0.0% (0/46)
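The "OpenAI" figure groups more than one crawler, which is why it can exceed the GPTBot figure for the same dataset (Publishers: 57.8% versus 54.7%). A minimal sketch of that aggregation, assuming a site counts once if it blocks either crawler named in the chart subtitle. The `sites` mapping, the `.test` hostnames, and the function name are hypothetical, and whether other OpenAI agents such as ChatGPT-User are included is not stated in the source.

```python
# Crawlers grouped under "OpenAI", taken from the chart subtitle (assumption:
# no other OpenAI user-agents are included in this figure).
OPENAI_CRAWLERS = {"gptbot", "oai-searchbot"}


def count_sites_blocking(sites: dict[str, set[str]], crawlers: set[str]) -> tuple[int, int]:
    """Return (sites blocking at least one of `crawlers`, sites analysed)."""
    hits = sum(1 for blocked in sites.values() if blocked & crawlers)
    return hits, len(sites)


# Hypothetical input: lowercase bot tokens each site explicitly blocks.
sites = {
    "example-news.test": {"gptbot", "ccbot"},
    "example-shop.test": {"oai-searchbot"},
    "example-bank.test": set(),
    "example-blog.test": {"claudebot"},
}

hits, analysed = count_sites_blocking(sites, OPENAI_CRAWLERS)
print(f"{hits}/{analysed}")  # 2/4
```

A site that blocks both GPTBot and OAI-SearchBot is still counted once, because the check is per site rather than per rule.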

AI crawlers explicitly blocked across 1,076 archived robots.txt files

CCBot Common Crawl 93 (8.6%)
GPTBot OpenAI 90 (8.4%)
ClaudeBot Anthropic 88 (8.2%)
Bytespider ByteDance 85 (7.9%)
Google-Extended Google 76 (7.1%)
meta-externalagent Meta 73 (6.8%)
Applebot-Extended Apple 71 (6.6%)
Amazonbot Amazon 65 (6.0%)
anthropic-ai Anthropic 60 (5.6%)
Diffbot Diffbot 60 (5.6%)
PerplexityBot Perplexity 57 (5.3%)
omgili Webz.io 55 (5.1%)
cohere-ai Cohere 52 (4.8%)
FacebookBot Meta 50 (4.6%)
Claude-Web Anthropic 46 (4.3%)
ChatGPT-User OpenAI 43 (4.0%)
Timpibot Timpi 37 (3.4%)
Claude-User Anthropic 32 (3.0%)
Claude-SearchBot Anthropic 31 (2.9%)
Perplexity-User Perplexity 30 (2.8%)
YouBot You.com 29 (2.7%)
meta-externalfetcher Meta 28 (2.6%)
DuckAssistBot DuckDuckGo 28 (2.6%)
OAI-SearchBot OpenAI 27 (2.5%)
AI2Bot Allen Institute for AI 27 (2.5%)
MistralAI-User Mistral AI 22 (2.0%)
PanguBot Huawei 22 (2.0%)

Counts companies whose robots.txt contains a dedicated User-agent group with Disallow: / for that bot. Wildcard-only blocks are excluded.
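The counting rule above can be sketched roughly as follows. This is an illustration under stated assumptions, not the site's actual parser: the function name is hypothetical, a bot counts as explicitly blocked if any group that names it contains `Disallow: /` (even when the group lists several bots), and `Allow` overrides within the same group are not evaluated.

```python
def explicitly_blocked_bots(robots_txt: str) -> set[str]:
    """Return lowercase user-agent tokens named in a group containing 'Disallow: /'.

    Groups that only name the '*' wildcard never contribute a bot, which
    mirrors the "wildcard-only blocks are excluded" rule.
    """
    blocked: set[str] = set()
    agents: list[str] = []  # user-agents of the group being read
    in_rules = False        # True once the current group has an Allow/Disallow line

    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()

        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value == "/":
                blocked.update(a for a in agents if a != "*")
        # Other fields (Sitemap, Crawl-delay, ...) are ignored in this sketch.

    return blocked


sample = """\
User-agent: *
Disallow: /

User-agent: GPTBot
User-agent: CCBot
Disallow: /

User-agent: ClaudeBot
Disallow: /private/
"""
print(sorted(explicitly_blocked_bots(sample)))  # ['ccbot', 'gptbot']
```

In the sample, the wildcard group blocks everything but names no bot, so it adds nothing to the count, and ClaudeBot is only partially restricted, so it is not counted either.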

Changes per week

[Chart: x-axis weeks 24 and 25]

ISO week numbers; excludes initial archive events.
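For readers who want to reproduce this kind of weekly bucketing, here is a minimal sketch using Python's standard library. The event format (a date plus a flag marking the initial archive capture) and the sample dates are hypothetical; only the two rules from the note above (ISO week numbering, initial archive events skipped) come from the source.

```python
from collections import Counter
from datetime import date


def changes_per_iso_week(events):
    """Count events per (ISO year, ISO week), skipping initial archive events.

    `events` is an iterable of (day, is_initial_archive) pairs, a hypothetical
    shape chosen for this example.
    """
    counts = Counter()
    for day, is_initial_archive in events:
        if is_initial_archive:
            continue  # the first capture of a file is not a change
        iso = day.isocalendar()  # (ISO year, ISO week, ISO weekday)
        counts[(iso[0], iso[1])] += 1
    return dict(counts)


# Hypothetical events, for illustration only.
events = [
    (date(2026, 6, 8), True),    # initial archive, excluded
    (date(2026, 6, 10), False),
    (date(2026, 6, 12), False),
    (date(2026, 6, 16), False),
]
print(changes_per_iso_week(events))  # {(2026, 24): 2, (2026, 25): 1}
```

Keying on the ISO year as well as the week avoids merging week 1 of one year with week 1 of the next once the archive spans a year boundary.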

Most active companies

  1. Toronto Star 23 changes
  2. IQVIA Holdings 12 changes
  3. Amgen 10 changes
  4. Bank Central Asia 9 changes
  5. Pinterest 6 changes
  6. Adidas 6 changes
  7. KLA 5 changes
  8. Glencore 4 changes
  9. Eli Lilly 3 changes
  10. Kraken 3 changes
  11. F5 3 changes
  12. Cencora 2 changes
  13. Gilead Sciences 2 changes
  14. Martin Marietta Materials 2 changes
  15. Advance Auto Parts 2 changes

Recent AI-policy events

  • 🤖 AI SoundCloud: Blocked Amazonbot (Amazon) entirely
  • 🤖 AI BuzzFeed: Added Allow: /rss-feeds/feeds/bf-amazon-ai-grounding.extended.xml for Amazonbot
  • 🤖 AI Figma Community: Added Allow: /de-de/resource-library/ki-coding-tools/$ for CHATGPT-User
  • 🤖 AI El País: Blocked Claude-SearchBot (Anthropic) entirely
  • 🤖 AI Figma Community: Added Allow: /blog/2026-ai-report/$ for CHATGPT-User
  • 🤖 AI Ireland (Gov.ie): Blocked Amazonbot (Amazon) entirely
  • 🤖 AI W.W. Grainger: Blocked Amazonbot (Amazon) entirely
  • 🤖 AI Regeneron Pharmaceuticals: Added rules for AI crawler Amazonbot (Amazon)
  • 🤖 AI Evonik: Added rules for AI crawler Amazonbot (Amazon)
  • 🤖 AI Nikkei: Blocked AI2Bot (Allen Institute for AI) entirely
  • 🤖 AI Khan Academy: Blocked GPTBot (OpenAI) entirely
  • 🤖 AI American Financial Group: Blocked meta-externalagent (Meta) entirely