Trends
How Robots Exclusion Protocol (robots.txt) policies are evolving across the web, especially towards AI crawlers, in every dataset I monitor.
For journalists & analysts: shareable, citeable statistics, updated every crawl
“Across the web's 1,000 most-visited sites, 13.2% block an AI crawler (and 9.3% name OpenAI).” Elie Berreby
≥1 AI 13.2% · OpenAI 9.3% · GPTBot 9.1% · Google-Ext 7.9% · Top 1000 websites · n=910 · June 26, 2026
“Corporate America is still largely open: 5.1% of the US 500 websites block at least one AI crawler (only 2.4% name GPTBot).” Elie Berreby
≥1 AI 5.1% · OpenAI 2.4% · GPTBot 2.4% · Google-Ext 2% · US 500 · n=455 · June 26, 2026
“The tech-heavy Nasdaq-100 is split on AI scraping: 2.2% block OpenAI's crawlers; 1.1% opt out of Gemini training via Google-Extended.” Elie Berreby
≥1 AI 7.5% · OpenAI 2.2% · GPTBot 2.2% · Google-Ext 1.1% · Nasdaq 100 · n=93 · June 26, 2026
“Even AI's own builders sometimes pull up the ladder: 8.1% of AI labs block at least one crawler (only 5.4% block GPTBot).” Elie Berreby
≥1 AI 8.1% · OpenAI 5.4% · GPTBot 5.4% · Google-Ext 8.1% · AI labs · n=37 · June 26, 2026
“For all their talk of web threats, only 7.3% of cybersecurity vendors block at least one AI crawler (none name GPTBot).” Elie Berreby
≥1 AI 7.3% · OpenAI — · GPTBot — · Google-Ext — · Cybersecurity · n=55 · June 26, 2026
“The crawlers that index the web write their own robots.txt rules: 1.9% block at least one AI crawler; 1.9% block OpenAI.” Elie Berreby
≥1 AI 1.9% · OpenAI 1.9% · GPTBot 1.9% · Google-Ext 1.9% · SEO & AI search · n=52 · June 26, 2026
“Creative rights-holders are among the most defensive: 25.7% block an AI crawler, and each names GPTBot.” Elie Berreby
≥1 AI 25.7% · OpenAI 25.7% · GPTBot 25.7% · Google-Ext 14.3% · Creative rights-holders · n=35 · June 26, 2026
“The platforms whose content trains the models are pushing back: 41% of knowledge & UGC sites block an AI crawler (only 28.2% name OpenAI).” Elie Berreby
≥1 AI 41% · OpenAI 28.2% · GPTBot 28.2% · Google-Ext 12.8% · Knowledge & UGC · n=39 · June 26, 2026
“From the FTSE to the Nikkei, AI blocking is uneven: 5% of blue-chip index constituents block an AI crawler (only 3.3% name OpenAI).” Elie Berreby
≥1 AI 5% · OpenAI 3.3% · GPTBot 3.3% · Google-Ext 2.5% · National indices · n=121 · June 26, 2026
“Banks and fintechs guard customer data, but their websites are open to AI crawlers: only 2.2% block at least one AI crawler (none name OpenAI).” Elie Berreby
≥1 AI 2.2% · OpenAI — · GPTBot — · Google-Ext — · Banks & fintech · n=46 · June 26, 2026
“Retailers are slowly fencing their catalogs: 25% of major e-commerce sites block an AI crawler (and 20% name OpenAI).” Elie Berreby
≥1 AI 25% · OpenAI 20% · GPTBot 20% · Google-Ext 10% · E-commerce · n=20 · June 26, 2026
“European corporate giants remain mostly welcoming to AI crawlers (1.9% block OpenAI, 3.2% block at least one crawler).” Elie Berreby
≥1 AI 3.2% · OpenAI 1.9% · GPTBot 1.9% · Google-Ext 1.3% · European companies · n=154 · June 26, 2026
“The platforms sitting on the most user data are the most guarded: 52% of major social networks block an AI crawler (only 48% name GPTBot).” Elie Berreby
≥1 AI 52% · OpenAI 48% · GPTBot 48% · Google-Ext 36% · Social Networks · n=25 · June 26, 2026
“Government websites stay wide open: only 7.5% of national governments restrict at least one AI crawler.” Elie Berreby
≥1 AI 7.5% · OpenAI 7.5% · GPTBot 7.5% · Google-Ext 7.5% · Governments · n=53 · June 26, 2026
“Publishers protect their content: 54.7% block GPTBot, and 78.1% block at least one AI crawler.” Elie Berreby
≥1 AI 78.1% · OpenAI 57.8% · GPTBot 54.7% · Google-Ext 56.2% · Publishers · n=64 · June 26, 2026
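Because several datasets are small, it can help to translate a published percentage back into an approximate number of sites before citing it. The sketch below assumes each share is simply the number of blocking sites divided by n and rounded to one decimal place. The helper name `implied_count` is hypothetical, and the results are back-calculated estimates, not figures published on this page.

```python
def implied_count(share_percent: float, n: int) -> int:
    """Estimate how many sites a published share corresponds to.

    Assumes share_percent = 100 * count / n, rounded for display,
    so the result is an approximation derived from the rounded figure.
    """
    return round(share_percent / 100 * n)


# Examples using figures from the cards above (percentage, sample size).
print(implied_count(8.1, 37))   # AI labs blocking at least one AI crawler
print(implied_count(52, 25))    # social networks blocking at least one AI crawler
print(implied_count(78.1, 64))  # publishers blocking at least one AI crawler
```

With n=37, for example, a single site changing its robots.txt moves the share by about 2.7 percentage points, so small cohorts are best quoted with their sample size.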
AI bot rejection timeline: weekly blocking % · US 500 (default cohort)
20 earlier weeks omitted (archive building) · 6 weekly snapshots with reliable coverage
Block ≥1 AI crawler by dataset · share of analysed sites
Block OpenAI crawlers by dataset · GPTBot / OAI-SearchBot
AI crawlers explicitly blocked across 1,076 archived robots.txt files
Counts companies whose robots.txt contains a dedicated User-agent group with Disallow: / for that bot. Wildcard-only blocks are excluded.
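The counting rule above can be sketched in Python. This is a minimal illustration under stated assumptions, not the code behind this page: the function names `parse_groups` and `explicitly_blocks` are hypothetical, matching is done on the exact bot token (case-insensitive), and the sketch does not consider whether an `Allow` rule in the same group softens the block.

```python
def parse_groups(robots_txt: str) -> list[tuple[list[str], list[tuple[str, str]]]]:
    """Split robots.txt text into (user_agents, rules) groups.

    Consecutive User-agent lines share the rules that follow them;
    a User-agent line appearing after a rule starts a new group.
    """
    groups = []
    agents, rules = [], []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if rules:  # previous group is complete
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value.lower())
        elif field in ("allow", "disallow") and agents:
            rules.append((field, value))
    if agents:
        groups.append((agents, rules))
    return groups


def explicitly_blocks(robots_txt: str, bot: str) -> bool:
    """True if a group naming `bot` contains `Disallow: /`.

    A group that only names the wildcard `*` never counts.
    """
    bot = bot.lower()
    return any(
        bot in agents and ("disallow", "/") in rules
        for agents, rules in parse_groups(robots_txt)
    )


sample = """\
User-agent: *
Disallow: /

User-agent: GPTBot
User-agent: CCBot
Disallow: /

User-agent: Amazonbot
Disallow: /private/
"""

print(explicitly_blocks(sample, "GPTBot"))     # named group with Disallow: /
print(explicitly_blocks(sample, "Amazonbot"))  # named, but only a partial block
print(explicitly_blocks(sample, "ClaudeBot"))  # covered only by the wildcard
```

In this sample only GPTBot and CCBot would be counted. ClaudeBot is disallowed in practice by the wildcard group, but it is excluded because no group names it.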
Changes per week
ISO week numbers; excludes initial archive events.
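As a rough sketch of how such a weekly tally could be produced, the Python below groups change events by ISO week and skips initial archive events. The event shape, a `(date, is_initial_archive)` tuple, and the function names are assumptions made for illustration. Only `date.isocalendar()` is standard-library behaviour.

```python
from collections import Counter
from datetime import date


def iso_week_label(day: date) -> str:
    """Format a date as an ISO year-week label such as '2026-W26'."""
    iso = day.isocalendar()  # (ISO year, ISO week, ISO weekday)
    return f"{iso[0]}-W{iso[1]:02d}"


def changes_per_week(events) -> Counter:
    """Count events per ISO week, ignoring initial archive events.

    `events` is an iterable of (date, is_initial_archive) tuples;
    this shape is hypothetical.
    """
    return Counter(
        iso_week_label(day) for day, is_initial in events if not is_initial
    )


events = [
    (date(2026, 6, 22), True),   # first capture of a file: not a change
    (date(2026, 6, 24), False),
    (date(2026, 6, 26), False),
    (date(2026, 6, 29), False),
]
print(changes_per_week(events))
```

The ISO year is used rather than the calendar year because the two differ around New Year: December 29, 2025 falls in ISO week 2026-W01.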
Most active companies
- Toronto Star: 23 changes
- IQVIA Holdings: 12 changes
- Amgen: 10 changes
- Bank Central Asia: 9 changes
- Pinterest: 6 changes
- Adidas: 6 changes
- KLA: 5 changes
- Glencore: 4 changes
- Eli Lilly: 3 changes
- Kraken: 3 changes
- F5: 3 changes
- Cencora: 2 changes
- Gilead Sciences: 2 changes
- Martin Marietta Materials: 2 changes
- Advance Auto Parts: 2 changes
Recent AI-policy events
- 🤖 AI SoundCloud: Blocked Amazonbot (Amazon) entirely
- 🤖 AI BuzzFeed: Added Allow: /rss-feeds/feeds/bf-amazon-ai-grounding.extended.xml for Amazonbot
- 🤖 AI Figma Community: Added Allow: /de-de/resource-library/ki-coding-tools/$ for CHATGPT-User
- 🤖 AI El País: Blocked Claude-SearchBot (Anthropic) entirely
- 🤖 AI Figma Community: Added Allow: /blog/2026-ai-report/$ for CHATGPT-User
- 🤖 AI Ireland (Gov.ie): Blocked Amazonbot (Amazon) entirely
- 🤖 AI W.W. Grainger: Blocked Amazonbot (Amazon) entirely
- 🤖 AI Regeneron Pharmaceuticals: Added rules for AI crawler Amazonbot (Amazon)
- 🤖 AI Evonik: Added rules for AI crawler Amazonbot (Amazon)
- 🤖 AI Nikkei: Blocked AI2Bot (Allen Institute for AI) entirely
- 🤖 AI Khan Academy: Blocked GPTBot (OpenAI) entirely
- 🤖 AI American Financial Group: Blocked meta-externalagent (Meta) entirely