NeuralCrawl

🇧🇷 Globo

globo.com · Top 1000 websites · rank #26 · Web

AI crawler access (latest snapshot, 1h ago)

Legend: blocked · restricted · allowed · faded = inherited from the * wildcard group

GPTBot
ChatGPT-User
OAI-SearchBot
ClaudeBot
Claude-User
Claude-SearchBot
anthropic-ai
Claude-Web
CCBot
Google-Extended
Applebot-Extended
PerplexityBot
Perplexity-User
Bytespider
Amazonbot
FacebookBot
meta-externalagent
meta-externalfetcher
cohere-ai
AI2Bot
Diffbot
omgili
YouBot
DuckAssistBot
MistralAI-User
PanguBot
Timpibot
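The legend's states can be derived from the robots.txt itself. The sketch below shows one way to do it under loudly stated assumptions; it is not NeuralCrawl's actual logic. It treats a crawler as "blocked" if its group contains Disallow: /, "restricted" if it has only narrower Disallow rules, and "allowed" if it has none. A crawler counts as inherited (faded) when no group names it and it falls back to the * group. User-agent tokens are compared case-insensitively, as RFC 9309 requires. The names ROBOTS_TXT, parse_groups and classify are hypothetical, and Allow rules are ignored when choosing the status, which is a simplification.

```python
# Hypothetical classifier: the blocked/restricted/allowed rules are an
# assumption, not NeuralCrawl's documented behaviour.
Rules = list[tuple[str, str]]  # (directive, path pattern) pairs

ROBOTS_TXT = """\
User-Agent: *
Disallow: /busca/
Disallow: /beta/
Disallow: /historico-home/
Disallow: *globo-cdn-src/*
Disallow: /alt-a/
Disallow: /alt-b/
Disallow: /alt-c/
Disallow: /alt-d/
Disallow: /recomendado/
Disallow: /explore/
Sitemap: http://www.globo.com/sitemap-image.xml

User-agent: CCBot
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /
"""


def parse_groups(text: str) -> dict[str, Rules]:
    """Map each lowercased user-agent token to its allow/disallow rules."""
    groups: dict[str, Rules] = {}
    agents: list[str] = []
    in_rules = False  # becomes True once the current group has a rule line
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if in_rules:  # a user-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif key in ("allow", "disallow"):
            in_rules = True
            for agent in agents:
                groups[agent].append((key, value))
        # Other lines (e.g. Sitemap) do not belong to any group.
    return groups


def classify(bot: str, groups: dict[str, Rules]) -> tuple[str, bool]:
    """Return (status, inherited_from_wildcard) for one crawler name."""
    token = bot.lower()
    inherited = token not in groups
    rules = groups.get(token, groups.get("*", []))
    disallows = [path for directive, path in rules if directive == "disallow" and path]
    if "/" in disallows:
        status = "blocked"
    elif disallows:
        status = "restricted"
    else:
        status = "allowed"
    return status, inherited


groups = parse_groups(ROBOTS_TXT)
for bot in ("GPTBot", "CCBot", "Google-Extended", "ClaudeBot", "PerplexityBot"):
    status, inherited = classify(bot, groups)
    print(f"{bot}: {status}{' (inherited from *)' if inherited else ''}")
# GPTBot: blocked
# CCBot: blocked
# Google-Extended: blocked
# ClaudeBot: restricted (inherited from *)
# PerplexityBot: restricted (inherited from *)
```

Under this classifier only the three crawlers with their own groups come out blocked. Every other name in the list falls back to the * group and comes out restricted.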

Current robots.txt · 408 bytes · sha256 51949b387de7

#
# robots.txt
#

User-Agent: *
Disallow: /busca/
Disallow: /beta/
Disallow: /historico-home/
Disallow: *globo-cdn-src/*
Disallow: /alt-a/
Disallow: /alt-b/
Disallow: /alt-c/
Disallow: /alt-d/
Disallow: /recomendado/
Disallow: /explore/
Sitemap: http://www.globo.com/sitemap-image.xml


###### 

User-agent: CCBot
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

######
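Whether a given URL is off-limits depends on how the Disallow patterns are matched. The following sketch applies the matching rules from RFC 9309. A * matches any sequence of characters, a trailing $ anchors the end of the path, the longest matching pattern wins, and Allow wins a tie. Percent-encoding normalization is ignored. The helper names and example paths are hypothetical. Read this way, *globo-cdn-src/* blocks any path that contains globo-cdn-src/. By contrast, /busca/ does not block /busca, which lacks the trailing slash.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex matched from the path start.

    '*' matches any character sequence; a trailing '$' anchors the end of the path.
    """
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + (r"\Z" if anchored else ""))


def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """Longest matching pattern decides; on a length tie, allow wins (RFC 9309)."""
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for directive, pattern in rules:
        if not pattern:  # an empty value matches nothing
            continue
        if pattern_to_regex(pattern).match(path):
            length = len(pattern)
            if length > best_len or (length == best_len and directive == "allow"):
                best_len, allowed = length, directive == "allow"
    return allowed


# The * group from the file above, as (directive, pattern) pairs.
STAR_RULES = [("disallow", p) for p in (
    "/busca/", "/beta/", "/historico-home/", "*globo-cdn-src/*",
    "/alt-a/", "/alt-b/", "/alt-c/", "/alt-d/", "/recomendado/", "/explore/",
)]

print(is_allowed("/busca/?q=futebol", STAR_RULES))             # False
print(is_allowed("/busca", STAR_RULES))                        # True (no trailing slash)
print(is_allowed("/assets/globo-cdn-src/app.js", STAR_RULES))  # False
print(is_allowed("/esportes/", STAR_RULES))                    # True
```

Compiling each pattern on every call keeps the sketch short. A real crawler would cache the compiled regexes per group.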

Change history

  1. Initial snapshot
    • First snapshot of robots.txt archived.
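The byte count and shortened SHA-256 shown with the current file suggest a content fingerprint. Below is a minimal sketch of hash-based change detection. It assumes the archive adds an entry only when the SHA-256 of a fetched body differs from the previous snapshot's. The 12-character prefix mirrors the length of the hash displayed above. The Snapshot, fingerprint and record names and the comparison policy are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Snapshot:
    size: int    # body length in bytes
    sha256: str  # full hex digest
    short: str   # first 12 hex characters, as a display label


def fingerprint(body: bytes) -> Snapshot:
    digest = hashlib.sha256(body).hexdigest()
    return Snapshot(size=len(body), sha256=digest, short=digest[:12])


def record(history: List[Snapshot], body: bytes) -> Optional[str]:
    """Append a snapshot only if the content hash changed; return a change note."""
    snap = fingerprint(body)
    if not history:
        history.append(snap)
        return "initial snapshot"
    previous = history[-1]
    if snap.sha256 == previous.sha256:
        return None  # identical content: nothing new to archive
    history.append(snap)
    return f"changed: {previous.short} -> {snap.short} ({previous.size} -> {snap.size} bytes)"


history: List[Snapshot] = []
print(record(history, b"User-agent: *\nDisallow: /busca/\n"))  # initial snapshot
print(record(history, b"User-agent: *\nDisallow: /busca/\n"))  # None (unchanged)
print(record(history, b"User-agent: GPTBot\nDisallow: /\n"))   # a "changed: ..." note
```

Comparing full digests rather than the 12-character prefix avoids treating two different files as identical when their short labels happen to collide.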