NeuralCrawl

🇺🇸 Yahoo Search

yahoo.com · SEO & AI search · rank #27 · Search engine

AI crawler access (latest snapshot, 3h ago)

Legend: blocked · restricted · allowed (faded = inherited from the * wildcard group)

GPTBot
ChatGPT-User
OAI-SearchBot
ClaudeBot
Claude-User
Claude-SearchBot
anthropic-ai
Claude-Web
CCBot
Google-Extended
Applebot-Extended
PerplexityBot
Perplexity-User
Bytespider
Amazonbot
FacebookBot
meta-externalagent
meta-externalfetcher
cohere-ai
AI2Bot
Diffbot
omgili
YouBot
DuckAssistBot
MistralAI-User
PanguBot
Timpibot

Current robots.txt 1624 bytes · sha256 021adb45d8aa

User-agent: *
Disallow: /info/p.gif
Disallow: /p/
Disallow: /r/
Disallow: /bin/
Disallow: /caas/
Disallow: /blank.html
Disallow: /includes/
Disallow: /_td_api
Disallow: /tdv2_fp
Disallow: /nel_ms
Disallow: /fp_ms
Disallow: /sports_fp_ms
Disallow: /search_ms
Disallow: /_tdpp_api
Disallow: /_remote
Disallow: /_multiremote
Disallow: /_tdhl_api
Disallow: /digest
Disallow: /fpjs
Disallow: /myjs
Disallow: /news/m/

User-agent: ADmantX
User-agent: AlphaBot
User-agent: anthropic-ai
User-agent: AwarioRssBot
User-agent: AwarioSmartBot
User-agent: BLEXBot
User-agent: Buzzbot
User-agent: Bytespider
User-agent: CCBot
User-agent: ChatGPT-User
User-agent: claritybot
User-agent: Claude-Web
User-agent: ClaudeBot
User-agent: cohere-ai
User-agent: Diffbot
User-agent: FacebookBot
User-agent: FriendlyCrawler
User-agent: Google-Extended
User-agent: GPTBot
User-agent: huggingface
User-agent: ImagesiftBot
User-agent: img2dataset
User-agent: magpie-crawler
User-agent: Meltwater
User-agent: Neevabot
User-agent: news-please
User-agent: NewsNow
User-agent: Nutch
User-agent: omgili
User-agent: omgilibot
User-agent: panscient.com
User-agent: Perplexity-ai
User-agent: PerplexityBot
User-agent: PetalBot
User-agent: PiplBot
User-agent: scoop.it
User-agent: Scrapy
User-agent: Seekr
User-agent: SentiBot
User-agent: SeznamBot
User-agent: TurnitinBot
User-agent: YouBot
User-agent: ZumBot
Disallow: /

User-agent: Claude-SearchBot
User-agent: OAI-SearchBot
Disallow: */articles/

Sitemap: https://www.yahoo.com/news/weather/sitemap.xml
Sitemap: https://www.yahoo.com/news-sitemap-index.xml
Sitemap: https://www.yahoo.com/sitemap-index.xml
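
The blocked / restricted / allowed labels in the legend can, in principle, be derived from the groups in the file above. The following is a minimal sketch of one way to do that, not the tool's actual method. It assumes a crawler is "blocked" when its group contains `Disallow: /`, "restricted" when it has any other Disallow rule, and "allowed" otherwise, and that a crawler with no group of its own inherits the `*` group. The names `parse_groups`, `classify`, and `SAMPLE` are hypothetical, `SAMPLE` is a shortened excerpt of the file, and Allow rules and wildcard path matching are deliberately ignored.

```python
def parse_groups(text):
    """Map each lowercased user-agent token to its list of Disallow paths."""
    groups = {}
    current = []      # agents named by the group currently being read
    in_rules = False  # True once the current group has seen a rule line
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                current, in_rules = [], False
            current.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field == "disallow":
            in_rules = True
            if value:  # an empty Disallow means "nothing is disallowed"
                for agent in current:
                    groups[agent].append(value)
        elif field == "allow":
            in_rules = True  # simplification: Allow rules are not evaluated
    return groups


def classify(agent, groups):
    """Return (status, inherited) for one crawler name."""
    key = agent.lower()
    inherited = key not in groups  # no own group: fall back to '*'
    rules = groups.get(key, groups.get("*", []))
    if "/" in rules:
        status = "blocked"
    elif rules:
        status = "restricted"
    else:
        status = "allowed"
    return status, inherited


# Shortened excerpt of the robots.txt shown above.
SAMPLE = """\
User-agent: *
Disallow: /p/
Disallow: /bin/

User-agent: GPTBot
User-agent: ClaudeBot
Disallow: /

User-agent: Claude-SearchBot
User-agent: OAI-SearchBot
Disallow: */articles/

Sitemap: https://www.yahoo.com/sitemap-index.xml
"""

if __name__ == "__main__":
    groups = parse_groups(SAMPLE)
    for name in ("GPTBot", "OAI-SearchBot", "Amazonbot"):
        print(name, classify(name, groups))
```

Under these assumptions, GPTBot comes out as blocked, OAI-SearchBot as restricted, and Amazonbot, which has no group of its own, as restricted by inheritance from the wildcard group. A production parser would also need to evaluate Allow rules and longest-match precedence.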

Change history

  1. initial snapshot
    • First snapshot of robots.txt archived
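
A change history like this one can be built by fingerprinting each fetched robots.txt and recording a new entry only when the fingerprint differs from the previous one. The sketch below assumes the fingerprint is the byte length plus the first 12 hex characters of the SHA-256 digest, which matches the form of the values shown next to the current file. Whether the tool computes it this way is not stated, and `fingerprint` and `record_snapshot` are hypothetical names.

```python
import hashlib


def fingerprint(body: bytes, prefix_len: int = 12) -> tuple[int, str]:
    """Return (size in bytes, truncated hex SHA-256) for a robots.txt body."""
    return len(body), hashlib.sha256(body).hexdigest()[:prefix_len]


def record_snapshot(history: list[dict], body: bytes) -> bool:
    """Append a history entry if the body changed; return True when it did."""
    size, digest = fingerprint(body)
    if history and history[-1]["sha256"] == digest:
        return False  # same content as the last archived snapshot
    history.append({"bytes": size, "sha256": digest})
    return True


if __name__ == "__main__":
    history: list[dict] = []
    first = b"User-agent: *\nDisallow: /p/\n"
    print(record_snapshot(history, first))                  # first snapshot
    print(record_snapshot(history, first))                  # unchanged
    print(record_snapshot(history, first + b"Disallow: /r/\n"))  # changed
    print(history)
```

Comparing truncated digests keeps entries short at the cost of a small collision risk; storing the full digest and truncating only for display avoids that.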