NeuralCrawl

🇺🇸 California Institute of Technology

caltech.edu · Universities · rank #7 · University

AI crawler access (latest snapshot, 3h ago)

Legend: blocked · restricted · allowed (faded = inherited from the * wildcard group)

GPTBot
ChatGPT-User
OAI-SearchBot
ClaudeBot
Claude-User
Claude-SearchBot
anthropic-ai
Claude-Web
CCBot
Google-Extended
Applebot-Extended
PerplexityBot
Perplexity-User
Bytespider
Amazonbot
FacebookBot
meta-externalagent
meta-externalfetcher
cohere-ai
AI2Bot
Diffbot
omgili
YouBot
DuckAssistBot
MistralAI-User
PanguBot
Timpibot
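None of the crawlers listed above has its own User-agent group in the current robots.txt, so each one falls back to the * wildcard group, which is what the "inherited" marker denotes. A minimal Python sketch of that group selection, loosely following RFC 9309 (case-insensitive match on the crawler's product token, falling back to *). The parser is deliberately simplified, and parse_groups and select_group are hypothetical helpers, not NeuralCrawl's implementation:

```python
# Sketch of robots.txt group selection, loosely following RFC 9309.
# Simplified: comments and blank lines are skipped, lines without ":" are ignored.

ROBOTS_TXT = """\
User-agent: SemrushBot
Disallow: /

User-agent: BLP_bbot
Disallow: /

User-agent: *
Disallow: /campus-life-events/calendar/minicalendar/*
Disallow: /map/landmark_ajax/*
Disallow: /map/milestone/*
Crawl-delay: 10
Allow: *
Sitemap: https://www.caltech.edu/sitemap.xml
"""

AI_CRAWLERS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot", "Claude-User",
    "Claude-SearchBot", "anthropic-ai", "Claude-Web", "CCBot", "Google-Extended",
    "Applebot-Extended", "PerplexityBot", "Perplexity-User", "Bytespider",
    "Amazonbot", "FacebookBot", "meta-externalagent", "meta-externalfetcher",
    "cohere-ai", "AI2Bot", "Diffbot", "omgili", "YouBot", "DuckAssistBot",
    "MistralAI-User", "PanguBot", "Timpibot",
]


def parse_groups(text: str) -> dict[str, list[tuple[str, str]]]:
    """Map each lower-cased user-agent token to its (field, value) rules."""
    groups: dict[str, list[tuple[str, str]]] = {}
    current_agents: list[str] = []
    in_rules = False  # True once the current group has seen a rule line
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                current_agents, in_rules = [], False
            current_agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field == "sitemap":
            continue  # Sitemap records do not belong to any group
        else:
            in_rules = True
            for agent in current_agents:
                groups[agent].append((field, value))
    return groups


def select_group(groups: dict[str, list[tuple[str, str]]], user_agent: str):
    """Return the matching group key, else '*', else None (no applicable rules)."""
    token = user_agent.lower()
    if token in groups:
        return token
    return "*" if "*" in groups else None


groups = parse_groups(ROBOTS_TXT)
for bot in AI_CRAWLERS:
    # Every crawler in this list resolves to the '*' group.
    print(f"{bot}: group {select_group(groups, bot)!r}")
```

By contrast, SemrushBot and BLP_bbot match their own groups, so they receive Disallow: / and never see the * rules.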

Current robots.txt · 266 bytes · sha256 3f591e8c02a0

User-agent: SemrushBot
Disallow: /

User-agent: BLP_bbot
Disallow: /

User-agent: *
Disallow: /campus-life-events/calendar/minicalendar/*
Disallow: /map/landmark_ajax/*
Disallow: /map/milestone/*
Crawl-delay: 10
Allow: *
Sitemap: https://www.caltech.edu/sitemap.xml
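Within the * group, each of the three Disallow patterns is longer, and therefore more specific, than Allow: *, so they take precedence for the paths they match. The sketch below approximates RFC 9309's most-specific-match rule by comparing pattern lengths, lets Allow win ties, and supports the * and trailing $ special characters. It skips percent-encoding normalization and leaves out Crawl-delay, which RFC 9309 does not define. Because Allow: * does not start with /, strict parsers may discard it; here it is treated as matching every path. Either way, paths outside the three Disallow patterns stay crawlable, since a path with no matching rule is allowed. The helper names and example paths are hypothetical:

```python
import re

# Rules of the "*" group above (Crawl-delay omitted: RFC 9309 does not define it).
STAR_GROUP = [
    ("disallow", "/campus-life-events/calendar/minicalendar/*"),
    ("disallow", "/map/landmark_ajax/*"),
    ("disallow", "/map/milestone/*"),
    ("allow", "*"),
]


def pattern_to_regex(pattern: str) -> re.Pattern:
    """'*' matches any run of characters; a trailing '$' anchors the end of the path."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(piece) for piece in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Longest matching pattern wins; Allow wins ties; no match means allowed."""
    best_len, allowed = -1, True
    for field, pattern in rules:
        if field not in ("allow", "disallow") or not pattern:
            continue  # other fields are ignored; an empty pattern matches nothing here
        if pattern_to_regex(pattern).match(path):  # re.match anchors at the path start
            longer = len(pattern) > best_len
            tie_allow = len(pattern) == best_len and field == "allow"
            if longer or tie_allow:
                best_len, allowed = len(pattern), field == "allow"
    return allowed


# Hypothetical paths, for illustration only.
for path in ["/", "/about", "/map/milestone", "/map/milestone/42", "/map/landmark_ajax/7"]:
    print(path, "allowed" if is_allowed(STAR_GROUP, path) else "disallowed")
```

Note that /map/milestone itself (without the trailing slash) remains allowed under this reading, because /map/milestone/* only matches paths below that directory.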

Change history

  1. initial snapshot
    • First snapshot of robots.txt archived
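The byte count and sha256 fingerprint offer a cheap way to decide whether a new fetch warrants a change-history entry. A sketch, assuming (the page does not say) that the displayed 3f591e8c02a0 is the first 12 hex digits of a SHA-256 digest over the raw file bytes; Snapshot, take_snapshot, and record_if_changed are hypothetical names. Whether this reconstruction reproduces that exact value depends on the bytes actually served, line endings in particular:

```python
import hashlib
from dataclasses import dataclass

# The robots.txt shown above, assuming LF line endings and a trailing newline.
ROBOTS_TXT = b"""\
User-agent: SemrushBot
Disallow: /

User-agent: BLP_bbot
Disallow: /

User-agent: *
Disallow: /campus-life-events/calendar/minicalendar/*
Disallow: /map/landmark_ajax/*
Disallow: /map/milestone/*
Crawl-delay: 10
Allow: *
Sitemap: https://www.caltech.edu/sitemap.xml
"""


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical archive record for one fetched robots.txt."""
    size: int    # byte length of the fetched body
    digest: str  # full SHA-256 hex digest of the body

    @property
    def short(self) -> str:
        # Assumed display form: the first 12 hex digits of the digest.
        return self.digest[:12]


def take_snapshot(body: bytes) -> Snapshot:
    return Snapshot(size=len(body), digest=hashlib.sha256(body).hexdigest())


def record_if_changed(history: list[Snapshot], body: bytes) -> bool:
    """Append a snapshot only when the body differs from the latest one."""
    snap = take_snapshot(body)
    if history and history[-1].digest == snap.digest:
        return False  # unchanged: no new change-history entry
    history.append(snap)
    return True


history: list[Snapshot] = []
print(record_if_changed(history, ROBOTS_TXT))  # True: initial snapshot
print(record_if_changed(history, ROBOTS_TXT))  # False: same bytes again
print(history[0].size, history[0].short)       # size is 266 for these bytes
```

Comparing full digests rather than the 12-digit prefix avoids a false "unchanged" result from a prefix collision, at the cost of storing 64 hex characters per snapshot.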