🇺🇸 Cornell University
cornell.edu · Universities · rank #13 · University
AI crawler access (latest snapshot, 3h ago)
Legend: ⛔ blocked · restricted · ✅ allowed · faded = inherited from the * wildcard group
✅GPTBot
✅ChatGPT-User
✅OAI-SearchBot
✅ClaudeBot
✅Claude-User
✅Claude-SearchBot
✅anthropic-ai
✅Claude-Web
✅CCBot
✅Google-Extended
✅Applebot-Extended
✅PerplexityBot
✅Perplexity-User
✅Bytespider
✅Amazonbot
✅FacebookBot
✅meta-externalagent
✅meta-externalfetcher
✅cohere-ai
✅AI2Bot
✅Diffbot
✅omgili
✅YouBot
✅DuckAssistBot
✅MistralAI-User
✅PanguBot
✅Timpibot
Current robots.txt · 1018 bytes · sha256 cf32953b9920
User-agent: *
Crawl-Delay: 6
Disallow: /_dynamic_files/
Disallow: /_tasks/
Disallow: /test/
Disallow: /tools/
Disallow: /template/
Disallow: /search/
Disallow: /visit/plan/
Disallow: /video/kaltura/
Disallow: /video/tasks/
Disallow: /server-health-check/

# SiteImprove should ignore these page particularly because they aren't actually used, but are still linked for historical reasons
User-agent: Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0) SiteCheck-sitecrawl by Siteimprove.com
Disallow: /cuinfo/specialconditions/
Disallow: /_includes/header.cfm

User-agent: Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0) LinkCheck by Siteimprove.com
Disallow: /cuinfo/specialconditions/
Disallow: /_includes/header.cfm

User-agent: HTML validator: Siteimprove_W3C_Validator/1.3
Disallow: /cuinfo/specialconditions/
Disallow: /_includes/header.cfm

User-agent: CSS Validator: Jigsaw/2.3.0 W3C_CSS_Validator_JFouffa/2.0
Disallow: /cuinfo/specialconditions/
Disallow: /_includes/header.cfm
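None of the AI crawlers listed above has its own group in this file, so each one falls back to the * wildcard group: the site root is allowed, a handful of paths are disallowed, and a crawl delay of 6 applies. The sketch below is one way to check that with Python's standard urllib.robotparser. It is not how this page computes its table. ROBOTS_TXT is an excerpt of the snapshot (the wildcard group and the first Siteimprove group), and AI_AGENTS, build_parser, and access_report are hypothetical names chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

# Excerpt of the archived snapshot: the wildcard group plus the first
# Siteimprove group. The remaining groups are omitted for brevity.
ROBOTS_TXT = """\
User-agent: *
Crawl-Delay: 6
Disallow: /_dynamic_files/
Disallow: /_tasks/
Disallow: /test/
Disallow: /tools/
Disallow: /template/
Disallow: /search/
Disallow: /visit/plan/
Disallow: /video/kaltura/
Disallow: /video/tasks/
Disallow: /server-health-check/

User-agent: Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0) SiteCheck-sitecrawl by Siteimprove.com
Disallow: /cuinfo/specialconditions/
Disallow: /_includes/header.cfm
"""

# A few of the crawler tokens from the table above (illustrative subset).
AI_AGENTS = ["GPTBot", "ClaudeBot", "CCBot", "PerplexityBot"]


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text held in memory, without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def access_report(parser: RobotFileParser, agents, path: str = "/") -> dict:
    """Map each user-agent token to whether it may fetch the given path."""
    return {agent: parser.can_fetch(agent, path) for agent in agents}


parser = build_parser(ROBOTS_TXT)
report = access_report(parser, AI_AGENTS)

for agent, allowed in report.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'} at /")

# Paths disallowed for * are also disallowed for crawlers that inherit from it.
print("GPTBot may fetch /search/:", parser.can_fetch("GPTBot", "/search/"))
print("GPTBot crawl delay:", parser.crawl_delay("GPTBot"))
```

Checking only the root path matches the "allowed" reading of the table. A crawler that honors the file would still skip the disallowed paths and space its requests according to the crawl delay.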
Change history
- initial snapshot: first snapshot of robots.txt archived
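The byte count and the short sha256 value shown with the snapshot suggest that changes are detected by comparing content fingerprints. The following is a minimal sketch of that idea, assuming the displayed value is the first 12 hex digits of a SHA-256 digest over the raw file bytes. That assumption is not confirmed by the page, and fingerprint, describe, and has_changed are hypothetical helper names.

```python
import hashlib


def fingerprint(body: bytes, digits: int = 12) -> str:
    """Return the first `digits` hex characters of the SHA-256 of `body`.

    Assumption: the page's short sha256 value is a truncated hex digest.
    """
    return hashlib.sha256(body).hexdigest()[:digits]


def describe(body: bytes) -> str:
    """Format a snapshot summary in the style used on this page."""
    return f"{len(body)} bytes · sha256 {fingerprint(body)}"


def has_changed(previous: bytes, current: bytes) -> bool:
    """Report whether two snapshots differ, judged by their fingerprints."""
    return fingerprint(previous) != fingerprint(current)


if __name__ == "__main__":
    old = b"User-agent: *\nDisallow: /search/\n"
    new = b"User-agent: *\nDisallow: /search/\nDisallow: /test/\n"
    print(describe(old))
    print("changed:", has_changed(old, new))
```

A truncated digest is enough to spot edits between consecutive snapshots. Comparing full digests would be the safer choice if collisions were a concern.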