NeuralCrawl

Cornell University / robots.txt snapshot

fetched 2026-06-26T14:15:22Z · HTTP 200 · 1018 bytes · sha256 cf32953b99203e16

final URL: https://www.cornell.edu/robots.txt
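A minimal sketch of checking a locally saved copy of the file against the size and abbreviated hash shown above. It assumes the displayed `cf32953b99203e16` is the first 16 hex digits of the SHA-256 digest of the raw response body. The helper names `sha256_prefix` and `matches_snapshot` are hypothetical.

```python
import hashlib


def sha256_prefix(data: bytes, length: int = 16) -> str:
    """Return the first `length` hex digits of the SHA-256 digest of `data`."""
    return hashlib.sha256(data).hexdigest()[:length]


def matches_snapshot(data: bytes, shown_hash: str, shown_size: int) -> bool:
    """Compare raw bytes with the size and abbreviated hash shown on the page.

    Assumption: the shown hash is a prefix of the full hex digest.
    """
    return (
        len(data) == shown_size
        and sha256_prefix(data, len(shown_hash)) == shown_hash.lower()
    )


# Example with a known input; for the real file, pass the undecoded bytes
# of the response body, e.g. matches_snapshot(body, "cf32953b99203e16", 1018).
print(sha256_prefix(b"abc"))
```

The bytes must be hashed exactly as received (no decoding or newline normalization). Because the live file can change after the fetch time, a mismatch against a fresh download does not by itself mean the snapshot is wrong.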

User-agent: *
Crawl-Delay: 6
Disallow: /_dynamic_files/
Disallow: /_tasks/
Disallow: /test/
Disallow: /tools/
Disallow: /template/
Disallow: /search/
Disallow: /visit/plan/
Disallow: /video/kaltura/
Disallow: /video/tasks/
Disallow: /server-health-check/


# SiteImprove should ignore these page particularly because they aren't actually used, but are still linked for historical reasons
User-agent: Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0) SiteCheck-sitecrawl by Siteimprove.com
Disallow: /cuinfo/specialconditions/
Disallow: /_includes/header.cfm

User-agent: Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0) LinkCheck by Siteimprove.com
Disallow: /cuinfo/specialconditions/
Disallow: /_includes/header.cfm

User-agent: HTML validator: Siteimprove_W3C_Validator/1.3
Disallow: /cuinfo/specialconditions/
Disallow: /_includes/header.cfm

User-agent: CSS Validator: Jigsaw/2.3.0 W3C_CSS_Validator_JFouffa/2.0
Disallow: /cuinfo/specialconditions/
Disallow: /_includes/header.cfm
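The `*` group applies to any crawler not named in a later group. A minimal sketch of evaluating it with Python's standard `urllib.robotparser`, assuming a hypothetical crawler name `ExampleBot/1.0` and the group pasted inline rather than fetched:

```python
from urllib.robotparser import RobotFileParser

# The "User-agent: *" group from the snapshot, pasted inline (no network access).
ROBOTS_TXT = """\
User-agent: *
Crawl-Delay: 6
Disallow: /_dynamic_files/
Disallow: /_tasks/
Disallow: /test/
Disallow: /tools/
Disallow: /template/
Disallow: /search/
Disallow: /visit/plan/
Disallow: /video/kaltura/
Disallow: /video/tasks/
Disallow: /server-health-check/
"""

# Hypothetical crawler name; it matches no named group, so "*" applies.
AGENT = "ExampleBot/1.0"

parser = RobotFileParser()
# parse() accepts an iterable of lines and marks the rules as loaded,
# which can_fetch() and crawl_delay() need before giving real answers.
parser.parse(ROBOTS_TXT.splitlines())

for url in (
    "https://www.cornell.edu/visit/",
    "https://www.cornell.edu/visit/plan/",
    "https://www.cornell.edu/video/kaltura/",
    "https://www.cornell.edu/about/",
):
    print(url, parser.can_fetch(AGENT, url))

print("crawl delay:", parser.crawl_delay(AGENT))
```

Disallow values are matched as path prefixes, so `/visit/` stays crawlable while anything under `/visit/plan/` is blocked. `Crawl-Delay` is not part of RFC 9309; `crawl_delay()` returns it here as the integer 6, and reading it as a minimum of 6 seconds between requests is a common convention rather than something every crawler honors.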
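The four Siteimprove/W3C groups name full browser-style strings instead of the short product tokens RFC 9309 expects, so whether a given crawler matches them depends on its implementation. The sketch below uses a hypothetical helper pair, `parse_groups` and `is_allowed`, that matches the whole User-agent string case-insensitively and otherwise falls back to `*`. It is a simplification (one User-agent line per group, Disallow rules only), run against an abridged excerpt of the file:

```python
# Abridged excerpt: part of the "*" group plus the LinkCheck group.
SNAPSHOT = """\
User-agent: *
Crawl-Delay: 6
Disallow: /search/
Disallow: /visit/plan/

User-agent: Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0) LinkCheck by Siteimprove.com
Disallow: /cuinfo/specialconditions/
Disallow: /_includes/header.cfm
"""

LINKCHECK = ("Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0) "
             "LinkCheck by Siteimprove.com")


def parse_groups(text: str) -> dict[str, list[str]]:
    """Map each lowercased User-agent value to its Disallow prefixes (hypothetical)."""
    groups: dict[str, list[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        # Split on the first colon only; agent strings may contain more colons.
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            current = groups.setdefault(value.lower(), [])
        elif field == "disallow" and current is not None and value:
            current.append(value)
        # Other directives (e.g. Crawl-Delay) are ignored in this sketch.
    return groups


def is_allowed(groups: dict[str, list[str]], user_agent: str, path: str) -> bool:
    """Use the group whose full name matches, else "*"; Disallow is a prefix match."""
    rules = groups.get(user_agent.lower(), groups.get("*", []))
    return not any(path.startswith(prefix) for prefix in rules)


groups = parse_groups(SNAPSHOT)
for agent in (LINKCHECK, "ExampleBot/1.0"):
    for path in ("/search/", "/cuinfo/specialconditions/"):
        print(agent[:40], path, is_allowed(groups, agent, path))
```

Under this group-selection model, a crawler that matches a named group follows only that group, so the LinkCheck agent may fetch `/search/` even though `*` disallows it. RFC 9309 describes the same fallback: the `*` group is used only when no other group matches.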