NeuralCrawl

🇨🇦 McGill University

mcgill.ca · Universities · rank #39 · University

AI crawler access (latest snapshot, 26 min ago)

Legend: blocked · restricted · allowed · faded = inherited from the * wildcard group

GPTBot
ChatGPT-User
OAI-SearchBot
ClaudeBot
Claude-User
Claude-SearchBot
anthropic-ai
Claude-Web
CCBot
Google-Extended
Applebot-Extended
PerplexityBot
Perplexity-User
Bytespider
Amazonbot
FacebookBot
meta-externalagent
meta-externalfetcher
cohere-ai
AI2Bot
Diffbot
omgili
YouBot
DuckAssistBot
MistralAI-User
PanguBot
Timpibot

Current robots.txt · 3386 bytes · sha256 34318cefd235

#
# robots.txt
#
# This file is to prevent the crawling and indexing of certain parts
# of your site by web crawlers and spiders run by sites like Yahoo!
# and Google. By telling these "robots" where not to go on your site,
# you save bandwidth and server resources.
#
# This file will be ignored unless it is at the root of your host:
# Used:    http://example.com/robots.txt
# Ignored: http://example.com/site/robots.txt
#
# For more information about the robots.txt standard, see:
# http://www.robotstxt.org/robotstxt.html

Sitemap: https://www.mcgill.ca/root/sitemap-index.xml
Sitemap: https://www.mcgill.ca/sitemap.xml

User-agent: Lucidworks-Anda/2.0
Crawl-delay: 0

User-agent: Elastic-Crawler
Crawl-delay: 0

User-agent: *
Crawl-delay: 5

User-agent: archive.org_bot
Allow: /study/*

# CSS, JS, Images
Allow: */misc/*.css$
Allow: */misc/*.css?
Allow: */misc/*.js$
Allow: */misc/*.js?
Allow: */misc/*.gif
Allow: */misc/*.jpg
Allow: */misc/*.jpeg
Allow: */misc/*.png
Allow: */modules/*.css$
Allow: */modules/*.css?
Allow: */modules/*.js$
Allow: */modules/*.js?
Allow: */modules/*.gif
Allow: */modules/*.jpg
Allow: */modules/*.jpeg
Allow: */modules/*.png
Allow: */profiles/*.css$
Allow: */profiles/*.css?
Allow: */profiles/*.js$
Allow: */profiles/*.js?
Allow: */profiles/*.gif
Allow: */profiles/*.jpg
Allow: */profiles/*.jpeg
Allow: */profiles/*.png
Allow: */themes/*.css$
Allow: */themes/*.css?
Allow: */themes/*.js$
Allow: */themes/*.js?
Allow: */themes/*.gif
Allow: */themes/*.jpg
Allow: */themes/*.jpeg
Allow: */themes/*.png
# Directories
Disallow: */includes/
Disallow: */modules/
Disallow: */profiles/
Disallow: */scripts/
Disallow: */themes/
# eCals
Disallow: /study/*
# Files
Disallow: */CHANGELOG.txt
Disallow: */cron.php
Disallow: */INSTALL.mysql.txt
Disallow: */INSTALL.pgsql.txt
Disallow: */INSTALL.sqlite.txt
Disallow: */install.php
Disallow: */INSTALL.txt
Disallow: */LICENSE.txt
Disallow: */MAINTAINERS.txt
Disallow: */update.php
Disallow: */UPGRADE.txt
Disallow: */xmlrpc.php
Disallow: */misc/favicon.ico
# Paths (clean URLs)
Disallow: */admin/
Disallow: */comment/reply/
Disallow: */filter/tips/
Disallow: */node/add/
Disallow: */search/
Disallow: /*/people/*
Disallow: /*/events/*
Disallow: /undergraduate-admissions/programs?*
Disallow: /gradapplicants/programs?*
Disallow: */user/register/
Disallow: */user/password/
Disallow: */user/login/
Disallow: */user/logout/
Disallow: */user
# Paths (no clean URLs)
Disallow: */?q=admin/
Disallow: */?q=comment/reply/
Disallow: */?q=filter/tips/
Disallow: */?q=node/add/
Disallow: */?q=search/
Disallow: /?q=study/*/courses/search
Disallow: /?q=study/*/programs/search
Disallow: /?q=study/*/search/all
Disallow: */?q=user/password/
Disallow: */?q=user/register/
Disallow: */?q=user/login/
Disallow: */?q=user/logout/
Disallow: /*.zip$
Disallow: /*.gif$
Disallow: /*.jpg$
Disallow: /*.jpeg$
Disallow: /*.png$
Disallow: /*.tif$
Disallow: /*.tiff$
Disallow: /*.dll$
Disallow: /*.exe$
Disallow: /*.class$
Disallow: /*.wmv$
Disallow: /*.m4v$
Disallow: /*.jar$
Disallow: /*.gz$
Disallow: /*.tar$
Disallow: /*.css$
Disallow: /*.inc$
Disallow: /*.js$
Disallow: /*.js.php$
Disallow: /*.swf$
Disallow: /*.fla$
Disallow: /*.psd$
Disallow: /*.m4a$
Disallow: /*.m4p$
Disallow: /*.aac$
Disallow: /*.m2a$
Disallow: /*.m2v$
Disallow: /*.sit$
Disallow: /*.dmg$
Disallow: /*.wma$
Disallow: /*.mdb$
Disallow: /*.tar.gz2$
Disallow: /*.rar$
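
The legend above distinguishes crawlers that have their own group from those that inherit the `*` wildcard group. The following is a minimal sketch of that resolution step, not the method this page uses. It assumes a simplified reading of the file: consecutive `User-agent` lines open one group, every other directive attaches to the most recent group, and a crawler token matches a group only by exact case-insensitive name. The names `parse_groups`, `group_for`, and `SAMPLE` are hypothetical, and `SAMPLE` is a shortened excerpt of the file above.

```python
from collections import defaultdict


def parse_groups(text):
    """Map each lowercase user-agent name to its list of (field, value) rules."""
    groups = defaultdict(list)
    current_agents = []
    last_was_agent = False
    for raw in text.splitlines():
        # Drop comments and surrounding whitespace, skip empty or malformed lines.
        line = raw.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            # A User-agent line after any other directive opens a new group.
            if not last_was_agent:
                current_agents = []
            current_agents.append(value.lower())
            groups[value.lower()]  # register the agent even if it has no rules
            last_was_agent = True
        else:
            last_was_agent = False
            if field == "sitemap":
                continue  # Sitemap lines are not tied to any group
            for agent in current_agents:
                groups[agent].append((field, value))
    return dict(groups)


def group_for(groups, token):
    """Return (matched_agent, rules), falling back to '*' when no group is named for token."""
    key = token.lower()
    if key in groups:
        return key, groups[key]
    return "*", groups.get("*", [])


# Shortened excerpt of the file above (assumption: enough to show the fallback).
SAMPLE = """\
User-agent: Elastic-Crawler
Crawl-delay: 0

User-agent: *
Crawl-delay: 5

User-agent: archive.org_bot
Allow: /study/*
Disallow: */admin/
"""

groups = parse_groups(SAMPLE)
print(group_for(groups, "GPTBot"))           # no named group, so '*' applies
print(group_for(groups, "archive.org_bot"))  # has its own group
```

Under this simplified reading, the Allow and Disallow rules that follow `User-agent: archive.org_bot` attach to that group, and the `*` group carries only `Crawl-delay: 5`. Other parsers may group lines differently, so the result is only an approximation of how a given crawler reads the file.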

Change history

  1. initial snapshot
    • First snapshot of robots.txt archived
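
The snapshot header reports a byte count and a short sha256 value. The following is a minimal sketch of how such a fingerprint could be computed and used to detect a change between two snapshots. It assumes the displayed value is the first 12 hex characters of the SHA-256 digest of the raw file bytes, which the page does not state. The names `fingerprint` and `has_changed` are hypothetical.

```python
import hashlib


def fingerprint(body: bytes, digest_chars: int = 12):
    """Return (size_in_bytes, truncated hex SHA-256) for a robots.txt body."""
    return len(body), hashlib.sha256(body).hexdigest()[:digest_chars]


def has_changed(previous: bytes, current: bytes) -> bool:
    """Report whether two snapshots differ in size or truncated digest."""
    return fingerprint(previous) != fingerprint(current)


# Illustrative bodies only; these are not the archived snapshot.
old_body = b"User-agent: *\nCrawl-delay: 5\n"
new_body = b"User-agent: *\nCrawl-delay: 10\n"

size, short_hash = fingerprint(old_body)
print(f"{size} bytes · sha256 {short_hash}")
print(has_changed(old_body, new_body))
```

A truncated digest keeps the display short, at the cost of a small collision risk. Comparing the byte count as well is a cheap extra check.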