NeuralCrawl

🇿🇦 University of Cape Town

uct.ac.za · Universities · rank #44 · University

AI crawler access (latest snapshot, 3h ago)

Legend: blocked · restricted · allowed · faded = inherited from the * wildcard group

GPTBot
ChatGPT-User
OAI-SearchBot
ClaudeBot
Claude-User
Claude-SearchBot
anthropic-ai
Claude-Web
CCBot
Google-Extended
Applebot-Extended
PerplexityBot
Perplexity-User
Bytespider
Amazonbot
FacebookBot
meta-externalagent
meta-externalfetcher
cohere-ai
AI2Bot
Diffbot
omgili
YouBot
DuckAssistBot
MistralAI-User
PanguBot
Timpibot
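
The legend above can be approximated from the robots.txt text itself. The sketch below is one possible reading, not NeuralCrawl's actual method: it assumes "blocked" means the matching group contains `Disallow: /`, "restricted" means the group has some other non-empty Disallow rule, "allowed" means it has none, and "inherited" means the agent has no group of its own and falls back to `*`. The names `parse_groups`, `classify` and `SAMPLE` are hypothetical, and agent matching is simplified to an exact, case-insensitive comparison.

```python
# Abbreviated excerpt of the archived file, used only for illustration.
SAMPLE = """\
User-agent: *
Allow: /core/*.css$
Disallow: /core/
Disallow: /admin/

User-agent: GPTBot
Disallow: /

User-agent: Bytespider
Disallow: /
"""


def parse_groups(robots_txt: str) -> dict[str, list[tuple[str, str]]]:
    """Map each lower-cased user-agent token to its (directive, path) rules."""
    groups: dict[str, list[tuple[str, str]]] = {}
    agents: list[str] = []
    seen_rule = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if seen_rule:  # a User-agent line after rules starts a new group
                agents, seen_rule = [], False
            agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow"):
            seen_rule = True
            for agent in agents:
                groups[agent].append((field, value))
    return groups


def classify(groups: dict[str, list[tuple[str, str]]], agent: str) -> tuple[str, bool]:
    """Return (status, inherited) under the assumed definitions above."""
    key = agent.lower()
    inherited = key not in groups  # no own group: fall back to '*'
    rules = groups.get(key, groups.get("*", []))
    disallows = [path for directive, path in rules if directive == "disallow" and path]
    if "/" in disallows:
        return "blocked", inherited
    if disallows:
        return "restricted", inherited
    return "allowed", inherited


groups = parse_groups(SAMPLE)
for bot in ("GPTBot", "ClaudeBot", "Bytespider"):
    print(bot, classify(groups, bot))
```

Under these assumed definitions, GPTBot, OAI-SearchBot and Bytespider, which each have a `Disallow: /` group in the file below, would be classed as blocked, while agents without their own group, such as ClaudeBot, would inherit the `*` rules.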

Current robots.txt · 3792 bytes · sha256 c2a7d0a8154e

# robots.txt

User-agent: *
# CSS, JS, Images
Allow: /core/*.css$
Allow: /core/*.css?
Allow: /core/*.js$
Allow: /core/*.js?
Allow: /core/*.gif
Allow: /core/*.jpg
Allow: /core/*.jpeg
Allow: /core/*.png
Allow: /core/*.svg
Allow: /profiles/*.css$
Allow: /profiles/*.css?
Allow: /profiles/*.js$
Allow: /profiles/*.js?
Allow: /profiles/*.gif
Allow: /profiles/*.jpg
Allow: /profiles/*.jpeg
Allow: /profiles/*.png
Allow: /profiles/*.svg
# Directories
Disallow: /core/
Disallow: /profiles/
# Files
Disallow: /README.txt
Disallow: /web.config
# Paths (clean URLs)
Disallow: /admin/
Disallow: /comment/reply/
Disallow: /filter/tips
Disallow: /node/add/
Disallow: /search
Disallow: /search/
Disallow: /*/search
Disallow: /*/search/
Disallow: /*/search$
Disallow: /*/search?
Disallow: /*?*search=
Disallow: /user/register
Disallow: /user/password
Disallow: /user/login
Disallow: /user/logout
Disallow: /media/oembed
Disallow: /*/media/oembed
# Paths (no clean URLs)
Disallow: /index.php/admin/
Disallow: /index.php/comment/reply/
Disallow: /index.php/filter/tips
Disallow: /index.php/node/add/
Disallow: /index.php/search/
Disallow: /index.php/user/password
Disallow: /index.php/user/register
Disallow: /index.php/user/login
Disallow: /index.php/user/logout
Disallow: /index.php/media/oembed
Disallow: /index.php/*/media/oembed

User-agent: Googlebot
Disallow: /private/
Disallow: /secret.html
Disallow: /tmp/
Disallow: /cgi-bin/
Disallow: /admin/
Disallow: /search
Disallow: /search/
Disallow: /*/search
Disallow: /*/search/
Disallow: /*/search$
Disallow: /*/search?
Disallow: /*?*search=

User-agent: Bingbot
Disallow: /

User-agent: Slurp
Disallow: /private/
Disallow: /secret.html
Disallow: /tmp/
Disallow: /cgi-bin/
Disallow: /admin/
Disallow: /search
Disallow: /search/
Disallow: /*/search
Disallow: /*/search/
Disallow: /*/search$
Disallow: /*/search?
Disallow: /*?*search=

User-agent: DuckDuckBot
Disallow: /private/
Disallow: /secret.html
Disallow: /tmp/
Disallow: /cgi-bin/
Disallow: /admin/
Disallow: /search
Disallow: /search/
Disallow: /*/search
Disallow: /*/search/
Disallow: /*/search$
Disallow: /*/search?
Disallow: /*?*search=

User-agent: Baiduspider
Disallow: /private/
Disallow: /secret.html
Disallow: /tmp/
Disallow: /cgi-bin/
Disallow: /admin/
Disallow: /search
Disallow: /search/
Disallow: /*/search
Disallow: /*/search/
Disallow: /*/search$
Disallow: /*/search?
Disallow: /*?*search=

User-agent: Yandex
Disallow: /

User-agent: Sogou Spider
Disallow: /private/
Disallow: /secret.html
Disallow: /tmp/
Disallow: /cgi-bin/
Disallow: /admin/
Disallow: /search
Disallow: /search/
Disallow: /*/search
Disallow: /*/search/
Disallow: /*/search$
Disallow: /*/search?
Disallow: /*?*search=

User-agent: MJ12bot
Disallow: /private/
Disallow: /secret.html
Disallow: /tmp/
Disallow: /cgi-bin/
Disallow: /admin/
Disallow: /search
Disallow: /search/
Disallow: /*/search
Disallow: /*/search/
Disallow: /*/search$
Disallow: /*/search?
Disallow: /*?*search=

User-agent: AhrefsBot
Disallow: /private/
Disallow: /secret.html
Disallow: /tmp/
Disallow: /cgi-bin/
Disallow: /admin/
Disallow: /search
Disallow: /search/
Disallow: /*/search
Disallow: /*/search/
Disallow: /*/search$
Disallow: /*/search?
Disallow: /*?*search=

User-agent: SemrushBot
Disallow: /private/
Disallow: /secret.html
Disallow: /tmp/
Disallow: /cgi-bin/
Disallow: /admin/
Disallow: /search
Disallow: /search/
Disallow: /*/search
Disallow: /*/search/
Disallow: /*/search$
Disallow: /*/search?
Disallow: /*?*search=

User-agent: GPTBot
Disallow: /

User-agent: FriendlyCrawler
Disallow: /

User-agent: YandexBot
Disallow: /

User-agent: ByteDanceSpider
Disallow: /

User-agent: ByteDance
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

# sitemap.xml
Sitemap: https://uct.ac.za/sitemap.xml
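
The `*` group above pairs `Allow: /core/*.css$` with `Disallow: /core/`, so the outcome for a given URL depends on rule precedence. A minimal sketch, assuming the longest-match convention described in RFC 9309 (the longest matching pattern wins, and Allow wins a tie), with `*` matching any character sequence and a trailing `$` anchoring the end of the path. The helper names `pattern_to_regex` and `is_allowed` are hypothetical, and individual crawlers may apply these rules differently.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into an anchored-at-start regex."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    # '*' becomes '.*'; every other character (including '?') is literal.
    regex = ".*".join(re.escape(piece) for piece in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Apply longest-match precedence; a path matching no rule is allowed."""
    best_len, allowed = -1, True
    for directive, pattern in rules:
        if not pattern:  # an empty Disallow matches nothing
            continue
        if pattern_to_regex(pattern).match(path):
            is_allow = directive == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed


# A few rules copied from the '*' group above.
RULES = [
    ("allow", "/core/*.css$"),
    ("allow", "/core/*.css?"),
    ("disallow", "/core/"),
    ("disallow", "/search"),
    ("disallow", "/*/search$"),
]

print(is_allowed(RULES, "/core/themes/site.css"))  # Allow pattern is longer
print(is_allowed(RULES, "/core/install.php"))      # only Disallow: /core/ matches
print(is_allowed(RULES, "/news/search"))           # matched by /*/search$
```

The example paths (`/core/themes/site.css` and so on) are made up for illustration and are not taken from the site.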

Change history

  1. initial snapshot
    • First snapshot of robots.txt archived
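
A change history like this one can be built by fingerprinting each fetched file and comparing it with the previous snapshot. The sketch below assumes the displayed fingerprint is the byte length plus the first 12 hex characters of the SHA-256 digest, which matches the shape of the values shown for the current snapshot but is not confirmed by the page. The function names `fingerprint` and `has_changed` are hypothetical.

```python
import hashlib


def fingerprint(body: bytes, prefix_len: int = 12) -> tuple[int, str]:
    """Return (size in bytes, truncated SHA-256 hex digest) for a snapshot."""
    return len(body), hashlib.sha256(body).hexdigest()[:prefix_len]


def has_changed(previous: bytes, current: bytes) -> bool:
    """Treat two snapshots as different when their fingerprints differ."""
    return fingerprint(previous) != fingerprint(current)


old = b"User-agent: *\nDisallow: /admin/\n"
new = b"User-agent: *\nDisallow: /admin/\n\nUser-agent: GPTBot\nDisallow: /\n"
print(fingerprint(old))
print(has_changed(old, new))
```

Comparing truncated digests keeps the history compact. A 12-character prefix is enough to tell successive versions of one file apart in practice, though comparing the full digest avoids any chance of a prefix collision.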