NeuralCrawl

🇸🇪 KTH Royal Institute of Technology

kth.se · Universities · rank #25 · University

AI crawler access (latest snapshot, 3h ago)

Legend: blocked · restricted · allowed · faded = inherited from the * wildcard group

GPTBot
ChatGPT-User
OAI-SearchBot
ClaudeBot
Claude-User
Claude-SearchBot
anthropic-ai
Claude-Web
CCBot
Google-Extended
Applebot-Extended
PerplexityBot
Perplexity-User
Bytespider
Amazonbot
FacebookBot
meta-externalagent
meta-externalfetcher
cohere-ai
AI2Bot
Diffbot
omgili
YouBot
DuckAssistBot
MistralAI-User
PanguBot
Timpibot
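
The "inherited from the * wildcard group" distinction can be illustrated with a small sketch. It assumes a crawler "inherits" when no `User-agent` line names it, so only the `*` group applies. The helper name `agents_with_own_group`, the shortened `SAMPLE` rules, and the three-agent subset are illustrative and not part of the page.

```python
def agents_with_own_group(robots_txt):
    """Return the lowercase user-agent tokens named in any User-agent line."""
    named = set()
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if line.lower().startswith("user-agent:"):
            named.add(line.split(":", 1)[1].strip().lower())
    return named


# Shortened stand-in for the snapshot: a single wildcard group.
SAMPLE = """\
User-agent: *
Disallow: /search
Disallow: /intranat
"""

# A few of the crawlers listed above (illustrative subset).
AI_AGENTS = ["GPTBot", "ClaudeBot", "CCBot"]

named = agents_with_own_group(SAMPLE)
status = {
    agent: ("own group" if agent.lower() in named else "inherits *")
    for agent in AI_AGENTS
}
print(status)
```

With a file that contains only a `*` group, every listed crawler falls into the "inherits *" case.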

Current robots.txt · 941 bytes · sha256 ba3d31bfc57b

#
# well-known resource robots.txt from 17.392
#
User-agent: *
Disallow: /lediga-jobb/language_redirect/
Disallow: /lediga-jobb/interna/
Disallow: /form/
Disallow: /public/
Disallow: /cm/
Disallow: /info/
Disallow: /work/
Disallow: /xyz/
Disallow: /xyz
Disallow: /en/xyz/
Disallow: /en/xyz
Disallow: /2.994/
Disallow: /2.9631/
Disallow: /2.1019/
Disallow: /2.1166/
Disallow: /2.1218/
Disallow: /2.1219/
Disallow: /2.36446/
Disallow: /2.14566/
Disallow: /2.28744/
Disallow: /2.98714/
Disallow: /en/2.28744/
Disallow: /2.9631/
Disallow: /en/2.9631/
Disallow: /search
Disallow: /social/user/_report_/abuse/
Disallow: /gemensamt/kategorier-1.31190
Disallow: /en/gemensamt/kategorier-1.31190
Disallow: /social/api/profile/1.1/
Disallow: /test/
Disallow: /blogs/tags
Disallow: /es/
Disallow: /*?rss=*
Disallow: /*?iCal=*
Disallow: /api/icalendar/
Disallow: /start/
Disallow: /intranat
Disallow: /sci/2.840

Sitemap: https://www.kth.se/sitemap.xml
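
A minimal sketch of how rules like these could be evaluated for a given crawler, using Python's standard `urllib.robotparser`. `RULES` is a shortened excerpt of the file above, the `allowed` helper is illustrative, and `/en/studies` is a hypothetical path chosen only because no rule covers it. To my knowledge the standard parser matches rules as plain path prefixes, so wildcard rules such as `/*?rss=*` are left out of the excerpt.

```python
from urllib.robotparser import RobotFileParser

# Shortened excerpt of the wildcard group shown above.
RULES = """\
User-agent: *
Disallow: /form/
Disallow: /search
Disallow: /intranat
Disallow: /test/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def allowed(agent, path):
    """Check whether `agent` may fetch `path` on www.kth.se under RULES."""
    return parser.can_fetch(agent, "https://www.kth.se" + path)


# No group names these agents, so the * group decides in every case.
print(allowed("GPTBot", "/en/studies"))   # True: no rule matches this prefix
print(allowed("GPTBot", "/search?q=ai"))  # False: matches Disallow: /search
print(allowed("ClaudeBot", "/intranat"))  # False: matches Disallow: /intranat
```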

Change history

  1. initial snapshot
    • First snapshot of robots.txt archived
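
One way a change history like this could be derived is by comparing short content fingerprints between snapshots, similar to the 12-character sha256 prefix shown in the snapshot header. This is a sketch under that assumption and not a description of how NeuralCrawl works. The names `fingerprint` and `describe_change` are hypothetical.

```python
import hashlib


def fingerprint(body, digits=12):
    """Return the first `digits` hex characters of the SHA-256 of `body` (bytes)."""
    return hashlib.sha256(body).hexdigest()[:digits]


def describe_change(previous, current):
    """Label a snapshot relative to the previous one (None if there is none)."""
    if previous is None:
        return "initial snapshot"
    if fingerprint(previous) == fingerprint(current):
        return "unchanged"
    return "changed"


old = b"User-agent: *\nDisallow: /search\n"
new = b"User-agent: *\nDisallow: /search\nDisallow: /test/\n"

print(describe_change(None, old))
print(describe_change(old, old))
print(describe_change(old, new))
```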