NeuralCrawl

🇳🇱 Weaviate

weaviate.io · Top 1000 websites · rank #36 · AI Chatbots and Tools

AI crawler access (latest snapshot, 13h ago)

Legend: blocked · restricted · allowed · faded = inherited from the * wildcard group

GPTBot
ChatGPT-User
OAI-SearchBot
ClaudeBot
Claude-User
Claude-SearchBot
anthropic-ai
Claude-Web
CCBot
Google-Extended
Applebot-Extended
PerplexityBot
Perplexity-User
Bytespider
Amazonbot
FacebookBot
meta-externalagent
meta-externalfetcher
cohere-ai
AI2Bot
Diffbot
omgili
YouBot
DuckAssistBot
MistralAI-User
PanguBot
Timpibot
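
The "inherited from the * wildcard group" state in the legend applies to any crawler that has no User-agent group of its own in robots.txt. A minimal Python sketch of that lookup follows. It assumes exact, case-insensitive matching on the crawler name (RFC 9309 requires case-insensitive group matching). The function name resolve_group and the hard-coded group set, copied from this snapshot's robots.txt, are illustrative and are not NeuralCrawl's implementation.

```python
# User-agent groups named explicitly in this snapshot's robots.txt
# (hard-coded here for illustration only).
NAMED_GROUPS = {
    "GPTBot", "ChatGPT-User", "PerplexityBot",
    "ClaudeBot", "anthropic-ai", "Applebot-Extended",
}


def resolve_group(crawler: str, named_groups=NAMED_GROUPS) -> str:
    """Return the group that governs a crawler, or "*" if it inherits."""
    by_lowercase = {name.lower(): name for name in named_groups}
    return by_lowercase.get(crawler.lower(), "*")


# Hypothetical usage: classify a few crawlers from the list above.
for bot in ["GPTBot", "OAI-SearchBot", "ClaudeBot", "CCBot"]:
    group = resolve_group(bot)
    label = "own group" if group != "*" else "inherited from *"
    print(f"{bot}: {label}")
```

Under these assumptions, only the six crawlers named in robots.txt resolve to their own group. Every other crawler in the list falls back to the * group.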

Current robots.txt · 556 bytes · sha256 f0d7abe6b9f7

Sitemap: https://weaviate.io/sitemap-index.xml
LLMS: https://weaviate.io/llms.txt

User-agent: *
Allow: /
Allow: /llms.txt
Disallow: /*?*
Disallow: /expert-sessions
Disallow: /blog/rss.xml
Disallow: /blog/atom.xml
Disallow: /feed
Disallow: /feed.xml
Disallow: /rss
Disallow: /rss.xml
Disallow: /atom
Disallow: /atom.xml

# AI Search Engine Bots
User-agent: GPTBot
Allow: /


User-agent: ChatGPT-User
Allow: /


User-agent: PerplexityBot
Allow: /


User-agent: ClaudeBot
Allow: /


User-agent: anthropic-ai
Allow: /


User-agent: Applebot-Extended
Allow: /
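
The rules above can be evaluated with a short Python sketch. It assumes RFC 9309 semantics: the longest matching pattern wins, Allow wins a tie, "*" matches any run of characters, and a path that matches no rule is allowed. The names pattern_to_regex and is_allowed are hypothetical helpers, and the rule lists are a hand-copied subset of this snapshot, so treat this as an illustration and not as how any particular crawler behaves.

```python
import re

# Subset of the "*" group from this snapshot (feed rules abbreviated).
WILDCARD_RULES = [
    ("allow", "/"),
    ("allow", "/llms.txt"),
    ("disallow", "/*?*"),
    ("disallow", "/expert-sessions"),
    ("disallow", "/feed"),
]
# The GPTBot group from this snapshot contains a single rule.
GPTBOT_RULES = [("allow", "/")]


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a prefix-matching regex."""
    anchored = pattern.endswith("$")  # a trailing "$" anchors the end
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def is_allowed(path: str, rules) -> bool:
    """Longest matching pattern wins; Allow wins ties; no match means allowed."""
    best_len, allowed = -1, True
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            is_allow = kind == "allow"
            longer = len(pattern) > best_len
            tie_for_allow = len(pattern) == best_len and is_allow
            if longer or tie_for_allow:
                best_len, allowed = len(pattern), is_allow
    return allowed


print(is_allowed("/blog", WILDCARD_RULES))         # True
print(is_allowed("/blog?page=2", WILDCARD_RULES))  # False
print(is_allowed("/blog?page=2", GPTBOT_RULES))    # True
```

Under these assumptions, a crawler with its own group follows only that group. The query-string and feed Disallow rules in the * group therefore would not apply to GPTBot, ChatGPT-User, PerplexityBot, ClaudeBot, anthropic-ai, or Applebot-Extended.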

Change history

  1. initial snapshot
    • First snapshot of robots.txt archived