NeuralCrawl

🇺🇸 Open Science Framework

osf.io · Academic & open research · rank #12 · Research repository · live robots.txt ↗

AI crawler access (latest snapshot, 3h ago)

Legend: blocked · restricted · allowed · faded = inherited from the * wildcard group

GPTBot
ChatGPT-User
OAI-SearchBot
ClaudeBot
Claude-User
Claude-SearchBot
anthropic-ai
Claude-Web
CCBot
Google-Extended
Applebot-Extended
PerplexityBot
Perplexity-User
Bytespider
Amazonbot
FacebookBot
meta-externalagent
meta-externalfetcher
cohere-ai
AI2Bot
Diffbot
omgili
YouBot
DuckAssistBot
MistralAI-User
PanguBot
Timpibot

Current robots.txt 255 bytes · sha256 55143ee8903b

# www.robotstxt.org/

User-agent: *
Disallow: /api/*
Disallow: *?view_only=
crawl-delay: 10

# Robots that have misbehaved
User-agent: PingBot
User-agent: PerplexityBot
User-agent: GPTBot
User-agent: BaiduSpider
User-agent: Meta-ExternalAgent
Disallow: *
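
The file above can be mapped onto the blocked / restricted / allowed legend with a short script. What follows is a minimal sketch, not the method this page actually uses. The classification rule is an assumption: a crawler counts as "blocked" when its group disallows everything, "restricted" when it has any other Disallow rule, and "allowed" otherwise. Treating the nonstandard `Disallow: *` like `Disallow: /` is also an assumption, as is matching user-agent names exactly and case-insensitively. The names `parse_groups` and `classify` are hypothetical.

```python
ROBOTS_TXT = """\
# www.robotstxt.org/

User-agent: *
Disallow: /api/*
Disallow: *?view_only=
crawl-delay: 10

# Robots that have misbehaved
User-agent: PingBot
User-agent: PerplexityBot
User-agent: GPTBot
User-agent: BaiduSpider
User-agent: Meta-ExternalAgent
Disallow: *
"""


def parse_groups(text):
    """Map each lowercased user-agent token to its list of Disallow patterns."""
    groups = {}
    agents = []       # user agents of the group currently being read
    in_rules = False  # True once the current group has seen a non-agent line
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        else:
            in_rules = True
            if field == "disallow" and value:
                for agent in agents:
                    groups[agent].append(value)
    return groups


def classify(agent, groups):
    """Return (status, inherited) for one crawler name.

    inherited is True when the crawler has no group of its own and
    falls back to the * wildcard group.
    """
    key = agent.lower()
    inherited = key not in groups
    rules = groups.get(key, groups.get("*", []))
    # Assumption: "/", "*" and "/*" all mean "disallow the whole site".
    if any(rule in ("/", "*", "/*") for rule in rules):
        return "blocked", inherited
    return ("restricted" if rules else "allowed"), inherited


groups = parse_groups(ROBOTS_TXT)
for name in ("GPTBot", "ClaudeBot", "PerplexityBot", "meta-externalagent"):
    print(name, classify(name, groups))
```

Under these assumptions, GPTBot, PerplexityBot and meta-externalagent are classified as blocked by their own group, while ClaudeBot has no group of its own and comes out as restricted, inherited from the * wildcard group.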

Change history

  1. initial snapshot
    • First snapshot of robots.txt archived