NeuralCrawl

🇯🇵 Trend Micro

trendmicro.com · Cybersecurity · rank #40

AI crawler access (latest snapshot, 5h ago)

Legend: blocked · restricted · allowed · faded = inherited from the * wildcard group

GPTBot
ChatGPT-User
OAI-SearchBot
ClaudeBot
Claude-User
Claude-SearchBot
anthropic-ai
Claude-Web
CCBot
Google-Extended
Applebot-Extended
PerplexityBot
Perplexity-User
Bytespider
Amazonbot
FacebookBot
meta-externalagent
meta-externalfetcher
cohere-ai
AI2Bot
Diffbot
omgili
YouBot
DuckAssistBot
MistralAI-User
PanguBot
Timpibot
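
The legend's distinction between a crawler that is named in its own group and one that inherits from the * wildcard group can be sketched roughly as follows. This is a minimal illustration, not the site's actual classifier: the function names `parse_groups` and `group_source` and the `SAMPLE` text are hypothetical, and matching is a plain case-insensitive comparison of the full token.

```python
def parse_groups(robots_txt):
    """Split robots.txt text into (agents, rules) groups."""
    groups = []
    agents, rules = [], []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if rules:  # a rule line already closed the previous group
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value.lower())
        elif field in ("allow", "disallow") and agents:
            rules.append((field, value))
    if agents:
        groups.append((agents, rules))
    return groups


def group_source(robots_txt, agent):
    """Return 'named', 'inherited', or 'none' for a crawler token."""
    groups = parse_groups(robots_txt)
    token = agent.lower()
    if any(token in agents for agents, _ in groups):
        return "named"
    if any("*" in agents for agents, _ in groups):
        return "inherited"
    return "none"


# Hypothetical excerpt in the same shape as the file on this page.
SAMPLE = """\
# Rules for LLM/AI crawlers
User-agent: ClaudeBot
User-agent: CCBot
Disallow: /form/

# Rules for all other bots
User-agent: *
Disallow: /ftp/
"""

print(group_source(SAMPLE, "ClaudeBot"))
print(group_source(SAMPLE, "GPTBot"))
```

In the file on this page, GPTBot does not appear in any User-agent line, so under this reading it would fall under the * group.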

Current robots.txt · 2329 bytes · sha256 028994f021ef

# Rules for Adsbot-Google (specific bot)
User-agent: Adsbot-Google
Allow: /core/docs/datasheets/*.pdf$
Allow: *.js$
Allow: *.css$
Disallow: /*.pdf$
Disallow: /*/config/
Disallow: /*/common/
Disallow: /*/error-messages/
Disallow: /*/business/trials_parsing_pages/
Disallow: /*/business/parsing-pages/
Disallow: /download/*
Disallow: /form/
Disallow: /ftp/
Disallow: /go/
Disallow: /housecall/
Disallow: /housecall/us/
Disallow: /housecall7/
Disallow: /ja_jp/forHome/xsp/
Disallow: /machform/
Disallow: /tools/*
Disallow: /*?bvstate=*

# Rules for onsite search crawler
User-agent: cludo
Allow: *.pdf$
Disallow: /*/list/*
Disallow: /*/tag/*
Disallow: /*/config/
Disallow: /*/common/
Disallow: /*/error-messages/
Disallow: /*/business/trials_parsing_pages/
Disallow: /*/business/parsing-pages/

# Rules for LLM/AI crawlers
User-agent: OAI-SearchBot
User-agent: ChatGPT-User
User-agent: ChatGPT-User/2.0
User-agent: ClaudeBot
User-agent: claude-web
User-agent: Claude-SearchBot
User-agent: Claude-User
User-agent: PerplexityBot
User-agent: Perplexity-User
User-agent: Google-Extended
User-agent: Googlebot
User-agent: Bingbot
User-agent: Amazonbot
User-agent: Applebot
User-agent: FacebookBot
User-agent: meta-externalagent
User-agent: LinkedInBot
User-agent: Bytespider
User-agent: DuckAssistBot
User-agent: cohere-ai
User-agent: CCBot
Allow: *.pdf$
Allow: *.js$
Allow: *.css$
Disallow: /*/config/
Disallow: /*/common/
Disallow: /*/error-messages/
Disallow: /*/business/trials_parsing_pages/
Disallow: /*/business/parsing-pages/
Disallow: /download/*
Disallow: /form/
Disallow: /ftp/
Disallow: /go/
Disallow: /housecall/
Disallow: /housecall/us/
Disallow: /housecall7/
Disallow: /ja_jp/forHome/xsp/
Disallow: /machform/
Disallow: /tools/*
Disallow: /*?bvstate=*

# Rules for all other bots
User-agent: *
Allow: /core/docs/datasheets/*.pdf$
Disallow: /*.pdf$
Disallow: /*/config/
Disallow: /*/common/
Disallow: /*/error-messages/
Disallow: /*/business/trials_parsing_pages/
Disallow: /*/business/parsing-pages/
Disallow: /download/*
Disallow: /form/
Disallow: /ftp/
Disallow: /go/
Disallow: /housecall/
Disallow: /housecall/us/
Disallow: /housecall7/
Disallow: /ja_jp/forHome/xsp/
Disallow: /machform/
Disallow: /tools/*
Disallow: /*?bvstate=*
Disallow: /cloudoneconformity-staging/*

Sitemap: https://www.trendmicro.com/sitemap.xml
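
To see how the Allow and Disallow patterns above interact, here is a minimal sketch of the longest-match evaluation described in RFC 9309, where `*` matches any run of characters, a trailing `$` anchors the end of the path, the longest matching pattern wins, and Allow wins a tie. The helper names `pattern_to_regex` and `is_allowed` are hypothetical, and `WILDCARD_RULES` is only a small subset of the * group, copied for illustration.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal pieces and let each '*' match any run of characters.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, path):
    """Apply longest-match precedence; Allow wins ties; default is allowed."""
    best_len, allowed = -1, True
    for kind, pattern in rules:
        if not pattern:  # an empty pattern matches nothing
            continue
        if pattern_to_regex(pattern).match(path):
            is_allow = kind == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed


# Subset of the "all other bots" group shown above.
WILDCARD_RULES = [
    ("allow", "/core/docs/datasheets/*.pdf$"),
    ("disallow", "/*.pdf$"),
    ("disallow", "/download/*"),
]

for path in ("/core/docs/datasheets/x.pdf", "/a/b.pdf", "/download/f.zip", "/index.html"):
    print(path, is_allowed(WILDCARD_RULES, path))
```

Under these assumptions, a datasheet PDF stays crawlable for the wildcard group while other PDFs and anything under /download/ do not.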

Change history

  1. initial snapshot
    • First snapshot of robots.txt archived
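
A change history like this one implies comparing each fetched robots.txt against the previous snapshot. The sketch below shows one plausible way to do that using the byte size and a truncated SHA-256, the same two values shown in the header of the current file. The 12-character prefix length is inferred from the digest displayed on this page, and the function names `fingerprint` and `has_changed` are hypothetical.

```python
import hashlib


def fingerprint(body: bytes, prefix_len: int = 12):
    """Return (size in bytes, first prefix_len hex chars of the SHA-256)."""
    return len(body), hashlib.sha256(body).hexdigest()[:prefix_len]


def has_changed(previous, body: bytes) -> bool:
    """True when there is no earlier snapshot or the fingerprint differs."""
    return previous is None or previous != fingerprint(body)


first = b"User-agent: *\nDisallow: /ftp/\n"
snapshot = fingerprint(first)

print(has_changed(None, first))                      # no earlier snapshot
print(has_changed(snapshot, first))                  # identical body
print(has_changed(snapshot, first + b"Disallow: /go/\n"))
```

A truncated digest keeps the display short; a real archive would presumably store the full hash to avoid accidental prefix collisions.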