NeuralCrawl

🇬🇧 Arm Holdings

arm.com · Nasdaq 100 · rank #101 · Semiconductors

AI crawler access (latest snapshot, 5h ago)

Legend: blocked · restricted · allowed · faded = inherited from the * wildcard group

GPTBot
ChatGPT-User
OAI-SearchBot
ClaudeBot
Claude-User
Claude-SearchBot
anthropic-ai
Claude-Web
CCBot
Google-Extended
Applebot-Extended
PerplexityBot
Perplexity-User
Bytespider
Amazonbot
FacebookBot
meta-externalagent
meta-externalfetcher
cohere-ai
AI2Bot
Diffbot
omgili
YouBot
DuckAssistBot
MistralAI-User
PanguBot
Timpibot
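
Whether a crawler in the list above shows as "inherited" depends on whether robots.txt names it in its own User-agent group or leaves it to the * group. The following is a minimal Python sketch of that lookup, not the site's actual method. It assumes exact, case-insensitive token matching (individual crawlers may match differently) and uses a short illustrative excerpt rather than the full file. `named_agents` and `group_source` are hypothetical helper names.

```python
def named_agents(robots_txt: str) -> set[str]:
    """Collect every User-agent token in the file, lowercased."""
    agents = set()
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "user-agent":
            agents.add(value.strip().lower())
    return agents


def group_source(crawler: str, agents: set[str]) -> str:
    """Return 'own group' if the crawler is named, else 'inherits *'."""
    return "own group" if crawler.lower() in agents else "inherits *"


# Illustrative excerpt only, not the full archived file.
EXCERPT = """\
User-agent: GPTBot
Allow: /llms.txt

User-agent: Claude-web
Allow: /llms.txt

User-agent: *
Disallow: /coveo/
"""

agents = named_agents(EXCERPT)
for crawler in ("GPTBot", "Claude-Web", "Diffbot"):
    print(crawler, "->", group_source(crawler, agents))
```

Under RFC 9309, a crawler that matches a named group follows only that group's rules, so the distinction matters: only crawlers without their own group are governed by the * rules.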

Current robots.txt · 2000 bytes · sha256 6f7749535458

# ============================
# robots.txt for www.arm.com
# ============================

# ----------------------------
# ALLOW: Trusted AI/LLM Crawlers for Training & Indexing
# ----------------------------

User-agent: anthropic-ai
Allow: /llms.txt

User-agent: GPTBot
Allow: /llms.txt

User-agent: ClaudeBot
Allow: /llms.txt

User-agent: Claude-web
Allow: /llms.txt

User-agent: CCBot
Allow: /llms.txt

User-agent: Google-Extended
Allow: /llms.txt

User-agent: Amazonbot
Allow: /llms.txt

User-agent: Applebot
Allow: /llms.txt

User-agent: Bingbot
Allow: /llms.txt

User-agent: ChatGPT-User
Allow: /llms.txt

User-agent: Bytespider
Allow: /llms.txt

User-agent: PerplexityBot
Allow: /llms.txt

User-agent: Sogou
Allow: /llms.txt

Sitemap: https://www.arm.com/sitemap_index.xml

# ----------------------------
# Internal Search Bot - Limited Access
# ----------------------------
User-agent: CoveoEnterpriseSearch
Allow: /news/20*

User-agent: *
Disallow: /coveo/

# ----------------------------
# BLOCK: Crawlers with Low Value or High Server Load
# ----------------------------
User-agent: AhrefsBot
Disallow: /

User-agent: YandexBot
Disallow: /

User-agent: MegaIndex.ru
Disallow: /

User-agent: SemrushBot
Disallow: /

User-agent: Qwantify/Bleriot
Disallow: /

User-agent: DotBot
Disallow: /

User-agent: MJ12bot
Disallow: /

User-agent: SEOkicks
Disallow: /

# ----------------------------
# GENERAL RULES - Internal/Private Paths
# ----------------------------
User-agent: *

# Backend or dev/testing paths
Disallow: /assets/fonts/
Disallow: /assets/Fonts/
Disallow: /includes/
Disallow: /phpscripts/
Disallow: /shouldremainempty/
Disallow: /xml/
Disallow: /zh/includes/
Disallow: /zh/shouldremainempty/
Disallow: /zh/xml/
Disallow: /about/newsroom

# Block legacy trademark-related assets
Disallow: /-/media/global/company/policies/trademarks/incorrect-logo/

# Block thank-you pages or soft redirects
Disallow: /*-ty$

# Block internal site search
Disallow: /search*
Disallow: /Search*
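
The path patterns in the * group use `*` as a wildcard and `$` as an end anchor. As a rough illustration of how such rules might be evaluated, here is a minimal Python sketch assuming the longest-match semantics described in RFC 9309 (the longest matching pattern wins, Allow wins a tie, no match means allowed). Individual crawlers may implement matching differently. `pattern_to_regex` and `is_allowed` are hypothetical helper names, and the rule list is a hand-copied excerpt of the file above.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """Longest matching pattern wins; Allow wins a tie; no match = allowed."""
    best_len, allowed = -1, True
    for kind, pattern in rules:
        # re.match anchors at the start, so patterns act as path prefixes.
        if pattern and pattern_to_regex(pattern).match(path):
            is_allow = kind == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed


# Excerpt of the * group rules shown above.
WILDCARD_RULES = [
    ("disallow", "/coveo/"),
    ("disallow", "/about/newsroom"),
    ("disallow", "/*-ty$"),
    ("disallow", "/search*"),
]

for path in ("/llms.txt", "/search?q=cpu", "/contact-ty", "/contact-ty/next"):
    print(path, is_allowed(path, WILDCARD_RULES))
```

The example paths are made up for illustration. `/contact-ty` is blocked by `/*-ty$`, while `/contact-ty/next` is not, because the `$` anchor requires the path to end in `-ty`.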

Change history

  1. initial snapshot
    • First snapshot of robots.txt archived
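
A change history like the one above can be built by fingerprinting each fetched copy of robots.txt and comparing it with the previous one. The sketch below assumes the fingerprint is the byte length plus the first 12 hex characters of SHA-256 over the raw bytes, which matches the shape of the header line above the file but is only a guess at how the site computes it. `fingerprint` and `has_changed` are hypothetical names, and the sample contents are made up.

```python
import hashlib


def fingerprint(content: bytes, prefix_len: int = 12) -> tuple[int, str]:
    """Return (size in bytes, truncated SHA-256 hex digest)."""
    return len(content), hashlib.sha256(content).hexdigest()[:prefix_len]


def has_changed(previous: bytes, current: bytes) -> bool:
    """A snapshot counts as changed when its fingerprint differs."""
    return fingerprint(previous) != fingerprint(current)


# Made-up sample contents, not the archived file.
old = b"User-agent: *\nDisallow: /coveo/\n"
new = b"User-agent: *\nDisallow: /coveo/\nDisallow: /search*\n"

size, digest = fingerprint(old)
print(size, digest)
print(has_changed(old, new))
```

Hashing the raw bytes rather than decoded text means whitespace-only or encoding-only edits also register as changes, which may or may not be desirable.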