NeuralCrawl

Midjourney / robots.txt snapshot

fetched 2026-06-20T11:49:20Z · HTTP 200 · 6300 bytes · sha256 7918d83c331c7b7b

final URL: manual:file

# www.midjourney.com — robots.txt
# Last reviewed: 2026-05-14
#
# Policy summary:
# - Open to verified search & social bots (Googlebot, Bingbot, Twitterbot, etc.) — default.
# - Auth-gated routes and tool surfaces disallowed to save crawl budget.
# - /jobs/{id} image pages disallowed entirely — internal policy decision.
# - AI training bots blocked. AI retrieval/citation bots allowed (so we can show up in
# ChatGPT, Perplexity, Gemini citations without being used as training data).
# - Sitemap pointer at the bottom.
#
# Note: robots.txt is a polite request. Bad actors ignore it. For absolute blocking,
# layer Cloudflare WAF rules on top.

# ============================================================================
# DEFAULT — all bots not specifically named below
# ============================================================================

User-agent: *
Allow: /

# Auth-only & tool surfaces — these waste crawl budget today (GSC shows /imagine,
# /account, /editor each absorbing 500K–800K weekly impressions to a login wall).
Disallow: /api/
Disallow: /auth/
Disallow: /account
Disallow: /preferences
Disallow: /checkout/
Disallow: /editor/
Disallow: /organize/
Disallow: /imagine
Disallow: /personalize
Disallow: /app/

# Image detail pages — internal policy: do not expose the full /jobs/ corpus
# to crawlers. Disallow stops crawling, not indexing: already-indexed pages may
# drop out of SERPs over weeks, but some can linger as URL-only results.
Disallow: /jobs/

# Allow CSS/JS/fonts so crawlers can render SSR'd content & SPA hydration.
# Without these, Google may see incomplete pages and downgrade rendering quality.
Allow: /*.css
Allow: /*.js
Allow: /*.woff2

# ============================================================================
# AI TRAINING BOTS — fully blocked
# These bots crawl primarily to feed LLM training datasets. Blocking here prevents
# our content (including /home marketing copy, /explore descriptions, blog posts)
# from entering future model training corpora.
# ============================================================================

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: FacebookBot
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: Omgilibot
Disallow: /

User-agent: Omgili
Disallow: /

User-agent: ImagesiftBot
Disallow: /

User-agent: PetalBot
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: cohere-training-data-crawler
Disallow: /

# Amazon — Rufus (shopping AI) and Alexa retrieval. Training-vs-retrieval split is
# opaque, and image-platform referral value from Amazon AI products is negligible.
User-agent: Amazonbot
Disallow: /

# You.com — small AI search engine with similar training-retrieval ambiguity.
User-agent: YouBot
Disallow: /

# Google's training opt-out token. This is a control token, not a crawler with its own
# User-Agent string: Google keeps crawling normally for Search, but is told NOT to use
# our content for Gemini or Vertex AI. It does not affect inclusion or ranking in Search.
User-agent: Google-Extended
Disallow: /

# Apple's training opt-out token. Same shape as Google-Extended — Applebot still
# crawls for Apple Intelligence retrieval/citations; Applebot-Extended blocks training.
User-agent: Applebot-Extended
Disallow: /

# Meta's AI crawler. Unlike the two opt-out tokens above, it fetches pages under its
# own User-Agent.
User-agent: meta-externalagent
Disallow: /

# ============================================================================
# COMMERCIAL CRAWLERS — crawl-budget protection
# Not AI-training; these are commercial scrapers that feed paid backlink/keyword
# indexes (Ahrefs, Semrush, etc.). We leave Ahrefs and Semrush ALLOWED because the
# growth team uses those tools internally — blocking them would create a blind
# spot in our own SEO data. MJ12bot (Majestic) and DataForSeoBot blocked because
# we don't use those products internally, and they consume meaningful crawl bandwidth.
# ============================================================================

User-agent: MJ12bot
Disallow: /

User-agent: DataForSeoBot
Disallow: /

# ============================================================================
# AI RETRIEVAL / CITATION BOTS — allowed
# These bots crawl to support live answers in ChatGPT, Perplexity, Bing Copilot,
# Claude, etc. Being allowed here lets us appear as cited sources, which drives
# referral traffic from those products.
# ============================================================================

# OpenAI — search index for ChatGPT Search citations
User-agent: OAI-SearchBot
Allow: /
Disallow: /api/
Disallow: /auth/
Disallow: /account
Disallow: /preferences
Disallow: /checkout/
Disallow: /editor/
Disallow: /organize/
Disallow: /imagine
Disallow: /personalize
Disallow: /app/
Disallow: /jobs/

# OpenAI — real-time fetch when a ChatGPT user clicks "browse" or invokes web tool
User-agent: ChatGPT-User
Allow: /
Disallow: /api/
Disallow: /auth/
Disallow: /account
Disallow: /preferences
Disallow: /checkout/
Disallow: /editor/
Disallow: /organize/
Disallow: /imagine
Disallow: /personalize
Disallow: /app/
Disallow: /jobs/

# Perplexity — search index for Perplexity citations
User-agent: PerplexityBot
Allow: /
Disallow: /api/
Disallow: /auth/
Disallow: /account
Disallow: /preferences
Disallow: /checkout/
Disallow: /editor/
Disallow: /organize/
Disallow: /imagine
Disallow: /personalize
Disallow: /app/
Disallow: /jobs/

# Perplexity — real-time fetch when a Perplexity user submits a query
User-agent: Perplexity-User
Allow: /
Disallow: /api/
Disallow: /auth/
Disallow: /account
Disallow: /preferences
Disallow: /checkout/
Disallow: /editor/
Disallow: /organize/
Disallow: /imagine
Disallow: /personalize
Disallow: /app/
Disallow: /jobs/

# Anthropic — Claude's retrieval bot (Anthropic's documented policy is search/index only,
# not training). The training opt-out is the `anthropic-ai` block above.
User-agent: ClaudeBot
Allow: /
Disallow: /api/
Disallow: /auth/
Disallow: /account
Disallow: /preferences
Disallow: /checkout/
Disallow: /editor/
Disallow: /organize/
Disallow: /imagine
Disallow: /personalize
Disallow: /app/
Disallow: /jobs/

# ============================================================================
# Sitemap
# ============================================================================

Sitemap: https://www.midjourney.com/sitemap.xml
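The file above mixes a broad `Allow: /` with narrower Disallow and wildcard Allow rules, and repeats the Disallow list in every named group. The sketch below is one way to check how such rules resolve. It assumes RFC 9309 semantics: a crawler follows only the group naming its product token (falling back to `*`), the longest matching pattern wins, and a tie goes to Allow. `ROBOTS_EXCERPT` is an abridged copy of a few groups, not the whole file, and `parse_groups`, `pattern_matches`, and `is_allowed` are hypothetical helper names. Real crawlers may deviate from the RFC.

```python
import re

# Abridged excerpt of the groups above (assumption: enough to show the precedence rules).
ROBOTS_EXCERPT = """\
User-agent: *
Allow: /
Disallow: /app/
Disallow: /editor/
Disallow: /jobs/
Allow: /*.css
Allow: /*.js

User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /
Disallow: /app/
Disallow: /editor/
Disallow: /jobs/
"""


def parse_groups(text):
    """Map each lowercased user-agent token to its list of (allow, pattern) rules."""
    groups = {}
    agents = []
    in_rules = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow") and agents:
            in_rules = True
            if value:  # an empty Disallow restricts nothing
                for agent in agents:
                    groups[agent].append((field == "allow", value))
    return groups


def pattern_matches(pattern, path):
    """'*' matches any run of characters; a trailing '$' anchors the end of the path."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(piece) for piece in body.split("*"))
    return re.match(regex + ("$" if anchored else ""), path) is not None


def is_allowed(groups, product_token, path):
    """Longest matching pattern wins; on equal length, Allow wins; no match means allowed."""
    rules = groups.get(product_token.lower(), groups.get("*", []))
    best_len, allowed = -1, True
    for allow, pattern in rules:
        if pattern_matches(pattern, path):
            if len(pattern) > best_len or (len(pattern) == best_len and allow):
                best_len, allowed = len(pattern), allow
    return allowed


groups = parse_groups(ROBOTS_EXCERPT)
print(is_allowed(groups, "Googlebot", "/jobs/abc123"))         # False: /jobs/ beats /
print(is_allowed(groups, "Googlebot", "/jobs/style.css"))      # True: /*.css ties /jobs/, Allow wins
print(is_allowed(groups, "Googlebot", "/editor/main.js"))      # False: /editor/ is longer than /*.js
print(is_allowed(groups, "OAI-SearchBot", "/jobs/style.css"))  # False: named group has no /*.css rule
```

Under these assumptions, two things stand out. The wildcard Allow rules only override a Disallow whose pattern is no longer than theirs, so an asset under `/editor/` stays blocked. A named group does not inherit anything from the `*` group, which is why each retrieval bot repeats the full Disallow list.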