NeuralCrawl

The Sydney Morning Herald / robots.txt snapshot

fetched 2026-06-20T01:10:30Z · HTTP 200 · 4048 bytes · sha256 15896d70e45f4ccd

final URL: https://www.smh.com.au/robots.txt

# NINE ENTERTAINMENT CO. POLICY STATEMENT
# Nine Entertainment Co expressly prohibits the use of any Nine
# content or data, including associated metadata, for any machine
# learning and/or artificial intelligence including for the purposes
# of training or development of AI technology, tools and machine
# learning language models.
# View our terms of use - https://login.nine.com.au/terms?client_id=smh

# Sitemaps
Sitemap: https://www.smh.com.au/sitemaps/news/brands/smh
Sitemap: https://www.smh.com.au/sitemaps/smh-sitemaps-videos.xml
Sitemap: https://www.smh.com.au/sitemaps/smh-sitemaps-articles.xml
Sitemap: https://www.smh.com.au/rss/feed.xml

# -----------------------------------------------------------------
# 1. GENERAL CRAWLER RULES (Allowing standard search engines)
# -----------------------------------------------------------------

# All visitors
User-agent: *
Allow: /
Disallow: /search?text=*
Disallow: *?app=*
Disallow: *?do=*
Disallow: *?ocid=*
Disallow: *?ref=*

# -----------------------------------------------------------------
# 2. SPECIFIC BLOCKS FOR AI, LLM, AND DATA-SCRAPING AGENTS
# -----------------------------------------------------------------


##########
# Google AI Agents (Allows standard Googlebot to continue crawling)
User-agent: Google-CloudVertexBot
Disallow: /
User-agent: Google-Extended
Disallow: /

##########
# OpenAI
User-agent: ChatGPT-User
Disallow: /
User-agent: GPTBot
Disallow: /
User-agent: OAI-SearchBot
Disallow: /
User-agent: OAISearch
Disallow: /

##########
# Anthropic
User-agent: anthropic-ai
Disallow: /
User-agent: claude-web
Disallow: /
User-agent: claudebot
Disallow: /

##########
# Meta (Facebook/LLaMA)
User-agent: facebookbot
Disallow: /
User-agent: meta-externalagent
Disallow: /
User-agent: meta-externalfetcher
Disallow: /

##########
# Apple
User-agent: applebot-extended
Disallow: /

##########
# Perplexity AI
User-agent: perplexitybot
Disallow: /

##########
# Cohere
User-agent: cohere-ai
Disallow: /

##########
# You.com
User-agent: youbot
Disallow: /

##########
# Amazon
User-agent: amazonbot
Disallow: /

##########
# Alibaba Cloud
User-agent: aliyunsecbot
Disallow: /

##########
# Audigent
User-agent: audigentadbot
Disallow: /

##########
# Awario
User-agent: awariorssbot
Disallow: /
User-agent: awariosmartbot
Disallow: /

##########
# BLEX AI
User-agent: blexbot
Disallow: /

##########
# ByteDance
User-agent: bytespider
Disallow: /

##########
# Common Crawl
User-agent: ccbot
Disallow: /

##########
# DataForSEO
User-agent: dataforseobot
Disallow: /

##########
# Diffbot
User-agent: diffbot
Disallow: /

##########
# DuckDuckGo
User-agent: duckassistbot
Disallow: /

##########
# Echobox
User-agent: echoboxbot
Disallow: /

##########
# Friendly Technologies
User-agent: friendlycrawler
Disallow: /

##########
# Internet Archive / "Wayback Machine"
User-agent: ia_archiver
Disallow: /

##########
# ImageSift
User-agent: imagesiftbot
Disallow: /

##########
# MyCentralAI
User-agent: mycentralaiscraperbot
Disallow: /

##########
# NewsNow
User-agent: newsnow
Disallow: /

##########
# News-Please (Open-source)
User-agent: news-please
Disallow: /

##########
# Omgili
User-agent: omgili
Disallow: /
User-agent: omgilibot
Disallow: /
User-agent: webzio-extended
Disallow: /

##########
# Peer39
User-agent: peer39_crawler
Disallow: /
User-agent: peer39_crawler/1.0
Disallow: /

##########
# QuillBot
User-agent: quillbot.com
Disallow: /

##########
# Quora
User-agent: quora-bot
Disallow: /

##########
# Scrapy (Open-source)
User-agent: scrapy
Disallow: /

##########
# Seekr
User-agent: seekrbot
Disallow: /

##########
# Seznam.cz
User-agent: seznamhomepagecrawler
Disallow: /

##########
# TaraGroup
User-agent: taragroup intelligent bot
Disallow: /

##########
# Timpi
User-agent: timpibot
Disallow: /

##########
# Turnitin
User-agent: turnitinbot
Disallow: /

##########
# Others
User-agent: viennatinybot
Disallow: /
User-agent: jetslide
Disallow: /
User-agent: magpie-crawler
Disallow: /
User-agent: poseidon research crawler
Disallow: /
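
The rules above can be checked programmatically. The following is a minimal sketch, not part of the snapshot, that feeds an abbreviated excerpt of the file to Python's standard-library `urllib.robotparser` and asks whether a given user agent may fetch a page. The helper names (`build_parser`, `is_allowed`) and the article path are hypothetical and chosen only for illustration. Note that `urllib.robotparser` applies the first matching rule within a group and has no wildcard support, so it is not a reliable way to evaluate patterns such as `*?ref=*`; the sketch only exercises the plain `Allow: /` and `Disallow: /` rules.

```python
from urllib.robotparser import RobotFileParser

# Abbreviated excerpt of the snapshot above (assumption: a few groups are
# enough to illustrate the allow/deny split; the real file lists many more).
ROBOTS_EXCERPT = """\
User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /
User-agent: claudebot
Disallow: /
User-agent: ccbot
Disallow: /
"""

# Hypothetical article path, used only for illustration.
ARTICLE_URL = "https://www.smh.com.au/example-article"


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network fetch)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, agent: str, url: str) -> bool:
    """Return True if the parsed rules permit `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)


parser = build_parser(ROBOTS_EXCERPT)

# Agents without a dedicated group fall back to the "*" group.
print("Googlebot:", is_allowed(parser, "Googlebot", ARTICLE_URL))
# Agents named in the AI/scraper section are denied everything.
print("GPTBot:", is_allowed(parser, "GPTBot", ARTICLE_URL))
print("ClaudeBot/1.0:", is_allowed(parser, "ClaudeBot/1.0", ARTICLE_URL))
```

Agent matching in this parser is case-insensitive and ignores anything after a `/` in the agent string, which is why `ClaudeBot/1.0` matches the lowercase `claudebot` group.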