NeuralCrawl

United States (USA.gov) / robots.txt snapshot

fetched 2026-06-20T01:10:30Z · HTTP 200 · 1558 bytes · sha256 57210da4611cc41b

final URL: https://www.usa.gov/robots.txt

#
# robots.txt
#
# This file is to prevent the crawling and indexing of certain parts
# of your site by web crawlers and spiders run by sites like Yahoo!
# and Google. By telling these "robots" where not to go on your site,
# you save bandwidth and server resources.
#
# This file will be ignored unless it is at the root of your host:
# Used: http://example.com/robots.txt
# Ignored: http://example.com/site/robots.txt
#
# For more information about the robots.txt standard, see:
# http://www.robotstxt.org/robotstxt.html

User-agent: *
Crawl-delay: 10

# Sitemaps
Sitemap: https://www.usa.gov/sitemap.xml

# CSS, JS, Images
Allow: /misc/*.css$
Allow: /misc/*.css?
Allow: /misc/*.js$
Allow: /misc/*.js?
Allow: /misc/*.gif
Allow: /misc/*.jpg
Allow: /misc/*.jpeg
Allow: /misc/*.png
Allow: /modules/*.css$
Allow: /modules/*.css?
Allow: /modules/*.js$
Allow: /modules/*.js?
Allow: /modules/*.gif
Allow: /modules/*.jpg
Allow: /modules/*.jpeg
Allow: /modules/*.png
Allow: /profiles/*.css$
Allow: /profiles/*.css?
Allow: /profiles/*.js$
Allow: /profiles/*.js?
Allow: /profiles/*.gif
Allow: /profiles/*.jpg
Allow: /profiles/*.jpeg
Allow: /profiles/*.png
Allow: /themes/*.css$
Allow: /themes/*.css?
Allow: /themes/*.js$
Allow: /themes/*.js?
Allow: /themes/*.gif
Allow: /themes/*.jpg
Allow: /themes/*.jpeg
Allow: /themes/*.png
# Directories
Disallow: /includes/
Disallow: /misc/
Disallow: /modules/
Disallow: /profiles/
Disallow: /scripts/
Disallow: /themes/
# We shouldn't have these, but:
Disallow: /node
Disallow: /node/
Disallow: /es/node
Disallow: /es/node/
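
The Allow and Disallow lines in this file overlap: /misc/ is disallowed as a directory, while stylesheets, scripts, and images under it are allowed. The following is a minimal sketch of how a crawler might resolve such overlaps, assuming the longest-match precedence described in RFC 9309 (the longest matching pattern wins, and Allow wins a tie). The names RULES, pattern_to_regex, and is_allowed are hypothetical and chosen for illustration. RULES is a hand-copied subset of the snapshot, not the full file, and not every crawler is guaranteed to evaluate rules this way.

```python
import re

# Hand-copied subset of the "User-agent: *" rules in the snapshot above.
RULES = [
    ("allow", "/misc/*.css$"),
    ("allow", "/misc/*.css?"),
    ("allow", "/misc/*.png"),
    ("disallow", "/misc/"),
    ("disallow", "/node"),
    ("disallow", "/es/node/"),
]


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex.

    '*' matches any run of characters and a trailing '$' anchors the end
    of the path. Every other character, including '?', is literal.
    """
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def is_allowed(path: str, rules=RULES) -> bool:
    """Return True if `path` may be fetched (assumed longest-match rule)."""
    best_len = -1
    allowed = True  # a path that matches no rule is allowed
    for kind, pattern in rules:
        # re.match anchors at the start, so patterns act as path prefixes.
        if pattern_to_regex(pattern).match(path):
            length = len(pattern)
            # The longer pattern wins; on equal length, Allow wins.
            if length > best_len or (length == best_len and kind == "allow"):
                best_len = length
                allowed = kind == "allow"
    return allowed


if __name__ == "__main__":
    for p in ["/misc/drupal.css", "/misc/install.php", "/node/123", "/benefits"]:
        print(p, is_allowed(p))
```

Under these assumptions, /misc/drupal.css is fetchable because "/misc/*.css$" is longer than "/misc/", while /misc/install.php matches only the Disallow rule. Note that "Disallow: /node" is a prefix match, so it also blocks paths such as /nodes.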