NeuralCrawl

🇫🇷 Air Liquide

airliquide.com · European companies · rank #141 · Chemicals

AI crawler access (latest snapshot, 13h ago)

Legend: blocked · restricted · allowed · faded = inherited from the * wildcard group

GPTBot
ChatGPT-User
OAI-SearchBot
ClaudeBot
Claude-User
Claude-SearchBot
anthropic-ai
Claude-Web
CCBot
Google-Extended
Applebot-Extended
PerplexityBot
Perplexity-User
Bytespider
Amazonbot
FacebookBot
meta-externalagent
meta-externalfetcher
cohere-ai
AI2Bot
Diffbot
omgili
YouBot
DuckAssistBot
MistralAI-User
PanguBot
Timpibot
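None of the crawlers listed above appears by name in the robots.txt that follows; its only group is `User-agent: *`, so every entry would fall back to the wildcard rules. A minimal sketch of that check is given here, assuming exact, case-insensitive token matching (real crawlers may match product tokens more loosely). The function name `agents_with_own_group` and the sample input are hypothetical, for illustration only.

```python
def agents_with_own_group(robots_txt: str, agents: list[str]) -> dict[str, bool]:
    """Map each agent to True if a User-agent line names it explicitly."""
    named = set()
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        if field.strip().lower() == "user-agent":
            named.add(value.strip().lower())
    return {agent: agent.lower() in named for agent in agents}


# Hypothetical sample: a file whose only group is the wildcard.
sample = "User-agent: *\nDisallow: /admin/\n"
result = agents_with_own_group(sample, ["GPTBot", "ClaudeBot", "CCBot"])
# Every value is False here, i.e. each agent inherits the * group.
```

An agent that maps to False has no dedicated group and is governed by whatever the `*` group says.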

Current robots.txt · 3131 bytes · sha256 610d38f12a99

#
# robots.txt
#
# This file is to prevent the crawling and indexing of certain parts
# of your site by web crawlers and spiders run by sites like Yahoo!
# and Google. By telling these "robots" where not to go on your site,
# you save bandwidth and server resources.
#
# This file will be ignored unless it is at the root of your host:
# Used:    http://example.com/robots.txt
# Ignored: http://example.com/site/robots.txt
#
# For more information about the robots.txt standard, see:
# http://www.robotstxt.org/robotstxt.html

User-agent: *
# CSS, JS, Images
Allow: /core/*.css$
Allow: /core/*.css?
Allow: /core/*.js$
Allow: /core/*.js?
Allow: /core/*.gif
Allow: /core/*.jpg
Allow: /core/*.jpeg
Allow: /core/*.png
Allow: /core/*.svg
Allow: /profiles/*.css$
Allow: /profiles/*.css?
Allow: /profiles/*.js$
Allow: /profiles/*.js?
Allow: /profiles/*.gif
Allow: /profiles/*.jpg
Allow: /profiles/*.jpeg
Allow: /profiles/*.png
Allow: /profiles/*.svg
Allow: /sites/*/files/
# Directories
Disallow: /core/
Disallow: /profiles/
# Files
Disallow: /README.txt
Disallow: /web.config
# Paths (clean URLs)
Disallow: /admin/
Disallow: /comment/reply/
Disallow: /filter/tips
Disallow: /node/add/
Disallow: /search/
Disallow: /user/register/
Disallow: /user/password/
Disallow: /user/login/
Disallow: /user/logout/
# Paths (no clean URLs)
Disallow: /index.php/admin/
Disallow: /index.php/comment/reply/
Disallow: /index.php/filter/tips
Disallow: /index.php/node/add/
Disallow: /index.php/search/
Disallow: /index.php/user/password/
Disallow: /index.php/user/register/
Disallow: /index.php/user/login/
Disallow: /index.php/user/logout/
# Specific rules
Disallow: */saml_login
Disallow: */add-to-calendar/ics/*
Disallow: */api/airliquide/download/file/*
Disallow: */form/*
Disallow: */opinion_survey_json/add/*
Disallow: */aggregate
Disallow: */page_action/*
Disallow: */spa/*
Disallow: */ajax/*
Disallow: */jserrors/*
Disallow: */metrics/*
Disallow: */page_view_timing/*
Disallow: */page_view_event/*
Disallow: */session_trace/*
# Blocking all parameters
Disallow: *=*
Disallow: */node*
# Except those
Allow: *page=*
Allow: *languageSelect=*
Allow: *thematic%5B0%5D=*
Allow: *field_date_range_end_value=&field_date_range_end_value_1=&page=*
Allow: *period%5Bmin%5D=&period%5Bmax%5D=&text=&page=*
Allow: *period%5Bmin%5D=&period%5Bmax%5D=&page=*
Allow: *.gif*
Allow: *.jpg*
Allow: *.jpeg*
Allow: *.png*
Allow: *.webp*
# Blocking some PDFs
Disallow: /sites/airliquide.com/files/2023-03/air-liquide-rapport-de-developpement-durable-2022.pdf
Disallow: /sites/airliquide.com/files/2023-03/air-liquide-sustainability-report-2022.pdf
Disallow: /sites/airliquide.com/files/2022-04/2021-sustainability-report.pdf
Disallow: /sites/airliquide.com/files/2022-04/rapport-developpement-durable-2021.pdf
Disallow: /group/press-releases-news/2023-03-24/sustainability-report-2022-air-liquide-presents-its-results-and-sets-additional-objectives
Disallow: /fr/groupe/communiques-presse-actualites/24-03-2023/rapport-de-developpement-durable-2022-air-liquide-presente-ses-resultats-et-se-fixe-des-objectifs
# XML sitemap
Sitemap: https://www.airliquide.com/sitemap.xml
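The "Blocking all parameters" and "Except those" rules in the file above interact through rule precedence. The sketch below shows one way to evaluate them, assuming the longest-match convention described in RFC 9309 (the most specific matching pattern wins, and Allow wins a tie). `*` is treated as a wildcard and a trailing `$` as an end anchor. The helper names and the shortened rule list are illustrative; individual crawlers may resolve conflicts differently.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal pieces and join them with ".*" for each "*" wildcard.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """Apply longest-match precedence; a path with no matching rule is allowed."""
    best_len, best_allow = -1, True
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            allow = kind == "allow"
            # Longer pattern wins; on equal length, Allow wins.
            if len(pattern) > best_len or (len(pattern) == best_len and allow):
                best_len, best_allow = len(pattern), allow
    return best_allow


# A few rules copied from the file above (not the full set).
rules = [
    ("disallow", "*=*"),
    ("allow", "*page=*"),
    ("allow", "*languageSelect=*"),
    ("disallow", "/admin/"),
]
```

Under these assumptions, a hypothetical URL path such as `/en/news?page=2` matches both `*=*` and the longer `*page=*`, so it is allowed, while `/en/news?sort=asc` matches only `*=*` and is blocked.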

Change history

  1. initial snapshot
    • First snapshot of robots.txt archived
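The snapshot header gives a byte count and a 12-character sha256 value, and a change history like this one could be built by comparing such fingerprints between fetches. The sketch below assumes the displayed value is the first 12 hex characters of the SHA-256 digest of the raw file; the site's actual method is not stated, and the names `fingerprint` and `has_changed` are hypothetical.

```python
import hashlib


def fingerprint(body: bytes, prefix_len: int = 12) -> tuple[int, str]:
    """Return (size in bytes, truncated hex SHA-256) for a robots.txt body."""
    return len(body), hashlib.sha256(body).hexdigest()[:prefix_len]


def has_changed(previous: bytes, current: bytes) -> bool:
    """Report whether two snapshots differ by size or truncated digest."""
    return fingerprint(previous) != fingerprint(current)


# Hypothetical snapshots for illustration.
old_snapshot = b"User-agent: *\n"
new_snapshot = b"User-agent: *\nDisallow: /admin/\n"
size, digest = fingerprint(old_snapshot)
```

A truncated digest is convenient for display, but comparing full digests would be the safer choice for change detection.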