Web Dev//SEO//robots.txt
A plain text file at the site root (/robots.txt) that tells crawlers which paths they may or may not access.
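Directives: User-agent (which crawler the rules apply to), Allow / Disallow (path prefixes that crawler may or may not fetch), Crawl-delay (politeness throttle in seconds; non-standard, and Googlebot ignores it), Sitemap (absolute URL of the sitemap).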
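A minimal sketch of how these directives are read in practice, using Python's standard-library `urllib.robotparser`. The rules, the `example.com` URLs, and the `ExampleBot` name are made up for illustration. Note that this parser applies the first matching rule in a group, so the more specific Allow is listed before the broader Disallow.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content using all four directives.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Allow: /private/press-kit/
Disallow: /private/
Crawl-delay: 5

Sitemap: https://example.com/sitemap.xml
"""


def parse_robots(text: str) -> RobotFileParser:
    """Build a parser from robots.txt text without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


sample_parser = parse_robots(SAMPLE_ROBOTS_TXT)

# Disallowed prefix
print(sample_parser.can_fetch("ExampleBot", "https://example.com/private/report.html"))
# Carved out by the more specific Allow rule
print(sample_parser.can_fetch("ExampleBot", "https://example.com/private/press-kit/logo.png"))
# No rule matches, so fetching is allowed by default
print(sample_parser.can_fetch("ExampleBot", "https://example.com/blog/"))
# Crawl-delay in seconds and the list of Sitemap URLs
print(sample_parser.crawl_delay("ExampleBot"))
print(sample_parser.site_maps())
```

Real crawlers differ in how they resolve overlapping Allow / Disallow rules, so treat the output as what this particular parser decides, not as what every bot will do.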
It is a gentleman's agreement, not access control. Malicious bots ignore it. Well-behaved bots (Googlebot, GPTBot, ClaudeBot) respect it.
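AI-era relevance: you can selectively block AI crawlers while still allowing search engines. A group consisting of `User-agent: GPTBot` followed by `Disallow: /` tells OpenAI's crawler not to fetch any page on your site, which keeps it from collecting your content for training.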
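The selective block can be checked with a short sketch, again assuming Python's `urllib.robotparser`. The rule text, the `is_allowed` helper, and the `example.com` URL are illustrative, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block GPTBot everywhere, leave every other crawler unrestricted.
AI_BLOCK_RULES = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given user agent may fetch the URL under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


ARTICLE_URL = "https://example.com/articles/robots-txt"

print(is_allowed(AI_BLOCK_RULES, "GPTBot", ARTICLE_URL))     # False
print(is_allowed(AI_BLOCK_RULES, "Googlebot", ARTICLE_URL))  # True
```

Each AI crawler needs its own group: under the rules above, any bot that is not named GPTBot falls through to the `*` group and is still allowed.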
Common mistake: disallowing CSS/JS files. Googlebot renders a page before indexing it, so if it cannot fetch your stylesheets and scripts it may see a broken or empty page.
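One way to catch this mistake is to test a page's static resources against your rules before deploying them. The sketch below assumes Python's `urllib.robotparser`; the `blocked_resources` helper, the `/assets/` layout, and the URLs are hypothetical. This parser does plain prefix matching and does not understand the wildcard patterns Googlebot supports, so it is only a rough check.

```python
from urllib.robotparser import RobotFileParser


def blocked_resources(robots_txt: str, resource_urls: list[str],
                      user_agent: str = "Googlebot") -> list[str]:
    """Return the resource URLs that the given user agent is not allowed to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in resource_urls if not parser.can_fetch(user_agent, url)]


# Hypothetical rules that hide the whole static assets directory.
RISKY_RULES = """\
User-agent: *
Disallow: /assets/
"""

# Hypothetical resources referenced by a page.
PAGE_RESOURCES = [
    "https://example.com/assets/site.css",
    "https://example.com/assets/app.js",
    "https://example.com/images/logo.png",
]

# The stylesheet and script are reported as blocked; the image is not.
print(blocked_resources(RISKY_RULES, PAGE_RESOURCES))
```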
The file itself is public — anyone can read your robots.txt to see what you are hiding. Do not use it for security.