Generate a robots.txt file to control how search engines crawl your website. Choose from presets or customise rules for specific bots.
The robots.txt file is one of the most important technical SEO files on any website. It lives at the root of your domain and provides instructions to search engine crawlers about which parts of your site they should and shouldn't access. While it's a simple text file, getting it wrong can accidentally block your entire site from being crawled.
Every website should have a robots.txt file, even if it just allows everything. Search engines like Google check for this file before crawling your site. If it's missing, crawlers will assume they can access everything. If it contains errors, crawlers may be blocked from important pages or waste their crawl budget on unimportant ones.
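The simplest valid robots.txt allows every crawler to access everything. An empty Disallow line means "nothing is disallowed":

```
# Allow all crawlers to access the entire site
User-agent: *
Disallow:
```

A common mistake is writing `Disallow: /` here instead, which does the opposite and blocks the whole site.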
For African businesses and developers, proper robots.txt configuration is especially important for managing crawl budget efficiently. If your site is hosted in Africa, response times to Google's crawlers (which are primarily based in the US and Europe) may be higher, making crawl budget management even more critical. Block unnecessary paths like admin panels, search result pages, and staging areas to ensure Google focuses its crawl budget on your most important pages.
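A crawl-budget-focused file might look like the sketch below. The paths are examples only; substitute the admin, search, and staging paths your own site actually uses, and the sitemap URL for your domain:

```
User-agent: *
Disallow: /admin/
Disallow: /search
Disallow: /staging/
Allow: /

Sitemap: https://example.com/sitemap.xml
```

The optional Sitemap line helps crawlers find your important pages directly instead of discovering them link by link.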
In 2024-2025, blocking AI training bots has become a major concern. User agents like GPTBot (OpenAI), CCBot (Common Crawl, whose archives are widely used for AI training), and Google-Extended (Google's token controlling whether your content trains its AI models) collect website content for AI training. Many publishers and businesses now specifically block these while continuing to allow search engine indexing. Our generator makes it easy to add these rules.
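A typical opt-out file blocks the AI user agents by name while leaving ordinary search crawlers untouched:

```
# Block AI training bots
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# All other crawlers (including Googlebot) may crawl normally
User-agent: *
Allow: /
```

Because Google-Extended is a separate token from Googlebot, blocking it does not affect your normal search rankings.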
The robots.txt file must be placed at the root of your domain: https://example.com/robots.txt. It only works at the root level — placing it in a subdirectory (like /blog/robots.txt) will not work. Upload it via FTP, your hosting file manager, or your CMS settings.
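Once your file is live, you can sanity-check your rules with Python's standard library robots.txt parser. This sketch parses a sample rule set from a string; the paths and domain are placeholders:

```python
from urllib.robotparser import RobotFileParser

# Sample rules, exactly as they would appear in robots.txt
rules = """\
User-agent: *
Disallow: /admin/
Allow: /
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# can_fetch(user_agent, url) answers: may this bot crawl this URL?
print(rp.can_fetch("Googlebot", "https://example.com/admin/login"))  # False
print(rp.can_fetch("Googlebot", "https://example.com/blog/post"))    # True
```

To test your deployed file instead, call `rp.set_url("https://example.com/robots.txt")` followed by `rp.read()` before checking paths.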
Not exactly. Blocking a page in robots.txt prevents Google from crawling it, but the URL may still appear in search results (without a description) if other pages link to it. To fully remove a page from Google, use the "noindex" meta tag instead, and make sure the page is NOT blocked in robots.txt (Google needs to crawl the page to see the noindex tag).
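To fully deindex a page, the tag goes in that page's HTML head, and the page must remain crawlable:

```html
<!-- In the <head> of the page you want removed from search results -->
<meta name="robots" content="noindex">
```

For non-HTML files such as PDFs, the equivalent is the `X-Robots-Tag: noindex` HTTP response header, configured on your server.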