RubanTools

Robots.txt Generator

Configure which crawlers can access which parts of your website - download a ready-to-use robots.txt file.

A robots.txt file is a plain-text file placed in the root directory of a website (e.g. https://example.com/robots.txt) that tells web crawlers which pages or sections they may or may not access. It follows the Robots Exclusion Protocol (REP), first proposed in 1994 and standardized as RFC 9309 in 2022. This generator lets you configure rules for specific bots - Googlebot, Bingbot, or all crawlers - choose directories or file types to block or allow, and add your sitemap URL. Save the output as robots.txt in your site's root directory.
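To make the file layout concrete, here is a minimal sketch of how such a file can be assembled from per-bot rules. It is not the generator's actual code; the function name build_robots_txt, its parameters, and the example paths are assumptions chosen for illustration.

```python
def build_robots_txt(groups, sitemaps=()):
    """Render robots.txt text (hypothetical helper, for illustration only).

    groups:   list of (user_agent, rules), where rules is a list of
              ("allow" | "disallow", path) tuples.
    sitemaps: iterable of absolute sitemap URLs.
    """
    lines = []
    for user_agent, rules in groups:
        lines.append(f"User-agent: {user_agent}")
        for kind, path in rules:
            directive = "Allow" if kind == "allow" else "Disallow"
            lines.append(f"{directive}: {path}")
        lines.append("")  # a blank line separates one group from the next
    for url in sitemaps:
        lines.append(f"Sitemap: {url}")
    # Drop any trailing blank lines and end the file with a single newline.
    return "\n".join(lines).rstrip("\n") + "\n"


example = build_robots_txt(
    groups=[
        ("Googlebot", [("disallow", "/drafts/")]),
        ("*", [("disallow", "/admin/"), ("allow", "/admin/public/")]),
    ],
    sitemaps=["https://example.com/sitemap.xml"],
)
print(example)
```

Each User-agent line starts a group, the Allow and Disallow lines under it apply to that group only, and Sitemap lines stand outside any group.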

Why robots.txt Matters for Indian Websites

Indian web publishers - from news portals like NDTV and The Hindu to e-commerce sites and government portals - rely on correct robots.txt configuration to manage crawl budget efficiently. Google's Search Console (used by millions of Indian webmasters) highlights robots.txt errors as a critical issue. Incorrectly blocking Googlebot from CSS and JavaScript files - a common mistake - can prevent Google from rendering your pages properly, leading to lower rankings. For Indian government portals registered under the .gov.in domain, NIC (National Informatics Centre) guidelines recommend explicit bot directives to prevent duplicate content being indexed from staging subdomains. The SEBI investor education portal and RBI's public data pages are examples of sites that use robots.txt to manage crawler access carefully.

Common Presets

Use the "Block all bots" preset for staging servers, "Allow all" for new public sites, or build custom rules per bot using the rule builder. Always check your robots.txt before deploying, for example with the robots.txt report in Google Search Console (the older standalone robots.txt Tester has been retired).
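As a rough local sanity check before deploying, the two simplest presets can be run through Python's standard urllib.robotparser module. The preset strings below are what such presets conventionally look like, not necessarily the exact text this generator emits, and the helper name can_fetch and the URLs are assumptions. Python's parser is not Google's, so treat this as a first check rather than a substitute for Search Console.

```python
from urllib.robotparser import RobotFileParser

# Conventional forms of the two presets (assumed, not copied from the tool).
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
ALLOW_ALL = "User-agent: *\nDisallow:\n"  # an empty Disallow blocks nothing


def can_fetch(robots_txt, user_agent, url):
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


staging_ok = can_fetch(BLOCK_ALL, "Googlebot", "https://staging.example.com/page")
public_ok = can_fetch(ALLOW_ALL, "Googlebot", "https://example.com/page")
print(staging_ok, public_ok)
```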

Robots.txt Questions

robots.txt is a plain text file placed at the root of your website (e.g. https://example.com/robots.txt) that tells search engine crawlers which pages or sections they are allowed or not allowed to access. It is the standard method for managing bot access to your site and is respected by Google, Bing, and most well-behaved crawlers. Without it, all pages are crawlable by default.

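Disallow blocks a crawler from accessing a path or directory. Allow explicitly permits access to a path that would otherwise be blocked by a broader Disallow rule. For example, you can disallow /admin/ but allow /admin/public/ for specific content. When both an Allow and a Disallow rule match a URL, the order in which they appear does not matter for Google and other crawlers that follow RFC 9309: the most specific rule (the one with the longest matching path) wins, and if the two are equally specific, Allow wins.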
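The longest-match precedence used by Google and described in RFC 9309 can be sketched in a few lines. This is a simplified illustration that assumes plain path prefixes only (no * or $ wildcards); the function name is_allowed is hypothetical. Note that Python's built-in urllib.robotparser applies rules in file order instead, which is why the logic is written out by hand here.

```python
def is_allowed(rules, path):
    """Decide whether path may be crawled.

    rules: list of ("allow" | "disallow", prefix) tuples for one bot.
    The longest matching prefix wins; on a tie, allow wins.
    """
    best_len = -1
    allowed = True  # no matching rule means the path is crawlable
    for kind, prefix in rules:
        if prefix and path.startswith(prefix):
            longer = len(prefix) > best_len
            tie_goes_to_allow = len(prefix) == best_len and kind == "allow"
            if longer or tie_goes_to_allow:
                best_len = len(prefix)
                allowed = kind == "allow"
    return allowed


rules = [("disallow", "/admin/"), ("allow", "/admin/public/")]
print(is_allowed(rules, "/admin/settings"))          # False
print(is_allowed(rules, "/admin/public/help.html"))  # True
print(is_allowed(rules, "/blog/"))                   # True
```

Reversing the order of the two rules gives the same answers, because the decision depends on path length rather than on position in the file.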

Disallowing a page in robots.txt prevents Google from crawling it, but not necessarily from indexing it. If another site links to a disallowed page, Google may still index the URL with a minimal entry. To keep a page out of search results entirely, add a noindex meta robots tag to the page itself and do not disallow that page in robots.txt: a crawler that is blocked from fetching the page can never read the noindex tag.

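Crawl-delay instructs bots to wait a specified number of seconds between requests to reduce server load. Note: Google ignores Crawl-delay in robots.txt, and the legacy crawl rate setting in Google Search Console has been retired; Googlebot adjusts its crawl rate automatically and slows down when your server returns 500, 503, or 429 responses. Bing and some other bots do respect Crawl-delay, but support varies by crawler, so check each crawler's documentation. Use it if your server struggles under heavy crawler traffic.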
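For crawlers you write yourself, Python's standard urllib.robotparser exposes the delay through crawl_delay(). The sketch below shows how a per-bot value is read; the robots.txt content, the 10-second value, and the bot name SomeOtherBot are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Example file: only bingbot is given a delay (values are illustrative).
ROBOTS_TXT = """\
User-agent: bingbot
Crawl-delay: 10
Disallow: /search/

User-agent: *
Disallow: /search/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

bing_delay = parser.crawl_delay("bingbot")          # delay set for this group
default_delay = parser.crawl_delay("SomeOtherBot")  # None: no delay declared
print(bing_delay, default_delay)
```

A polite crawler would sleep for the returned number of seconds between requests and fall back to its own default when the value is None.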

Add a Sitemap directive to your robots.txt file, conventionally at the end, using the full absolute URL of the sitemap, for example: Sitemap: https://example.com/sitemap.xml. This helps search engines discover your sitemap even if you haven't submitted it via Google Search Console. You can add multiple Sitemap lines if you have multiple sitemap files. Our generator includes a Sitemap URL field for this purpose.
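To confirm that Sitemap lines are picked up, the file can be parsed with Python's standard urllib.robotparser, whose site_maps() method (available from Python 3.8) returns the declared URLs. The second sitemap URL and the /admin/ rule below are invented for the example.

```python
from urllib.robotparser import RobotFileParser

# Example file with two Sitemap lines (URLs are illustrative).
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-news.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns a list of URLs in file order, or None if there are none.
sitemaps = parser.site_maps()
print(sitemaps)
```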