Official statement

The most specific directive in the robots.txt file takes precedence over less specific directives, and the order of lines in the file does not matter to Google.
11:02
🎥 Source video

Extracted from a Google Search Central video

⏱ 55:47 💬 EN 📅 25/08/2015 ✂ 9 statements
Other statements from this video (8)
  1. 2:06 Is the robots.txt file really essential for ranking on Google?
  2. 4:30 Can Google really index your pages without crawling them?
  3. 15:52 Should you block filter pages with robots.txt or rely on canonicalization?
  4. 16:16 Do you really need to fix every error in the robots.txt file?
  5. 18:53 Are the Search Console robots.txt tools really reliable for avoiding crawl errors?
  6. 22:14 Can the Google Maps API block the indexing of your location data?
  7. 33:03 Why does Google ignore the crawl-delay directive in your robots.txt?
  8. 52:55 Why does blocking URLs in robots.txt dilute the PageRank of your backlinks?
TL;DR

Google applies the rule of maximum specificity in robots.txt files: a directive targeting a specific path always overrides a generic directive, regardless of their order of appearance. Specifically, if you allow /blog/article-seo.html and block /blog/, the specific allowance takes precedence. This logic disrupts the practices of those who relied on the sequential order of lines to manage their crawl exclusions.

What you need to understand

What does "specificity" really mean in a robots.txt?

The specificity of a directive is measured by the length of the path pattern it targets, counted in characters. A rule pointing to /folder/subfolder/page.html is more specific than a rule targeting just /folder/. For each URL it wants to crawl, Google applies the longest directive that matches that URL.

This logic borrows from CSS principles: when multiple rules conflict, the most targeted one prevails. If you have Disallow: /admin/ and Allow: /admin/public/stats.php, Google will crawl stats.php because the second directive specifies an exact file in a subdirectory, making it more specific than the generic block on /admin/.

Why doesn’t the order of lines matter?

Many SEO practitioners long believed that Google reads the robots.txt from top to bottom, applying the first directive encountered. This is false. Google parses the entire file, identifies all directives that match the URL in question, and then selects the one with the longest pattern.

The result: you can place your Allow: /blog/article-important.html on line 50 and your Disallow: /blog/ on line 2, and Google will still crawl the article because its directive is more granular. The sequential order is a myth inherited from other parsers, not Google’s.
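The selection logic described above can be sketched in a few lines of Python. This is a minimal illustration, not Google's parser: the helper names `pattern_to_regex` and `is_allowed` are hypothetical, user-agent groups are ignored, and the sketch assumes that `*` matches any run of characters, that a trailing `$` anchors the end of the URL, and that a tie between equal-length patterns goes to Allow.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into an anchored regex.

    Assumption: '*' matches any run of characters, a trailing '$' anchors
    the end of the URL, and everything else is matched literally.
    """
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile("^" + regex + ("$" if anchored else ""))


def is_allowed(rules, path: str) -> bool:
    """Return True if `path` may be crawled.

    `rules` is a list of (directive, pattern) tuples in any order.
    The longest matching pattern wins; on equal length, Allow wins.
    """
    best = None  # (pattern length, is_allow)
    for directive, pattern in rules:
        if pattern and pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), directive.lower() == "allow")
            if best is None or candidate > best:
                best = candidate
    # No matching rule at all means the URL is crawlable.
    return True if best is None else best[1]


rules = [
    ("Disallow", "/blog/"),
    ("Allow", "/blog/article-important.html"),
]
print(is_allowed(rules, "/blog/article-important.html"))                  # True
print(is_allowed(list(reversed(rules)), "/blog/article-important.html"))  # True
print(is_allowed(rules, "/blog/other-post.html"))                         # False
```

Reversing the rule list gives the same answers, which is the point: only the length of the matching pattern decides.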

What are the implications for managing crawl budget?

This specificity rule simplifies things when you want to block entire sections while allowing a few strategic exceptions. You no longer have to juggle the order of lines or maintain separate rule sets per bot just to carve out an exception.

However, it requires absolute precision in writing: a typo in the specific path, and the generic directive applies. There is no sequential safety net to catch the mistake. Robots.txt audits thus become even more critical, especially on sites with deep structures or parameterized URLs.

  • Specificity prevails: a directive targeting /a/b/c.html overrides one targeting /a/
  • Writing order is ignored: Google analyzes all lines simultaneously
  • Allowed exceptions must be more precise than generic blocks
  • Wildcards (*) and symbols ($) affect specificity depending on their placement
  • Test with Google Search Console: the robots.txt testing tool simulates this logic in real-time

SEO Expert opinion

Does this statement align with field observations?

Yes, and it’s one of the few areas where Google's official discourse perfectly matches technical reality. Tests with the Search Console tool systematically confirm this hierarchy by specificity. You can reverse the order of lines in a robots.txt, and the result remains unchanged.

However, there is an important nuance that Mueller does not mention here: wildcards (*) and anchors ($) count toward the pattern length, so they alter the specificity calculation. A directive Disallow: /*.pdf$ (any PDF) is less specific than Allow: /documents/annual-report.pdf (a specific file). Measured by length alone, it is also shorter than Disallow: /documents/ (a whole directory), even though it looks more targeted at first glance. The devil is in these syntactical details.
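To make the comparison concrete, the three patterns can be ranked by raw character length. This is a sketch under the assumption that specificity is simply the length of the pattern string, wildcard and anchor characters included; the function name `rank_by_specificity` is hypothetical. The resulting order can differ from what the patterns suggest intuitively.

```python
def rank_by_specificity(rules):
    """Sort (directive, pattern) tuples from longest to shortest pattern.

    Assumption: specificity equals the character length of the pattern,
    counting '*' and '$' like any other character.
    """
    return sorted(rules, key=lambda rule: len(rule[1]), reverse=True)


rules = [
    ("Disallow", "/*.pdf$"),
    ("Allow", "/documents/annual-report.pdf"),
    ("Disallow", "/documents/"),
]

for directive, pattern in rank_by_specificity(rules):
    print(f"{len(pattern):>3}  {directive}: {pattern}")
```

With these inputs, the specific file comes first (28 characters), the directory rule second (11 characters), and the PDF wildcard last (7 characters).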

What common mistakes does this rule generate?

First classic trap: stacking generic Disallows thinking a higher Allow in the file will short-circuit them. Wrong. If your Allow targets a less specific or poorly formed pattern, the Disallow wins. I've seen sites lose 30% of their crawl on entire categories because of this misunderstanding.

Second mistake: neglecting trailing slashes. Disallow: /admin (without a slash) matches /admin AND /admintools AND /administration. Disallow: /admin/ (with a slash) matches only the directory /admin/ and its content. The difference seems minor, but the specificity diverges: the pattern without a slash is technically broader, thus less specific for a given file in /admin/.
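The trailing-slash difference is easy to check with a plain prefix match, which is how patterns without wildcards behave. A small sketch, with a hypothetical `matches` helper and made-up paths:

```python
def matches(pattern: str, path: str) -> bool:
    """Prefix match, valid only for patterns without '*' or '$'."""
    return path.startswith(pattern)


paths = [
    "/admin",
    "/admin/",
    "/admin/users.php",
    "/admintools",
    "/administration",
]

for pattern in ("/admin", "/admin/"):
    hits = [p for p in paths if matches(pattern, p)]
    print(f"Disallow: {pattern:<8} blocks {hits}")
```

The pattern without the slash catches all five paths, while the pattern with the slash catches only /admin/ and /admin/users.php.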

In what situations might this logic get tricky?

On sites with dynamic URLs with multiple parameters, determining which directive is “the most specific” quickly becomes a puzzle. Example: Disallow: /product?id= vs Allow: /product?id=123&ref=promo. Technically, the second is longer, thus more specific, but if you also have Disallow: *?ref=promo, which one wins? [To be verified] Google does not extensively document these edge cases.
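One way to reason about such a case is to list which patterns actually match a given URL before comparing lengths. The sketch below is illustrative only: the helper names are hypothetical, and it assumes the usual wildcard semantics (`*` matches any run of characters, a trailing `$` anchors the end, everything else is literal, including `?` and `&`). How Google treats a pattern that does not begin with `/` is not verified here.

```python
import re


def to_regex(pattern: str) -> re.Pattern:
    """Assumed semantics: '*' is a wildcard, a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile("^" + regex + ("$" if anchored else ""))


def matching_rules(rules, path: str):
    """Return (length, directive, pattern) for every rule matching `path`,
    longest pattern first."""
    hits = [(len(p), d, p) for d, p in rules if to_regex(p).match(path)]
    return sorted(hits, key=lambda hit: hit[0], reverse=True)


rules = [
    ("Disallow", "/product?id="),
    ("Allow", "/product?id=123&ref=promo"),
    ("Disallow", "*?ref=promo"),
]

for hit in matching_rules(rules, "/product?id=123&ref=promo"):
    print(hit)
```

Under these assumptions, `*?ref=promo` does not match /product?id=123&ref=promo at all, because the literal sequence `?ref=promo` never appears in that URL (the query string starts with `?id=`). Only the two /product rules compete, and the longer Allow wins.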

Another gray area: subdomains and root paths. A Disallow: / on the main domain blocks everything, but if you have a specific Allow further down, it should theoretically prevail. In practice, some third-party crawlers (not Googlebot) treat Disallow: / as absolute and ignore exceptions. Be cautious if you syndicate content or work with aggregators.

Warning: CDNs and reverse proxies can serve cached versions of robots.txt. If you modify a specific directive and the cache is not purged, Googlebot may see the old version for hours, creating crawl inconsistencies that are difficult to diagnose.

Practical impact and recommendations

How to audit a robots.txt for specificity conflicts?

The first step: list all your Disallow and Allow directives in a spreadsheet, along with the character length of each pattern. Sort by descending length. This gives you an immediate view of the hierarchy Google will apply. Any anomalies will stand out: a short Allow meant to unblock an area covered by a long Disallow will not work.
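The spreadsheet step can be automated. The sketch below parses a robots.txt string, keeps the Allow and Disallow lines, and exports them as CSV sorted by descending pattern length. The function names `directive_table` and `to_csv` and the sample file are hypothetical, and the sketch deliberately ignores user-agent groups, so on a file with several groups the table should be built per group.

```python
import csv
import io


def directive_table(robots_txt: str):
    """Return Allow/Disallow rows sorted by descending pattern length."""
    rows = []
    for line_no, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        if field.lower() in ("allow", "disallow") and value:
            rows.append({
                "length": len(value),
                "directive": field.capitalize(),
                "pattern": value,
                "line": line_no,
            })
    return sorted(rows, key=lambda row: row["length"], reverse=True)


def to_csv(rows) -> str:
    """Serialize the rows so they can be opened in a spreadsheet."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["length", "directive", "pattern", "line"]
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


sample = "\n".join([
    "User-agent: *",
    "Disallow: /blog/   # block the section",
    "Allow: /blog/best-practices/",
    "Disallow: /admin",
])
print(to_csv(directive_table(sample)))
```

An Allow that appears below the Disallow it is supposed to override in this sorted view is the anomaly to look for.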

Next, use the robots.txt testing tool in Search Console. Test your critical URLs one by one. The tool tells you which directive applies and why. This is the only reliable way to validate Google’s logic without waiting for Googlebot to crawl. Don’t rely on third-party validators; many still implement the outdated sequential logic.

What writing rules should you adopt to avoid ambiguities?

Always favor explicit specificity. If you want to block /blog/ except for /blog/best-practices/, write both directives with their complete paths, no shortcuts. Use inline comments (#) to document the intention of each block, especially in multi-section files.

Avoid complex nested wildcards unless you fully understand their scope. A pattern like Disallow: /*?*sort=*&* may seem precise, but its actual specificity depends on the URL being tested. Prefer distinct blocks with fixed paths when possible. And test, test, test: every change needs to go through Search Console before deployment.

What to do if your current robots.txt contains inconsistencies?

Start by identifying risky sections: directories blocked en masse with allowed exceptions. Ensure each exception is indeed more specific than the block. If not, rewrite the patterns to eliminate any ambiguity. Document changes in a versioned changelog because a quick rollback can save your indexing in the event of a failed deployment.

If you manage a multilingual or multi-country site with robots.txt differentiated by ccTLD, synchronize the specificity logic across all files. An inconsistency between robots.txt files on .fr and .com creates indexing disparities that Google does not explicitly flag. Automate post-deployment tests with a script that queries the Search Console API to validate strategic URLs.
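Before involving any API, a purely local check can already flag rules that exist in one file but not in the other. The sketch below does not call the Search Console API; it only compares two robots.txt strings. The names `rule_set` and `diff_rules` and the two sample files are hypothetical, user-agent groups are ignored, and some differences between country sites may of course be intentional.

```python
def rule_set(robots_txt: str) -> set:
    """Collect (directive, pattern) pairs from Allow/Disallow lines."""
    rules = set()
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        if field.lower() in ("allow", "disallow") and value:
            rules.add((field.lower(), value))
    return rules


def diff_rules(first: str, second: str) -> dict:
    """Report the rules present in only one of the two files."""
    a, b = rule_set(first), rule_set(second)
    return {
        "only_in_first": sorted(a - b),
        "only_in_second": sorted(b - a),
    }


# Hypothetical contents of the .fr and .com files.
robots_fr = "User-agent: *\nDisallow: /blog/\nAllow: /blog/best-practices/\n"
robots_com = "User-agent: *\nDisallow: /blog/\n"

print(diff_rules(robots_fr, robots_com))
```

Here the .com file blocks /blog/ without the exception that exists on .fr, which is exactly the kind of disparity worth reviewing by hand.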

  • Audit the robots.txt line by line by calculating the length of patterns
  • Test each critical URL with the Search Console tool before any changes
  • Document each directive with a comment explaining its intention
  • Avoid ambiguous wildcards and prefer explicit paths
  • Version the robots.txt file and maintain a changelog of modifications
  • Set up Search Console alerts on crawl errors related to robots.txt blocks

Advanced management of robots.txt directives according to Google's specificity logic requires sharp technical expertise and ongoing vigilance. Interactions between patterns, wildcards, and complex structures can quickly slip out of control, directly impacting crawl budget and strategic indexing. For e-commerce sites or high-volume platforms, entrusting this aspect to a specialized SEO agency ensures secure crawl infrastructure while benefiting from regular audits and monitoring of parser developments.

❓ Frequently Asked Questions

If two directives have exactly the same pattern length, which one takes precedence?
In that case, Google favors the Allow over the Disallow when specificity is strictly equal. This fallback rule is rarely mentioned but can be observed in Search Console.
Do wildcards (*) increase or decrease the specificity of a directive?
It depends on the context. A wildcard in the middle of a pattern can make the directive more permissive, and therefore technically less specific for some URLs. Test case by case with Search Console.
Is a change to robots.txt taken into account immediately by Googlebot?
No, Googlebot refreshes its cached copy of robots.txt roughly every 24 hours. Use the Search Console tool to force a re-read if it is urgent, but expect a delay of a few hours.
Can you use regular expressions (regex) in a robots.txt for Google?
No, Google does not support full regex. Only the wildcards * (zero or more characters) and $ (end of URL) are recognized. Other regex symbols are interpreted literally.
Does a Disallow: / combined with a specific Allow: /page.html really work?
Yes, in theory the specificity of /page.html overrides /. But some third-party crawlers treat Disallow: / as absolute. Check the actual behavior in your logs if you have significant non-Google traffic.