Official statement
Other statements from this video (13)
- 2:10 Could your location pages be penalized as doorway pages?
- 5:30 Do the HTTPS alerts in Search Console really influence your Google rankings?
- 6:58 Why does Google add your brand name to page titles?
- 11:37 Why does Google deindex pages after an HTTPS migration?
- 13:45 Why does robots.txt also block noindex and canonical directives?
- 16:57 Should you report competitors' spam to Google to gain rankings?
- 19:44 Does noindex really remove the PageRank passed by your internal links?
- 25:19 Should you show Googlebot your anti-adblock banners?
- 28:26 Should you really optimize your sitemaps to influence Google's crawl?
- 30:01 Do long meta descriptions really generate more clicks?
- 36:49 Can you really turn an editorial site into a transactional one without an SEO penalty?
- 44:22 Should you really hide content from Googlebot to optimize the geolocated experience?
- 53:55 Does Googlebot really index all JavaScript content without user interaction?
Google acknowledges that blocking sorting and faceted navigation pages via robots.txt can be appropriate, but it is not a universal rule. The decision depends on the architecture of each site and the actual SEO value of these pages. Alternatives like rel=canonical or parameter management in Search Console exist and may be more suitable depending on the context.
What you need to understand
What makes facets an SEO problem?
E-commerce and directory sites often generate thousands of page combinations through their filtering systems: sorting by price, color, size, availability. Each combination creates a unique URL that dilutes crawl budget and generates near-duplicate content.
Google has to decide which pages to explore first. If your crawl budget evaporates on facet variations that bring no unique search value, strategic pages may be overlooked. It is the classic syndrome of a site whose URL count explodes while its visibility stagnates.
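To get a sense of how quickly this combinatorial explosion happens, here is a minimal sketch (the filter names and value counts are purely illustrative) that counts how many faceted URLs a single category page can spawn:

```python
from math import prod

# Illustrative filters for one e-commerce category (names and counts are assumptions).
facets = {"sort": 4, "color": 12, "size": 8, "availability": 2}

# Each filter can also be left unset, so every facet contributes (values + 1) choices.
combinations = prod(n + 1 for n in facets.values())

print(f"A single category page can spawn up to {combinations - 1:,} faceted URLs")
# -> 1,754 faceted URLs for one category; multiply by the number of categories on the site
```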
What does Mueller actually say about managing these pages?
Mueller remains intentionally vague: blocking via robots.txt is “sometimes appropriate”. Sometimes. Not always. The nuance is crucial. He suggests two alternatives: rel=canonical to consolidate SEO signals onto a reference page, and the parameter tool in Search Console to tell Google how to interpret URLs with query strings.
This statement carefully avoids giving a universal directive. Google puts the responsibility back on the practitioner: it’s up to you to analyze your case and choose the appropriate weapon. There’s no magic recipe.
What’s the difference between blocking and canonicalizing?
Blocking via robots.txt prevents Googlebot from exploring the page. Period. The content is never crawled, the internal links it contains are never followed, and no signals flow from it. It’s a double-locked door.
Rel=canonical, on the other hand, allows Google to crawl the page but tells it that another URL is the reference version. Signals (links, content) can be consolidated to the canonical page. This is a controlled merge rather than a prohibition.
- Robots.txt = total exclusion, no crawl, no signal consolidation
- Rel=canonical = crawl allowed, signals consolidated to the master page
- Search Console Parameters = indications to Google on the role of URL parameters (sort, session, tracking)
- Noindex = crawl allowed but exclusion from the index (often forgotten hybrid option)
- The choice depends on the site architecture, the volume of facets, and their potential SEO value
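To keep these trade-offs straight, here is a minimal Python sketch that encodes the list above as a lookup table. It is deliberately simplified: a robots.txt-blocked URL can still be indexed “URL-only” if external links point to it, and a canonical is a hint that Google may choose to ignore.

```python
# Simplified model of the four options discussed above.
# crawled: does Googlebot fetch the page content?
# content_indexed: can the page content end up in Google's index?
# signals_consolidated: are link/content signals merged into another URL?
OPTIONS = {
    "robots.txt Disallow": dict(crawled=False, content_indexed=False, signals_consolidated=False),
    "rel=canonical":       dict(crawled=True,  content_indexed=False, signals_consolidated=True),
    "meta robots noindex": dict(crawled=True,  content_indexed=False, signals_consolidated=False),
    "no directive":        dict(crawled=True,  content_indexed=True,  signals_consolidated=False),
}

for option, effect in OPTIONS.items():
    flags = ", ".join(f"{key}={value}" for key, value in effect.items())
    print(f"{option:20} -> {flags}")
```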
SEO expert opinion
Is this permissive approach consistent with on-the-ground practices?
Yes and no. On massive sites (tens of thousands of products, hundreds of possible filters), pure blocking via robots.txt remains the brutal but effective method to prevent crawl budget explosion. I have seen e-commerce platforms with 200,000 indexed URLs where 80% were facets with no organic traffic. Cleaning up via robots.txt freed up budget for strategic pages.
But blocking systematically can be counterproductive. Some facets capture qualified long-tail traffic: “pink size 38 women’s running shoes available immediately” may match a precise search intent. Blocking that URL means abandoning that traffic. And Mueller provides no quantitative criterion for deciding: what traffic threshold justifies keeping a facet indexable? Silence.
When does rel=canonical become risky?
Google treats rel=canonical as a strong suggestion, not a command. If the canonicalized page and the facet variant differ too much (content, displayed products, structure), Google may ignore your directive. I have observed cases where Google indexed the facet despite a canonical pointing to the main category, simply because the facet had accumulated external backlinks.
The parameter tool in Search Console is almost abandoned by Google itself: the interface is outdated, updates are slow, and Google now recommends using on-page signals (canonical, meta robots) instead of this tool. Mentioning it in this statement feels like recycling old responses.
What are the true decision criteria?
The choice between blocking, canonicalizing, or allowing indexing should rely on three variables: volume of possible combinations, potential organic traffic per facet, and the technical capacity to manage directives at scale. A site with 50 facets can afford to keep them indexable if each targets a distinct intent. A site with 10,000 combinations must filter.
Mueller provides no decision framework. This is frustrating for a practitioner looking for numeric thresholds or heuristics. The answer “it depends” is technically correct but operationally useless without an analysis grid.
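For lack of an official grid, one can sketch a heuristic from the three variables above. The thresholds below are illustrative assumptions (they reuse the orders of magnitude used in the recommendations further down), not Google guidance:

```python
def facet_strategy(combinations: int, visits_per_month_per_url: float,
                   has_unique_content: bool) -> str:
    """Illustrative heuristic; the thresholds are assumptions to tune against your own data."""
    if visits_per_month_per_url < 1 and combinations > 1_000:
        return "block in robots.txt"               # no traffic, massive crawl waste
    if visits_per_month_per_url < 1:
        return "canonicalize or noindex"           # no traffic, but crawl cost is manageable
    if has_unique_content and visits_per_month_per_url >= 5:
        return "keep indexable"                    # proven intent and editorial content
    return "canonicalize to the parent category"   # some traffic but thin content

print(facet_strategy(combinations=10_000, visits_per_month_per_url=0.1, has_unique_content=False))
# -> block in robots.txt
```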
Practical impact and recommendations
How do you audit existing facets on your site?
Start by extracting all indexed URLs containing sorting or filtering parameters. Use Google Search Console (Performance > Pages) and cross-reference with a Screaming Frog or Oncrawl crawl. Identify patterns: ?sort=, ?filter=, ?color=, etc. Rank these URLs by organic traffic volume over the last six months.
Then calculate the traffic-to-URL ratio for each facet pattern: organic visits generated divided by the number of URLs. If a pattern averages 0.1 visits per month per URL, it is a candidate for blocking or canonicalization. If a pattern exceeds 5 visits/month/URL, it probably deserves to stay indexable.
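A minimal sketch of that calculation, assuming you have exported one row per URL with its organic clicks over six months into a CSV (the file name and column names are assumptions; a Search Console API export or a Screaming Frog export works the same way):

```python
import csv
from collections import defaultdict
from urllib.parse import urlparse, parse_qs

urls_per_pattern = defaultdict(int)
clicks_per_pattern = defaultdict(int)

# Assumed export: one row per URL, with "url" and "clicks_6_months" columns.
with open("facet_urls.csv", newline="") as f:
    for row in csv.DictReader(f):
        params = sorted(parse_qs(urlparse(row["url"]).query))
        if not params:
            continue  # no query string: not a faceted URL
        pattern = "?" + "&".join(params)  # e.g. "?color&sort"
        urls_per_pattern[pattern] += 1
        clicks_per_pattern[pattern] += int(row["clicks_6_months"])

for pattern, n_urls in sorted(urls_per_pattern.items()):
    visits_per_month_per_url = clicks_per_pattern[pattern] / n_urls / 6
    if visits_per_month_per_url < 1:
        verdict = "candidate for blocking or canonicalization"
    elif visits_per_month_per_url > 5:
        verdict = "probably worth keeping indexable"
    else:
        verdict = "manual review"
    print(f"{pattern:25} {n_urls:6} URLs  {visits_per_month_per_url:6.2f} visits/month/URL  -> {verdict}")
```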
What technical strategy to deploy based on the diagnosis?
For facets without SEO value (sorting by popularity, date added, session filters), add a targeted Disallow rule in robots.txt. Example: Disallow: /*?sort=. Ensure that internal links to these URLs carry a rel="nofollow" to avoid wasting internal PageRank.
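Before deploying such a rule, it is worth sanity-checking which URLs it actually covers. The sketch below approximates Google's wildcard matching with a regular expression (Python's built-in urllib.robotparser does not understand the * wildcard, hence the simulation; the example paths are hypothetical):

```python
import re

def matches_disallow(pattern: str, url_path: str) -> bool:
    """Approximate Google's robots.txt matching: * = any sequence, $ = end of URL."""
    regex = "^" + re.escape(pattern).replace(r"\*", ".*").replace(r"\$", "$")
    return re.match(regex, url_path) is not None

DISALLOW = "/*?sort="

for path in ["/chaussures?sort=prix",
             "/chaussures?color=noir",
             "/chaussures?color=noir&sort=prix"]:
    print(f"{path:35} blocked={matches_disallow(DISALLOW, path)}")
# /chaussures?sort=prix               blocked=True
# /chaussures?color=noir              blocked=False
# /chaussures?color=noir&sort=prix    blocked=False  <- the & variant slips through;
#                                        in practice you often need "Disallow: /*&sort=" as well
```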
For facets with SEO potential (descriptive filters like “black leather sofa 3-seater”), implement a rel=canonical to the parent category or main filter page. Make sure that the unique content (H1 title, meta description, intro text) justifies indexing. If the facet page has no unique editorial content, there’s no reason to index it even with potential traffic.
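One common way to implement this at scale is to derive the canonical target server-side by stripping the facet parameters. A minimal sketch, assuming facets live in the query string and the parent category is the parameter-free URL (the parameter names and domain are assumptions; adapt if your facets are path-based):

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Parameters treated as facets/sorting on this hypothetical site.
FACET_PARAMS = {"sort", "color", "size", "availability"}

def canonical_for(url: str) -> str:
    """Build the canonical URL by dropping facet parameters and keeping everything else."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in FACET_PARAMS]
    return urlunsplit(parts._replace(query=urlencode(kept)))

url = "https://www.example.com/canapes?color=noir&sort=prix"
print(f'<link rel="canonical" href="{canonical_for(url)}">')
# <link rel="canonical" href="https://www.example.com/canapes">
```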
How do you check that the directives are properly applied?
After modifying robots.txt, use the URL Inspection tool in Search Console to test some blocked URLs. Verify that the status displays “Blocked by robots.txt”. For canonicals, crawl the site with Screaming Frog and export the canonical chains: detect loops, orphan canonicals, and cases where Google indexes the variant instead of the canonical.
Monitor the evolution of the number of indexed pages in Search Console (Coverage > Excluded). A well-executed facet cleanup reduces indexed URLs by 30 to 70% on some sites, without any loss of traffic if the blocked facets were indeed valueless. If traffic drops after blocking, you probably sacrificed performing facets: roll back immediately and run a more detailed analysis.
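Between two full crawls, you can spot-check a handful of URLs with the standard library alone. A minimal sketch (the URLs are placeholders, and the regexes are a quick check rather than a real HTML parser; a complete audit should also look at the X-Robots-Tag HTTP header and at tags injected by JavaScript):

```python
import re
import urllib.error
import urllib.request

SAMPLE_URLS = [
    "https://www.example.com/canapes?color=noir",
    "https://www.example.com/canapes?sort=prix",
]

CANONICAL_RE = re.compile(r'<link[^>]+rel=["\']canonical["\'][^>]*href=["\']([^"\']+)', re.I)
ROBOTS_RE = re.compile(r'<meta[^>]+name=["\']robots["\'][^>]*content=["\']([^"\']+)', re.I)

for url in SAMPLE_URLS:
    req = urllib.request.Request(url, headers={"User-Agent": "facet-audit/0.1"})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            status = resp.status
            html = resp.read(200_000).decode("utf-8", errors="replace")
    except urllib.error.URLError as exc:
        print(f"{url} -> fetch error: {exc}")
        continue

    canonical = CANONICAL_RE.search(html)
    robots = ROBOTS_RE.search(html)
    print(f"{url}\n  HTTP {status}"
          f"  canonical={canonical.group(1) if canonical else None}"
          f"  meta_robots={robots.group(1) if robots else None}")
```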
- Export all URLs with parameters from Search Console and your crawler
- Calculate the average organic traffic per facet pattern (minimum 6 months)
- Block via robots.txt the patterns with no significant traffic (<1 visit/month/URL)
- Canonicalize to the parent category the facets that have potential but no unique content
- Only keep indexable facets with proper editorial content and proven traffic
- Test the directives with the URL Inspection tool and a complete crawl post-modification
❓ Frequently Asked Questions
Can you block facets in robots.txt without losing the traffic they generate?
Is rel=canonical enough to prevent a facet from being indexed?
Is the parameter management tool in Search Console still useful?
How long does it take for Google to take a robots.txt facet block into account?
Should you nofollow internal links to blocked or canonicalized facets?
Source: Google Search Central video · duration 57 min · published on 12/12/2017