Official statement

Hreflang is a very strong signal indicating which language version to use. However, if we find that two versions of pages are essentially identical, we may choose to merge them into one. Make sure that each national version of the site has unique content to prevent duplication issues.
🎥 Source video

Extracted from a Google Search Central video

⏱ 58:27 💬 EN 📅 04/11/2016 ✂ 24 statements
Watch on YouTube (4:07) →
Other statements from this video (23)
  1. 1:33 Why does Google show the wrong cached version for your multi-regional sites?
  2. 2:07 Can hreflang merge your multi-regional sites against your will?
  3. 3:41 Do social signals really influence Google rankings?
  4. 3:42 Do social signals really influence Google rankings?
  5. 5:15 Should you still optimize your sitelinks, or does Google decide on its own?
  6. 6:26 Why does your internal navigation determine how your sitelinks appear in Google?
  7. 10:02 Do rich snippets really protect your site from algorithmic penalties?
  8. 14:16 Do external links really matter less than UX in assessing a site's quality?
  9. 15:04 Why can blocking crawling with robots.txt hurt your indexing?
  10. 17:48 Do behavioral metrics really influence Google rankings?
  11. 29:01 Should you really migrate to HTTPS at the same time as a domain change?
  12. 29:56 Should you really migrate your domain and switch to HTTPS in one go?
  13. 29:58 Should you really avoid changing the URL structure during a site migration?
  14. 31:56 How can you work around 'not provided' in Google Analytics to analyze your SEO keywords?
  15. 35:57 Can comments really dilute the SEO quality of your content?
  16. 36:21 Should you really avoid duplicating content internally in order to rank?
  17. 36:58 Should you really noindex author archives in WordPress to avoid duplicate content?
  18. 45:31 Is AMP really a Google ranking factor or just an SEO myth?
  19. 51:33 Can low-quality backlinks really hurt your SEO?
  20. 53:26 Should you worry that a mediocre link will devalue your quality backlinks?
  21. 55:53 Should you really ignore the HTML lang tag for international SEO?
  22. 56:03 Does the HTML lang attribute really influence international SEO?
  23. 58:52 How does Google handle multilingual pages in its search results?
Official statement (9 years ago)
TL;DR

Google confirms that hreflang is a strong signal for targeting the appropriate language version but not an absolute guarantee. If two international versions have nearly identical content, the engine may decide to merge them into a single canonical URL, ignoring your hreflang directives. The solution: sufficiently differentiate each national version to prevent Google from seeing them as duplicates.

What you need to understand

Is hreflang really a priority signal for Google?

Yes, hreflang remains a strong signal in Google's geographic and linguistic targeting. Google actively uses it to determine which version of a page to display to a user based on their language and location. It is more than an advisory hint.

But "strong" does not mean "absolute." Google reserves the right to override your hreflang annotations if it detects contradictory signals. The most common case: two language versions have content so similar that the engine treats them as pure duplicates.

What happens when Google detects identical content despite hreflang?

Google then applies its own canonicalization process, merging the two URLs into a single reference version. It doesn’t matter if your hreflang tags are technically flawless: if the content is deemed identical, the engine will select a canonical URL and ignore the other in the SERPs.
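
For reference, "technically flawless" usually means at least that every version lists its alternates and that each alternate links back. The sketch below is one possible way to check that reciprocity with Python's standard library. The example.com URLs and the function names are hypothetical, and a clean result here says nothing about whether Google will still merge the pages on content grounds.

```python
from html.parser import HTMLParser


class HreflangParser(HTMLParser):
    """Collects <link rel="alternate" hreflang="..." href="..."> annotations."""

    def __init__(self):
        super().__init__()
        self.alternates = {}  # hreflang code -> href

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        rel = (a.get("rel") or "").lower().split()
        if "alternate" in rel and a.get("hreflang") and a.get("href"):
            self.alternates[a["hreflang"].lower()] = a["href"]


def extract_hreflang(html):
    """Return {hreflang code: URL} for one page's HTML."""
    parser = HreflangParser()
    parser.feed(html)
    return parser.alternates


def missing_return_links(pages):
    """pages maps URL -> HTML. Returns (source, target) pairs where source
    lists target as an alternate but target does not link back to source."""
    annotations = {url: extract_hreflang(html) for url, html in pages.items()}
    missing = []
    for source, alternates in annotations.items():
        for target in alternates.values():
            # Skip self-references and targets we have no HTML for.
            if target == source or target not in annotations:
                continue
            if source not in annotations[target].values():
                missing.append((source, target))
    return missing


# Hypothetical two-version site: /fr/ points to /en/, but /en/ does not point back.
FR = "https://example.com/fr/"
EN = "https://example.com/en/"
pages = {
    FR: f'<head><link rel="alternate" hreflang="fr" href="{FR}">'
        f'<link rel="alternate" hreflang="en" href="{EN}"></head>',
    EN: f'<head><link rel="alternate" hreflang="en" href="{EN}"></head>',
}
print(missing_return_links(pages))
```

The check compares URLs as exact strings, so trailing slashes or protocol differences would need normalizing first on a real site.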

In practice? You may notice that a French version never appears in France, being consistently replaced by the English or German version. Or worse: Google indexing a random URL, alternating between versions based on its successive crawls. This phenomenon creates SEO instability characteristic of poorly differentiated multilingual sites.

How does Google assess the similarity between two pages?

Google does not disclose the exact similarity threshold that triggers a merge. It is generally assumed to compare the primary textual content, the HTML structure, and probably the semantic entities it extracts. According to this analysis, even a simple word-for-word translation, especially between closely related languages (FR/ES/IT), can end up being treated as nearly identical.

Translating the metadata (title, meta description) is not enough. The body text must show substantial differences: cultural adaptation, local examples, specific geographical references, rewritten sentence structure, varying length. An unrefined automated translation is a near-certain red flag.

  • Hreflang is a strong signal but conditional, not an absolute directive
  • Google will merge pages if their content is deemed essentially identical, even with correct hreflang
  • Differentiation must focus on main content, not just on metadata or the menu
  • Unrefined automated translations are particularly at risk of merging
  • No official similarity threshold is communicated by Google, complicating evaluation

SEO Expert opinion

Does this statement align with real-world observations?

Absolutely. For years, sites with impeccable hreflang have still suffered from cannibalization between language versions. Audits regularly reveal cases where the .de version appears in France, or the .com version overwhelms all local variants. The problem is almost never the technical implementation of hreflang.

What Mueller confirms here is that Google prioritizes the coherence of its index over your technical directives. If the engine detects two nearly identical pages with different URLs, its anti-duplication reflex takes precedence over hreflang. This is logical from an algorithmic perspective: avoiding pollution of the index with duplicates.

What nuances should be applied to this recommendation?

Mueller says "unique content" but remains vague about how much differentiation is needed. Is a high-quality human translation enough? Do you need to rewrite 30% of the text? 50%? There are no official figures. [To be verified] Test this on your own sites by comparing versions with more and less differentiation.

Another nuance: merging can be intermittent or partial. Google does not always treat all your URLs uniformly. You might have 80% of your multilingual pages working correctly, and 20% merging. This likely depends on the crawl budget allocated, update frequency, and internal signal consistency.

In what cases does this rule not fully apply?

For e-commerce sites with standardized product listings, substantial differentiation is nearly impossible. A Nike shoe remains a Nike shoe, whether sold in France or Belgium. The technical description doesn’t fundamentally change. Yet, these sites need separate versions to manage currencies, local stocks, and specific terms and conditions.

In these cases, Google seems more tolerant of similarity when other signals are consistent: a ccTLD (.fr, .de), a local address in the footer, local legal notices, server location. But this is never guaranteed. [To be verified] No official documentation details these exceptions.

Caution: simple automatic translation (DeepL, Google Translate) without human enrichment is almost certain to trigger a merge. The syntactic patterns remain too similar to deceive Google’s duplicate detection algorithms.

Practical impact and recommendations

What steps should you take to avoid merging?

The first step: audit your multilingual pages with a content comparison tool. Compare the main text (excluding header/footer/navigation) between your FR, EN, DE versions, etc. If the similarity exceeds 70-80%, you are in the red zone. Tools like Copyscape, Siteliner, or even a basic diff can reveal the problem.
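
If you would rather script a rough first pass than rely on a tool, a word-shingle comparison is one option. The sketch below computes the Jaccard similarity of three-word shingles between two main-content texts. It is not Google's method, the function names and sample sentences are invented for illustration, and a word-overlap measure like this is mainly informative for versions written in the same language (for example fr-FR vs. fr-BE). The scores it produces are also not directly comparable to percentages reported by other tools.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping word n-grams (shingles) in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(text_a, text_b, size=3):
    """Jaccard similarity of the two shingle sets, from 0.0 to 1.0.
    Returns 0.0 when both texts are empty (nothing to compare)."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical main-content snippets from two same-language regional versions.
fr_fr = "Livraison gratuite en France sous 48 heures"
fr_be = "Livraison gratuite en Belgique sous 48 heures"
print(f"{similarity(fr_fr, fr_be):.2f}")
```

Shingling is sensitive to small edits in short texts, so it is best run on the full main content of each page, with header, footer, and navigation stripped first.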

Next, enrich each version with specific local elements: examples rooted in local culture, references to country events or regulations, local customer testimonials, adapted use cases. Do not settle for word-for-word translations. Rewrite entire passages to create genuine semantic differentiation.

How can you check that Google is not merging your pages?

Use Search Console and filter the Performance report by country. If your .fr version receives zero impressions in France while you do have French organic traffic, that is suspicious. Check which URL Google actually serves, using a VPN or a geographic simulation tool. Then compare the Google-selected canonical shown by the URL Inspection tool with your hreflang annotations.

Another test: run a site: search against each domain, subdomain, or directory. If Google indexes only one version while you have three linked by hreflang, it has probably merged them. Also check the server logs: if Googlebot repeatedly crawls only one version out of three, it has probably chosen that one as the sole canonical.
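
The log check can be sketched in a few lines. The example below counts Googlebot requests per language directory, assuming an access log in Combined Log Format and versions hosted under /fr/, /en/, and /de/ (both are assumptions; adapt the pattern and prefixes to your setup). It matches on the user-agent string only, which can be spoofed, so a real audit would also verify the requests through reverse DNS.

```python
import re
from collections import Counter

# Assumed Combined Log Format:
# ip - - [date] "GET /path HTTP/1.1" status size "referer" "user-agent"
LINE_RE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def googlebot_hits_by_version(log_lines, prefixes=("/fr/", "/en/", "/de/")):
    """Count requests whose user agent mentions Googlebot, per path prefix."""
    hits = Counter({p: 0 for p in prefixes})
    for line in log_lines:
        m = LINE_RE.search(line)
        if not m or "Googlebot" not in m.group("ua"):
            continue
        for p in prefixes:
            if m.group("path").startswith(p):
                hits[p] += 1
                break
    return hits


# Invented sample lines for illustration only.
BOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
sample = [
    f'66.249.66.1 - - [10/Oct/2016:13:55:36 +0000] "GET /en/shoes HTTP/1.1" 200 2326 "-" "{BOT}"',
    f'66.249.66.1 - - [10/Oct/2016:13:56:02 +0000] "GET /en/bags HTTP/1.1" 200 1840 "-" "{BOT}"',
    f'66.249.66.1 - - [10/Oct/2016:13:57:10 +0000] "GET /fr/chaussures HTTP/1.1" 200 2411 "-" "{BOT}"',
    '203.0.113.7 - - [10/Oct/2016:13:58:44 +0000] "GET /de/schuhe HTTP/1.1" 200 2290 "-" "Mozilla/5.0"',
]
print(dict(googlebot_hits_by_version(sample)))
```

A version that stays near zero over several weeks while its siblings are crawled regularly is worth cross-checking in the URL Inspection tool.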

What mistakes should you absolutely avoid?

Do not multiply language versions if you lack the resources to differentiate them. It is better to have one high-quality EN version than a site in 12 languages with identical automated translations. This is not a penalty in the strict sense, but Google is likely to fold the latter into a single URL of its own choosing.

Also avoid frequent changes to your hreflang structure without adjusting the content. If you fix your technical implementation without touching the underlying duplicate content, you will resolve nothing. The issue is rarely technical; it is editorial. Finally, do not rely on ccTLDs or subdomains to compensate for identical content: these signals help but do not replace textual differentiation.

  • Audit textual similarity between language versions (aim to stay below roughly 70-80%)
  • Enrich each version with specific cultural and geographical elements
  • Check in Search Console that each version receives impressions in its target country
  • Test with VPN/geosimulation which URL Google actually serves by region
  • Compare the canonical URL chosen by Google with your hreflang directives
  • Prioritize quality over quantity: better to have 3 truly differentiated versions than 10 identical automated translations

Managing a large-scale multilingual site requires real expertise: identifying acceptable duplication thresholds, developing a differentiation strategy, and continuously monitoring Google's canonicalization. These optimizations are time-consuming and demand a deep understanding of crawling and indexing mechanisms. For complex sites, or sites where international traffic is strategic, working with a specialized SEO agency can protect the investment and prevent costly traffic losses from undetected URL merges.

❓ Frequently Asked Questions

Is hreflang enough to guarantee that Google shows the right language version?
No. Hreflang is a strong signal, but Google can ignore it if two versions have essentially identical content. It will then merge the pages into a single canonical URL, whatever your annotations say.
What percentage of textual difference is needed between two language versions?
Google does not publish any official threshold. Based on field observations, staying below 70-80% similarity greatly reduces the risk of merging. Differentiation must apply to the main content, not just the metadata.
Is a high-quality human translation enough to avoid merging?
Not always. Between closely related languages (FR/ES/IT), even a human translation can produce semantic structures that are too similar. Enrich it with local examples, adapt the phrasing, and vary the length of sections.
How can you tell whether Google has merged your multilingual pages?
Check Search Console: if a version receives no impressions in its target country, that is suspicious. Also use the site: operator and compare the canonical URL selected by Google with your hreflang annotations.
Do ccTLDs (.fr, .de) protect against merging better than subdomains?
They help by strengthening the geographic signal, but they do not compensate for identical content. Google can still merge two ccTLDs if it judges the content to be duplicated. Editorial differentiation remains the priority.