How does Google decide which version of duplicate content to show in the SERPs?

Official statement

Google typically selects one version of duplicate content to display in search results to avoid saturating the results with the same content. The chosen version depends on various factors, including the authority of the site.

3:18

🎥 Source video

Extracted from a Google Search Central video

⏱ 59:04 💬 EN 📅 28/07/2016 ✂ 9 statements


Other statements from this video (8)

1:49 Why does implementing AMP artificially inflate your direct traffic in Analytics?
2:16 How do you effectively recover a site penalized by a Google manual action?
5:44 Is the Last-Modified tag really enough to get Google to discover your new content?
8:29 Does schema markup really guarantee that rich results will be displayed?
11:35 Is hiding links from crawlers really cloaking?
16:14 Does Google really crawl all the JavaScript links on your site?
16:22 Does hidden content really affect your SEO ranking?
55:49 Can Video Object Schema on AMP really push your videos into Top Stories?

What you need to understand

Why does Google filter duplicate content in results?

The primary goal is user experience. Nobody wants to see the same page repeated ten times in the SERPs. Therefore, Google applies a filter to present only one version among detected duplicates.

This mechanism is not a penalty. No page is punished for duplication, but only one will be visible. The others remain indexed, merely excluded from the results for that specific query.

What criteria does Google use to decide between multiple versions?

The statement mentions site authority as a distinguishing factor, but remains deliberately vague on other criteria. It can be assumed that PageRank, domain age, the quality of internal linking, and the consistency of canonical signals play a role.

In practice, a site with strong domain authority often eclipses versions hosted on weaker domains, even if the latter published first. The timing of publication alone does not guarantee anything.

How can I detect if my content has been overshadowed by a duplicate?

A URL can be indexed without appearing in the SERPs for its target keywords. To see which version Google favors, search for an exact phrase from the page in quotation marks, then repeat the search restricted to your own domain with the site: operator.

If a competitor, aggregator, or scraper displays your content in your place, it means Google deemed it more legitimate. Canonical tags and syndication signals may influence this choice, but nothing is guaranteed.
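
The manual check described above can be prepared with a few lines of Python. This is only a minimal sketch: the helper name build_duplicate_checks, the sample phrase, and the domain example.com are assumptions made for illustration, and the resulting URLs are meant to be opened by hand in a browser.

```python
from urllib.parse import quote_plus


def build_duplicate_checks(phrase: str, domain: str) -> dict[str, str]:
    """Build two search URLs: an exact-phrase query and a site:-restricted one."""
    exact = f'"{phrase}"'                    # quotation marks force an exact-phrase match
    restricted = f"site:{domain} {exact}"    # same phrase, limited to one domain
    base = "https://www.google.com/search?q="
    return {
        "exact_phrase": base + quote_plus(exact),
        "site_restricted": base + quote_plus(restricted),
    }


# Hypothetical phrase and domain, for illustration only.
checks = build_duplicate_checks("a distinctive sentence from your article", "example.com")
for name, url in checks.items():
    print(name, url)
```

If your page appears for the site-restricted query but another domain appears for the plain exact-phrase query, that other domain is likely the version Google currently prefers for this content.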

Google filters, it does not penalize: duplicates remain indexed but invisible.
Domain authority takes precedence: a strong site overshadows a weak site on the same content.
Technical signals matter: canonical, hreflang, sitemaps guide the choice but do not control it entirely.
Scraping can steal your visibility: if a third party replicates your content on a more authoritative domain, it can outrank you.
Manual verification is essential: do not rely solely on tools, test it yourself with targeted searches.

SEO Expert opinion

Is this statement consistent with field observations?

Overall, yes. Practitioners have observed for years that authority outweighs primacy. A small site publishing original content can be overshadowed by an aggregator or a more powerful media outlet that republishes it a few hours later.

The ambiguity remains regarding the exact weighting of the criteria. Google talks about "various factors" without detailing, leaving room for interpretation. [To verify]: no public data specifies the respective weight of authority, internal links, freshness, or syndication signals in this equation.

What practical cases complicate this rule?

Multilingual sites pose problems. If you translate content into multiple languages without correctly configured hreflang, Google may arbitrarily choose a language version for a given geolocation, even if it does not match the user's language.
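
One common hreflang misconfiguration is a missing return link: page A lists page B as an alternate, but B does not list A. The sketch below checks for that case only. The function name missing_return_links and the input shape (page URL mapped to its hreflang annotations) are assumptions for illustration; in practice the annotations would come from a crawl export.

```python
def missing_return_links(annotations: dict[str, dict[str, str]]) -> list[tuple[str, str]]:
    """Return (source, target) pairs where target does not link back to source.

    annotations maps a page URL to its {hreflang code: alternate URL} entries.
    """
    missing = []
    for source, alternates in annotations.items():
        for target in alternates.values():
            if target == source:
                continue  # self-reference, nothing to reciprocate
            if source not in annotations.get(target, {}).values():
                missing.append((source, target))
    return missing


# Hypothetical data: the French page forgets to reference the English page.
annotations = {
    "https://example.com/en/": {"en": "https://example.com/en/", "fr": "https://example.com/fr/"},
    "https://example.com/fr/": {"fr": "https://example.com/fr/"},
}
print(missing_return_links(annotations))
```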

Legitimate syndications also create confusion. If you publish on Medium, LinkedIn, or a media partner, even with a canonical pointing to your site, there is no guarantee that Google will prioritize your version. The authority of the third-party platform can prevail.

Should you always block scrapers to avoid cannibalization?

Theoretically, if a scraper replicates your content, Google should identify the original and favor it. In practice, if the scraper is hosted on a high-authority domain with good linking, it can take precedence.

Blocking scrapers remains a good defensive practice, but it does not resolve the issue of legal reproductions like press releases distributed on third-party platforms. There, only a strategy of canonical signals and strong internal links can help. [To verify]: Google does not document how it arbitrates between a legitimate source and an authorized reproduction when both have similar authority.

Practical impact and recommendations

What should you do to maximize the chances that Google chooses your version?

First, consolidate your domain authority. This involves having a quality inbound link profile, a clean technical architecture, and a consistent publishing history. The more your domain inspires trust in Google, the more it will favor your pages in cases of duplicates.

Next, use canonical tags rigorously. If you syndicate content or allow reproductions, insist that third parties place a canonical to your original URL. This guides Google, even though it is not an absolute guarantee.

What technical errors increase the risk of cannibalization?

Conflicting canonicals are poison. If multiple versions of a page exist (http/https, with/without www, UTM parameters) and each one points to itself as canonical, Google has to decide on its own and may get it wrong.
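
To make the conflict concrete, here is a minimal sketch that collapses protocol, www, and UTM variants onto one form and reports groups of variants that declare different canonicals. The preferred host www.example.com, the choice to treat only utm_ parameters as tracking noise, and the function names are all assumptions for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

TRACKING_PREFIXES = ("utm_",)  # assumption: only UTM parameters count as tracking noise


def strip_www(host: str) -> str:
    return host[4:] if host.startswith("www.") else host


def normalize(url: str, preferred_host: str = "www.example.com") -> str:
    """Collapse http/https, www/non-www, and UTM variants onto one preferred form."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if strip_www(host) == strip_www(preferred_host):
        host = preferred_host
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
    ]
    return urlunsplit(("https", host, parts.path or "/", urlencode(query), ""))


def find_conflicts(declared: dict[str, str]) -> dict[str, set[str]]:
    """Group pages by normalized URL; keep groups that declare more than one canonical."""
    groups: dict[str, set[str]] = {}
    for page_url, canonical in declared.items():
        groups.setdefault(normalize(page_url), set()).add(canonical)
    return {key: canon for key, canon in groups.items() if len(canon) > 1}


# Hypothetical crawl data: every variant points to itself as canonical.
declared = {
    "http://example.com/shoes": "http://example.com/shoes",
    "https://www.example.com/shoes?utm_source=news": "https://www.example.com/shoes?utm_source=news",
    "https://www.example.com/shoes": "https://www.example.com/shoes",
    "https://www.example.com/hats": "https://www.example.com/hats",
}
conflicts = find_conflicts(declared)
print(conflicts)
```

Here the three /shoes variants end up in one group with three different declared canonicals, which is exactly the situation where Google has to decide on its own.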

Polluted sitemaps complicate matters as well. If you submit all URL variants in your sitemap, you send a confusing signal. Only submit the canonical versions you want to see indexed.
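
A sitemap can be checked against the canonicals found during a crawl. The following is a sketch under assumptions: the function name non_canonical_entries, the canonical_of mapping (URL to declared canonical), and the sample sitemap are hypothetical, and URLs missing from the mapping are treated as self-canonical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def non_canonical_entries(sitemap_xml: str, canonical_of: dict[str, str]) -> list[str]:
    """Return sitemap URLs whose declared canonical is a different URL."""
    root = ET.fromstring(sitemap_xml)
    flagged = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        # Assumption: a URL absent from the mapping is treated as self-canonical.
        if canonical_of.get(url, url) != url:
            flagged.append(url)
    return flagged


SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/shoes</loc></url>
  <url><loc>https://www.example.com/shoes?sort=price</loc></url>
</urlset>"""

canonical_of = {"https://www.example.com/shoes?sort=price": "https://www.example.com/shoes"}
flagged = non_canonical_entries(SAMPLE, canonical_of)
print(flagged)
```

Every URL this returns is a candidate for removal from the sitemap, since it asks Google to index a page that itself points elsewhere.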

How to audit and fix an internal duplication problem?

A complete crawl with Screaming Frog or Oncrawl can detect internal duplicates such as paginated pages, e-commerce filter pages, and syndicated content. Each duplicate should either point to the reference version via a canonical or, if it has no SEO value, be excluded from the crawl via robots.txt. Choose one or the other: Google cannot read a canonical on a page it is not allowed to crawl.
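
For a handful of pages, the canonical check that a crawler performs can be approximated with the Python standard library. This is a rough sketch, not a replacement for a crawler: the class CanonicalExtractor, the function audit_page, and the sample HTML are assumptions for illustration, and fetching the HTML is left out.

```python
from html.parser import HTMLParser


class CanonicalExtractor(HTMLParser):
    """Collect the href of every <link rel="canonical"> tag in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.canonicals: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr.get("href"):
            self.canonicals.append(attr["href"])


def audit_page(url: str, html: str) -> str:
    """Classify a page by the canonical it declares."""
    parser = CanonicalExtractor()
    parser.feed(html)
    found = parser.canonicals
    if not found:
        return "missing canonical"
    if len(set(found)) > 1:
        return "multiple conflicting canonicals"
    return "self-canonical" if found[0] == url else f"canonicalized to {found[0]}"


# Hypothetical filtered e-commerce page pointing to its reference version.
page = '<head><link rel="canonical" href="https://www.example.com/shoes"></head>'
print(audit_page("https://www.example.com/shoes?color=red", page))
```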

For external duplication, monitor your content with Copyscape or Google Alerts. If a third party republishes your content without a canonical, contact them and ask for a canonical or at least a link to your source. In the case of malicious scraping, a DMCA takedown request can force removal.

Audit all internal duplicates and verify that each variant points to a clear canonical.
Clean the sitemap to only submit canonical URLs.
Correctly configure hreflang on multilingual sites to avoid geographical confusion.
Monitor external reproductions with duplication detection tools and demand canonicals or links.
Strengthen overall domain authority through a quality link building strategy and differentiated content.
Manually test the visibility of your pages with exact-phrase searches to identify pages that have been overshadowed.

Managing duplicate content requires a rigorous technical approach and constant monitoring. Between canonicals, hreflang, sitemaps, and domain authority, there are many levers, and their orchestration can prove complex. If your site undergoes recurring cannibalization or if you manage a multilingual or e-commerce environment with thousands of pages, the expertise of a specialized SEO agency can expedite diagnosis and correction, helping you avoid costly visibility errors.

❓ Frequently Asked Questions

Is a duplicated page penalized by Google?

No, Google does not penalize duplicate content. It simply filters the duplicates down to a single version in the SERPs to avoid redundancy. The others remain indexed but invisible for that query.

How do I know which version of my content Google displays?

Run an exact-phrase search in quotation marks, or use the site: operator followed by your domain. Then compare with third-party versions to see which one appears first.

Can a scraper steal my visibility even if I published first?

Yes, if the scraper is hosted on a high-authority domain. Google favors authority over primacy when arbitrating between duplicates.

Is the canonical tag enough to guarantee that my version will be chosen?

No, it is a strong signal but not an absolute one. Google may ignore it if other criteria (authority, links, consistency) point to another version.

Should you block indexing of paginated or filtered pages in e-commerce?

Not necessarily. Use canonicals pointing to the reference page instead. Blocking indexing prevents Google from understanding the structure of your catalog.
