How does Google decide which page to display when it finds duplicate content?

Official statement

When Google detects two identical pages, it tries to display only one in the search results. The decision relies on factors like the canonical tag, redirects, and links to determine which version to show.

🎥 Source video

Extracted from a Google Search Central video

⏱ 1h07 💬 EN 📅 05/05/2017 ✂ 8 statements

Watch on YouTube (5:34) →

Official statement from May 5, 2017

⚠ A more recent statement exists on this topic: "Does duplicate content really hurt your SEO?" (Google, October 13, 2022)

TL;DR

When Google detects two identical pages, it shows only one in the results. The decision is based on canonical signals, redirects, and inbound links. For SEO professionals, this means that careful management of canonical tags and internal architecture is essential to control which version gets indexed.

What you need to understand

Why does Google only show one version of duplicate content?

Google aims to maximize relevance in its results. Displaying the same page multiple times offers no value to the user. The algorithm detects identical or very similar content and performs automatic filtering to keep only one occurrence in the SERPs.

This process is not a penalty. It is a default consolidation. Google does not penalize duplicates; it manages them. The search engine allocates ranking signals to the version it considers most legitimate and then hides the others. The discarded pages remain indexed but invisible in the standard results.

What signals does Google use to differentiate between versions?

Mueller mentions three main levers: canonical tags, 301/302 redirects, and link profiles. The rel=canonical tag explicitly indicates which URL to prioritize. If it points to page A, Google generally follows this instruction, unless there's a clear inconsistency.
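
To make the canonical signal concrete, here is a minimal Python sketch that pulls the rel=canonical URL out of a page's HTML using only the standard library. The sample markup and the example.com URLs are hypothetical, and the helper name find_canonicals is an assumption made for illustration, not part of any Google tool.

```python
from html.parser import HTMLParser


class CanonicalParser(HTMLParser):
    """Collects the href of every <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attr.get("href"):
            self.canonicals.append(attr["href"])


def find_canonicals(html):
    """Return the list of canonical URLs declared in the HTML string."""
    parser = CanonicalParser()
    parser.feed(html)
    return parser.canonicals


# Hypothetical page: a filtered variant declaring its base URL as canonical.
sample = """
<html><head>
  <title>Red shoes</title>
  <link rel="canonical" href="https://example.com/shoes/">
</head><body></body></html>
"""
print(find_canonicals(sample))
```

A page that returns more than one entry here is sending a contradictory signal and is worth reviewing first.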

Permanent or temporary redirects also guide the decision. A 301 redirect to URL B clearly indicates that B is the official version. Inbound links provide an external validation: if 95% of backlinks point to /page-a/ and 5% to /page-b/, Google interprets /page-a/ as the reference version.

Does this logic apply to all types of duplication?

No, and this is where it gets complicated. Mueller's statement primarily concerns technical duplications: www vs non-www, HTTP vs HTTPS, parameterized URL variants, poorly managed pagination. These cases are relatively simple to resolve via canonical tags or redirects.

Editorial duplications — similar content across multiple thematic pages, nearly identical product listings, and media releases — fall under a different mechanism. Google tries to detect the original source via the indexing date, domain authority, and citations. However, this detection is not foolproof, especially if a major aggregator picks up your content before Google crawls your own page.

Well-implemented Canonical: 85-90% chance that Google will respect your choice of preferred version
301 Redirects: near-total transfer of PageRank (95-99%) to the target URL
Inbound Links: cumulative trust signal that strengthens the most linked version
Signal Consistency: if canonical tags, redirects, and links point to different versions, Google arbitrates according to its own algorithm
Edge Cases: cross-domain duplication, syndicated content, and extensive scraping require specific strategies (cross-domain canonical, syndication-source tag)

SEO Expert opinion

Is this statement consistent with real-world observations?

Yes, overall. Empirical tests show that Google mostly respects well-defined canonicals, especially on sites with a good crawl budget. On domains with high authority, the compliance rate approaches 90%. In contrast, on newer or less linked sites, Google sometimes makes arbitrary decisions, indexing the undesired version despite an explicit canonical. [To be verified]: Google never reveals the authority or trust threshold at which it systematically follows canonicals.

301 redirects remain the strongest signal. A well-configured redirect usually overrides other signals. But be careful: Googlebot follows only a limited number of hops, so a long redirect chain may never be fully resolved, and loops are handled poorly. In such cases, Google sometimes indexes an intermediate URL or stops crawling the affected section entirely.

What nuances should be considered regarding this claim?

Mueller intentionally simplifies. In reality, Google applies probabilistic rather than binary logic. When signals are consistent (canonical, 301, and inbound links all pointing to the same URL), the engine almost always follows them. When they diverge, a weighting algorithm decides, and the weights vary according to the domain, the topic, and the site's history.
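
Google's actual weighting is not public, so the sketch below does not try to reproduce it. It only checks whether the three signals named above agree for a given page, which is the situation where the preferred version is most likely to be respected. The function name and its inputs are assumptions for illustration.

```python
def check_signal_consistency(canonical, redirect_target, top_linked_url):
    """Return (is_consistent, distinct_urls) for the three signals.

    Pass None for a signal that is absent (for example, no redirect in place).
    This is a consistency check only, not a model of Google's algorithm.
    """
    signals = {
        "canonical": canonical,
        "redirect": redirect_target,
        "links": top_linked_url,
    }
    present = [url for url in signals.values() if url is not None]
    distinct = sorted(set(present))
    return len(distinct) <= 1, distinct


# Hypothetical page where the canonical and the link profile disagree.
ok, urls = check_signal_consistency(
    canonical="https://example.com/page-a/",
    redirect_target=None,
    top_linked_url="https://example.com/page-b/",
)
print(ok, urls)
```

Pages where the check fails are the ones where Google has to arbitrate on its own.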

A concrete example: an e-commerce site with 50,000 product listings often generates parameterized URL variations (?color=red, ?size=M, ?sort=price). If each listing correctly declares its canonical to the base URL, Google consolidates. But if 20% of the pages forget the canonical, or if filters create infinite combinations, Google may index hundreds of undesirable variations. I've seen sites lose 40% of their organic traffic because Google massively indexed filtered URLs at the expense of the main pages.
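
One way to keep canonical declarations uniform across such a catalog is to derive them from the URL itself. The following is a rough sketch, assuming the listed parameters never change the main content; that list is hypothetical and has to be adapted to each site, since a parameter such as color may legitimately deserve its own indexed page.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these parameters only filter, sort, or track. Adjust per site.
NON_CANONICAL_PARAMS = {
    "color", "size", "sort",
    "utm_source", "utm_medium", "utm_campaign",
}


def canonical_url(url):
    """Strip non-canonical query parameters and any fragment from a URL."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in NON_CANONICAL_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


print(canonical_url("https://example.com/shoes/?color=red&size=M&sort=price"))
print(canonical_url("https://example.com/shoes/?page=2&utm_source=news"))
```

Generating the tag from a single function like this avoids the situation where a share of templates simply forget to declare a canonical.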

In which cases does this rule not apply as expected?

First case: cross-domain duplication. If your content is reused on a third-party site with more authority than yours, Google may index their version instead of yours, even if you published it first. The cross-domain canonical (rel=canonical pointing to your domain from theirs) exists, but few sites adhere to it.

Second case: pagination and faceted filters. Google tries to automatically detect the structure, but modern JS implementations (SPA, React, Next.js) muddy the waters. If the URLs change on the client side without the server sending consistent HTTP signals, Google sometimes indexes inconsistent intermediate states.

Practitioner Alert: Never rely solely on Google validation tools (Search Console, URL Inspection tool). They show the canonicalized URL at the time of the test, but not necessarily the one Google displays in production. Check in real conditions through site: and inurl: queries over several weeks. I have documented cases where Search Console indicated a respected canonical while SERPs displayed the wrong version for 3 months.

Practical impact and recommendations

What should be done to effectively control the indexed version?

Start with a comprehensive technical audit of your canonical tags. Use Screaming Frog or Oncrawl to extract all declared canonicals and verify their consistency. Each page should point to itself (self-referencing canonical) or to the master version if it's a variant. No chains, no loops, no canonical pointing to a 404 or a redirect.
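
The rules above can be sketched as a small check over a crawl export. The input shape (a dict of URL to status code and declared canonical) is an assumption; real exports from Screaming Frog or Oncrawl would need to be mapped into it first. Loops are only detected when two pages point at each other, so longer cycles show up as chains.

```python
def audit_canonicals(pages):
    """pages maps URL -> {"status": int, "canonical": str or None}.

    Returns a list of (url, issue) tuples for indexable (200) pages.
    """
    issues = []
    for url, info in pages.items():
        if info["status"] != 200:
            continue  # only audit pages that actually resolve
        target = info.get("canonical")
        if target is None:
            issues.append((url, "missing canonical"))
            continue
        if target == url:
            continue  # self-referencing canonical: fine
        target_info = pages.get(target)
        if target_info is None:
            issues.append((url, "canonical target not crawled"))
        elif target_info["status"] != 200:
            issues.append((url, f"canonical target returns {target_info['status']}"))
        elif target_info.get("canonical") == url:
            issues.append((url, "canonical loop"))
        elif target_info.get("canonical") not in (None, target):
            issues.append((url, "canonical chain"))
    return issues


# Hypothetical crawl data.
crawl = {
    "https://example.com/a": {"status": 200, "canonical": "https://example.com/a"},
    "https://example.com/a?sort=price": {"status": 200, "canonical": "https://example.com/a"},
    "https://example.com/b": {"status": 200, "canonical": "https://example.com/c"},
    "https://example.com/c": {"status": 200, "canonical": "https://example.com/a"},
    "https://example.com/d": {"status": 200, "canonical": "https://example.com/gone"},
    "https://example.com/gone": {"status": 404, "canonical": None},
}
for url, issue in audit_canonicals(crawl):
    print(url, "->", issue)
```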

Next, map your 301 redirects. Any technical duplicate (www/non-www, HTTP/HTTPS, trailing slash) must redirect to a single version. Test redirect chains: A → B → C should become A → C directly. Google's documentation states that Googlebot follows up to 10 redirect hops, but every extra hop slows consolidation, so aim for a single hop.
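
Collapsing A → B → C into A → C can be sketched as follows, assuming the redirects are available as a simple source-to-target map (for example, exported from server configuration). The max_hops value is an arbitrary safety cap chosen for this sketch.

```python
def resolve_redirect(start, redirects, max_hops=10):
    """Follow start through the redirect map; return (final_url, hops).

    Raises ValueError on a loop or when max_hops is exceeded.
    """
    seen = {start}
    current = start
    hops = 0
    while current in redirects:
        current = redirects[current]
        hops += 1
        if current in seen:
            raise ValueError(f"redirect loop at {current}")
        if hops > max_hops:
            raise ValueError(f"more than {max_hops} hops from {start}")
        seen.add(current)
    return current, hops


def flatten(redirects):
    """Return a new map where every source points straight at its final URL."""
    return {src: resolve_redirect(src, redirects)[0] for src in redirects}


# Hypothetical chain: HTTP -> HTTPS -> www.
redirect_map = {
    "http://example.com/": "https://example.com/",
    "https://example.com/": "https://www.example.com/",
}
print(flatten(redirect_map))
```

Any source whose hop count is above 1 before flattening is a candidate for a direct rule.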

What mistakes must be absolutely avoided?

First mistake: contradictory canonicals. I've seen a site declare rel=canonical to /page-a/ in the HTML and to /page-b/ in the HTTP header. Google indexed /page-c/, a third variant that appeared in neither declaration. Result: traffic was cut in half for 6 months before we identified the issue.
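
A conflict of this kind can be caught by comparing the HTML canonical with the one sent in the HTTP Link response header. The sketch below uses a simplified regular expression for the header, so it assumes no commas inside quoted parameter values; the function names are illustrative.

```python
import re


def header_canonical(link_header):
    """Return the URL marked rel="canonical" in an HTTP Link header, or None."""
    # Each entry looks like: <https://example.com/x>; rel="canonical"
    for match in re.finditer(r"<([^>]+)>([^,]*)", link_header or ""):
        url, params = match.group(1), match.group(2)
        if re.search(r'\brel\s*=\s*"?[^";]*\bcanonical\b', params, re.IGNORECASE):
            return url
    return None


def compare_canonicals(html_canonical, link_header):
    """Report a conflict when the HTML and header canonicals differ."""
    from_header = header_canonical(link_header)
    if html_canonical and from_header and html_canonical != from_header:
        return f"conflict: HTML says {html_canonical}, header says {from_header}"
    return "ok"


# Hypothetical response reproducing the case described above.
print(compare_canonicals(
    "https://example.com/page-a/",
    '<https://example.com/page-b/>; rel="canonical"',
))
```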

Second mistake: forgetting separate mobile versions. If you still use an m-dot site (m.example.com), each mobile page should declare a rel=canonical pointing to its desktop equivalent, and each desktop page should declare a rel=alternate annotation pointing to the mobile URL. Otherwise, Google indexes both, splitting your signals and displaying one or the other unpredictably depending on the search context.
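
As a rough sketch, the pairing between a desktop page and its m-dot counterpart can be verified like this. The inputs are assumed to come from a crawl (the rel=alternate URLs found on the desktop page and the canonical found on the mobile page); the function name is hypothetical.

```python
def check_mdot_pair(desktop_url, mobile_url, desktop_alternates, mobile_canonical):
    """Return a list of problems with the desktop/mobile annotations."""
    problems = []
    if mobile_url not in desktop_alternates:
        problems.append("desktop page lacks rel=alternate to mobile URL")
    if mobile_canonical != desktop_url:
        problems.append("mobile page canonical does not point to desktop URL")
    return problems


# Hypothetical pair where the mobile page canonicalizes to itself.
print(check_mdot_pair(
    desktop_url="https://example.com/page/",
    mobile_url="https://m.example.com/page/",
    desktop_alternates=["https://m.example.com/page/"],
    mobile_canonical="https://m.example.com/page/",
))
```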

How can I check that my site is compliant and optimized?

In Search Console, open the Coverage report (now called Page indexing) and filter on the excluded status "Alternate page with proper canonical tag." This status indicates that Google has detected and consolidated your duplicates. If the volume is consistent with your architecture (filters, pagination), that's a good sign. If it spikes suddenly, investigate.

Run queries such as site:example.com inurl:parameter to detect parameterized URLs that are indexed despite your canonicals. If you find hundreds when everything is supposed to be canonicalized, Google has not consolidated them. Also run intitle:"exact title of your page" queries to spot multiple indexed versions of the same page.

Screaming Frog audit: 0 canonical chains, 0 canonicals pointing to a 404, 100% consistency between HTML and HTTP header canonicals
301 redirects: all technical variants redirect to a single URL, with no chains
Search Console: volume of pages excluded as duplicates is stable and consistent with the site architecture
Test site:example.com inurl:? : no parameterized URLs indexed if filters are canonicalized
Link profile: 90%+ of backlinks point to canonical versions, not variants
Monthly monitoring: automatic alert if the volume of indexed URLs increases sharply (a sign of newly indexed variants)
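
The monthly alert in the last item can be as simple as comparing the two most recent counts. This is a minimal sketch: the counts are assumed to be collected by hand or from an export, and the 25% threshold is an arbitrary example value, not a recommendation from Google.

```python
def indexed_spike(counts, threshold=0.25):
    """Return True when the latest count grew by more than threshold.

    counts is a chronological list of indexed-URL counts (e.g. one per month).
    """
    if len(counts) < 2 or counts[-2] <= 0:
        return False
    growth = (counts[-1] - counts[-2]) / counts[-2]
    return growth > threshold


# Hypothetical monthly counts: stable, then a jump worth investigating.
print(indexed_spike([1000, 1020]))
print(indexed_spike([1000, 1020, 1400]))
```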

Controlling the version indexed by Google relies on signal consistency: canonical tags, redirects, internal architecture, and link profiles must all point in the same direction. A rigorous audit and regular monitoring are essential. These technical optimizations can quickly become complex on large sites or hybrid architectures (JS rendering, internationalization, multi-domains). If you manage an e-commerce catalog with thousands of pages or a high-traffic platform, hiring a specialized SEO agency can save you months of diagnostics and avoid costly mistakes that directly impact your visibility.

❓ Frequently Asked Questions

Does Google penalize sites with duplicate content?

No. Google does not penalize duplicate content unless it is intentionally manipulative. It simply filters duplicates so that only one version is shown, which can reduce visibility if the wrong version is chosen.

Is the canonical tag enough to handle every case of duplication?

No. The canonical handles internal technical duplication well, but it remains a signal, not an absolute directive. Cross-domain or editorial duplication calls for other strategies (syndication-source, content originality, indexing speed).

What should I do if Google indexes the wrong version despite my canonical?

Check that canonicals, redirects, and links are consistent, test with the URL Inspection tool, strengthen internal signals (internal linking to the correct version), and as a last resort use a 301 to force consolidation.

Do URL parameters (UTM, filters) always create duplicate content?

Yes, if left unmanaged. Tracking parameters (UTM) and filter parameters generate distinct URLs for the same content. A properly configured canonical, or handling via robots.txt/Search Console, prevents these variants from being indexed.

How can I find out which version Google has chosen to index for my content?

Use the URL Inspection tool in Search Console, which shows the canonical URL Google has selected. Supplement it with site: and intitle: queries to verify under real conditions in the SERPs.
