
Official statement

Google uses over 20 different signals to determine which page should be selected as the canonical URL in a cluster of duplicates. These signals include content, PageRank, HTTPS, sitemaps, and redirects.
🎥 Source video (at 12:44)

Extracted from a Google Search Central video

⏱ 29:01 💬 EN 📅 10/12/2020 ✂ 11 statements
Other statements from this video (10)
  1. 8:01 Do you really need 3,000 words to rank well in Google?
  2. 9:01 How does Google really detect duplicate content with checksums?
  3. 9:03 Does Google really ignore your navigation and footers when detecting duplicates?
  4. 10:34 How does Google group your pages into duplicate clusters before choosing the canonical?
  5. 13:17 Does PageRank still influence the selection of canonical URLs?
  6. 13:47 Can Google really ignore the canonical tag?
  7. 14:49 Do redirects really override the HTTPS signal when choosing the canonical URL?
  8. 15:22 How does Google really weight canonicalization signals?
  9. 17:31 Does canonicalization really affect ranking in Google?
  10. 22:16 Does Google really read your feedback on its SEO documentation?
Official statement from 10/12/2020 (5 years ago)
TL;DR

Google combines more than 20 distinct signals to decide which page becomes the canonical URL in a cluster of duplicates. Content, PageRank, HTTPS, sitemaps, and redirects are among the criteria named explicitly. For SEOs, this means a canonical tag alone never guarantees the outcome: Google keeps control and weighs every available clue before deciding.

What you need to understand

What is a duplicate cluster and why does Google need to choose?

When several URLs display identical or very similar content, Google groups them into a duplicate cluster. This phenomenon occurs more often than one might think: HTTP/HTTPS variants, with or without www, tracking parameters, paginated versions, facet filters, syndicated content.
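To make the idea of a cluster concrete, here is a minimal Python sketch of how a site owner might group URL variants when auditing duplicates: it ignores the scheme and a leading "www.", drops tracking parameters, and sorts the remaining query parameters. This is an illustration under stated assumptions, not Google's method (Google clusters pages on their content, not on URL rules), and the TRACKING_PARAMS list is hypothetical.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of parameters assumed never to change page content.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "gclid", "fbclid"}


def normalize(url: str) -> str:
    """Build a comparison key that collapses common duplicate URL variants."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    # Drop tracking parameters and sort the rest so ?a=1&b=2 matches ?b=2&a=1.
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS
    )
    # The scheme is left out on purpose, so http:// and https:// variants match.
    return urlunsplit(("", host, parts.path or "/", urlencode(query), ""))


def cluster(urls: list[str]) -> dict[str, list[str]]:
    """Group URLs that share the same normalized key."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url in urls:
        groups[normalize(url)].append(url)
    return dict(groups)
```

For example, cluster(["http://www.example.com/shoes?utm_source=mail", "https://example.com/shoes", "https://example.com/shoes?color=red"]) puts the first two URLs in one group and leaves the color-filtered URL in its own group, since a facet filter may change the content.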

Instead of indexing each variant, the engine selects only one as the canonical URL: the one it judges most relevant for its users. The other URLs in the cluster remain known and are sometimes crawled, but they do not compete in ranking. For Google, this saves crawling and indexing resources; for SEO, it consolidates ranking signals onto a single URL.

What are the 20+ signals mentioned in concrete terms?

Gary Illyes cites five explicit examples: content, PageRank, HTTPS, sitemaps, and redirects. The rest is left vague, deliberately. We can infer, without confirmation, that Google also looks at internal consistency (a large share of internal links pointing to one variant), content freshness, hreflang tags, URL structure, Core Web Vitals, and the crawl volume allocated to each variant.

The order of priority is never revealed, and a signal can weigh heavily in one context and become negligible in another. [To be verified] Google says it uses more than 20 signals, but it is impossible to list them all or to know their relative weights: the internal scoring is completely opaque.
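Since the real signals and weights are unknown, the following Python sketch only illustrates the general principle of combining several weighted signals into one score per cluster member. Every weight and signal name in it is a hypothetical placeholder chosen for the example, not a value published by Google.

```python
from dataclasses import dataclass

# Hypothetical weights for illustration only: Google publishes neither the
# full list of signals nor their relative importance.
WEIGHTS = {
    "redirect_target": 5.0,  # other variants 301-redirect to this URL
    "rel_canonical": 3.0,    # other variants declare this URL as canonical
    "https": 2.0,            # the URL is served over HTTPS
    "pagerank": 2.0,         # link-based importance, normalized to 0..1
    "in_sitemap": 1.0,       # the URL is listed in the XML sitemap
}


@dataclass
class Candidate:
    url: str
    signals: dict[str, float]  # signal name -> strength between 0 and 1


def score(candidate: Candidate) -> float:
    """Weighted sum of the candidate's signals; unknown signals count for 0."""
    return sum(WEIGHTS.get(name, 0.0) * value for name, value in candidate.signals.items())


def pick_canonical(cluster: list[Candidate]) -> str:
    """Return the URL with the highest combined score (the first one wins a tie)."""
    return max(cluster, key=score).url


page_a = Candidate("https://example.com/a", {"rel_canonical": 1.0, "https": 1.0})
page_b = Candidate(
    "https://example.com/b",
    {"redirect_target": 1.0, "https": 1.0, "in_sitemap": 1.0},
)
print(score(page_a), score(page_b))      # 5.0 8.0
print(pick_canonical([page_a, page_b]))  # https://example.com/b
```

With these made-up weights, a URL that receives redirects and appears in the sitemap outscores one that is only named in canonical tags: no single signal decides, the combination does.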

Why is the canonical tag just one signal among many?

Many SEOs still believe that adding a canonical tag is enough to impose their choice. It is not. Google treats the tag as a suggestion, never as a directive. If other signals contradict it (for example, an alternative variant accumulates more PageRank or receives more external links), Google can ignore it.

This fits the engine's logic: it never fully delegates the decision to webmasters, who might misconfigure the tag or try to manipulate it. Canonicalization remains an algorithmic process in which every clue counts but none decides alone.

  • Google groups duplicate content into clusters and chooses only one canonical URL per cluster
  • Over 20 different signals are weighed — content, PageRank, HTTPS, sitemaps, and redirects explicitly mentioned
  • The canonical tag is one signal among others, never an absolute directive
  • The order of priority and exact scoring remain opaque, varying by context
  • Google always keeps the final say: no configuration can force its choice with certainty

SEO Expert opinion

Is this statement consistent with real-world observations?

Yes, largely. Google is regularly seen ignoring misplaced or contradictory canonical tags. A site that declares page A as canonical but points most of its internal links (and the PageRank they pass) at page B will often see B indexed. The same happens when a site sets its canonical tags to one version but lists another version in its XML sitemap: Google then tends to favor the version in the sitemap.

The weight of PageRank in canonical selection matches field observations: a variant that accumulates quality backlinks is often chosen, even when it is not the one the webmaster wanted. HTTPS also plays a role: Google has used HTTPS as a ranking signal since 2014, and since late 2015 it typically indexes the HTTPS version of a page when both HTTP and HTTPS versions exist.

What nuances should be added to this statement?

First point: [To be verified] the figure "more than 20 signals" cannot be checked. There is no way to know whether Google uses 22, 50, or 150 signals. The vagueness is probably deliberate, to discourage attempts to game the system.

Second nuance: not all signals carry the same weight. HTTPS and permanent (301) redirects are strong signals, almost imperative, and a correctly configured 301 almost always overrides other clues; a temporary (302) redirect is a weaker hint. Conversely, a single internal link to a variant weighs little if everything else points elsewhere.

Third point: consistency matters more than any isolated signal. A site that sends contradictory messages — canonical to A, sitemap with B, redirects to C — leaves Google to decide alone, often in unpredictable ways. The algorithm looks for consensus; if it does not find it, it applies its own rules.

In what cases can this rule cause issues?

Sites with faceted navigation are hit hardest. Each filter combination (color, size, price) generates a unique URL, and Google must decide which of hundreds of variants to index. If the webmaster does not mark these URLs up properly (canonical tags plus robots directives), Google risks indexing low-value pages at the expense of the main categories.

Another critical case is content syndication. An article published on several partner sites creates multiple duplicates. Google tries to identify the original through freshness, links, and domain authority, but when all signals are equivalent it may get it wrong and choose a syndicator's copy as canonical instead of the source.

Warning: on multilingual sites, missing hreflang annotations combined with roughly translated content can cause anomalous canonicalization, with Google merging distinct language versions into the same cluster.

Practical impact and recommendations

What actions should be taken to master canonicalization?

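First, audit the existing setup. In Google Search Console, the Page indexing report (Indexing > Pages) lists URLs with the status "Duplicate, Google chose different canonical than user", and the URL Inspection tool shows both the user-declared and the Google-selected canonical for any URL. Compare your declared canonicals with the ones Google actually chose: any discrepancy points to conflicting signals.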
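Once you have collected, for each URL, the declared canonical and the Google-selected canonical (for example by inspecting URLs one by one), a short script can list the discrepancies. This Python sketch assumes a CSV with the hypothetical columns url, declared, and google_selected; adapt the names to however you gathered the data.

```python
import csv
import io


def canonical_mismatches(csv_text: str) -> list[tuple[str, str, str]]:
    """Return (url, declared, google_selected) for every row where they differ.

    Expects hypothetical columns: url, declared, google_selected.
    A row with no declared canonical is reported too if Google selected one.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    mismatches = []
    for row in reader:
        declared = row["declared"].strip()
        selected = row["google_selected"].strip()
        if declared != selected:
            mismatches.append((row["url"], declared, selected))
    return mismatches


sample = (
    "url,declared,google_selected\n"
    "https://example.com/a,https://example.com/a,https://example.com/a\n"
    "https://example.com/b?x=1,https://example.com/b,https://example.com/b?x=1\n"
    "https://example.com/c,,https://example.com/c\n"
)
for url, declared, selected in canonical_mismatches(sample):
    print(f"{url}: declared {declared or '(none)'}, Google chose {selected}")
```

In practice you would pass the contents of your own file instead of the inline sample.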

Next, align your signals. If you want URL A to be canonical, make sure that nearly all internal links point to A, that the XML sitemap lists only A, that 301 redirects converge on A, and that the canonical tags point to A. Consistent signals almost always prevail over a shaky configuration.
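This alignment checklist can be automated. The Python sketch below assumes you have already extracted four inputs from your own crawl (canonical tags, sitemap URLs, redirects, and internal-link counts); the SiteSignals structure and its field names are hypothetical. It reports every variant whose signals do not point to the intended canonical.

```python
from dataclasses import dataclass


@dataclass
class SiteSignals:
    # Hypothetical inputs, gathered from your own crawl and sitemap.
    canonical_tags: dict[str, str]   # page URL -> its rel=canonical href
    sitemap_urls: set[str]           # URLs listed in the XML sitemap
    redirects: dict[str, str]        # redirecting URL -> redirect target
    internal_links: dict[str, int]   # URL -> number of internal links to it


def audit_alignment(target: str, variants: list[str], s: SiteSignals) -> list[str]:
    """List every signal in a duplicate cluster that does not point to `target`."""
    problems = []
    if target not in s.sitemap_urls:
        problems.append(f"{target}: missing from the sitemap")
    for v in variants:
        if v in s.redirects:
            # A redirecting URL serves no HTML, so only its redirect target matters.
            if s.redirects[v] != target:
                problems.append(f"{v}: redirects to {s.redirects[v]}, not {target}")
            continue
        declared = s.canonical_tags.get(v)
        if declared != target:
            problems.append(f"{v}: rel=canonical is {declared}, not {target}")
        if v == target:
            continue
        if v in s.sitemap_urls:
            problems.append(f"{v}: listed in the sitemap alongside {target}")
        if s.internal_links.get(v, 0) > s.internal_links.get(target, 0):
            problems.append(f"{v}: gets more internal links than {target}")
    return problems
```

An empty list means every collected signal agrees on the target; each string otherwise names one conflicting signal to fix.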

What mistakes should absolutely be avoided?

Never point a canonical tag to a URL that returns a 404 or a 301: Google is likely to ignore such a tag and pick the canonical on its own. Never chain canonicals (A → B → C) either: Google may follow only part of the chain or ignore it altogether.

Also avoid pages without a self-referencing canonical. Every page should carry a canonical tag pointing to itself if it is the preferred version, or to the correct variant otherwise. Without a tag, Google has to guess, with the risks that entails.

Another common pitfall is declaring a paginated page (page=2) canonical to page 1. Google advises against this, because each page in a paginated series has its own distinct content. Let each page carry a self-referencing canonical instead. rel=prev/next markup can stay in place, but Google stated in 2019 that it no longer uses it for indexing.
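The rules discussed above (self-referencing canonicals, facet URLs consolidated onto their category, paginated pages keeping their own canonical) can be expressed as a small function that builds the canonical tag for a URL. In this Python sketch the parameter names in FACET_PARAMS and TRACKING_PARAMS are hypothetical, and the function assumes HTTPS is the preferred scheme for every page.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names; replace them with the ones your site uses.
FACET_PARAMS = {"color", "size", "price"}
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign"}


def canonical_href(url: str) -> str:
    """URL this page should declare as canonical (itself, minus noise)."""
    parts = urlsplit(url)
    # Facet and tracking parameters are dropped, so filtered views point to the
    # bare category. Anything else, including ?page=N, is kept, so each
    # paginated page stays self-referencing instead of pointing to page 1.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query)
        if key not in FACET_PARAMS and key not in TRACKING_PARAMS
    ]
    # Assumes HTTPS is the preferred scheme for every page.
    return urlunsplit(("https", parts.netloc.lower(), parts.path, urlencode(kept), ""))


def canonical_tag(url: str) -> str:
    """Render the <link> element for the page's <head>, with the href escaped."""
    return f'<link rel="canonical" href="{escape(canonical_href(url))}">'
```

For instance, canonical_href("http://shop.example.com/shoes?color=red&page=2") returns "https://shop.example.com/shoes?page=2": the color facet is dropped, but page 2 keeps its own canonical rather than pointing to page 1.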

How can I check that my site is correctly configured?

Run a Screaming Frog or Oncrawl crawl while following the canonicals. Identify loops, chains, and broken canonicals. Ensure that 100% of strategic URLs carry a consistent canonical tag.
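If your crawler can export, for each URL, its HTTP status and rel=canonical target, these checks can be scripted. This Python sketch takes a hypothetical dictionary mapping each URL to a (status, canonical) pair (export formats differ between Screaming Frog and Oncrawl, so the mapping step is up to you) and labels each canonical as self, ok, missing, target not crawled, pointing to a redirect, pointing to an error page, part of a chain, or part of a loop.

```python
from typing import Optional


def _reaches_loop(start: str, pages: dict, max_hops: int = 20) -> bool:
    """Follow canonical pointers from `start`; True if a URL is visited twice."""
    seen = {start}
    current = pages[start][1]
    for _ in range(max_hops):
        if current is None or current not in pages:
            return False
        if current in seen:
            return True
        seen.add(current)
        next_url = pages[current][1]
        if next_url == current:
            return False  # reached a self-canonical page: the chain ends here
        current = next_url
    return False


def classify_canonicals(pages: dict[str, tuple[int, Optional[str]]]) -> dict[str, str]:
    """Label each crawled URL's canonical. `pages` maps URL -> (status, canonical)."""
    report = {}
    for url, (_status, canonical) in pages.items():
        if canonical is None:
            report[url] = "missing"
        elif canonical == url:
            report[url] = "self"
        elif canonical not in pages:
            report[url] = "target not crawled"
        else:
            target_status, target_canonical = pages[canonical]
            if 300 <= target_status < 400:
                report[url] = "points to a redirect"
            elif target_status >= 400:
                report[url] = "points to an error page"
            elif target_canonical not in (None, canonical):
                # The target canonicalizes elsewhere too: a chain, possibly a loop.
                report[url] = "loop" if _reaches_loop(url, pages) else "chain"
            else:
                report[url] = "ok"
    return report
```

Anything other than "self" or "ok" on a strategic URL deserves a look; redirected or missing pages show up as "missing" themselves because they usually carry no tag.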

Then cross-check with Search Console: export the list of indexed URLs, compare it with your XML sitemap. Any discrepancy warrants investigation — either Google preferred another variant, or a contradictory signal is disrupting the algorithm. Dig into server logs to see which URLs Googlebot crawls the most: this is often an indirect indicator of the chosen canonical.
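These cross-checks can be sketched with the Python standard library: parse the XML sitemap, diff it against the set of indexed URLs exported from Search Console, and count Googlebot requests per path in an access log. The sketch assumes a combined-format log. Matching on the user-agent string is a simplification, since that string can be spoofed (Google documents verifying Googlebot with a reverse DNS lookup). The log yields paths while the sitemap holds full URLs, so normalize one side before comparing them.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
# Request path and user agent from a combined-format access log line.
LOG_LINE = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[^"]*" \d{3} \S+ "[^"]*" "([^"]*)"')


def sitemap_urls(xml_text: str) -> set[str]:
    """All <loc> values of a standard XML sitemap (not a sitemap index)."""
    root = ET.fromstring(xml_text)
    return {loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text}


def compare(sitemap: set[str], indexed: set[str]) -> dict[str, list[str]]:
    """Set differences worth investigating, sorted for readable output."""
    return {
        "in_sitemap_not_indexed": sorted(sitemap - indexed),
        "indexed_not_in_sitemap": sorted(indexed - sitemap),
    }


def googlebot_hits(log_lines) -> Counter:
    """Count requests per path whose user agent claims to be Googlebot."""
    hits: Counter = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if match and "Googlebot" in match.group(2):
            hits[match.group(1)] += 1
    return hits
```

A URL listed in the sitemap but not indexed, while a parameterized variant is indexed and heavily crawled, is a typical sign that Google preferred another canonical.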

  • Audit discrepancies between declared canonical and indexed URL using Search Console
  • Align all signals: internal links, sitemap, redirects, canonical tag
  • Avoid canonical chains, canonicals pointing to 404/301, and absence of tag on important pages
  • Crawl the site to detect inconsistencies, loops, broken canonicals
  • Cross-reference Search Console data, XML sitemap, and server logs to identify variants preferred by Google
  • Prioritize strategic pages: categories, key product pages, paid-search (SEA) landing pages
Mastering canonicalization relies on multi-signal consistency: no isolated clue is sufficient; it is the overall alignment that convinces Google. Canonical tags, redirects, sitemaps, internal links, and HTTPS must all point in the same direction. Regular audits via Search Console and crawlers are essential to detect deviations. Let’s be honest: on complex sites — e-commerce with facets, multilingual, high volumes — this orchestration can quickly become technical. If the analysis reveals structural inconsistencies or persistent anomalous canonicalizations, engaging a specialized SEO agency can accelerate resolution and secure strategic indexing.

❓ Frequently Asked Questions

Does Google always respect the canonical tag I declare?
No, never 100%. Google treats the canonical tag as a strong suggestion, but weighs it against more than 20 other signals. If other clues contradict your tag, Google can ignore it.
Which signal carries the most weight in choosing the canonical URL?
It is impossible to quantify precisely. 301 redirects and HTTPS are very strong signals, almost imperative. PageRank and internal consistency also matter a lot. The order of priority remains opaque and context-dependent.
How can I find out which URL Google has chosen as canonical on my site?
Use Google Search Console. The Page indexing report (Indexing > Pages) lists URLs under "Duplicate, Google chose different canonical than user", and the URL Inspection tool shows the Google-selected canonical for any URL you inspect. Compare these with your declared canonical tags to spot discrepancies.
Can I force Google to index a specific URL despite duplicates?
You can strongly influence it by aligning all signals (canonical tag, XML sitemap, internal links, 301 redirects) toward that URL. But Google always has the final say.
What should I do if Google picks the wrong URL as canonical despite my canonical tag?
Audit the other signals: internal links, XML sitemap, and redirects. Fix any inconsistencies. If everything is aligned and Google persists, remove the competing variants or deindex them with noindex. Do not rely on robots.txt for this: it blocks crawling, not indexing, and it prevents Google from seeing the blocked page's noindex or canonical tag.
