Does Google really compare the initial HTML AND rendered content for canonicalization?

Official statement

Canonicalization and deduplication start with the initial HTML but also consider the rendered HTML. Google compares the content hashes of the initial HTML and the rendered HTML. If the hashes differ after rendering, Google uses the signals from the rendering for canonicalization.

Source: extracted from a Google Search Central video (duration 46:02, in English, published 25/11/2020, 29 statements extracted). This statement appears at the 30:00 mark.

Official statement from November 25, 2020

A more recent statement exists on this topic: "Should You Remove Links That Are Only Present in the Initial HTML?" (Martin Splitt, March 24, 2021).

TL;DR

Google does not simply rely on the initial HTML to decide which version of a page to index. The engine generates hashes of the HTML before and after JavaScript rendering, then compares them. If these hashes differ, the signals from the rendering take precedence in the canonicalization process. In practical terms, your canonical tags injected via JS can therefore override those in the raw HTML.

What you need to understand

What is a content hash and why does Google use it?

A content hash is a compact digital fingerprint computed from the content of a page. Google calculates this fingerprint on the initial HTML (the one served by the server) and on the rendered HTML (after client-side JavaScript execution). If the two hashes are identical, the engine considers that rendering added nothing new.

But as soon as the hashes differ, Google knows that JavaScript has changed the page. At that point, the engine switches to the signals from the rendered HTML to decide which URL to treat as canonical. This statement confirms that rendering is not just a cosmetic step: it acts as a referee in deduplication.
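The comparison described above can be sketched in a few lines. This is only an illustration, not Google's actual implementation: the choice of SHA-256, the whitespace normalization, and the function names `content_hash` and `pick_signal_source` are all assumptions made for the example.

```python
import hashlib


def content_hash(html: str) -> str:
    """Return a SHA-256 fingerprint of the HTML, ignoring whitespace differences.

    Assumption: Google's real hashing and normalization are not public.
    """
    normalized = " ".join(html.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def pick_signal_source(initial_html: str, rendered_html: str) -> str:
    """Return which version's signals would be used under the logic described."""
    if content_hash(initial_html) == content_hash(rendered_html):
        return "initial"
    return "rendered"


# Hypothetical pages: the rendered version carries a different canonical.
initial = '<html><head><link rel="canonical" href="https://example.com/a"></head></html>'
rendered = '<html><head><link rel="canonical" href="https://example.com/b"></head></html>'

print(pick_signal_source(initial, initial))   # initial
print(pick_signal_source(initial, rendered))  # rendered
```

The point of the sketch is the branching: identical fingerprints mean rendering changed nothing, while any difference hands the decision to the rendered version.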

Why does the distinction between initial HTML and rendered HTML change everything?

For years, SEO practitioners have been advised to place canonical tags in the initial HTML to avoid relying on JS execution. This advice remains valid for performance, but this statement adds an important nuance. If your framework (React, Vue, Angular) injects or modifies a canonical tag at render time, Google may very well take it into account.

Let's be honest: this flexibility opens the door to mistakes. A canonical tag present in the initial HTML can be overwritten by JS, and if Google relies on the rendered version, it’s this second tag that prevails. The result: canonicalized URLs that go against your intentions.

When does rendering really influence canonicalization?

Single Page Applications (SPAs) and headless sites are the most affected. These architectures often serve a skeleton HTML, then build the entire content on the client side. If your textual content, meta tags, or canonicals only exist after rendering, Google has no choice but to wait for JS execution to calculate the final hash.

Multi-variant e-commerce sites are also affected. Imagine a product page with color variants managed in JS: if each color modifies the URL and the visible content, the hashes will diverge. Google will then have to decide which version to index based on the rendering, not on the raw HTML which remains identical for all variants.

Content hashes allow Google to detect if the JS rendering has substantially altered the DOM.
When the hashes differ, the signals from the rendered HTML (canonical, hreflang, structured data) take precedence over those from the initial HTML.
Modern JS frameworks (React, Vue, Next.js) can inject or modify tags after rendering, directly influencing canonicalization.
SPA and headless sites are particularly exposed since their content often only exists after JS execution.
A canonical tag in the initial HTML can be overwritten by a tag injected via JS if Google relies on the rendered version.

SEO Expert opinion

Is this statement consistent with real-world observations?

Yes and no. It has long been observed that Google indexes content generated by JavaScript, so the idea that it compares initial and rendered HTML is not a revelation. However, the official confirmation of the hashing mechanism brings welcome clarity. Until now, it was assumed that Google could 'see' the rendered output, without knowing exactly how it arbitrated between the two versions.

On the other hand, Martin Splitt remains vague on timing. How long does Google wait before considering rendering as stable? What crawl budget is allocated to rendering versus initial HTML? These questions remain unanswered. [To verify]: it is impossible to know if Google systematically recalculates hashes with every crawl or if it caches certain fingerprints to save resources.

What risks does this hashing logic pose for JS-heavy sites?

The main danger is the inconsistency between intentions and reality. You think you've canonicalized a URL in the initial HTML, but a third-party script (misconfigured tag manager, A/B test running in the background) alters the DOM afterwards. Google calculates a new hash, and boom: your canonical changes without your consent.

Another trap: sites that load different content based on geolocation or device. If the initial HTML is identical but the JS injects device-specific content, Google will detect a hash divergence. It may then canonicalize the mobile version when you intended to prioritize the desktop, or vice versa. This is particularly insidious on AMP sites where rendering can vary greatly.

Should we give up the initial HTML in favor of rendering for canonicalization?

No, and in fact, it’s the opposite. This statement does not say 'trust the JS'; it says 'Google considers rendering when necessary'. The best practice remains to serve all critical signals in the initial HTML: canonical, hreflang, structured data, textual content. This way, there’s no latency related to rendering, no risk of JS failure, and no dependence on how quickly Google's rendering queue (the Web Rendering Service) gets to your pages.

But here's where it gets tricky: if your architecture does not allow it (headless commerce, pure SPA), this statement at least gives you the assurance that Google can read your post-render signals. It’s a safety net, not an excuse to neglect the initial HTML. [To verify]: no public data confirms that Google systematically renders every page it crawls, especially on large sites with a limited crawl budget.

Practical impact and recommendations

What should you prioritize checking on your own site?

Start by auditing the canonical tags present in the initial HTML versus those present after rendering. Use Chrome DevTools or Screaming Frog in JavaScript mode to compare. If you notice divergences, identify which script is causing them (often a tag manager, a JS framework, or a poorly configured WordPress plugin). Correct at the source: the canonical should be consistent on both sides.
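For a quick manual check, the audit can be scripted along the following lines. This is a minimal sketch using only Python's standard library. The two HTML strings are hypothetical stand-ins for the page source and for the DOM copied from DevTools after rendering, and the names `CanonicalExtractor`, `extract_canonicals`, and `canonical_diverges` are made up for the example.

```python
from html.parser import HTMLParser
from typing import List


class CanonicalExtractor(HTMLParser):
    """Collect the href of every <link rel="canonical"> in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.canonicals: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values; href may be missing.
        rel_values = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "canonical" in rel_values and href:
            self.canonicals.append(href)


def extract_canonicals(html: str) -> List[str]:
    """Return all canonical URLs declared in the given HTML, in document order."""
    parser = CanonicalExtractor()
    parser.feed(html)
    parser.close()
    return parser.canonicals


def canonical_diverges(initial_html: str, rendered_html: str) -> bool:
    """True when the canonicals differ between the two versions of the page."""
    return extract_canonicals(initial_html) != extract_canonicals(rendered_html)


# Hypothetical example: a script rewrites the canonical after load.
initial_html = '<head><link rel="canonical" href="https://example.com/product"></head>'
rendered_html = '<head><link rel="canonical" href="https://example.com/product?color=red"></head>'

print(extract_canonicals(initial_html))
print(extract_canonicals(rendered_html))
print(canonical_diverges(initial_html, rendered_html))
```

Returning a list rather than a single URL also surfaces pages where rendering leaves two canonical tags in the DOM, which is another inconsistency worth fixing.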

Next, examine high-traffic or strategic pages (key product sheets, SEO landing pages). Inspect them in Search Console with the 'URL Inspection' tool: Google shows you the version it indexed. If the displayed content differs from the initial HTML, it means rendering has taken precedence. Then check that the signals from the rendering are indeed what you want to convey.

How to prevent JavaScript from sabotaging canonicalization?

Golden rule: never inject or modify a canonical tag in JavaScript unless absolutely necessary. If your site is an SPA and you have no choice, ensure that server-side rendering (SSR or static pre-rendering) generates the tags before sending them to the client. Next.js, Nuxt.js, and others do this natively — leverage them.
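To illustrate what "generating the tags on the server" means, here is a framework-agnostic sketch. It is not how Next.js or Nuxt.js work internally. The functions `canonical_link_tag` and `render_head` are hypothetical helpers that only show the canonical already being part of the HTML when it leaves the server.

```python
from html import escape


def canonical_link_tag(canonical_url: str) -> str:
    """Build a canonical <link> tag with the URL escaped for use in an attribute."""
    return f'<link rel="canonical" href="{escape(canonical_url, quote=True)}">'


def render_head(title: str, canonical_url: str) -> str:
    """Render a <head> that already contains the canonical in the initial HTML."""
    return (
        "<head>\n"
        f"  <title>{escape(title)}</title>\n"
        f"  {canonical_link_tag(canonical_url)}\n"
        "</head>"
    )


# Hypothetical product page: the canonical is fixed before any JS runs.
print(render_head("Blue sneakers", "https://example.com/sneakers"))
```

Because the tag is in the server response, the initial and rendered HTML agree on the canonical as long as no client-side script rewrites it afterwards.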

For WordPress sites or traditional CMSs, disable plugins that manipulate the DOM afterwards to 'optimize' the canonicals. Some automated SEO tools add or modify these tags via JS, thinking they are doing the right thing, while they create a hash divergence. Always prefer a server-side modification (theme files, PHP hooks).

Which tools can you use to monitor divergences between initial and rendered HTML?

Screaming Frog with JavaScript rendering enabled lets you compare the two states. Run one crawl without JS, then a second one with JS, and export the canonicals from both. Any discrepancy is a red flag. OnCrawl and Botify offer similar features, with visual dashboards that make divergences easier to spot.
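Once both exports exist, the comparison itself is easy to automate. The sketch below assumes each export is a CSV with two columns named `url` and `canonical`. Those column names are hypothetical, so adapt them to whatever your crawler actually produces. The sample data is inlined so the example runs on its own.

```python
import csv
import io
from typing import Dict, List, Tuple


def load_canonicals(csv_text: str) -> Dict[str, str]:
    """Map each crawled URL to its canonical (assumed columns: url, canonical)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["url"]: row["canonical"] for row in reader}


def find_discrepancies(
    no_js: Dict[str, str], with_js: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Return (url, canonical without JS, canonical with JS) for every mismatch."""
    discrepancies = []
    for url in sorted(set(no_js) | set(with_js)):
        before = no_js.get(url, "")
        after = with_js.get(url, "")
        if before != after:
            discrepancies.append((url, before, after))
    return discrepancies


# Hypothetical exports from a crawl without JS and a crawl with JS.
no_js_export = (
    "url,canonical\n"
    "https://example.com/a,https://example.com/a\n"
    "https://example.com/b,https://example.com/b\n"
)
with_js_export = (
    "url,canonical\n"
    "https://example.com/a,https://example.com/a\n"
    "https://example.com/b,https://example.com/b?ref=ab\n"
)

issues = find_discrepancies(load_canonicals(no_js_export), load_canonicals(with_js_export))
for url, before, after in issues:
    print(f"{url}: '{before}' -> '{after}'")
```

URLs missing from one of the two crawls are reported with an empty canonical on that side, so pages that only appear after rendering show up in the list as well.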

For ongoing monitoring, Google Search Console remains your best ally. The 'Coverage' tab and the 'URL Inspection' tool show you what Googlebot actually saw. If strategic pages are excluded or indexed with an unexpected canonical, it’s often an indication that rendering has taken precedence. Cross-reference this data with server logs (crawl budget, user-agent Googlebot) to get a complete picture.
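The log cross-check can start very simply. The sketch below counts requests per path whose user agent contains "Googlebot". It assumes access logs in the common "combined" format, and the sample lines are invented. A user-agent string can be spoofed, so treat the counts as indicative unless the requests are verified some other way.

```python
import re
from collections import Counter
from typing import Iterable

# Assumption: "combined" log format, i.e.
# ip - - [date] "METHOD path PROTOCOL" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines: Iterable[str]) -> Counter:
    """Count requests per path whose user agent mentions Googlebot."""
    hits: Counter = Counter()
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match and "Googlebot" in match.group("agent"):
            hits[match.group("path")] += 1
    return hits


GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"

# Invented sample lines for illustration only.
sample_log = [
    f'66.249.66.1 - - [25/Nov/2020:10:00:00 +0000] "GET /product HTTP/1.1" 200 5123 "-" "{GOOGLEBOT_UA}"',
    f'203.0.113.7 - - [25/Nov/2020:10:00:05 +0000] "GET /product HTTP/1.1" 200 5123 "-" "{BROWSER_UA}"',
    f'66.249.66.1 - - [25/Nov/2020:10:01:00 +0000] "GET /category HTTP/1.1" 200 8040 "-" "{GOOGLEBOT_UA}"',
    f'66.249.66.1 - - [25/Nov/2020:10:02:00 +0000] "GET /product HTTP/1.1" 200 5123 "-" "{GOOGLEBOT_UA}"',
]

hits = googlebot_hits(sample_log)
for path, count in hits.most_common():
    print(path, count)
```

Comparing these counts with the pages that show an unexpected canonical in Search Console helps narrow down which URLs to inspect first.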

Compare canonical tags in the initial HTML and after rendering with Chrome DevTools or Screaming Frog
Inspect strategic pages in Search Console ('URL Inspection' tool) to verify the version indexed by Google
Eliminate third-party scripts (tag managers, plugins) that modify the DOM and can cause hash divergences
Prioritize server-side rendering (SSR) or static pre-rendering for SPAs to serve critical signals in the initial HTML
Set up ongoing monitoring (Search Console + regular crawls) to detect canonicalization anomalies
Document the technical architecture (JS frameworks, rendering method) to anticipate impacts on canonicalization

Canonicalization based on hash comparison between initial and rendered HTML imposes increased technical rigor. Each divergence is a potential risk of unintentional canonicalization. Auditing, correcting, and monitoring become recurring tasks, especially on modern JS architectures. These optimizations can quickly become complex to orchestrate internally, between dev, marketing, and SEO teams. If you lack resources or expertise on these subjects, turning to an SEO agency specialized in JavaScript architectures can save you precious time and secure your indexing.

❓ Frequently Asked Questions

Does Google recalculate the hashes on every crawl, or does it cache certain fingerprints?

Google has never officially specified this. It is reasonable to assume that it caches hashes to save crawl budget, but no public data confirms it. Frequently modified pages are probably recalculated more often.

Will a canonical tag added via JavaScript after the initial load be taken into account by Google?

Yes, if Google executes the JavaScript and the hash of the rendered HTML differs from that of the initial HTML. But this depends on the crawl budget allocated to rendering and on the stability of the DOM at the moment Googlebot takes its snapshot.

If the initial HTML and the rendered HTML have the same hash, does Google completely ignore the rendering?

Probably. If the hashes are identical, Google has no reason to favor the rendering. It will then use the signals from the initial HTML, which saves processing resources.

Are hreflang and structured data injected via JavaScript affected by this hash mechanism?

Yes. The statement is about canonicalization, but the principle applies to all signals: if rendering modifies the DOM (hreflang, JSON-LD, meta tags), Google compares the hashes and may favor the rendered version.

How can I tell whether Google indexed the initial or the rendered HTML version of my page?

Use the 'URL Inspection' tool in Search Console. Google displays the HTML as it crawled and rendered it. Compare it with your initial HTML to spot the divergences.
