Why does Google ignore your canonical tags when the raw HTML contradicts the rendered output?

Official statement

Having a different canonical URL in the raw HTML and in the rendered HTML creates mixed signals for Google. This may lead Google to choose a completely different canonical or to alternate between the two versions, making reports in Search Console difficult to interpret.

🎥 Source video

Extracted from a Google Search Central video

💬 EN 📅 26/04/2021 ✂ 26 statements

Official statement from April 26, 2021 (5 years ago)

⚠ A more recent statement exists on this topic: "Why does Google index rendered HTML instead of source HTML?" (Martin Splitt, July 6, 2022)

TL;DR

When a canonical URL differs between raw HTML (server-side) and rendered HTML (after JavaScript), Google receives mixed signals. As a result, the engine may set both versions aside and pick a different canonical altogether, or worse, alternate between the two URLs from one crawl to the next. In concrete terms, your Search Console reports become hard to interpret and your authority consolidation efforts are undermined.

What you need to understand

Where does this confusion between raw HTML and rendered HTML come from? 

The raw HTML corresponds to the source code sent directly by the server when a browser (or Googlebot) makes an HTTP request. This is what you see if you view the source of a page via 'View Page Source' in Chrome.

The rendered HTML is the state of the DOM after the client-side JavaScript has executed. If your framework (React, Vue, Angular, Next.js with partial client-side rendering) injects, modifies, or replaces a <link rel="canonical"> tag via JS, Google first sees one version, then another once the page has been rendered.

Why does this divergence pose a problem for Google? 

Google first crawls the raw HTML, then queues the page for JavaScript rendering. This process is neither instantaneous nor guaranteed: some pages may wait for days or even weeks before being rendered. In the meantime, Googlebot has already extracted signals from the raw HTML — including the canonical.

When the canonical changes after rendering, Google inherits two contradictory instructions for the same URL. The engine has no way of knowing which one is 'the right one' — which breaks the consolidation logic. Martin Splitt claims that in this case, Google may choose a third URL as canonical or toggle between the two versions depending on crawl cycles.

What does 'toggle between the two versions' actually mean? 

This means that during a crawl, Google retains the canonical from the raw HTML, then on a subsequent crawl (after rendering), switches to the canonical from the rendered HTML. This instability fragments ranking signals: backlinks, anchors, click history, everything gets diluted across multiple URLs.

In Search Console, you observe inconsistent coverage reports: a URL marked 'Duplicate, Google chose different canonical than user', then reclassified as 'Indexed', then marked duplicate again. It's impossible to manage your indexing properly under these conditions.

Mixed signals = Google can't decide which URL to consolidate as the reference.
Canonical toggling = your GSC metrics become unusable for performance tracking.
Arbitrary choice = Google may select a URL that you've never defined as canonical, diluting your authority.
Crawl budget impact = the bot wastes time crawling and reprocessing unstable variants instead of discovering fresh content.
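
To make the toggling described above measurable, here is a minimal sketch in Python. It assumes you note the Google-selected canonical for a given page at regular intervals (for example by hand, from the URL inspection tool) and keep those values in a list. The function names and the sample history are hypothetical and only illustrate the idea.

```python
def count_canonical_switches(observations: list[str]) -> int:
    """Count how many times the observed canonical changed between consecutive checks."""
    return sum(1 for prev, cur in zip(observations, observations[1:]) if prev != cur)


def is_toggling(observations: list[str]) -> bool:
    """Return True when the canonical left a URL and later came back to it (A -> B -> A)."""
    urls_left_behind = set()
    previous = None
    for url in observations:
        if url != previous:
            if url in urls_left_behind:
                return True  # we are back on a URL that had been abandoned earlier
            if previous is not None:
                urls_left_behind.add(previous)
            previous = url
    return False


# Hypothetical history of the canonical observed for one page, oldest first.
history = [
    "https://example.com/product",
    "https://example.com/product?variant=blue",
    "https://example.com/product",
]
print(count_canonical_switches(history), is_toggling(history))
```

A single switch can simply be a legitimate correction. A return to a previously abandoned URL is the pattern that suggests contradictory signals.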

SEO Expert opinion

Is this statement consistent with field observations? 

Yes, and it confirms a phenomenon that many SEO practitioners mistakenly attributed to 'bugs' from Google. In reality, it's a poorly managed front-end architecture issue. Sites migrating to modern JS frameworks (Next.js, Nuxt, SvelteKit) without strict SSR often fall into this trap.

This behavior is particularly observed on e-commerce sites where the canonical is managed via a React component that loads after the initial HTML. As a result, Google first indexes the product page with an empty or generic canonical, then switches to the correct URL after rendering — but in the meantime, backlinks have landed on the wrong variant.

What nuances should be added to this claim? 

Martin Splitt mentions a 'completely different' choice of canonical by Google, but he doesn't specify the exact criteria for this choice. [To be verified]: it can be assumed that Google uses other signals (sitemaps, majority internal links, external backlinks) to arbitrate, but there is no official confirmation of the exact weighting.

Another unclear point is the frequency of toggling. Does Google switch on each crawl? Only during rendering cycles? Or randomly, depending on the load of the rendering servers? Again, no data is publicly available. We are left to rely on empirical observation.

Warning: If your site uses JavaScript to dynamically modify canonicals based on user parameters (A/B testing, geolocation, personalization), you are likely in a gray area. Google can interpret these variations as mixed signals even if your intent is legitimate. Always check the final rendering via the URL inspection tool in GSC.

In what cases does this rule not strictly apply? 

If your site is 100% static (complete SSG, without client-side JS hydration modifying meta tags), you are safe. Raw HTML = rendered HTML, so no divergence. This is the case for well-configured Gatsby, Hugo, or Jekyll sites.

Also, if you are using strict SSR (Server-Side Rendering) where the server sends the final HTML directly with the correct canonical, and the client-side JS never touches this tag, you remain safe. But as soon as a third-party library (tracking, consent, CMP) injects or modifies tags in the <head>, the risk reappears.

Practical impact and recommendations

What should be done concretely to avoid this problem? 

First, audit your raw vs rendered HTML for all your strategic pages. Compare the server source code (curl or 'View Page Source') with the DOM state after full loading (DevTools → Elements). If the canonical tags differ, you are in the red zone.
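
The comparison described above can be scripted. The following is a minimal sketch using only the Python standard library. It assumes you already have both versions of the page as strings: the raw HTML from curl, and the rendered HTML copied from DevTools or exported from a headless browser. The function names and result labels are illustrative choices, not an established convention.

```python
from html.parser import HTMLParser


class CanonicalCollector(HTMLParser):
    """Collect the href of every <link rel="canonical"> found in an HTML string."""

    def __init__(self) -> None:
        super().__init__()
        self.canonicals: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        # rel is case-insensitive and may hold several space-separated tokens
        rel_tokens = attr_map.get("rel", "").lower().split()
        href = attr_map.get("href", "").strip()
        if "canonical" in rel_tokens and href:
            self.canonicals.append(href)


def extract_canonicals(html: str) -> list[str]:
    parser = CanonicalCollector()
    parser.feed(html)
    return parser.canonicals


def compare_canonicals(raw_html: str, rendered_html: str) -> str:
    """Return a label describing how the raw and rendered canonicals relate."""
    raw = extract_canonicals(raw_html)
    rendered = extract_canonicals(rendered_html)
    if len(raw) > 1 or len(rendered) > 1:
        return "multiple"  # more than one canonical tag in at least one version
    if not raw and not rendered:
        return "missing_in_both"
    if not raw:
        return "injected_by_js"
    if not rendered:
        return "removed_by_js"
    return "consistent" if raw[0] == rendered[0] else "mismatch"


# Hypothetical page: the server sends one canonical, JavaScript rewrites it.
raw_page = '<head><link rel="canonical" href="https://example.com/product"></head>'
rendered_page = '<head><link rel="canonical" href="https://example.com/product?variant=blue"></head>'
print(compare_canonicals(raw_page, rendered_page))
```

Any result other than "consistent" deserves a closer look. "injected_by_js" corresponds to the case where the initial HTML carries no canonical at all.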

Next, prioritize server-side rendering for critical tags: canonical, hreflang, meta robots, structured data. Never allow JavaScript to modify these elements after the first paint. If your framework requires it, configure strict SSR or SSG for indexable pages.

What errors should be absolutely avoided? 

Never inject a canonical tag via a useEffect in React, a mounted() hook in Vue, or a script that runs after DOMContentLoaded. Google processes the raw HTML first and renders the page later, so by the time your JavaScript runs, the signal from the initial HTML has already been extracted.

Another classic trap: headless CMS (Contentful, Strapi, Prismic) that generate canonicals client-side via asynchronous API requests. If the API takes 500 ms to respond, your initial HTML is missing a canonical, which then appears after rendering. Google sees two incompatible states.
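
As an illustration of the alternative, here is a minimal sketch of server-side generation in Python. It assumes the CMS entry has already been fetched on the server before the response is built. The `entry` dictionary, its keys, and the function names are hypothetical and do not match the API of any particular CMS.

```python
from html import escape


def render_head(title: str, canonical_url: str) -> str:
    """Build the <head> markup on the server, canonical included."""
    if not canonical_url.startswith(("https://", "http://")):
        raise ValueError("canonical_url must be an absolute URL")
    return (
        "<head>\n"
        f"  <title>{escape(title)}</title>\n"
        f'  <link rel="canonical" href="{escape(canonical_url)}">\n'
        "</head>"
    )


def render_page(entry: dict) -> str:
    """Render a full page from a CMS entry that was fetched before responding."""
    head = render_head(entry["title"], entry["canonical_url"])
    body = escape(entry["body"])
    return f"<!doctype html>\n<html>\n{head}\n<body>{body}</body>\n</html>"


# Hypothetical CMS entry, resolved server-side rather than in the browser.
entry = {
    "title": "Blue shirt",
    "canonical_url": "https://example.com/blue-shirt",
    "body": "Product description",
}
print(render_page(entry))
```

The point of this design is that the canonical is present in the very first bytes sent by the server, so the raw and rendered versions have no reason to differ as long as no script touches the tag afterwards.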

How can I check if my site is compliant? 

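Use the URL inspection tool in Search Console: open 'View crawled page' (or run a live test and open 'View tested page') and read the canonical in the HTML tab, which shows the rendered HTML. Compare it with the canonical in the raw server response ('View Page Source' or curl). If the canonicals diverge, you have a problem. Do this check on 10-15 representative pages (home, categories, products, articles).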

Complement this with a Screaming Frog crawl in JavaScript mode: compare the 'Canonical Link Element 1' (raw HTML) and 'Rendered Canonical' (after JS) columns. Any divergence = potential mixed signal. Prioritize fixing the pages with the most organic traffic and backlinks.

Audit raw vs rendered HTML on 15 strategic pages using the GSC inspection tool
Configure strict SSR for all canonical tags, hreflang, and meta robots
Never inject a canonical tag via client-side JavaScript (useEffect, mounted, etc.)
Crawl the site with Screaming Frog in JS rendering mode and compare raw vs rendered canonicals
Verify that headless CMS send canonicals in the initial HTML, not via asynchronous API requests
Monitor GSC coverage reports to spot canonical switches between two crawls

The divergence between raw and rendered HTML on canonical tags creates chronic instability in indexing. Google no longer knows which URL to consolidate, which fragments your ranking signals and makes your GSC reports unusable. The only viable fix: strict server-side rendering for all critical tags. If your current front-end architecture doesn't allow this natively, a technical overhaul may be necessary. Projects that affect infrastructure, code, and SEO at once are rarely trivial to manage internally. Enlisting the help of an SEO agency specialized in JavaScript SEO and SSR can speed up compliance while securing the transition.

❓ Frequently Asked Questions

Can you force Google to ignore the canonical in the rendered HTML and consider only the one in the raw HTML?

No, Google offers no setting to disable JavaScript rendering or to favor the raw HTML exclusively. The engine processes both states and tries to arbitrate. The only solution is to align the two versions.

If Google alternates between two canonicals, do I permanently lose the authority of one of the two URLs?

Not permanently, but the authority gets diluted. Backlinks pointing to the URL that was not selected during a given crawl are not consolidated toward the canonical that is active at that time. In the long run, this fragments PageRank and weakens ranking potential.

Do modern frameworks like Next.js 13+ (App Router) or Remix solve this problem natively?

Partially. Next.js App Router with SSR enabled does generate the HTML server-side, but if you use client components (`'use client'`) that modify the `<head>`, the risk persists. Remix enforces strict SSR by default, which limits divergences, but stay vigilant about third-party libraries.

Are hreflang and meta robots tags also affected by this mixed-signal problem?

Yes, absolutely. Any critical tag modified by JavaScript after the initial HTML creates a divergence. Google may ignore hreflang tags injected via JS, or interpret a `noindex` added after rendering as a signal that contradicts the initial indexing.
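
The same raw vs rendered check can be extended to these tags. Below is a minimal sketch in Python (standard library only) that collects canonical, hreflang, and meta robots values from two HTML strings and reports what differs. The class name, function names, and output structure are illustrative assumptions, and obtaining the rendered HTML (DevTools, headless browser) is left out.

```python
from html.parser import HTMLParser


class HeadSignals(HTMLParser):
    """Collect canonical, hreflang and meta robots values from an HTML string."""

    def __init__(self) -> None:
        super().__init__()
        self.signals = {"canonical": set(), "hreflang": set(), "robots": set()}

    def handle_starttag(self, tag, attrs):
        attr_map = {name: (value or "") for name, value in attrs}
        if tag == "link":
            rel = attr_map.get("rel", "").lower().split()
            href = attr_map.get("href", "").strip()
            if "canonical" in rel and href:
                self.signals["canonical"].add(href)
            elif "alternate" in rel and attr_map.get("hreflang") and href:
                self.signals["hreflang"].add((attr_map["hreflang"].lower(), href))
        elif tag == "meta" and attr_map.get("name", "").lower() == "robots":
            # content is a comma-separated list of directives such as "noindex, follow"
            for directive in attr_map.get("content", "").lower().split(","):
                if directive.strip():
                    self.signals["robots"].add(directive.strip())


def collect_signals(html: str) -> dict:
    parser = HeadSignals()
    parser.feed(html)
    return parser.signals


def diff_signals(raw_html: str, rendered_html: str) -> dict:
    """Return, per signal type, the values present in only one of the two versions."""
    raw, rendered = collect_signals(raw_html), collect_signals(rendered_html)
    return {
        name: {
            "only_in_raw": sorted(raw[name] - rendered[name]),
            "only_in_rendered": sorted(rendered[name] - raw[name]),
        }
        for name in raw
        if raw[name] != rendered[name]
    }


# Hypothetical page where JavaScript swaps "index" for "noindex" after load.
raw_page = '<head><meta name="robots" content="index, follow"></head>'
rendered_page = '<head><meta name="robots" content="noindex, follow"></head>'
print(diff_signals(raw_page, rendered_page))
```

An empty result means the three signal types are identical in both versions. Structured data is not covered by this sketch.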

How do I know if Google has chosen a canonical different from the ones I defined?

In Search Console, check the 'Google-selected canonical' field in the URL inspection tool. If it differs from your canonical tag (raw or rendered HTML), Google has arbitrated differently, often based on sitemaps, internal links, or backlinks.
