
Official statement

Google computes content hashes on the initial HTML for deduplication, but subsequently compares these hashes with those obtained after JavaScript rendering. The final decision regarding duplication and canonicalization takes into account the rendered HTML, not just the initial one.
🎥 Source video

Extracted from a Google Search Central video

⏱ 46:02 💬 EN 📅 25/11/2020 ✂ 29 statements
Watch on YouTube (30:01) →
TL;DR

Google computes content hashes on the initial HTML to detect duplicates, but it doesn't stop there: it then compares these fingerprints with those obtained after executing JavaScript. The final canonicalization decision is based on the rendered HTML, which changes the game for client-side rendered sites: your pre-JS content can go unnoticed if the final render differs.

What you need to understand

What is a content hash, and why does Google use it twice?

A content hash is a unique digital fingerprint generated from a page's HTML. Google uses this technique to quickly identify identical or nearly identical pages without having to compare line by line across millions of documents.

Martin Splitt explains that Google generates a first hash on the raw HTML (the one the server sends directly), then a second hash after executing JavaScript. This double pass detects duplicates that would only appear after client-side rendering, a common scenario with modern frameworks like React, Vue, or Angular.

Why does this distinction between initial HTML and rendered content change the game?

For years, the SEO industry has debated exactly when Google detects duplicates: before or after JavaScript rendering? This statement settles the question: the final decision is based on the rendered HTML, not the initial one.

In practice, if two pages display different initial HTML but produce identical content after JS execution, Google will consider them as potential duplicates. Conversely, two pages with identical HTML shells but different JavaScript content will be treated as unique.

When does this mechanism become critical?

Single Page Application (SPA) sites are the primary concern. Their initial HTML is often skeletal — a simple generic container — while the actual content is injected via JavaScript. Without rendering, all pages of an SPA would have nearly identical hashes.

E-commerce sites with client-side filters are also at risk. If the initial HTML is the same for all filter combinations, but the JS generates different product listings, Google must wait for rendering to distinguish these variants.

  • Hash on initial HTML: quick first pass, coarse detection of obvious duplicates
  • Hash on rendered HTML: final decision, considering JavaScript-generated content
  • Canonicalization: based on rendered content, not on the raw HTML received from the server
  • Implication for SPAs: the generic shell is not enough to differentiate pages; JS becomes crucial
  • Rendering budget: this double detection consumes time and resources, especially on large sites

SEO Expert opinion

Is this statement consistent with real-world observations?

Yes, it confirms what several empirical tests have suggested since the rise of SPAs. Documented cases showed that Google indexed content that was absent from the source HTML but present after rendering — proof that it executed JavaScript before finalizing canonicalization.

However, one point remains unclear: what is the delay between the two hashes? Martin Splitt does not specify whether rendering occurs immediately or if Google might let several days pass between crawling the initial HTML and executing JavaScript. On high-volume sites, this delay can create transitional situations where Google temporarily treats pages as duplicates before differentiating them post-render. [To be verified]

What nuances should be added to this claim?

Google does not guarantee that every page will go through JavaScript rendering. The budget allocated to JS execution remains limited, especially for less authoritative or poorly structured sites. If Google determines that a page is a duplicate based on the initial HTML, it may choose never to render it, saving resources.

Another nuance: rendering speed influences the decision. A page that takes 10 seconds to load its content via JS risks Google giving up before completion and capturing an incomplete state. The resulting hash would then not reflect the final content, skewing duplicate detection.

In what cases could this rule not fully apply?

Orphaned pages — without internal or external links — are unlikely to benefit from JavaScript rendering if Google discovers them solely via the sitemap. The engine may apply a hash on the initial HTML without going further, lacking sufficient quality signals.

Sites with critical JavaScript errors face the same treatment. If JS execution fails, Google resorts to the initial HTML. In this scenario, two pages with the same HTML shell but different JS would be considered duplicates, even if rendering should have differentiated them.

Caution: Google does not publicly document rendering budget thresholds. Claims about "all pages are rendered" remain theoretical. In practice, only pages deemed priority by the algorithm consistently benefit from rendering.

Practical impact and recommendations

What should be prioritized in an audit of a JavaScript site?

Start by comparing the source HTML and the rendered DOM on a representative sample of pages (in Chrome, Ctrl+U for the source, DevTools' Inspect for the rendered DOM). If the main content differs significantly, you depend entirely on JavaScript rendering to avoid duplicates.
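A minimal sketch of that audit step, assuming a crude tag-stripping heuristic and an arbitrary 50% threshold; a real audit would use a proper HTML parser and a headless-browser capture of the rendered DOM.

```javascript
// Flag pages whose rendered DOM carries far more visible text than the
// source HTML, meaning they depend on JS rendering for their content.
// Tag stripping via regex and the 50% threshold are illustrative.
function visibleText(html) {
  return html.replace(/<[^>]+>/g, " ").replace(/\s+/g, " ").trim();
}

function dependsOnJsRendering(sourceHtml, renderedHtml) {
  const before = visibleText(sourceHtml);
  const after = visibleText(renderedHtml);
  // If most of the visible text only appears post-render, the page
  // relies on JavaScript for its content.
  return before.length < after.length * 0.5;
}

console.log(dependsOnJsRendering(
  '<div id="app"></div>',
  '<div id="app"><h1>Red shoes</h1><p>In stock</p></div>'
)); // true
```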

Next, ensure that canonical tags are present from the initial HTML. Google reads these tags before rendering, which influences crawl prioritization. A canonical tag missing from the raw HTML but injected via JS arrives too late for certain budget decisions.
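A quick way to script that check over raw server responses; the regex is a simplification that assumes `rel` appears before `href`, so treat it as a sketch rather than a robust parser.

```javascript
// Check that a canonical tag is present in the *raw* HTML the server
// returns, before any JavaScript runs. Returns the canonical URL, or
// null if the tag is absent from the initial HTML.
function canonicalInRawHtml(html) {
  const match = html.match(/<link[^>]*rel=["']canonical["'][^>]*href=["']([^"']+)["']/i);
  return match ? match[1] : null;
}

console.log(canonicalInRawHtml(
  '<head><link rel="canonical" href="https://example.com/p/1"></head>'
)); // "https://example.com/p/1"
console.log(canonicalInRawHtml('<div id="app"></div>')); // null
```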

How can I ensure Google actually renders my JS pages?

Use the URL Inspection Tool from Search Console. This shows the rendered HTML as Googlebot sees it. Compare it with your source HTML: if the expected differences appear, rendering is functioning. If not, you have a JavaScript execution problem.

Monitor server logs to detect Googlebot calls to your APIs or JavaScript resources. An SPA site with no API requests in the logs means Google is not rendering the content; it is stopping at the initial shell. That points to either an insufficient rendering budget or blocking errors.
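A sketch of that log check; the log format, the `/api/` path convention, and the user-agent matching are illustrative assumptions about your setup.

```javascript
// Filter access-log lines down to Googlebot requests for JS or API
// resources. Zero hits on an SPA suggests Google never rendered it.
function googlebotResourceHits(logLines) {
  return logLines.filter((line) =>
    /Googlebot/.test(line) && /\.js|\/api\//.test(line)
  );
}

const sample = [
  '66.249.66.1 "GET /app.js" 200 "Googlebot/2.1"',
  '66.249.66.1 "GET /api/products" 200 "Googlebot/2.1"',
  '203.0.113.9 "GET /app.js" 200 "Mozilla/5.0"',
  '66.249.66.1 "GET /page-1" 200 "Googlebot/2.1"',
];
console.log(googlebotResourceHits(sample).length); // 2
```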

What technical errors increase the risk of duplicate content?

Content generated with a delay is problematic. If your JavaScript waits 3 seconds before injecting the main content, Google may capture an unfinished or empty intermediate state. The resulting hash will be identical for several pages, creating false duplicates.

Unmanaged asynchronous requests exacerbate the issue. If Google renders your page before your fetch() calls return, it hashes an incomplete DOM. Implement a mechanism signaling to Google that the content is ready — for example, a custom event or a DOM indicator.
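One way to sketch such a readiness signal, assuming the content loads through a known set of async requests; the helper name and the browser usage shown in comments are hypothetical, not a documented Google mechanism.

```javascript
// Only declare the page complete once every async request has
// resolved, so a renderer snapshotting the DOM sees the final state.
async function loadAndSignal(fetchers, onReady) {
  // Wait for all data requests before firing the ready callback.
  const results = await Promise.all(fetchers.map((f) => f()));
  onReady(results);
  return results;
}

// In a browser, the ready callback might set a DOM indicator, e.g.:
//   loadAndSignal(
//     [() => fetch("/api/products").then((r) => r.json())],
//     () => { document.body.dataset.contentReady = "true"; }
//   );
```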

  • Audit the difference between source HTML and rendered DOM on 50-100 key URLs
  • Place canonical tags in the initial HTML, not just in JavaScript
  • Test the Search Console URL Inspection to confirm effective rendering
  • Analyze server logs to detect Googlebot calls to JS/API resources
  • Avoid content injection delays exceeding 2 seconds
  • Implement a DOM indicator signaling the end of asynchronous loading
Detecting duplicate content after JavaScript rendering introduces significant technical complexity. Between monitoring the rendering budget, optimizing JS execution times, and comparative hash analysis, modern sites demand advanced expertise. Faced with these multi-layered issues (server, client, crawl, indexing), enlisting an SEO agency specialized in JavaScript architectures helps avoid costly mistakes and ensures every page gets proper treatment from Google.

❓ Frequently Asked Questions

Does Google compute a hash on every page, or only some?
Google computes a hash on every crawled page, but JavaScript rendering is not systematic. Priority pages that are well linked and sit on authoritative sites are more likely to get full rendering.
If my initial HTML is identical across all pages, am I penalized?
Not directly, but you depend entirely on JavaScript rendering to differentiate your pages. If Google does not render certain URLs, they will be treated as duplicates of the generic shell.
Does the canonical tag have to be in the initial HTML, or is JS enough?
It must appear in the initial HTML. Google reads it before rendering to prioritize crawling. A canonical present only in JS arrives too late for some indexing decisions.
How do I know whether Google rendered my page or stopped at the raw HTML?
Use Search Console's URL Inspection tool. It shows the rendered DOM as Googlebot captured it. Compare it with your source HTML to spot the differences.
Does a server-side rendered site escape this problem?
Yes, largely. SSR sends complete HTML in the server response, avoiding the dependence on JavaScript rendering. The initial and rendered hashes are then nearly identical, simplifying duplicate detection.
🏷 Related Topics: Content · Crawl & Indexing · AI & SEO · JavaScript & Technical SEO

