Does the JSON application state in the DOM create duplicate content?

Official statement

During server-side rendering, if the application state is serialized as JSON in the page (for hydration) in addition to the rendered HTML, it does not pose a duplicate content issue. Google only looks at the DOM, not the in-memory state or embedded JSON data for the application.

Source: Google Search Central video (duration 51:17, in English), statement at 20:02.

Official statement from May 12, 2020

TL;DR

Google states that serializing the application state as JSON for client-side hydration does not pose any duplicate content issues. The search engine focuses solely on the rendered DOM, ignoring embedded JSON data for JavaScript initialization. In practice, you can continue using your modern frameworks without fearing penalties, but ensure that critical content appears in the initial HTML.

What you need to understand

What is hydration and why does it generate JSON in the page?

When a React, Vue, or Next.js application performs server-side rendering, the server sends a complete HTML document to the browser. But for the interface to become interactive, the JavaScript framework must take over on the client side.

This process is called hydration. To avoid redoing all API calls, the server serializes the application state into a <script type="application/json"> block or a global variable. The result? The same content appears twice: once rendered as visible HTML, and once as raw JSON in the source code.
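To make the duplication concrete, here is a minimal sketch of what a server does during SSR with hydration. It is written in Python for brevity and is not taken from any particular framework: the `render_page` function, the `app-state` id, and the sample state are all hypothetical.

```python
import html
import json


def render_page(state: dict) -> str:
    """Render visible HTML and embed the same state as JSON for hydration."""
    # Visible markup: this is what ends up as text in the rendered DOM.
    visible = (
        f"<h1>{html.escape(state['title'])}</h1>"
        f"<p>{html.escape(state['body'])}</p>"
    )
    # Serialized state: escape "<" so that a "</script>" inside the data
    # cannot close the script element early.
    payload = json.dumps(state).replace("<", "\\u003c")
    return (
        "<!doctype html><html><body>"
        f'<div id="app">{visible}</div>'
        f'<script id="app-state" type="application/json">{payload}</script>'
        "</body></html>"
    )


page = render_page({"title": "Blue widgets", "body": "Hand-made blue widgets."})
# The body text is present twice in the source: once as markup, once as JSON.
print(page.count("Hand-made blue widgets."))  # prints 2
```

The second occurrence is the one that worries people: it is in the page source, but only as data for the client-side framework.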

Why are some SEO practitioners worried about this duplication?

It seems logical: if Google penalizes duplicate content across different pages, why wouldn't it do so within a single page? Some feared that these large JSON blobs would be interpreted as keyword stuffing or hidden text.

This fear was based on a partial understanding of how Googlebot works. The crawler executes JavaScript, accesses the complete DOM, and could technically detect these structural duplicates. But detecting does not mean penalizing.

What does Martin Splitt specifically say about this topic?

Google's position is clear: the engine only looks at the rendered DOM. The embedded JSON data for hydration is not considered indexable content. Googlebot distinguishes between the visual rendering and the application initialization code.

In practice, this means that your __NEXT_DATA__ blocks, your window.__STATE__, or your <script type="application/json"> tags are ignored for indexing. Google does not care whether the data also exists in memory or in a script; what matters is the content rendered in the DOM tree that the user can see.
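The distinction between rendered text and script data can be illustrated with a rough sketch. This is not how Google's pipeline works internally; it only shows, under that caveat, that extracting the text a user would see naturally leaves the hydration JSON out. The `VisibleText` class and the sample markup are assumptions made for the example.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text nodes, skipping anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html_source: str) -> str:
    parser = VisibleText()
    parser.feed(html_source)
    parser.close()
    return " ".join(parser.chunks)


sample = (
    '<div id="app"><h1>Blue widgets</h1></div>'
    '<script id="__NEXT_DATA__" type="application/json">'
    '{"title": "Blue widgets"}</script>'
)
print(visible_text(sample))  # prints: Blue widgets
```

The title is in the source twice, but the extracted text contains it once.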

Server-side rendering with hydration is totally safe for SEO
Modern frameworks (Next.js, Nuxt, SvelteKit) do not create a risk of internal duplication
Google clearly differentiates between visible content and technical application code
This clarification applies to all types of serialized state: props, Redux store, React context, etc.
No special optimization is needed to "hide" this JSON from Google

SEO Expert opinion

Is this statement consistent with field observations?

Absolutely. Sites using Next.js or Gatsby have run for years without any reported penalties tied to __NEXT_DATA__ or its equivalents. If Google actually penalized this practice, server-rendered SPAs would have suffered massively, which is not the case.

Practical tests point the same way: two versions of the same page, one with a large JSON state and one without, rank identically when their HTML content is the same. Ranking depends on the visible DOM, not the underlying application state.

What nuances should we add to this statement?

To be honest, this clarification does not resolve every SEO issue for JavaScript applications. It relates strictly to internal duplication. If your critical content only appears client-side after JavaScript runs, Google has to render the page to see it and may miss part of it, but that's another topic.

Second point: the size of the JSON payload can impact Core Web Vitals. A 500 KB blob slows down parsing, increases the Largest Contentful Paint, and degrades the user experience. Google won't penalize you for duplication, but potentially for poor performance.

Warning: this rule only applies to technical application state. If you inject editorial content solely into JSON without a visible equivalent in the DOM, Google will not index it. Hydration assumes that complete HTML already exists on the server side.

In what cases does this rule not provide complete protection?

If your framework generates minimal HTML and loads all content via JavaScript afterwards, you step outside classic SSR. Google does index the DOM after rendering, but within time and resource limits. A site that takes 8 seconds to display its content risks partial indexing, JSON or not.

Another edge case: sites that serialize sensitive or redundant data in the JSON state. Even if Google ignores this content for indexing, it can pose security issues or unnecessarily inflate the page size. The SEO rule is clear, but the best technical practice may diverge.
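One way to keep sensitive or redundant fields out of the page is to serialize an explicit allow-list rather than the whole state. The sketch below assumes a flat state dictionary; the `serialize_public_state` function and the field names are hypothetical.

```python
import json


def serialize_public_state(state: dict, allowed_keys: set) -> str:
    """Serialize only the allow-listed keys of the state for hydration."""
    public = {key: value for key, value in state.items() if key in allowed_keys}
    # Compact separators trim whitespace; escaping "<" keeps the JSON safe
    # to embed inside a <script> element.
    return json.dumps(public, separators=(",", ":")).replace("<", "\\u003c")


state = {"title": "Blue widgets", "price": 12, "session_token": "secret"}
print(serialize_public_state(state, {"title", "price"}))
```

An allow-list fails closed: a field added to the state later stays out of the page until someone decides it belongs there.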

Practical impact and recommendations

What concrete actions should you take on your tech stack?

Nothing particular if you are already using clean server-side rendering. Continue letting your frameworks generate their hydration state as they do natively. There's no need to try to hide, minify, or obfuscate the embedded JSON for SEO reasons — it’s a waste of time.

Focus on what really matters: ensure that your critical content appears in the initial HTML sent by the server. Use the URL Inspection tool in Search Console or a simple curl request to check what Googlebot receives before any JavaScript runs.
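The curl-style check can be scripted. A naive text search over the raw response would also match phrases that exist only inside the hydration JSON, so this sketch strips script and style elements first. The function names, the `ssr-check/0.1` User-Agent, and the sample markup are assumptions; the regular expressions are a rough approximation, not a full HTML parser.

```python
import html
import re
import urllib.request

SCRIPT_OR_STYLE = re.compile(
    r"<(script|style)\b[^>]*>.*?</\1\s*>", re.IGNORECASE | re.DOTALL
)
TAG = re.compile(r"<[^>]+>")


def fetch_raw_html(url: str) -> str:
    """Download the server response without executing any JavaScript."""
    request = urllib.request.Request(url, headers={"User-Agent": "ssr-check/0.1"})
    with urllib.request.urlopen(request, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def missing_phrases(raw_html: str, phrases: list[str]) -> list[str]:
    """Return the phrases that are absent from the markup outside script/style."""
    without_scripts = SCRIPT_OR_STYLE.sub(" ", raw_html)
    text = html.unescape(TAG.sub(" ", without_scripts))
    normalized = " ".join(text.split()).lower()
    return [p for p in phrases if " ".join(p.split()).lower() not in normalized]


sample = (
    '<div id="app"><h1>Blue widgets</h1></div>'
    '<script type="application/json">{"body": "Only in state"}</script>'
)
print(missing_phrases(sample, ["Blue widgets", "Only in state"]))
```

Against a live page, pass the result of `fetch_raw_html` for your own URL as the first argument. Any phrase returned is content the server HTML does not carry as visible markup.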

What mistakes should be avoided with modern frameworks?

Do not confuse SSR with client-side rendering. If you use React in pure SPA mode (create-react-app without SSR), there is no server-rendered HTML to hydrate: your content is built entirely in the browser, and Google may have difficulty indexing everything. The issue is not the JSON, it's the absence of initial HTML.

Avoid artificially inflating the application state with unnecessary data. Even if Google doesn't care, every KB counts for performance. A mobile user on a 3G connection pays for an excessive payload, and so, indirectly, does your ranking through Core Web Vitals.

How to verify that your implementation is compliant?

Test your page with JavaScript disabled or by fetching it with curl. Essential content must be present in the raw HTML. Then use the URL Inspection tool in Search Console to check that Google sees the complete DOM.

Compare the initial HTML rendering and the rendering after hydration. If the content changes drastically, it's a warning signal. Hydration should make the page interactive, not fill it with content absent from the server HTML. A well-architected site shows the same text before and after JavaScript — only events and interactions should be added.
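The before/after comparison can be roughly quantified. The sketch below assumes you have already extracted the visible text of both versions, for example from the raw server response and from a DOM snapshot saved in the browser's developer tools. The `hydration_delta` function and the sample strings are hypothetical.

```python
from difflib import SequenceMatcher


def hydration_delta(server_text: str, hydrated_text: str):
    """Return (similarity ratio, words that appear only after hydration)."""
    before = server_text.split()
    after = hydrated_text.split()
    matcher = SequenceMatcher(a=before, b=after, autojunk=False)
    added = []
    for op, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "insert" and "replace" both introduce words missing from the server HTML.
        if op in ("insert", "replace"):
            added.extend(after[j1:j2])
    return matcher.ratio(), added


ratio, added = hydration_delta(
    "Blue widgets Hand-made in Lyon",
    "Blue widgets Hand-made in Lyon Add to cart",
)
print(added)  # prints ['Add', 'to', 'cart']
```

A few interface labels appearing after hydration is normal. Whole paragraphs of editorial text in the added list is the warning signal described above.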

Ensure that priority content is present in the source HTML (view-source: or curl)
Test with the URL inspection tool in the Search Console to see what Google actually indexes
Measure the weight of your JSON state: beyond 100-150 KB, question its necessity
Compare the rendering with JavaScript disabled vs. enabled: the delta should be minimal on editorial content
Monitor your Core Web Vitals, particularly LCP, along with lab metrics such as TBT, which can suffer from an excessive payload
Document your SSR architecture to prevent regressions during refactoring
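The payload-size item in this checklist can be automated. The following sketch measures the uncompressed size of each <script type="application/json"> block in a page and flags those above a limit. It does not cover state assigned to a global such as window.__STATE__ in an inline script, and the 150 KB default simply mirrors the upper bound suggested above. The class and function names are assumptions.

```python
from html.parser import HTMLParser


class JsonStateSizes(HTMLParser):
    """Record the byte size of every <script type="application/json"> body."""

    def __init__(self):
        super().__init__()
        self._current = None  # label of the JSON script being read, if any
        self.sizes = {}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "script" and attributes.get("type") == "application/json":
            self._current = attributes.get("id") or f"script-{len(self.sizes)}"
            self.sizes[self._current] = 0

    def handle_data(self, data):
        # Script bodies may arrive in several chunks, so sizes are accumulated.
        if self._current is not None:
            self.sizes[self._current] += len(data.encode("utf-8"))

    def handle_endtag(self, tag):
        if tag == "script":
            self._current = None


def json_state_sizes(html_source: str) -> dict:
    parser = JsonStateSizes()
    parser.feed(html_source)
    parser.close()
    return parser.sizes


def oversized(html_source: str, limit_bytes: int = 150 * 1024) -> dict:
    """Return the JSON blocks whose uncompressed size exceeds the limit."""
    sizes = json_state_sizes(html_source)
    return {name: size for name, size in sizes.items() if size > limit_bytes}
```

Run against the raw server response, `oversized(raw_html)` returns an empty dictionary when every JSON block is within the limit.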

Duplicated application state in JSON does not create any direct SEO risk, but it remains an indicator of overall technical quality. A well-optimized site limits this payload to what is strictly necessary, serves rich HTML from the server, and maintains high performance. If these technical trade-offs between SEO, performance, and hydration seem complex to balance alone, an SEO agency specialized in modern JavaScript architectures can help you avoid costly mistakes and adopt best practices faster.

❓ Frequently Asked Questions

Does the hydration JSON slow down the indexing of my page?

No, Google completely ignores this JSON for indexing. It can, however, slow down client-side parsing and affect Core Web Vitals, but it does not directly affect crawling or ranking.

Should I compress or minify the embedded JSON for SEO?

For SEO, no, since Google does not read it as content. For user-facing performance, yes: use gzip/brotli compression and limit the size of the serialized state.

If I put content only in the JSON without HTML, will it be indexed?

No. Google indexes the rendered DOM, not the application state. If a piece of text never appears in the visible HTML, it will not be taken into account for ranking.

Does this rule apply to pure SPAs without server-side rendering?

The problem with SPAs is not the JSON but the absence of content in the initial HTML. Google crawls the DOM after JavaScript has run, but with time and resource limits.

Can this technique be used to hide sensitive content from Google?

Technically yes, but that is not its intended use. If you really want to exclude content from indexing, use noindex, robots.txt, or authenticated areas, not a JSON hack.
