What does Google say about SEO?

Official statement

During SSR, application state is often serialized in JSON on the page, which duplicates the content once in the JSON and once in the DOM. Google does not consider this problematic duplicate content because only the DOM is taken into account for indexing.
🎥 Source video

Extracted from a Google Search Central video

⏱ 51:17 💬 EN 📅 12/05/2020 ✂ 37 statements
Watch on YouTube (19:22) →
TL;DR

Google confirms that serializing application state as JSON within the page (a common SSR practice) does not count as duplicate content that could be penalized. Only the rendered DOM is indexed, not the JSON embedded in script tags. For React, Vue, or Angular sites with SSR, this means you can continue to hydrate state without fear of semantic dilution or penalties.

What you need to understand

Why is this question arising in the first place?

When you're doing Server-Side Rendering (SSR) with a modern framework, your server generates a complete HTML page with the content already rendered. Up to this point, nothing complicated.

However, for your app to take over on the client side without re-fetching everything, you serialize the application state (the data used for rendering) in a <script type="application/json"> tag. The result: the same content appears twice in the source HTML — once in the visible DOM, once in the JSON.

And naturally, SEO practitioners wondered whether Google would consider this as internal duplicate content, with all the implications: semantic dilution, potential cannibalization, or even a spam signal in extreme cases.

What exactly does Google say about this?

Martin Splitt cuts to the chase: Google only indexes the rendered DOM, not the serialized JSON. Even though the crawler technically sees both, only the visible content in the DOM tree after parsing counts for indexing.

In practical terms, if your SSR injects a 50 KB JSON block with all your React props, Google completely ignores it for semantic ranking. It looks at what displays in the browser after hydration, end of story.

This is a welcome clarification because it removes uncertainty that led some to fiddle with suboptimal solutions — like loading state via a separate endpoint, which disrupts user experience and increases Time to Interactive.

What are the technical implications for a production site?

If you're working on a Next.js, Nuxt, or SvelteKit site with SSR, you can continue to use __NEXT_DATA__, __NUXT__, or the equivalent without worrying. These mechanisms are specifically designed for this: transferring server state to the client.

However, that doesn’t mean anything goes. If your serialized JSON contains sensitive data (tokens, private user info), it remains visible in the HTML source. This is a security issue, not an SEO one, but it's good to keep in mind.

Another nuance: Google does not see the JSON for indexing, but it can still crawl and store it. If you're dumping 200 KB of JSON on every page, it consumes crawl budget for nothing. Optimize the size of your payload even if it doesn't directly impact ranking.

  • Google only indexes the rendered DOM, never the JSON serialized in script tags.
  • Modern frameworks (Next, Nuxt, Gatsby, etc.) can continue to hydrate the state without fear of penalizing duplication.
  • Watch out for sensitive data in the JSON: it remains exposed in the HTML source.
  • An oversized JSON payload can impact crawl budget even if it doesn’t affect semantic indexing.
  • This rule applies to <script type="application/json"> tags or similar, not to executable scripts that could modify the DOM.

SEO Expert opinion

Does this statement align with what we observe on the ground?

Honestly? Yes, and it’s consistent with what we've known about Google's rendering pipeline for years. Googlebot parses the HTML, builds the DOM, executes JS if necessary, and indexes the final result. Non-executable script tags (like type="application/json") are ignored.

What’s new is that Martin Splitt formalizes this clearly. Previously, we had to deduce this behavior from empirical tests and scattered bits of information. Now, we have an official position — and it changes everything for SPA/SSR projects that were still hesitating.

However, I have seen cases where developers serialized JSON inside regular content elements (like a <div style="display:none"> with JSON inside). In that case, Google can indeed treat it as hidden content, and that could pose a problem. The key is that the JSON should live in a non-rendered context (a script tag), not in an element that belongs to the visible content tree.

What limits should you keep in mind?

First, this rule only applies to content serialized in script tags. If you're duplicating your content elsewhere — like in oversized data-* attributes, within hidden HTML comments, or in hidden iframes — that's another story.

Next, be careful not to confuse "not a problem for indexing" with "no performance impact." A 300 KB JSON slows down Time to First Byte, increases page size, and can degrade Core Web Vitals. It’s not penalized as duplicate content, but it can hurt you on other criteria.

Last point: Google says it doesn’t index JSON, but what about other engines? Do Bing, Yandex, Baidu follow the same logic? [To be verified] — we don't have an equivalent official statement from them. If you're optimizing for a multilingual market or alternative engines, keep a margin of caution.

Are there cases where this rule no longer holds?

Yes. If your serialized JSON contains structured content different from what displays in the DOM, it can create inconsistencies. For example, if your JSON lists 50 products but your DOM only displays 10, Google will index the 10, not the 50.

Another edge case: sites using JSON-LD for Schema.org markup. In this case, it’s intended — Google reads this JSON to extract structured data. But if you mix valid JSON-LD with serialized application JSON, make sure both remain in separate, correctly typed tags.
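To make the separation concrete, here is a hedged sketch (the product data and tag id are invented for illustration) of the two correctly typed tags side by side:

```javascript
// JSON-LD: structured data Google deliberately reads for rich results.
const jsonLd = {
  '@context': 'https://schema.org',
  '@type': 'Product',
  name: 'Example product',
};

// Application state: JSON Google ignores for indexing.
const appState = { productId: 42, inCart: false };

// Keep them in separate tags with distinct type attributes:
const ldTag = `<script type="application/ld+json">${JSON.stringify(jsonLd)}</script>`;
const stateTag = `<script type="application/json" id="__APP_STATE__">${JSON.stringify(appState)}</script>`;
```

The `type` attribute is what tells the parser which block is structured data and which is inert hydration payload, so mixing the two in one tag defeats both.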

Note: If your SSR generates different content on the server and client side (e.g., an incomplete server rendering followed by hydration that loads more content), Google may index an incomplete version. Ensure that your final DOM after hydration matches what you want to index.

Practical impact and recommendations

What should you check on your site?

First step: open the source HTML (Ctrl+U or view source) of your SSR pages and locate the <script> tags containing your serialized state. Check that they have a type="application/json" attribute or equivalent — never type="text/javascript" if it’s just passive JSON.
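For a quick spot check on an HTML dump, a rough filter like the following can flag inline scripts that embed JSON-looking content without an inert type (assumption: regex matching is fine for a manual audit, not a substitute for a real HTML parser):

```javascript
// Flag inline <script> tags whose body starts with JSON ({ or [)
// but that lack an inert type such as application/json or ld+json.
function findUntypedJsonScripts(html) {
  const scripts = html.match(/<script\b[^>]*>[\s\S]*?<\/script>/gi) || [];
  return scripts.filter((tag) => {
    const inert = /type\s*=\s*["']application\/(ld\+)?json["']/i.test(tag);
    const looksLikeJson = />\s*[{[]/.test(tag);
    return looksLikeJson && !inert;
  });
}
```

Anything this returns is worth a manual look: passive JSON in an untyped tag is parsed as executable JavaScript by the browser.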

Next, use the URL inspection tool in Search Console and compare the rendered DOM ("More info" tab > "Rendered page") with your source HTML. If you see major differences between the two, it means your hydration modifies the content — and that’s a potential problem.

Also check the size of your JSON payload. If a page weighs 150 KB with 100 KB of serialized JSON, you have an architectural problem. Optimize by only serializing the data strictly necessary for hydration, not your entire Redux or Vuex state.
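In practice that means an explicit picking step before serialization. A sketch (store shape and field names are hypothetical):

```javascript
// Serialize only what hydration actually needs, not the whole store.
function pickHydrationState(store) {
  const { userId, locale, cartIds } = store;
  return { userId, locale, cartIds }; // IDs and flags, not full records
}

const store = {
  userId: 7,
  locale: 'en',
  cartIds: [3, 9],
  // Bulky server-side data that the client can refetch on demand —
  // this should never end up in the inline payload:
  productCatalog: new Array(1000).fill({ description: 'long text' }),
};

const payload = JSON.stringify(pickHydrationState(store));
```

The client then rehydrates from `payload` and fetches heavy data lazily, keeping the inline JSON a few hundred bytes instead of tens of kilobytes.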

What mistakes should you absolutely avoid?

Never serialize your JSON in a visible DOM element (even hidden with CSS). This includes <div style="display:none">, misused <template>, or oversized data-* attributes. Google can interpret this as cloaking or hidden content, with the penalties that come with it.

Also avoid serializing redundant data. If your JSON contains the exact same text as your DOM, word for word, you're wasting bandwidth and crawl budget. The idea is to serialize the application state (IDs, flags, small props), not to re-duplicate all the textual content.

Last classic pitfall: badly encoded inline scripts. If your JSON contains special characters (quotes, angle brackets, slashes) and you inject it incorrectly without escaping, you risk breaking the HTML or opening XSS vulnerabilities. Use your framework’s escaping functions (serialize-javascript, JSON.stringify + escape, etc.).
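The standard precaution, independent of any particular framework, is to escape the characters that could close the surrounding script tag or break the parser. A minimal sketch:

```javascript
// Escape characters that are dangerous when JSON is inlined in HTML:
// "<" and ">" (could form "</script>") and the JS line separators
// U+2028/U+2029 (valid in JSON strings, illegal in older JS parsers).
function safeSerialize(state) {
  return JSON.stringify(state)
    .replace(/</g, '\\u003c')
    .replace(/>/g, '\\u003e')
    .replace(/\u2028/g, '\\u2028')
    .replace(/\u2029/g, '\\u2029');
}

// A hostile value like '</script><script>alert(1)</script>' can no
// longer terminate the surrounding tag, yet JSON.parse on the client
// still recovers the original string unchanged.
const out = safeSerialize({ bio: '</script><script>alert(1)</script>' });
```

Because `\u003c` is a legal JSON string escape, the round trip is lossless: the client parses exactly the same data, but the raw HTML never contains a literal `</script>`.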

How can you ensure everything is working correctly?

Run an audit with Screaming Frog or Sitebulb with JavaScript mode enabled. Check that the indexable content (titles, text, structured data) is identical between the source HTML and the rendered DOM. If you see differences, dig deeper: it’s either a hydration issue or an SSR bug.

Also use Lighthouse or WebPageTest to measure the impact of serialized JSON on performance. If your Time to Interactive explodes due to a massive JSON payload, optimization is needed — either by lazy-loading certain data or moving state server-side (sessions, cookies).

Finally, test with varied User-Agents. Google says it ignores JSON, but what about Googlebot-Mobile vs Desktop? Third-party bots? A quick test with curl -A "Googlebot" will show you exactly what the crawler receives.

  • Ensure your serialized JSON is in a correctly typed, non-executable script tag (type="application/json" or equivalent), never in a rendered DOM element.
