What does Google say about SEO?

Official statement

During SSR, application state is often serialized in JSON on the page, which duplicates the content once in the JSON and once in the DOM. Google does not consider this problematic duplicate content because only the DOM is taken into account for indexing.
🎥 Source video

Extracted from a Google Search Central video

⏱ 51:17 💬 EN 📅 12/05/2020 ✂ 37 statements
Watch on YouTube (19:22) →
TL;DR

Google confirms that serializing application state as JSON within the page (a common SSR practice) does not count as duplicate content that could be penalized. Only the rendered DOM is indexed, not the JSON embedded in script tags. For React, Vue, or Angular sites with SSR, this means you can continue to hydrate state without fear of semantic dilution or penalties.

What you need to understand

Why is this question arising in the first place?

When you're doing Server-Side Rendering (SSR) with a modern framework, your server generates a complete HTML page with the content already rendered. Up to this point, nothing complicated.

However, for your app to take over on the client side without re-fetching everything, you serialize the application state (the data used for rendering) in a <script type="application/json"> tag. The result: the same content appears twice in the source HTML — once in the visible DOM, once in the JSON.

And naturally, SEO practitioners wondered whether Google would consider this as internal duplicate content, with all the implications: semantic dilution, potential cannibalization, or even a spam signal in extreme cases.

What exactly does Google say about this?

Martin Splitt cuts to the chase: Google only indexes the rendered DOM, not the serialized JSON. Even though the crawler technically sees both, only the visible content in the DOM tree after parsing counts for indexing.

In practical terms, if your SSR injects a 50 KB JSON block with all your React props, Google completely ignores it for semantic ranking. It looks at what displays in the browser after hydration, end of story.

This is a welcome clarification because it removes uncertainty that led some to fiddle with suboptimal solutions — like loading state via a separate endpoint, which disrupts user experience and increases Time to Interactive.

What are the technical implications for a production site?

If you're working on a Next.js, Nuxt, or SvelteKit site with SSR, you can continue to use __NEXT_DATA__, __NUXT__, or the equivalent without worrying. These mechanisms are specifically designed for this: transferring server state to the client.

However, that doesn’t mean anything goes. If your serialized JSON contains sensitive data (tokens, private user info), it remains visible in the HTML source. This is a security issue, not an SEO one, but it's good to keep in mind.

Another nuance: Google does not see the JSON for indexing, but it can still crawl and store it. If you're dumping 200 KB of JSON on every page, it consumes crawl budget for nothing. Optimize the size of your payload even if it doesn't directly impact ranking.

  • Google only indexes the rendered DOM, never the JSON serialized in script tags.
  • Modern frameworks (Next, Nuxt, Gatsby, etc.) can continue to hydrate the state without fear of penalizing duplication.
  • Watch out for sensitive data in the JSON: it remains exposed in the HTML source.
  • An oversized JSON payload can impact crawl budget even if it doesn’t affect semantic indexing.
  • This rule applies to <script type="application/json"> tags or similar, not to executable scripts that could modify the DOM.

SEO Expert opinion

Does this statement align with what we observe on the ground?

Honestly? Yes, and it’s consistent with what we've known about Google's rendering pipeline for years. Googlebot parses the HTML, builds the DOM, executes JS if necessary, and indexes the final result. Non-executable script tags (like type="application/json") are ignored.

What’s new is that Martin Splitt formalizes this clearly. Previously, we had to deduce this behavior from empirical tests and scattered bits of information. Now, we have an official position — and it changes everything for SPA/SSR projects that were still hesitating.

However, I have seen cases where developers serialized JSON inside regular content elements (like a <div style="display:none"> with JSON inside). In that case, Google can indeed treat it as hidden content, and that could pose a problem. The key is that the JSON should live in a non-rendered context (a script tag), not in an element that belongs to the visible content tree.

What limits should you keep in mind?

First, this rule only applies to content serialized in script tags. If you're duplicating your content elsewhere — like in oversized data-* attributes, within hidden HTML comments, or in hidden iframes — that's another story.

Next, be careful not to confuse "not a problem for indexing" with "no performance impact." A 300 KB JSON slows down Time to First Byte, increases page size, and can degrade Core Web Vitals. It’s not penalized as duplicate content, but it can hurt you on other criteria.

Last point: Google says it doesn’t index JSON, but what about other engines? Do Bing, Yandex, Baidu follow the same logic? [To be verified] — we don't have an equivalent official statement from them. If you're optimizing for a multilingual market or alternative engines, keep a margin of caution.

Are there cases where this rule no longer holds?

Yes. If your serialized JSON contains structured content different from what displays in the DOM, it can create inconsistencies. For example, if your JSON lists 50 products but your DOM only displays 10, Google will index the 10, not the 50.

Another edge case: sites using JSON-LD for Schema.org markup. In this case, it’s intended — Google reads this JSON to extract structured data. But if you mix valid JSON-LD with serialized application JSON, make sure both remain in separate, correctly typed tags.
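To make the separation concrete, here is a hedged sketch (the product data and tag id are invented for illustration) of the two correctly typed tags side by side:

```javascript
// JSON-LD: structured data Google deliberately reads for rich results.
const jsonLd = {
  '@context': 'https://schema.org',
  '@type': 'Product',
  name: 'Example product',
};

// Application state: JSON Google ignores for indexing.
const appState = { productId: 42, inCart: false };

// Keep them in separate tags with distinct type attributes:
const ldTag = `<script type="application/ld+json">${JSON.stringify(jsonLd)}</script>`;
const stateTag = `<script type="application/json" id="__APP_STATE__">${JSON.stringify(appState)}</script>`;
```

The `type` attribute is what tells the parser which block is structured data and which is inert hydration payload, so mixing the two in one tag defeats both.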

Note: If your SSR generates different content on the server and client side (e.g., an incomplete server rendering followed by hydration that loads more content), Google may index an incomplete version. Ensure that your final DOM after hydration matches what you want to index.

Practical impact and recommendations

What should you check on your site?

First step: open the source HTML (Ctrl+U or view source) of your SSR pages and locate the <script> tags containing your serialized state. Check that they have a type="application/json" attribute or equivalent — never type="text/javascript" if it’s just passive JSON.
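For a quick spot check on an HTML dump, a rough filter like the following can flag inline scripts that embed JSON-looking content without an inert type (assumption: regex matching is fine for a manual audit, not a substitute for a real HTML parser):

```javascript
// Flag inline <script> tags whose body starts with JSON ({ or [)
// but that lack an inert type such as application/json or ld+json.
function findUntypedJsonScripts(html) {
  const scripts = html.match(/<script\b[^>]*>[\s\S]*?<\/script>/gi) || [];
  return scripts.filter((tag) => {
    const inert = /type\s*=\s*["']application\/(ld\+)?json["']/i.test(tag);
    const looksLikeJson = />\s*[{[]/.test(tag);
    return looksLikeJson && !inert;
  });
}
```

Anything this returns is worth a manual look: passive JSON in an untyped tag is parsed as executable JavaScript by the browser.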

Next, use the URL inspection tool in Search Console and compare the rendered DOM ("More info" tab > "Rendered page") with your source HTML. If you see major differences between the two, it means your hydration modifies the content — and that’s a potential problem.

Also check the size of your JSON payload. If a page weighs 150 KB with 100 KB of serialized JSON, you have an architectural problem. Optimize by only serializing the data strictly necessary for hydration, not your entire Redux or Vuex state.
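In practice that means an explicit picking step before serialization. A sketch (store shape and field names are hypothetical):

```javascript
// Serialize only what hydration actually needs, not the whole store.
function pickHydrationState(store) {
  const { userId, locale, cartIds } = store;
  return { userId, locale, cartIds }; // IDs and flags, not full records
}

const store = {
  userId: 7,
  locale: 'en',
  cartIds: [3, 9],
  // Bulky server-side data that the client can refetch on demand —
  // this should never end up in the inline payload:
  productCatalog: new Array(1000).fill({ description: 'long text' }),
};

const payload = JSON.stringify(pickHydrationState(store));
```

The client then rehydrates from `payload` and fetches heavy data lazily, keeping the inline JSON a few hundred bytes instead of tens of kilobytes.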

What mistakes should you absolutely avoid?

Never serialize your JSON in a visible DOM element (even hidden with CSS). This includes <div style="display:none">, misused <template>, or oversized data-* attributes. Google can interpret this as cloaking or hidden content, with the penalties that come with it.

Also avoid serializing redundant data. If your JSON contains the exact same text as your DOM, word for word, you're wasting bandwidth and crawl budget. The idea is to serialize the application state (IDs, flags, small props), not to re-duplicate all the textual content.

Last classic pitfall: badly encoded inline scripts. If your JSON contains special characters (quotes, angle brackets, slashes) and you inject it incorrectly without escaping, you risk breaking the HTML or opening XSS vulnerabilities. Use your framework’s escaping functions (serialize-javascript, JSON.stringify + escape, etc.).
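The standard precaution, independent of any particular framework, is to escape the characters that could close the surrounding script tag or break the parser. A minimal sketch:

```javascript
// Escape characters that are dangerous when JSON is inlined in HTML:
// "<" and ">" (could form "</script>") and the JS line separators
// U+2028/U+2029 (valid in JSON strings, illegal in older JS parsers).
function safeSerialize(state) {
  return JSON.stringify(state)
    .replace(/</g, '\\u003c')
    .replace(/>/g, '\\u003e')
    .replace(/\u2028/g, '\\u2028')
    .replace(/\u2029/g, '\\u2029');
}

// A hostile value like '</script><script>alert(1)</script>' can no
// longer terminate the surrounding tag, yet JSON.parse on the client
// still recovers the original string unchanged.
const out = safeSerialize({ bio: '</script><script>alert(1)</script>' });
```

Because `\u003c` is a legal JSON string escape, the round trip is lossless: the client parses exactly the same data, but the raw HTML never contains a literal `</script>`.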

How can you ensure everything is working correctly?

Run an audit with Screaming Frog or Sitebulb with JavaScript mode enabled. Check that the indexable content (titles, text, structured data) is identical between the source HTML and the rendered DOM. If you see differences, dig deeper: it’s either a hydration issue or an SSR bug.

Also use Lighthouse or WebPageTest to measure the impact of serialized JSON on performance. If your Time to Interactive explodes due to a massive JSON payload, optimization is needed — either by lazy-loading certain data or moving state server-side (sessions, cookies).

Finally, test with varied User-Agents. Google says it ignores JSON, but what about Googlebot-Mobile vs Desktop? Third-party bots? A quick test with curl -A "Googlebot" will show you exactly what the crawler receives.

  • Ensure your serialized JSON is in a correctly typed, non-executable script tag (type="application/json" or equivalent), never in a rendered DOM element.
