
Official statement

Blocking the crawl of JSON files via robots.txt will prevent the indexing of content that is visible only after rendering on pages that require these JSON files, both on your site and on third-party sites using your APIs.
717:14
🎥 Source video

Extracted from a Google Search Central video

⏱ 912:44 💬 EN 📅 05/03/2021 ✂ 20 statements
Watch on YouTube (717:14) →
Other statements from this video (19)
  1. 27:21 Why do your Core Web Vitals take 28 days to update in Search Console?
  2. 36:39 Do you really need to lab-test your Core Web Vitals to avoid regressions?
  3. 98:33 Do CSS animations really hurt your Core Web Vitals?
  4. 121:49 Will Core Web Vitals keep changing, and how can you anticipate the next updates?
  5. 146:15 Are per-city pages really all doorway pages condemned by Google?
  6. 185:36 Does crawl budget really depend on your server speed?
  7. 203:58 Should you really start small to unlock your crawl budget?
  8. 228:24 Do you really need to regenerate your sitemaps to remove obsolete URLs?
  9. 259:19 Why does Google refuse to provide Voice Search data in Search Console?
  10. 295:52 How can you force Google to refresh your JavaScript and CSS files during rendering?
  11. 317:32 How do you map URLs and verify redirects during a migration so you don't lose rankings?
  12. 353:48 Should you really fill in dates in structured data?
  13. 390:26 Should you really change an article's date with every update?
  14. 432:21 Should you really limit the number of H1 tags on a page?
  15. 450:30 Are headings really as important as Google thinks?
  16. 555:58 Are LSI keywords really useful for Google SEO?
  17. 585:16 How many links per page do you need to optimize internal PageRank?
  18. 674:32 Do JSON requests really eat into your crawl budget?
  19. 789:13 Can Google tell a URL is a duplicate without even crawling it?
Official statement from 05/03/2021 (5 years ago)
TL;DR

Blocking JSON via robots.txt prevents Google from indexing content that relies on these files after rendering. This rule applies both to your own site and to third-party sites using your public APIs. Specifically, if your visible content depends on data loaded from JSON files, blocking these resources makes your pages invisible to Google.

What you need to understand

Why does blocking JSON cause indexing problems?

Google operates in two stages: an initial crawl, then JavaScript rendering. When Googlebot retrieves your raw HTML, it then starts a rendering process to execute the JS and load dynamic resources.

If your JSON files are blocked in robots.txt, the bot can download your HTML but cannot retrieve the data needed for the final rendering. The result: it indexes an empty or incomplete page, even though everything works visually on the user side.
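A quick way to check whether a given URL is blocked for Googlebot is Python's built-in urllib.robotparser. A minimal sketch; the robots.txt rules and URLs here are hypothetical:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks everything under /api/
robots_txt = """
User-agent: *
Disallow: /api/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# The page itself is crawlable...
print(rp.can_fetch("Googlebot", "https://example.com/products/"))         # True
# ...but the JSON it needs at render time is not
print(rp.can_fetch("Googlebot", "https://example.com/api/products.json")) # False
```

This is exactly the failure mode described above: the HTML is fetchable, but the resource the rendered page depends on is not.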

How does this rule impact sites using modern frameworks?

Applications built with React, Vue, or Angular often load their content via JSON API calls. If you block /api/*.json, for example, Google will never see the content generated after hydration.

This is particularly critical for e-commerce sites, where product listings, prices, and availability are loaded dynamically. Without access to the JSON, Google indexes product pages without descriptions or prices: essentially invisible in the results.
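For illustration, a robots.txt along these lines (paths hypothetical) would trigger exactly this problem, while the corrected version only keeps genuinely private endpoints out of the crawl:

```
# Problematic: blocks the JSON that renders product pages
User-agent: *
Disallow: /api/

# Safer: block only private endpoints, keep public content JSON crawlable
User-agent: *
Disallow: /api/internal/
```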

Are third-party sites using your APIs affected as well?

Yes, and it's less intuitive. If you provide a public API that is consumed by other sites, blocking your JSON endpoints prevents the indexing of content displayed on those third-party sites.

Imagine a review aggregator using your API: if you block /reviews.json, the aggregated content will not be indexable by Google, even though it is not on your own site. You indirectly penalize your partners.

  • Blocking via robots.txt applies to all crawlers that respect this file, not just Googlebot
  • Blocked JSON files are not rendered, so the content that depends on them remains invisible to the index
  • This rule concerns both your site and third-party sites consuming your public APIs
  • Recommended alternative: only block JSON containing sensitive data, never JSON used to display public content

SEO Expert opinion

Does this statement truly reflect observed behavior in the field?

Yes, absolutely. Technical audits regularly turn up sites whose misconfigured robots.txt blocks /wp-json/, /api/, or /*.json out of excessive caution.

The problem is that many developers believe they are "protecting" their data by blocking these endpoints, without realizing they are sabotaging their own indexing. I've seen Shopify stores lose 40% of their organic traffic after mistakenly blocking their collection JSON files.

Are there cases where blocking JSON remains legitimate?

Of course. If your JSON contains sensitive data (user info, B2B pricing, internal stock), it should be blocked; but in that case, do not use it to display indexable public content.

The distinction is simple: JSON used for client-side rendering of visible content = do not block. Purely backend or admin JSON = it's up to you. [To verify]: Google has never specified whether authentication mechanisms (tokens, headers) are enough to sidestep this issue without blocking in robots.txt.

What is the acceptable margin of error in this configuration?

None. Unlike other SEO signals where you can compensate (weak backlinks but excellent content), blocking a critical JSON amounts to making your page invisible. It's binary.

Always test your robots.txt modifications with Search Console > URL Inspection > Test live URL. If the rendered output is empty while your page works normally, you've blocked an essential resource.

Caution: some CMS platforms (notably WordPress) generate default robots.txt files that block /wp-json/; check for this rule if you use a modern theme loading content via the REST API.

Practical impact and recommendations

How can you quickly audit your current robots.txt rules?

Download your robots.txt and look for every line containing .json, /api/, /data/, or /content/. For each Disallow rule found, ask yourself: "Does this file serve to display content visible to users?"

Then use the robots.txt testing tool in Search Console. Paste a JSON URL you suspect is blocked and check whether Googlebot can access it. If it's blocked while that JSON loads your product listings, you've found your culprit.
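The manual scan described above can be sketched as a small script. Assumptions: the robots.txt content is already in hand as a string, and the risky patterns are the ones listed in this section:

```python
import re

# Substrings that suggest a rule may block render-critical JSON
RISKY_PATTERNS = (".json", "/api/", "/data/", "/content/")

def risky_disallow_rules(robots_txt: str) -> list[str]:
    """Return Disallow paths that match one of the risky patterns."""
    flagged = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        m = re.match(r"(?i)disallow\s*:\s*(\S+)", line)
        if m and any(p in m.group(1).lower() for p in RISKY_PATTERNS):
            flagged.append(m.group(1))
    return flagged

sample = """
User-agent: *
Disallow: /wp-admin/
Disallow: /api/
Disallow: /assets/data.json
"""
print(risky_disallow_rules(sample))  # ['/api/', '/assets/data.json']
```

Every path this flags is a candidate to paste into the Search Console tester; a flagged rule is not automatically wrong, only worth a manual check.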

What should you do if you discover critical JSON blocked from indexing?

Immediately remove the corresponding Disallow rule in robots.txt. Then force a quick reindexing via Search Console by requesting inspection of the affected pages.

Monitor your server logs over the following days: you should see Googlebot crawling the previously blocked JSON files. If not within 72 hours, it may indicate that this rule wasn't the only cause (also check HTTP headers, X-Robots-Tag, etc.).
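Checking the logs for that signal can be as simple as filtering access-log lines for Googlebot hits on JSON paths. A sketch assuming a combined-log-style format; the sample lines are hypothetical:

```python
def googlebot_json_hits(log_lines):
    """Keep lines where a Googlebot user agent requested a .json URL."""
    return [line for line in log_lines
            if "Googlebot" in line and ".json" in line]

sample_log = [
    '66.249.66.1 - - [10/May/2021:08:01:12 +0000] "GET /api/products.json HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
    '203.0.113.7 - - [10/May/2021:08:01:15 +0000] "GET /api/products.json HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '66.249.66.1 - - [10/May/2021:08:02:03 +0000] "GET /index.html HTTP/1.1" 200 2048 "-" "Googlebot/2.1"',
]
print(len(googlebot_json_hits(sample_log)))  # 1
```

In production you would also verify that the requester really is Googlebot (via reverse DNS lookup), since the user-agent string can be spoofed.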

What strategy should you adopt to secure your APIs without blocking indexing?

For public data (product listings, articles, reviews), keep JSON accessible without restriction. For sensitive data, consider token authentication or serving these JSON files from a non-public subdomain.

You can also implement server-side rendering (SSR) or static site generation (SSG) to ensure that critical content is present in the initial HTML, without relying on JavaScript rendering. Less elegant technically, but much more robust from an SEO perspective.
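The SSR idea boils down to fetching the data on the server and inlining it into the HTML before the response is sent, so the crawler never needs the JSON at all. A minimal sketch with hypothetical product data and template:

```python
import html

def render_product_page(product: dict) -> str:
    """Server-side: inline product data directly into the HTML response."""
    return (
        "<html><body>"
        f"<h1>{html.escape(product['name'])}</h1>"
        f"<p class='price'>{product['price']:.2f} EUR</p>"
        f"<p>{html.escape(product['description'])}</p>"
        "</body></html>"
    )

# Data that would otherwise be fetched client-side from a JSON endpoint
product = {"name": "Blue Widget", "price": 19.9, "description": "A sturdy widget."}
page = render_product_page(product)
print("Blue Widget" in page)  # True: content is in the initial HTML
```

With this approach, blocking the JSON endpoint in robots.txt no longer affects indexing, because the indexable content never depends on the rendering phase.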

  • Audit robots.txt to identify all rules blocking .json or \/api\/
  • Test each blocked JSON URL using the Search Console robots.txt tool
  • Remove Disallow rules affecting JSONs serving visible content
  • Verify actual rendering with “Test live URL” after modification
  • Monitor Googlebot logs to confirm the crawling of unblocked JSONs
  • Consider SSR/SSG to reduce reliance on JavaScript rendering

Blocking JSON via robots.txt is a common mistake with serious consequences: indexed empty pages, lost visibility, and traffic decline. Auditing your robots.txt rules should be a priority in any technical SEO diagnosis. If your technical stack relies heavily on JSON APIs and you lack the expertise to secure these flows while preserving indexing, consulting an SEO agency specialized in JavaScript architecture will help you avoid costly mistakes and ensure an optimal configuration.

❓ Frequently Asked Questions

Does blocking a JSON file in robots.txt affect only Googlebot, or other engines too?
All crawlers that respect robots.txt (Bing, Yandex, etc.) will be affected. If you block a JSON file, no engine will be able to index the content that depends on it.
Can you block JSON partially, for example only for certain crawlers?
Yes, robots.txt allows per-User-agent rules. You could in theory allow Googlebot while blocking other bots, but this is rarely relevant for indexable public content.
If my JSON is accessible but returns a 401 or 403, is that equivalent to a robots.txt block?
No. A 401/403 status code signals an HTTP-level access restriction, which Google may interpret differently. Robots.txt is an explicit, deliberate no-crawl signal.
Are JSON files loaded via fetch() on the client side covered by this rule?
Yes, absolutely. Whatever the method (fetch, XMLHttpRequest, axios), if the JSON is blocked in robots.txt, Googlebot cannot retrieve it during rendering.
How can I tell whether my pages are indexed with or without the JSON content?
Use the "URL Inspection" tool in Search Console and compare the rendering captured by Google with your actual page. If entire sections are missing, check your JSON files.
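The per-User-agent rules mentioned in the FAQ look like this in robots.txt (paths hypothetical). A crawler follows the most specific User-agent group that matches it, so Googlebot obeys its own group here rather than the `*` group:

```
# Googlebot may crawl the public JSON
User-agent: Googlebot
Allow: /api/

# Everyone else is kept out of /api/
User-agent: *
Disallow: /api/
```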

