
Official statement

If a JavaScript request to an API (like /api/cats) is blocked by robots.txt, Googlebot will not be able to load it even if it works in browsers. Browsers ignore robots.txt, but Google respects it, which can create empty pages in the index.
20:07
🎥 Source video

Extracted from a Google Search Central video

⏱ 46:02 💬 EN 📅 25/11/2020 ✂ 29 statements
Watch on YouTube (20:07) →
Other statements from this video (28)
  1. 1:02 Does Google really render all JavaScript pages, regardless of their architecture?
  2. 1:02 Does Google really render ALL JavaScript, even without initial server-side content?
  3. 2:05 How can you verify that Googlebot is really crawling your site?
  4. 2:05 How can you verify that Googlebot is really Googlebot and not an impostor?
  5. 2:36 Does Google really limit CPU time during JavaScript rendering?
  6. 2:36 Does Google really limit CPU time during JavaScript rendering?
  7. 3:09 Should you stop optimizing for bots and focus solely on the user?
  8. 5:17 Does the CSS content-visibility property affect rendering in Google?
  9. 8:53 How can you measure Core Web Vitals on Firefox and Safari without a native API?
  10. 11:00 How long does Google really wait before giving up on JavaScript rendering?
  11. 11:00 How long does Googlebot really wait for JavaScript rendering?
  12. 20:07 AJAX works for SEO, but should you really use it?
  13. 21:10 Can blocking JavaScript really prevent Google from indexing all the content on your pages?
  14. 24:48 Has dynamic prerendering become a trap for indexing?
  15. 26:25 Why can your removed resources destroy your indexing with prerendering?
  16. 26:47 What does Google really do with your initial HTML before JavaScript rendering?
  17. 27:28 Does Google really analyze everything in the initial HTML before rendering?
  18. 27:59 Why does Google skip JavaScript rendering if your noindex tag appears in the initial HTML?
  19. 27:59 Why can a JavaScript 404 page cause your entire site to be deindexed?
  20. 28:30 Why does Google refuse to render JavaScript if the initial HTML contains a noindex meta tag?
  21. 30:00 Does Google really compare the initial AND rendered HTML for canonicalization?
  22. 30:01 Does Google really detect duplicate content after JavaScript rendering?
  23. 31:36 Are GET APIs really cached by Google like other resources?
  24. 31:36 Does Google really cache POST requests during JavaScript rendering?
  25. 34:47 Does Google really index all pages after JavaScript rendering?
  26. 35:19 Does Google really render 100% of JavaScript pages before indexing?
  27. 36:51 Why do your failing APIs sabotage your Google indexing?
  28. 37:12 Is structured data on noindex pages really lost to Google?
Official statement from 25/11/2020 (5 years ago)
TL;DR

Googlebot strictly adheres to the robots.txt file, including for JavaScript API requests. If you block /api/ in robots.txt, your pages won't load data on Google's side, even if they display normally in Chrome. The result: empty pages in the index while everything seems functional during your browser tests.

What you need to understand

How do browsers and Googlebot handle robots.txt differently?

Modern browsers completely ignore the robots.txt file. When you test your site in Chrome, Firefox, or Safari, each JavaScript request to your APIs goes through unfiltered. That’s why your page correctly displays the list of products, customer reviews, or price data.

Googlebot, on the other hand, strictly follows the robots.txt directives before executing any script. If an API URL is blocked, the bot doesn’t access it — period. The rendering on Google’s side then fails to load dynamic data, leaving you with empty HTML in the index.

How does this blocking create ghost pages in the index?

Imagine an e-commerce site that loads its product listings via fetch('/api/products/12345'). If robots.txt contains Disallow: /api/, Googlebot downloads the initial HTML, executes the JavaScript... but outright blocks the API request.

The DOM therefore remains skeletal: no product title, no description, no price. Google indexes this empty shell. When you test it in your browser, you see the complete page and think everything is fine. This is the classic manual testing trap that doesn’t reflect the realities of crawling.
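To make the scenario concrete, here is a minimal sketch of that pattern (the robots.txt rule, the product endpoint, and the markup are the hypothetical ones from the example above):

    # robots.txt: the overly broad rule from the example
    User-agent: *
    Disallow: /api/

    // product-page.ts: client-side rendering that depends on the blocked endpoint.
    // In Chrome this request succeeds; Googlebot drops it before it is even sent,
    // so the container below stays empty in the rendered HTML that gets indexed.
    async function renderProduct(productId: string): Promise<void> {
      const response = await fetch(`/api/products/${productId}`); // blocked for Googlebot by Disallow: /api/
      const product = await response.json();
      const container = document.querySelector('#app');
      if (container) {
        container.innerHTML = `<h1>${product.title}</h1><p>${product.price}</p>`;
      }
    }

    renderProduct('12345').catch(console.error);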

Why are so many sites unknowingly blocking their APIs?

Many robots.txt files are automatically generated by CMSs or frameworks with default “security” rules. Developers block /api/ thinking they are protecting their data or preventing unnecessary crawling.

Other times, it’s a historical remnant: the site was built using pure server-side PHP and then migrated to React/Vue/Angular without cleaning up robots.txt. The result: critical endpoints remain blocked even though they are now essential for client rendering.
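A typical inherited file looks something like this (a hypothetical example, not taken from any specific CMS):

    # robots.txt left over from the server-side era: harmless then, damaging after a
    # migration to client-side rendering, because /api/ now feeds the visible content.
    User-agent: *
    Disallow: /admin/
    Disallow: /cgi-bin/
    Disallow: /api/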

  • Browsers never check robots.txt — your manual tests always pass
  • Googlebot blocks any API request listed in Disallow, even for JavaScript rendering
  • A poorly configured robots.txt generates empty pages in the index despite a functional site
  • The problem is invisible without testing via Search Console or a Google rendering tool
  • Modern frameworks amplify this risk by increasing client-side API calls

SEO Expert opinion

Is this statement consistent with what we observe in the field?

Absolutely. We regularly see SPAs or Jamstack sites with catastrophic indexing rates: 30% of pages indexed while the sitemap lists 10,000. When inspecting them with the URL Inspection tool in Search Console, the rendered HTML shows an empty <div id="app"></div>.

The diagnosis? A Disallow: /api/ or Disallow: /_next/data/ in robots.txt. Developers do not think “SEO” when configuring these rules — they think security, performance, or they copy-paste a template. And it breaks indexing without anyone realizing it for months.
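In practice, the gap between the two renderings looks something like this (a sketch with an invented product, comparing what the browser builds with what Search Console reports as the rendered HTML when /api/ is disallowed):

    <!-- In the browser, after the API call succeeds -->
    <div id="app">
      <h1>Acme Widget 3000</h1>
      <p>€49.90 (in stock)</p>
    </div>

    <!-- In the Google-rendered HTML, with the API call blocked by robots.txt -->
    <div id="app"></div>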

What nuances should be added to this rule?

First point: if your content is already present in the initial HTML (SSR, pre-rendering, progressive hydration), blocking APIs has less impact. Google reads server-side content, even if subsequent JavaScript enhancements fail. But it remains a risky game — certain elements (dynamic prices, stock, reviews) may be missing.

Second nuance: some blocked endpoints such as /api/analytics, /api/tracking, or /api/user-prefs have no SEO impact and can legitimately remain blocked. The issue is that we often see overly broad Disallow: /api/ rules that block everything. Check on a case-by-case basis: each endpoint should be assessed for its role in the visible rendering.

When does this rule not apply?

If you are using strict server-side rendering (Next.js getServerSideProps, Nuxt asyncData server-side, classic PHP), robots.txt blocks nothing since data is injected before the HTML is sent. Googlebot receives complete content without executing JavaScript.
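For reference, here is a minimal Next.js sketch of that setup (pages router assumed; the internal API URL and the Product shape are placeholders). The data fetch runs on the server, so robots.txt rules on /api/ never affect what Googlebot receives:

    // pages/product/[id].tsx: minimal SSR sketch. The fetch happens server-side at
    // request time, and the finished HTML is sent to Googlebot with the data already in it.
    import type { GetServerSideProps } from 'next';

    type Product = { title: string; price: number };

    export const getServerSideProps: GetServerSideProps<{ product: Product }> = async ({ params }) => {
      const res = await fetch(`https://internal-api.example.com/products/${params?.id}`);
      const product: Product = await res.json();
      return { props: { product } };
    };

    export default function ProductPage({ product }: { product: Product }) {
      return (
        <main>
          <h1>{product.title}</h1>
          <p>{product.price} €</p>
        </main>
      );
    }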

Another exception: sites that load non-indexable content by design (member areas, carts, user preferences). There, blocking /api/ is intentional and has no negative SEO impact. But let’s be honest — most of the time, it’s a configuration accident, not a carefully thought-out strategy.

Attention: If you migrate from a server-side site to a modern JavaScript framework, audit your robots.txt IMMEDIATELY. The inherited rules can destroy your visibility overnight without you detecting it through browser tests.

Practical impact and recommendations

How can you check whether your APIs are blocked?

First step: open your robots.txt and look for any line containing Disallow: /api, Disallow: /_next, Disallow: /data or equivalent. If you find this, it's an immediate red flag.
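If you prefer to script that first pass, a rough sketch like the one below works (the domain and the endpoint list are placeholders; it is not a full robots.txt parser and ignores Allow precedence and wildcard syntax):

    // check-robots.ts: flag Disallow rules that would catch the API paths your pages fetch.
    const SITE = 'https://www.example.com';
    const CRITICAL_ENDPOINTS = ['/api/products', '/api/posts', '/api/categories'];

    async function auditRobots(): Promise<void> {
      const robots = await (await fetch(`${SITE}/robots.txt`)).text();
      const disallows = robots
        .split('\n')
        .map((line) => line.trim())
        .filter((line) => line.toLowerCase().startsWith('disallow:'))
        .map((line) => line.slice('disallow:'.length).trim())
        .filter((path) => path.length > 0);

      for (const endpoint of CRITICAL_ENDPOINTS) {
        const hit = disallows.find((rule) => endpoint.startsWith(rule));
        console.log(hit ? `BLOCKED  ${endpoint}  (matches "Disallow: ${hit}")` : `ok       ${endpoint}`);
      }
    }

    auditRobots().catch(console.error);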

Next, use the URL Inspection tool in Search Console. Click ‘Test live URL’, then ‘View tested page’ to see the rendered HTML; the ‘More info’ tab also lists page resources that could not be loaded. Compare the final rendering with your actual page in Chrome. If entire sections are missing (products, articles, data), you’ve found the culprit.

What mistakes should be avoided when configuring robots.txt?

Never block an entire path like /api/ without thinking. If you need to protect certain endpoints, list them individually: Disallow: /api/admin, Disallow: /api/user-settings. Let through what serves public rendering.

Another classic pitfall: cascading rules that rely on ordering. If you write Disallow: /api/ and then Allow: /api/products, Google resolves the conflict with the most specific (longest) matching rule, but other crawlers may simply apply rules in order. It’s best to avoid the ambiguity altogether: be explicit and minimalist.
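Here is what that looks like side by side (illustrative paths only):

    # Preferred: explicit and minimal. Block only what must stay private.
    User-agent: *
    Disallow: /api/admin
    Disallow: /api/user-settings

    # Works for Google, but easier to get wrong: a broad Disallow carved open with Allow.
    # Google applies the most specific (longest) matching rule, so /api/products stays
    # crawlable here, but other crawlers may not resolve the conflict the same way.
    # User-agent: *
    # Disallow: /api/
    # Allow: /api/products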

What should you do specifically to fix this issue?

Identify all API endpoints essential for the rendering of your indexable pages. Create a list: /api/products, /api/posts, /api/categories, etc. Ensure none of these paths appear in a Disallow directive.

If you must block some APIs for security reasons, instead use server-side authentication (tokens, headers, strict CORS) rather than relying on robots.txt. This file is not a firewall — it’s a guideline for cooperative bots.
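As an illustration, here is a minimal Express-style sketch of that approach (the header name, token check, and routes are placeholders for whatever your stack actually uses):

    // auth-middleware.ts: protect private endpoints with real server-side auth instead
    // of robots.txt; public, render-critical endpoints stay open and crawlable.
    import express, { type Request, type Response, type NextFunction } from 'express';

    const app = express();

    function requireToken(req: Request, res: Response, next: NextFunction): void {
      const token = req.get('authorization');
      if (token !== `Bearer ${process.env.API_TOKEN}`) {
        res.status(401).json({ error: 'unauthorized' }); // actual protection, unlike a Disallow line
        return;
      }
      next();
    }

    app.use('/api/admin', requireToken);          // private: authenticated only
    app.get('/api/products/:id', (req, res) => {  // public: needed for client-side rendering
      res.json({ id: req.params.id, title: 'Example product' });
    });

    app.listen(3000);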

  • Audit robots.txt and remove any Disallow: /api/ rule that is too broad
  • Test JavaScript rendering via Search Console on 10-20 strategic pages
  • Compare the HTML crawled by Google with the actual browser rendering
  • List critical API endpoints and explicitly allow their crawling if necessary
  • Set up an indexing tracking alert to detect sudden drops
  • Document robots.txt rules and their justification in your SEO runbook
This type of robots.txt × JavaScript rendering audit can quickly become a headache if your tech stack mixes several frameworks, external APIs, and historical rules. The interdependencies can be opaque, and a poorly scoped fix can block something else without warning. If you lack the time or in-house expertise to map your rendering architecture properly, bringing in a technical SEO agency that masters these issues can save you months of blind diagnosis and fixes.

❓ Frequently Asked Questions

Does Googlebot execute JavaScript if the API is blocked in robots.txt?
Yes, Googlebot executes the JavaScript, but it blocks the fetch() or XMLHttpRequest request to the disallowed API. The script runs but never receives the data, which produces an empty or incomplete DOM.
How do I know if my pages are being indexed empty because of robots.txt?
Use the URL Inspection tool in Search Console. Test the URL live, view the final rendered HTML, and compare it with what you see in your browser. If content blocks are missing on Google's side, check robots.txt.
Can I block /api/ to save crawl budget without any SEO impact?
No, not if those APIs are used to load indexable content. Blocking /api/ saves zero real crawl budget: Googlebot only crawls these endpoints when the JavaScript calls them. You're just breaking the rendering.
Are frameworks like Next.js or Nuxt affected by this problem?
It depends. If you use SSR (getServerSideProps, server-side asyncData), the content is injected before the HTML is sent and robots.txt never comes into play. But in CSR mode, or ISR with client-side revalidation, the APIs must remain accessible.
Should you explicitly allow APIs in robots.txt or simply not block them?
By default, anything that is not disallowed is allowed. You don't need an explicit Allow: /api/ unless a broader Disallow rule conflicts with it. Stay minimalist: block only what needs to be blocked.
🏷 Related Topics
Domain Age & History · Crawl & Indexing · AI & SEO · JavaScript & Technical SEO

