Why does Google display empty pages even when your JavaScript site is working perfectly?

Official statement

If a JavaScript request to an API (like /api/cats) is blocked by robots.txt, Googlebot will not be able to load it even if it works in browsers. Browsers ignore robots.txt, but Google respects it, which can create empty pages in the index.

Source: Google Search Central video (46:02, English), published November 25, 2020. The statement appears at 20:07.

TL;DR

Googlebot strictly adheres to the robots.txt file, including for JavaScript API requests. If you block /api/ in robots.txt, your pages won't load data on Google's side, even if they display normally in Chrome. The result: empty pages in the index while everything seems functional during your browser tests.

What you need to understand

How do browsers and Googlebot handle robots.txt differently?

Modern browsers completely ignore the robots.txt file. When you test your site in Chrome, Firefox, or Safari, each JavaScript request to your APIs goes through unfiltered. That’s why your page correctly displays the list of products, customer reviews, or price data.

Googlebot, on the other hand, checks robots.txt before fetching any resource, including the requests your scripts issue during rendering. If an API URL is disallowed, the bot never requests it. Rendering on Google's side then runs without the dynamic data, leaving you with an empty HTML shell in the index.

How does this blocking create ghost pages in the index?

Imagine an e-commerce site that loads its product listings via fetch('/api/products/12345'). If robots.txt contains Disallow: /api/, Googlebot downloads the initial HTML and executes the JavaScript, but the API request is never made because the URL is disallowed.

The DOM therefore remains skeletal: no product title, no description, no price. Google indexes this empty shell. When you test it in your browser, you see the complete page and think everything is fine. This is the classic manual testing trap that doesn’t reflect the realities of crawling.
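
To make the mechanism concrete, here is a minimal Python sketch. It is not Googlebot's actual pipeline: it only uses the standard-library urllib.robotparser to decide whether the API request may be made. The render function, the product data, and the HTML shell are hypothetical stand-ins chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /api/
"""

# Hypothetical API response, standing in for a real backend.
FAKE_API = {"/api/products/12345": {"title": "Cat tree", "price": "49.00"}}

EMPTY_SHELL = '<div id="app"></div>'


def render(api_path: str, respects_robots: bool) -> str:
    """Simulate client-side rendering of a product page."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    if respects_robots and not parser.can_fetch("Googlebot", api_path):
        # The request is never made, so the script has no data to inject.
        return EMPTY_SHELL
    product = FAKE_API[api_path]
    return f'<div id="app"><h1>{product["title"]}</h1><p>{product["price"]}</p></div>'


# A browser ignores robots.txt; a cooperative crawler does not.
browser_view = render("/api/products/12345", respects_robots=False)
crawler_view = render("/api/products/12345", respects_robots=True)
```

With the same robots.txt, browser_view contains the product title and price while crawler_view is the bare shell, which is exactly the gap that manual browser testing hides.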

Why are so many sites unknowingly blocking their APIs?

Many robots.txt files are automatically generated by CMSs or frameworks with default “security” rules. Developers block /api/ thinking they are protecting their data or preventing unnecessary crawling.

Other times, it’s a historical remnant: the site was built using pure server-side PHP and then migrated to React/Vue/Angular without cleaning up robots.txt. The result: critical endpoints remain blocked even though they are now essential for client rendering.

Browsers never check robots.txt — your manual tests always pass
Googlebot skips any API request disallowed in robots.txt, even when it is needed for JavaScript rendering
A poorly configured robots.txt generates empty pages in the index despite a functional site
The problem is invisible without testing via Search Console or a Google rendering tool
Modern frameworks amplify this risk by increasing client-side API calls

SEO Expert opinion

Is this statement consistent with what we observe in the field?

Absolutely. We regularly see SPAs or Jamstack sites with catastrophic indexing rates, such as 30% of pages indexed while the sitemap lists 10,000 URLs. When inspecting them with the Search Console URL Inspection tool, the rendered HTML shows an empty <div id="app"></div>.

The diagnosis? A Disallow: /api/ or Disallow: /_next/data/ in robots.txt. Developers do not think “SEO” when configuring these rules — they think security, performance, or they copy-paste a template. And it breaks indexing without anyone realizing it for months.

What nuances should be added to this rule?

First point: if your content is already present in the initial HTML (SSR, pre-rendering, progressive hydration), blocking APIs has less impact. Google reads server-side content, even if subsequent JavaScript enhancements fail. But it remains a risky game — certain elements (dynamic prices, stock, reviews) may be missing.

Second nuance: some blocks, such as /api/analytics, /api/tracking, or /api/user-prefs, have no SEO impact and can legitimately stay in place. The issue is that we often see overly broad Disallow: /api/ rules that block everything. Check case by case: each endpoint should be assessed for its role in visible rendering.

When does this rule not apply?

If you are using strict server-side rendering (Next.js getServerSideProps, Nuxt asyncData server-side, classic PHP), robots.txt blocks nothing since data is injected before the HTML is sent. Googlebot receives complete content without executing JavaScript.

Another exception: sites that load non-indexable content by design (member areas, carts, user preferences). There, blocking /api/ is intentional and has no negative SEO impact. But let’s be honest — most of the time, it’s a configuration accident, not a carefully thought-out strategy.

Warning: if you migrate from a server-rendered site to a modern JavaScript framework, audit your robots.txt immediately. Inherited rules can destroy your visibility overnight, and browser tests will never reveal it.

Practical impact and recommendations

How can you check if your APIs are not blocked?

First step: open your robots.txt and look for any line containing Disallow: /api, Disallow: /_next, Disallow: /data or equivalent. If you find this, it's an immediate red flag.
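
This first check is easy to script. The sketch below is one possible approach, not an official tool: the function name find_risky_disallows and the default prefix list are assumptions taken from the paths mentioned above. It ignores User-agent groups and wildcards, so treat any hit as a prompt for manual review.

```python
def find_risky_disallows(robots_txt, risky_prefixes=("/api", "/_next", "/data")):
    """Return (line_number, path) for every Disallow rule starting with a risky prefix."""
    flagged = []
    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        value = value.strip()
        if field.strip().lower() == "disallow" and value.startswith(risky_prefixes):
            flagged.append((number, value))
    return flagged


# Hypothetical robots.txt used only for this example.
SAMPLE = """\
User-agent: *
Disallow: /api/      # legacy rule
Disallow: /checkout/
Disallow: /_next/data/
Allow: /api/products
"""

risky = find_risky_disallows(SAMPLE)
```

Here risky lists the /api/ rule on line 2 and the /_next/data/ rule on line 4, while the /checkout/ rule is left alone.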

Next, use the URL Inspection tool in Search Console. Click ‘Test live URL’, then ‘View tested page’ > ‘More info’ > ‘Page resources’ to see which requests could not be loaded. Compare the final rendering with your actual page in Chrome. If entire sections are missing (products, articles, data), you’ve found the culprit.
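
If you copy the rendered HTML from Search Console and the HTML from your browser into two strings, the comparison can be partly automated. The following is a rough sketch using Python's standard html.parser; the helper names (TextCollector, visible_text, missing_in_google) and the sample markup are hypothetical. It also picks up text inside script and style tags, so it is a first-pass filter rather than a precise diff.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect non-empty text nodes from an HTML document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def visible_text(html):
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    return collector.chunks


def missing_in_google(browser_html, google_html):
    """Return text chunks present in the browser DOM but absent from Google's rendering."""
    seen = set(visible_text(google_html))
    return [chunk for chunk in visible_text(browser_html) if chunk not in seen]


# Hypothetical snippets standing in for the two renderings.
browser_html = '<div id="app"><h1>Cat tree</h1><p>49.00 EUR</p></div>'
google_html = '<div id="app"></div>'
missing = missing_in_google(browser_html, google_html)
```

A non-empty missing list on an indexable page is a strong hint that a required request was not loaded during rendering.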

What mistakes should be avoided when configuring robots.txt?

Never block an entire path like /api/ without thinking. If you need to protect certain endpoints, list them individually: Disallow: /api/admin, Disallow: /api/user-settings. Let through what serves public rendering.

Another classic pitfall: conflicting Allow and Disallow rules. If you write Disallow: /api/ and then Allow: /api/products, Google applies the most specific (longest) matching rule regardless of order, so /api/products stays crawlable. Some other bots evaluate rules in file order and may decide differently, so it's best to avoid confusion: be explicit and minimalist.
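
The difference between the two evaluation strategies can be illustrated with a small sketch. It assumes plain prefix matching only (no * or $ wildcards), and the two function names are hypothetical. The longest-match version follows the precedence Google documents: the most specific rule wins, and Allow wins a tie.

```python
def longest_match_allowed(rules, path):
    """Longest matching rule wins; on equal length, Allow wins."""
    best_len = -1
    allowed = True  # no matching rule means the path is allowed
    for directive, pattern in rules:
        if pattern and path.startswith(pattern):
            is_allow = directive.lower() == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len = len(pattern)
                allowed = is_allow
    return allowed


def first_match_allowed(rules, path):
    """Naive strategy: the first matching rule in file order decides."""
    for directive, pattern in rules:
        if pattern and path.startswith(pattern):
            return directive.lower() == "allow"
    return True


rules = [("Disallow", "/api/"), ("Allow", "/api/products")]
path = "/api/products/12345"

google_style = longest_match_allowed(rules, path)
order_based = first_match_allowed(rules, path)
order_based_reversed = first_match_allowed(list(reversed(rules)), path)
```

For this rule pair, the longest-match result is the same whichever way the lines are ordered, whereas the order-based result flips when the two lines are swapped. That is the practical reason to keep rules explicit and non-overlapping.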

What should you do specifically to fix this issue?

Identify all API endpoints essential for the rendering of your indexable pages. Create a list: /api/products, /api/posts, /api/categories, etc. Ensure none of these paths appear in a Disallow directive.
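
Once that list exists, checking it against robots.txt can be automated. This is a minimal sketch built on the standard-library urllib.robotparser; the function name blocked_endpoints and the sample rules are assumptions for illustration. The standard-library parser may not resolve conflicting Allow and Disallow rules exactly as Google does, so confirm any surprising result in Search Console.

```python
from urllib.robotparser import RobotFileParser


def blocked_endpoints(robots_txt, endpoints, user_agent="Googlebot"):
    """Return the endpoints that a robots.txt-respecting crawler may not fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in endpoints if not parser.can_fetch(user_agent, path)]


# Hypothetical robots.txt and endpoint list.
ROBOTS_TXT = """\
User-agent: *
Disallow: /api/admin
Disallow: /api/posts
"""

CRITICAL = ["/api/products", "/api/posts", "/api/categories"]
blocked = blocked_endpoints(ROBOTS_TXT, CRITICAL)
```

In this example only /api/posts is reported, which tells you which Disallow line needs to go.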

If you must protect some APIs for security reasons, use server-side authentication (tokens, headers, strict CORS) rather than relying on robots.txt. This file is not a firewall; it is a guideline for cooperative bots.

Audit robots.txt and remove any Disallow: /api/ rule that is too broad
Test JavaScript rendering via Search Console on 10-20 strategic pages
Compare the HTML crawled by Google with the actual browser rendering
List critical API endpoints and explicitly allow their crawling if necessary
Set up an indexing tracking alert to detect sudden drops
Document robots.txt rules and their justification in your SEO runbook
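
The alerting item in the checklist above can start very simply. The sketch below assumes you already export a daily count of indexed pages from your reporting; the function name sudden_drops, the 20% threshold, and the sample numbers are all hypothetical.

```python
def sudden_drops(indexed_counts, threshold=0.2):
    """Return the positions where the count fell by more than `threshold` versus the previous value."""
    alerts = []
    for i in range(1, len(indexed_counts)):
        previous, current = indexed_counts[i - 1], indexed_counts[i]
        if previous > 0 and (previous - current) / previous > threshold:
            alerts.append(i)
    return alerts


# Made-up daily counts: a sharp drop occurs on the fourth day (index 3).
daily_indexed = [10000, 9950, 9900, 3000, 2950]
alerts = sudden_drops(daily_indexed)
```

A relative threshold is used rather than an absolute one so the same check works for a 500-page site and a 50,000-page site.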

This type of robots.txt × JavaScript rendering audit can quickly become a headache if your tech stack mixes several frameworks, external APIs, and historical rules. The interdependencies can sometimes be opaque, and an improperly balanced fix can block something else without warning. If you lack time or internal expertise to effectively map your rendering architecture, bringing in a technical SEO agency that masters these issues can save you months of blind diagnosis and fixes.

❓ Frequently Asked Questions

Does Googlebot execute JavaScript if the API is blocked in robots.txt?

Yes, Googlebot executes the JavaScript, but it blocks the fetch() or XMLHttpRequest call to the disallowed API. The script runs but never receives the data, which produces an empty or incomplete DOM.

How can I tell whether my pages are indexed empty because of robots.txt?

Use the URL Inspection tool in Search Console. Test the live URL, view the final rendered HTML, and compare it with what you see in your browser. If content blocks are missing on Google's side, check robots.txt.

Can I block /api/ to save crawl budget without any SEO impact?

No, not if those APIs load indexable content. Blocking /api/ saves no real crawl budget: Googlebot only requests these endpoints when the JavaScript calls them. You just break rendering.

Are frameworks like Next.js or Nuxt affected by this problem?

It depends. If you use SSR (getServerSideProps, server-side asyncData), the content is injected before the HTML is sent and robots.txt does not come into play. But in CSR mode, or with ISR combined with client-side revalidation, the APIs must remain accessible.

Should I explicitly allow APIs in robots.txt, or simply not block them?

By default, anything that is not disallowed is allowed. You don't need an explicit Allow: /api/ unless a broader Disallow rule conflicts with it. Stay minimalist: block only what must be blocked.
