Official statement
Googlebot strictly adheres to the robots.txt file, including for JavaScript API requests. If you block /api/ in robots.txt, your pages won't load data on Google's side, even if they display normally in Chrome. The result: empty pages in the index while everything seems functional during your browser tests.
What you need to understand
How do browsers and Googlebot handle robots.txt differently?
Modern browsers completely ignore the robots.txt file. When you test your site in Chrome, Firefox, or Safari, each JavaScript request to your APIs goes through unfiltered. That’s why your page correctly displays the list of products, customer reviews, or price data.
Googlebot, on the other hand, checks robots.txt before fetching any resource, including the requests your scripts trigger during rendering. If an API URL is blocked, the bot never requests it. The rendering on Google’s side then fails to load dynamic data, leaving you with empty HTML in the index.
How does this blocking create ghost pages in the index?
Imagine an e-commerce site that loads its product data via fetch('/api/products/12345'). If robots.txt contains Disallow: /api/, Googlebot downloads the initial HTML and executes the JavaScript, but never sends the API request.
The DOM therefore remains skeletal: no product title, no description, no price. Google indexes this empty shell. When you test it in your browser, you see the complete page and think everything is fine. This is the classic manual testing trap that doesn’t reflect the realities of crawling.
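The scenario above can be sketched in a few lines of Python. This is only an illustration: urllib.robotparser is Python's standard-library parser, used here as a stand-in for Googlebot's own matcher, and the product markup is hypothetical.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /api/
"""

API_PATH = "/api/products/12345"


def render_app(respects_robots: bool) -> str:
    """Return the DOM a client ends up with after running the page script."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())

    # A browser never consults robots.txt. A compliant crawler checks it
    # before every fetch, including fetches triggered by page scripts.
    if respects_robots and not parser.can_fetch("Googlebot", API_PATH):
        return '<div id="app"></div>'  # API call skipped: nothing to render

    # Hypothetical product data standing in for the API response.
    return '<div id="app"><h1>Product 12345</h1></div>'


browser_dom = render_app(respects_robots=False)
crawler_dom = render_app(respects_robots=True)
print("browser:", browser_dom)
print("crawler:", crawler_dom)
```

The same page script produces a full DOM for the browser and an empty shell for the crawler, which is why manual tests in Chrome cannot reveal the problem.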
Why are so many sites unknowingly blocking their APIs?
Many robots.txt files are automatically generated by CMSs or frameworks with default “security” rules. Developers block /api/ thinking they are protecting their data or preventing unnecessary crawling.
Other times, it’s a historical remnant: the site was built using pure server-side PHP and then migrated to React/Vue/Angular without cleaning up robots.txt. The result: critical endpoints remain blocked even though they are now essential for client rendering.
- Browsers never check robots.txt — your manual tests always pass
- Googlebot skips any API request matched by a Disallow rule, even during JavaScript rendering
- A poorly configured robots.txt generates empty pages in the index despite a functional site
- The problem is invisible without testing via Search Console or a Google rendering tool
- Modern frameworks amplify this risk by increasing client-side API calls
SEO Expert opinion
Is this statement consistent with what we observe in the field?
Absolutely. We regularly see SPAs or Jamstack sites with catastrophic indexing rates: 30% of pages indexed while the sitemap lists 10,000 URLs. When inspecting a page with the Search Console URL Inspection tool, the rendered HTML shows an empty <div id="app"></div>.
The diagnosis? A Disallow: /api/ or Disallow: /_next/data/ in robots.txt. Developers do not think “SEO” when configuring these rules — they think security, performance, or they copy-paste a template. And it breaks indexing without anyone realizing it for months.
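To check for that symptom at scale, the rendered HTML copied from Search Console can be tested for an empty mount node. The sketch below is one possible approach using Python's html.parser. The id "app" comes from the example above, while the class and function names are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag and must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta", "source", "wbr"}


class MountNodeInspector(HTMLParser):
    """Collects the visible text found inside the element with a given id."""

    def __init__(self, mount_id: str) -> None:
        super().__init__()
        self.mount_id = mount_id
        self.depth = 0  # greater than 0 while inside the mount node
        self.found = False
        self.text_parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.mount_id:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.text_parts.append(data.strip())


def mount_node_is_empty(html: str, mount_id: str = "app") -> bool:
    """True when the mount node exists but contains no visible text."""
    inspector = MountNodeInspector(mount_id)
    inspector.feed(html)
    inspector.close()
    return inspector.found and not inspector.text_parts


empty_page = '<html><body><div id="app"></div></body></html>'
full_page = '<html><body><div id="app"><h1>Blue mug</h1><p>12.90</p></div></body></html>'
print(mount_node_is_empty(empty_page), mount_node_is_empty(full_page))
```

A page whose mount node is empty in Google's rendering but populated in the browser is a strong candidate for a blocked API call.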
What nuances should be added to this rule?
First point: if your content is already present in the initial HTML (SSR, pre-rendering, progressive hydration), blocking APIs has less impact. Google reads server-side content, even if subsequent JavaScript enhancements fail. But it remains a risky game — certain elements (dynamic prices, stock, reviews) may be missing.
Second nuance: some blocked paths, such as /api/analytics, /api/tracking, or /api/user-prefs, have no SEO impact and can legitimately remain blocked. The issue is that Disallow: /api/ rules are often too broad and block everything. This has to be checked case by case: each endpoint should be assessed for its role in visible rendering.
When does this rule not apply?
If you are using strict server-side rendering (Next.js getServerSideProps, Nuxt asyncData server-side, classic PHP), a robots.txt block on the API does not affect what Google indexes, since the data is injected before the HTML is sent. Googlebot receives complete content without needing to execute JavaScript.
Another exception: sites that load non-indexable content by design (member areas, carts, user preferences). There, blocking /api/ is intentional and has no negative SEO impact. But let’s be honest — most of the time, it’s a configuration accident, not a carefully thought-out strategy.
Practical impact and recommendations
How can you check if your APIs are not blocked?
First step: open your robots.txt and look for any line containing Disallow: /api, Disallow: /_next, Disallow: /data or equivalent. If you find this, it's an immediate red flag.
Next, use the URL Inspection tool in Search Console. Click ‘Test live URL’, then ‘View tested page’, and open the ‘More info’ tab to review the page resources and JavaScript console messages. Compare the final rendering with your actual page in Chrome. If entire sections are missing (products, articles, data), you’ve found the culprit.
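The first step, scanning robots.txt for overly broad rules, is easy to automate. The following is a minimal sketch: the list of risky prefixes is taken from the paths mentioned in this article, and the function name and sample file are hypothetical.

```python
import re

# Path prefixes this article flags as risky; extend the list for your stack.
RISKY_PREFIXES = ("/api", "/_next", "/_next/data", "/data")


def find_broad_disallows(robots_txt: str) -> list[tuple[int, str]]:
    """Return (line number, path) for Disallow rules covering a whole risky prefix."""
    findings = []
    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        match = re.match(r"(?i)disallow\s*:\s*(\S*)", line)
        if not match:
            continue
        path = match.group(1)
        # "/api", "/api/" and "/api/*" all block the entire prefix.
        if path.rstrip("/*") in RISKY_PREFIXES:
            findings.append((number, path))
    return findings


SAMPLE = """\
User-agent: *
Disallow: /api/          # blocks every endpoint
Disallow: /cart
Disallow: /api/admin
"""

print(find_broad_disallows(SAMPLE))
```

Narrow rules such as Disallow: /api/admin are deliberately not reported, since blocking individual private endpoints is legitimate.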
What mistakes should be avoided when configuring robots.txt?
Never block an entire path like /api/ without thinking. If you need to protect certain endpoints, list them individually: Disallow: /api/admin, Disallow: /api/user-settings. Let through what serves public rendering.
Another classic pitfall: relying on rule order. If you write Disallow: /api/ followed by Allow: /api/products, Google applies the most specific (longest) matching rule regardless of order, so /api/products stays crawlable. Some other bots apply the first matching rule instead. To avoid confusion, be explicit and minimalist.
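The difference between the two precedence strategies can be illustrated with a simplified matcher. This sketch only handles plain path prefixes (no * or $ wildcards) and is not Google's implementation. The function names are hypothetical.

```python
Rule = tuple[bool, str]  # (is_allow, path prefix)

RULES: list[Rule] = [
    (False, "/api/"),         # Disallow: /api/
    (True, "/api/products"),  # Allow: /api/products
]


def longest_match_allows(rules: list[Rule], path: str) -> bool:
    """Most specific (longest) matching rule wins; Allow wins a tie."""
    matches = [(len(prefix), allow) for allow, prefix in rules if path.startswith(prefix)]
    if not matches:
        return True  # no rule applies: crawling is allowed
    return max(matches)[1]


def first_match_allows(rules: list[Rule], path: str) -> bool:
    """The first rule that matches in file order wins."""
    for allow, prefix in rules:
        if path.startswith(prefix):
            return allow
    return True


path = "/api/products/12345"
print("longest match:", longest_match_allows(RULES, path))
print("first match:  ", first_match_allows(RULES, path))
```

With the rules in this order, a longest-match crawler fetches the product endpoint while a first-match crawler does not, which is why listing only the narrow Disallow rules you need is the safer configuration.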
What should you do specifically to fix this issue?
Identify all API endpoints essential for the rendering of your indexable pages. Create a list: /api/products, /api/posts, /api/categories, etc. Ensure none of these paths appear in a Disallow directive.
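Once the list exists, checking it against robots.txt can be scripted. A minimal sketch, assuming a robots.txt that only uses plain Disallow prefixes: urllib.robotparser is Python's standard-library parser and may not resolve mixed Allow/Disallow rules the way Google does. The sample rules are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt in which one critical endpoint was blocked by mistake.
ROBOTS_TXT = """\
User-agent: *
Disallow: /api/admin
Disallow: /api/user-settings
Disallow: /api/posts
"""

CRITICAL_ENDPOINTS = ["/api/products", "/api/posts", "/api/categories"]


def blocked_endpoints(robots_txt: str, endpoints: list[str],
                      user_agent: str = "Googlebot") -> list[str]:
    """Return the endpoints the given user agent is not allowed to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in endpoints if not parser.can_fetch(user_agent, path)]


print(blocked_endpoints(ROBOTS_TXT, CRITICAL_ENDPOINTS))
```

Running a check like this in CI on every robots.txt change is one way to keep a critical endpoint from being blocked unnoticed.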
If you must restrict some APIs for security reasons, use server-side access control (authentication tokens, required headers, strict CORS) instead of relying on robots.txt. This file is not a firewall; it is a guideline for cooperative bots.
- Audit robots.txt and remove any Disallow: /api/ rule that is too broad
- Test JavaScript rendering via Search Console on 10-20 strategic pages
- Compare the HTML crawled by Google with the actual browser rendering
- List critical API endpoints and explicitly allow their crawling if necessary
- Set up an indexing tracking alert to detect sudden drops
- Document robots.txt rules and their justification in your SEO runbook
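The indexing alert from the checklist can start as a simple threshold on successive counts of indexed pages. This is a sketch under assumptions: the weekly figures are hypothetical, the 20% threshold is arbitrary, and how you collect the counts is left open.

```python
def detect_index_drops(counts: list[int], max_drop: float = 0.2) -> list[int]:
    """Return positions where the indexed-page count fell by more than
    max_drop (as a fraction) relative to the previous measurement."""
    alerts = []
    for i in range(1, len(counts)):
        previous, current = counts[i - 1], counts[i]
        if previous > 0 and (previous - current) / previous > max_drop:
            alerts.append(i)
    return alerts


# Hypothetical weekly counts of indexed pages.
weekly_counts = [9800, 9750, 9790, 3100, 3050]
print(detect_index_drops(weekly_counts))
```

A flagged position is a prompt to review recent robots.txt and deployment changes, not proof that a blocked API is the cause.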
❓ Frequently Asked Questions
Does Googlebot execute JavaScript if the API is blocked in robots.txt?
How can I tell whether my pages are being indexed empty because of robots.txt?
Can I block /api/ to save crawl budget without any SEO impact?
Are frameworks like Next.js or Nuxt affected by this problem?
Should APIs be explicitly allowed in robots.txt, or is it enough not to block them?
🎥 From the same video 28
Other SEO insights extracted from this same Google Search Central video · duration 46 min · published on 25/11/2020