Official statement
Google claims that a robots noindex header on an API endpoint does not block Googlebot from calling this resource during rendering. Unlike robots.txt, which completely prevents access, noindex only pertains to final indexing. For an SEO working on JavaScript sites, this means that blocking an API with noindex does not necessarily protect your data from crawling — and importantly, it doesn't stop the bot from retrieving the content necessary for displaying your pages.
What you need to understand
What’s the difference between noindex and robots.txt for Googlebot?
The confusion arises from the fact that both mechanisms seem to block access, but they operate at completely different levels. Robots.txt acts like a doorman: it prevents Googlebot from entering, from downloading the resource. End of story.
A noindex, on the other hand, lets Googlebot in: the bot retrieves the data and processes it, but is told ‘don’t store that in your index.’ It’s a post-retrieval directive. For a typical HTML page, the distinction is often negligible. But for an API endpoint that feeds client-side rendering, it’s a different story.
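To make the contrast concrete, here is a minimal sketch of both mechanisms on a single server, assuming Express (the paths are illustrative, not from the video):

```typescript
// Minimal Express sketch contrasting the two mechanisms (illustrative paths).
import express from "express";

const app = express();

// robots.txt: anything under /private-api/ is never downloaded by Googlebot.
app.get("/robots.txt", (_req, res) => {
  res.type("text/plain").send("User-agent: *\nDisallow: /private-api/\n");
});

// X-Robots-Tag: noindex — Googlebot may still fetch this JSON during rendering;
// the header only keeps the URL itself out of the index.
app.get("/api/products.json", (_req, res) => {
  res.set("X-Robots-Tag", "noindex");
  res.json({ products: [] });
});

app.listen(3000);
```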
Why does this clarification from Google change the game for JavaScript sites?
Thousands of modern sites operate with decoupled architectures: an empty HTML shell, and JavaScript that calls APIs to load the actual content. If you placed a noindex on your endpoint /api/products.json thinking, ‘that way Google won’t see my data,’ you’re mistaken.
Googlebot will still call that API during rendering, retrieve the JSON, inject it into the DOM, and index the final result: the rendered page. The noindex only protects the API’s own URL, not its content when that content is used elsewhere. It’s a classic trap for teams that confuse ‘not indexing a resource’ with ‘preventing its use.’
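A minimal sketch of that pattern, assuming a hypothetical `#products` container and response shape:

```typescript
// Client-side rendering sketch: the HTML shell ships empty, and this script,
// executed by Googlebot's renderer, fills it by calling the API.
async function renderProducts(): Promise<void> {
  // This call happens during rendering even if the endpoint is noindexed.
  const res = await fetch("/api/products.json");
  const data: { products: { name: string }[] } = await res.json();
  const list = document.querySelector("#products");
  if (list) {
    list.innerHTML = data.products.map((p) => `<li>${p.name}</li>`).join("");
  }
}

renderProducts();
```

The JSON never needs to be indexable as a URL for its content to end up in the indexed, rendered page.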
In what practical scenarios does this confusion cause problems?
First scenario: you have a documented public API that you don’t want to appear in the SERPs as a standalone page. You place a noindex: perfect, it works. But if that same API feeds your product pages, Googlebot will continue to call it to render those pages. The noindex has no effect on that process.
Second scenario, more insidious: you thought you were saving crawl budget by putting noindex on resource-heavy endpoints. Wrong. Googlebot still crawls them for rendering: you haven’t saved anything, you’ve just removed a URL from the index without reducing actual server resource consumption. If your goal was to limit calls, you should have used robots.txt or server-side access controls.
- Robots.txt blocks retrieval: Googlebot never downloads the resource
- Noindex allows crawling: Googlebot retrieves the resource but does not index it as a standalone page
- For JavaScript rendering, Googlebot calls the necessary APIs even if they are marked with noindex — only their indexing as a distinct URL is blocked
- To truly protect data, you need to combine robots.txt, authentication, or server-side disabling — not just a noindex
- No crawl budget is saved by a noindex on a resource used in rendering
SEO Expert opinion
Is this statement consistent with field observations?
Yes, and it’s even a welcome reminder in light of a widespread conceptual error. In the field, we regularly see technical teams adding noindex to sensitive endpoints thinking, ‘that way Google won’t touch them.’ Mistake. Server logs clearly show that Googlebot continues to crawl these resources as soon as they are called by the JavaScript of an indexable page.
What’s interesting is that Google does not always explicitly document this nuance. Most official guides talk about noindex in the context of full HTML pages, not APIs. Martin Splitt fills a documentation gap here — and likely avoids unnecessary support tickets.
What nuances should be added to this statement?
The statement is correct but incomplete. It does not specify whether Googlebot will always crawl the endpoint or only if it detects it is necessary for rendering. On complex sites with dozens of APIs, Googlebot does not call them all — there is a form of prioritization. [To be verified]: does Googlebot respect a noindex as a signal of ‘this resource is not important’ and reduce its crawl frequency, even if it does not block it entirely?
Another gray area: APIs with authentication. If your endpoint returns a 401 or 403, Googlebot will obviously not be able to crawl it for rendering — but that has nothing to do with the noindex. If you have a noindex AND an auth barrier, it's the latter that does the work. The noindex then becomes purely cosmetic.
In what cases does this rule not apply?
If the API endpoint is blocked by robots.txt, then yes, Googlebot will not be able to call it — and the rendering will fail if the page depends on it. This is the expected and documented behavior. But beware: blocking a critical API in robots.txt breaks the indexing of all pages that depend on it. This is a beginner's mistake, but it remains common.
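For illustration, this is the kind of over-broad rule that causes the breakage (the path is hypothetical):

```
# robots.txt — blocks every CSR page that depends on /api/ from rendering
User-agent: *
Disallow: /api/
```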
Another exception: APIs called only server-side (SSR, SSG). If your Next.js or Nuxt app generates HTML on the server by calling an internal API, Googlebot never sees that call: it receives the final HTML directly. In this case, a noindex on the API changes nothing at all. The problem only concerns CSR (Client-Side Rendering) architectures, where the bot must execute JavaScript to trigger the API calls.
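A minimal Next.js sketch of that server-side pattern (pages router; the internal API URL is hypothetical):

```tsx
// pages/products.tsx — the fetch below runs on the server. Googlebot receives
// the finished HTML and never sees the API call, so a noindex on the API
// changes nothing here.
import type { GetServerSideProps } from "next";

type Props = { products: { name: string }[] };

export const getServerSideProps: GetServerSideProps<Props> = async () => {
  const res = await fetch("http://internal-api.local/products"); // server-side only
  const products: Props["products"] = await res.json();
  return { props: { products } };
};

export default function Products({ products }: Props) {
  return (
    <ul>
      {products.map((p) => (
        <li key={p.name}>{p.name}</li>
      ))}
    </ul>
  );
}
```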
Practical impact and recommendations
What should you do if you expose APIs used for rendering?
First rule: never use noindex as a security mechanism. If your APIs contain data that you absolutely do not want crawled (PII, internal pricing, competitive data), noindex is not enough. Implement OAuth authentication, tokens, or IP whitelists. Yes, it complicates rendering for Googlebot — that's the price to pay.
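As a sketch of what a real barrier looks like, assuming an Express app and a placeholder token check:

```typescript
// Express middleware sketch: a genuine access barrier, unlike noindex.
import type { NextFunction, Request, Response } from "express";

export function requireApiToken(req: Request, res: Response, next: NextFunction): void {
  const token = req.header("Authorization")?.replace("Bearer ", "");
  if (!token || !isValidToken(token)) {
    // Googlebot gets this 401 like any other unauthenticated client.
    res.status(401).json({ error: "unauthorized" });
    return;
  }
  next();
}

// Placeholder: swap in real validation (OAuth introspection, JWT verification, ...).
function isValidToken(token: string): boolean {
  return token === process.env.API_TOKEN;
}
```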
Second point: if your goal is merely to prevent indexing of the API URL as a standalone page in Google, then the noindex does the job. But be aware that Googlebot will still crawl this resource if it needs it to render your pages. That’s not a problem in itself, it’s just important to understand what’s actually happening server-side.
What mistakes should be avoided when configuring headers on APIs?
Classic mistake: blocking APIs in robots.txt without realizing the impact. You think you're saving crawl budget, but you’re breaking the rendering of all your product pages. Result: massive deindexation, plummeting traffic, and a dev team scratching their heads for hours wondering why 'Google can’t see anything anymore.'
Another trap: putting an X-Robots-Tag: noindex on an API while leaving a sitemap.xml that references the URL. Google crawls it, sees the noindex, and drops it from the index, but keeps crawling it because the sitemap still advertises it. That crawl is wasted. If you place a noindex, remove the URL from your sitemaps.
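A small sketch of that hygiene rule at sitemap-generation time (the `Page` type and `noindex` flag are hypothetical; adapt to your data source):

```typescript
// Sitemap generation sketch: keep noindexed URLs out of the sitemap entirely.
type Page = { url: string; noindex: boolean };

function buildSitemap(pages: Page[]): string {
  const entries = pages
    .filter((p) => !p.noindex) // a noindexed URL has no business in a sitemap
    .map((p) => `  <url><loc>${p.url}</loc></url>`)
    .join("\n");
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    entries,
    "</urlset>",
  ].join("\n");
}
```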
How to check that your configuration does not impact rendering?
Use Google Search Console: the URL Inspection tool lets you run a live test on a page that depends on the API and examine the rendered HTML. If the content served by the API is missing from the rendered result, something (robots.txt, authentication, an error response) is blocking the call during rendering.
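To double-check what a crawler actually receives on the endpoint itself, a quick header probe is enough (hypothetical URL; Node 18+ with its global fetch, run as an ES module; use GET instead of HEAD if your server doesn’t answer HEAD requests):

```typescript
// Inspect the status code and X-Robots-Tag header exactly as a bot would see them.
const res = await fetch("https://example.com/api/products.json", { method: "HEAD" });
console.log(res.status, res.headers.get("x-robots-tag"));
```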
❓ Frequently Asked Questions
If I put a noindex on my API, will Googlebot still crawl it to render my pages?
What is the difference between blocking an API with noindex and with robots.txt?
Does putting a noindex on my APIs save crawl budget?
How do you protect an API containing sensitive data when a noindex is not enough?
If my API returns a 401 or 403, can Googlebot still crawl it for rendering?