Official statement
Google ignores X-Robots-Tag headers returned in AJAX responses, which is a problem if you rely on them to control the indexing of dynamically loaded content. To exclude such content, block the endpoints in robots.txt or inject meta robots directives into the DOM via JavaScript. This technical limitation means rethinking the indexing strategy on AJAX-heavy sites.
What you need to understand
Why does Google ignore X-Robots-Tags in AJAX calls?
X-Robots-Tag is an HTTP response header that controls indexing without altering the HTML. It works perfectly for classic resources (pages, PDFs, images). In responses to AJAX requests, however, Google simply doesn't read it.
The reason? Google's rendering engine executes JavaScript, retrieves the final content injected into the DOM, but does not check the HTTP headers of background XHR or Fetch requests. These calls remain invisible to indexing, even though their content ends up being displayed on the page. Google focuses on what appears in the DOM after the full execution of JS.
What does this change for an AJAX-loaded site?
If your architecture relies on Single Page Applications (SPAs) or massive dynamic loading, you cannot count on X-Robots-Tags on the server side to block certain portions of content. Google will index what is rendered in the final HTML, regardless of what your HTTP headers state during internal requests.
Specifically, if a search filter module loads its results via fetch() from an endpoint that returns X-Robots-Tag: noindex, that directive will not be honored. Google will see the injected content and index it normally unless something else prevents it. This is a classic trap for e-commerce sites with combined filters or complex applications.
What are the functional technical alternatives?
Mueller mentions two solutions: robots.txt to block the JSON/API endpoints serving AJAX content, or injecting meta robots via JavaScript directly into the document head after loading. The first blocks the crawl of resources, while the second adds indexing directives in the final DOM that Google can read.
robots.txt works well if you want to prevent Google from crawling the URLs that return JSON data. But be careful: a robots.txt block stops crawling, not indexing, so a blocked URL can still be indexed if it is linked elsewhere. For fine-grained control on an SPA, injecting meta robots tags via JavaScript remains the most reliable option.
- Google only reads the final DOM after executing JavaScript, not the HTTP headers from internal AJAX requests
- X-Robots-Tags in XHR/Fetch responses are completely ignored by the indexing engine
- Solution 1: block JSON endpoints in robots.txt to prevent resource crawling
- Solution 2: inject meta robots tags (noindex, nofollow) via JavaScript into the head after loading
- This limitation mainly affects SPA architectures and sites with large dynamic content loading
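Solution 2 can be sketched as follows. This is a minimal illustration, not a reference implementation: the names `setRobotsMeta`, `RobotsDirective`, and `RobotsMetaHost` are hypothetical, and the small interface is an assumption made so the sketch can also run outside a browser.

```typescript
// Directive values are limited to a fixed set so nothing untrusted reaches the HTML string.
type RobotsDirective = "noindex" | "nofollow" | "noindex, nofollow";

// Minimal structural view of the DOM used here (an assumption for this sketch).
interface RobotsMetaHost {
  querySelector(selector: string): { setAttribute(name: string, value: string): void } | null;
  head: { insertAdjacentHTML(position: "beforeend", html: string): void };
}

// Adds <meta name="robots"> to the head, or updates it if one already exists,
// so the page never ends up with two conflicting robots tags.
function setRobotsMeta(doc: RobotsMetaHost, directive: RobotsDirective): void {
  const existing = doc.querySelector('meta[name="robots"]');
  if (existing !== null) {
    existing.setAttribute("content", directive);
    return;
  }
  doc.head.insertAdjacentHTML("beforeend", `<meta name="robots" content="${directive}">`);
}
```

In a browser this could be called early in the page lifecycle, for example `setRobotsMeta(document, "noindex")` once the route is known to be a filter combination. The global `document` should fit the `RobotsMetaHost` shape, though that is worth confirming in your own build.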
SEO Expert opinion
Is this statement consistent with field observations?
Yes, and it actually confirms a behavior that many SEOs suspected without having official validation. In practice, it is regularly observed that AJAX-loaded content ends up indexed despite restrictive HTTP headers. Google renders JavaScript, assembles the DOM, and indexes what it sees on the screen. The headers from sub-requests play no role in this decision.
What is missing here is the distinction between crawling and indexing. Mueller states that Google ignores these headers, but does not clarify whether Googlebot still crawls the AJAX endpoints for other reasons (link discovery, structure analysis). [To be verified] in server logs: does Googlebot request the JSON URLs called via AJAX directly, or only during JS rendering?
In what cases does this rule really create problems?
Headless or JAMstack architectures that serve content via API and relied on X-Robots-Tags to manage indexing at the API level are directly affected. If you have tens of thousands of filter combinations, paginated result pages, or variants of dynamically loaded content, the lack of fine control via HTTP headers seriously complicates the management of crawl budget.
Another problematic case: sites that load blocks of duplicate content (customer reviews, product descriptions shared between variants) and hoped to use noindex on these fragments. Google will see everything in the final DOM and may index duplicate content that you thought you had neutralized on the server side.
What are the gray areas not covered by Mueller?
Mueller remains vague on the exact behavior of Googlebot regarding JSON endpoints that are blocked in robots.txt but linked elsewhere. Does Google index these URLs without crawling them, solely based on external links? This is plausible, since the official documentation states that a URL blocked in robots.txt can still be indexed if it receives backlinks, but it remains [To be verified] for JSON endpoints.
Another unclear point: what happens if you inject a noindex meta robots tag via JS, but Google does not fully execute the JavaScript for some reason (timeout, error, exhausted crawl budget)? Will the content be indexed by default? Probably yes, and this is a real risk on large sites, where JS rendering is never 100% guaranteed.
Practical impact and recommendations
What should you concretely do on an AJAX-heavy site?
First step: audit all endpoints that serve dynamic content (JSON APIs, HTML fragments, modules loaded via XHR). Identify those that return content you do not want indexed. Then, choose your method: block these URLs in robots.txt if they should never be crawled, or inject meta robots directives via JS if you want page-by-page control.
For SPAs, prefer a hybrid approach: robots.txt for pure API routes (endpoints /api/*, /data/*), and JS injection of meta robots for application pages where content is assembled client-side. Systematically test the rendering with the URL inspection tool in Search Console to ensure your directives appear in the final DOM.
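The robots.txt half of this hybrid approach can be sketched as a small generator. The function name `buildRobotsTxt` is hypothetical, and the `/api/` and `/data/` prefixes are simply the example routes mentioned above. Treat it as an illustration of the file's shape, not as a complete robots.txt policy.

```typescript
// Builds robots.txt text that disallows crawling of the given path prefixes.
function buildRobotsTxt(disallowedPrefixes: string[], userAgent: string = "*"): string {
  const lines = [`User-agent: ${userAgent}`];
  for (const prefix of disallowedPrefixes) {
    // Disallow rules match URL paths by prefix, so "/api/" also covers "/api/search".
    lines.push(`Disallow: ${prefix}`);
  }
  return lines.join("\n") + "\n";
}

// Example: block the pure API routes, and nothing else.
const robotsTxt = buildRobotsTxt(["/api/", "/data/"]);
```

Keeping the list of prefixes explicit makes it easier to review that no JS or CSS path needed for rendering has slipped into the disallowed set.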
How can I check if my indexing directives are properly taken into account?
Use the "URL Inspection" tool in Search Console and look at the rendered HTML. If you injected a noindex meta robots tag via JavaScript, it should appear in the head of the HTML as rendered by Google. If it does not, your script is either failing to execute or running too late in the page lifecycle.
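If you copy the rendered HTML out of the inspection tool, a quick check like the one below can confirm that the directive is present. It is a rough regex heuristic rather than a real HTML parser, the name `hasNoindexMeta` is hypothetical, and it only looks at `name="robots"` (not crawler-specific names such as `googlebot`).

```typescript
// Returns true if the HTML contains a <meta name="robots"> tag whose
// content attribute includes the "noindex" directive.
function hasNoindexMeta(renderedHtml: string): boolean {
  const metaTags: string[] = renderedHtml.match(/<meta\b[^>]*>/gi) ?? [];
  return metaTags.some((tag) => {
    // The name attribute must be exactly "robots" (single or double quotes).
    const isRobots = /\bname\s*=\s*(["'])robots\1/i.test(tag);
    const content = /\bcontent\s*=\s*["']([^"']*)["']/i.exec(tag);
    return isRobots && content !== null && /\bnoindex\b/i.test(content[1] ?? "");
  });
}
```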
Also check the server logs to see whether Googlebot is crawling your JSON endpoints directly. If so, ensure they are properly blocked in robots.txt or return a 404/410 if the content is meaningless outside the application context. An endpoint that returns raw JSON will usually not be indexed, but an endpoint that returns fragmentary HTML might be.
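The log check can be sketched as a simple filter over access-log lines. This assumes the common "combined" log format and identifies Googlebot by user agent string only, which can be spoofed. The names `findGooglebotEndpointHits` and `EndpointHit` are hypothetical.

```typescript
interface EndpointHit {
  path: string;
  status: number;
}

// Extracts Googlebot requests to the given path prefixes from access-log lines.
// Assumes the "combined" log format; adjust the pattern for other formats.
function findGooglebotEndpointHits(logLines: string[], prefixes: string[]): EndpointHit[] {
  // Captures the request path and the HTTP status that follows the quoted request line.
  const requestPattern = /"(?:GET|POST|HEAD) (\S+) [^"]*" (\d{3}) /;
  const hits: EndpointHit[] = [];
  for (const line of logLines) {
    if (!line.includes("Googlebot")) continue; // user agent string check only
    const match = requestPattern.exec(line);
    if (match === null) continue;
    const path = match[1] ?? "";
    const status = Number(match[2]);
    if (prefixes.some((prefix) => path.startsWith(prefix))) {
      hits.push({ path, status });
    }
  }
  return hits;
}
```

A call such as `findGooglebotEndpointHits(lines, ["/api/", "/data/"])` would list the endpoint URLs requested under a Googlebot user agent, along with the status code each one returned.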
What errors should be absolutely avoided?
Never rely on X-Robots-Tag headers in AJAX responses to control indexing. This is the main lesson here. If the responses to your fetch() requests carry an X-Robots-Tag: noindex header, remove it: it serves no purpose and creates a false sense of security.
Also avoid blocking critical JS/CSS resources needed for rendering in robots.txt. Google needs to execute your JavaScript to see the final content and the indexing directives you inject. Blocking resources necessary for rendering means preventing Google from reading your meta robots injected via JS, and thus losing all control over indexing.
- Audit all API/JSON endpoints serving dynamic content and identify those to exclude from indexing
- Block in robots.txt the API routes that should never be crawled (/api/*, /data/*, etc.)
- Inject meta robots tags (noindex, nofollow) via JavaScript into the head for page-by-page control
- Test the rendering with the URL inspection tool in Search Console to verify the presence of directives in the DOM
- Check server logs to detect direct crawling of JSON endpoints by Googlebot
- Never block in robots.txt critical JS/CSS resources necessary for page rendering
❓ Frequently Asked Questions
Does Google still crawl the JSON URLs called via AJAX?
Do X-Robots-Tag headers still work for classic resources (images, PDFs)?
Can X-Robots-Tag be used on the main page of an SPA?
What happens if Google does not execute the JavaScript that injects my meta robots tag?
Does blocking in robots.txt completely prevent JSON endpoints from being indexed?
Other SEO insights extracted from this same Google Search Central video · duration 57 min · published on 12/02/2016