Are X-Robots-Tags in AJAX really ignored by Google?

Official statement

Google generally ignores X-Robots-Tags in AJAX responses. To exclude content from indexing, use a robots.txt file or integrate tags into the main content via JavaScript.

Source: Google Search Central video (duration 57:33, in English, published 12/02/2016). The statement appears at 8:50.

What you need to understand

Why does Google ignore X-Robots-Tags in AJAX calls?

X-Robots-Tag is an HTTP response header that controls indexing without altering the HTML. It works reliably for classic resources (pages, PDFs, images). For AJAX requests, however, Google simply doesn't read it.

The reason? Google's rendering engine executes JavaScript, retrieves the final content injected into the DOM, but does not check the HTTP headers of background XHR or Fetch requests. These calls remain invisible to indexing, even though their content ends up being displayed on the page. Google focuses on what appears in the DOM after the full execution of JS.

What does this change for an AJAX-loaded site?

If your architecture relies on Single Page Applications (SPAs) or massive dynamic loading, you cannot count on X-Robots-Tags on the server side to block certain portions of content. Google will index what is rendered in the final HTML, regardless of what your HTTP headers state during internal requests.

Specifically, if a search filter module loads its results via fetch() and the response carries an X-Robots-Tag: noindex header, that header will not be honored. Google will see the injected content and index it normally unless something else prevents it. This is a classic trap for e-commerce sites with combined filters and for complex applications.

What are the functional technical alternatives?

John Mueller mentions two solutions: using robots.txt to block the JSON/API endpoints that serve AJAX content, or injecting a meta robots tag via JavaScript directly into the document head after loading. The first blocks crawling of the resources, while the second adds indexing directives to the final DOM, where Google can read them.

robots.txt works well if you want to prevent Google from crawling the URLs that return JSON data. But be careful: a robots.txt block stops crawling, not indexing, so a blocked URL can still be indexed if it is linked from elsewhere. For fine-grained control on an SPA, injecting meta robots tags via JavaScript remains the most reliable option.
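As a rough illustration of the robots.txt option, the sketch below checks a candidate rule set against a few URLs before deployment. The /api/ and /data/ prefixes, the example.com URLs, and the helper name is_crawl_allowed are assumptions made for the example. Python's standard parser does plain prefix matching and does not implement Google's wildcard syntax, so the rules are written as /api/ rather than /api/*.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a site whose AJAX data lives under /api/ and /data/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /api/
Disallow: /data/
"""


def is_crawl_allowed(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text lets user_agent crawl url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.com/api/products?color=red",  # JSON endpoint, should be blocked
        "https://example.com/shoes",                   # regular page, should stay crawlable
    ):
        print(url, is_crawl_allowed(ROBOTS_TXT, url))
```

This only tells you whether crawling is blocked. As noted above, a blocked URL can still be indexed if it is linked from elsewhere.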

Google only reads the final DOM after executing JavaScript, not the HTTP headers from internal AJAX requests
X-Robots-Tags in XHR/Fetch responses are generally ignored when Google decides what to index
Solution 1: block JSON endpoints in robots.txt to prevent resource crawling
Solution 2: inject meta robots tags (noindex, nofollow) via JavaScript into the head after loading
This limitation mainly affects SPA architectures and sites with large dynamic content loading

SEO Expert opinion

Is this statement consistent with field observations?

Yes, and it actually confirms a behavior that many SEOs suspected without having official validation. In practice, it is regularly observed that AJAX-loaded content ends up indexed despite restrictive HTTP headers. Google renders JavaScript, assembles the DOM, and indexes what it sees on the screen. The headers from sub-requests play no role in this decision.

What is missing here is the distinction between crawling and indexing. Mueller states that Google ignores these headers, but does not clarify whether Googlebot still crawls the AJAX endpoints for other reasons (link discovery, structure analysis). This remains to be verified in server logs: does Googlebot request the JSON URLs directly, or only as part of JS rendering?

In what cases does this rule really create problems?

Headless or JAMstack architectures that serve content via API and relied on X-Robots-Tags to manage indexing at the API level are directly affected. If you have tens of thousands of filter combinations, paginated result pages, or variants of dynamically loaded content, the lack of fine control via HTTP headers seriously complicates the management of crawl budget.

Another problematic case: sites that load blocks of duplicate content (customer reviews, product descriptions shared between variants) and hoped to use noindex on these fragments. Google will see everything in the final DOM and may index duplicate content that you thought you had neutralized on the server side.

What are the gray areas not covered by Mueller?

Mueller remains vague on how Googlebot treats JSON endpoints that are blocked in robots.txt but linked from elsewhere. Does Google index these URLs without crawling them, based solely on external links? This remains to be verified, but the official documentation does state that a URL blocked in robots.txt can still be indexed if it receives backlinks.

Another unclear point: what happens if you inject a noindex meta robots tag via JS, but Google does not fully execute the JavaScript for some reason (timeout, error, exhausted crawl budget)? Will the content be indexed by default? Probably yes, and this is a real risk on large sites, where JS rendering is never fully guaranteed.

Warning: do not confuse "Google ignores AJAX X-Robots-Tags" with "Google does not crawl endpoints". Googlebot can very well visit your JSON URLs directly if it discovers them elsewhere, and index them if they return HTML or other interpretable content. Blocking them in robots.txt remains the safest way to prevent that crawling altogether.

Practical impact and recommendations

What should you concretely do on an AJAX-heavy site?

First step: audit all endpoints that serve dynamic content (JSON APIs, HTML fragments, modules loaded via XHR). Identify those that return content you do not want indexed. Then, choose your method: block these URLs in robots.txt if they should never be crawled, or inject meta robots directives via JS if you want page-by-page control.

For SPAs, prefer a hybrid approach: robots.txt for pure API routes (endpoints /api/*, /data/*), and JS injection of meta robots for application pages where content is assembled client-side. Systematically test the rendering with the URL inspection tool in Search Console to ensure your directives appear in the final DOM.

How can I check if my indexing directives are properly taken into account?

Use the "URL Inspection" tool in Search Console and look at the rendered HTML. If you injected a noindex meta robots tag via JavaScript, it should appear in the head of the HTML that Google displays. If it does not, your script is either failing to execute or running too late in the page lifecycle.
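When many pages need checking, the same inspection can be scripted. The sketch below assumes you have saved the rendered HTML (for example, copied out of the URL Inspection tool) as a string. The class and function names are hypothetical, and the check deliberately looks only inside the head, since that is where the injected tag is expected to appear.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect directives from <meta name="robots"> or <meta name="googlebot"> tags in <head>."""

    def __init__(self) -> None:
        super().__init__()
        self.in_head = False
        self.directives: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.in_head:
            attr = {name: (value or "") for name, value in attrs}
            if attr.get("name", "").lower() in ("robots", "googlebot"):
                # A content value such as "noindex, nofollow" holds several directives.
                for directive in attr.get("content", "").split(","):
                    if directive.strip():
                        self.directives.append(directive.strip().lower())

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def has_noindex(rendered_html: str) -> bool:
    """Return True if the rendered HTML carries a noindex directive in its head."""
    finder = RobotsMetaFinder()
    finder.feed(rendered_html)
    return "noindex" in finder.directives
```

Run has_noindex on the rendered HTML, not on the raw server response: a tag injected by JavaScript only exists after rendering.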

Also check the server logs to see if Googlebot is crawling your JSON endpoints directly. If so, ensure they are properly blocked in robots.txt or return a 404/410 if the content is meaningless out of the application context. An endpoint that returns raw JSON will usually not be indexed, but an endpoint that returns fragmentary HTML might be.
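A minimal sketch of that log check follows. It assumes access logs in the common "combined" format (user agent as the last quoted field) and endpoints under /api/ or /data/; both are assumptions to adapt to your own setup, and the function name is hypothetical. Matching on the user-agent string alone can be fooled by spoofed agents, so treat the counts as a first pass.

```python
import re
from collections import Counter

# Assumed "combined" log format: ... "METHOD path HTTP/x" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?:GET|POST|HEAD) (?P<path>\S+) HTTP/[^"]*" (?P<status>\d{3}) .*"(?P<agent>[^"]*)"$'
)


def googlebot_endpoint_hits(log_lines, prefixes=("/api/", "/data/")):
    """Count requests whose user agent mentions Googlebot, per endpoint path (query string dropped)."""
    prefixes = tuple(prefixes)
    hits = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line.strip())
        if not match:
            continue  # line is not in the assumed format
        path = match.group("path")
        if "Googlebot" in match.group("agent") and path.startswith(prefixes):
            hits[path.split("?")[0]] += 1
    return hits
```

A non-empty result means Googlebot is requesting those endpoints as URLs in their own right, which is the situation where a robots.txt block or a 404/410 response becomes relevant.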

What errors should be absolutely avoided?

Never rely on X-Robots-Tags in AJAX responses to control indexing. This is the main lesson here. If the responses to your fetch() requests carry an X-Robots-Tag: noindex header, remove it: it serves no purpose and creates a false sense of security.

Also avoid blocking critical JS/CSS resources needed for rendering in robots.txt. Google needs to execute your JavaScript to see the final content and the indexing directives you inject. Blocking resources necessary for rendering means preventing Google from reading your meta robots injected via JS, and thus losing all control over indexing.
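One way to catch this mistake early is to test the robots.txt rules against the list of scripts and stylesheets your pages load. The sketch below is illustrative only: the resource URLs and the /static/js/ rule are made up for the example, and the function name is hypothetical.

```python
from urllib.robotparser import RobotFileParser


def blocked_render_resources(robots_txt, resource_urls, user_agent="Googlebot"):
    """Return the resource URLs that the given robots.txt text forbids user_agent to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in resource_urls if not parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    # Hypothetical rules: blocking /static/js/ would hide the script that injects meta robots.
    robots_txt = "User-agent: *\nDisallow: /api/\nDisallow: /static/js/\n"
    resources = [
        "https://example.com/static/js/app.js",
        "https://example.com/static/css/site.css",
    ]
    for url in blocked_render_resources(robots_txt, resources):
        print("Blocked rendering resource:", url)
```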

Audit all API/JSON endpoints serving dynamic content and identify those to exclude from indexing
Block in robots.txt the API routes that should never be crawled (/api/*, /data/*, etc.)
Inject meta robots tags (noindex, nofollow) via JavaScript into the head for page-by-page control
Test the rendering with the URL inspection tool in Search Console to verify the presence of directives in the DOM
Check server logs to detect direct crawling of JSON endpoints by Googlebot
Never block in robots.txt critical JS/CSS resources necessary for page rendering

Managing indexing on modern AJAX-oriented architectures requires a rigorous technical approach and constant monitoring. Between directives injected via JavaScript, selective blocking in robots.txt, and systematic validation of rendering on Google's side, there are many traps. If your site relies on a complex SPA architecture or massive streams of dynamic content, hiring an SEO agency specialized in JavaScript SEO can help you avoid costly mistakes such as wasted crawl budget and unwanted indexing. A thorough technical audit and a tailored implementation plan will give you finer control over what Google actually indexes.

❓ Frequently Asked Questions

Does Google still crawl the JSON URLs called via AJAX?

Googlebot can discover and crawl these URLs directly if it finds them in links or sitemaps. Blocking them in robots.txt remains the safest way to prevent this crawling.

Do X-Robots-Tags still work for classic resources (images, PDFs)?

Yes, absolutely. This limitation only concerns AJAX responses (XHR, Fetch). For resources served directly (HTML, PDFs, images), X-Robots-Tags in the HTTP headers work normally.

Can X-Robots-Tags be used on the main page of an SPA?

Yes, if the main page is server-side rendered (SSR) or pre-rendered. But for content that is subsequently loaded dynamically via AJAX, these headers do not apply. Only the final DOM counts.

What happens if Google does not execute the JavaScript that injects my meta robots tag?

The content will probably be indexed by default, because Google will not see the noindex directive. This is a real risk on sites where JS rendering is complex or slow.

Does blocking in robots.txt completely prevent JSON endpoints from being indexed?

No. Blocking in robots.txt prevents crawling but not indexing if the URL receives backlinks. To guarantee non-indexing, combine robots.txt with an HTTP 404 or 410 response on these endpoints.
