
Official statement

Google probably does not take structured data into account on pages marked noindex, because processing halts before the structured-data analysis stage. Link extraction, however, can happen in parallel, so some links may still be discovered even on noindexed pages.

🎥 Source: Google Search Central video, published 25/11/2020 (duration 46:02); statement at 37:12.
TL;DR

Google likely ignores structured data on noindexed pages, as its processing stops before analyzing it. This means your rich snippets and Schema.org markup will have no effect on these pages. However, link extraction can occur in parallel: some links present on these pages may still be discovered and followed despite the noindex.

What you need to understand

Why doesn't Google process structured data on noindexed pages?

Google's processing pipeline operates in distinct stages. When a page carries a noindex tag, the engine stops the process before reaching the in-depth content-analysis phase. It is at this stage that structured data (Schema.org vocabulary expressed as JSON-LD, microdata, or RDFa) is normally extracted and interpreted.

In practical terms, Google decides very early whether a page can be indexed. If the answer is no, it does not invest resources in a full semantic analysis, which includes parsing structured data. The result: your Article, Product, FAQ, or other Schema.org markup will never be taken into account to generate rich snippets or enrich the Knowledge Graph.
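
To make the order of operations concrete, here is a minimal Python sketch of such an early-exit check. It illustrates the principle only, not Google's actual pipeline; the regexes are deliberately simplistic and the sample page is hypothetical.

```python
import json
import re

def has_noindex(html: str, headers: dict) -> bool:
    """True if a noindex directive appears in the X-Robots-Tag header
    or in a <meta name="robots"> tag."""
    if "noindex" in headers.get("X-Robots-Tag", "").lower():
        return True
    meta = re.search(
        r'<meta[^>]+name=["\']robots["\'][^>]+content=["\']([^"\']*)',
        html, re.IGNORECASE)
    return bool(meta) and "noindex" in meta.group(1).lower()

def extract_json_ld(html: str) -> list:
    """Collect and parse <script type="application/ld+json"> blocks."""
    blocks = re.findall(
        r'<script[^>]+type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
        html, re.IGNORECASE | re.DOTALL)
    parsed = []
    for raw in blocks:
        try:
            parsed.append(json.loads(raw))
        except json.JSONDecodeError:
            pass  # ignore malformed blocks
    return parsed

page = ('<meta name="robots" content="noindex,follow">'
        '<script type="application/ld+json">{"@type": "Product"}</script>')

# Early exit: on a noindexed page, the JSON-LD is never even parsed.
if has_noindex(page, headers={}):
    print("noindex detected: structured data skipped")
else:
    print(extract_json_ld(page))
```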

Does link extraction work differently on these pages?

Martin Splitt emphasizes that link extraction can occur in parallel with the main processing. In other words, even if a page is marked noindex and its content is not analyzed in depth, Googlebot can still discover and follow the links present in its HTML code.

This nuance is crucial: a noindex page is not necessarily isolated from the rest of the crawl. It can serve as a pathway for discovering other URLs, especially if it sits within an important navigation structure (pagination, filters, duplicated categories). The bot traverses the links but makes no use of the rest of the content.
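
A small sketch of this parallel behavior: even on a noindexed page, a link extractor still sees every anchor. The page and URLs below are hypothetical.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags on any fetched page, indexable or not."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# A noindexed page: excluded from the index, yet its links remain visible.
noindex_page = """
<meta name="robots" content="noindex">
<a href="/category/shoes">Shoes</a>
<a href="/category/bags">Bags</a>
"""

extractor = LinkExtractor()
extractor.feed(noindex_page)
print(extractor.links)  # ['/category/shoes', '/category/bags']
```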

What implications are there for sites with many noindex pages?

If you rely heavily on noindex to manage duplicate content, filters, or low-value pages, any structured data on those pages is wasted. No SEO benefit will come from it: no stars in the SERPs, no enriched product listings, no FAQ displayed directly in the results.

However, these pages can still contribute to internal linking and content discovery. If they point to strategic indexable pages, they facilitate crawling. But beware: multiplying noindex pages with many links can dilute the crawl budget and complicate how Google understands your architecture.

  • Structured data is not processed on noindexed pages: it generates no rich snippets and contributes nothing to the Knowledge Graph.
  • Links can be extracted and followed, even if the page is excluded from the index.
  • Noindex interrupts processing before in-depth semantic analysis of content.
  • Using Schema.org on noindex pages is pointless and constitutes a technical effort with no return.
  • Internal linking through noindex pages still works, but it must be managed to avoid dispersing the crawl budget.

SEO Expert opinion

Is this statement consistent with field observations?

Yes. It is regularly observed in the field that noindex pages carrying Schema.org markup never generate rich snippets in search results. Google's internal logic is simple: why deeply analyze a page that will never be displayed? This also matches the modular architecture of the crawl-and-indexing pipeline, where each stage is optimized to save resources.

However, Splitt's wording remains cautious: "probably" is not a firm commitment. We can imagine edge cases where some data is still captured, for instance during an initial crawl before detecting the noindex, or if the page switches from indexable to noindex after initial processing. [To verify]: no public data precisely documents the exact timing of this interruption in the pipeline.

What nuances should be added to this rule?

The "parallel" link extraction is an important but vague detail. Splitt does not specify whether this process is systematic or conditioned by other factors (page popularity, available crawl budget, depth in the site hierarchy). In practice, some deeply buried or poorly linked noindex pages do not contribute effectively to the discovery of new URLs.

Furthermore, this rule only applies to noindex delivered via a meta tag or an HTTP header (X-Robots-Tag). If you block a page via robots.txt, Google does not crawl its content at all, so it can neither extract links, nor parse structured data, nor even detect a noindex in the code. This fundamental difference is often misunderstood.
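
The difference is easy to demonstrate with Python's standard library: a robots.txt Disallow stops the fetch before anything on the page, including a noindex directive, can be seen. The rules and URL below are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules for the site being audited.
robots = RobotFileParser()
robots.parse([
    "User-agent: *",
    "Disallow: /filters/",
])

url = "https://example.com/filters/color-red"
if not robots.can_fetch("Googlebot", url):
    # The page is never fetched, so its content, its links,
    # and any meta noindex inside it all stay invisible.
    print("blocked by robots.txt: noindex, links and markup never seen")
```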

In what cases could this logic cause problems?

Imagine an e-commerce site whose filter pages are marked noindex to avoid duplicate content. These pages often carry detailed Schema.org Product markup, added in the hope of improving Google's overall understanding of the catalog. That technical effort is wasted: the data will never be used, neither for rich snippets, nor for Merchant Center, nor for any other semantic processing.

Another frequent case: noindexed AMP pages carrying Schema.org Article markup and used as alternative versions. If only the AMP version holds the structured data and that version is noindexed, Google will never see it. Make sure the indexable canonical version contains the same markup.

Warning: do not confuse noindex with Disallow in robots.txt. The former allows crawling but forbids indexing, while the latter blocks access to the content entirely, and with it any extraction of links or structured data.

Practical impact and recommendations

What should you concretely do with existing noindex pages?

First action: audit your noindex pages to identify those containing Schema.org or other structured data. Use Screaming Frog or a similar crawler to cross-reference the noindex directive with the presence of JSON-LD, microdata, or RDFa markup. If you find any, ask why that data is there; in 99% of cases it is an oversight or a misunderstanding of how Google works.
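
If you prefer a scripted first pass, a sketch like the following flags pages that combine both signals. The URL list is hypothetical, the regexes are crude, and a dedicated crawler remains the more robust option.

```python
import re
import urllib.request

NOINDEX_RE = re.compile(
    r'<meta[^>]+name=["\']robots["\'][^>]+content=["\'][^"\']*noindex',
    re.IGNORECASE)
JSONLD_RE = re.compile(r'application/ld\+json', re.IGNORECASE)

# Hypothetical list, e.g. exported from your crawler.
urls = ["https://example.com/filter-red", "https://example.com/faq"]

for url in urls:
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            html = resp.read().decode("utf-8", errors="replace")
            header = (resp.headers.get("X-Robots-Tag") or "").lower()
    except OSError:
        continue  # unreachable page: skip it
    noindex = "noindex" in header or bool(NOINDEX_RE.search(html))
    if noindex and JSONLD_RE.search(html):
        print(f"wasted markup: {url}")
```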

Next, decide whether these pages should remain noindex. If they should (duplicate content, low quality, pagination), remove the structured data to lighten the code and simplify maintenance. Conversely, if these pages have value and deserve to be indexed, remove the noindex and ensure they meet Google’s content quality criteria.

How to optimize internal linking via noindex pages?

Since Google can extract links even from noindexed pages, use them strategically to channel crawling toward your priority pages. For example, a noindex filter page can point to high-value indexable product listings. Be careful, however, not to create a link maze: too many intermediate noindex pages lengthen crawl paths and dilute effectiveness.

Also, limit the number of outgoing links on these pages. If a noindex page contains 200 links, Google will have to evaluate all of them, consuming crawl budget with no direct return on indexing. Prefer targeted links to a few strategic URLs rather than exhaustive navigation.
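
As a rough guard-rail, the same kind of script can count anchors on noindexed pages. The 50-link threshold below is an arbitrary illustration, not a Google limit, and the sample URL is hypothetical.

```python
import re

LINK_RE = re.compile(r'<a\s[^>]*href=', re.IGNORECASE)
NOINDEX_RE = re.compile(
    r'<meta[^>]+name=["\']robots["\'][^>]+content=["\'][^"\']*noindex',
    re.IGNORECASE)
MAX_LINKS = 50  # arbitrary threshold for illustration

def flag_link_heavy(url: str, html: str) -> None:
    """Warn when a noindexed page carries an excessive number of outgoing links."""
    if not NOINDEX_RE.search(html):
        return
    count = len(LINK_RE.findall(html))
    if count > MAX_LINKS:
        print(f"{url}: {count} links on a noindex page, consider pruning")

flag_link_heavy("https://example.com/filter-size",
                '<meta name="robots" content="noindex">' + '<a href="/p">x</a>' * 120)
```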

What mistakes should be absolutely avoided?

Never put noindex on a page from which you expect a rich snippet. This seems obvious, yet it still happens regularly with FAQs, recipes, and customer reviews. Also check that your canonical tags point to indexable pages: a canonical pointing at a noindex page creates an inconsistency that can block indexing.
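
The canonical issue can also be checked with a script. This sketch fetches each page, follows its rel=canonical, and flags targets that turn out to be noindexed; the URL is hypothetical and the regexes simplistic.

```python
import re
import urllib.request

CANONICAL_RE = re.compile(
    r'<link[^>]+rel=["\']canonical["\'][^>]+href=["\']([^"\']+)', re.IGNORECASE)
NOINDEX_RE = re.compile(
    r'<meta[^>]+name=["\']robots["\'][^>]+content=["\'][^"\']*noindex',
    re.IGNORECASE)

def fetch(url: str) -> str:
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")

def check_canonical(url: str) -> None:
    """Flag pages whose rel=canonical points at a noindexed URL."""
    match = CANONICAL_RE.search(fetch(url))
    if not match:
        return
    target = match.group(1)
    if target != url and NOINDEX_RE.search(fetch(target)):
        print(f"inconsistent: {url} canonicalizes to noindexed {target}")

check_canonical("https://example.com/product-red")  # hypothetical URL
```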

Also, avoid blocking via robots.txt the very pages you want to mark as noindex. Google will never see the noindex directive and may keep trying to index the URL (without its content), which produces errors in Search Console. The rule is simple: robots.txt blocks crawling, noindex blocks indexing, and the two do not stack.

  • Audit all noindex pages to detect unnecessary structured data presence
  • Remove Schema.org on pages permanently excluded from the index
  • Ensure that strategic pages with structured data are indeed indexable
  • Use noindex pages as carriers for targeted internal linking, not as hubs for exhaustive navigation
  • Never block via robots.txt a page you wish to mark as noindex
  • Check that the canonical tags point to indexable URLs
Structured data on noindexed pages is a technical effort without return. Clean up your code, streamline your use of noindex, and reserve Schema.org for pages that can genuinely benefit from it in the SERPs. If your site has hundreds of pages with this configuration, a thorough audit is necessary — and the assistance of a specialized SEO agency can save you valuable time by quickly identifying inconsistencies and prioritizing corrective actions based on their real impact on your performance.

❓ Frequently Asked Questions

Can structured data on a noindex page still help SEO?
No, Google probably does not process it. It generates no rich snippets and contributes nothing to the Knowledge Graph. It is wasted technical effort.
Does Google follow the links present on a noindex page?
Yes, link extraction can happen in parallel with the main processing. Some links may therefore be discovered and followed despite the noindex.
Should I remove the Schema.org markup from my noindex filter pages?
Yes, keeping it is pointless. It bloats the code with no SEO benefit, since Google does not process structured data on these pages.
Can a page blocked by robots.txt pass links or structured data?
No, robots.txt prevents crawling entirely. Google sees neither the content, nor the links, nor the structured data. This is different from noindex.
Can noindex pages be used to optimize internal linking?
Yes, but in moderation. They can serve as crawl relays toward indexable URLs, provided you avoid link mazes that disperse the crawl budget.