Does Google really use ALL of your structured data, including invalid markup?

Official statement

Google extracts all forms of structured data present on a page, not just those validated by the Rich Results Test. This data is used, among other things, to infer information about the entities present on the pages.

Source video

Extracted from a Google Search Central video (statement at 9:01).

Duration 32:02 · EN · published 10/12/2020 · 12 statements extracted

Other statements from this video (11)

3:47 Evergreen Chrome for rendering: does Google really update its engine as fast as announced?
4:49 Does Google really render ALL crawled pages with JavaScript?
11:40 Does PageRank still really work the way we think?
13:49 Should you really give up buying quality links for your SEO?
15:23 Does SafeSearch really apply during indexing?
15:54 How does Google detect the location and language of your pages at indexing time?
17:27 Are all indexing signals really ranking signals?
21:22 Client-side JavaScript: Google indexes it, but should you really use it for SEO?
23:38 Which JavaScript errors kill your crawl budget without you knowing?
24:41 Why should SEOs get involved from the technical architecture phase of a web project?
27:18 Do you really need to aim for SEO perfection to rank?

What you need to understand

What’s the difference between extraction and validation of structured data?

The Rich Results Test only validates schemas eligible for rich results: recipes, FAQs, products, events, etc. But Google doesn’t wait for this green light to scrape and analyze all structured data it finds in your HTML.

In other words, JSON-LD or Microdata markup on your page is extracted even if it does not trigger any rich result in the SERPs. Google parses it, stores it, and uses it to improve its understanding of the page and of the entities it mentions.
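To make the extraction step concrete, here is a minimal Python sketch that pulls every application/ld+json block out of an HTML string and parses it, with no notion of rich result eligibility at all. It is a hypothetical illustration using only the standard library (the names JsonLdExtractor and extract_json_ld are invented for the example), not a description of Google's actual parser, and it ignores Microdata and RDFa.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_json_ld = False

    def handle_data(self, data):
        if self._in_json_ld:
            self.blocks[-1] += data


def extract_json_ld(html):
    """Return every JSON-LD block that parses, whatever its @type or eligibility."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    items = []
    for raw in parser.blocks:
        try:
            items.append(json.loads(raw))
        except json.JSONDecodeError:
            continue  # unparsable block: nothing can be extracted from it
    return items


page = """
<html><head>
<script type="application/ld+json">{"@context": "https://schema.org", "@type": "Organization", "name": "Example Co"}</script>
<script type="application/ld+json">{"@context": "https://schema.org", "@type": "Person", "name": "Jane Doe"}</script>
</head><body><p>Hello</p></body></html>
"""
print([item["@type"] for item in extract_json_ld(page)])
# → ['Organization', 'Person']
```

Neither Organization nor Person needs to qualify for a rich result to come out of this step; eligibility would be a separate check applied afterwards.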

Why does Google extract data that it won’t display?

Because its goal is not limited to rich snippets. Google builds a Knowledge Graph, connects entities to one another, and uses these semantic signals to better understand the context of a page.

An Organization schema, even one that is invalid for rich results, can help Google associate your site with a specific entity in its Knowledge Graph. A Person schema that is not eligible for rich results can still strengthen the link between an author and their content.

Does this mean that a 'broken' markup still has value?

It depends on what is broken. If your JSON-LD contains a syntax error that prevents parsing, Google cannot use it at all. But if the JSON is technically valid and only its eligibility for rich results is in question, then yes: the data is used.
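The difference between a syntax error and an eligibility problem shows up directly with Python's standard json module: a trailing comma makes the whole block unparsable, while a block that merely lacks a property parses normally. A minimal sketch; parse_block is a hypothetical helper name.

```python
import json


def parse_block(raw):
    """Return (data, error): data is None when the JSON itself is broken."""
    try:
        return json.loads(raw), None
    except json.JSONDecodeError as exc:
        return None, exc.msg


# Syntax error (trailing comma): nothing at all can be read from this block.
broken = '{"@type": "Article", "headline": "Hello",}'
# Valid JSON that may lack properties a rich result needs: still fully readable.
incomplete = '{"@type": "Article", "headline": "Hello"}'

print(parse_block(broken)[0])      # → None
print(parse_block(incomplete)[0])  # → {'@type': 'Article', 'headline': 'Hello'}
```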

Google can extract partial information even from an incomplete schema. A schema.org Article missing datePublished may not qualify for rich results, but its author, headline and description are still parsed and usable for understanding the content.
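As a rough illustration of "extract what is there", the sketch below splits an Article's properties into those that can be used right away and the checklist entries that are missing. The RICH_RESULT_PROPERTIES set is a hypothetical placeholder chosen for this example, not Google's official requirement list, which differs per feature and changes over time.

```python
# Hypothetical checklist for this example only; check Google's documentation
# for the real requirements of each rich result feature.
RICH_RESULT_PROPERTIES = {"headline", "image", "datePublished", "author"}


def split_properties(item, checklist):
    """Return (usable, missing): non-@ properties present, and checklist entries absent."""
    usable = {k: v for k, v in item.items() if not k.startswith("@")}
    missing = sorted(p for p in checklist if p not in item)
    return usable, missing


article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "How structured data is extracted",
    "description": "Extraction versus rich result eligibility.",
    "author": {"@type": "Person", "name": "Jane Doe"},
}

usable, missing = split_properties(article, RICH_RESULT_PROPERTIES)
print(sorted(usable))  # → ['author', 'description', 'headline']
print(missing)         # → ['datePublished', 'image']
```

The missing entries only matter for rich result eligibility; the usable ones remain available for understanding the page either way.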

Google extracts all structured data present in the DOM, not just those validated by the Rich Results Test.
This data feeds the understanding of entities and the semantic context, beyond just rich results.
A schema that is technically valid but not eligible for rich snippets still helps Google understand the page.
Syntax errors block parsing — but compliance errors with rich results guidelines do not block extraction.
This extraction is notably used to feed the Knowledge Graph and enhance entity recognition.

SEO Expert opinion

Is this statement consistent with what we observe in the field?

Yes, totally. For years we have seen Google parse schemas that trigger no rich snippets. A site that places a complete Organization schema in its site-wide footer often sees Google associate the brand more reliably with the corresponding Knowledge Graph entity, even without any visible rich result.

Similarly, BreadcrumbList schemas are extracted even when they do not appear as breadcrumbs in the SERPs. Google uses them to understand the site's hierarchical structure, which can influence sitelinks and how the engine perceives internal navigation.
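For reference, a schema.org BreadcrumbList is an ordered list of ListItem entries, each with a position, a name and an item URL. The sketch below builds one from a page's trail; the helper name breadcrumb_list and the example URLs are placeholders.

```python
import json


def breadcrumb_list(trail):
    """Build a schema.org BreadcrumbList from (name, url) pairs, home page first."""
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": [
            {"@type": "ListItem", "position": i, "name": name, "item": url}
            for i, (name, url) in enumerate(trail, start=1)
        ],
    }


trail = [
    ("Home", "https://www.example.com/"),
    ("Guides", "https://www.example.com/guides/"),
    ("Structured data", "https://www.example.com/guides/structured-data/"),
]
print(json.dumps(breadcrumb_list(trail), indent=2))
```

Generating the list from the same data that drives the visible breadcrumb trail keeps the markup and the page hierarchy in sync.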

What nuances should we add to this statement?

Gary Illyes remains vague about the real impact of data that is not eligible for rich results. Extraction does not necessarily mean direct use in ranking. [To verify]: we do not know whether a Product schema that is invalid for rich snippets but technically parsable influences rankings or topical relevance.

Another nuance: Google extracts the data, but that does not mean it trusts it. An AggregateRating schema full of fake reviews will be extracted, but Google may ignore it or demote it if it detects a statistical anomaly or an inconsistency with other signals.

In what cases might this rule not fully apply?

If the crawl budget is tight, Google may settle for a partial or superficial parsing of structured data. On a site with millions of pages and a constrained Googlebot, exhaustive extraction of all JSON-LD is not guaranteed at every visit.

Another limitation: undocumented or exotic schemas. Google claims to extract “all forms,” but in practice, it prioritizes common schema.org types. A custom schema or proprietary extension may be extracted, but is likely ignored due to lack of mapping in Google’s systems.

Practical impact and recommendations

What should you do with this information?

First, don’t limit yourself to schemas eligible for rich results. Always mark up the main entities on your pages: Organization, Person, WebSite, BreadcrumbList, even if you’re not aiming for any immediate rich results.
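A minimal sketch of that baseline markup, assuming a single-brand site: one JSON-LD block with an @graph containing an Organization and a WebSite that points to it through @id. All names, URLs and the sameAs profile are placeholders.

```python
import json

SITE = "https://www.example.com/"  # placeholder domain

entities = {
    "@context": "https://schema.org",
    "@graph": [
        {
            "@type": "Organization",
            "@id": SITE + "#organization",
            "name": "Example Co",
            "url": SITE,
            "logo": SITE + "logo.png",
            # Profiles that help tie the site to a known entity (placeholder URL).
            "sameAs": ["https://www.linkedin.com/company/example-co"],
        },
        {
            "@type": "WebSite",
            "@id": SITE + "#website",
            "url": SITE,
            "name": "Example Co",
            "publisher": {"@id": SITE + "#organization"},
        },
    ],
}

# Escape "</" as "<\/" (a valid JSON escape) so a value can never close the tag early.
payload = json.dumps(entities, ensure_ascii=False).replace("</", "<\\/")
script_tag = '<script type="application/ld+json">' + payload + "</script>"
print(script_tag)
```

Linking entities by @id rather than repeating them keeps one authoritative description of the organization per page.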

Next, pay attention to the technical quality of your JSON-LD. A syntactically valid but incomplete schema is better than no schema at all. Google will extract what it can, and that will enhance the semantic understanding of your content.

What errors should you avoid in this logic?

Avoid overloading your pages with unnecessary or redundant schemas. Google extracts everything, sure, but an overloaded JSON-LD with dozens of irrelevant types can dilute the signal and introduce noise. Focus on the entities and properties that truly describe the content of the page.
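One way to spot overloaded markup during an audit is to count the @type values declared on a page, including nested ones, and look for duplicates or types unrelated to the content. A minimal sketch (count_types is a hypothetical helper); what counts as "too many" or "irrelevant" remains an editorial judgment per page, not a threshold this code decides.

```python
from collections import Counter


def count_types(node, counts=None):
    """Recursively count every @type in parsed JSON-LD (dicts, lists, nested values)."""
    if counts is None:
        counts = Counter()
    if isinstance(node, dict):
        types = node.get("@type")
        if isinstance(types, str):
            counts[types] += 1
        elif isinstance(types, list):
            counts.update(t for t in types if isinstance(t, str))
        for value in node.values():
            count_types(value, counts)
    elif isinstance(node, list):
        for value in node:
            count_types(value, counts)
    return counts


page_items = [
    {"@type": "Organization", "name": "Example Co"},
    {"@type": "Organization", "name": "Example Co"},  # duplicated block
    {"@type": "Article", "author": {"@type": "Person", "name": "Jane Doe"}},
]
print(count_types(page_items).most_common())
# → [('Organization', 2), ('Article', 1), ('Person', 1)]
```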

Also, do not rely solely on schemas to communicate critical information. Google uses them as complementary signals, not as a single source of truth. If your HTML content does not mention information present in the schema, Google may doubt its validity.
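A rough pre-publication check along these lines: verify that key schema values (here name and headline) also appear in the page's visible text. This is a hypothetical, deliberately naive sketch (exact substring match after whitespace and case normalization, script and style content excluded); it says nothing about how Google actually cross-references signals.

```python
import re
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects text outside <script> and <style> elements."""

    def __init__(self):
        super().__init__()
        self._skip = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def normalize(text):
    return re.sub(r"\s+", " ", text).strip().lower()


def values_missing_from_page(item, html, keys=("name", "headline")):
    """Return the schema values for `keys` that never appear in the visible text."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    visible = normalize(" ".join(parser.chunks))
    return [
        item[k] for k in keys
        if isinstance(item.get(k), str) and normalize(item[k]) not in visible
    ]


html = "<h1>Structured data, explained</h1><p>By Jane Doe</p>"
item = {"@type": "Article", "name": "Structured data, explained", "headline": "Ten structured data tips"}
print(values_missing_from_page(item, html))
# → ['Ten structured data tips']
```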

How can you check that Google is extracting your structured data?

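Use the URL Inspection Tool in Search Console. Its enhancements section lists the structured data types Google detected on the indexed page, including items flagged as invalid for rich results, so you can confirm that markup which does not qualify has still been read. For schema.org types that have no rich result feature at all, the tool shows nothing; the Schema Markup Validator (validator.schema.org) lists every item that can be parsed from the page, although it cannot tell you what Google itself kept.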

You can also cross-reference with the Search Console reports on rich results: if a schema is present in the inspection but absent from the rich results reports, it means it is extracted but not eligible — which is normal and still useful.

Always mark up the main entities: Organization, Person, BreadcrumbList, even if not aiming for a rich snippet.
Check the JSON syntax with a validator before deployment — a parsing error blocks everything.
Monitor the URL Inspection Tool to confirm that Google is extracting your schemas, even those not eligible for rich results.
Avoid saturating your pages with unnecessary schemas: prioritize relevance and precision over quantity.
Ensure the information in schema.org is consistent with the visible HTML content — Google cross-references signals.
Document deployed schemas in an internal repository for easier maintenance and audits.
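The syntax check in the list above can be automated before deployment. The sketch below is a hypothetical pre-deployment check using only the Python standard library: it finds JSON-LD blocks with a deliberately naive regular expression and reports the line and column of any parse error inside each block. It does not check schema.org vocabulary or rich result eligibility; the Rich Results Test and the Schema Markup Validator cover those.

```python
import json
import re

# Naive pattern for this sketch: assumes well-formed <script> tags with a quoted type attribute.
JSON_LD_BLOCK = re.compile(
    r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.IGNORECASE | re.DOTALL,
)


def json_ld_errors(html):
    """Return one message per JSON-LD block that is not valid JSON."""
    errors = []
    for index, match in enumerate(JSON_LD_BLOCK.finditer(html), start=1):
        try:
            json.loads(match.group(1))
        except json.JSONDecodeError as exc:
            errors.append(
                f"block {index}: {exc.msg} (line {exc.lineno}, column {exc.colno} of the block)"
            )
    return errors


page = """
<script type="application/ld+json">{"@type": "Organization", "name": "Example Co"}</script>
<script type="application/ld+json">
{"@type": "BreadcrumbList", "itemListElement": [],}
</script>
"""
for problem in json_ld_errors(page):
    print(problem)
```

In a CI job, a non-empty result can be turned into a failing exit code (for example with sys.exit(1)) so the broken block never reaches production.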

Google extracts and analyzes all structured data from your pages, regardless of whether they trigger rich results or not. This is often an underutilized opportunity: marking up entities and semantic context enhances the engine's understanding of your content. Let’s be honest, orchestrating a coherent structured data strategy at the scale of a complex site requires sharp expertise and rigorous follow-up. If you want to maximize the impact of your schemas without risking introducing noise or critical errors, hiring a specialized SEO agency may be beneficial for tailored support.

❓ Frequently Asked Questions

Is a JSON-LD schema that is invalid for rich results still extracted by Google?

Yes, as long as the JSON syntax is valid. Google parses and uses the data even if it is not eligible for rich results, notably to understand the entities and the context of the page.

Do you absolutely have to aim for rich result eligibility for schema.org markup to be useful?

No. Structured data has value beyond rich snippets: it feeds semantic understanding and the Knowledge Graph, and strengthens Google's recognition of entities.

Does Google trust all the structured data it extracts?

No. Extraction does not mean blind trust. Google can ignore or devalue data that is inconsistent, suspicious, or that contradicts other signals on the page.

Where can I check that Google has extracted structured data that is not eligible for rich results?

In Search Console's URL Inspection Tool, which lists the structured data types Google detected on the indexed page, including items flagged as invalid for rich results. For types with no rich result feature at all, the Schema Markup Validator shows everything that can be parsed from the page.

Is an incomplete schema better than no schema at all?

Yes, provided it is syntactically valid. Google will extract the available properties and use them to improve its understanding of the page, even if the schema is not complete.
