Official statement
Google extracts all structured data available on your pages, regardless of whether it has been validated by the Rich Results Test. This data feeds Google's understanding of entities and of the semantic context of your content. In practice, a 'broken' schema.org markup is not ignored: it is parsed, analyzed, and can influence how Google interprets your content.
What you need to understand
What’s the difference between extraction and validation of structured data?
The Rich Results Test only validates schemas eligible for rich results: recipes, FAQs, products, events, and so on. But Google does not wait for this green light to extract and analyze all the structured data it finds in your HTML.
In other words, a JSON-LD or Microdata markup present on your page will be extracted even if it does not trigger any rich results in the SERPs. Google parses it, stores it, and uses it to enhance its understanding of the page and the entities it mentions.
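To make the extraction step concrete, here is a minimal sketch of what "extracting every JSON-LD block, eligible or not" can look like. It uses only Python's standard library; the names `JsonLdExtractor`, `extract_jsonld`, and `SAMPLE_HTML` are hypothetical, and this is an illustration of the idea, not a description of Google's actual pipeline.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def extract_jsonld(html):
    """Return every parseable JSON-LD item, with no rich-result eligibility check."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    items = []
    for raw in parser.blocks:
        try:
            items.append(json.loads(raw))
        except json.JSONDecodeError:
            # A block that is not valid JSON cannot be used at all.
            continue
    return items


SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Organization", "name": "Example Co"}
</script>
</head><body><p>Hello</p></body></html>
"""

items = extract_jsonld(SAMPLE_HTML)
print([item["@type"] for item in items])  # prints ['Organization']
```

Note that the function never asks whether the type can trigger a rich result: everything that parses is kept.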
Why does Google extract data that it won’t display?
Because its goal is not limited to rich snippets. Google builds a knowledge graph, connects entities together, and uses these semantic signals to better understand the context of a page.
An Organization schema, even if it is not eligible for rich results, can help Google associate your site with a specific entity in its Knowledge Graph. A Person schema that is not eligible for rich results can still strengthen the author-content relationship.
Does this mean that a 'broken' markup still has value?
It depends on how it is broken. If your JSON-LD contains a syntax error that prevents parsing, Google can’t use it at all. But if the JSON is technically valid and only the eligibility for rich results is in question, then yes: this data is used.
Google can extract partial information even from an incomplete schema. A schema.org/Article without a datePublished will not qualify for rich results, but the author, title, and description remain parsed and usable to understand the content.
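The distinction between "unparseable" and "parseable but incomplete" can be sketched as follows. The `ASSUMED_REQUIRED` table is a deliberately simplified assumption based on the Article example above, not Google's actual eligibility rules, and `classify_jsonld` is a hypothetical helper name.

```python
import json

# Assumption for illustration only: a tiny stand-in for rich-result requirements.
ASSUMED_REQUIRED = {"Article": ["headline", "datePublished"]}


def classify_jsonld(raw):
    """Return (status, usable_properties, missing_properties) for one raw JSON-LD string."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # Syntax error: nothing can be extracted.
        return ("unparseable", {}, [])
    if not isinstance(data, dict):
        return ("not_an_object", {}, [])
    schema_type = data.get("@type")
    required = ASSUMED_REQUIRED.get(schema_type, []) if isinstance(schema_type, str) else []
    missing = [prop for prop in required if prop not in data]
    # Everything that parsed stays usable, whether or not the item is complete.
    usable = {key: value for key, value in data.items() if not key.startswith("@")}
    return ("incomplete" if missing else "complete", usable, missing)


# Trailing comma: invalid JSON, so the whole block is lost.
broken = '{"@type": "Article", "headline": "Hello",}'

# Valid JSON without datePublished: still parseable, author and headline survive.
incomplete = (
    '{"@context": "https://schema.org", "@type": "Article", "headline": "Hello", '
    '"author": {"@type": "Person", "name": "Jane Doe"}}'
)

print(classify_jsonld(broken)[0])
status, usable, missing = classify_jsonld(incomplete)
print(status, sorted(usable), missing)
```

The first call reports `unparseable`; the second reports `incomplete` while still exposing `headline` and `author`.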
- Google extracts all structured data present in the DOM, not just the markup validated by the Rich Results Test.
- This data feeds the understanding of entities and the semantic context, beyond just rich results.
- A technically valid schema but not eligible for rich snippets still has value for understanding the page.
- Syntax errors block parsing — but compliance errors with rich results guidelines do not block extraction.
- This extraction is notably used to feed the Knowledge Graph and enhance entity recognition.
SEO Expert opinion
Is this statement consistent with what we observe in the field?
Yes, entirely. We have seen for years that Google parses schemas that trigger no rich snippets. A site that places a complete Organization schema in its footer often sees Google better associate the brand with the corresponding entity in the Knowledge Graph, even without a visible rich result.
Similarly, BreadcrumbList schemas are extracted even when they do not appear in the SERPs as breadcrumbs. Google uses them to understand the hierarchical structure of the site, which can influence sitelinks and the internal navigation perceived by the engine.
What nuances should we add to this statement?
Gary Illyes remains vague on the real impact of data that is not eligible for rich results. Extraction does not necessarily mean direct use in ranking. [To verify]: we do not know whether a Product schema that is ineligible for rich snippets but technically parsable influences rankings or topical relevance.
Another nuance: Google extracts, but that doesn’t mean it trusts the provided data. An AggregateRating schema full of fake reviews will be extracted, but Google may choose to ignore it or downrank it if it detects a statistical anomaly or an inconsistency with other signals.
In what cases might this rule not fully apply?
If the crawl budget is tight, Google may settle for a partial or superficial parsing of structured data. On a site with millions of pages and a constrained Googlebot, exhaustive extraction of all JSON-LD is not guaranteed at every visit.
Another limitation: undocumented or exotic schemas. Google claims to extract “all forms,” but in practice, it prioritizes common schema.org types. A custom schema or proprietary extension may be extracted, but is likely ignored due to lack of mapping in Google’s systems.
Practical impact and recommendations
What should you do with this information?
First, don’t limit yourself to schemas eligible for rich results. Always mark up the main entities on your pages: Organization, Person, WebSite, BreadcrumbList, even if you’re not aiming for any immediate rich results.
Next, pay attention to the technical quality of your JSON-LD. A syntactically valid but incomplete schema is better than no schema at all. Google will extract what it can, and that will enhance the semantic understanding of your content.
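As an illustration of marking up the main entities, the sketch below builds an Organization and a WebSite item and serializes them with `json.dumps`, which guarantees syntactically valid JSON. The function name `build_entity_markup` and all example values (company name, URLs) are hypothetical placeholders.

```python
import json


def build_entity_markup(name, url, logo_url, same_as):
    """Return a <script> tag holding an Organization + WebSite JSON-LD graph."""
    graph = {
        "@context": "https://schema.org",
        "@graph": [
            {
                "@type": "Organization",
                "@id": url + "#organization",
                "name": name,
                "url": url,
                "logo": logo_url,
                "sameAs": same_as,
            },
            {
                "@type": "WebSite",
                "@id": url + "#website",
                "url": url,
                "name": name,
                # Links the site back to the organization entity defined above.
                "publisher": {"@id": url + "#organization"},
            },
        ],
    }
    payload = json.dumps(graph, indent=2, ensure_ascii=False)
    # Escape "</" so a string value cannot close the script element early.
    payload = payload.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + payload + "\n</script>"


tag = build_entity_markup(
    "Example Co",
    "https://www.example.com/",
    "https://www.example.com/logo.png",
    ["https://www.linkedin.com/company/example"],
)
print(tag)
```

Generating the markup from a data structure, rather than hand-editing JSON in a template, is one way to rule out the syntax errors that block parsing.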
What errors should you avoid in this logic?
Avoid overloading your pages with unnecessary or redundant schemas. Google extracts everything, sure, but an overloaded JSON-LD with dozens of irrelevant types can dilute the signal and introduce noise. Focus on the entities and properties that truly describe the content of the page.
Also, do not rely solely on schemas to communicate critical information. Google uses them as complementary signals, not as a single source of truth. If your HTML content does not mention information present in the schema, Google may doubt its validity.
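A rough way to audit this consistency yourself is to check whether key schema values also appear in the page's visible text. The sketch below is a simple substring check under stated assumptions: the helper names, the list of checked properties, and the sample page are all hypothetical, and Google's own cross-referencing is certainly more sophisticated.

```python
import re
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collects text that sits outside <script> and <style> elements."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def normalize(text):
    """Collapse whitespace and lowercase so comparisons are less brittle."""
    return re.sub(r"\s+", " ", text).strip().lower()


def visible_text(html):
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return normalize(" ".join(parser.chunks))


def unmatched_values(schema, html, properties=("name", "headline", "description")):
    """Return the checked properties whose string value is absent from the visible text."""
    text = visible_text(html)
    return [
        prop
        for prop in properties
        if isinstance(schema.get(prop), str) and normalize(schema[prop]) not in text
    ]


PAGE = """
<html><body>
<h1>Blue Widget</h1>
<p>A sturdy widget.</p>
<script type="application/ld+json">{"description": "Award-winning widget"}</script>
</body></html>
"""
schema = {"@type": "Product", "name": "Blue Widget", "description": "Award-winning widget"}

print(unmatched_values(schema, PAGE))  # prints ['description']
```

Here the description exists only inside the markup, so it is flagged, while the name matches the visible heading.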
How can you check that Google is extracting your structured data?
Use the URL Inspection Tool in Search Console. The “Structured Data” section shows you what Google has actually extracted and parsed, even if no rich results appear. That’s where you’ll see if a non-eligible schema has been captured.
You can also cross-reference with the Search Console reports on rich results: if a schema is present in the inspection but absent from the rich results reports, it means it is extracted but not eligible — which is normal and still useful.
- Always mark up the main entities: Organization, Person, BreadcrumbList, even if not aiming for a rich snippet.
- Check the JSON syntax with a validator before deployment — a parsing error blocks everything.
- Monitor the URL Inspection Tool to confirm that Google is extracting your schemas, even those not eligible for rich results.
- Avoid saturating your pages with unnecessary schemas: prioritize relevance and precision over quantity.
- Ensure the information in schema.org is consistent with the visible HTML content — Google cross-references signals.
- Document deployed schemas in an internal repository for easier maintenance and audits.
❓ Frequently Asked Questions
Is a JSON-LD schema that is invalid for rich results still extracted by Google?
Do you absolutely have to aim for rich results eligibility for schema.org markup to be useful?
Does Google trust all the structured data it extracts?
Where can you check that Google has extracted structured data that is not eligible for rich results?
Is an incomplete schema better than no schema at all?
Source: Google Search Central video · duration 32 min · published on 10/12/2020