Official statement
Google states that product URLs equipped with structured data must be linked from other pages on the site to be properly crawled and indexed. Without internal links pointing to these URLs, there's a risk that Google may never crawl them, rendering the structured data completely ineffective. This statement reminds us that perfect Schema.org markup does not compensate for a faulty site architecture.
What you need to understand
Why does Google require internal links to utilize structured data?
Johannes Müller's statement highlights a common misunderstanding among e-commerce businesses: marking up product sheets in Schema.org does not guarantee their indexing. Google operates through discovery via crawling, and this crawling primarily follows hyperlinks.
If a product URL is not connected to any other page on the site — neither from navigation, nor from a category, nor from a blog article — Googlebot has no reason to visit it. The sitemap.xml file can signal these URLs, but its role is advisory: Google does not guarantee the systematic crawling of all declared URLs. Structured data is only valuable if the page carrying it is actually crawled and indexed.
What’s the difference between “being in the sitemap” and “being linked”?
An XML sitemap is a list of suggested URLs, without real hierarchy or priority for Googlebot. Google may choose to follow these URLs or not, depending on the crawl budget allocated to the site.
In contrast, internal linking conveys PageRank and indicates a logical structure: a product sheet accessible from the main navigation or a category page receives a clear relevance signal. Without this signal, even a URL declared in the sitemap can remain ignored for weeks, or even permanently, if the site has a limited crawl budget.
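To make the idea of internal PageRank concrete, here is a minimal sketch of the classic textbook PageRank iteration applied to a tiny internal link graph. It is not Google's actual system, and the URLs, the `internal_pagerank` function, and the damping value of 0.85 are illustrative assumptions. It only shows that a page with no inbound internal links ends up with the minimum possible score.

```python
def internal_pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank over an internal link graph {page: [linked pages]}."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank


# Hypothetical site: one product linked from two categories, one orphan product.
SAMPLE_LINKS = {
    "/": ["/c/shoes", "/c/bags"],
    "/c/shoes": ["/p/trail-shoe", "/"],
    "/c/bags": ["/p/trail-shoe", "/"],
    "/p/trail-shoe": ["/"],
    "/p/orphan": ["/"],  # links out, but nothing links to it
}

ranks = internal_pagerank(SAMPLE_LINKS)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{score:.3f}  {page}")
```

In this toy graph the orphan keeps only the baseline share that every page receives, while the product linked from two categories accumulates score from both.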
Can Google index an orphaned product page with perfect structured data?
Technically, yes — if Google stumbles upon it through another means (external link, crawl history, redirection). But relying on this is a risky bet. Product, Review, or Offer structured data does not serve as a priority call for crawling.
Johannes Müller emphasizes the risk of “not being followed”: in other words, a product sheet without an internal link is a ghost sheet. It exists in your CMS, it may be declared in your sitemap, but it never enters Google’s index — and thus never appears in enhanced search results, regardless of its Schema.org markup.
- Structured data is not a crawl signal — it enhances the display of a page that has already been crawled and indexed.
- Internal linking dictates discovery — an orphaned URL receives no priority crawling, even with Schema.org.
- The XML sitemap is advisory — it suggests URLs but does not force Google to crawl or index them.
- Internal PageRank plays a key role — a product linked from multiple category pages or articles receives more crawl budget than an isolated sheet.
- Site architecture takes precedence over markup — good Schema.org on a poor structure won’t save anything.
SEO Expert opinion
Is this recommendation consistent with what we observe on the ground?
Absolutely. It's regularly observed that e-commerce sites with thousands of product sheets marked up in Schema.org display a catastrophic indexing rate — sometimes fewer than 30% of the URLs. The audit consistently reveals the same issue: orphan sheets, accessible only through the site's internal search engine or through the direct URL.
Google does not crawl “randomly.” It follows logical paths from the homepage, categories, and blog articles. If a product sheet is only accessible through deep pagination (page 12 of a category) or via a non-crawlable JavaScript filter, it essentially does not exist for Googlebot. Schema.org markup then becomes completely useless, regardless of its technical quality.
What architectural errors cause this problem?
First, filter facets without HTML links: many e-commerce sites generate dynamic product URLs via JavaScript, with no equivalent in traditional hyperlinks. Google can sometimes execute JS, but there’s no guarantee of systematic crawling.
Second, out-of-stock products excluded from the site: some CMSs automatically remove unavailable product sheets from internal linking while keeping them indexable via sitemap. The result: Google loses track and eventually de-indexes these pages.
Third, overloaded mega-menus that drown important categories under hundreds of secondary links: the crawl budget becomes diluted, and deep product sheets are never visited.
In what cases does this rule not strictly apply?
If your site has an enormous crawl budget (high authority, massive backlinks, fresh content daily), Google can afford to crawl orphaned URLs discovered via the sitemap. But this luxury is reserved for big players such as Amazon or Cdiscount.
For most e-commerce sites, counting on this exception is a strategic error. Even an average site (10,000 to 50,000 products) must structure its internal linking rigorously. [To be verified]: Google has never published a precise threshold for crawl budget by site type, so it is impossible to know exactly where the line is between “guaranteed crawl” and “random crawl.”
Practical impact and recommendations
What should be audited first on an e-commerce site?
First action: identify orphan product sheets. Use Screaming Frog or Oncrawl to crawl the site from the homepage, then compare with the list of product URLs declared in the sitemap. Any URL present in the sitemap but absent from the internal crawl is potentially invisible to Google.
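The comparison described above boils down to a set difference. The sketch below assumes you have exported the crawled URLs from your crawler as a plain set of strings and that the sitemap follows the standard sitemaps.org format. The `find_orphans` helper and the `shop.example` URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps.org files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml: str) -> set[str]:
    """Return every <loc> value declared in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }


def find_orphans(sitemap_xml: str, crawled_urls: set[str]) -> set[str]:
    """URLs declared in the sitemap that the link-following crawl never reached."""
    return sitemap_urls(sitemap_xml) - crawled_urls


# Hypothetical sample data for illustration only.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://shop.example/p/trail-shoe</loc></url>
  <url><loc>https://shop.example/p/hiking-bag</loc></url>
  <url><loc>https://shop.example/p/old-model</loc></url>
</urlset>"""

SAMPLE_CRAWL = {
    "https://shop.example/",
    "https://shop.example/p/trail-shoe",
    "https://shop.example/p/hiking-bag",
}

print(find_orphans(SAMPLE_SITEMAP, SAMPLE_CRAWL))
```

Before comparing real exports, normalize both lists the same way (trailing slashes, protocol, tracking parameters), otherwise harmless formatting differences show up as false orphans.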
Second check: analyze the click depth of product sheets. Ideally, no product should be more than 3 clicks from the homepage. If your best-sellers are buried 5-6 clicks deep, that's a clear signal that your internal architecture does not prioritize the right pages.
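Click depth is the shortest path from the homepage in the internal link graph, which a breadth-first search computes directly. The sketch below assumes the link graph has already been exported as a dictionary mapping each page to the pages it links to. The function names and sample URLs are hypothetical, and the 3-click limit is the rule of thumb stated above.

```python
from collections import deque


def click_depths(links: dict[str, list[str]], home: str) -> dict[str, int]:
    """Breadth-first search: minimum number of clicks from `home` to each page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest path
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def too_deep(links, home, products, max_clicks=3):
    """Products deeper than `max_clicks`; unreachable products map to None."""
    depths = click_depths(links, home)
    return {
        p: depths.get(p)
        for p in products
        if depths.get(p) is None or depths[p] > max_clicks
    }


# Hypothetical link graph: one product is only reachable through deep pagination.
SAMPLE_GRAPH = {
    "/": ["/c/shoes"],
    "/c/shoes": ["/p/trail-shoe", "/c/shoes?page=2"],
    "/c/shoes?page=2": ["/c/shoes?page=3"],
    "/c/shoes?page=3": ["/p/old-model"],
}

report = too_deep(SAMPLE_GRAPH, "/", ["/p/trail-shoe", "/p/old-model", "/p/ghost"])
print(report)
```

A value of `None` in the report means the product cannot be reached by following links at all, which is the orphan case rather than a depth problem.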
How can you fix a flawed architecture without redesigning the whole site?
Add related product blocks on all high-traffic pages: homepage, main categories, popular blog articles. These blocks (“Our bestsellers”, “Recent products”, “Selection of the week”) create crawl paths to isolated product sheets.
Integrate contextual links in your editorial content: every blog article, buying guide, or FAQ should point to 3 to 5 relevant product sheets. This improves the internal linking and passes PageRank to monetizable pages. Finally, revisit your pagination: if it only works in JavaScript or with “Load more” buttons, replace it with classic HTML pagination featuring crawlable <a href> links.
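To see why the pagination point matters, the following sketch extracts the links that a crawler reading only the static HTML would find. It is a simplification that deliberately ignores JavaScript rendering, and the `crawlable_links` helper and both markup samples are hypothetical.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag found in static HTML."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def crawlable_links(html: str) -> list[str]:
    """Links discoverable without executing any JavaScript."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.hrefs


# Hypothetical markup: a "Load more" button versus classic HTML pagination.
JS_PAGINATION = '<button onclick="loadMore()">Load more</button>'
HTML_PAGINATION = (
    '<nav><a href="/c/shoes?page=1">1</a> '
    '<a href="/c/shoes?page=2">2</a></nav>'
)

print(crawlable_links(JS_PAGINATION))
print(crawlable_links(HTML_PAGINATION))
```

The button exposes no URL to follow, while the classic pagination gives the crawler an explicit path to each page of the category.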
What errors should be avoided when optimizing product linking?
Don’t multiply links to the same products from all pages — this dilutes the signal. Prioritize strategic products (high margin, available stock, strong search) and vary the link anchors to avoid over-optimization.
Avoid adding nofollow to links that point to your own product sheets: this is a mistake still seen too often on sites that “want to keep the juice for other pages.” Google treats nofollow as a hint not to follow the link, so a product sheet reachable only through such links may never be discovered or indexed, which is exactly the opposite of what is sought. Finally, do not neglect semantic consistency: a link from a “Running Shoes” category to a “Hiking Bag” product muddles thematic relevance signals.
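To check for the nofollow mistake, a small audit can flag internal links that carry rel="nofollow". This is a minimal sketch that works on the raw HTML of one page. The `internal_nofollow_links` helper, the hostnames, and the sample markup are hypothetical, and "internal" is assumed here to mean "same hostname as the page".

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class NofollowAudit(HTMLParser):
    """Flag <a rel="nofollow"> links that point to the page's own host."""

    def __init__(self, page_url: str):
        super().__init__()
        self.page_url = page_url
        self.host = urlparse(page_url).netloc
        self.flagged = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        rel_tokens = (attr.get("rel") or "").lower().split()
        if not href or "nofollow" not in rel_tokens:
            return
        absolute = urljoin(self.page_url, href)  # resolve relative links
        if urlparse(absolute).netloc == self.host:
            self.flagged.append(absolute)


def internal_nofollow_links(page_url: str, html: str) -> list[str]:
    audit = NofollowAudit(page_url)
    audit.feed(html)
    return audit.flagged


# Hypothetical category page with one nofollowed internal product link.
SAMPLE_PAGE_URL = "https://shop.example/c/shoes"
SAMPLE_HTML = (
    '<a href="/p/trail-shoe" rel="nofollow">Trail shoe</a>'
    '<a href="/p/hiking-bag">Hiking bag</a>'
    '<a href="https://partner.example/x" rel="nofollow sponsored">Partner</a>'
)

print(internal_nofollow_links(SAMPLE_PAGE_URL, SAMPLE_HTML))
```

External links with nofollow are left alone on purpose: only links to your own product sheets are reported.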
- Crawl the site and compare with the sitemap to detect orphan sheets
- Check that each strategic product is accessible within 3 clicks from the homepage
- Add related product blocks on all high-traffic pages
- Integrate contextual product links in blog articles and buying guides
- Replace JavaScript pagination with crawlable HTML pagination
- Prioritize links to high-margin or high SEO potential products
❓ Frequently Asked Questions
Can product structured data compensate for the absence of internal links?
Is the XML sitemap enough to get orphan product sheets indexed?
What is the minimum number of internal links a product sheet needs in order to be crawled?
Should an out-of-stock product remain linked on the site?
Do JavaScript filters prevent Google from discovering product sheets?
Source: Google Search Central video · duration 55 min · published on 10/01/2020