Official statement

For Google to connect structured data with its index, product URLs must be linked on the site; otherwise, they may not be followed.
30:02
🎥 Source video

Extracted from a Google Search Central video

⏱ 55:00 💬 EN 📅 10/01/2020 ✂ 11 statements
Other statements from this video (10)
  1. 1:47 How do you correctly mark up recipe carousels without risking a spam penalty?
  2. 7:28 Can incorrect semantic markup trigger a manual penalty?
  3. 10:26 How do you handle soft 404 pages effectively without hurting your crawl budget?
  4. 19:06 Are descriptive URLs really useless for SEO?
  5. 21:59 Should you really avoid changing your URL structure multiple times?
  6. 33:28 Does URL length really affect SEO rankings, or only canonicalization?
  7. 36:55 Does site structure really matter more than URL depth?
  8. 50:13 Why does the visible date on news content affect your Google rankings?
  9. 55:24 Is search intent now replacing exact keyword matching?
  10. 79:01 Do Google's algorithms really vary by country?
📅 Official statement from 10/01/2020 (6 years ago)
TL;DR

Google states that product URLs equipped with structured data must be linked from other pages on the site to be properly crawled and indexed. Without internal links pointing to these URLs, there's a risk that Google may never crawl them, rendering the structured data completely ineffective. This statement reminds us that perfect Schema.org markup does not compensate for a faulty site architecture.

What you need to understand

Why does Google require internal links to utilize structured data?

Johannes Müller's statement highlights a common misunderstanding among e-commerce businesses: marking up product pages with Schema.org does not guarantee that they get indexed. Google discovers pages by crawling, and crawling primarily follows hyperlinks.

If a product URL is not connected to any other page on the site — neither from navigation, nor from a category, nor from a blog article — Googlebot has no reason to visit it. The sitemap.xml file can signal these URLs, but its role is advisory: Google does not guarantee the systematic crawling of all declared URLs. Structured data is only valuable if the page carrying it is actually crawled and indexed.

What’s the difference between “being in the sitemap” and “being linked”?

An XML sitemap is a list of suggested URLs, without real hierarchy or priority for Googlebot. Google may choose to follow these URLs or not, depending on the crawl budget allocated to the site.

In contrast, internal linking conveys PageRank and indicates a logical structure: a product page reachable from the main navigation or a category page receives a clear relevance signal. Without this signal, even a URL declared in the sitemap can remain ignored for weeks, or even permanently, if the site has a limited crawl budget.

Can Google index an orphaned product page with perfect structured data?

Technically, yes, if Google finds it some other way (an external link, crawl history, a redirect). But relying on this is a risky bet. Product, Review, or Offer structured data does not prompt Google to crawl a page any sooner.

Johannes Müller emphasizes the risk of “not being followed”: in other words, a product page with no internal link is a ghost page. It exists in your CMS and may be declared in your sitemap, but it may never enter Google’s index, and so may never appear in enhanced search results, regardless of its Schema.org markup.

  • Structured data is not a crawl signal: it enhances the display of a page that has already been crawled and indexed.
  • Internal linking drives discovery: an orphaned URL gets no crawl priority, even with Schema.org markup.
  • The XML sitemap is advisory: it suggests URLs but does not force Google to crawl or index them.
  • Internal PageRank plays a key role: a product linked from several category pages or articles tends to be crawled more readily than an isolated page.
  • Site architecture takes precedence over markup: good Schema.org markup on a poor structure won’t save anything.

SEO Expert opinion

Is this recommendation consistent with what we observe on the ground?

Absolutely. E-commerce sites with thousands of product pages marked up in Schema.org regularly show a catastrophic indexing rate, sometimes below 30% of their URLs. Audits consistently reveal the same issue: orphan pages, reachable only through the site's internal search engine or by direct URL.

Google does not crawl “randomly.” It follows logical paths from the homepage, categories, and blog articles. If a product page is only reachable through deep pagination (page 12 of a category) or via a non-crawlable JavaScript filter, it essentially does not exist for Googlebot. Its Schema.org markup then goes unused, regardless of its technical quality.

What architectural errors cause this problem?

First, filter facets without HTML links: many e-commerce sites generate dynamic product URLs via JavaScript, with no equivalent in traditional hyperlinks. Google can sometimes execute JS, but there’s no guarantee of systematic crawling.

Second, out-of-stock products excluded from the site: some CMSs automatically remove unavailable product pages from internal linking while keeping them indexable via the sitemap. The result: Google loses track of these pages and eventually de-indexes them.

Third, overloaded mega-menus that drown important categories under hundreds of secondary links: the crawl budget becomes diluted, and deep product pages are rarely visited.

In what cases does this rule not strictly apply?

If your site has an enormous crawl budget (high authority, massive backlinks, fresh content daily), Google can afford to crawl orphaned URLs discovered via the sitemap. But this luxury is reserved for big players such as Amazon or Cdiscount.

For most e-commerce sites, counting on this exception is a strategic error. Even an average site (10,000 to 50,000 products) must structure its internal linking rigorously. [To be verified]: Google has never published a precise threshold for crawl budget by site type, so it is impossible to know exactly where the line is between “guaranteed crawl” and “random crawl.”

Caution: Do not confuse “being in Google's index” with “being eligible for product rich snippets.” A page can be indexed without its structured data being used, and conversely, a non-indexed page can never generate rich results, even with perfect markup.

Practical impact and recommendations

What should be audited first on an e-commerce site?

First action: identify orphan product pages. Use Screaming Frog or Oncrawl to crawl the site from the homepage, then compare the result with the list of product URLs declared in the sitemap. Any URL present in the sitemap but absent from the internal crawl is potentially invisible to Google.
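The comparison described above boils down to a set difference. The following is a minimal sketch, assuming you have the sitemap XML as a string and have exported the list of URLs your crawler reached; the function names and the `shop.example` URLs are hypothetical and only serve as illustration.

```python
import xml.etree.ElementTree as ET

# Standard namespace used by sitemap.xml files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml: str) -> set[str]:
    """Return every <loc> value found in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return {
        loc.text.strip()
        for loc in root.iterfind(".//sm:loc", SITEMAP_NS)
        if loc.text
    }


def find_orphans(sitemap_xml: str, crawled_urls: set[str]) -> list[str]:
    """URLs declared in the sitemap that the internal crawl never reached."""
    return sorted(sitemap_urls(sitemap_xml) - crawled_urls)


# Hypothetical sample data: three products declared, two reached by the crawl.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://shop.example/p/red-shoe</loc></url>
  <url><loc>https://shop.example/p/blue-shoe</loc></url>
  <url><loc>https://shop.example/p/green-bag</loc></url>
</urlset>"""

CRAWLED = {
    "https://shop.example/p/red-shoe",
    "https://shop.example/p/green-bag",
}

if __name__ == "__main__":
    for url in find_orphans(SAMPLE_SITEMAP, CRAWLED):
        print("orphan candidate:", url)
```

In practice the two lists rarely match character for character (trailing slashes, tracking parameters, http vs https), so normalizing URLs before the comparison is usually necessary.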

Second check: analyze the click depth of product pages. Ideally, no product should be more than 3 clicks from the homepage. If your best-sellers are buried 5-6 clicks deep, that's a clear signal that your internal architecture does not prioritize the right pages.
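Click depth is the shortest number of links between the homepage and a page, which a breadth-first search computes directly. Here is a rough sketch, assuming the internal link graph has already been exported as a dictionary mapping each page to the pages it links to; the graph below and the function names are made up for illustration.

```python
from collections import deque


def click_depths(links: dict[str, list[str]], home: str) -> dict[str, int]:
    """Breadth-first search from the homepage: minimum clicks to reach each page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest path in BFS
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def too_deep(links: dict[str, list[str]], home: str, max_clicks: int = 3) -> list[str]:
    """Pages reachable from the homepage but further away than max_clicks."""
    depths = click_depths(links, home)
    return sorted(url for url, depth in depths.items() if depth > max_clicks)


# Hypothetical link graph: product B is only reachable through deep pagination.
SAMPLE_LINKS = {
    "/": ["/shoes"],
    "/shoes": ["/shoes/page-2", "/p/a"],
    "/shoes/page-2": ["/shoes/page-3"],
    "/shoes/page-3": ["/p/b"],
}

if __name__ == "__main__":
    print(click_depths(SAMPLE_LINKS, "/"))
    print("deeper than 3 clicks:", too_deep(SAMPLE_LINKS, "/"))
```

Pages that never appear in the result of `click_depths` are not reachable from the homepage at all, which makes them orphan candidates rather than merely deep pages.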

How can you fix a flawed architecture without redesigning the whole site?

Add related product blocks on all high-traffic pages: homepage, main categories, popular blog articles. These blocks (“Our bestsellers”, “Recent products”, “Selection of the week”) create crawl paths to isolated product pages.

Integrate contextual links in your editorial content: every blog article, buying guide, or FAQ should point to 3 to 5 relevant product pages. This strengthens internal linking and passes PageRank to revenue-generating pages.

Finally, revisit your pagination: if it only works in JavaScript or with “Load more” buttons, replace it with classic HTML pagination featuring crawlable <a href> links.
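To illustrate the difference between a “Load more” button and HTML pagination, the sketch below lists the links a crawler could pick up from raw HTML without executing any JavaScript. It is a simplified approximation rather than a model of how Googlebot actually parses pages, and the class name, function name, and HTML snippets are hypothetical.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collects href values from <a> tags, ignoring fragments and javascript: URLs."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and not href.strip().lower().startswith(("#", "javascript:")):
            self.hrefs.append(href)


def crawlable_links(html: str) -> list[str]:
    """Links present in the static HTML, in document order."""
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs


# Hypothetical snippets: a JavaScript-only "Load more" versus classic pagination.
LOAD_MORE = '<div id="products"></div><button onclick="loadMore()">Load more</button>'
HTML_PAGINATION = (
    '<nav><a href="/shoes?page=1">1</a> '
    '<a href="/shoes?page=2">2</a> '
    '<a href="#top">Top</a></nav>'
)

if __name__ == "__main__":
    print(crawlable_links(LOAD_MORE))
    print(crawlable_links(HTML_PAGINATION))
```

The button-based snippet exposes no link at all in the static HTML, while the paginated version exposes one URL per page.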

What errors should be avoided when optimizing product linking?

Don’t multiply links to the same products from every page: this dilutes the signal. Prioritize strategic products (high margin, available stock, strong search demand) and vary the anchor text to avoid over-optimization.

Avoid adding nofollow to internal links that point to your own product pages: this mistake is still seen too often on sites that “want to keep the juice for other pages.” Google treats nofollow as a hint not to follow the link or pass signals through it, so an internal nofollow works against discovery, which is exactly the opposite of what you want.

Finally, do not neglect semantic consistency: a link from a “Running Shoes” category to a “Hiking Bag” product does little for the product and muddles thematic relevance signals.
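Internal nofollow links are easy to spot automatically. The following is a minimal sketch that flags same-host links carrying `rel="nofollow"` in a page's HTML, assuming you already have the page URL and its source; the class name, function name, and sample markup are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class NofollowAudit(HTMLParser):
    """Records links to the same host that carry rel="nofollow"."""

    def __init__(self, page_url: str) -> None:
        super().__init__()
        self.page_url = page_url
        self.host = urlparse(page_url).netloc
        self.flagged: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        # rel can hold several space-separated tokens, e.g. "nofollow sponsored".
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if not href or "nofollow" not in rel_tokens:
            return
        absolute = urljoin(self.page_url, href)
        if urlparse(absolute).netloc == self.host:  # internal links only
            self.flagged.append(absolute)


def internal_nofollow_links(page_url: str, html: str) -> list[str]:
    """Absolute URLs of internal links marked nofollow on the given page."""
    audit = NofollowAudit(page_url)
    audit.feed(html)
    audit.close()
    return audit.flagged


# Hypothetical category page with one internal nofollow and one external one.
SAMPLE_PAGE_URL = "https://shop.example/shoes"
SAMPLE_HTML = (
    '<a href="/p/red-shoe" rel="nofollow">Red shoe</a> '
    '<a href="/p/blue-shoe">Blue shoe</a> '
    '<a href="https://partner.example/x" rel="nofollow sponsored">Ad</a>'
)

if __name__ == "__main__":
    print(internal_nofollow_links(SAMPLE_PAGE_URL, SAMPLE_HTML))
```

The external sponsored link is deliberately left out of the report, since nofollow on outbound links is a separate decision from internal linking.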

  • Crawl the site and compare with the sitemap to detect orphan pages
  • Check that each strategic product is accessible within 3 clicks from the homepage
  • Add related product blocks on all high-traffic pages
  • Integrate contextual product links in blog articles and buying guides
  • Replace JavaScript pagination with crawlable HTML pagination
  • Prioritize links to high-margin or high-SEO-potential products

Optimizing internal product linking requires a holistic view of architecture: it’s not just about adding a few links, but about rethinking crawl paths, internal PageRank hierarchy, and crawl budget distribution. These optimizations can quickly become complex on a catalog of several thousand products, especially if the CMS imposes technical constraints. In such cases, an SEO agency specialized in e-commerce can help structure a linking strategy tailored to the site's technical constraints and business priorities, which helps avoid costly mistakes and speeds up visibility gains.

❓ Frequently Asked Questions

Can product structured data compensate for a lack of internal links?
No. Structured data enhances the display of a page that has already been crawled and indexed, but it does not trigger priority crawling. Without an internal link, Google may simply ignore the URL, making the Schema.org markup useless.
Is the XML sitemap enough to get orphan product pages indexed?
The sitemap suggests URLs to Google but guarantees neither crawling nor indexing. On a site with a limited crawl budget, orphan URLs declared in the sitemap are often ignored for weeks, or even permanently.
How many internal links, at minimum, does a product page need in order to be crawled?
There is no official threshold, but a single link from a category reachable through the main navigation is generally enough. Ideally, multiply the crawl paths (categories, articles, related-product blocks) to reinforce the relevance signal.
Should an out-of-stock product remain linked on the site?
Yes, unless you want it de-indexed. Removing internal links to an unavailable product often causes it to drop out of Google's index. It is better to keep the link and display a restocking message on the product page.
Do JavaScript filters prevent Google from discovering product pages?
If the filters generate URLs with no HTML link equivalent, yes. Google can execute some scripts, but nothing guarantees systematic crawling. Always favor classic hyperlinks for the critical paths to product pages.