Does Google really crawl URLs found in your structured data?


Official statement

Google can discover and crawl URLs found in your structured data, but this is not guaranteed. If you want a URL to be crawled, create a real HTML link with anchor text. If you don't want it to be crawled, block it via robots.txt or use rel=canonical.

🎥 Source video

Extracted from a Google Search Central video

💬 EN 📅 08/06/2022 ✂ 13 statements


Official statement from June 8, 2022 (3 years ago)

⚠ A more recent statement exists on this topic: "Should You Use Recipe Structured Data on Collection Pages?" (Danny Sullivan, January 30, 2024)

TL;DR

Google can discover and crawl URLs present in your structured data, but nothing guarantees it will. To ensure a page gets crawled, use a real HTML link with anchor text. If you want to block crawling, use robots.txt or rel=canonical.

What you need to understand

Does Google really crawl URLs found in your structured data?

The short answer: sometimes. Google can technically extract and follow URLs referenced in your Schema.org tags, but this capability is not a guarantee. The engine prioritizes classic HTML signals to determine which pages deserve to be crawled.

In practice, a URL mentioned only in JSON-LD or Microdata has no guarantee of being discovered. The bot relies on priority criteria, and structured data doesn't rank at the top of that list.
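To see which URLs a page exposes only through its markup, you can pull them out of the JSON-LD yourself. The following is a minimal sketch using only the Python standard library; the names `JsonLdCollector`, `iter_urls`, `jsonld_urls` and the `example.com` sample page are hypothetical, and nothing here reflects how Google actually parses structured data.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        # Script content may arrive in several chunks, so append to the current block.
        if self._in_jsonld:
            self.blocks[-1] += data


def iter_urls(node):
    """Recursively yield every string value that looks like an absolute URL."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key != "@context":  # the schema.org context is not a page URL
                yield from iter_urls(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_urls(item)
    elif isinstance(node, str) and node.startswith(("http://", "https://")):
        yield node


def jsonld_urls(html):
    """Return the set of absolute URLs found in the page's JSON-LD blocks."""
    collector = JsonLdCollector()
    collector.feed(html)
    urls = set()
    for block in collector.blocks:
        try:
            urls.update(iter_urls(json.loads(block)))
        except json.JSONDecodeError:
            continue  # skip malformed JSON-LD instead of failing the whole page
    return urls


# Hypothetical sample page: one URL is linked in HTML, the other exists only in the markup.
SAMPLE_HTML = """
<html><body>
<a href="https://example.com/shoes/">Shoes</a>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "ItemList",
 "itemListElement": [
   {"@type": "ListItem", "position": 1, "url": "https://example.com/shoes/"},
   {"@type": "ListItem", "position": 2, "url": "https://example.com/hats/"}
 ]}
</script>
</body></html>
"""

print(sorted(jsonld_urls(SAMPLE_HTML)))
```

Any URL in this output that has no matching HTML link is one whose discovery depends entirely on the uncertain behavior described above.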

Why doesn't Google commit more strongly to this?

Because the primary role of structured data is not crawling, but enriching search results. Schema.org serves to qualify the content of a page already discovered, not to point out new URLs to explore.

If Google sometimes follows these URLs, it's a side effect, not an official feature you can build a strategy around. The engine reserves the right to change this behavior at any time.

What happens if I want to block a URL mentioned in my structured data?

Two solutions: robots.txt to prohibit crawling upfront, or rel=canonical to indicate that another version is authoritative. Note that rel=canonical is a hint that consolidates signals toward the preferred URL rather than a hard block. Mueller is clear on this: structured data is not a crawl control mechanism.

If you don't want a page to be crawled, don't count on the absence of an HTML link to protect it. Block it explicitly.
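As a quick sanity check, you can test whether a given URL is actually disallowed by your robots.txt rules. This is a minimal sketch using Python's standard `urllib.robotparser`; the rules, the `/internal/` path, and the helper name `is_crawl_blocked` are assumptions for illustration. Python's parser does simple prefix matching and does not replicate every detail of Google's own robots.txt handling (wildcards, for example), so treat the result as indicative only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; replace with your site's real file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /internal/
"""


def is_crawl_blocked(robots_txt, url, user_agent="Googlebot"):
    """Return True if the given rules disallow `user_agent` from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


print(is_crawl_blocked(ROBOTS_TXT, "https://example.com/internal/draft"))
print(is_crawl_blocked(ROBOTS_TXT, "https://example.com/shoes/"))
```

Running a check like this against every URL that appears in your structured data is one way to confirm that the pages you meant to block really are blocked.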

Structured data can contain URLs, but crawling is not guaranteed
To ensure discovery, use HTML links with anchor text
To block, use robots.txt or rel=canonical, not the absence of a link
The role of structured data remains semantic enrichment, not navigation

SEO Expert opinion

Is this statement consistent with observed practices in the field?

Yes, but with troubling exceptions. We regularly observe that Google discovers pages mentioned only in structured data like BreadcrumbList or ItemList. These URLs then appear in Search Console without any internal HTML link referencing them.

The problem: this behavior is neither consistent nor documented. Some sites experience systematic crawling, others never do. It's impossible to predict when Google will activate this mechanism; it may depend on the Schema types used and the site's crawl frequency, but this remains to be verified.

What nuances should be added to this rule?

Mueller speaks of discovery and crawling, but doesn't mention indexing. Even if Google crawls a URL found in your structured data, nothing guarantees it will be indexed — especially without backlinks or internal linking to support it.

Furthermore, certain Schema types include URLs that are not meant to be crawled as part of your site: think of the sameAs property of Organization pointing to social networks or third-party platforms. Google must distinguish these cases, but the exact logic remains unclear.
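When reviewing your own markup, it can help to separate URLs on your domain from those pointing elsewhere. The sketch below is one simple way to do that by comparing hostnames; `split_by_host` and the sample URLs are hypothetical, and this is not a description of how Google draws the distinction.

```python
from urllib.parse import urlsplit


def split_by_host(urls, site_host):
    """Partition URLs into (internal, external) relative to `site_host`.

    Subdomains of `site_host` are treated as internal, which is an assumption
    that may not suit every site.
    """
    internal, external = [], []
    for url in urls:
        host = (urlsplit(url).hostname or "").lower()
        if host == site_host or host.endswith("." + site_host):
            internal.append(url)
        else:
            external.append(url)
    return internal, external


# Hypothetical URLs, as they might appear in an Organization block.
urls = [
    "https://www.example.com/about/",
    "https://social.example.net/examplebrand",
    "https://example.com/contact/",
]
internal, external = split_by_host(urls, "example.com")
print(internal)
print(external)
```

The external list is the one to review by hand: each entry should be a deliberate reference, such as a sameAs profile.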

In which cases does this rule not fully apply?

AMP pages and content structured for Google News or Google Shopping seem to receive different treatment. URLs referenced in XML feeds or specific tags (like amphtml) are systematically followed.

Similarly, certain crawls triggered by special events (a product update, new content submitted via Search Console) may lead Google to follow URLs found in structured data. But again, there is no guarantee.

Warning: Never rely solely on structured data to make your strategic pages discoverable. The risk of oversight is too high.

Practical impact and recommendations

What should you concretely do to guarantee your pages get crawled?

The answer is simple: create classic HTML links with anchor text. This is the most reliable signal to indicate to Google that a page deserves to be crawled and indexed.

Integrate these links into your internal linking structure — main navigation, menu, editorial content, footer if relevant. The more quality internal links a page receives, the faster and more regularly it will be crawled.

What mistakes should you avoid with URLs in structured data?

First mistake: believing that mentioning a URL in your structured data is enough to get it indexed. If Google doesn't crawl it, you're wasting time.

Second mistake: using structured data as a crawl control mechanism. If you don't want a page to be crawled, block it explicitly rather than simply removing HTML links and hoping structured data won't reveal it.

Third mistake: neglecting consistency between your HTML links and structured data. If a BreadcrumbList references a URL absent from the DOM, this creates confusion — and potentially contradictory signals for the engine.

How can you verify that your site complies with Google's recommendations?

Audit your strategic pages: each important URL must have at least one internal HTML link
Check that your structured data doesn't contain orphaned URLs (without corresponding HTML link)
Check your robots.txt file: any path you don't want crawled should be disallowed there
Use rel=canonical to signal the preferred version of a page and consolidate duplicate variants
Test your structured data with the Rich Results Test to detect errors
Monitor Search Console: identify pages discovered without any apparent HTML link, which may indicate crawling via structured data
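The orphaned-URL check from this list can be roughly automated. The sketch below, using only the Python standard library, compares the URLs found in a page's JSON-LD with the targets of its `<a href>` links and reports those that appear only in the markup. `PageScanner`, `walk_urls`, `schema_only_urls` and the sample page are hypothetical names and data. It looks at a single page, whereas a real audit would aggregate links across the whole site.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageScanner(HTMLParser):
    """Collects <a href> targets and JSON-LD script bodies from one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.linked = set()
        self.jsonld_blocks = []
        self._in_jsonld = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            # Resolve relative links so they compare equal to absolute schema URLs.
            self.linked.add(urljoin(self.base_url, attrs["href"]))
        elif tag == "script" and attrs.get("type") == "application/ld+json":
            self._in_jsonld = True
            self.jsonld_blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self.jsonld_blocks[-1] += data


def walk_urls(node):
    """Recursively yield absolute URLs from parsed JSON-LD, skipping @context."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key != "@context":
                yield from walk_urls(value)
    elif isinstance(node, list):
        for item in node:
            yield from walk_urls(item)
    elif isinstance(node, str) and node.startswith(("http://", "https://")):
        yield node


def schema_only_urls(html, base_url):
    """Return URLs present in JSON-LD but not targeted by any <a href> on the page."""
    scanner = PageScanner(base_url)
    scanner.feed(html)
    in_schema = set()
    for block in scanner.jsonld_blocks:
        try:
            in_schema.update(walk_urls(json.loads(block)))
        except json.JSONDecodeError:
            continue  # ignore malformed blocks
    return in_schema - scanner.linked


# Hypothetical page: the breadcrumb references /shoes/sale/, but no HTML link does.
SAMPLE_HTML = """
<nav><a href="/shoes/">Shoes</a></nav>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "BreadcrumbList",
 "itemListElement": [
   {"@type": "ListItem", "position": 1, "name": "Shoes",
    "item": "https://example.com/shoes/"},
   {"@type": "ListItem", "position": 2, "name": "Sale",
    "item": "https://example.com/shoes/sale/"}
 ]}
</script>
"""

print(schema_only_urls(SAMPLE_HTML, "https://example.com/"))
```

The comparison is an exact string match after resolving relative links, so trailing slashes, tracking parameters, or http/https differences will show up as mismatches. Normalize URLs first if your site is inconsistent on those points.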

Structured data enriches your pages, but never replaces a solid internal linking structure. To master crawling, combine classic HTML links, robots.txt, and canonical — and consider URLs in your schemas as an uncertain bonus, never a guarantee.

This cross-optimization of technical structure, linking, and semantics requires specialized expertise and an overall vision. If you want to avoid pitfalls and maximize your chances of optimal crawling, support from a specialized SEO agency can prove valuable to lay the right foundations from the start.

❓ Frequently Asked Questions

Are the URLs in my BreadcrumbList structured data always crawled?

No, this is not guaranteed. Google can discover them, but without a corresponding HTML link, crawling remains unpredictable and depends on undocumented internal criteria.

Should I remove URLs from my structured data if I don't want them crawled?

No. Use robots.txt to block crawling explicitly, or rel=canonical to point Google to the preferred version. Removing the URL from structured data is not enough to guarantee it won't be discovered another way.

Does Google index every URL it crawls via structured data?

No, crawling and indexing are two distinct steps. Even once crawled, a page may never be indexed if it lacks relevance signals (backlinks, internal linking, unique content).

Can structured data replace internal linking for page discovery?

Absolutely not. Internal linking remains the primary signal for crawling. Structured data is a semantic complement, not a navigation mechanism.

Should I avoid mentioning external URLs in my structured data?

No, it is sometimes necessary (e.g. social profiles in Organization). Google distinguishes between internal and external URLs, but make sure these external links are intentional.
