
Official statement

The URL inspection tool shows how Google discovered a page: either via a sitemap or via a referring page, specifying which page that was. This information helps you understand how Googlebot finds your content.
🎥 Source video

Extracted from a Google Search Central video

💬 EN 📅 02/08/2023 ✂ 9 statements
Watch on YouTube →
Other statements from this video (8)
  1. Does Google really index the rendered HTML rather than the source code?
  2. Does Google really respect your canonical tag, or does it decide on its own?
  3. How do you efficiently check X-Robots directives in your HTTP headers?
  4. Do JavaScript resources blocked by robots.txt really sabotage your indexing?
  5. Should you really worry about resource errors in Search Console?
  6. Have JavaScript console messages become an SEO signal to watch?
  7. Why does Google Search Console's live URL test give different results every time?
  8. Should you really ignore the screenshots in Google's testing tools?
Official statement from 02/08/2023 (2 years ago)
TL;DR

Google Search Console's URL inspection tool now displays how Googlebot discovered each indexed page: via an XML sitemap or through a specific referring page. This transparency helps identify crawl issues and optimize how Google accesses your strategic content.

What you need to understand

What exactly does this new information reveal?

The URL inspection tool no longer just tells you whether a page is indexed. It now indicates the discovery method used by Googlebot: either an XML sitemap file or a specifically identified referring page.

Concretely, you know whether Google found your page because you actively submitted it via sitemap, or because it discovered it by following a link from another page on your site (or an external one). This distinction is far from trivial.
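
If you want this information at scale rather than one page at a time in the interface, the same data is exposed through the Search Console URL Inspection API. Here is a minimal Python sketch, assuming you already hold an OAuth 2.0 token with the webmasters.readonly scope; the sitemap and referringUrls fields follow the published IndexStatusInspectionResult schema, but double-check them against the current API reference before relying on them.

```python
import json
import urllib.request

# Minimal sketch: ask the URL Inspection API how Google discovered one page.
# ACCESS_TOKEN is a placeholder; you need a real OAuth 2.0 token with the
# https://www.googleapis.com/auth/webmasters.readonly scope.
ACCESS_TOKEN = "ya29.your-oauth-token"
SITE_URL = "https://www.example.com/"          # the property as declared in Search Console
PAGE_URL = "https://www.example.com/landing/"  # the page to inspect

ENDPOINT = "https://searchconsole.googleapis.com/v1/urlInspection/index:inspect"

body = json.dumps({"inspectionUrl": PAGE_URL, "siteUrl": SITE_URL}).encode()
req = urllib.request.Request(
    ENDPOINT,
    data=body,
    headers={
        "Authorization": f"Bearer {ACCESS_TOKEN}",
        "Content-Type": "application/json",
    },
)

with urllib.request.urlopen(req) as resp:
    status = json.load(resp)["inspectionResult"]["indexStatusResult"]

# 'sitemap' lists the sitemaps this URL was found in; 'referringUrls' lists
# pages Google knows link to it. Either field may be absent from the response.
print("Sitemaps:      ", status.get("sitemap", []))
print("Referring URLs:", status.get("referringUrls", []))
```

A page that returns sitemap entries but an empty referringUrls list is exactly the "sitemap only" case discussed below.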

Why does this granularity change the game?

Until now, diagnosing why certain pages weren't being crawled was often a matter of trial and error. You had hypotheses: excessive click depth, orphaned pages, crawl budget issues.

Now, you have factual evidence. If a strategic page only appears via sitemap and never through natural discovery, it's suffering from internal linking problems. Conversely, if Google finds it through discovery but not via sitemap, your XML file deserves an audit.

What's the practical difference between sitemap and referring page?

Sitemaps are a suggestion — not an order. Google consults them but doesn't guarantee crawling all listed URLs, especially if your crawl budget is tight.

Discovery via referring page indicates that Googlebot actively followed a link from an already-known page. It's a signal that your internal linking works and that the page benefits from a certain crawl authority transmitted by the source page.

  • The tool displays the precise source: sitemap URL or exact referring page URL
  • This allows you to trace Googlebot's discovery path through your site structure
  • Useful for identifying orphaned pages only discovered via sitemap
  • Helps prioritize internal linking to poorly discovered strategic pages
  • Enables detection of inconsistencies between your sitemap and your actual link structure (see the sketch below)
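
That last point lends itself to automation. Below is a minimal sketch that flags sitemap URLs receiving no internal links; the file names (sitemap.xml, internal_links.csv) and the target column are hypothetical placeholders for whatever your crawler exports.

```python
import csv
import xml.etree.ElementTree as ET

# Minimal sketch: flag URLs that are listed in your sitemap but never
# appear as internal link targets. Adapt 'internal_links.csv' to whatever
# your crawler (Screaming Frog, etc.) exports.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

tree = ET.parse("sitemap.xml")
sitemap_urls = {loc.text.strip() for loc in tree.findall(".//sm:loc", NS)}

with open("internal_links.csv", newline="") as f:
    # assumed format: one column named 'target' holding each linked-to URL
    linked_urls = {row["target"] for row in csv.DictReader(f)}

orphan_candidates = sitemap_urls - linked_urls
print(f"{len(orphan_candidates)} sitemap URLs receive no internal link:")
for url in sorted(orphan_candidates):
    print(" ", url)
```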

SEO Expert opinion

Is this transparency really new?

Let's be honest: Google has always known how it discovered your pages. What's new is that it's finally sharing this data through Search Console in an accessible way.

Previously, server logs allowed you to reconstruct this information by cross-referencing Googlebot requests with your sitemaps and link structure. But it was time-consuming and limited to sites with solid logging infrastructure. Now the same insight is available to anyone with Search Console access.
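
For reference, the log-based approach looked roughly like this: count which URLs Googlebot actually requests from a standard combined-format access log. The log path is hypothetical, and a serious version would also verify Googlebot via reverse DNS, since the user-agent string alone can be spoofed.

```python
import re
from collections import Counter

# Minimal sketch of the "old way": count Googlebot requests per URL from a
# combined-format access log. The log path is a placeholder.
LOG_LINE = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[\d.]+" .* "(?P<ua>[^"]*)"')

hits = Counter()
with open("/var/log/nginx/access.log") as log:
    for line in log:
        m = LOG_LINE.search(line)
        if m and "Googlebot" in m.group("ua"):
            hits[m.group("path")] += 1

for path, count in hits.most_common(20):
    print(f"{count:6d}  {path}")
```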

What limitations should you anticipate with this tool?

First point: the inspection tool displays the source of the most recent known discovery. If Googlebot discovered your page via sitemap six months ago and then re-crawled it via an internal link yesterday, the complete history isn't necessarily visible. Whether Google keeps a history of the different discovery methods or only the most recent one remains to be verified.

Second nuance: this information says nothing about the quality or priority given to the page. Discovery via referring page isn't automatically synonymous with quick indexing or good rankings — it only confirms that a crawl path exists.

Caution: Don't confuse discovery source with ranking criteria. A page can be discovered via sitemap and rank perfectly well if its content and SEO signals are solid. The reverse also holds: a page discovered through internal linking can remain invisible if it adds no value.

Does this data challenge current practices?

Not really. The fundamentals remain: coherent internal linking, clean XML sitemap, flat architecture, optimized crawl budget. What this tool does is make visible what was previously opaque.

But it offers a valuable diagnostic lever. If you notice that your strategic pages are only discovered via sitemap, it's a clear warning sign: your internal linking isn't doing its job. Conversely, if secondary or unnecessary pages are massively crawled through natural discovery, you're wasting crawl budget, and you need to review your nofollow attributes or your robots.txt rules.

Practical impact and recommendations

What should you audit first with this information?

Start with your strategic pages: flagship product sheets, SEO landing pages, pillar content. Inspect them one by one and note the discovery source displayed.

If they only appear via sitemap, it means Google isn't finding them naturally by browsing your site. This indicates they're either too deep in your architecture, poorly linked, or completely orphaned despite their sitemap presence.

How do you fix a failing discovery path?

If a critical page is only discovered via sitemap, strengthen its internal linking. Add links from the homepage, from main category pages, from related blog articles.

Also check click depth: ideally, no important page should be more than 3 clicks from the root. If it is, revise your architecture.
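
Measuring click depth by hand is tedious, so here is a minimal breadth-first crawl sketch that computes each page's depth from the homepage and flags anything deeper than 3 clicks. The start URL is a placeholder, and the crawler deliberately omits robots.txt handling, throttling, and retries: treat it as a diagnostic sketch, not a production crawler.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
import urllib.request

START = "https://www.example.com/"  # hypothetical homepage
MAX_DEPTH = 4

class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

host = urlparse(START).netloc
depth = {START: 0}
queue = deque([START])

while queue:
    url = queue.popleft()
    if depth[url] >= MAX_DEPTH:
        continue  # deep enough; don't expand further
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            html = resp.read().decode("utf-8", errors="replace")
    except OSError:
        continue
    parser = LinkExtractor()
    parser.feed(html)
    for href in parser.links:
        absolute, _ = urldefrag(urljoin(url, href))  # resolve + drop #fragment
        if urlparse(absolute).netloc == host and absolute not in depth:
            depth[absolute] = depth[url] + 1
            queue.append(absolute)

for url, d in sorted(depth.items(), key=lambda kv: kv[1]):
    if d > 3:
        print(f"depth {d}: {url}")
```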

Conversely, if low-value pages (date archives, unnecessary tags, auto-generated facets) are massively discovered via referring pages, ask yourself whether they deserve to be crawled at all. Add a noindex tag, set nofollow on the links pointing to these sections, or block them in robots.txt where relevant.
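
Before relying on robots.txt blocking, verify that your rules actually match the URLs you want to exclude. A minimal sketch using Python's standard urllib.robotparser, with hypothetical facet and archive URLs:

```python
from urllib.robotparser import RobotFileParser

# Minimal sketch: check whether the low-value sections you intend to block
# are actually disallowed for Googlebot in your live robots.txt.
rp = RobotFileParser("https://www.example.com/robots.txt")
rp.read()

candidates = [
    "https://www.example.com/tag/misc/",
    "https://www.example.com/2019/07/",
    "https://www.example.com/shop?color=red&size=m",
]
for url in candidates:
    status = "blocked" if not rp.can_fetch("Googlebot", url) else "CRAWLABLE"
    print(f"{status:9s} {url}")
```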

What errors should you avoid when interpreting this data?

Don't jump to conclusions based on a single page. Inspect a representative sample: 20-30 URLs distributed among strategic content, transactional pages, and editorial content.
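
Inspecting 20 or 30 URLs by hand is slow, so a batch pass over the URL Inspection API (same assumptions as the earlier sketch: a valid OAuth token, field names per the published schema) can bucket your whole sample by discovery source in one run:

```python
import json
import time
import urllib.request

# Minimal sketch: bucket a sample of pages by discovery source using the
# same endpoint as the earlier example. ACCESS_TOKEN and the sample URLs
# are placeholders; the API is quota-limited, hence the pause between calls.
ACCESS_TOKEN = "ya29.your-oauth-token"
SITE_URL = "https://www.example.com/"
ENDPOINT = "https://searchconsole.googleapis.com/v1/urlInspection/index:inspect"

SAMPLE = [
    "https://www.example.com/products/flagship/",
    "https://www.example.com/guides/pillar-article/",
    # ... 20-30 strategic, transactional, and editorial URLs
]

def inspect(url: str) -> dict:
    body = json.dumps({"inspectionUrl": url, "siteUrl": SITE_URL}).encode()
    req = urllib.request.Request(ENDPOINT, data=body, headers={
        "Authorization": f"Bearer {ACCESS_TOKEN}",
        "Content-Type": "application/json",
    })
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["inspectionResult"]["indexStatusResult"]

for url in SAMPLE:
    status = inspect(url)
    in_sitemap = bool(status.get("sitemap"))
    has_links = bool(status.get("referringUrls"))
    if in_sitemap and not has_links:
        verdict = "sitemap only -> check internal linking"
    elif has_links and not in_sitemap:
        verdict = "links only   -> check sitemap coverage"
    elif in_sitemap:
        verdict = "sitemap + links"
    else:
        verdict = "no known source"
    print(f"{verdict:40s} {url}")
    time.sleep(1)  # stay well under the per-minute quota
```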

Also don't overlook crawl delays. A recently published page might appear as discovered via sitemap simply because Googlebot hasn't found it through natural navigation yet. Wait a few weeks before panicking.

  • Audit the discovery source of 20-30 strategic pages via the inspection tool
  • Identify orphaned pages only discovered via sitemap
  • Strengthen internal linking to these isolated pages
  • Check click depth and real accessibility from the homepage
  • Detect secondary sections over-crawled through referring pages
  • Clean up unnecessary links or add nofollow/noindex if relevant
  • Cross-reference this data with server logs for a complete view of Googlebot behavior
  • Reassess regularly: discovery source can evolve after optimizations
The URL inspection tool becomes an essential diagnostic lever for understanding how Googlebot actually accesses your content. Leverage this data to correct internal linking flaws, prioritize crawl of strategic pages, and avoid wasting crawl budget on unnecessary sections. If orchestrating these technical optimizations seems complex on your own — between log analysis, internal linking overhaul, and architectural decisions — working with a specialized SEO agency can save you time and ensure secure implementation of these critical adjustments.

❓ Frequently Asked Questions

Does the discovery source directly influence a page's ranking?
No. The discovery source (sitemap or referring page) is not a ranking factor. It simply indicates how Googlebot found the page. However, poor discovery can delay indexing or signal an internal linking problem, which indirectly affects visibility.
If a page only appears via sitemap, is it penalized by Google?
No, it isn't penalized. But it does reveal an internal linking or architecture problem. Google prefers to discover pages through natural navigation, because that better reflects your site's logical structure and the authority passed by internal links.
Can you force Google to discover a page via one method rather than another?
Not directly. You can submit a page via sitemap to speed up its discovery, but if it's well linked, Googlebot will eventually find it naturally through internal links. The reverse is also true: an orphaned page will only ever be discovered via sitemap, whatever else you do.
Does the tool show all discovery sources or only the most recent one?
The current interface appears to show the source of the most recent known discovery. A complete history of successive discovery methods isn't officially documented. Cross-referencing with server logs remains recommended for an exhaustive view.
Should pages discovered only via sitemap be removed from the XML file?
Not necessarily. If they're strategic pages, fix their internal linking instead so they're discovered naturally. If they're useless or low-value pages, consider removing them from the sitemap and adding a noindex tag if relevant.
