Official statement
Google discovers most of your new URLs by following links from pages it already knows. Link following remains the dominant discovery method, well ahead of XML sitemaps or the Indexing API. Your internal linking structure and backlinks directly determine how fast and comprehensively Googlebot discovers your content.
What you need to understand
This statement from Gary Illyes reinforces a fundamental principle of how Google works: crawling follows links. Googlebot navigates from page to page like a user, following the URLs it finds in HTML code.
This means your ability to get new content indexed quickly depends directly on the link structure leading to it from pages that are already crawled and cached in Google's index.
Why is this statement coming out now?
Google regularly reminds the SEO community of this mechanism because too many sites still rely exclusively on XML sitemaps to signal new content. Let's be honest: the sitemap is a safety net, not your primary strategy.
Field observations show that pages well-linked from the homepage or from active internal hubs are discovered within minutes, while those isolated in a sitemap may wait days—or even weeks—before their first crawl.
What does this actually change for my SEO?
If your new pages don't receive internal links from regularly-crawled pages, they'll remain invisible to Googlebot for an unpredictable amount of time. Crawl budget is not infinite, and Google prioritizes link paths it already knows.
This also means your backlink profile plays a dual role: it passes SEO juice, sure, but it also accelerates the discovery of new sections of your site if those external links point to pages that themselves link to your new content.
- Crawling follows links: without an incoming link, rapid discovery won't happen
- Internal linking is your primary lever for controlling discovery speed
- XML sitemaps are a complement, not a first-tier solution
- External backlinks accelerate discovery if they point to pages that link to your new content
- Orphaned pages (with no internal or external incoming links) have little chance of being naturally discovered
Is the XML sitemap therefore useless?
No. But its role is often overestimated. It mainly serves as a safety net for URLs that Googlebot wouldn't have discovered otherwise—typically deep pages or recently published pages without optimal linking.
The sitemap rarely accelerates discovery as much as a good internal link from a high-crawl-frequency page does. It's insurance, not a turbo boost.
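If you do automate sitemap generation, keep it lean and limited to canonical, index-worthy URLs. Below is a minimal sketch of such a generator; the `pages` list, URLs, and output path are illustrative assumptions, not anything prescribed by Google.

```python
# Minimal sitemap generator sketch -- include only canonical, index-worthy URLs.
# The `pages` list and the output path are illustrative assumptions.
import xml.etree.ElementTree as ET

pages = [
    {"loc": "https://www.example.com/new-article", "lastmod": "2024-02-22"},
    {"loc": "https://www.example.com/category/widgets", "lastmod": "2024-02-20"},
]

urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
for page in pages:
    url = ET.SubElement(urlset, "url")
    ET.SubElement(url, "loc").text = page["loc"]
    ET.SubElement(url, "lastmod").text = page["lastmod"]

ET.ElementTree(urlset).write("sitemap.xml", encoding="utf-8", xml_declaration=True)
```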
SEO Expert opinion
Is this statement consistent with observed practices?
Yes, absolutely. Server logs confirm it without ambiguity: new pages linked from the homepage, from active category pages, or from recent well-crawled articles appear in logs within minutes. Pages isolated in a sitemap may wait days.
I've observed on e-commerce sites that product pages linked from active SEO landing pages are crawled 10 to 50 times faster than those discovered solely via sitemap. The gap is huge.
What nuances should we add?
Gary Illyes says "most" new URLs, not "all." There are other discovery channels: XML sitemaps, the Indexing API (limited to certain content types), redirects, URL mentions in RSS feeds, etc.
A caveat, though: Google publishes no precise statistics on the actual share of each channel. We don't know whether "most" means 60%, 80%, or 95%. That opacity is frustrating for anyone trying to fine-tune crawl budget optimization.
Another point: this statement says nothing about crawl speed. Discovering a URL doesn't mean crawling it immediately. A URL can be discovered via a link but remain in queue if crawl budget is saturated or if Google deems the page low priority.
In which cases does this rule not fully apply?
Ephemeral or real-time content—job postings, events, breaking news—can benefit from the Indexing API, which bypasses traditional crawling. But this API is limited to specific use cases, and Google restricts it tightly.
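For the narrow cases the Indexing API does cover, a notification boils down to a single authenticated POST. Here is a minimal sketch, assuming a service account authorized on the Search Console property and the google-auth package installed; the key-file path and the URL are placeholders.

```python
# Sketch: notify Google's Indexing API about an updated job-posting URL.
# Assumes a service account JSON key authorized on the Search Console property.
from google.oauth2 import service_account
from google.auth.transport.requests import AuthorizedSession

SCOPES = ["https://www.googleapis.com/auth/indexing"]
ENDPOINT = "https://indexing.googleapis.com/v3/urlNotifications:publish"

credentials = service_account.Credentials.from_service_account_file(
    "service-account.json", scopes=SCOPES  # placeholder key-file path
)
session = AuthorizedSession(credentials)

response = session.post(
    ENDPOINT,
    json={"url": "https://www.example.com/jobs/senior-seo", "type": "URL_UPDATED"},
)
print(response.status_code, response.json())
```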
Very large sites with millions of pages also see discoveries come from URL pattern analysis or extraction from external databases (feed aggregators, for example). But these channels remain marginal and opaque.
Practical impact and recommendations
What should you concretely do to accelerate the discovery of your new pages?
Place internal links to your new pages from your most-crawled pages—typically the homepage, main category pages, and recent articles with high visibility. The closer the link is to the homepage in click distance, the faster the discovery.
Identify your high-crawl-frequency pages via your server logs (Search Console isn't enough here). These are your "entry doors" for Googlebot. Use them as springboards for your new content.
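A quick way to surface those entry doors is to count Googlebot hits per URL in your access logs. Here is a minimal sketch for a combined-format Apache/Nginx log; the log path and regex are assumptions, and a rigorous setup would also confirm genuine Googlebot traffic via reverse DNS.

```python
# Sketch: rank URLs by Googlebot hit count from an access log (combined format).
# Log path and regex are assumptions; genuine Googlebot checks also need reverse DNS.
import re
from collections import Counter

LOG_PATH = "access.log"  # placeholder
# Combined log format: ip - - [time] "METHOD /path HTTP/1.1" status size "referer" "user-agent"
LINE_RE = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[^"]+" \d+ \S+ "[^"]*" "(?P<ua>[^"]*)"')

hits = Counter()
with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        match = LINE_RE.search(line)
        if match and "Googlebot" in match.group("ua"):
            hits[match.group("path")] += 1

for path, count in hits.most_common(20):
    print(f"{count:6d}  {path}")
```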
Avoid orphaned pages: every new page should receive at minimum one contextual internal link from an already-cached page. A link in a menu or footer is better than nothing, but an editorial link in the body of a recent article is far more effective.
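To spot orphans and measure click distance, one approach is a breadth-first crawl of your own internal links starting from the homepage, then a comparison against the URLs you expect to be reachable (your sitemap, for instance). The sketch below uses only the standard library; the start URL and `expected_urls` are placeholders, and a production crawler would need throttling, robots.txt handling, and retries.

```python
# Sketch: BFS over internal links to measure click depth and flag unreachable (orphan) URLs.
# START and expected_urls are placeholders; real crawls need politeness and robots.txt handling.
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

START = "https://www.example.com/"
expected_urls = {"https://www.example.com/new-article"}  # e.g. loaded from your sitemap

class LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

depth = {START: 0}
queue = deque([START])
while queue:
    url = queue.popleft()
    try:
        html = urlopen(url, timeout=10).read().decode("utf-8", errors="replace")
    except OSError:
        continue
    parser = LinkCollector()
    parser.feed(html)
    for href in parser.links:
        absolute = urljoin(url, href).split("#")[0]
        if urlparse(absolute).netloc == urlparse(START).netloc and absolute not in depth:
            depth[absolute] = depth[url] + 1
            queue.append(absolute)

orphans = expected_urls - depth.keys()
print("Orphaned (never reached from homepage):", orphans or "none")
```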
What mistakes should you avoid at all costs?
Don't rely exclusively on your XML sitemap to get new pages discovered. It's a complement, not a strategy. Sites that publish content without internal linking and wait for Google to "do the work" via the sitemap lose days, even weeks.
Also avoid overloading your sitemaps with millions of low-priority URLs. Google does crawl sitemaps, but with limited budget. If your sitemap contains 90% worthless pages, the important 10% will be buried.
Another classic mistake: creating internal links from pages that are themselves never crawled. A link from a zombie page does nothing. Verify that your "relay" pages are actually visited by Googlebot.
How do you verify that your discovery strategy is working?
Analyze your server logs. Track the delay between publishing a new page and its first crawl by Googlebot. If this delay exceeds 24-48 hours for strategic pages, that's a red flag: your internal linking or crawl budget is failing.
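Measuring that delay only takes the publish timestamp and the first Googlebot hit per URL. A minimal sketch follows; the `publications.csv` file, its columns, and the first-hit values are illustrative assumptions (the hits could come from the log-parsing sketch above).

```python
# Sketch: delay between publication and first Googlebot crawl, flagging anything over 48h.
# Assumes a publications.csv (url,published_at in ISO 8601) and a dict of first Googlebot
# hit times per URL extracted from access logs -- both are illustrative assumptions.
import csv
from datetime import datetime, timedelta

first_googlebot_hit = {
    "/new-article": datetime(2024, 2, 23, 9, 30),  # illustrative value
}

ALERT_THRESHOLD = timedelta(hours=48)

with open("publications.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        published = datetime.fromisoformat(row["published_at"])
        first_hit = first_googlebot_hit.get(row["url"])
        if first_hit is None:
            print(f"NOT CRAWLED YET  {row['url']}")
        else:
            delay = first_hit - published
            flag = "RED FLAG" if delay > ALERT_THRESHOLD else "ok"
            print(f"{flag:9s}  {row['url']}  discovered after {delay}")
```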
In Search Console, monitor the page indexing report and cross-reference it with your publication dates. Pages marked "Discovered – currently not indexed" or "Crawled – currently not indexed" often reveal internal linking or perceived quality issues.
- Link every new page from at least one high-crawl-frequency page
- Identify your "entry door" pages for Googlebot via server logs
- Verify your sitemaps aren't overloaded with useless URLs
- Avoid orphaned pages: zero incoming links = random discovery
- Track discovery delay in your logs to adjust linking strategy
- Don't rely solely on XML sitemaps for strategic content
❓ Frequently Asked Questions
Are XML sitemaps still useful if Google discovers pages through links?
How long does it take Google to discover a new page via an internal link?
Is a discovered page automatically indexed?
Do external backlinks speed up the discovery of new internal pages?
How can I tell whether my new pages are being discovered quickly?