Official statement
Google never crawls and indexes all pages on a website — this is completely normal. The 'Discovered - currently not indexed' status can persist indefinitely without being a cause for concern. For new sites with large volumes of content, this phenomenon is expected and part of the natural discovery process.
What you need to understand
This statement reminds us of a reality that many SEO professionals forget: Google has never promised to index everything you publish. Crawling and indexing are limited resources, and the search engine makes choices.
Why doesn't Google crawl all your pages?
The crawl budget, the pool of crawling resources Google allocates to each website, is not infinite. Google prioritizes pages it considers important based on several criteria: popularity, freshness, perceived quality, and depth in the site structure.
For a site with 10,000 pages, it is common for only 6,000 to 8,000 to be regularly crawled. The rest wait their turn, sometimes indefinitely.
What does the 'Discovered - currently not indexed' status really mean?
This status appears in Search Console when Google has detected the existence of a URL (via an internal link, sitemap, or external mention) but hasn't deemed it a priority to crawl or index it.
Contrary to what some believe, this is not necessarily a quality issue. It can simply be a trade-off in resource allocation. A page discovered a month ago on a new site will wait its turn, sometimes indefinitely if it sits 4 clicks away from the homepage.
Are new sites particularly affected?
Absolutely. A new site that publishes 500 pages at once will see them indexed progressively over several weeks or even months. Google doesn't immediately trust the site and manages its crawl carefully.
This is where the site earns its crawl budget: by showing that it publishes content people actually visit, by acquiring backlinks, by proving its relevance. Without that, part of the catalog will remain in passive discovery.
- Google prioritizes its crawl resources based on the perceived importance of pages
- The 'Discovered - currently not indexed' status is not a penalty or a systematic signal of poor quality
- New sites undergo an observation phase where indexation is intentionally slowed down
- A page can remain indefinitely discovered without ever being indexed if it doesn't provide differentiated value
- Indexation time depends on link depth, update frequency, and popularity signals
SEO expert opinion
Does this statement match real-world observations?
Yes — and that's even understating it. On e-commerce sites with tens of thousands of product pages, we regularly see 30 to 40% of the catalog remain in passive discovery. And this isn't always quality-related: sometimes these are perfectly valid pages, simply buried 5 clicks deep or with few backlinks.
The problem is that Mueller remains vague about the exact prioritization criteria. We know depth matters, backlinks help, and freshness plays a role, but the thresholds and weightings have to be established empirically on each project, because Google doesn't disclose them.
When should you really worry about the 'Discovered - not indexed' status?
Let's be honest: if your strategic pages — those that should rank and convert — remain stuck in discovery, that's a red flag. Don't panic about peripheral pages (legal notices in PDF format, 2015 blog archives), but a flagship product page remaining unindexed for 3 months? There's an issue.
Common causes: catastrophic internal linking, duplicate or near-duplicate content that triggers URL consolidation, internal cannibalization, or simply a page too thin on unique content to justify indexation.
Does Google provide enough tools to diagnose this problem?
No. Search Console displays the status but never explains why a page remains in discovery. Is it a crawl budget issue? Quality? Depth? Duplication? You have to guess.
This is where server log analysis becomes essential. If Googlebot never visits certain sections, the problem is structural — linking architecture, robots.txt, misplaced nofollow tags. If Googlebot visits but doesn't index, it's a quality or relevance signal.
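As a concrete starting point, here is a minimal Python sketch of that log check, assuming a combined-format access log where the user agent is the last quoted field and the first path segment identifies a site section. Keep in mind the user-agent string can be spoofed, so a production analysis should also verify Googlebot hits via reverse DNS.

```python
# Minimal sketch: count Googlebot hits per site section in an access log.
# Assumes the combined log format, where the user agent is the last
# quoted field. User agents can be spoofed: verify real Googlebot hits
# via reverse DNS before drawing conclusions.
import re
from collections import Counter

LINE_RE = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[\d.]+" .* "(?P<ua>[^"]*)"$')

def googlebot_hits_by_section(log_path):
    hits = Counter()
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for line in log:
            match = LINE_RE.search(line)
            if not match or "Googlebot" not in match.group("ua"):
                continue
            # First path segment stands in for the site section.
            section = "/" + match.group("path").lstrip("/").split("/", 1)[0]
            hits[section] += 1
    return hits

if __name__ == "__main__":
    for section, count in googlebot_hits_by_section("access.log").most_common():
        print(f"{section}\t{count}")
```

Sections that never appear in this output despite containing strategic pages point to a structural problem rather than a quality one.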
Practical impact and recommendations
What should you do to speed up indexation of strategic pages?
First priority: reduce link depth. If your important pages are 4-5 clicks from the homepage, Google considers them secondary. Move them up in the information architecture, add links from the main navigation or high-crawl pages.
Second lever: improve contextual internal linking. A page linked from 10 relevant blog articles with varied anchor text sends a much stronger value signal than an isolated page buried in the sitemap.
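To make these two levers measurable, here is a minimal Python sketch, assuming you can export internal links as (source, target) pairs from a crawler; it computes each page's click depth from the homepage and its internal inlink count. The sample edges are illustrative.

```python
# Minimal sketch: click depth from the homepage and internal inlink
# counts, computed from (source, target) link pairs exported by a
# crawler. The sample edges below are illustrative.
from collections import Counter, deque

def depth_and_inlinks(edges, homepage="/"):
    graph, inlinks = {}, Counter()
    for source, target in edges:
        graph.setdefault(source, set()).add(target)
        inlinks[target] += 1
    depth = {homepage: 0}          # BFS outward from the homepage
    queue = deque([homepage])
    while queue:
        page = queue.popleft()
        for neighbor in graph.get(page, ()):
            if neighbor not in depth:
                depth[neighbor] = depth[page] + 1
                queue.append(neighbor)
    return depth, inlinks

edges = [("/", "/blog/"), ("/blog/", "/blog/post-1"), ("/blog/post-1", "/products/p42")]
depth, inlinks = depth_and_inlinks(edges)
print(depth["/products/p42"], inlinks["/products/p42"])  # -> 3 1
```

Strategic pages that come out deeper than 3 clicks, or with only a handful of inlinks, are the first candidates for the restructuring described above.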
Third lever — and this is often overlooked: clean up unnecessary pages. If your site contains 5,000 URLs with 2,000 adding no value (archives, faceted filters with no content, old unoptimized landing pages), you dilute your crawl budget. Noindex, 404, or consolidate them.
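For the cleanup lever, one common implementation is an X-Robots-Tag response header on low-value URLs. Here is a minimal sketch assuming a Flask application where faceted filters are identified by query parameters; the parameter names are illustrative.

```python
# Minimal sketch, assuming a Flask app: send "X-Robots-Tag: noindex" on
# faceted filter URLs so they stop consuming indexation while the links
# they contain keep being followed. Parameter names are illustrative.
from flask import Flask, request

app = Flask(__name__)
FACET_PARAMS = {"color", "size", "sort"}  # hypothetical filter parameters

@app.after_request
def noindex_faceted_urls(response):
    if FACET_PARAMS & set(request.args):
        response.headers["X-Robots-Tag"] = "noindex, follow"
    return response
```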
How do you know if the problem is crawl budget or quality?
Analyze your server logs. If Googlebot never visits certain sections, it's a crawl budget or structure issue. If Googlebot visits every week but still doesn't index, it's a quality signal.
Also test forced indexation via the URL Inspection tool in Search Console (Request Indexing). If Google consistently refuses, it is judging the page as not relevant enough: thin content, duplication, cannibalization.
What mistakes should you absolutely avoid?
Don't overwhelm Google with sitemaps of 50,000 URLs where half are worthless. Google will crawl some, discover many pages are weak, and reduce your overall crawl budget.
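By way of illustration, here is a minimal Python sketch that writes a sitemap restricted to priority URLs; the filter shown is deliberately crude and stands in for whatever indexability and value rules you apply on your own site.

```python
# Minimal sketch: write a sitemap limited to priority, indexable URLs.
# The filter below is deliberately crude; replace it with your own
# indexability and value rules.
from xml.sax.saxutils import escape

def write_sitemap(urls, path="sitemap.xml"):
    with open(path, "w", encoding="utf-8") as f:
        f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        f.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
        for url in urls:
            f.write(f"  <url><loc>{escape(url)}</loc></url>\n")
        f.write("</urlset>\n")

all_urls = ["https://example.com/products/p42", "https://example.com/?color=red"]
write_sitemap(url for url in all_urls if "?" not in url)  # drop faceted URLs
```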
Don't create generic content just to fill pages. A product page with 30 words of copy lifted from a supplier is far more likely to stay stuck in discovery than a page with 300 unique, well-structured words.
Avoid flat architectures with everything 1 click away: that doesn't work either. Google needs semantic hierarchy to understand what's prioritized.
- Internal linking audit: verify that strategic pages are at most 3 clicks from the homepage
- Log analysis to identify sections never or rarely crawled
- Cleanup of useless URLs: noindex, 404, or consolidation of pages with no added value
- Content enrichment for pages stuck in 'Discovered - not indexed' if they're strategic
- Sitemap optimization: submit only truly priority URLs
- Monthly tracking of indexation rate by page type in Search Console (see the sketch after this list)
- Test forced indexation to diagnose qualitative rejection vs. simple delay
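For that monthly tracking, the Search Console URL Inspection API can pull the coverage state of a URL sample programmatically. Here is a minimal sketch assuming a service account with access to the property; the credentials file, property URL, and sample URLs are placeholders, and the API is quota-limited (around 2,000 inspections per day per property at the time of writing).

```python
# Minimal sketch: aggregate Search Console coverage states per page type.
# Placeholders: credentials.json, the property URL, and the sampled URLs.
from collections import Counter
from google.oauth2 import service_account
from googleapiclient.discovery import build

SITE = "https://example.com/"  # your verified Search Console property
creds = service_account.Credentials.from_service_account_file(
    "credentials.json",
    scopes=["https://www.googleapis.com/auth/webmasters.readonly"],
)
service = build("searchconsole", "v1", credentials=creds)

def coverage_by_type(sample):
    """sample maps a page type to a list of URLs to inspect."""
    report = {}
    for page_type, urls in sample.items():
        states = Counter()
        for url in urls:
            result = service.urlInspection().index().inspect(
                body={"inspectionUrl": url, "siteUrl": SITE}
            ).execute()
            states[result["inspectionResult"]["indexStatusResult"]["coverageState"]] += 1
        report[page_type] = states
    return report

print(coverage_by_type({"product": ["https://example.com/products/p42"]}))
```

Tracking the share of 'Discovered - currently not indexed' per template month over month tells you whether your fixes are actually moving pages into the index.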
❓ Frequently Asked Questions
How long can the 'Discovered - currently not indexed' status last?
Should you remove 'Discovered - not indexed' pages from your sitemap?
How long should a new site expect to wait before all its pages are indexed?
Is crawl budget the only factor behind this phenomenon?
How do you force Google to index a page stuck in 'Discovered - not indexed'?