What does Google say about SEO?

Official statement

It is completely normal for Google not to crawl and index all pages on a website. The 'Discovered - currently not indexed' status can last indefinitely. For a new site with lots of content, this is expected at the beginning.

🎥 Source: Google Search Central video, published 18/02/2022.

TL;DR

Google does not crawl and index all pages on a website, and this is completely normal. The 'Discovered - currently not indexed' status can persist indefinitely without being a cause for concern. For new sites with large volumes of content, this is expected and part of the natural discovery process.

What you need to understand

This statement reminds us of a reality that many SEO professionals forget: Google has never promised to index everything you publish. Crawling and indexing are limited resources, and the search engine makes choices.

Why doesn't Google crawl all your pages?

The crawl budget, the amount of crawling resources Google allocates to each website, is not infinite. Google prioritizes pages it considers important based on several criteria: popularity, freshness, perceived quality, and depth in the site structure.

For a site with 10,000 pages, it is common for only 6,000 to 8,000 to be regularly crawled. The rest? Waiting, sometimes indefinitely.

What does the 'Discovered - currently not indexed' status really mean?

This status appears in Search Console when Google has detected the existence of a URL (via an internal link, sitemap, or external mention) but hasn't deemed it a priority to crawl or index it.

Contrary to what some believe, this is not necessarily a quality issue. It can simply be a resource trade-off. A page discovered a month ago on a new site will wait its turn, sometimes indefinitely if it sits 4 clicks away from the homepage.
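
You can also read this status programmatically instead of inspecting URLs one by one in the interface: Search Console exposes it through the URL Inspection API. A minimal sketch, assuming a service account JSON key (the sa-key.json filename is hypothetical) that has been granted access to the property:

```python
import requests
from google.oauth2 import service_account
from google.auth.transport.requests import Request

SCOPES = ["https://www.googleapis.com/auth/webmasters.readonly"]
ENDPOINT = "https://searchconsole.googleapis.com/v1/urlInspection/index:inspect"

def coverage_state(site_url: str, page_url: str, key_file: str) -> str:
    """Return the coverage state Search Console reports for one URL,
    e.g. 'Discovered - currently not indexed'."""
    creds = service_account.Credentials.from_service_account_file(key_file, scopes=SCOPES)
    creds.refresh(Request())  # exchange the key for an OAuth2 access token
    resp = requests.post(
        ENDPOINT,
        headers={"Authorization": f"Bearer {creds.token}"},
        json={"inspectionUrl": page_url, "siteUrl": site_url},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["inspectionResult"]["indexStatusResult"]["coverageState"]

print(coverage_state("https://example.com/", "https://example.com/new-page/", "sa-key.json"))
```
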

Are new sites particularly affected?

Absolutely. A new site that publishes 500 pages all at once will see progressive indexation over several weeks or even months. Google doesn't immediately trust the site and manages its crawl carefully.

This is where the site earns its crawl budget: by showing that it produces content people actually read, by acquiring backlinks, by proving its relevance. Without that, part of the catalog will remain in passive discovery.

  • Google prioritizes its crawl resources based on the perceived importance of pages
  • The 'Discovered - currently not indexed' status is not a penalty or a systematic signal of poor quality
  • New sites undergo an observation phase where indexation is intentionally slowed down
  • A page can remain indefinitely discovered without ever being indexed if it doesn't provide differentiated value
  • Indexation time depends on link depth, update frequency, and popularity signals

SEO Expert opinion

Does this statement match real-world observations?

Yes, and that's putting it mildly. On e-commerce sites with tens of thousands of product pages, we regularly see 30 to 40% of the catalog remain in passive discovery. And this isn't always quality-related: sometimes these are perfectly valid pages, simply buried 5 clicks deep or with few backlinks.

The problem is that Mueller remains vague about the exact prioritization criteria. We know depth matters, backlinks help, and freshness plays a role. But the thresholds and the weightings? They have to be verified empirically on each project, because Google doesn't disclose them.

When should you really worry about the 'Discovered - not indexed' status?

Let's be honest: if your strategic pages — those that should rank and convert — remain stuck in discovery, that's a red flag. Don't panic about peripheral pages (legal notices in PDF format, 2015 blog archives), but a flagship product page remaining unindexed for 3 months? There's an issue.

Common causes: catastrophic internal linking, duplicate or near-duplicate content that triggers URL consolidation, internal cannibalization, or simply a page too thin on unique content to justify indexation.

Does Google provide enough tools to diagnose this problem?

No. Search Console displays the status but never explains why a page remains in discovery. Is it a crawl budget issue? Quality? Depth? Duplication? You have to guess.

This is where server log analysis becomes essential. If Googlebot never visits certain sections, the problem is structural — linking architecture, robots.txt, misplaced nofollow tags. If Googlebot visits but doesn't index, it's a quality or relevance signal.
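
A minimal sketch of that first check, assuming access logs in the standard combined format. Matching "Googlebot" in the user agent string is a simplification: spoofed agents are common, so a rigorous audit verifies Googlebot by reverse DNS.

```python
import re
from collections import Counter

# Matches the request path and user agent in combined-format access log lines.
LOG_LINE = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[^"]*" \d+ \S+ "[^"]*" "(?P<ua>[^"]*)"')

def googlebot_hits_by_section(log_file: str) -> Counter:
    """Count Googlebot requests per first-level URL section ('/blog', '/products', ...)."""
    hits: Counter = Counter()
    with open(log_file, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            m = LOG_LINE.search(line)
            if not m or "Googlebot" not in m.group("ua"):
                continue
            path = m.group("path").split("?", 1)[0]
            hits["/" + path.lstrip("/").split("/", 1)[0]] += 1
    return hits

# Sections that never appear in the output are never crawled at all:
# a structural problem rather than a quality one.
print(googlebot_hits_by_section("access.log").most_common())
```
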

Caution: Don't confuse 'Discovered - not indexed' with 'Crawled - not indexed'. The first means Google knows the URL exists but has never actually crawled it. The second means Google has crawled the page and decided not to index it, which is more concerning.

Practical impact and recommendations

What should you do to speed up indexation of strategic pages?

First priority: reduce link depth. If your important pages are 4-5 clicks from the homepage, Google considers them secondary. Move them up in the information architecture, add links from the main navigation or from frequently crawled pages.
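
Depth is easy to measure objectively: run a breadth-first search over your internal-link graph, which most crawling tools can export. A minimal sketch; the graph below is a hypothetical toy example:

```python
from collections import deque

def click_depths(links: dict[str, list[str]], home: str = "/") -> dict[str, int]:
    """Breadth-first search over the internal-link graph:
    depth = minimum number of clicks from the homepage."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

# Hypothetical graph: the product page ends up 3 clicks deep.
links = {
    "/": ["/category/"],
    "/category/": ["/category/page-2/"],
    "/category/page-2/": ["/products/widget/"],
}
print({url: d for url, d in click_depths(links).items() if d >= 3})
```
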

Second lever: improve contextual internal linking. A page linked from 10 relevant blog articles with varied anchor text sends a much stronger value signal than an isolated page buried in the sitemap.
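
That signal is measurable too. A small sketch that tallies in-links and distinct anchor texts per page, assuming a hypothetical crawl export of (source, target, anchor) rows:

```python
from collections import defaultdict

def inlink_profile(edges: list[tuple[str, str, str]]) -> dict[str, dict]:
    """Tally in-links and distinct anchor texts per target page.
    edges = (source_url, target_url, anchor_text) rows from a crawl export."""
    profile: dict[str, dict] = defaultdict(lambda: {"inlinks": 0, "anchors": set()})
    for _source, target, anchor in edges:
        profile[target]["inlinks"] += 1
        profile[target]["anchors"].add(anchor.strip().lower())
    return profile

edges = [  # hypothetical crawl export
    ("/blog/a/", "/products/widget/", "best widget"),
    ("/blog/b/", "/products/widget/", "widget buying guide"),
    ("/sitemap-page/", "/old-landing/", "old landing"),
]
for url, p in inlink_profile(edges).items():
    print(url, "in-links:", p["inlinks"], "distinct anchors:", len(p["anchors"]))
```
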

Third lever — and this is often overlooked: clean up unnecessary pages. If your site contains 5,000 URLs with 2,000 adding no value (archives, faceted filters with no content, old unoptimized landing pages), you dilute your crawl budget. Noindex, 404, or consolidate them.
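
How you split those pages between noindex, 404, and consolidation is a judgment call. One possible triage sketch, with thresholds that are purely illustrative and not any Google rule:

```python
def cleanup_action(backlinks: int, organic_visits: int, unique_words: int) -> str:
    """Triage rule for a low-value URL; every threshold here is illustrative
    and should be tuned per site."""
    if backlinks > 0:
        return "301 / consolidate"  # preserve external link equity
    if organic_visits == 0 and unique_words < 50:
        return "404 or 410"         # no signals, no value: remove
    if unique_words < 150:
        return "noindex"            # useful to visitors, removed from the index
    return "keep and enrich"

print(cleanup_action(backlinks=0, organic_visits=0, unique_words=30))  # -> '404 or 410'
```
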

How do you know if the problem is crawl budget or quality?

Analyze your server logs. If Googlebot never visits certain sections, it's a crawl budget or structure issue. If Googlebot visits every week but still doesn't index, it's a quality signal.

Also test manual submission via the URL Inspection tool in Search Console (Request Indexing). If Google consistently refuses, it's judging the page as non-relevant: thin content, duplication, cannibalization.

What mistakes should you absolutely avoid?

Don't overwhelm Google with sitemaps of 50,000 URLs where half are worthless. Google will crawl some, discover many pages are weak, and reduce your overall crawl budget.
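
A simple safeguard is to generate the sitemap from an explicit allowlist of priority URLs rather than from the whole database. A minimal sketch using only the standard library; the example URLs are hypothetical:

```python
import xml.etree.ElementTree as ET

def write_priority_sitemap(urls: list[str], out_file: str) -> None:
    """Write a sitemap containing only the URLs you actually want crawled."""
    urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    ET.ElementTree(urlset).write(out_file, encoding="utf-8", xml_declaration=True)

write_priority_sitemap(
    ["https://example.com/products/widget/", "https://example.com/category/widgets/"],
    "sitemap.xml",
)
```
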

Don't create generic content just to fill pages. A product page with 30 words of copy lifted from a supplier is far more likely to stay unindexed than a page with 300 unique, well-structured words.

Avoid flat architectures with everything 1 click away: that doesn't work either. Google needs semantic hierarchy to understand what's prioritized.

  • Internal linking audit: verify that strategic pages are at most 3 clicks from the homepage
  • Log analysis to identify sections never or rarely crawled
  • Cleanup of useless URLs: noindex, 404, or consolidation of pages with no added value
  • Content enrichment for pages stuck in 'Discovered - not indexed' if they're strategic
  • Sitemap optimization: submit only truly priority URLs
  • Monthly tracking of indexation rate by page type in Search Console (see the sketch after this list)
  • Test forced indexation to diagnose qualitative rejection vs. simple delay
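
For the monthly tracking item above, Search Console's coverage report doesn't segment by template, but you can approximate it by sampling URLs per page type through the URL Inspection API. A sketch that relies on the coverage_state() helper from the earlier example (mind the API's daily quota; the sample URLs are hypothetical):

```python
from collections import Counter

def coverage_by_page_type(samples: dict[str, list[str]], site: str, key_file: str) -> dict[str, Counter]:
    """Tally URL Inspection coverage states for a URL sample of each page type.
    Relies on coverage_state() defined in the earlier sketch."""
    return {
        page_type: Counter(coverage_state(site, url, key_file) for url in urls)
        for page_type, urls in samples.items()
    }

samples = {  # hypothetical samples drawn from each template
    "product": ["https://example.com/products/widget/"],
    "blog": ["https://example.com/blog/some-post/"],
}
print(coverage_by_page_type(samples, "https://example.com/", "sa-key.json"))
```
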

The 'Discovered - not indexed' status is only problematic if it affects your strategic pages. In that case, you need to act on internal linking, link depth, and content quality. For large or technical sites, these optimizations require advanced expertise in architecture and log analysis; a complete diagnosis by a specialized SEO agency often identifies structural bottlenecks quickly and helps prioritize high-impact actions.

❓ Frequently Asked Questions

How long can the 'Discovered - currently not indexed' status last?
Indefinitely, according to Mueller. Google can know about a URL for months or even years without ever indexing it if it isn't considered a priority. This is only a problem if the page is strategic for your business.
Should you remove 'Discovered - not indexed' pages from your sitemap?
Not systematically. If they are strategic pages, keep them and optimize their crawlability. If they are peripheral pages with no SEO value, then yes, remove them so they don't dilute your crawl budget.
How long should a new site expect to wait before all its pages are indexed?
There is no guarantee that all pages will ever be indexed. For a new site with a lot of content, progressive indexation over several months is normal. Prioritize important pages through internal linking.
Is crawl budget the only factor behind this phenomenon?
No. Google can also decide that a page adds no value compared to what already exists (overly similar content, low quality) and refuse to index it even when it has the budget to crawl it.
How do you force Google to index a page stuck in 'Discovered - not indexed'?
Use the URL Inspection tool in Search Console and request indexing. If Google refuses after several attempts, that's a signal it judges the page irrelevant; improve its content or its position in the internal linking.