Official statement
Other statements from this video (22) · Google Search Central video · 54 min · published 15/05/2020
- 3:03 Do temporary 404 errors during a migration really kill your SEO?
- 4:56 Googlebot crawls from the USA: how do you avoid the geo-IP cloaking trap?
- 8:42 Can you really block Googlebot state by state in the USA without breaking everything?
- 12:17 Are Reddit's nofollow links really useless for SEO?
- 14:14 Should you systematically enable loading='lazy' on all your images to boost SEO?
- 15:25 Should you really reduce the number of language versions for hreflang?
- 18:27 Should you really fix every 404 error reported in Search Console?
- 20:47 Are jump links really useless for Google's crawling?
- 21:55 Should you disavow ghost backlinks visible only in Search Console?
- 23:20 Why doesn't the Disavow file hide bad links in Search Console?
- 29:18 Should the alt attribute really be contextualized beyond the visual description?
- 32:47 Should you really worry about multiple 301 redirects and 404 pages?
- 33:02 Does Google algorithmically demote certain sectors during a health crisis?
- 34:06 Should you really use several domain names for a multilingual site?
- 36:28 Should you really make all recipe images indexable to perform in SEO?
- 37:49 Should non-ASCII characters be encoded in XML sitemap URLs?
- 38:15 Does hreflang really guarantee correct geographic targeting of your international traffic?
- 41:05 Why does Google index only one version when your country pages are nearly identical?
- 45:51 Should you create different content to get several variants of the same service indexed?
- 46:27 Should you create a new page or modify the existing one for a temporary change?
- 49:01 Should you really avoid multiple title and meta description tags on the same page?
- 52:13 Are 500/503 errors lasting a few hours really invisible to your indexing?
Google states that bulk 'Discovered - currently not indexed' and 'Crawled - currently not indexed' statuses reveal a broader, site-wide quality issue, not just a content deficit. The algorithm assesses your entire ecosystem before deciding to index. In practical terms, adding 50 new pages won't fix anything if the foundation is shaky: you first need to clean up, prune, and raise perceived quality.
What you need to understand
What do these two statuses in Search Console really mean?
'Discovered - currently not indexed' indicates that Googlebot has spotted the URL (via sitemap, internal link, external link) but has chosen not to crawl it immediately or not to index it after a superficial crawl. 'Crawled - currently not indexed' goes further: Google visited the page, analyzed its content, but decided not to include it in the index.
These two statuses are not technical bugs. They reflect a deliberate algorithmic decision. Google believes that these pages do not provide enough value to deserve a place in the index — either because they duplicate existing content or because the site overall lacks quality or authority signals.
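For spot checks on individual URLs, this decision can also be read programmatically through the Search Console URL Inspection API rather than waiting for the Coverage report to refresh. A minimal sketch, assuming the google-api-python-client library and OAuth credentials already obtained; the property and URL shown are hypothetical, and field names should be checked against the current API reference:

```python
# Minimal sketch: query the Search Console URL Inspection API for one URL.
# Assumes google-api-python-client is installed and `creds` already holds valid
# OAuth credentials with the Search Console scope (obtaining them is omitted).
from googleapiclient.discovery import build

def coverage_state(creds, site_url: str, page_url: str) -> str:
    """Return the coverage state Google reports for a single URL."""
    service = build("searchconsole", "v1", credentials=creds)
    body = {"inspectionUrl": page_url, "siteUrl": site_url}
    response = service.urlInspection().index().inspect(body=body).execute()
    # Typical values: "Submitted and indexed", "Discovered - currently not indexed",
    # "Crawled - currently not indexed".
    return response["inspectionResult"]["indexStatusResult"]["coverageState"]

# Hypothetical property and page:
# print(coverage_state(creds, "https://www.example.com/",
#                      "https://www.example.com/blog/post-42"))
```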
Why does Google refer to 'overall site quality'?
Indexing is not binary. Google evaluates each site against an implicit crawl budget and quality threshold. If your domain has a poor reputation (thin content, a history of spam, toxic links, terrible UX), the algorithm applies a stricter filter to every new URL.
You can publish decent or even good content; if the rest of the site is mediocre, Google remains hesitant. This is an inverted halo effect: the perceived quality of the site contaminates the perception of every individual page. Mueller emphasizes this point: the issue is not necessarily the unindexed page itself, but the environment it sits in.
How does this differ from a simple crawl budget issue?
The crawl budget limits the number of pages Googlebot visits each day. Here the concern is different: even when Google crawls, it refuses to index. It is a post-crawl quality filter, not an upstream blockage.
A site with 10,000 pages may see 8,000 URLs crawled regularly but only 3,000 indexed. The crawl budget is not saturated; quality is the issue. Google has decided that those 5,000 crawled-but-unindexed pages do not deserve indexing, even after visiting them.
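One way to check which situation you are in is to cross-reference your server logs with the Coverage export: if Googlebot keeps fetching a URL that stays 'Crawled - currently not indexed', the bottleneck is the post-crawl filter, not crawl capacity. A rough sketch, assuming a combined-format access log and a CSV export of unindexed URLs; the file names and the 'URL' column header are assumptions:

```python
# Rough sketch: count Googlebot hits per path in an access log and check
# whether 'Crawled - currently not indexed' URLs are still being fetched.
# "access.log", "crawled_not_indexed.csv" and the 'URL' column are assumptions.
import csv
import re
from collections import Counter
from urllib.parse import urlparse

LOG_LINE = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[^"]*".*Googlebot')

def googlebot_hits(log_path: str) -> Counter:
    """Count Googlebot requests per URL path in a combined-format access log."""
    hits = Counter()
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for line in log:
            match = LOG_LINE.search(line)
            if match:
                hits[match.group("path")] += 1
    return hits

hits = googlebot_hits("access.log")  # hypothetical log file

with open("crawled_not_indexed.csv", newline="", encoding="utf-8") as f:
    paths = [urlparse(row["URL"]).path for row in csv.DictReader(f)]

# URLs Google keeps fetching yet refuses to index point to a quality filter,
# not to a saturated crawl budget.
for path in sorted(set(paths), key=lambda p: -hits.get(p, 0))[:20]:
    print(f"{hits.get(path, 0):4d} crawls  {path}")
```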
- Discovered not indexed: Google hesitates, evaluates, postpones the indexing decision
- Crawled not indexed: Google has decided after analysis — the page is deemed insufficient
- Overall quality signal: a massive volume of these statuses reveals a structural problem with the site, not a one-off issue
- No quick fix: adding content or forcing the crawl changes nothing if the quality foundation is lacking
- Action required: complete audit, pruning, partial redesign — not just marginal optimization
SEO Expert opinion
Is this statement consistent with on-the-ground observations?
Yes, it corroborates what has been observed for years. Sites accumulating thousands of 'Discovered not indexed' pages often have a troubled history: poorly managed migrations, acquired content farms, uncontrolled explosion of e-commerce facets. Google does not explicitly say how it measures 'overall quality,' but experience shows that signals like overall bounce rate, average loading speed, density of broken internal links, or the proportion of zero-traffic pages come into play.
What Mueller does not specify — and that’s unfortunate — is the threshold. At what point does the percentage of unindexed pages become a cause for concern? 10% of the total? 50%? It depends on the context, of course, but the lack of a figure makes diagnosis difficult. [To be verified] on new or rapidly growing sites, a high volume of 'Discovered' may be temporary while Google assesses.
What nuances should be added to this statement?
Not all sites with a lot of unindexed pages are necessarily of poor quality. A media site with thousands of old archives, an e-commerce site with permanently out-of-stock seasonal products, a UGC platform with moderation — in these cases, Google may legitimately ignore entire sections without it indicating a deeper issue.
The trap is generalizing. If you have 5,000 'Crawled - currently not indexed' pages on a blog with 6,000 articles, then yes, that's a warning signal. If it's on a site with 200,000 product listings, 80% of which are outdated, it's almost to be expected. The key is to analyze the indexed-to-total ratio and the nature of the pages concerned before panicking.
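A quick way to get that ratio and see where unindexed pages cluster is to break the Coverage exports down by site section. A minimal sketch, assuming separate CSV exports for indexed and unindexed URLs with a 'URL' column; the file names are placeholders:

```python
# Sketch: compute the indexed/total ratio, then group unindexed URLs by
# top-level section to see whether the problem is site-wide or concentrated.
# The file names and the 'URL' column header are assumptions about the exports.
import csv
from collections import Counter
from urllib.parse import urlparse

def load_urls(path: str) -> list[str]:
    with open(path, newline="", encoding="utf-8") as f:
        return [row["URL"] for row in csv.DictReader(f)]

indexed = load_urls("indexed.csv")
not_indexed = load_urls("crawled_not_indexed.csv") + load_urls("discovered_not_indexed.csv")

total = len(indexed) + len(not_indexed)
print(f"Indexed: {len(indexed)}/{total} ({len(indexed) / total:.0%})")

def section(url: str) -> str:
    """First path segment, e.g. '/blog/' for https://example.com/blog/post."""
    parts = urlparse(url).path.strip("/").split("/")
    return f"/{parts[0]}/" if parts[0] else "/"

# Sections where unindexed pages pile up deserve the first look.
for sec, count in Counter(section(u) for u in not_indexed).most_common(10):
    print(f"{count:6d}  {sec}")
```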
In what cases does this rule not apply?
Very recent sites (less than 6 months old, low authority) may see long indexing delays without this reflecting a quality issue. Google takes its time assessing new entrants, especially in saturated niches. Likewise, a site hit by a manual action or a spam algorithm will see its pages rejected in bulk, but that's a specific case: non-indexation is then a consequence, not a diagnosis.
Finally, some CMSs generate junk URLs (filters, sorts, session IDs) that Google crawls by mistake but never indexes. If these URLs account for 90% of your 'Discovered - currently not indexed' pages, the problem is not overall quality but a robots.txt or canonical configuration error. Distinguishing structural noise from a quality signal is essential.
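A first pass at separating that structural noise from a genuine quality signal is to count how many unindexed URLs carry typical filter, sort, or session parameters. A sketch with a hand-maintained parameter list; the parameter names and file name are placeholders to adapt to your own CMS:

```python
# Sketch: split unindexed URLs into parameterized noise vs clean URLs.
# The junk parameter names are placeholders; adapt them to your own CMS.
import csv
from urllib.parse import urlparse, parse_qs

JUNK_PARAMS = {"sort", "order", "filter", "color", "size", "sessionid", "page"}

def is_junk(url: str) -> bool:
    """True if the URL carries at least one known filter/sort/session parameter."""
    return bool(set(parse_qs(urlparse(url).query)) & JUNK_PARAMS)

with open("discovered_not_indexed.csv", newline="", encoding="utf-8") as f:
    urls = [row["URL"] for row in csv.DictReader(f)]

junk = [u for u in urls if is_junk(u)]
print(f"{len(junk)}/{len(urls)} unindexed URLs look like faceted/session noise")
# A very high share points to robots.txt or canonical fixes, not a content problem.
```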
Practical impact and recommendations
What should you do concretely if your site accumulates these statuses?
First step: audit the real quality of unindexed pages. Export the list from Search Console, sample 50-100 URLs, and evaluate them honestly. Thin content? Internal duplication? Low user value? If the answer is yes, these pages might deserve not to be indexed — or to be removed.
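To keep that review manageable and representative, draw a random sample from the export rather than eyeballing the first rows. A small sketch, assuming the usual CSV export with a 'URL' column; file names are placeholders:

```python
# Sketch: draw a reproducible random sample of unindexed URLs for manual review.
# File names are placeholders; the fixed seed just makes the sample repeatable.
import csv
import random

SAMPLE_SIZE = 100

with open("crawled_not_indexed.csv", newline="", encoding="utf-8") as f:
    urls = [row["URL"] for row in csv.DictReader(f)]

random.seed(42)
sample = random.sample(urls, min(SAMPLE_SIZE, len(urls)))

with open("review_sample.csv", "w", newline="", encoding="utf-8") as out:
    writer = csv.writer(out)
    writer.writerow(["URL", "verdict (improve / merge / delete / noindex)"])
    for url in sample:
        writer.writerow([url, ""])
```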
Next, look at the indexed and performing pages. What do they have in common? Length, semantic depth, internal linking, UX signals (time on page, CTR)? Identify winning patterns and gradually align the rest of the content to this standard. Don’t seek to index more — aim to deserve indexing.
What mistakes to avoid in this context?
Don't mistake activity for progress. Adding 200 new articles to 'dilute' the ratio of unindexed pages solves nothing if those articles are themselves mediocre. Google evaluates the overall trend, not a snapshot. Likewise, forcing the crawl via 'Request indexing' in bulk is pointless: Google has already crawled and rejected these pages.
Another trap: focusing solely on unindexed pages while ignoring those that are indexed but generate zero traffic. The latter also drag your overall quality score down. A site with 10,000 indexed pages, of which 7,000 receive zero monthly visits, sends a strong negative signal. Pruning, merging, or redirecting these zombie pages often improves the algorithmic perception of the entire site.
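Finding those zombies is essentially a join between what is indexed and what actually gets clicks. A sketch, assuming a Coverage export of indexed URLs and a Performance > Pages export covering the last 90 days; the file names and column headers are assumptions about the export format:

```python
# Sketch: flag indexed pages that received zero clicks over the export period.
# "indexed.csv" and "performance_pages.csv" (with 'Top pages' and 'Clicks'
# columns, as in a typical Performance > Pages export) are assumptions.
import csv

with open("indexed.csv", newline="", encoding="utf-8") as f:
    indexed = {row["URL"] for row in csv.DictReader(f)}

clicks = {}
with open("performance_pages.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        clicks[row["Top pages"]] = int(row["Clicks"])

zombies = sorted(url for url in indexed if clicks.get(url, 0) == 0)
print(f"{len(zombies)}/{len(indexed)} indexed pages had zero clicks")
for url in zombies[:50]:
    print(url)  # candidates to improve, merge, redirect, or noindex
```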
How to verify that your strategy produces results?
Track the indexed-to-submitted pages ratio over the 3-6 months following your actions. If you prune 2,000 weak pages and improve 500, you should see the number of 'Crawled - currently not indexed' URLs gradually decrease. Be warned: it's slow. Google re-evaluates a site's overall quality over multiple crawl cycles.
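To follow that ratio over 3 to 6 months, a monthly snapshot appended to a small history file is enough; the trend matters more than any single value. A sketch with placeholder file names, assuming you collect the counts each month (manually or via the API):

```python
# Sketch: append a monthly snapshot of indexing counts and print the ratio trend.
# The counts come from whatever process you already use (manual export or API);
# here they are passed in directly. The history file name is a placeholder.
import csv
from datetime import date
from pathlib import Path

HISTORY = Path("indexing_history.csv")

def record_snapshot(indexed: int, submitted: int) -> None:
    """Append today's indexed/submitted counts to the history file."""
    new_file = not HISTORY.exists()
    with HISTORY.open("a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(["date", "indexed", "submitted", "ratio"])
        writer.writerow([date.today().isoformat(), indexed, submitted,
                         f"{indexed / submitted:.3f}"])

def print_trend() -> None:
    """Print the ratio month by month; look at the direction, not single values."""
    with HISTORY.open(newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            print(f"{row['date']}  {row['ratio']}")

# Example: record_snapshot(3000, 10000) once a month, then print_trend().
```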
At the same time, monitor crawl metrics (frequency, volume) and aggregated UX signals (Core Web Vitals, average bounce rate). An improvement in these indicators boosts algorithmic trust and promotes the indexing of new pages. If nothing changes after 6 months of sustained efforts, you need to dig deeper — toxic links, silent penalty, structural technical issue.
- Export and analyze 'Discovered' and 'Crawled not indexed' URLs from Search Console
- Identify low-value pages and decide: improve, merge, delete, or noindex
- Audit indexed pages with zero traffic and address these 'zombies' to clean the index
- Strengthen internal linking to strategic pages to redistribute authority
- Improve overall UX signals (speed, mobile, engagement) to enhance quality perception
- Monitor the evolution of the indexing ratio for at least 6 months before drawing conclusions about the effectiveness of your actions
❓ Frequently Asked Questions
How many 'Discovered - currently not indexed' pages is considered abnormal?
Should you delete 'Crawled - currently not indexed' pages to improve overall quality?
Does forcing indexing via 'Request indexing' work in this case?
Can a new site with few backlinks have many unindexed pages without it being a problem?
How do you distinguish a crawl budget issue from an overall quality issue?