
Official statement

If a large proportion of URLs appears as "discovered but not crawled" in Search Console, this indicates either a content quality issue (Google doesn't think users are searching for this content), or a technical issue (insufficient server capacity). In the technical case, only the webmaster can resolve the problem.
🎥 Source video

Extracted from a Google Search Central video

💬 EN 📅 25/08/2022 ✂ 13 statements
Watch on YouTube →
Other statements from this video (12)
  1. Should you really worry about crawl budget for your site?
  2. How does Google actually define crawl budget, and which levers can you pull?
  3. Is crawl budget a concept invented by Google or by SEOs?
  4. Does Google really index only a fraction of the web because of its storage costs?
  5. Do POST requests really hurt your crawl budget?
  6. Does a new section inherit its crawl budget from the main site's quality?
  7. Can 503 and 429 status codes really reduce your crawl budget?
  8. Can you really manage your crawl budget from Google Search Console?
  9. Does HTTP/2 really improve your crawl budget?
  10. Should you block indexing of your JavaScript files to optimize crawl budget?
  11. Do 404s and robots.txt really waste your crawl budget?
  12. Should you block your decorative JavaScript files to optimize your crawl budget?
TL;DR

A high proportion of URLs with "discovered but not crawled" status in Search Console indicates either content Google considers uninteresting, or a technically deficient server. Google doesn't crawl what it deems pointless for its users — or what it cannot crawl properly.

What you need to understand

What exactly does this "discovered not crawled" status mean?

Google has identified the URL — via an internal link, external link, or sitemap — but decided not to crawl it. It's not an oversight: it's a deliberate choice by the algorithm.

This status indicates that Googlebot has prioritized other pages on your site. It estimates that these URLs don't deserve immediate crawling, or perhaps never will.

Why does Google refuse to crawl certain pages?

Two main scenarios according to Gary Illyes: quality problem or technical problem.

If it's a quality issue, Google thinks the content interests no one: duplicate pages, thin content, unnecessary facets, out-of-stock product pages. If it's technical, your server responds too slowly, times out, or returns sporadic 5xx errors. In both cases, Googlebot conserves its crawl budget.

How do you distinguish a quality problem from a technical problem?

Analyze your server logs. If Googlebot attempts to crawl but receives errors or catastrophic response times, it's technical. If Googlebot doesn't even try, it's a quality signal.
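Assuming your servers write the common Apache/Nginx combined log format, this split can be automated with a short script. The regex, the log layout, and the file contents below are assumptions to adapt to your own infrastructure — a sketch, not a definitive parser:

```python
import re
from collections import defaultdict

# Minimal combined-log parser focused on Googlebot hits. Assumes the layout:
# IP - - [date] "METHOD /path HTTP/x" status size "referrer" "user-agent"
LOG_RE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "(?:GET|POST|HEAD) (?P<url>\S+)[^"]*" (?P<status>\d{3})'
)

def googlebot_hits(log_lines):
    """Count Googlebot crawl attempts and 5xx responses per URL."""
    stats = defaultdict(lambda: {"attempts": 0, "errors_5xx": 0})
    for line in log_lines:
        if "Googlebot" not in line:  # crude UA filter; see note below
            continue
        m = LOG_RE.match(line)
        if not m:
            continue
        entry = stats[m.group("url")]
        entry["attempts"] += 1
        if m.group("status").startswith("5"):
            entry["errors_5xx"] += 1
    return dict(stats)
```

A URL absent from the result was never attempted (quality signal); a URL with attempts and 5xx errors points at a server problem. Note that the user-agent string can be spoofed, so for a serious audit, verify Googlebot hits by reverse DNS as well.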

Also check the type of URLs involved: thousands of facet filter pages? That's an architecture problem. Recent product pages with unique content? Dig into server-side issues.

  • Discovered not crawled is not a Google bug — it's a verdict on your content or infrastructure
  • Google prioritizes its crawl budget: it won't crawl what it judges as valueless
  • A server log audit lets you distinguish a technical refusal from an editorial one
  • A high proportion of this status should trigger a critical analysis of your site

SEO Expert opinion

Does this statement truly reflect what we observe in the field?

Yes, and it's brutal. We regularly observe sites with 60-70% of URLs discovered but not crawled — often poorly managed e-commerce sites that generate thousands of filter combinations, or WordPress sites that index anything via the sitemap.

What Gary Illyes doesn't say: Google can also deliberately place URLs in "discovered not crawled" to test the site's reaction. If you fix a technical problem, Googlebot returns — sometimes within hours. If you clean up low-quality content, the effect is slower but measurable.

Should you always be alarmed by a high discovered-not-crawled rate?

No. It depends on which URLs are affected. If they're old-school pagination pages, blog archives from 2008, or tracking parameters, it's better that Google doesn't crawl them.

The problem arises when these are your new product pages, your strategic landing pages, or your fresh editorial content. Then you have a real issue — either Google doesn't find them relevant, or your server is struggling.

Warning: A site that suddenly sees its discovered/not-crawled ratio skyrocket after a migration or technical deployment should react quickly. It's often a sign of server regression or misconfigured robots.txt.

Can you force Google to crawl these URLs?

No, or at least not reliably. Field experience shows that requesting a crawl via Search Console for 500 discovered-not-crawled URLs changes nothing. Google returns when it deems it worthwhile, or never.

The only solution: fix the root cause. Improve content, optimize the server, clean up the architecture. Googlebot is not an on-demand tool, it's an algorithm that prioritizes according to its own economic logic.

Practical impact and recommendations

What should you do concretely if you have a high discovered-not-crawled rate?

First, segment the URLs involved. Export the Search Console report, classify by type: products, categories, blog, facets, parameters. Identify patterns.
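The segmentation step can be sketched as a small classifier over the exported URL list. The path prefixes and query-parameter names below (`/product/`, `filter_*`, and so on) are hypothetical conventions to replace with your own URL structure:

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical classification rules, checked in order; adapt to your site.
RULES = [
    ("facet",    lambda path, q: any(k.startswith("filter") for k in q)),
    ("tracking", lambda path, q: any(k in q for k in ("utm_source", "sessionid"))),
    ("product",  lambda path, q: path.startswith("/product/")),
    ("category", lambda path, q: path.startswith("/category/")),
    ("blog",     lambda path, q: path.startswith("/blog/")),
]

def segment(urls):
    """Bucket a Search Console URL export by page type to reveal patterns."""
    buckets = {}
    for url in urls:
        parsed = urlparse(url)
        query = parse_qs(parsed.query)
        label = next((name for name, match in RULES if match(parsed.path, query)), "other")
        buckets.setdefault(label, []).append(url)
    return buckets
```

If one bucket (facets, tracking parameters) dominates the discovered-not-crawled list, you have your pattern.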

Next, cross-reference with your server logs for the same period. Is Googlebot attempting to crawl and failing? Or is it not even trying? If you see crawl attempts followed by 5xx errors or timeouts, it's a server problem. If there are no attempts at all, it's a quality signal or a crawl budget issue.
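That decision table can be expressed as a small function. The input shape (per-URL attempt and 5xx counts derived from your logs) and the verdict labels are illustrative assumptions:

```python
def diagnose(gsc_urls, log_attempts):
    """Classify each 'discovered not crawled' URL using server-log evidence.

    gsc_urls: URLs exported from Search Console with that status.
    log_attempts: {url: {"attempts": int, "errors_5xx": int}} built from logs.
    """
    verdicts = {}
    for url in gsc_urls:
        hits = log_attempts.get(url)
        if hits is None or hits["attempts"] == 0:
            verdicts[url] = "quality signal / crawl budget"  # Googlebot never tried
        elif hits["errors_5xx"] > 0:
            verdicts[url] = "server problem"                 # tried and failed
        else:
            verdicts[url] = "inspect further"                # tried and succeeded; status may be stale
    return verdicts
```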

What mistakes should you absolutely avoid?

Never force thousands of low-quality URLs into your sitemap hoping Google will crawl them. You make the problem worse: Google detects that you're massively feeding it content it judges valueless, which degrades the overall perception of your site.

Another classic mistake: deploying an undersized server for a 50,000-SKU product catalog. If your average server response time exceeds 500ms, Googlebot will slow down its crawl — or even abandon certain sections.
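A minimal helper for checking log-derived response times against that threshold. The 500ms cutoff comes from the paragraph above; where you extract the timings from (for example Nginx's `$request_time` field) depends on your log configuration:

```python
def response_time_report(times_ms, threshold_ms=500):
    """Summarize server response times and flag when Googlebot may slow its crawl.

    times_ms: response times in milliseconds, e.g. parsed from access logs.
    """
    if not times_ms:
        return {"avg_ms": 0.0, "over_threshold": False}
    avg = sum(times_ms) / len(times_ms)
    return {"avg_ms": round(avg, 1), "over_threshold": avg > threshold_ms}
```

Run it per URL segment rather than site-wide: a fast homepage can hide a slow product catalog.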

How do you verify that your site is compliant and optimized?

  • Analyze the discovered/crawled ratio over the last 3 months in Search Console
  • Export URLs with "discovered not crawled" status and segment by page type
  • Check your server logs: is Googlebot attempting to crawl these URLs?
  • Measure average server response time (target: under 200ms)
  • Identify URLs with no added value and block them via robots.txt or noindex
  • Improve content on strategic non-crawled pages (uniqueness, depth, relevance)
  • Optimize your internal linking to push priority pages
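As an illustration of the robots.txt step in the checklist above, a hypothetical fragment blocking faceted-navigation and tracking URLs might look like this. The patterns are examples, not recommendations for your site:

```
# Hypothetical rules blocking facet filters and session parameters.
# Adapt the patterns to your own URL structure and test before deploying.
User-agent: *
Disallow: /*?filter_
Disallow: /*?sessionid=
Disallow: /search/
```

Keep in mind the difference between the two levers: robots.txt prevents crawling but won't remove already-indexed URLs, while noindex removes pages from the index but requires them to remain crawlable so Google can see the directive.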
The "discovered not crawled" status is a diagnosis, not a death sentence. Either you clean up what doesn't deserve to be crawled, or you fix what prevents Google from crawling what matters. Either way, it requires a rigorous technical and editorial audit. If your team lacks the resources or expertise to conduct this in-depth analysis — particularly on server logs and crawl budget optimization — you'll save time by relying on a specialized SEO agency that masters these diagnostics.

❓ Frequently Asked Questions

How long does it take Google to crawl a discovered-not-crawled URL after a fix?
Impossible to predict. It can range from a few hours if you fix a critical server problem to several weeks if you improve content. Google recrawls according to its own prioritization.
Is a 30% rate of discovered-not-crawled URLs normal?
It depends on which URLs are affected. If they're facets or useless parameters, it's acceptable. If they're your new product pages, it's a serious problem.
Should you remove discovered-not-crawled URLs from the sitemap?
If they're worthless URLs, yes. If they're strategic pages, no: fix the underlying problem (content or server) before removing them.
Can Google penalize a site with many discovered-not-crawled URLs?
Not directly, but a site that massively serves low-quality content can see its overall perception degraded, which impacts crawl budget and potentially rankings.
🏷 Related Topics
Content Crawl & Indexing AI & SEO Domain Name Search Console

