
Official statement

If a large proportion of URLs appears as "discovered but not crawled" in Search Console, this indicates either a content quality issue (Google doesn't think users are searching for this content), or a technical issue (insufficient server capacity). In the technical case, only the webmaster can resolve the problem.
🎥 Source video

Extracted from a Google Search Central video

💬 EN 📅 25/08/2022 ✂ 13 statements
Watch on YouTube →
Other statements from this video (12)
  1. Should you really worry about crawl budget for your site?
  2. How does Google actually define crawl budget, and which levers can you pull?
  3. Is crawl budget a concept invented by Google or by SEOs?
  4. Does Google really index only a fraction of the web because of its storage costs?
  5. Do POST requests really hurt your crawl budget?
  6. Does a new section inherit its crawl budget from the main site's quality?
  7. Can 503 and 429 status codes really reduce your crawl budget?
  8. Can you really manage your crawl budget from Google Search Console?
  9. Does HTTP/2 really improve your crawl budget?
  10. Should you block indexing of your JavaScript files to optimize crawl budget?
  11. Do 404s and robots.txt really waste your crawl budget?
  12. Should you block your decorative JavaScript files to optimize your crawl budget?
TL;DR

A high proportion of URLs with "discovered but not crawled" status in Search Console indicates either content Google considers uninteresting, or a technically deficient server. Google doesn't crawl what it deems pointless for its users — or what it cannot crawl properly.

What you need to understand

What exactly does this "discovered not crawled" status mean?

Google has identified the URL — via an internal link, external link, or sitemap — but decided not to crawl it. It's not an oversight: it's a deliberate choice by the algorithm.

This status indicates that Googlebot has prioritized other pages on your site. It estimates that these URLs don't deserve immediate crawling, or perhaps never will.

Why does Google refuse to crawl certain pages?

Two main scenarios according to Gary Illyes: quality problem or technical problem.

If it's a quality issue, Google thinks the content interests no one: duplicate pages, thin content, unnecessary facets, out-of-stock product pages. If it's technical, your server responds too slowly, times out, or returns sporadic 5xx errors. In both cases, Googlebot conserves its crawl budget.

How do you distinguish a quality problem from a technical problem?

Analyze your server logs. If Googlebot attempts to crawl but receives errors or catastrophic response times, it's technical. If Googlebot doesn't even try, it's a quality signal.
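Assuming your servers write the common Apache/Nginx combined log format, this split can be automated with a short script. The regex, the log layout, and the file contents below are assumptions to adapt to your own infrastructure — a sketch, not a definitive parser:

```python
import re
from collections import defaultdict

# Minimal combined-log parser focused on Googlebot hits. Assumes the layout:
# IP - - [date] "METHOD /path HTTP/x" status size "referrer" "user-agent"
LOG_RE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "(?:GET|POST|HEAD) (?P<url>\S+)[^"]*" (?P<status>\d{3})'
)

def googlebot_hits(log_lines):
    """Count Googlebot crawl attempts and 5xx responses per URL."""
    stats = defaultdict(lambda: {"attempts": 0, "errors_5xx": 0})
    for line in log_lines:
        if "Googlebot" not in line:  # crude UA filter; see note below
            continue
        m = LOG_RE.match(line)
        if not m:
            continue
        entry = stats[m.group("url")]
        entry["attempts"] += 1
        if m.group("status").startswith("5"):
            entry["errors_5xx"] += 1
    return dict(stats)
```

A URL absent from the result was never attempted (quality signal); a URL with attempts and 5xx errors points at a server problem. Note that the user-agent string can be spoofed, so for a serious audit, verify Googlebot hits by reverse DNS as well.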

Also check the type of URLs involved: thousands of facet filter pages? That's an architecture problem. Recent product pages with unique content? Dig into server-side issues.

  • Discovered not crawled is not a Google bug — it's a verdict on your content or infrastructure
  • Google prioritizes its crawl budget: it won't crawl what it judges as valueless
  • A server log audit lets you distinguish a technical refusal from an editorial one
  • A high proportion of this status should trigger a critical analysis of your site

SEO Expert opinion

Does this statement truly reflect what we observe in the field?

Yes, and it's brutal. We regularly observe sites with 60-70% of URLs discovered but not crawled — often poorly managed e-commerce sites that generate thousands of filter combinations, or WordPress sites that index anything via the sitemap.

What Gary Illyes doesn't say: Google can also deliberately place URLs in "discovered not crawled" to test the site's reaction. If you fix a technical problem, Googlebot returns — sometimes within hours. If you clean up low-quality content, the effect is slower but measurable.

Should you always be alarmed by a high discovered-not-crawled rate?

No. It depends on which URLs are affected. If they're old-school pagination pages, blog archives from 2008, or tracking parameters, it's better that Google doesn't crawl them.

The problem arises when these are your new product pages, your strategic landing pages, or your fresh editorial content. Then you have a real issue — either Google doesn't find them relevant, or your server is struggling.

Warning: A site that suddenly sees its discovered/not-crawled ratio skyrocket after a migration or technical deployment should react quickly. It's often a sign of server regression or misconfigured robots.txt.

Can you force Google to crawl these URLs?

No, or at least not reliably. Field experience shows that requesting a crawl via Search Console for 500 discovered-not-crawled URLs changes nothing. Google returns when it deems it worthwhile, or never.

The only solution: fix the root cause. Improve content, optimize the server, clean up the architecture. Googlebot is not an on-demand tool, it's an algorithm that prioritizes according to its own economic logic.

Practical impact and recommendations

What should you do concretely if you have a high discovered-not-crawled rate?

First, segment the URLs involved. Export the Search Console report, classify by type: products, categories, blog, facets, parameters. Identify patterns.
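The segmentation step can be sketched as a small classifier over the exported URL list. The path prefixes and query-parameter names below (`/product/`, `filter_*`, and so on) are hypothetical conventions to replace with your own URL structure:

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical classification rules, checked in order; adapt to your site.
RULES = [
    ("facet",    lambda path, q: any(k.startswith("filter") for k in q)),
    ("tracking", lambda path, q: any(k in q for k in ("utm_source", "sessionid"))),
    ("product",  lambda path, q: path.startswith("/product/")),
    ("category", lambda path, q: path.startswith("/category/")),
    ("blog",     lambda path, q: path.startswith("/blog/")),
]

def segment(urls):
    """Bucket a Search Console URL export by page type to reveal patterns."""
    buckets = {}
    for url in urls:
        parsed = urlparse(url)
        query = parse_qs(parsed.query)
        label = next((name for name, match in RULES if match(parsed.path, query)), "other")
        buckets.setdefault(label, []).append(url)
    return buckets
```

If one bucket (facets, tracking parameters) dominates the discovered-not-crawled list, you have your pattern.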

Next, cross-reference with your server logs for the same period. Is Googlebot attempting to crawl and failing? Or is it not even trying? If you see crawl attempts followed by 5xx errors or timeouts, it's a server problem. If there are no attempts at all, it's a quality signal or a crawl budget issue.
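That decision table can be expressed as a small function. The input shape (per-URL attempt and 5xx counts derived from your logs) and the verdict labels are illustrative assumptions:

```python
def diagnose(gsc_urls, log_attempts):
    """Classify each 'discovered not crawled' URL using server-log evidence.

    gsc_urls: URLs exported from Search Console with that status.
    log_attempts: {url: {"attempts": int, "errors_5xx": int}} built from logs.
    """
    verdicts = {}
    for url in gsc_urls:
        hits = log_attempts.get(url)
        if hits is None or hits["attempts"] == 0:
            verdicts[url] = "quality signal / crawl budget"  # Googlebot never tried
        elif hits["errors_5xx"] > 0:
            verdicts[url] = "server problem"                 # tried and failed
        else:
            verdicts[url] = "inspect further"                # tried and succeeded; status may be stale
    return verdicts
```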

What mistakes should you absolutely avoid?

Never force thousands of low-quality URLs into your sitemap hoping Google will crawl them. You make the problem worse: Google detects that you're massively feeding it content it judges valueless, which degrades the overall perception of your site.

Another classic mistake: deploying an undersized server for a 50,000-SKU product catalog. If your average server response time exceeds 500ms, Googlebot will slow down its crawl — or even abandon certain sections.
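A minimal helper for checking log-derived response times against that threshold. The 500ms cutoff comes from the paragraph above; where you extract the timings from (for example Nginx's `$request_time` field) depends on your log configuration:

```python
def response_time_report(times_ms, threshold_ms=500):
    """Summarize server response times and flag when Googlebot may slow its crawl.

    times_ms: response times in milliseconds, e.g. parsed from access logs.
    """
    if not times_ms:
        return {"avg_ms": 0.0, "over_threshold": False}
    avg = sum(times_ms) / len(times_ms)
    return {"avg_ms": round(avg, 1), "over_threshold": avg > threshold_ms}
```

Run it per URL segment rather than site-wide: a fast homepage can hide a slow product catalog.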

How do you verify that your site is compliant and optimized?

  • Analyze the discovered/crawled ratio over the last 3 months in Search Console
  • Export URLs with "discovered not crawled" status and segment by page type
  • Check your server logs: is Googlebot attempting to crawl these URLs?
  • Measure average server response time (target: under 200ms)
  • Identify URLs with no added value and block them via robots.txt or noindex
  • Improve content on strategic non-crawled pages (uniqueness, depth, relevance)
  • Optimize your internal linking to push priority pages
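As an illustration of the robots.txt step in the checklist above, a hypothetical fragment blocking faceted-navigation and tracking URLs might look like this. The patterns are examples, not recommendations for your site:

```
# Hypothetical rules blocking facet filters and session parameters.
# Adapt the patterns to your own URL structure and test before deploying.
User-agent: *
Disallow: /*?filter_
Disallow: /*?sessionid=
Disallow: /search/
```

Keep in mind the difference between the two levers: robots.txt prevents crawling but won't remove already-indexed URLs, while noindex removes pages from the index but requires them to remain crawlable so Google can see the directive.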
The "discovered not crawled" status is a diagnosis, not a death sentence. Either you clean up what doesn't deserve to be crawled, or you fix what prevents Google from crawling what matters. Either way, it requires a rigorous technical and editorial audit. If your team lacks the resources or expertise to conduct this in-depth analysis — particularly on server logs and crawl budget optimization — you'll save time by relying on a specialized SEO agency that masters these diagnostics.

❓ Frequently Asked Questions

How long does it take Google to crawl a discovered-not-crawled URL after a fix?
Impossible to predict. It can range from a few hours if you fix a critical server problem to several weeks if you improve content. Google recrawls according to its own prioritization.
Is a 30% rate of discovered-not-crawled URLs normal?
It depends on which URLs are affected. If they're facets or useless parameters, it's acceptable. If they're your new product pages, it's a serious problem.
Should you remove discovered-not-crawled URLs from the sitemap?
If they're worthless URLs, yes. If they're strategic pages, no: fix the underlying problem (content or server) before removing them.
Can Google penalize a site with many discovered-not-crawled URLs?
Not directly, but a site that massively serves low-quality content can see its overall perception degraded, which impacts crawl budget and potentially rankings.
🏷 Related Topics
Content Crawl & Indexing AI & SEO Domain Name Search Console

