Official statement
Google hasn't changed its indexing algorithm—it's simply the reporting in Search Console that is evolving. Discovered but non-indexed URLs are now more visible in the interface. In practical terms? You'll see a volume of excluded URLs that you may not have noticed before, but that already existed in Google's pipeline. Don't panic: this isn't a degradation of your indexing; it's just that Google is finally showing you what it already ignored.
What you need to understand
Has Google changed its indexing criteria?
No. The indexing algorithm hasn't changed a bit. What John Mueller clarifies is that the change only concerns the visibility of data in Search Console. In other words, the URLs that Google discovers but chooses not to index have always existed—they just weren't as prominent in the reports.
Before this reporting update, many SEOs only saw part of the iceberg. Now, Search Console explicitly shows the discovered but excluded URLs. This isn't a problem in itself; it's just that Google decided to be more transparent about its selective filtering.
What does it really mean when we say “Google cannot index the entire web”?
Google crawls billions of pages, but indexing is resource-intensive (storage, computation, relevance evaluation), so it filters. Some URLs are discovered (via a sitemap, an internal link, or a backlink) but deemed irrelevant, duplicated, too low in quality, or simply unnecessary for its users.
What Google calls "being selective" is in fact a constant balancing act between crawl budget, duplicate content, thin content, and canonicalization. A discovered page is not an indexed page, and many sites overlook this. Seeing these non-indexed URLs in Search Console is just Google finally showing you what it chose to leave out.
Should we be worried about the surge in non-indexed URLs?
Let's be honest: if you see a spike of several thousand discovered but non-indexed URLs, your first reaction is panic. But before you tear everything apart, ask yourself: did these URLs really deserve to be indexed?
In many cases, these pages are noisy URL parameters, poorly managed e-commerce filters, runaway pagination, or WordPress archives that no one bothered to exclude properly. If Google discovers them but doesn't index them, it might just be doing its job well. The problem arises when strategic pages end up in that batch; that's when you need to dig deeper.
- Search Console reporting is more transparent, but indexing itself hasn't changed.
- Google has always been selective: discovering a URL does not guarantee its indexing.
- Seeing non-indexed URLs isn't necessarily an alarm signal—it depends on which ones.
- Analyzing the nature of these URLs is essential before panicking or completely overhauling everything.
- If strategic pages are excluded, that's where you need to investigate (quality, duplication, canonicalization, robots.txt, noindex).
SEO Expert opinion
Is this statement consistent with what we observe in the field?
Yes, and it’s even reassuring. For years, we’ve known that Google crawls far more than it indexes. Search Console reports have always been partial on this point: some exclusion signals were vague, while others were completely absent. This reporting update merely confirms what we were already seeing in server logs—hundreds, if not thousands, of URLs crawled but never indexed.
What changes now is that Google is putting it right in front of you. Previously, you had to cross-reference logs, sitemaps, GSC reports, and sometimes third-party tools to understand what was going on. Now it's displayed in plain sight. And that's a good thing: it forces you to clean up, prioritize, and stop submitting 50,000-URL sitemaps where half the URLs are useless.
What nuances should we add to this statement?
John Mueller says, "Google has always been selective." That's true. But to what extent? And based on what criteria? That's where things get deliberately vague. Google never explicitly states why a given URL is discovered but not indexed. Sometimes it's obvious (duplicate, thin content); other times, it's opaque (perceived quality, page authority, thematic context).
[To be checked]: Google claims this change does not affect indexing, but we have seen “reporting adjustments” coincide with indexing fluctuations before. We’ll need to monitor if sites see a real drop in indexed URLs in the weeks to come. This would align with a tightening of the crawl budget or a hardening of quality criteria—but Google will never explicitly say so.
In which cases does this rule not apply or pose problems?
If you have a clean, well-structured site with a nice sitemap and clear strategic URLs, this reporting change shouldn’t affect anything. You may see a few excluded URLs, but nothing alarming. However, if you're managing an e-commerce site with thousands of product variations, dynamic filters, or a media site with poorly managed archives, brace yourself for a shock.
The issue arises when important pages end up non-indexed for unclear reasons. Then you must investigate: content quality, internal duplication, sloppy canonicalization, robots.txt blocking, accidental noindex, or simply a lack of authority on the page. And that’s where it gets tricky—because Google will never tell you precisely why.
Practical impact and recommendations
What should you do concretely with these non-indexed URLs?
First step: audit the nature of these URLs. Go to Search Console, export the list of discovered but non-indexed URLs, and see what lies beneath. You'll often find noisy URL parameters (?sort=, ?color=), runaway pagination (/page/42/), empty categories, and worthless WordPress tags. If that's the case, don't panic; just exclude them properly.
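If you want to automate that first triage, a minimal Python sketch like the one below can do it, assuming you exported the report as a CSV with one URL per line (the file name non_indexed_urls.csv and the junk patterns are placeholders to adapt to your own site):

```python
import csv
import re

# Patterns that usually indicate junk URLs: adapt to your own site.
JUNK_PATTERNS = {
    "parameter": re.compile(r"\?(sort|color|filter|utm_)[^=]*="),
    "pagination": re.compile(r"/page/\d+/?$"),
    "wordpress_tag": re.compile(r"/tag/"),
}

def classify(url: str) -> str:
    """Return the first junk category a URL matches, or 'review'."""
    for label, pattern in JUNK_PATTERNS.items():
        if pattern.search(url):
            return label
    return "review"  # potentially strategic: needs a human look

buckets: dict[str, list[str]] = {}
with open("non_indexed_urls.csv", newline="") as f:
    for row in csv.reader(f):
        if not row:
            continue
        url = row[0].strip()
        buckets.setdefault(classify(url), []).append(url)

# Print the biggest buckets first: that is usually where the noise is.
for label, urls in sorted(buckets.items(), key=lambda kv: -len(kv[1])):
    print(f"{label}: {len(urls)} URLs")
```

Everything that lands in the "review" bucket is what deserves your attention; the rest is usually cleanup work.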
Next, isolate the URLs that should be indexed: product sheets, in-depth articles, SEO landing pages. If they're on the list, that's where you need to act: check content quality, fix duplication, strengthen internal linking, add links from your strategic pages, or simply improve relevance.
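To spot strategic pages that are starving for internal links, you can simply count how many internal links point to them from pages you already know. A minimal sketch with the standard library, assuming a small seed list of your own pages (DOMAIN and SEED_PAGES are placeholders):

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

DOMAIN = "www.example.com"            # placeholder: your own domain
SEED_PAGES = [                        # placeholder: pages you already crawl or know
    "https://www.example.com/",
    "https://www.example.com/blog/",
]

class LinkCollector(HTMLParser):
    """Collects href values from <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

inbound = Counter()
for page in SEED_PAGES:
    parser = LinkCollector()
    with urlopen(page) as resp:
        parser.feed(resp.read().decode("utf-8", errors="replace"))
    for href in parser.links:
        target = urljoin(page, href)
        if urlparse(target).netloc == DOMAIN:
            inbound[target] += 1

# Strategic URLs with few (or zero) inbound internal links are candidates for better linking.
for url, count in inbound.most_common():
    print(count, url)
```

A real crawl covers the whole site, but even a handful of seed pages is often enough to see which strategic URLs are barely linked.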
What mistakes should be avoided in response to this reporting change?
Big mistake number one: panicking and submitting everything for indexing through the GSC tool. This is pointless. If Google has deemed a URL not relevant enough, forcing it won’t change anything in the long run. At best, it will be temporarily indexed and then re-excluded. At worst, you’ll spam Google with unnecessary requests and degrade your crawl budget.
Big mistake number two: completely ignoring this data. Yes, it’s just reporting. But if you have thousands of discovered non-indexed URLs, it likely signals a structural problem: polluted sitemap, poorly structured hierarchy, massive duplicate content, or failing canonicalization. This is an opportunity to clean up—not to sweep things under the rug.
How can I check whether my site is handling this indexing filtering well?
Start by cross-referencing Search Console with your server logs. Look at which URLs Googlebot crawls but does not index. If they're useless pages, great. If they're strategic pages, corrections are needed. Next, check your sitemap: remove any URLs you don’t want to be indexed (yes, it sounds stupid, but many just throw everything in).
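Here is a minimal sketch of that log cross-check, assuming an access log in the usual combined format and a plain-text export of the non-indexed URLs (the file names are placeholders, and a serious audit would also verify Googlebot hits via reverse DNS):

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Very rough combined-log-format parsing: we only need the request path and the user agent.
LOG_LINE = re.compile(r'"(?:GET|POST) (?P<path>\S+) HTTP/[^"]*".*"(?P<ua>[^"]*)"\s*$')

googlebot_hits = Counter()
with open("access.log") as log:
    for line in log:
        m = LOG_LINE.search(line)
        if m and "Googlebot" in m.group("ua"):
            googlebot_hits[m.group("path")] += 1

with open("non_indexed_urls.txt") as f:
    non_indexed_paths = {urlparse(u.strip()).path for u in f if u.strip()}

# URLs that Googlebot crawls repeatedly but never indexes deserve a closer look.
for path, hits in googlebot_hits.most_common():
    if path in non_indexed_paths:
        print(f"{hits:>5} crawls, not indexed: {path}")
```

High crawl counts on junk URLs mean wasted crawl budget; high crawl counts on strategic URLs that stay non-indexed mean a quality or canonicalization problem.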
Then, work on the internal quality and authority of pages you want to index. Strong internal linking, unique and substantial content, clean canonicalization, no accidental noindex. And above all, stop creating URLs like crazy—every additional URL dilutes your crawl budget and authority.
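A quick way to catch an accidental noindex or a stray canonical on a page you care about is to look at the response headers and the head of the HTML. A minimal sketch, assuming a short list of strategic URLs (the regex parsing is deliberately naive; a real crawler does this better):

```python
import re
from urllib.request import Request, urlopen

PAGES_TO_CHECK = [                    # placeholder: your strategic URLs
    "https://www.example.com/key-landing-page/",
]

# Naive patterns: they assume name/rel comes before content/href in the tag.
META_ROBOTS = re.compile(r'<meta[^>]+name=["\']robots["\'][^>]*content=["\']([^"\']+)', re.I)
CANONICAL = re.compile(r'<link[^>]+rel=["\']canonical["\'][^>]*href=["\']([^"\']+)', re.I)

for url in PAGES_TO_CHECK:
    req = Request(url, headers={"User-Agent": "Mozilla/5.0 (indexability check)"})
    with urlopen(req) as resp:
        x_robots = resp.headers.get("X-Robots-Tag", "")
        html = resp.read().decode("utf-8", errors="replace")

    meta = META_ROBOTS.search(html)
    canonical = CANONICAL.search(html)

    print(url)
    print("  X-Robots-Tag :", x_robots or "(none)")
    print("  meta robots  :", meta.group(1) if meta else "(none)")
    print("  canonical    :", canonical.group(1) if canonical else "(none)")
    if "noindex" in (x_robots + " " + (meta.group(1) if meta else "")).lower():
        print("  WARNING: noindex detected on a page you want indexed")
    if canonical and canonical.group(1).rstrip("/") != url.rstrip("/"):
        print("  WARNING: canonical points elsewhere:", canonical.group(1))
```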
- Export the list of non-indexed discovered URLs from Search Console
- Identify the irrelevant URLs (parameters, filters, pagination) and exclude them properly (robots.txt, noindex, canonical)
- Spot the non-indexed strategic pages and investigate the cause (quality, duplication, weak internal linking)
- Clean the sitemap: submit only the URLs you genuinely want to index (a script sketch for this follows the list)
- Enhance internal linking and authority of priority pages
- Monitor the evolution of the volume of non-indexed URLs over several weeks to detect trends
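The sitemap-cleaning step from this checklist can be scripted too. A minimal sketch, assuming a standard XML sitemap and the same kind of junk patterns as earlier (sitemap.xml and the patterns are placeholders):

```python
import re
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
JUNK = re.compile(r"(\?|/page/\d+/?$|/tag/)")   # placeholder patterns: adapt to your site

tree = ET.parse("sitemap.xml")
root = tree.getroot()

kept, dropped = 0, 0
for url_node in list(root.findall("sm:url", NS)):
    loc = url_node.find("sm:loc", NS).text.strip()
    if JUNK.search(loc):
        root.remove(url_node)   # drop junk entries from the sitemap
        dropped += 1
    else:
        kept += 1

ET.register_namespace("", NS["sm"])
tree.write("sitemap.cleaned.xml", encoding="utf-8", xml_declaration=True)
print(f"kept {kept} URLs, dropped {dropped}")
```

Submit the cleaned file instead of the original, and keep monitoring how the discovered-but-non-indexed count evolves afterwards.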
❓ Frequently Asked Questions
Does this reporting change mean Google is indexing fewer pages than before?
Should I force indexing of the discovered but non-indexed URLs through the GSC submission tool?
How do I know whether the non-indexed URLs are really a problem for my SEO?
Should I remove these non-indexed URLs from my sitemap?
Can this change impact my SEO traffic in the short term?
🎥 Source: Google Search Central video · duration 37 min · published on 12/06/2020 · Watch the full video on YouTube →