Official statement
Google hasn't changed its indexing algorithm. Only the reporting in Search Console has evolved: discovered but non-indexed URLs are now more visible in the interface. In practical terms, you'll see a volume of excluded URLs that you may not have noticed before, but that already existed in Google's pipeline. Don't panic: this isn't a degradation of your indexing. Google is simply showing you what it was already leaving out.
What you need to understand
Has Google changed its indexing criteria?
No. The indexing algorithm hasn't changed a bit. What John Mueller clarifies is that the change only concerns the visibility of data in Search Console. In other words, the URLs that Google discovers but chooses not to index have always existed—they just weren't as prominent in the reports.
Before this reporting update, many SEOs only saw part of the iceberg. Now, Search Console explicitly shows the discovered but excluded URLs. This isn't a problem in itself; it's just that Google decided to be more transparent about its selective filtering.
What does it really mean when we say “Google cannot index the entire web”?
Google crawls billions of pages, but indexing is resource-intensive (storage, computation, relevance evaluation). Hence, it filters. Some URLs are discovered (via a sitemap, an internal link, or a backlink) but deemed irrelevant, duplicate, too low in quality, or simply unnecessary for its users.
What Google calls “being selective” is actually a constant balancing act between crawl budget, duplicate content, thin content, and canonicalization. A discovered page is not an indexed page, and many sites overlook this. Seeing these non-indexed URLs in Search Console is just Google finally showing you what it chose to leave out.
Should we be worried about the surge in non-indexed URLs?
Let's be honest: if you see a spike of several thousand discovered but non-indexed URLs, your first reaction is panic. But before you break everything, ask yourself the question: did these URLs really deserve to be indexed?
In many cases, these pages are stray URL parameters, poorly managed e-commerce filters, runaway pagination, or WordPress archives that no one bothered to exclude correctly. If Google discovers them but doesn't index them, it might just be doing its job well. The problem arises when strategic pages end up in this group, and that is when you need to dig deeper.
- Search Console reporting is more transparent, but indexing itself hasn't changed.
- Google has always been selective: discovering a URL does not guarantee its indexing.
- Seeing non-indexed URLs isn't necessarily an alarm signal—it depends on which ones.
- Analyzing the nature of these URLs is essential before panicking or completely overhauling everything.
- If strategic pages are excluded, that's where you need to investigate (quality, duplication, canonicalization, robots.txt, noindex).
SEO Expert opinion
Is this statement consistent with what we observe in the field?
Yes, and it’s even reassuring. For years, we’ve known that Google crawls far more than it indexes. Search Console reports have always been partial on this point: some exclusion signals were vague, while others were completely absent. This reporting update merely confirms what we were already seeing in server logs—hundreds, if not thousands, of URLs crawled but never indexed.
What changes now is that Google is putting it right in front of you. Previously, you had to cross-reference logs, sitemaps, GSC reports, and sometimes third-party tools to understand. Now, it's clearly displayed. And that’s a good thing—it forces you to clean up, prioritize, and stop throwing sitemaps of 50,000 URLs where half of them are useless.
What nuances should we add to this statement?
John Mueller says, “Google has always been selective.” That's true. But to what extent, and on what criteria? This is where things get vague. Google never explicitly states why a given URL is discovered but not indexed. Sometimes it’s obvious (duplicate, thin content); other times, it’s opaque (perceived quality, page authority, thematic context).
[To be checked]: Google claims this change does not affect indexing, but we have seen “reporting adjustments” coincide with indexing fluctuations before. We’ll need to monitor if sites see a real drop in indexed URLs in the weeks to come. This would align with a tightening of the crawl budget or a hardening of quality criteria—but Google will never explicitly say so.
In which cases does this rule not apply or pose problems?
If you have a clean, well-structured site with a nice sitemap and clear strategic URLs, this reporting change shouldn’t affect anything. You may see a few excluded URLs, but nothing alarming. However, if you're managing an e-commerce site with thousands of product variations, dynamic filters, or a media site with poorly managed archives, brace yourself for a shock.
The issue arises when important pages end up non-indexed for unclear reasons. Then you must investigate: content quality, internal duplication, sloppy canonicalization, robots.txt blocking, accidental noindex, or simply a lack of authority on the page. And that’s where it gets tricky—because Google will never tell you precisely why.
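Two of the causes listed above, robots.txt blocking and an accidental noindex, can be ruled out mechanically before moving on to the harder questions of quality and authority. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, it works on robots.txt and HTML text you have already fetched, and it does not cover a noindex sent through the `X-Robots-Tag` HTTP header.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


def blocked_by_robots(robots_txt, url, agent="Googlebot"):
    """Return True if the given robots.txt text disallows `url` for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


class RobotsMetaParser(HTMLParser):
    """Collects the content of <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name.lower(): (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() in {"robots", "googlebot"}:
            self.directives.append(attributes.get("content", "").lower())


def has_noindex(html):
    """Return True if a robots meta tag in `html` contains a noindex directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return any("noindex" in directive for directive in parser.directives)
```

If both checks come back negative for a strategic page, the cause is more likely to be one of the qualitative factors (content, duplication, canonicalization, authority), which no script can settle for you.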
Practical impact and recommendations
What should you do concretely with these non-indexed URLs?
First step: audit the nature of these URLs. Go to Search Console, export the list of discovered but non-indexed URLs, and see what lies beneath. You’ll often find annoying URL parameters (?sort=, ?color=), wild pagination (/page/42/), empty categories, and worthless WordPress tags. If that’s the case, don’t panic—just exclude them properly.
Next, isolate the URLs that should be indexed: product sheets, in-depth articles, SEO landing pages. If they’re on the list, that’s where you need to act: check content quality, correct duplications, strengthen internal linking from your strategic pages, or simply improve relevance.
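The triage described above can be sketched in a few lines of Python once the list is exported. This is an illustration, not a definitive rule set: the parameter names (`sort`, `color`) and the `/page/N/` pattern are taken from the examples in the text, and the labels and function names are hypothetical. Anything that does not match a noise pattern is marked for manual review rather than assumed to be strategic.

```python
import re
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# Assumed noise rules; adapt the parameter names and path pattern to your own site.
NOISE_PARAMS = {"sort", "color"}
PAGINATION_RE = re.compile(r"/page/\d+/?$")


def classify(url):
    """Label a URL as 'parameter', 'pagination', or 'review' (needs a human look)."""
    parts = urlsplit(url)
    # keep_blank_values=True so that "?sort=" still counts as carrying the parameter.
    params = parse_qs(parts.query, keep_blank_values=True)
    if NOISE_PARAMS.intersection(params):
        return "parameter"
    if PAGINATION_RE.search(parts.path):
        return "pagination"
    return "review"


def summarize(urls):
    """Count how many exported URLs fall into each label."""
    return Counter(classify(url) for url in urls)
```

A summary dominated by `parameter` and `pagination` suggests the spike is mostly noise. A large `review` bucket is the part worth reading line by line.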
What mistakes should be avoided in response to this reporting change?
Big mistake number one: panicking and submitting everything for indexing through the GSC tool. This is pointless. If Google has deemed a URL not relevant enough, forcing it won’t change anything in the long run. At best, it will be temporarily indexed and then excluded again. At worst, you’ll flood Google with unnecessary requests and waste your crawl budget.
Big mistake number two: completely ignoring this data. Yes, it’s just reporting. But if you have thousands of discovered non-indexed URLs, it likely signals a structural problem: polluted sitemap, poorly structured hierarchy, massive duplicate content, or failing canonicalization. This is an opportunity to clean up—not to sweep things under the rug.
How to check if my site is managing this indexing filtering well?
Start by cross-referencing Search Console with your server logs. Look at which URLs Googlebot crawls but does not index. If they're useless pages, great. If they're strategic pages, corrections are needed. Next, check your sitemap: remove any URLs you don’t want to be indexed (it sounds obvious, but many sites just throw everything in).
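As a rough illustration of the log cross-reference, the sketch below extracts the paths requested by Googlebot from access log lines and intersects them with the paths exported from Search Console. It assumes the common "combined" log format and identifies Googlebot by user-agent string only, which can be spoofed. The function names are hypothetical, and the Search Console export is assumed to have been reduced to paths that match the way your logs record them.

```python
import re

# Request line, status, size, referrer, and user agent of a combined-format log entry.
LOG_RE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_paths(log_lines):
    """Return the set of paths requested by a client announcing itself as Googlebot."""
    paths = set()
    for line in log_lines:
        match = LOG_RE.search(line)
        if match and "Googlebot" in match.group("agent"):
            paths.add(match.group("path"))
    return paths


def crawled_but_not_indexed(log_lines, not_indexed_paths):
    """Paths that Googlebot fetched and that also appear in the non-indexed export."""
    return googlebot_paths(log_lines) & set(not_indexed_paths)
```

The resulting set is the list to read closely: each entry is a URL that consumed crawl activity without ending up in the index.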
Then, work on the internal quality and authority of pages you want to index. Strong internal linking, unique and substantial content, clean canonicalization, no accidental noindex. And above all, stop creating URLs like crazy—every additional URL dilutes your crawl budget and authority.
- Export the list of non-indexed discovered URLs from Search Console
- Identify the irrelevant URLs (parameters, filters, pagination) and exclude them properly (robots.txt, noindex, canonical)
- Spot the non-indexed strategic pages and investigate the cause (quality, duplication, weak internal linking)
- Clean the sitemap: submit only the URLs you genuinely want to index
- Enhance internal linking and authority of priority pages
- Monitor the evolution of the volume of non-indexed URLs over several weeks to detect trends
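The sitemap cleanup step in the checklist above can be sketched as follows, assuming a standard XML sitemap (the `sitemaps.org` 0.9 schema) and a set of URLs you have decided not to submit. The function name is hypothetical, and the sketch handles a single `urlset` file, not a sitemap index.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def prune_sitemap(sitemap_xml, unwanted_urls):
    """Return the sitemap XML with every <url> whose <loc> is in `unwanted_urls` removed."""
    # Keep the default namespace unprefixed when serializing the result.
    ET.register_namespace("", SITEMAP_NS)
    root = ET.fromstring(sitemap_xml)
    # Iterate over a copy of the children so removal does not disturb the loop.
    for url_node in list(root.findall(f"{{{SITEMAP_NS}}}url")):
        loc = url_node.findtext(f"{{{SITEMAP_NS}}}loc", default="").strip()
        if loc in unwanted_urls:
            root.remove(url_node)
    return ET.tostring(root, encoding="unicode")
```

Matching is exact on the `<loc>` value, so the unwanted URLs must be written the same way as in the sitemap (scheme, trailing slash, parameters).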
❓ Frequently Asked Questions
Does this reporting change mean that Google indexes fewer pages than before?
Should I force the indexing of discovered but non-indexed URLs through the GSC submission tool?
How do I know whether non-indexed URLs are really a problem for my SEO?
Should I remove these non-indexed URLs from my sitemap?
Can this change impact my SEO traffic in the short term?
🎥 From the same video 17
Other SEO insights extracted from this same Google Search Central video · duration 37 min · published on 12/06/2020