Official statement
Google evaluates each page individually, not the overall volume of content. Having 5,000 articles instead of 500 doesn't mechanically improve your relevance. Removing low-quality content (duplicate news, thin articles) can optimize crawling and perceived quality on very large sites, but the impact remains limited for sites with a few thousand pages.
What you need to understand
Does Google prioritize the individual quality of pages or the total volume?
The statement by Johannes Müller cuts through a recurring debate: Google does not give an intrinsic bonus for content volume. A site with 5,000 pages is not mechanically better ranked than a site with 500 pages if the latter better meets search intent.
This approach relies on page-by-page evaluation: each URL is judged on its own relevance, expertise, and depth. The total number of articles in your CMS does not factor into the ranking of a given page, which means mass-production strategies (publishing just to publish) deliver no ROI if the quality isn't there.
How can removing old content improve crawling and perceived quality?
Müller mentions two possible benefits: crawl budget optimization and improved perceived quality. The crawl budget is the number of pages that Googlebot will explore on your site within a given time frame. If you have 10,000 pages and 7,000 of them are noise (thin archives, duplicate news, outdated pages), you waste crawl on content of no value.
Removing or de-indexing this low-quality content frees up crawl for the pages that really matter. On very large sites (millions of pages), this impact is measurable. On a site with a few thousand pages, the effect is marginal or even negligible — Google crawls this volume without issue.
Perceived quality is more nebulous. Müller suggests that the massive presence of low-quality content can harm the site's overall image in Google's eyes. But this assertion remains vague: no threshold, no precise metric, no quantified example.
What types of content are considered weak or problematic?
Müller explicitly cites duplicate agency news and thin articles. AFP, Reuters, and other wire stories republished as-is are duplicate content that Google already finds on hundreds of other sites; they provide no differentiating value.
Thin articles are content that adds nothing: 150 words without depth, expertise, or a clear answer to an intent. The category also extends to automatically generated pages (empty listings, tag pages without editorial content), unmoderated forum archives, and outdated pages that have never been updated.
- Google evaluates page by page, not based on the overall volume of content.
- Removing low-quality content improves crawl budget and potentially perceived quality, especially for sites with several million pages.
- For medium-sized sites (a few thousand pages), the impact remains limited — no need to panic and remove everything.
- Content to monitor closely: duplicate news, thin articles, outdated pages, empty listings.
- No precise metric is given to define what constitutes a problematic threshold of weak content.
SEO Expert opinion
Is this statement consistent with real-world observations?
Yes and no. The claim that Google evaluates page by page aligns with what we observe: a site with 200 highly-targeted pages can outperform a competitor with 2,000 mediocre pages. No doubt about that.
However, the notion of overall perceived quality remains unclear. Müller does not explain how Google measures this quality, at what ratio of weak pages a site is penalized, or whether this perception actually influences ranking. [To be verified]: we lack numerical data to validate this hypothesis. Some sites with thousands of thin pages continue to perform, while others do not. The pattern is not clear.
Regarding crawl budget, it's factual: very large sites (millions of pages) see a measurable impact after a substantial cleanup. But Müller himself acknowledges that for medium-sized sites the impact is limited. Most sites managed by SEO agencies or in-house teams have between 1,000 and 50,000 pages, not 5 million.
What nuances should be added to this recommendation?
Removing content is never a trivial action. Every URL removed is potential organic traffic lost, orphaned backlinks, and 301 redirects to manage. Before throwing everything out, a detailed audit is necessary: does this page generate traffic, even if low? Does it capture profitable long-tails? Does it have quality backlinks?
Müller does not mention alternatives to removal: improving existing content, merging similar pages, canonicalization, de-indexing via a noindex tag, or blocking crawl via robots.txt. These options are often more relevant than a drastic removal. A thin 200-word article can become solid 1,200-word content with targeted rewriting.
Finally, the notion of a medium site is vague. A few thousand pages can range from 2,000 to 9,000. A site with 9,000 pages and 6,000 zombie pages (zero traffic, zero backlinks) has a strong incentive to clean up, even if the crawl impact remains marginal. The effect on UX performance (server response time, navigation, internal linking) can be real.
In what cases does this rule not apply?
Never remove content that generates qualified traffic, even if low. A page that brings in 10 visits per month on a super-specific long-tail can convert at 50%. The same goes for pages with quality backlinks — even if the content is mediocre, the link juice remains useful for internal linking.
E-commerce sites with seasonal product pages should not remove everything out of season. These pages accumulate authority over the years and remain crawled and indexed. Removing them and then reactivating them in season via a 301 from an archive breaks that continuity.
Practical impact and recommendations
How to identify content to remove or improve?
Start with a data-driven content audit. Export all your indexed URLs via Google Search Console. Cross-reference with Analytics data: organic traffic over 12 months, bounce rate, session duration, conversions. Add crawl metrics (depth, internal links) via Screaming Frog or Oncrawl.
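As a starting point, here is a minimal audit sketch in Python with pandas. The file names and column labels ("url", "sessions", "conversions", "depth", "inlinks") are assumptions; adapt them to the exact exports your tools produce.

```python
# Minimal content-audit sketch: merge a Search Console URL export with
# Analytics and crawl data. File names and columns are assumptions.
import pandas as pd

# Indexed URLs exported from Google Search Console (assumed column: "url").
gsc = pd.read_csv("gsc_indexed_urls.csv")

# 12 months of organic data exported from Analytics
# (assumed columns: "url", "sessions", "conversions").
ga = pd.read_csv("analytics_organic_12m.csv")

# Crawl metrics exported from Screaming Frog or Oncrawl
# (assumed columns: "url", "depth", "inlinks").
crawl = pd.read_csv("crawl_export.csv")

audit = (
    gsc.merge(ga, on="url", how="left")
       .merge(crawl, on="url", how="left")
       .fillna({"sessions": 0, "conversions": 0, "inlinks": 0})
)
audit.to_csv("content_audit.csv", index=False)
```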
Identify zombie pages: zero organic traffic over 12 months, zero external backlinks, low number of internal links. Then, segment by type: blog articles, product sheets, category pages, landing pages. A zombie blog article doesn’t have the same strategic value as a zombie product sheet.
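Building on the merged audit file, a possible zombie filter and segmentation might look like this. The thresholds and the path-based page typing are assumptions to adjust to your own site and URL structure.

```python
# Flag zombie pages and segment them by template.
import pandas as pd

audit = pd.read_csv("content_audit.csv")            # produced by the sketch above
backlinks = pd.read_csv("backlinks_by_url.csv")     # assumed columns: "url", "ref_domains"
audit = audit.merge(backlinks, on="url", how="left").fillna({"ref_domains": 0})

audit["is_zombie"] = (
    (audit["sessions"] == 0)        # no organic traffic over 12 months
    & (audit["ref_domains"] == 0)   # no external backlinks
    & (audit["inlinks"] <= 2)       # weak internal linking
)

def page_type(url: str) -> str:
    # Naive typing based on URL patterns -- replace with your own routing rules.
    if "/blog/" in url:
        return "blog"
    if "/product/" in url:
        return "product"
    if "/category/" in url:
        return "category"
    return "other"

audit["type"] = audit["url"].apply(page_type)
print(audit[audit["is_zombie"]].groupby("type").size())
```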
What strategy to apply based on content volume and type?
For sites with a few thousand pages, prioritize improvement and merging over removal. Identify clusters of similar content: five articles on the same topic can become one pillar article. 301-redirect the old URLs to the new, enriched version.
For very large sites (> 100,000 pages), massive removal may be justified. But beware of the redirection plan: each removed URL must either be redirected to equivalent or superior content or return a clean 410 (Gone) if no equivalent exists. Mass 404s degrade user experience and waste crawl.
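That logic can be scripted into a removal plan. The sketch below is purely illustrative: the example URLs are hypothetical, and the output CSV simply records whether each removed URL gets a 301 target or a 410.

```python
# Hypothetical removal plan: each URL slated for removal either gets a 301
# target (an equivalent or superior page) or is marked 410 Gone.
import csv

equivalents = {
    "/blog/old-guide-2017/": "/blog/complete-guide/",  # merged into a pillar page
    "/news/wire-dispatch-123/": None,                  # no equivalent -> 410
}

with open("redirect_plan.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["source", "action", "target"])
    for source, target in equivalents.items():
        if target:
            writer.writerow([source, "301", target])
        else:
            writer.writerow([source, "410", ""])
```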
Duplicate news: de-index them via noindex, do not necessarily remove them if they have internal editorial value. Outdated pages: update them if the subject remains relevant, remove them if the topic is dead. Thin pages: enrich them if they capture intent, remove them if they add nothing.
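These rules translate into a simple decision helper. The categories and signals below are assumptions; wire them to the fields of your own audit file.

```python
# Decision helper mirroring the rules above -- categories and signals are
# assumptions, not an official Google classification.
def recommended_action(category: str, still_relevant: bool, captures_intent: bool) -> str:
    if category == "duplicate_news":
        return "noindex"   # keep the page if it has internal editorial value
    if category == "outdated":
        return "update" if still_relevant else "remove"
    if category == "thin":
        return "enrich" if captures_intent else "remove"
    return "keep"

print(recommended_action("thin", still_relevant=True, captures_intent=False))  # -> "remove"
```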
What mistakes to avoid during a content cleanup?
Never remove without a complete backup of your database and redirection file. A rollback must be possible in less than 24 hours if you notice a drastic drop in traffic post-cleanup.
Avoid redirection chains: if you merge A and B to C, then C to D, you create a chain A > C > D. Google follows up to 5 hops, but this results in a loss of PageRank and crawl. Always redirect A > D and B > D directly.
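A quick way to catch chains is to follow each hop yourself. The sketch below uses the requests library; the URLs are placeholders, and a production version would also need rate limiting and error handling.

```python
# Follow redirect hops manually and report chains longer than one hop.
from urllib.parse import urljoin
import requests

def redirect_hops(url: str, max_hops: int = 10) -> list[str]:
    hops = []
    current = url
    for _ in range(max_hops):
        resp = requests.get(current, allow_redirects=False, timeout=10)
        if resp.status_code not in (301, 302, 307, 308):
            break
        current = urljoin(current, resp.headers["Location"])  # handle relative Location
        hops.append(current)
    return hops

for old_url in ["https://example.com/a", "https://example.com/b"]:  # placeholder URLs
    chain = redirect_hops(old_url)
    if len(chain) > 1:
        print(f"{old_url} chains through {len(chain)} hops -> point it straight at {chain[-1]}")
```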
Do not rely solely on Analytics data. Some pages generate dark traffic (apps, newsletters, social networks) that is invisible in GA. Check server logs before deleting. The same goes for pages with quality backlinks: Ahrefs, Majestic, or SEMrush can reveal links that Google Search Console does not display.
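Checking server logs before deleting can be as simple as counting recent hits on each candidate URL. The log path, regex, and example paths below are assumptions (a standard combined-format access log is assumed).

```python
# Count recent hits on removal candidates in a combined-format access log.
import re
from collections import Counter

candidates = {"/blog/old-guide-2017/", "/news/wire-dispatch-123/"}  # hypothetical paths
hits = Counter()

request_re = re.compile(r'"(?:GET|POST) (\S+) HTTP')

with open("access.log") as log:
    for line in log:
        match = request_re.search(line)
        if match and match.group(1) in candidates:
            hits[match.group(1)] += 1

for path in candidates:
    print(f"{path}: {hits[path]} hits")  # any non-zero count deserves a second look
```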
- Export all indexed URLs (Search Console, sitemap, crawl) and cross-reference with traffic, backlinks, and conversion data.
- Identify zombie pages (zero traffic for 12 months, zero backlinks, low internal linking) and segment by type.
- Favor merging and improving for sites < 50,000 pages, consider massive removal for sites > 100,000 pages.
- Establish a rigorous 301 redirection plan, avoid chains, track 404s post-cleanup.
- De-index (noindex) rather than remove if the content has internal editorial value or quality backlinks.
- Monitor traffic and positions weekly for 3 months post-cleanup to detect any side effects (see the sketch below).
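For that weekly monitoring, a minimal sketch could compare two Search Console performance exports page by page. File names and column labels are assumptions; adapt them to your exports.

```python
# Compare clicks per page between two weekly Search Console exports.
import pandas as pd

before = pd.read_csv("gsc_week_before.csv")   # assumed columns: "page", "clicks"
after = pd.read_csv("gsc_week_after.csv")

delta = (
    before.merge(after, on="page", suffixes=("_before", "_after"), how="outer")
          .fillna(0)
)
delta["clicks_diff"] = delta["clicks_after"] - delta["clicks_before"]
print(delta.sort_values("clicks_diff").head(20))   # biggest losers first
```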
❓ Frequently Asked Questions
Does removing old content directly improve the ranking of your other pages?
At how many pages does crawl budget become a problem?
Should you remove blog articles that generate little traffic?
Do duplicate agency news stories hurt SEO?
How do you measure the impact of a content cleanup?