Official statement
Mueller states that general sitemaps should include all your pages, whereas news sitemaps are limited to 1000 URLs for Google News but can be utilized by any website. Including all of your URLs allows for comprehensive reporting in Search Console and facilitates tracking of indexing. Essentially, this directive requires a reevaluation of your sitemap strategy if you are intentionally excluding certain pages.
What you need to understand
Why does Google insist on including all pages in a general sitemap?
Mueller's directive is based on a simple logic: a general sitemap serves as a comprehensive reference for Google. By including all your URLs, you explicitly communicate to the search engine the entirety of your indexable structure.
This principle addresses two needs on Google's side: facilitating the discovery of deep pages and establishing a basis for comparison between what you declare and what is actually crawled. If your sitemap only lists 60% of your pages, Search Console can't alert you about the remaining 40% — you lose visibility on indexing issues.
What is the actual difference between a general sitemap and a news sitemap?
The news sitemap imposes a strict limit of 1000 articles and specifically targets inclusion in Google News. It uses a distinct XML schema with dedicated tags (publication_date, title, keywords) and should only contain recent editorial content — generally published within 48 to 72 hours.
A general sitemap, on the other hand, accepts all types of pages: products, categories, static pages, old articles. Its only volume constraint comes from the sitemap protocol itself: a hard limit of 50,000 URLs (and 50 MB uncompressed) per file, beyond which you must split into several files tied together by a sitemap index. Mueller clarifies that a news sitemap can be used by any site, even one not eligible for Google News, but this practice offers no advantage if you are not aiming for news inclusion.
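For reference, here is what distinguishes the two formats at the XML level — a minimal sketch (domain, publication name, and dates are placeholders, not taken from the video):

```xml
<!-- General sitemap entry: any indexable page, standard sitemap.org schema -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/products/widget</loc>
    <lastmod>2020-06-10</lastmod>
  </url>
</urlset>

<!-- News sitemap entry: dedicated news: namespace, recent articles only -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">
  <url>
    <loc>https://example.com/news/breaking-story</loc>
    <news:news>
      <news:publication>
        <news:name>Example News</news:name>
        <news:language>en</news:language>
      </news:publication>
      <news:publication_date>2020-06-10T08:00:00+00:00</news:publication_date>
      <news:title>Breaking story headline</news:title>
    </news:news>
  </url>
</urlset>
```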
What truly benefits from comprehensive reporting in Search Console?
Search Console cross-references your sitemaps with crawl and indexing data to generate contextual alerts. If a URL declared in your sitemap returns a 404, you receive a notification. If 300 listed pages are not indexed, the Coverage report will inform you of this with the reason (noindex, canonicalized, blocked by robots.txt).
Without exhaustive declaration, these metrics become partial and distort your perception of the site's SEO health. You might believe that 95% of your important pages are indexed when in reality only those in the sitemap are — and that 2000 pages outside the sitemap are orphaned or blocked without your knowledge.
- Declare all your indexable URLs in the general sitemap, even those accessible via internal linking
- Reserve the news sitemap for the last 1000 articles if you aim for Google News — otherwise, it is pointless
- Segment your sitemaps beyond 50,000 URLs per file to respect technical limits
- Monitor the Coverage report in Search Console to identify declared URLs that are not indexed
- Do not intentionally exclude sections from your general sitemap on the pretext that they are well-linked — you lose reporting
SEO Expert opinion
Is this directive consistent with observed practices on the ground?
Yes, overall. Websites that comprehensively declare their URLs in the sitemaps indeed achieve finer visibility in Search Console. Alerts on 404 errors, redirections, and canonicalization issues are more precise and actionable.
But there are exceptions: some large-scale sites (500,000+ pages) intentionally segment their sitemaps to prioritize crawling of strategic sections. They include 100% of product listings and categories but exclude blog archives or parameterized filters. This approach contradicts Mueller's directive, but it rests on a crawl-budget logic — a concept Google regularly downplays in public. [To be verified]: the actual impact of this segmentation on the ranking of the excluded pages remains difficult to measure.
Is the news sitemap truly useful outside the context of Google News?
Let's be honest: no. If you are not eligible for Google News (no validated publisher status, no recent news feed), creating a news sitemap does not speed up the indexing of your articles compared to a standard general sitemap.
Some SEOs imagine that the news sitemap triggers a priority crawl even outside Google News. No official data confirms this — and field tests show contradictory results. If your goal is to quickly index your fresh content, focus instead on the IndexNow API or on a general sitemap with a properly populated lastmod tag. The news sitemap, in this case, is a gimmick.
Should you really include pages that you do not want to index?
No. Mueller refers to "all your pages" in the sense of all those you wish to see indexed. If a URL has a noindex tag or a canonical pointing to another page, do not include it in the general sitemap. It is a source of confusion for Google and generates unnecessary alerts in Search Console.
The problem: many sites include by default every URL their CMS generates, including parameterized variants, noindex pagination pages, and faceted-filter pages. The result: polluted sitemaps that dilute the signal and complicate reporting. Regularly clean your sitemaps to keep only indexable canonical URLs.
Practical impact and recommendations
How can you check if your general sitemap covers all your indexable pages?
Start by crawling your site with Screaming Frog or Oncrawl in exhaustive mode. Export the list of indexable URLs (status 200, no noindex, self-referencing canonical). Compare this list with the URLs declared in your XML sitemaps.
If you see a discrepancy of more than 5%, there are two scenarios: either your sitemap is incomplete (orphaned URLs not declared), or your crawl has discovered pages you do not want to index. In the latter case, clean your structure or add noindex — do not let them linger outside the sitemap.
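The comparison itself is a simple set difference. A minimal sketch, assuming the crawl export is one URL per line and the sitemap is a standard urlset file (file names and URLs are placeholders):

```python
# Sketch: compare an exhaustive crawl export with the URLs declared in
# XML sitemaps. File names and example URLs are placeholders.
import xml.etree.ElementTree as ET

SM_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_urls(path):
    """Extract every <loc> value from a sitemap (urlset) file."""
    tree = ET.parse(path)
    return {loc.text.strip() for loc in tree.iter(SM_NS + "loc")}

def crawl_urls(path):
    """One indexable URL per line, e.g. exported from Screaming Frog."""
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}

def coverage_gap(crawled, declared):
    """Return (orphans missing from sitemap, declared but not crawled)."""
    return crawled - declared, declared - crawled

# Example with in-memory sets instead of files:
crawled = {"https://example.com/", "https://example.com/a", "https://example.com/b"}
declared = {"https://example.com/", "https://example.com/a", "https://example.com/old"}
missing_from_sitemap, stale_in_sitemap = coverage_gap(crawled, declared)
print(missing_from_sitemap)  # {'https://example.com/b'}
print(stale_in_sitemap)      # {'https://example.com/old'}
```

The first set points at orphaned or undeclared pages; the second at stale sitemap entries to clean up or noindex.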
What to do if you exceed the limit of 50,000 URLs per sitemap file?
Create a sitemap index that references several segmented sitemap files. For example: sitemap_products_1.xml, sitemap_products_2.xml, sitemap_blog.xml, sitemap_categories.xml. Each file stays under 50,000 URLs, and the sitemap index centralizes them.
Do not segment based on arbitrary criteria like publication date or strategic importance — this complicates maintenance. Prefer segmentation by content type or site section: it’s easier to audit and automatically update through your CMS.
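Under that segmentation, the sitemap index tying the files together would look like this (file names follow the example above; the domain is a placeholder):

```xml
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap_products_1.xml</loc>
    <lastmod>2020-06-12</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap_products_2.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap_blog.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap_categories.xml</loc>
  </sitemap>
</sitemapindex>
```

You submit only the index file in Search Console; Google follows it to each segment.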
What errors should you avoid when generating sitemaps?
The most common mistake: including URLs in HTTP while the site is in HTTPS, or declaring URLs with tracking parameters (utm_source, etc.). Google may index these variants, but it dilutes authority and creates duplication.
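This cleanup is easy to automate before writing the sitemap. A minimal sketch — the tracking-parameter list is an assumption, adapt it to your analytics setup:

```python
# Sketch: normalize candidate sitemap URLs — force HTTPS, drop tracking
# parameters and fragments. TRACKING_PARAMS is an assumed list.
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "utm_term",
                   "utm_content", "gclid", "fbclid"}

def normalize(url):
    parts = urlsplit(url)
    # Keep only non-tracking query parameters, in their original order.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k not in TRACKING_PARAMS]
    return urlunsplit(("https", parts.netloc, parts.path,
                       urlencode(query), ""))  # "" drops the #fragment

print(normalize("http://example.com/page?utm_source=news&id=7#top"))
# https://example.com/page?id=7
```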
Another classic pitfall: forgetting to update the lastmod tag when you modify a page. If this tag remains fixed at the creation date while you regularly republish, Google may ignore your updates or crawl them less frequently. Automate this tag via your CMS to reflect the true last modified date.
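One way to automate this, sketched here under the assumption that a file's mtime (or, in a CMS, an updated_at column) is your source of truth for the last real modification:

```python
# Sketch: derive lastmod from the actual last-modified timestamp instead of
# a fixed creation date. Assumes file mtime reflects the last real edit.
import os
from datetime import datetime, timezone

def lastmod_for(path):
    """W3C datetime string (the format the sitemap protocol expects)."""
    ts = os.path.getmtime(path)
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime(
        "%Y-%m-%dT%H:%M:%S+00:00")
```

Emit this value into each <lastmod> tag at sitemap generation time, so a republished page immediately signals its update.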
- Audit your sitemaps quarterly to eliminate URLs in 404, noindex, or redirected
- Create a sitemap index if you exceed 50,000 URLs, segmented by content type
- Automate sitemap generation through your CMS to ensure their freshness
- Check in Search Console that the coverage rate (submitted vs indexed URLs) exceeds 85%
- Exclude parameterized URLs, HTTP/HTTPS variants, and noindex pages
- Properly populate the lastmod tag to signal important updates
Source: Google Search Central video · duration 37 min · published 12/06/2020