Official statement
Other statements from this video (17)
- 1:06 Why does Google suddenly show more non-indexed URLs in Search Console?
- 3:11 Why does Google only crawl a fraction of your known pages?
- 5:17 Core Web Vitals: Why don't your lab tests affect your ranking?
- 9:30 Is user-generated content really an SEO liability for your site?
- 12:05 Does the source of content affect the crawl budget?
- 13:08 Does Googlebot send an HTTP referrer when crawling your site?
- 14:09 Does image quality really affect rankings in Google’s web search?
- 18:15 How does Google really assess the importance of your pages through internal linking?
- 20:19 Is it true that a well-ranked website can lose its relevance without making any mistakes?
- 21:53 Are Core Web Vitals truly a ranking factor or just smoke and mirrors?
- 22:57 Does Discover really work without strict technical criteria?
- 25:02 Can removing pages from a sitemap actually limit their crawling by Google?
- 27:08 Should you really use unavailable_after to manage temporary content?
- 30:11 Does structured data really influence rankings on Google?
- 31:45 Why does Google sometimes index your AMP pages before their canonical HTML version?
- 33:52 Are Core Web Vitals truly crucial for Google ranking?
- 35:51 Does Google really see the content loaded dynamically after a user clicks?
Mueller states that a general sitemap should include all your pages, whereas a news sitemap is capped at 1,000 URLs for Google News (though any website can use one). Declaring all of your URLs enables comprehensive reporting in Search Console and makes indexing easier to track. In short, if you are intentionally excluding certain pages, this directive calls for rethinking your sitemap strategy.
What you need to understand
Why does Google insist on including all pages in a general sitemap?
Mueller's directive is based on a simple logic: a general sitemap serves as a comprehensive reference for Google. By including all your URLs, you explicitly communicate to the search engine the entirety of your indexable structure.
This principle addresses two needs on Google's side: facilitating the discovery of deep pages and establishing a basis for comparison between what you declare and what is actually crawled. If your sitemap only lists 60% of your pages, Search Console can't alert you about the remaining 40% — you lose visibility on indexing issues.
What is the actual difference between a general sitemap and a news sitemap?
The news sitemap imposes a strict limit of 1,000 URLs and specifically targets inclusion in Google News. It uses a distinct XML schema with dedicated tags (publication, publication_date, title; the former keywords tag is deprecated) and should only contain recent editorial content, meaning articles published within the last 48 hours per Google's guidelines.
A general sitemap, on the other hand, has no limit on total volume, although the sitemap protocol caps each file at 50,000 URLs and 50 MB uncompressed, so larger sites must split across several files. It accepts all types of pages: products, categories, static pages, old articles. Mueller clarifies that a news sitemap can be used by any site, even one not eligible for Google News, but this practice offers no advantage if you are not aiming for news inclusion.
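To make the difference concrete, here is a minimal sketch of the two entry formats. The example.com URLs are placeholders, and a real news file must also declare the news namespace on its root element:

```python
# Minimal sketch of a standard sitemap entry vs. a Google News entry.
# example.com URLs are placeholders; a real news file must declare
# xmlns:news="http://www.google.com/schemas/sitemap-news/0.9" on <urlset>.
from datetime import datetime, timezone

now = datetime.now(timezone.utc).isoformat(timespec="seconds")

standard_entry = f"""<url>
  <loc>https://example.com/products/blue-widget</loc>
  <lastmod>{now}</lastmod>
</url>"""

news_entry = f"""<url>
  <loc>https://example.com/news/article-slug</loc>
  <news:news>
    <news:publication>
      <news:name>Example News</news:name>
      <news:language>en</news:language>
    </news:publication>
    <news:publication_date>{now}</news:publication_date>
    <news:title>Article headline</news:title>
  </news:news>
</url>"""

print(standard_entry)
print(news_entry)
```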
What truly benefits from comprehensive reporting in Search Console?
Search Console cross-references your sitemaps with crawl and indexing data to generate contextual alerts. If a URL declared in your sitemap returns a 404, you receive a notification. If 300 listed pages are not indexed, the Coverage report will inform you of this with the reason (noindex, canonicalized, blocked by robots.txt).
Without exhaustive declaration, these metrics become partial and distort your perception of the site's SEO health. You might believe that 95% of your important pages are indexed when in reality only those in the sitemap are — and that 2000 pages outside the sitemap are orphaned or blocked without your knowledge.
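If you want to monitor this cross-referencing programmatically rather than in the UI, the Search Console API exposes the same per-sitemap data. A rough sketch with google-api-python-client, assuming OAuth credentials are already configured and that https://example.com/ stands in for one of your verified properties:

```python
# Sketch: pull per-sitemap status from the Search Console API
# (pip install google-api-python-client). Assumes OAuth credentials are
# already configured and https://example.com/ is a verified property.
from googleapiclient.discovery import build

SITE_URL = "https://example.com/"  # placeholder property

def sitemap_report(credentials) -> None:
    service = build("searchconsole", "v1", credentials=credentials)
    response = service.sitemaps().list(siteUrl=SITE_URL).execute()
    for sm in response.get("sitemap", []):
        # errors/warnings are the same counters shown in the UI report
        print(sm["path"], "errors:", sm.get("errors"), "warnings:", sm.get("warnings"))
        for content in sm.get("contents", []):
            print("  type:", content.get("type"), "submitted:", content.get("submitted"))
```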
- Declare all your indexable URLs in the general sitemap, even those accessible via internal linking
- Reserve the news sitemap for the last 1000 articles if you aim for Google News — otherwise, it is pointless
- Segment your sitemaps beyond 50,000 URLs per file to respect the protocol's hard limit
- Monitor the Coverage report in Search Console to identify declared URLs that are not indexed
- Do not intentionally exclude sections from your general sitemap on the pretext that they are well-linked — you lose reporting
SEO Expert opinion
Is this directive consistent with observed practices on the ground?
Yes, overall. Websites that comprehensively declare their URLs in the sitemaps indeed achieve finer visibility in Search Console. Alerts on 404 errors, redirections, and canonicalization issues are more precise and actionable.
But there are exceptions: some large-scale sites (500,000+ pages) intentionally segment their sitemaps to prioritize crawling of strategic sections. They include 100% of product listings and categories but exclude blog archives or parameterized filters. This approach contradicts Mueller's directive, yet it rests on crawl-budget logic, a concept Google regularly downplays in public. [To be verified]: the actual impact of this segmentation on the ranking of excluded pages remains difficult to measure.
Is the news sitemap truly useful outside the context of Google News?
Let's be honest: no. If you are not eligible for Google News (no validated publisher status, no recent news feed), creating a news sitemap does not speed up the indexing of your articles compared to a standard general sitemap.
Some SEOs imagine that the news sitemap triggers a priority crawl even outside Google News. No official data confirms this — and field tests show contradictory results. If your goal is to quickly index your fresh content, focus instead on the IndexNow API or on a general sitemap with a properly populated lastmod tag. The news sitemap, in this case, is a gimmick.
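For reference, an IndexNow submission is a single HTTP call. A minimal sketch following the public protocol documented at indexnow.org, where the host, key, and URL list are placeholders (note that Bing, Yandex, and others consume IndexNow; Google has not adopted the protocol):

```python
# Minimal IndexNow submission per the public protocol (indexnow.org).
# host, key, and URLs are placeholders; the key must match a text file
# you host at https://<host>/<key>.txt.
import json
import urllib.request

def ping_indexnow(host: str, key: str, urls: list[str]) -> int:
    payload = {
        "host": host,
        "key": key,
        "keyLocation": f"https://{host}/{key}.txt",
        "urlList": urls,
    }
    req = urllib.request.Request(
        "https://api.indexnow.org/indexnow",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
    )
    with urllib.request.urlopen(req) as resp:  # raises on HTTP errors
        return resp.status  # 200 or 202 means the submission was accepted

# ping_indexnow("example.com", "your-indexnow-key",
#               ["https://example.com/fresh-article"])
```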
Should you really include pages that you do not want to index?
No. Mueller refers to "all your pages" in the sense of all those you wish to see indexed. If a URL has a noindex tag or a canonical pointing to another page, do not include it in the general sitemap. It is a source of confusion for Google and generates unnecessary alerts in Search Console.
The problem: many sites include by default every URL their CMS generates, including parameterized variants, noindex pagination pages, and view-all filter pages. The result: polluted sitemaps that dilute the signal and complicate reporting. Regularly clean your sitemaps to keep only indexable, canonical URLs.
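That cleaning pass can be scripted. The sketch below (assuming the requests and beautifulsoup4 packages) fetches each URL declared in a sitemap and flags entries that should not be there; treat it as a starting point, not a production auditor:

```python
# Sketch of a cleaning pass (pip install requests beautifulsoup4):
# fetch each URL declared in a sitemap and flag problem entries.
import xml.etree.ElementTree as ET
import requests
from bs4 import BeautifulSoup

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def audit_sitemap(sitemap_url: str) -> None:
    root = ET.fromstring(requests.get(sitemap_url, timeout=10).content)
    for loc in root.findall(".//sm:loc", NS):
        url = loc.text.strip()
        resp = requests.get(url, timeout=10)
        if resp.status_code != 200:
            print(f"status {resp.status_code}: {url}")
            continue
        soup = BeautifulSoup(resp.text, "html.parser")
        robots = soup.find("meta", attrs={"name": "robots"})
        if robots and "noindex" in robots.get("content", "").lower():
            print(f"noindex in sitemap: {url}")
        canonical = soup.find("link", rel="canonical")
        # relative canonical hrefs would need urljoin() resolution; omitted here
        if canonical and canonical.get("href") not in (None, url):
            print(f"canonicalized elsewhere: {url} -> {canonical['href']}")
```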
Practical impact and recommendations
How can you check if your general sitemap covers all your indexable pages?
Start by crawling your site with Screaming Frog or Oncrawl in exhaustive mode. Export the list of indexable URLs (status 200, no noindex, canonical to themselves). Compare this list with the URLs declared in your XML sitemaps.
If you see a discrepancy of more than 5%, there are two scenarios: either your sitemap is incomplete (orphaned URLs not declared), or your crawl has discovered pages you do not want to index. In the latter case, clean your structure or add noindex — do not let them linger outside the sitemap.
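The comparison itself is easy to script once you have the crawl export. A sketch assuming a CSV with an "Address" column (Screaming Frog's default; adjust the column name for your tool):

```python
# Sketch of the sitemap-vs-crawl comparison. Assumes a CSV export of
# indexable URLs with an "Address" column; sitemap URL is a placeholder.
import csv
import xml.etree.ElementTree as ET
import requests

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_urls(sitemap_url: str) -> set[str]:
    root = ET.fromstring(requests.get(sitemap_url, timeout=10).content)
    return {loc.text.strip() for loc in root.findall(".//sm:loc", NS)}

def crawl_urls(csv_path: str) -> set[str]:
    with open(csv_path, newline="", encoding="utf-8") as f:
        return {row["Address"] for row in csv.DictReader(f)}

declared = sitemap_urls("https://example.com/sitemap.xml")
crawled = crawl_urls("indexable_urls.csv")

print("crawled but not declared (sitemap gaps):", len(crawled - declared))
print("declared but not crawled (check structure):", len(declared - crawled))
```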
What to do if you exceed the limit of 50,000 URLs per sitemap file?
Create a sitemap index that references several segmented sitemap files. For example: sitemap_products_1.xml, sitemap_products_2.xml, sitemap_blog.xml, sitemap_categories.xml. Each file stays under 50,000 URLs, and the sitemap index centralizes them.
Do not segment based on arbitrary criteria like publication date or strategic importance — this complicates maintenance. Prefer segmentation by content type or site section: it’s easier to audit and automatically update through your CMS.
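A generation sketch along those lines, segmenting by content type and chunking at the protocol's 50,000-URL cap (the file names, example.com domain, and sections dict are placeholders to adapt to your CMS):

```python
# Sketch: generate sitemaps segmented by content type plus an index.
from datetime import date
from xml.sax.saxutils import escape  # real URLs must be XML-escaped

CHUNK = 50_000  # the protocol's per-file URL cap

def write_sitemap(path: str, urls: list[str]) -> None:
    with open(path, "w", encoding="utf-8") as f:
        f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        f.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
        for url in urls:
            f.write(f"  <url><loc>{escape(url)}</loc></url>\n")
        f.write("</urlset>\n")

def build_index(sections: dict[str, list[str]]) -> None:
    files = []
    for name, urls in sections.items():
        for i in range(0, len(urls), CHUNK):
            fname = f"sitemap_{name}_{i // CHUNK + 1}.xml"
            write_sitemap(fname, urls[i:i + CHUNK])
            files.append(fname)
    with open("sitemap_index.xml", "w", encoding="utf-8") as f:
        f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        f.write('<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
        for fname in files:
            f.write(f"  <sitemap><loc>https://example.com/{fname}</loc>"
                    f"<lastmod>{date.today().isoformat()}</lastmod></sitemap>\n")
        f.write("</sitemapindex>\n")

# build_index({"products": product_urls, "blog": blog_urls, "categories": category_urls})
```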
What errors should you avoid when generating sitemaps?
The most common mistake: including HTTP URLs when the site runs on HTTPS, or declaring URLs with tracking parameters (utm_source, etc.). Google may index these variants, which dilutes authority and creates duplicate content.
Another classic pitfall: forgetting to update the lastmod tag when you modify a page. If this tag remains fixed at the creation date while you regularly republish, Google may ignore your updates or crawl them less frequently. Automate this tag via your CMS to reflect the true last modified date.
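A minimal sketch of that automation, assuming a hypothetical updated_at field on the CMS record (an aware datetime) and emitting the W3C datetime format the sitemap protocol expects:

```python
# Sketch: derive lastmod from the record's real modification timestamp.
# updated_at is a hypothetical CMS field, assumed timezone-aware.
from datetime import datetime, timezone

def lastmod_for(updated_at: datetime) -> str:
    return updated_at.astimezone(timezone.utc).isoformat(timespec="seconds")

# emits values like <lastmod>2024-05-01T09:30:00+00:00</lastmod>
print(f"<lastmod>{lastmod_for(datetime.now(timezone.utc))}</lastmod>")
```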
- Audit your sitemaps quarterly to eliminate URLs that return 404, carry noindex, or redirect
- Create a sitemap index if you exceed 50,000 URLs, segmented by content type
- Automate sitemap generation through your CMS to ensure their freshness
- Check in Search Console that the coverage rate (submitted vs indexed URLs) exceeds 85%
- Exclude parameterized URLs, HTTP/HTTPS variants, and noindex pages
- Properly populate the lastmod tag to signal important updates
❓ Frequently Asked Questions
Should I create a news sitemap if my site is not in Google News?
What happens if I include noindex URLs in my general sitemap?
How many sitemaps can I submit in Search Console?
Does the lastmod tag really influence crawl frequency?
Should URLs canonicalized to other pages be included in the sitemap?
🎥 From the same video: 17 other SEO insights extracted from this Google Search Central video (duration 37 min, published 12/06/2020)
🎥 Watch the full video on YouTube →