What does Google say about SEO?

Official statement

General sitemaps must include all your pages. News sitemaps are limited to the last 1000 articles and are specifically meant for Google News, but can be used by any website. Including all URLs in a sitemap allows for comprehensive reporting in Search Console.
🎥 Source video

Extracted from a Google Search Central video

⏱ 37:34 💬 EN 📅 12/06/2020 ✂ 18 statements
Watch on YouTube (11:03) →
TL;DR

Mueller states that general sitemaps should include all your pages, whereas news sitemaps are limited to 1000 URLs for Google News but can be utilized by any website. Including all of your URLs allows for comprehensive reporting in Search Console and facilitates tracking of indexing. Essentially, this directive requires a reevaluation of your sitemap strategy if you are intentionally excluding certain pages.

What you need to understand

Why does Google insist on including all pages in a general sitemap?

Mueller's directive is based on a simple logic: a general sitemap serves as a comprehensive reference for Google. By including all your URLs, you explicitly communicate to the search engine the entirety of your indexable structure.

This principle addresses two needs on Google's side: facilitating the discovery of deep pages and establishing a basis for comparison between what you declare and what is actually crawled. If your sitemap only lists 60% of your pages, Search Console can't alert you about the remaining 40% — you lose visibility on indexing issues.

What is the actual difference between a general sitemap and a news sitemap?

The news sitemap imposes a strict limit of 1,000 articles and specifically targets inclusion in Google News. It uses a distinct XML schema with dedicated tags (news:publication, news:publication_date, news:title) and should only contain recent editorial content, generally published within the last two days.

A general sitemap, on the other hand, has no overall volume limit, although the protocol caps each file at 50,000 URLs (and 50 MB uncompressed), beyond which you must segment. It accepts all types of pages: products, categories, static pages, old articles. Mueller clarifies that a news sitemap can be used by any site, even one not eligible for Google News, but this practice offers no advantage if you are not aiming for news inclusion.
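The two schemas are easiest to compare side by side. The sketch below builds one entry of each kind with Python's standard library; the URL, publication name, and dates are placeholder values, and the namespaces are those of the public sitemap and Google News schemas:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
NEWS_NS = "http://www.google.com/schemas/sitemap-news/0.9"
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("news", NEWS_NS)

def general_url_entry(loc, lastmod=None):
    """Build a standard <url> entry for a general sitemap."""
    url = ET.Element(f"{{{SITEMAP_NS}}}url")
    ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
    if lastmod:
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    return url

def news_url_entry(loc, publication_name, language, title, publication_date):
    """Build a <url> entry carrying the dedicated news tags."""
    url = general_url_entry(loc)
    news = ET.SubElement(url, f"{{{NEWS_NS}}}news")
    pub = ET.SubElement(news, f"{{{NEWS_NS}}}publication")
    ET.SubElement(pub, f"{{{NEWS_NS}}}name").text = publication_name
    ET.SubElement(pub, f"{{{NEWS_NS}}}language").text = language
    ET.SubElement(news, f"{{{NEWS_NS}}}publication_date").text = publication_date
    ET.SubElement(news, f"{{{NEWS_NS}}}title").text = title
    return url

# Placeholder article: the extra news:* block is the only structural difference.
entry = news_url_entry(
    "https://example.com/articles/launch",
    "Example News", "en", "Product launch", "2020-12-06T11:03:00+00:00",
)
print(ET.tostring(entry, encoding="unicode"))
```

Everything outside the `news:` namespace is a plain general-sitemap entry, which is why any site can technically serve a news sitemap, it just gains nothing from the extra tags.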

What truly benefits from comprehensive reporting in Search Console?

Search Console cross-references your sitemaps with crawl and indexing data to generate contextual alerts. If a URL declared in your sitemap returns a 404, you receive a notification. If 300 listed pages are not indexed, the Coverage report will inform you of this with the reason (noindex, canonicalized, blocked by robots.txt).

Without exhaustive declaration, these metrics become partial and distort your perception of the site's SEO health. You might believe that 95% of your important pages are indexed when in reality only those in the sitemap are — and that 2000 pages outside the sitemap are orphaned or blocked without your knowledge.

  • Declare all your indexable URLs in the general sitemap, even those accessible via internal linking
  • Reserve the news sitemap for the last 1000 articles if you aim for Google News — otherwise, it is pointless
  • Segment your sitemaps beyond 50,000 URLs per file to respect technical limits
  • Monitor the Coverage report in Search Console to identify declared URLs that are not indexed
  • Do not intentionally exclude sections from your general sitemap on the pretext that they are well-linked — you lose reporting

SEO Expert opinion

Is this directive consistent with observed practices on the ground?

Yes, overall. Websites that comprehensively declare their URLs in the sitemaps indeed achieve finer visibility in Search Console. Alerts on 404 errors, redirections, and canonicalization issues are more precise and actionable.

But there are exceptions: some large-scale sites (500,000+ pages) intentionally segment their sitemaps to prioritize crawling of strategic sections. They include 100% of product listings and categories, but exclude blog archives or parameterized filters. This approach contradicts Mueller's directive but is based on a crawl budget logic — a concept that Google regularly downplays in public. [To be verified]: the actual impact of this segmentation on the ranking of excluded pages remains difficult to measure.

Is the news sitemap truly useful outside the context of Google News?

Let's be honest: no. If you are not eligible for Google News (no validated publisher status, no recent news feed), creating a news sitemap does not speed up the indexing of your articles compared to a standard general sitemap.

Some SEOs imagine that the news sitemap triggers a priority crawl even outside Google News. No official data confirms this — and field tests show contradictory results. If your goal is to quickly index your fresh content, focus instead on the IndexNow API (supported by Bing and Yandex, not by Google) or on a general sitemap with a properly populated lastmod tag. The news sitemap, in this case, is a gimmick.

Should you really include pages that you do not want to index?

No. Mueller refers to "all your pages" in the sense of all those you wish to see indexed. If a URL has a noindex tag or a canonical pointing to another page, do not include it in the general sitemap. It is a source of confusion for Google and generates unnecessary alerts in Search Console.

The problem: many sites include by default every URL their CMS generates, including parameterized variants, noindex pagination pages, and filter URLs. The result: polluted sitemaps that dilute the signal and complicate reporting. Regularly clean your sitemaps to keep only indexable canonical URLs.

Warning: A sitemap that lists 10,000 URLs including 3,000 in noindex or 404 sends a signal of low technical quality. Search Console will raise these errors, but Google may also consider that your site lacks structural consistency. Audit your sitemaps at least every quarter.
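A quarterly audit can be scripted as a simple classification pass. The sketch below is hypothetical: it assumes you have already collected, for each declared URL, its HTTP status, meta-robots value, and canonical target (for example from a crawler export):

```python
def audit_sitemap(entries):
    """Keep only indexable, self-canonical 200 URLs; report everything else.

    Each entry is a dict with "loc", "status", and optional
    "noindex" / "canonical" keys (hypothetical crawler-export shape).
    """
    keep, drop = [], []
    for e in entries:
        if e["status"] != 200:
            drop.append((e["loc"], f"HTTP {e['status']}"))
        elif e.get("noindex"):
            drop.append((e["loc"], "noindex"))
        elif e.get("canonical", e["loc"]) != e["loc"]:
            drop.append((e["loc"], "canonicalized elsewhere"))
        else:
            keep.append(e["loc"])
    return keep, drop

# Illustrative sample: only the first URL belongs in the sitemap.
entries = [
    {"loc": "https://example.com/", "status": 200},
    {"loc": "https://example.com/old", "status": 404},
    {"loc": "https://example.com/print", "status": 200, "noindex": True},
    {"loc": "https://example.com/p?utm_source=x", "status": 200,
     "canonical": "https://example.com/p"},
]
keep, drop = audit_sitemap(entries)
print(f"{len(keep)} to keep, {len(drop)} to remove: {drop}")
```

Rebuilding the sitemap from `keep` is exactly the "indexable canonical URLs only" rule from the paragraph above, applied mechanically.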

Practical impact and recommendations

How can you check if your general sitemap covers all your indexable pages?

Start by crawling your site with Screaming Frog or Oncrawl in exhaustive mode. Export the list of indexable URLs (status 200, no noindex, self-referencing canonical). Compare this list with the URLs declared in your XML sitemaps.

If you see a discrepancy of more than 5%, there are two scenarios: either your sitemap is incomplete (orphaned URLs not declared), or your crawl has discovered pages you do not want to index. In the latter case, clean your structure or add noindex — do not let them linger outside the sitemap.
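The comparison itself is a set difference. A minimal sketch in Python, with a hypothetical crawler export standing in for your Screaming Frog or Oncrawl data:

```python
import xml.etree.ElementTree as ET

def sitemap_locs(xml_text):
    """Extract all <loc> values from a sitemap, ignoring namespaces."""
    return {el.text.strip() for el in ET.fromstring(xml_text).iter()
            if el.tag.endswith("}loc") or el.tag == "loc"}

SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/a</loc></url>
</urlset>"""

# Indexable URLs from the crawler export (status 200, no noindex,
# self-referencing canonical) — hypothetical sample.
crawled = {"https://example.com/", "https://example.com/a",
           "https://example.com/b"}

declared = sitemap_locs(SITEMAP)
not_declared = crawled - declared   # candidates to add to the sitemap
not_crawled = declared - crawled    # possibly orphaned or non-indexable
gap = len(not_declared) / len(crawled)
print(f"gap: {gap:.0%}, missing from sitemap: {sorted(not_declared)}")
```

If `gap` exceeds your 5% threshold, the two difference sets tell you which side to fix: `not_declared` means an incomplete sitemap, `not_crawled` means the sitemap declares pages your crawl considers non-indexable.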

What to do if you exceed the limit of 50,000 URLs per sitemap file?

Create a sitemap index that references several segmented sitemap files. For example: sitemap_products_1.xml, sitemap_products_2.xml, sitemap_blog.xml, sitemap_categories.xml. Each file stays under 50,000 URLs, and the sitemap index centralizes them.

Do not segment based on arbitrary criteria like publication date or strategic importance — this complicates maintenance. Prefer segmentation by content type or site section: it’s easier to audit and automatically update through your CMS.
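Such a segmented structure can be generated mechanically. In the sketch below (section names and URLs are illustrative), each content type is split into files of at most 50,000 URLs and a sitemap index references them all; the demo passes `chunk=2` only to make the split visible on a small sample:

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
ET.register_namespace("", NS)

def build_sitemaps(urls_by_section, base_url, chunk=50_000):
    """Split each section into files of <= chunk URLs and build the
    sitemap index that references every generated file."""
    files = {}
    for section, urls in urls_by_section.items():
        for i in range(0, len(urls), chunk):
            name = f"sitemap_{section}_{i // chunk + 1}.xml"
            urlset = ET.Element(f"{{{NS}}}urlset")
            for loc in urls[i:i + chunk]:
                url = ET.SubElement(urlset, f"{{{NS}}}url")
                ET.SubElement(url, f"{{{NS}}}loc").text = loc
            files[name] = ET.tostring(urlset, encoding="unicode")
    index = ET.Element(f"{{{NS}}}sitemapindex")
    for name in sorted(files):
        sm = ET.SubElement(index, f"{{{NS}}}sitemap")
        ET.SubElement(sm, f"{{{NS}}}loc").text = f"{base_url}/{name}"
    files["sitemap_index.xml"] = ET.tostring(index, encoding="unicode")
    return files

files = build_sitemaps(
    {"products": [f"https://example.com/p/{i}" for i in range(3)],
     "blog": ["https://example.com/blog/hello"]},
    "https://example.com", chunk=2)
print(sorted(files))
```

Because segmentation follows content type, adding a new product only touches the last `sitemap_products_*.xml` file, which is what makes this layout easy to automate from a CMS.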

What errors should you avoid when generating sitemaps?

The most common mistake: including HTTP URLs when the site runs on HTTPS, or declaring URLs with tracking parameters (utm_source, etc.). Google may index these variants, which dilutes authority and creates duplication.

Another classic pitfall: forgetting to update the lastmod tag when you modify a page. If this tag remains fixed at the creation date while you regularly republish, Google may ignore your updates or crawl them less frequently. Automate this tag via your CMS to reflect the true last modified date.
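Automating lastmod mostly means deriving it from a timestamp the CMS already stores (an `updated_at` field, or a file's mtime) rather than hard-coding it. A minimal sketch, where the Unix timestamp stands in for whatever your CMS exposes:

```python
from datetime import datetime, timezone

def w3c_lastmod(modified_ts):
    """Format a Unix timestamp (e.g. the CMS's updated_at field or a
    file's mtime) as the W3C datetime value that <lastmod> expects."""
    return datetime.fromtimestamp(modified_ts, tz=timezone.utc) \
                   .isoformat(timespec="seconds")

print(w3c_lastmod(1607252580))  # → 2020-12-06T11:03:00+00:00
```

Wiring this into sitemap generation guarantees the tag moves every time the page actually changes, which is the only way it stays a credible signal.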

  • Audit your sitemaps quarterly to eliminate URLs that return 404, carry noindex, or redirect
  • Create a sitemap index if you exceed 50,000 URLs, segmented by content type
  • Automate sitemap generation through your CMS to ensure their freshness
  • Check in Search Console that the coverage rate (submitted vs indexed URLs) exceeds 85%
  • Exclude parameterized URLs, HTTP/HTTPS variants, and noindex pages
  • Properly populate the lastmod tag to signal important updates
Optimizing sitemaps is an often underestimated technical lever that directly impacts visibility in Search Console and crawl efficiency. If your site exceeds a few thousand pages, or if you observe significant discrepancies between crawled and indexed URLs, these adjustments can become complex to manage alone — especially if your CMS generates sitemaps by default without distinguishing indexable from non-indexable pages. Seeking help from a specialized SEO agency gives you a thorough technical audit, custom automation, and regular tracking of indexing coverage to avoid blind spots.

❓ Frequently Asked Questions

Should I create a news sitemap if my site is not in Google News?
No, it brings no advantage. The news sitemap is specifically designed for inclusion in Google News and does not speed up general indexing. Stick with a standard general sitemap.
What happens if I include noindex URLs in my general sitemap?
Search Console will raise alerts flagging these pages as submitted but excluded from indexing. This pollutes your reporting and can suggest a technical problem where none exists. Exclude them from the sitemap.
How many sitemaps can I declare in Search Console?
Up to 500 sitemap files per property. If you exceed this limit, use a sitemap index to group several files into a single entry point declared in Search Console.
Does the lastmod tag really influence crawl frequency?
Google uses it as one signal among others, but it is not an automatic recrawl trigger. If it is reliable and regularly updated, it can speed up the pickup of changes — though with no guarantee.
Should URLs canonicalized to other pages be included in the sitemap?
No. Only the canonical URL should appear in the sitemap. Including canonicalized variants generates alerts in Search Console and dilutes the signal sent to Google.