Official statement

Sitemaps must contain only canonical and indexable URLs, meaning those that should appear in search results. URLs that redirect elsewhere or are marked as noindex provide little value in the sitemap.

Source: extracted from a Google Search Central video, published on 16/11/2023.

TL;DR

Google is crystal clear: sitemaps should contain only canonical and indexable URLs. Everything else (redirects, noindex pages, non-canonical variants) pollutes your sitemap and, in Google's words, provides little value. Many sites need a cleanup.

What you need to understand

Why does Google keep hammering on what seems like basic stuff?

Because in practice, many sitemaps are misconfigured: they list URLs that redirect, pages marked noindex, and non-canonicalized parameter variants. Google has to sort through this mess, spending crawl budget on URLs that will never be indexed.

The sitemap is supposed to make Googlebot's job easier, not harder. When you stuff it with URLs that shouldn't be indexed, you're sending mixed signals: "crawl this page" on one hand, "don't index it" on the other.

What exactly counts as an indexable URL in this context?

An indexable URL is one that returns a 200 status code, has no noindex tag, isn't blocked in robots.txt, and represents the canonical version (either self-referencing or without any canonical tag if it's the only version).

If your URL redirects to another one with a 301 or 302, it's not indexable. If it has a canonical pointing elsewhere, it's not the canonical version. Simple in principle, yet often overlooked.
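
The criteria above can be expressed as a small decision function. This is a minimal sketch, not Google's actual logic: the `PageSignals` structure, the `sitemap_verdict` name and the verdict labels are hypothetical, and the canonical check assumes an exact string comparison between the URL and its canonical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageSignals:
    """Signals collected for one URL (hypothetical structure for this sketch)."""
    url: str
    status: int                      # HTTP status of the URL itself, redirects not followed
    noindex: bool = False            # robots meta tag or X-Robots-Tag contains "noindex"
    blocked_by_robots: bool = False  # disallowed in robots.txt
    canonical: Optional[str] = None  # href of rel="canonical", if present


def sitemap_verdict(page: PageSignals) -> str:
    """Return "ok" or the first reason the URL does not belong in a sitemap."""
    if 300 <= page.status < 400:
        return "redirect"
    if page.status != 200:
        return "non-200 status"
    if page.blocked_by_robots:
        return "blocked by robots.txt"
    if page.noindex:
        return "noindex"
    # No canonical tag, or a self-referencing one, both count as canonical here.
    # Assumption: exact string comparison, no URL normalization.
    if page.canonical and page.canonical != page.url:
        return "canonical points elsewhere"
    return "ok"


print(sitemap_verdict(PageSignals("https://example.com/a?sort=price", 200,
                                  canonical="https://example.com/a")))
```

A real audit would likely normalize URLs (scheme, host case, trailing slash) before comparing them to the canonical.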

What are the real consequences of a polluted sitemap?

Googlebot wastes time crawling pointless pages. Your crawl budget gets diluted, especially on large sites. Result: strategic pages might get crawled less frequently.

Another nasty side effect: a sitemap full of errors can lead Google to view it as unreliable, or even partially ignore it. You lose the prioritization advantage it's supposed to provide.

  • Put only canonical URLs in your sitemap
  • Exclude any noindex URLs or ones that redirect
  • Avoid non-canonicalized parameter variants
  • Regularly verify consistency between sitemap and indexation directives
  • Treat the sitemap as a prioritization signal, not a dumping ground

SEO Expert opinion

Is this rule actually followed by major web players?

Spoiler: nope. A quick audit of sitemaps from well-known sites reveals thousands of redirect or noindex URLs. Even big tech platforms send contradictory signals.

That said, and here's where it gets interesting, Google is capable of handling this pollution. It won't penalize your site because 10% of your sitemap URLs return a 301. But you lose the crawl optimization effect the sitemap should deliver.

Are there cases where including a non-canonical URL makes sense?

Honestly? No. Some SEOs argue that including variants helps Google discover the canonical version faster. That's flawed reasoning: if your internal linking is solid, Google will find the canonical without help.

Others deliberately include temporary noindex pages to get them crawled faster. Again, that's a crutch. If a page needs to be crawled quickly, it should be linked from an important page — not snuck into a sitemap.

Warning: If you use a CMS that auto-generates your sitemap, check its filtering logic. WordPress, Shopify, Magento… they all have their quirks. Misconfigured auto-generation can create more problems than it solves.

Is Google transparent about the real impact of this recommendation?

As usual, the statement stays vague. Martin Splitt says non-indexable URLs are "of little value." Little value, or genuinely harmful? [Needs verification]

Hard data is missing. What percentage of problematic URLs starts to affect sitemap efficiency? Google won't say. We're flying blind, relying on field reports suggesting that beyond 15-20% useless URLs, the crawl impact becomes measurable.
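
To see where a given sitemap stands relative to that 15-20% range, the share of useless URLs can be computed from audit results. A minimal sketch, assuming each URL has already been given a verdict string where "ok" means canonical and indexable. The labels and the `pollution_report` name are hypothetical, and the range itself comes from field reports, not from Google.

```python
from collections import Counter


def pollution_report(verdicts):
    """Summarise audit verdicts: counts per reason and share of non-"ok" URLs."""
    counts = Counter(verdicts)
    total = sum(counts.values())
    polluted = total - counts.get("ok", 0)
    share = polluted / total if total else 0.0
    return counts, share


# Illustrative data: 8 clean URLs, 1 redirect, 1 noindex page.
verdicts = ["ok"] * 8 + ["redirect", "noindex"]
counts, share = pollution_report(verdicts)
print(f"{share:.0%} of sitemap URLs are not indexable")  # prints "20% of sitemap URLs are not indexable"
```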

Practical impact and recommendations

How do you audit your current sitemap?

Start by extracting all URLs from your sitemap. Use Screaming Frog, Oncrawl, or a Python script built on common third-party libraries (requests, BeautifulSoup).
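
As a rough illustration of the extraction step, the sketch below parses sitemap XML using only Python's built-in `xml.etree.ElementTree`, so it runs without installing anything. The `extract_locs` name and the sample document are assumptions for this example. Downloading the file (for instance with requests) is left out.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def extract_locs(xml_text):
    """Return ("urlset" or "sitemapindex", list of <loc> values)."""
    root = ET.fromstring(xml_text)
    kind = root.tag.split("}")[-1]  # strip the "{namespace}" prefix
    locs = [el.text.strip() for el in root.iter(f"{{{SITEMAP_NS}}}loc") if el.text]
    return kind, locs


sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/pricing</loc></url>
</urlset>"""

kind, urls = extract_locs(sample)
print(kind, urls)
```

If `kind` comes back as "sitemapindex", the returned values are child sitemap URLs, and each one needs to be fetched and parsed in turn.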

Then crawl those URLs and verify: HTTP status code, presence of canonical tag, indexation directive (noindex or not). Cross-reference with your server logs to see if Google actually crawls what you're telling it to.
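
The canonical and noindex checks can be sketched with Python's built-in `html.parser`. This is a simplified illustration: `IndexSignalParser` and `read_signals` are hypothetical names, the `x_robots_tag` argument stands for the value of the X-Robots-Tag response header if there is one, and only the raw HTML is read, so tags injected by JavaScript are not seen.

```python
from html.parser import HTMLParser


class IndexSignalParser(HTMLParser):
    """Collect the robots meta directive and the rel="canonical" href."""

    def __init__(self):
        super().__init__()
        self.noindex = False
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        a = {k.lower(): (v or "") for k, v in attrs}
        if tag == "meta" and a.get("name", "").lower() in ("robots", "googlebot"):
            if "noindex" in a.get("content", "").lower():
                self.noindex = True
        elif tag == "link" and "canonical" in a.get("rel", "").lower().split():
            # Keep the first canonical found in the document.
            self.canonical = self.canonical or a.get("href") or None


def read_signals(html, x_robots_tag=""):
    """Return (noindex, canonical_href) for one page."""
    parser = IndexSignalParser()
    parser.feed(html)
    noindex = parser.noindex or "noindex" in x_robots_tag.lower()
    return noindex, parser.canonical


page = ('<html><head><meta name="robots" content="noindex, follow">'
        '<link rel="canonical" href="https://example.com/a"></head></html>')
print(read_signals(page))
```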

What should you actually do to clean up a polluted sitemap?

Remove every URL returning something other than 200. Strip out pages with a canonical pointing elsewhere. Systematically exclude pages marked noindex.

If you have thousands of URLs, automate the process. Most CMS platforms let you set filtering rules. Shopify, for instance, includes filtered collections by default — you need to exclude them manually.
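
Where the CMS cannot filter on its own, one possible way to automate the cleanup is to regenerate the sitemap from audit results. The sketch below assumes a list of `(url, verdict)` pairs in which "ok" marks a canonical, indexable URL. The function names and verdict labels are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Serialise the sitemap namespace as the default xmlns (no prefix).
ET.register_namespace("", SITEMAP_NS)


def clean_urls(audit):
    """Keep URLs whose verdict is "ok", deduplicated, in original order."""
    return list(dict.fromkeys(url for url, verdict in audit if verdict == "ok"))


def build_sitemap(urls):
    """Serialise URLs into a sitemaps.org <urlset> document (UTF-8 bytes)."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url in urls:
        url_el = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url_el, f"{{{SITEMAP_NS}}}loc").text = url
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


audit = [
    ("https://example.com/", "ok"),
    ("https://example.com/old-page", "redirect"),
    ("https://example.com/cart", "noindex"),
    ("https://example.com/pricing", "ok"),
]
print(build_sitemap(clean_urls(audit)).decode("utf-8"))
```

The returned bytes can be written to disk with `open(path, "wb")`. ElementTree escapes characters such as `&` in URLs during serialisation.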

  • Crawl your sitemap with an SEO tool (Screaming Frog, Sitebulb, Oncrawl)
  • Identify URLs returning 3XX, 4XX, 5XX codes and remove them
  • Check for noindex tags and exclude those pages
  • Verify that each sitemap URL is truly the canonical version
  • Configure your CMS to prevent auto-generation of non-indexable URLs
  • Submit the cleaned sitemap via Google Search Console
  • Monitor coverage rate and crawl trends in GSC

A clean sitemap boosts crawl efficiency, especially on large sites, so regular cleanup should be part of your technical SEO routine. If your infrastructure is complex (multilingual, multi-domain, dynamic content generation), this task can become time-consuming. In that case, working with a specialized SEO agency can save you time and prevent costly crawl budget mistakes.

❓ Frequently Asked Questions

Can a single site have multiple sitemaps?
Yes, and it's even recommended on large sites. You can segment them by content type (articles, products, categories) and reference them in a sitemap index. This makes monitoring and optimization easier.
What happens if you don't provide a sitemap at all?
Google will crawl your site through internal linking and external links. There's no direct penalty, but you lose a prioritization lever. On a well-structured site with few pages, the impact is small. On a large site, it's a mistake.
Should images and videos be in the main sitemap?
No. Use dedicated sitemaps (image sitemap, video sitemap) with their specific tags. Mixing everything into a single sitemap makes it unreadable and less effective.
Should paginated pages be included in the sitemap?
Only if they are self-canonical and indexable. If you use rel=prev/next or a canonical pointing to page 1, exclude them. The goal remains to list only URLs meant to be indexed.
How often should the sitemap be updated?
Ideally in real time or near real time if you publish often. Otherwise, at least once a week. Google recrawls sitemaps according to the update frequency it detects, so a static sitemap will be checked less often.