
Official statement

Google recommends using the 'noindex, follow' robots meta tag for pages, such as HTML sitemaps, that you do not want to appear in search results but whose links you want Google to follow and count. The page itself stays out of the index while the links it contains are still followed.
🎥 Source: Google Search Central video (1:02, English), published 15/02/2011; the statement is at 0:31.
TL;DR

Google explicitly recommends applying the 'noindex, follow' robots meta tag to HTML sitemaps. The goal is to keep these utility pages out of the index while preserving crawling and PageRank flow through their links. Your HTML sitemaps thus remain navigation and discovery tools for Googlebot without cluttering search results with pages that offer no value to users.

What you need to understand

Why does Google differentiate between indexing and link following?

The noindex, follow directive is based on a fundamental distinction in how Googlebot operates. Indexing a page means storing it in the search index and potentially presenting it in search results. Following links means exploring the URLs it contains and passing PageRank to those destinations.

This combination lets you create hub pages that guide Googlebot to important content without appearing in the SERPs themselves. HTML sitemaps typically fall into this category: their function is architectural, not editorial.
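To make the split between indexing and link following concrete, here is a minimal sketch that turns a robots meta `content` value into two independent flags. It assumes simplified parsing (comma-separated tokens, case-insensitive) and only models `noindex`, `nofollow` and `none`; per Google's documentation both flags default to allowed, and `none` is shorthand for `noindex, nofollow`. The `RobotsDirectives` and `parse_robots_meta` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RobotsDirectives:
    index: bool   # may the page itself appear in search results?
    follow: bool  # may crawlers follow the links on the page?


def parse_robots_meta(content: str) -> RobotsDirectives:
    """Interpret a robots meta 'content' value such as "noindex, follow".

    Both flags start as True (the default). Restrictive tokens switch them
    off; other tokens such as 'index', 'follow' or 'all' leave them unchanged,
    so the most restrictive directive wins when tokens conflict.
    """
    index, follow = True, True
    for token in (t.strip().lower() for t in content.split(",")):
        if token == "noindex":
            index = False
        elif token == "nofollow":
            follow = False
        elif token == "none":  # shorthand for "noindex, nofollow"
            index, follow = False, False
    return RobotsDirectives(index=index, follow=follow)


print(parse_robots_meta("noindex, follow"))
# RobotsDirectives(index=False, follow=True)
```

Note that `parse_robots_meta("noindex")` yields the same result as `"noindex, follow"`: spelling out `follow` documents the intent, while only `nofollow` (or `none`) actually stops link following.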

What's the problem with indexing HTML sitemaps?

An indexed HTML sitemap adds noise to search results and occupies a position that a value-added page could hold. Worse, it sometimes ranks for unintended queries when Google finds nothing more relevant on your domain.

HTML sitemaps are built for bots and for lost users seeking an overview. They are not meant to capture organic traffic. Indexing them exposes the plumbing of your site.

How does the 'follow' directive preserve crawl budget and PageRank?

With a nofollow directive, Googlebot does not follow the links on the page (follow is the default, so a plain noindex already lets links be followed). Your important URLs may then be discovered late or never if no other internal links point to them, and crawling is routed through less efficient paths.

Keeping follow keeps exploration active: PageRank flows normally from your HTML sitemap to the target pages. This is especially useful on deep sites where some sections sit far from the homepage or are poorly linked.
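The discovery side of this argument can be illustrated with a toy model. The sketch below assumes a hypothetical site graph and models discovery only, as breadth-first click depth from the homepage; it says nothing about PageRank values or how Google actually schedules crawling. Treating the sitemap as `nofollow` simply removes its outgoing links from the graph.

```python
from collections import deque


def crawl_depths(links: dict[str, list[str]], start: str = "/") -> dict[str, int]:
    """Breadth-first discovery: depth = number of followed links from `start`.

    Pages missing from the result were never reached through followed links.
    """
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical site: /blog/old-post is buried in pagination,
# and /guide is linked only from the HTML sitemap.
site = {
    "/": ["/blog", "/sitemap"],
    "/blog": ["/blog/page-2"],
    "/blog/page-2": ["/blog/page-3"],
    "/blog/page-3": ["/blog/old-post"],
    "/sitemap": ["/blog", "/blog/old-post", "/guide"],  # noindex, follow
}
followed = crawl_depths(site)
# noindex, nofollow: the sitemap is still fetched, but its links are ignored.
nofollowed = crawl_depths({**site, "/sitemap": []})

print(followed.get("/blog/old-post"), nofollowed.get("/blog/old-post"))  # 2 4
print(followed.get("/guide"), nofollowed.get("/guide"))                  # 2 None
```

With follow, the buried post is two clicks from the homepage instead of four, and `/guide` is reachable at all; with nofollow, `/guide` is never discovered through internal links.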

  • noindex, follow prevents indexing while allowing crawl and PageRank transfer.
  • HTML sitemaps act as a discovery hub for Googlebot, not as a landing page from search.
  • Indexing these pages dilutes the visibility of your strategic content in the SERPs.
  • The follow directive ensures that internal links retain their transmission value.
  • The sitemap's followed links steer Googlebot toward high-value pages, making better use of crawl budget.

SEO Expert opinion

Is this directive aligned with observed practices on the ground?

Yes, and it is a welcome confirmation. For years, seasoned SEOs have been applying noindex, follow to utility pages: HTML sitemaps, non-strategic tag pages, deep pagination archives. Here Google formalizes what was already architectural common sense.

There is a nuance, however: some high-authority sites index their HTML sitemaps without apparent harm, because their surplus PageRank and crawl depth compensate. For 95% of domains, though, the practice is suboptimal. It remains to be verified whether Google actively penalizes indexed sitemaps or simply advises against them without any direct algorithmic sanction.

In what cases does this rule not apply?

An HTML sitemap can be indexed if you give it genuine editorial value, for example when a media site turns its sitemap into a hub page with rich descriptions, visuals, and interactive filters. At that point it is no longer a technical sitemap but a category page.

Similarly, on a niche site with very few pages (fewer than 50), the impact of indexing an HTML sitemap is negligible. The real risk concerns medium to large sites, where every indexed position counts and where internal cannibalization lurks.

What are the common mistakes related to this directive?

The first trap is using noindex, nofollow out of excessive caution: Googlebot then ignores the sitemap's links, which become useless. The second mistake is trying to noindex the page through robots.txt. A robots.txt rule only blocks crawling, which prevents Google from ever seeing the meta tag. The directive must appear in the HTML of the page, not in robots.txt.

The third confusion is believing that noindex reduces crawl budget. It does not. A noindex page is crawled normally as long as it remains accessible: it consumes budget, it just doesn't enter the index. To save crawl budget, you have to block access in robots.txt or remove the page, but then you lose PageRank transmission.

Warning: on very large sites (tens of thousands of pages or more), multiplying noindex, follow pages can create a significant crawl load with no direct return in visibility. Analyze your server logs to make sure Googlebot is not spending disproportionate time on these utility pages.
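A first pass at that log analysis can be as simple as measuring what share of Googlebot requests hits utility pages. This sketch assumes the common Apache/Nginx "combined" log format and identifies Googlebot by its user-agent string only; Google recommends verifying real Googlebot traffic with a reverse DNS lookup, which is not done here. The `utility_crawl_share` helper and the sample lines are hypothetical.

```python
import re
from typing import Callable, Iterable

# Apache/Nginx "combined" log format (an assumption about your server setup).
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def utility_crawl_share(lines: Iterable[str], is_utility: Callable[[str], bool]) -> float:
    """Fraction of Googlebot requests that hit utility pages (e.g. HTML sitemaps).

    Matches on the user-agent string only: spoofed bots are not filtered out.
    Lines that do not fit the log format are skipped.
    """
    total = utility = 0
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or "Googlebot" not in m["agent"]:
            continue
        total += 1
        if is_utility(m["path"]):
            utility += 1
    return utility / total if total else 0.0


GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
sample = [
    f'66.249.66.1 - - [10/Mar/2024:10:00:00 +0000] "GET /sitemap HTTP/1.1" 200 5120 "-" "{GOOGLEBOT_UA}"',
    f'66.249.66.1 - - [10/Mar/2024:10:00:05 +0000] "GET /products/blue-widget HTTP/1.1" 200 8192 "-" "{GOOGLEBOT_UA}"',
    '203.0.113.7 - - [10/Mar/2024:10:00:09 +0000] "GET /sitemap HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows NT 10.0)"',
]
share = utility_crawl_share(sample, lambda path: "sitemap" in path)
print(f"{share:.0%}")  # 50%
```

The `is_utility` predicate is deliberately pluggable so you can match whatever URL patterns your own utility pages use.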

Practical impact and recommendations

What should you do practically with your existing HTML sitemaps?

First step: identify all your HTML sitemaps. Look for URLs containing "sitemap", "plan-du-site", "sommaire" (French for "site map" and "contents") or equivalents, and check their indexing status in Google Search Console. If they appear as indexed, either no noindex directive is applied or Google has not yet recrawled the page since the tag was added.
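If you have a URL list from a crawler export or your CMS, the pattern search can be scripted. This is a minimal sketch assuming such a list is available; the slug patterns are the ones named above, and the `example.com` URLs are hypothetical. XML sitemap files are excluded because they are not HTML pages.

```python
import re
from urllib.parse import urlparse

# Slugs that often indicate an HTML sitemap; extend with your own equivalents.
SITEMAP_HINTS = re.compile(r"sitemap|plan-du-site|sommaire", re.IGNORECASE)


def find_html_sitemaps(urls: list[str]) -> list[str]:
    """Return URLs whose path looks like an HTML sitemap (XML sitemaps excluded)."""
    hits = []
    for url in urls:
        path = urlparse(url).path
        if SITEMAP_HINTS.search(path) and not path.lower().endswith((".xml", ".xml.gz")):
            hits.append(url)
    return hits


candidates = find_html_sitemaps([
    "https://example.com/plan-du-site",
    "https://example.com/sitemap.xml",
    "https://example.com/blog/post",
    "https://example.com/Sitemap/",
])
print(candidates)
# ['https://example.com/plan-du-site', 'https://example.com/Sitemap/']
```

Pattern matching only produces candidates; each hit still needs a manual look to confirm it is a utility page rather than an editorial one.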

Then add the following meta tag to the <head> of each affected page: <meta name="robots" content="noindex, follow">. Make sure the tag is present in the HTML delivered by the server, not just injected by JavaScript after the initial load. Google can render JavaScript, but serving the tag in the initial HTML removes any dependence on rendering.
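One way to confirm the tag is in the served HTML is to parse the raw response without executing any JavaScript. The sketch below uses Python's standard `html.parser`; it assumes you have already fetched the page body (for example with `urllib.request.urlopen`), and it treats both `robots` and `googlebot` meta names as relevant. The `has_noindex_follow` helper is hypothetical. A tag injected only by JavaScript will, by design, not be found.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the content of <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: list[str] = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; values keep their case.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            self.directives.append(attr.get("content", "").lower())


def has_noindex_follow(raw_html: str) -> bool:
    """True if the raw HTML asks for noindex without nofollow/none."""
    finder = RobotsMetaFinder()
    finder.feed(raw_html)
    tokens = {t.strip() for d in finder.directives for t in d.split(",")}
    return "noindex" in tokens and not (tokens & {"nofollow", "none"})


page = '<html><head><meta name="robots" content="noindex, follow"></head><body></body></html>'
print(has_noindex_follow(page))  # True
```

Self-closing tags such as `<meta ... />` are handled too, because `HTMLParser` routes them through `handle_starttag` by default.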

How can you verify that the directive is properly acknowledged?

Use the URL Inspection tool in Google Search Console. Run a live test of the URL, then check the indexing details: Google should report the page as excluded by the 'noindex' tag. If the page is still indexed several weeks after you added the tag, request indexing again, or check that no other signals (a conflicting canonical, an XML sitemap referencing the page) interfere.

Also check your server logs. Googlebot should keep crawling the page regularly despite the noindex. If crawling drops sharply, another directive may be blocking access. Distinguish between "crawled but not indexed" (goal achieved) and "not crawled" (configuration issue).
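The distinction between these outcomes can be expressed as a small decision function. This is a sketch under two assumptions: Googlebot hit counts per path come from your own log analysis over a recent window, and the set of indexed URLs comes from a Search Console export. The `diagnose` function and `SitemapStatus` enum are hypothetical names.

```python
from enum import Enum


class SitemapStatus(Enum):
    GOAL_ACHIEVED = "crawled but not indexed"
    STILL_INDEXED = "crawled and still indexed: noindex missing or not yet processed"
    NOT_CRAWLED = "not crawled: check robots.txt, authentication, server errors"


def diagnose(path: str, googlebot_hits: dict[str, int], indexed_paths: set[str]) -> SitemapStatus:
    """Classify an HTML sitemap from log hits and index status."""
    if googlebot_hits.get(path, 0) == 0:
        # Without a crawl, Google cannot see the noindex tag at all.
        return SitemapStatus.NOT_CRAWLED
    if path in indexed_paths:
        return SitemapStatus.STILL_INDEXED
    return SitemapStatus.GOAL_ACHIEVED


hits = {"/sitemap": 12, "/plan-du-site": 0, "/sommaire": 3}
indexed = {"/sommaire"}
for path in hits:
    print(path, "->", diagnose(path, hits, indexed).value)
```

Checking "not crawled" first reflects the logic above: if Googlebot never fetches the page, its index status cannot be caused by the noindex directive.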

What errors should be avoided during implementation?

Never block your HTML sitemaps in robots.txt if you want to apply noindex, follow. Googlebot must be able to access the page to read the meta tag. Do not duplicate directives: if you have noindex in the HTML, there’s no need to add it via X-Robots-Tag in the HTTP header, as this creates confusion during audits.
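Before relying on the meta tag, you can check that robots.txt does not block the pages. This sketch uses Python's standard `urllib.robotparser`, parsing robots.txt content you supply as a string (the rules and URLs below are hypothetical). Be aware that this parser uses simple prefix matching and does not reproduce every detail of Google's own robots.txt interpretation, such as `*` and `$` wildcards, so treat it as a quick sanity check.

```python
from urllib.robotparser import RobotFileParser


def blocked_for_googlebot(robots_txt: str, urls: list[str]) -> list[str]:
    """Return the URLs that robots.txt forbids Googlebot to fetch.

    A blocked URL can never have its noindex meta tag read.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch("Googlebot", url)]


robots = """\
User-agent: *
Disallow: /plan-du-site
"""
print(blocked_for_googlebot(robots, [
    "https://example.com/plan-du-site",
    "https://example.com/sitemap",
]))
# ['https://example.com/plan-du-site']
```

Any URL this returns needs its Disallow rule removed before a noindex, follow tag on that page can have any effect.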

Also be careful with pages that receive high-quality external backlinks. An HTML sitemap referenced from an authoritative site passes that PageRank on to your internal pages only if it remains crawlable, so check your link profiles before noindexing pages in bulk.

  • Audit all your HTML sitemaps and check their current indexing status.
  • Add <meta name="robots" content="noindex, follow"> in the <head> of each affected page.
  • Validate acknowledgment via the URL Inspection tool in Google Search Console.
  • Monitor your server logs to confirm that Googlebot continues to crawl these pages.
  • Never block these URLs in robots.txt if you want their links to be followed.
  • Track crawl and indexing for 4 to 6 weeks after the change.

Applying noindex, follow to HTML sitemaps is basic SEO hygiene, but implementation requires diligence and verification. Complex sites with thousands of pages, multi-level architectures, or specific CMS constraints may run into technical difficulties during deployment.

❓ Frequently Asked Questions

Can I use noindex, follow on page types other than HTML sitemaps?
Yes. The directive applies to any utility page with no value for the end user: non-strategic tag pages, deep pagination archives, low-traffic filter pages. What matters is that the page contains internal links you want crawled.
What happens if I apply noindex, nofollow instead of noindex, follow?
Googlebot will not follow the links on the page. It becomes a dead end for crawling, which can delay the discovery of new URLs or cut off PageRank paths. Your HTML sitemap then loses all SEO value.
Should I also exclude my HTML sitemaps from the sitemap.xml file?
Yes, for consistency. If you don't want these pages indexed, there is no point submitting them explicitly to Google via sitemap.xml. Excluding them avoids sending contradictory signals and saves entries in your XML sitemap.
How long does Google take to deindex a page after noindex is added?
Generally between a few days and 4 weeks, depending on how often your site is crawled. You can speed up the process by requesting indexing through URL Inspection in Search Console, but the delay remains variable.
Does the noindex, follow directive consume crawl budget?
Yes. A noindex page is still crawled normally as long as it is accessible, so it consumes crawl budget; it simply does not enter the index. If you want to save crawl budget, block access via robots.txt, but you will then lose link following.
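For the FAQ point about keeping HTML sitemaps out of sitemap.xml, here is a minimal generation sketch. It assumes you can produce a mapping of URL to "is this page noindex?" from your CMS or crawl data; the `build_sitemap` name and the URLs are hypothetical. It emits the standard sitemaps.org `urlset`/`url`/`loc` structure and omits the optional XML declaration and tags such as `lastmod`.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages: dict[str, bool]) -> str:
    """Build sitemap.xml content from {url: is_noindex}, skipping noindex pages."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, is_noindex in pages.items():
        if is_noindex:
            continue  # don't submit URLs you are asking Google not to index
        loc = ET.SubElement(ET.SubElement(urlset, "url"), "loc")
        loc.text = url
    return ET.tostring(urlset, encoding="unicode")


xml = build_sitemap({
    "https://example.com/": False,
    "https://example.com/plan-du-site": True,  # HTML sitemap with noindex, follow
    "https://example.com/guide": False,
})
print(xml)
```

Deriving the exclusion from the same noindex flag that drives the meta tag keeps the two signals consistent by construction.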