Official statement
Blocking a URL via robots.txt prevents Google from crawling the content, but it does not stop Google from taking links that point to that page into account. The noindex directive does remove the page from the index, but Googlebot must first be able to access the content in order to read it. In practice, using both robots.txt AND noindex at the same time creates a technical conflict: the bot can never see the noindex instruction.
What you need to understand
Why isn't robots.txt enough to keep a page out of the index?
The robots.txt file acts like a "No Entry" sign placed in front of a door. Googlebot adheres to this instruction and does not crawl the blocked URL. The problem is that this URL can still appear in search results if external links point to it.
Google detects these backlinks and notes that a resource exists at this address, but it cannot access the content to verify its nature. The result? An empty snippet with only the URL visible in the SERPs. Strictly speaking, the content is not indexed, but the URL itself remains in the index.
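To make the "No Entry" sign concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content and the `example.com` URLs are assumptions chosen for illustration. Note that this module does plain prefix matching and does not reproduce every detail of Googlebot's own matching (wildcards, for instance), so treat it as an approximation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the /private/ path is an assumption.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_crawl_allowed(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if robots.txt lets the given user agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


blocked = is_crawl_allowed(ROBOTS_TXT, "https://example.com/private/report.html")
allowed = is_crawl_allowed(ROBOTS_TXT, "https://example.com/public/page.html")
print(blocked, allowed)  # False True
```

The check only answers "may the bot fetch this URL?". It says nothing about whether the URL is in the index, which is exactly the gap described above.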
How does the noindex directive actually work?
The meta robots noindex tag (or the HTTP header X-Robots-Tag: noindex) explicitly tells Google to remove the page from its index. However, for this instruction to be read and applied, the bot must first be able to crawl the page.
Herein lies the trap: if you block the URL in robots.txt, Googlebot can never reach the HTML code where the noindex tag is located. The instruction remains invisible, thus ineffective. The page blocked in robots.txt with a noindex in the source code could still be listed in the index via its backlinks.
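A crawler can only act on noindex once it has the response in hand. The sketch below shows where the directive lives: in a `<meta name="robots">` tag or in the `X-Robots-Tag` header. The function and class names are illustrative, and the header parsing is deliberately simplified (it ignores user-agent-prefixed values such as `googlebot: noindex`).

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> / <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for token in attr.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def has_noindex(html: str, headers: dict) -> bool:
    """Return True if the HTML or the response headers carry a noindex directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    header_value = ""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            header_value = value
    header_tokens = {token.strip().lower() for token in header_value.split(",")}
    return "noindex" in parser.directives or "noindex" in header_tokens
```

Both inputs (`html` and `headers`) only exist after a successful fetch. If robots.txt forbids the fetch, this function is never reached, which is the trap described above.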
What is the impact on how links are taken into account?
An often overlooked but crucial point: robots.txt does not prevent Google from taking into account the links that point to the blocked URL. The engine detects these popularity signals and incorporates them into its link graph, even without accessing the content.
On the other hand, if you use noindex without robots.txt, Google crawls the page, reads the directive, removes the URL from the index AND can follow the links present in the content of that page. The PageRank continues to flow through these outgoing links, which can be strategically useful for intermediate pages in your architecture.
- Robots.txt blocks crawling but does not prevent the appearance of the URL in the index if backlinks exist
- Noindex effectively removes the page from the index but requires prior crawling to be read
- Combining robots.txt AND noindex on the same URL creates a technical conflict: the noindex will never be applied
- A URL blocked in robots.txt can still consume crawl budget if Googlebot attempts to access it regularly
- Links pointing to a URL blocked in robots.txt are still taken into account, contrary to popular belief
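The rules above can be condensed into a small decision function. This is a simplified model of the behavior described in this article, not a description of Google's actual pipeline; the names `Outcome` and `expected_outcome` are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    crawled: bool         # can Googlebot fetch the page?
    noindex_seen: bool    # can the noindex directive be read?
    url_may_appear: bool  # can the URL still show up in results?


def expected_outcome(blocked_by_robots: bool, has_noindex: bool, has_backlinks: bool) -> Outcome:
    """Model the combinations of robots.txt blocking and noindex discussed above."""
    crawled = not blocked_by_robots
    # noindex is only effective if the page can be crawled.
    noindex_seen = crawled and has_noindex
    if noindex_seen:
        url_may_appear = False
    elif blocked_by_robots:
        # URL-only listing, driven by links pointing to the page.
        url_may_appear = has_backlinks
    else:
        # Crawlable page without noindex: an ordinary indexing candidate.
        url_may_appear = True
    return Outcome(crawled, noindex_seen, url_may_appear)


# The conflicting setup: blocked AND noindex, with backlinks.
print(expected_outcome(True, True, True))
```

For the conflicting setup, the model returns `noindex_seen=False` and `url_may_appear=True`: the directive exists in the source code but has no effect.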
SEO Expert opinion
Does this statement align with observations in the field?
Yes, and it is a welcome confirmation of a behavior observed for years. In crawl budget audits, we regularly see URLs blocked in robots.txt that continue to appear in server logs: Googlebot attempts to crawl them periodically, especially if they receive new backlinks.
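A log check of this kind can be sketched as follows. It assumes access logs in the common "combined" format and identifies Googlebot by its user-agent string only, which can be spoofed; a real audit would also verify the requesting IP. The sample log lines and the `/private/` rule are invented for illustration.

```python
import re
from urllib.robotparser import RobotFileParser

# Matches the request, status, size, referrer and user-agent fields of a
# combined-format access log line (assumed format).
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits_on_blocked(log_lines, robots_txt):
    """Count requests with a Googlebot user agent on paths disallowed by robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    hits = {}
    for line in log_lines:
        match = LOG_LINE.search(line)
        if not match or "Googlebot" not in match.group("agent"):
            continue
        path = match.group("path")
        if not parser.can_fetch("Googlebot", path):
            hits[path] = hits.get(path, 0) + 1
    return hits


# Hypothetical sample data.
SAMPLE_ROBOTS = "User-agent: *\nDisallow: /private/\n"
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
SAMPLE_LOGS = [
    f'66.249.66.1 - - [16/Mar/2017:10:00:00 +0000] "GET /private/a.html HTTP/1.1" 200 512 "-" "{GOOGLEBOT_UA}"',
    f'66.249.66.1 - - [17/Mar/2017:10:00:00 +0000] "GET /private/a.html HTTP/1.1" 200 512 "-" "{GOOGLEBOT_UA}"',
    f'66.249.66.1 - - [17/Mar/2017:10:05:00 +0000] "GET /public/b.html HTTP/1.1" 200 512 "-" "{GOOGLEBOT_UA}"',
    '203.0.113.7 - - [17/Mar/2017:10:06:00 +0000] "GET /private/a.html HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]

print(googlebot_hits_on_blocked(SAMPLE_LOGS, SAMPLE_ROBOTS))  # {'/private/a.html': 2}
```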
The point about links is less publicly documented, but it matches tests conducted on high-volume sites. A page blocked in robots.txt does not pass classic PageRank through its own outgoing internal links (since it is not crawled), but the external links pointing to it still generate popularity signals that the algorithm can detect.
What nuances should be added to this rule?
Google specifies that noindex "requires the content to be accessible first," but does not detail the processing delay. In practice, a crawled page with noindex can remain visible in the index for several days or even weeks before complete deindexation. The delay probably depends on the crawl budget allocated to the site. [To be verified]
Another gray area: what happens if you block a URL in robots.txt AFTER it has been deindexed via noindex? Theoretically, the noindex that was already applied should keep the page out of the index, but Googlebot can no longer re-crawl it to confirm the directive. Some practitioners have observed partial reindexing in this scenario.
In what cases does this logic create problems?
The classic scenario: you inherit a site with thousands of pages blocked in robots.txt that the client wants to "cleanly" deindex. Removing these lines from robots.txt to allow Googlebot to crawl the noindex consumes a massive crawl budget on URLs without value.
A pragmatic approach that is rarely mentioned: send the HTTP header X-Robots-Tag: noindex in the server response, even for URLs blocked in robots.txt. In principle, Googlebot should not see this header since it does not crawl the URL, but some field reports suggest that Google might still detect it during occasional checks. [To be verified]: official documentation remains vague on this point.
Practical impact and recommendations
What concrete steps should be taken to deindex pages?
The clean method: remove the Disallow rules covering these URLs from robots.txt, add noindex either as a meta robots tag in the HTML code or as an X-Robots-Tag HTTP header, and then let Googlebot crawl these pages. Monitor deindexation via Search Console, in the "Coverage" report (labelled "Pages" in more recent versions of Search Console) or with the URL inspection tool.
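For the header variant, one possible approach on a Python stack is a small WSGI middleware that appends `X-Robots-Tag: noindex` to responses under chosen path prefixes. This is only a sketch: `noindex_middleware` and the `/old/` prefix are hypothetical, and on other stacks the same header would typically be set in the web server or application configuration instead.

```python
def noindex_middleware(app, prefixes):
    """Wrap a WSGI app so responses under the given path prefixes carry X-Robots-Tag: noindex."""

    def wrapped(environ, start_response):
        path = environ.get("PATH_INFO", "")

        def patched_start_response(status, headers, exc_info=None):
            if any(path.startswith(prefix) for prefix in prefixes):
                # Copy the header list rather than mutating the app's own list.
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)

        return app(environ, patched_start_response)

    return wrapped


# Usage with a trivial app; "/old/" is an assumed section to deindex.
def demo_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [b"<html><body>Archived page</body></html>"]


application = noindex_middleware(demo_app, ["/old/"])
```

The header only helps if the same paths are crawlable, so the matching Disallow rules have to be lifted first.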
To expedite the process on large volumes, submit an XML sitemap containing only the URLs to be deindexed. It sounds counterintuitive, but it encourages Google to prioritize crawling these pages so that it reads the noindex. Remove the sitemap once deindexation is confirmed.
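Such a temporary sitemap can be generated with the standard library. The sketch below assumes a plain list of URLs (the `example.com` addresses are placeholders) and emits the minimal `<urlset>` / `<url>` / `<loc>` structure of the sitemap protocol; a single sitemap file is limited to 50,000 URLs, so larger sets need to be split.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal XML sitemap (as a string) listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Placeholder URLs for the pages that should be recrawled and deindexed.
sitemap_xml = build_sitemap([
    "https://example.com/old/page-1",
    "https://example.com/old/page-2?ref=a&b=c",
])
print(sitemap_xml)
```

Special characters such as `&` in query strings are escaped by ElementTree during serialization, which the sitemap format requires.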
What errors should be absolutely avoided in this configuration?
Error number one: blocking entire sections in robots.txt (e.g., /blog/) while adding noindex in the templates. The noindex will never be applied. If backlinks point to these URLs, they will appear in the index with empty snippets.
Error number two: using robots.txt to "hide" duplicate or low-quality content. Google does not see the content but still detects the URL via links. It is better to use noindex + allow in robots.txt, or completely remove the pages with 301 redirects to consolidated content.
How can you check that your configuration is consistent?
Audit your robots.txt line by line: every blocked URL must have a valid technical reason (system files, session parameters, duplicate content managed otherwise). If the goal is deindexation, robots.txt is the wrong tool.
Crawl your site with Screaming Frog or Oncrawl in "Googlebot" mode to identify pages with noindex AND blocked in robots.txt. These conflicts are more common than one might think, especially on CMS platforms with poorly configured SEO plugins. Also check the HTTP headers: some servers send X-Robots-Tag: noindex on already blocked URLs, creating unnecessary redundancy.
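If you prefer to script this check, the conflict detection itself is short. The sketch below assumes you already have, for each URL, a flag telling whether the page carries noindex (for example from a crawler export; the `(url, has_noindex)` pair format here is hypothetical). It uses `urllib.robotparser`, which approximates but does not exactly replicate Googlebot's matching rules.

```python
from urllib.robotparser import RobotFileParser


def find_conflicts(robots_txt, pages, user_agent="Googlebot"):
    """Return URLs that carry noindex but are blocked from crawling by robots.txt.

    `pages` is an iterable of (url, has_noindex) pairs.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [
        url
        for url, has_noindex in pages
        if has_noindex and not parser.can_fetch(user_agent, url)
    ]


# Hypothetical data mirroring "error number one": /blog/ blocked, noindex in templates.
ROBOTS_TXT = "User-agent: *\nDisallow: /blog/\n"
PAGES = [
    ("https://example.com/blog/post-1", True),   # blocked + noindex: conflict
    ("https://example.com/blog/post-2", False),  # blocked only
    ("https://example.com/landing", True),       # noindex only: works as intended
]

conflicts = find_conflicts(ROBOTS_TXT, PAGES)
print(conflicts)  # ['https://example.com/blog/post-1']
```

Each URL in the result is a page whose noindex can never be read as long as the Disallow rule stays in place.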
- Remove any URL from robots.txt that you truly want to deindex
- Implement noindex via a meta tag or X-Robots-Tag header according to your technical stack
- Temporarily submit a sitemap of the URLs to deindex to speed up crawling
- Monitor deindexation in Search Console with alerts on coverage changes
- Regularly audit robots.txt + noindex conflicts with a technical crawler
- Document every line of your robots.txt: why is this URL blocked?
❓ Frequently Asked Questions
Can you use robots.txt AND noindex on the same URL?
Can a page blocked in robots.txt still appear in Google?
How can you quickly deindex thousands of pages blocked in robots.txt?
Does PageRank flow through a noindexed page?
What is the impact on crawl budget if I unblock thousands of URLs to apply noindex?
Other SEO insights extracted from this same Google Search Central video · duration 1h00 · published on 16/03/2017