Does robots.txt really block the transmission of PageRank and prevent indexing?

Official statement

URLs blocked by the robots.txt file can still receive PageRank through external links, and they can be indexed with minimal information even though they cannot be crawled.

41:48

🎥 Source video

Extracted from a Google Search Central video

Duration: 1h02 · Language: EN · Published: 07/03/2017 · 10 statements


Other statements from this video (9)

2:08 Does the Knowledge Graph really work without manual intervention from Google?
4:38 Can unintentional duplicate content really block your Panda recovery?
14:44 Do utility pages with lots of internal links really kill your SEO?
15:46 Do low-quality pages really undermine the authority of your whole site?
47:00 Does mobile speed really affect SEO rankings?
51:30 Does mobile-first indexing really inherit all desktop signals?
56:40 Will mobile speed finally become a Google ranking factor?
58:06 Is content in mobile tabs really indexed by Google?
59:10 Is site structure really enough to save your mobile indexing?

What you need to understand

Why can a URL blocked by robots.txt still be indexed?

The robots.txt file only prohibits crawling a URL. It does not prevent Google from discovering it. When an external site links to a page you've blocked, Google learns that the URL exists but cannot access its content. As a result, the page can appear in search results as a minimal listing with a note such as "No information is available for this page."
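
To make the split between discovery and crawling concrete, here is a minimal sketch built on Python's standard urllib.robotparser. The robots.txt content, the example.com URLs and the helper names build_parser and split_discovered are all hypothetical. Keep in mind that urllib.robotparser uses simple prefix matching where the first matching rule wins, and it does not support wildcards, so it only approximates how Googlebot evaluates robots.txt rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks everything under /private/ for all agents.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content held in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def split_discovered(urls: list[str], parser: RobotFileParser,
                     agent: str = "Googlebot") -> tuple[list[str], list[str]]:
    """Split URLs discovered through links into fetchable and blocked lists.

    Both lists remain *known* URLs: robots.txt only decides which of them
    may be fetched, not which of them exist.
    """
    fetchable = [url for url in urls if parser.can_fetch(agent, url)]
    blocked = [url for url in urls if not parser.can_fetch(agent, url)]
    return fetchable, blocked


# URLs found in the HTML of a hypothetical external page linking to the site.
discovered = [
    "https://example.com/blog/post",
    "https://example.com/private/offer",
]
fetchable, blocked = split_discovered(discovered, build_parser(ROBOTS_TXT))
print("fetch:", fetchable)
print("known but not fetched:", blocked)
```

The blocked URL drops out of the fetch list but remains in the set of known URLs. That known-but-unfetched state is what makes a minimal listing possible.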

This becomes a problem when someone believes robots.txt protects a page from indexing. Blocking does not prevent indexing. It only prevents Google from reading the content. If you really don't want a page to be indexed, use the noindex directive, either in a meta robots tag or in an HTTP header. That is the reliable method. But beware: Google can only read this noindex if it is able to crawl the page, so the page must not be blocked by robots.txt.

How does PageRank flow to a non-crawlable URL?

PageRank flows through links, whether or not Google can explore the destination page. When a site links to your blocked URL, Google registers this link in its link graph and transfers SEO juice, even if it cannot access the content of the target page.

This mechanism explains why disavowing links pointing to pages blocked by robots.txt changes nothing. Disavowing links serves to neutralize toxic juice, but if these links point to pages you have deliberately excluded from crawling, Google still considers them in its popularity calculations. The only way to completely prevent the transmission of PageRank is to physically remove the link or obtain a nofollow/sponsored/ugc attribute on it.
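
The sketch below shows why a target's crawlability plays no part at the moment PageRank flows. It uses the textbook PageRank transfer rule, not Google's production formula, which is not public. Each source page splits its damped score equally among its outgoing links, and an edge pointing to a robots.txt-blocked URL is just another edge. The page names, the starting scores and the 0.85 damping factor are illustrative assumptions.

```python
def one_step_transfer(links: dict[str, list[str]],
                      scores: dict[str, float],
                      damping: float = 0.85) -> dict[str, float]:
    """Return the rank each target receives in one textbook PageRank step.

    links maps a crawled source page to the URLs it links to. Whether a
    target is blocked by robots.txt never enters the calculation: only the
    edge matters.
    """
    received: dict[str, float] = {}
    for source, targets in links.items():
        if not targets:
            continue  # a page without outgoing links passes nothing in this step
        share = damping * scores[source] / len(targets)
        for target in targets:
            received[target] = received.get(target, 0.0) + share
    return received


# Two hypothetical external pages; "/private/offer" is blocked by robots.txt.
links = {
    "external-a": ["/private/offer", "/blog/post"],
    "external-b": ["/private/offer"],
}
scores = {"external-a": 1.0, "external-b": 1.0}
print(one_step_transfer(links, scores))
```

In this toy graph the blocked URL receives more rank than the crawlable one because it has more inbound edges. Nothing in the calculation checks whether the target can be fetched.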

What is the difference between robots.txt blocking and noindex?

Blocking with robots.txt says "do not crawl this page," while noindex says "do not display it in the results." These two directives act at different times in the indexing process and can conflict if not combined properly.

If you block a page with robots.txt AND add a noindex, Google will never be able to read the noindex directive, because it cannot access the page. As a result, the page may still get indexed if it receives external links. The best practice is to allow crawling so Google can read the noindex, and then monitor the deindexing. Blocking the page again with robots.txt afterwards is risky. Google would lose sight of the noindex, and backlinks could bring the URL back as a minimal listing.
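
The interaction between the two directives can be summarised as a small decision function. This is a simplification of the rules described in this article, not Google's actual pipeline. The PageConfig fields and the outcome labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageConfig:
    blocked_by_robots: bool  # robots.txt disallows the URL for Googlebot
    has_noindex: bool        # meta robots tag or X-Robots-Tag header carries noindex
    discoverable: bool       # linked from somewhere or listed in a sitemap


def likely_outcome(page: PageConfig) -> str:
    """Simplified outcome model following the rules described in this article."""
    if not page.discoverable:
        return "unknown to Google"
    if page.blocked_by_robots:
        # The page is never fetched, so a noindex on it is never read.
        return "may be indexed without content"
    if page.has_noindex:
        return "dropped from the index once recrawled"
    return "crawled and eligible for indexing"


# The problematic combination: robots.txt block + noindex + backlinks.
print(likely_outcome(PageConfig(blocked_by_robots=True, has_noindex=True, discoverable=True)))
```

The order of the checks carries the argument. The robots.txt test comes before the noindex test, so a blocked page never reaches the branch where its noindex would be honoured.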

Robots.txt blocks crawling, but not discovery or minimal indexing
PageRank flows through links even to non-crawlable URLs
Noindex requires crawling to be read and applied by Google
Disavowing links to pages blocked by robots.txt remains ineffective
Minimal indexing displays "No information available" in SERPs if backlinks exist

SEO Expert opinion

Does this statement confirm what we observe in the field?

Yes, absolutely. SEOs have been noticing for years that URLs blocked by robots.txt appear in search results with the note "No information available." Mueller's statement simply formalizes a behavior that has already been documented. What still surprises some practitioners is the persistence of PageRank flow to these non-crawlable pages.

In migrations or redesigns where entire sections remain accidentally blocked, we indeed observe that these pages continue to capture SEO juice without being able to redistribute it effectively. The crawl budget shifts elsewhere, but Google's link graph still remembers these URLs as active nodes. Essentially, this creates bottlenecks in your internal link architecture.

What nuances should we consider regarding this statement?

Mueller is vague on one point: to what extent can the PageRank transmitted to a blocked page then be redistributed? If Google never crawls the page, it never sees its outgoing links, so theoretically this juice should stagnate. [To be checked]: does Google model some form of default redistribution, or does PageRank remain frozen?
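
The open question can be framed with the textbook PageRank power iteration. In that model, a blocked page behaves like a "dangling" node whose outgoing links are unknown. The sketch compares two classic ways of handling such nodes: letting the rank they hold leak out of the system, or spreading it evenly over every page. Neither option is confirmed as Google's behaviour. The graph, the damping factor and the iteration count are assumptions.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50, dangling: str = "redistribute") -> dict[str, float]:
    """Textbook PageRank power iteration with a configurable dangling-node rule.

    dangling="leak": rank held by pages with no known outlinks is not passed on.
    dangling="redistribute": that rank is spread evenly over every page.
    """
    if dangling not in ("leak", "redistribute"):
        raise ValueError("dangling must be 'leak' or 'redistribute'")
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        dangling_mass = 0.0
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A robots.txt-blocked page is never crawled, so its outlinks are unknown.
                dangling_mass += damping * rank[node]
        if dangling == "redistribute":
            for node in nodes:
                new_rank[node] += dangling_mass / n
        rank = new_rank
    return rank


# Hypothetical site: "private" is blocked, so it has no known outgoing links.
site = {"home": ["blog", "private"], "blog": ["home"]}
for rule in ("leak", "redistribute"):
    result = pagerank(site, dangling=rule)
    print(rule, {page: round(score, 3) for page, score in result.items()},
          "total:", round(sum(result.values()), 3))
```

Under "leak", the blocked page keeps receiving rank but passes none on, so the total falls well below 1. This corresponds to the "frozen" scenario. Under "redistribute", the total stays at 1. In both cases the blocked page scores exactly like its crawlable sibling "blog", because both receive the same share from "home".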

Another nuance: the duration for which Google keeps a blocked URL in the index depends on the frequency of incoming links. A page with active backlinks will persist longer, while an orphan URL will gradually disappear. Thus, robots.txt does not guarantee deindexation, only degraded indexing if there are external signals.

When does this rule not fully apply?

If a URL is blocked by robots.txt AND Google has no other way to discover it (no external links, no internal links from navigation, no XML sitemap entry), Google simply has no means of detecting it. No discovery means no indexing, not even minimal indexing. The issue only arises when backlinks, internal links or XML sitemap entries reveal that these pages exist.

Another edge case: pages blocked by robots.txt but listed in an XML sitemap trigger errors in Search Console ("Submitted URL blocked by robots.txt"). The sitemap tells Google the URL is important, so Google tries to index it, but robots.txt forbids fetching it. You end up sending contradictory signals, which can also affect how the site's overall crawl budget is spent.

Warning: using robots.txt to block strategic pages that receive quality backlinks wastes PageRank. These pages capture juice without being able to redistribute it effectively to your target pages, which creates choke points in your SEO architecture.

Practical impact and recommendations

What should you do concretely if you really want to deindex a page?

First, remove the page from robots.txt if it is listed there, so that Google can crawl it and access its content. Then add a meta robots noindex tag in the <head>, or send an X-Robots-Tag: noindex HTTP header. Finally, use the URL Inspection tool in Search Console to check that Google can crawl the page and detects the directive.
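
Before you rely on the directive, you can check that a fetched response really carries it. The sketch below inspects the X-Robots-Tag header and the <meta name="robots"> and <meta name="googlebot"> tags using only Python's standard library. The function and class names and the sample markup are hypothetical. The sketch deliberately ignores header values prefixed with a user agent, such as "googlebot: noindex".

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases attribute names; valueless attributes come as None.
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").strip().lower() in ("robots", "googlebot"):
            content = attributes.get("content", "")
            self.directives.extend(part.strip().lower() for part in content.split(","))


def has_noindex(headers: dict[str, str], html: str) -> bool:
    """True if the X-Robots-Tag header or a robots meta tag says noindex."""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            if "noindex" in (part.strip().lower() for part in value.split(",")):
                return True
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return "noindex" in parser.directives


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex({}, page))
print(has_noindex({"X-Robots-Tag": "noindex"}, "<html></html>"))
```

This check only confirms that the directive is present. Google can act on it only if robots.txt allows the fetch.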

Monitor the deindexing with a site:yourdomain.com/target-page search. Once the page disappears from the results (this may take several weeks), leave it crawlable, because the noindex is what keeps it out of the index. Blocking it again with robots.txt to save crawl budget would hide the noindex from Google and could let the page return as a minimal listing if it receives backlinks.

What mistakes should you absolutely avoid with robots.txt and noindex?

Never block with robots.txt a page that contains a noindex. This is the most common configuration that leads to residual indexing. Google cannot read your noindex directive if robots.txt prevents it from accessing the page. The result is that the page remains indexed with a minimal description if it receives backlinks.

Also avoid using robots.txt to block intermediate pages in your strategic internal linking. These pages act as hubs that distribute PageRank to your target pages. If you block them, you break the flow of SEO juice and create algorithmic dead ends. Audit your robots.txt regularly to catch accidental blocks that sabotage your architecture.
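
If your own crawler records internal links while ignoring robots.txt, you can flag hubs that are blocked for Googlebot automatically. In this sketch, the link map, the /category/ rule and the threshold of 10 outgoing links are arbitrary assumptions. As before, urllib.robotparser only approximates Googlebot's matching rules.

```python
from urllib.robotparser import RobotFileParser


def blocked_hubs(internal_links: dict[str, list[str]], robots_txt: str,
                 min_outlinks: int = 10, agent: str = "Googlebot") -> list[tuple[str, int]]:
    """List robots.txt-blocked pages that link to many distinct internal URLs.

    Googlebot never sees the outgoing links of these pages, so the internal
    linking they carry is invisible to it.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    hubs = []
    for url, targets in internal_links.items():
        distinct = len(set(targets))
        if distinct >= min_outlinks and not parser.can_fetch(agent, url):
            hubs.append((url, distinct))
    # Largest hubs first: they hide the most internal links.
    return sorted(hubs, key=lambda hub: hub[1], reverse=True)


robots = "User-agent: *\nDisallow: /category/\n"
site_links = {
    "https://example.com/category/shoes": [f"https://example.com/product/{i}" for i in range(12)],
    "https://example.com/category/empty": ["https://example.com/product/1"],
    "https://example.com/guides/": [f"https://example.com/guide/{i}" for i in range(15)],
}
print(blocked_hubs(site_links, robots))
```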

How to audit your site to detect these issues?

Use an SEO crawler (Screaming Frog, Oncrawl, Botify) set to ignore robots.txt and compare it with a crawl that respects the directives. The URLs present solely in the first crawl are blocked but potentially discoverable by Google through backlinks. Cross-reference this list with your link profiles (Ahrefs, Majestic, Semrush) to identify blocked pages receiving external juice.
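
Once both crawls and the backlink export are loaded, the cross-referencing step is a set difference followed by a count. This sketch assumes you have already extracted plain URL lists from the tools' exports (column names vary by tool and are not shown). It also assumes URLs are normalised identically across sources, including protocol and trailing slashes. The function name is hypothetical.

```python
from collections import Counter


def blocked_pages_with_backlinks(full_crawl: set[str], compliant_crawl: set[str],
                                 backlinks: list[tuple[str, str]]) -> list[tuple[str, int]]:
    """Rank URLs found only by the robots.txt-ignoring crawl by backlink count.

    full_crawl: URLs found by a crawl that ignores robots.txt.
    compliant_crawl: URLs found by a crawl that respects robots.txt.
    backlinks: (source_url, target_url) pairs from a backlink export.
    """
    blocked_only = full_crawl - compliant_crawl
    counts = Counter(target for _, target in backlinks if target in blocked_only)
    return counts.most_common()


full = {"https://example.com/", "https://example.com/private/a", "https://example.com/private/b"}
compliant = {"https://example.com/"}
backlinks = [
    ("https://news.example/1", "https://example.com/private/a"),
    ("https://blog.example/2", "https://example.com/private/a"),
    ("https://forum.example/3", "https://example.com/private/b"),
    ("https://news.example/1", "https://example.com/"),
]
print(blocked_pages_with_backlinks(full, compliant, backlinks))
```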

In Search Console, open the Page indexing report (formerly the Coverage report) and look for URLs reported as blocked by robots.txt ("Submitted URL blocked by robots.txt" in the former Coverage report). These pages are often in your XML sitemap but forbidden from crawling, a classic conflicting signal. Clean up your sitemap so it no longer submits these URLs, or remove them from robots.txt if they should be indexed.
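
The sitemap clean-up can be scripted with the standard library. The sketch below reads the <loc> entries of a single sitemap (not a sitemap index), checks each one against robots.txt, and writes a minimal sitemap containing only the fetchable URLs. Any other tags, such as <lastmod>, are dropped. The function names and sample data are hypothetical, and urllib.robotparser only approximates Googlebot's matching rules.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def split_sitemap(sitemap_xml: str, robots_txt: str,
                  agent: str = "Googlebot") -> tuple[list[str], list[str]]:
    """Split the <loc> URLs of a sitemap into fetchable and robots.txt-blocked lists."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    root = ET.fromstring(sitemap_xml)
    keep: list[str] = []
    blocked: list[str] = []
    for loc in root.iter(f"{{{SITEMAP_NS}}}loc"):
        url = (loc.text or "").strip()
        if parser.can_fetch(agent, url):
            keep.append(url)
        else:
            blocked.append(url)
    return keep, blocked


def render_sitemap(urls: list[str]) -> str:
    """Serialize a minimal <urlset> containing only the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


sitemap = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/blog/post</loc></url>"
    "<url><loc>https://example.com/private/offer</loc></url>"
    "</urlset>"
)
robots = "User-agent: *\nDisallow: /private/\n"
keep, blocked = split_sitemap(sitemap, robots)
print("blocked entries to remove:", blocked)
print(render_sitemap(keep))
```

The blocked list tells you which entries conflict with robots.txt. For each one, decide whether it should leave the sitemap or leave robots.txt.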

Remove from robots.txt any page you really want to deindex
Add a noindex to these pages and check the crawl in Search Console
Never combine robots.txt blocking with a noindex directive
Regularly audit blocked pages that receive backlinks
Clean XML sitemaps of URLs blocked by robots.txt
Monitor deindexing through targeted site: searches

Robots.txt protects against neither indexing nor receiving PageRank. To deindex a page properly, use noindex and keep the page crawlable. Audit your configuration regularly to avoid directive conflicts that waste SEO juice. Keep monitoring it over time, especially if your crawl architecture has friction points or PageRank bottlenecks.

❓ Frequently Asked Questions

Can you effectively disavow links pointing to pages blocked by robots.txt?

No. Google counts these links in its link graph and passes PageRank through them even if the destination page is not crawlable. The disavow will have no effect, since Google processes these links before any crawl takes place.

Will a page blocked by robots.txt but carrying a noindex be deindexed?

No. Google can never read the noindex directive, because robots.txt prevents it from crawling the page. The page may remain indexed with a minimal description if it receives backlinks.

Can the PageRank received by a blocked page then be redistributed to other pages?

It's unclear. Google does not crawl the page, so it does not see its outgoing links. In theory the PageRank should stagnate, but Google could model some default redistribution. This point needs thorough testing.

Should you delete the robots.txt file entirely to avoid these problems?

No. robots.txt remains useful for managing crawl budget and blocking sections with no SEO value (admin, filters, etc.). The key is never to block strategic pages or pages that carry a noindex.

How long does it take for a blocked page to disappear completely from the index?

It depends on the frequency and volume of backlinks. A page with active links can persist for months or even years. Without external signals, gradual deindexing generally takes a few weeks to a few months.
