Official statement
Google confirms that a URL blocked by robots.txt can still receive PageRank through external links and can be indexed with minimal information. Blocking prevents crawling, but not discovery or the transmission of SEO juice. Disavowing links to pages blocked by robots.txt is therefore ineffective: Google still sees those links and includes them in its link graph.
What you need to understand
Why can a URL blocked by robots.txt still be indexed?
The robots.txt file only prohibits crawling of a URL, not its discovery. When an external site links to a page you've blocked, Google identifies this URL but cannot access its content. As a result, the page could appear in search results with a minimal description like "No information available for this page due to robots.txt".
This situation poses a problem when someone believes they are protecting a page from indexing via robots.txt. Blocking does not prevent indexing; it simply prevents Google from reading the content. If you really don't want a page to be indexed, the noindex directive in meta tags or HTTP headers remains the only reliable method. But beware: for Google to read this noindex, it must be able to crawl the page, so it should not be blocked by robots.txt.
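As a minimal sketch of what "blocked from crawling" means in practice, the snippet below uses Python's standard urllib.robotparser to evaluate a hypothetical robots.txt against two hypothetical URLs. The rules and the example.com paths are assumptions for illustration, and the standard-library parser does not reproduce every detail of Google's own rule matching. The check only answers "may this URL be fetched?"; it says nothing about whether the URL has been discovered through links.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A URL under /private/ cannot be crawled, but an external link can still reveal it.
blocked_allowed = is_crawl_allowed(
    ROBOTS_TXT, "Googlebot", "https://example.com/private/report.html"
)
# A URL outside the disallowed path can be crawled normally.
public_allowed = is_crawl_allowed(
    ROBOTS_TXT, "Googlebot", "https://example.com/blog/post.html"
)
print(blocked_allowed, public_allowed)
```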
How does PageRank flow to a non-crawlable URL?
PageRank flows through links, whether or not Google can explore the destination page. When a site links to your blocked URL, Google registers this link in its link graph and transfers SEO juice, even if it cannot access the content of the target page.
This mechanism explains why disavowing links pointing to pages blocked by robots.txt changes nothing. Disavowing links serves to neutralize toxic juice, but if these links point to pages you have deliberately excluded from crawling, Google still considers them in its popularity calculations. The only way to completely prevent the transmission of PageRank is to physically remove the link or obtain a nofollow/sponsored/ugc attribute on it.
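The flow described in this section can be illustrated with a toy PageRank computation. This is a sketch of the textbook power-iteration formulation, not Google's production algorithm. The page names and link graph are hypothetical, and spreading the score of pages without known outgoing links evenly across the graph is an assumption of this model, not something the statement confirms.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration on a dict of page -> outgoing links."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Assumption: score held by pages with no known outgoing links
        # is spread evenly over all pages.
        dangling = sum(rank[page] for page in pages if not links.get(page))
        new_rank = {page: (1 - damping) / n + damping * dangling / n for page in pages}
        for source, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[source] / len(targets)
        rank = new_rank
    return rank


# Hypothetical link graph. "blocked-page" has no entry of its own because
# it is never crawled, so its outgoing links are unknown.
LINKS = {
    "external-a": ["blocked-page"],
    "external-b": ["blocked-page", "home"],
    "home": ["external-a"],
}

scores = pagerank(LINKS)
for page, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

In this model the blocked page accumulates score purely from its incoming links, even though nothing about its content or outgoing links is known.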
What is the difference between robots.txt blocking and noindex?
Blocking with robots.txt says "do not crawl this page," while noindex says "do not display it in the results." These two directives act at different times in the indexing process and can conflict if not combined properly.
If you block a page with robots.txt AND add a noindex, Google will never be able to read the noindex directive since it cannot access the page. The result is that the page may still get indexed if it receives external links. The best practice is to temporarily allow crawling so Google reads the noindex, then monitor the deindexing before potentially blocking with robots.txt.
- Robots.txt blocks crawling, but not discovery or minimal indexing
- PageRank flows through links even to non-crawlable URLs
- Noindex requires crawling to be read and applied by Google
- Disavowing links to pages blocked by robots.txt remains ineffective
- Minimal indexing displays "No information available" in SERPs if backlinks exist
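The rules summarized above can be written down as a small decision function. This is only an encoding of the behavior described in this article, with hypothetical names (Outcome, expected_outcome); real outcomes depend on Google's systems and timing.

```python
from enum import Enum


class Outcome(Enum):
    NOT_DISCOVERED = "not discovered, so not indexed"
    INDEXED_MINIMAL = "may be indexed with minimal information"
    DEINDEXED = "noindex can be read and applied"
    INDEXED = "can be crawled and indexed normally"


def expected_outcome(blocked_by_robots: bool, has_noindex: bool, is_discoverable: bool) -> Outcome:
    """Map a page's configuration to the outcome described in the article."""
    if not is_discoverable:
        # No link and no sitemap entry: the URL is never found.
        return Outcome.NOT_DISCOVERED
    if blocked_by_robots:
        # The page is never fetched, so any noindex on it is never read.
        return Outcome.INDEXED_MINIMAL
    if has_noindex:
        return Outcome.DEINDEXED
    return Outcome.INDEXED


# The problematic combination: blocked AND noindex, with backlinks.
print(expected_outcome(blocked_by_robots=True, has_noindex=True, is_discoverable=True).value)
```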
SEO Expert opinion
Does this statement confirm what we observe in the field?
Yes, absolutely. SEOs have been noticing for years that URLs blocked by robots.txt appear in search results with the note "No information available." Mueller's statement simply formalizes a behavior that has already been documented. What still surprises some practitioners is the persistence of PageRank flow to these non-crawlable pages.
In migrations or redesigns where entire sections remain accidentally blocked, we indeed observe that these pages continue to capture SEO juice without being able to redistribute it effectively. The crawl budget shifts elsewhere, but Google's link graph still remembers these URLs as active nodes. Essentially, this creates bottlenecks in your internal link architecture.
What nuances should we consider regarding this statement?
Mueller is vague on one point: to what extent can the PageRank transmitted to a blocked page then be redistributed? If Google never crawls the page, it never sees its outgoing links, so theoretically this juice should stagnate. [To be checked]: does Google model some form of default redistribution, or does PageRank remain frozen?
Another nuance: how long Google keeps a blocked URL in the index depends on how actively the URL is still being linked to. A page with active backlinks will persist longer, while an orphan URL will gradually disappear. Thus, robots.txt does not guarantee deindexing; it only leads to degraded indexing when external signals exist.
When does this rule not fully apply?
If you block a URL with robots.txt AND nothing points to it (no external links, no internal links in the navigation, no sitemap entry), Google simply has no means to detect it. No discovery means no indexing, even minimal. The issue only arises when backlinks or mentions in an XML sitemap reveal the existence of these pages.
Another edge case: pages blocked by robots.txt but present in an XML sitemap generate errors in Search Console ("Submitted URL blocked by robots.txt"). Google attempts indexing because you mark the URL as important, but the robots.txt blocks it. The result is algorithmic confusion and conflicting signals that can affect the site's overall crawl budget.
Practical impact and recommendations
What should you do concretely if you really want to deindex a page?
First, remove the robots.txt rule that blocks the page, if there is one. Temporarily allow crawling so Google can access the content. Then, add a meta robots noindex tag in the <head> or send the HTTP header X-Robots-Tag: noindex. Check in Search Console that Google crawls the page and detects the directive.
Monitor the deindexing with a site:yourdomain.com/target-page search. Once the page disappears from the results (this may take several weeks), you may optionally block it again with robots.txt if you want to conserve crawl budget, but this is not required: the noindex is sufficient to keep the page out of the index.
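Before waiting for Google to recrawl, it can help to verify that the page actually exposes a noindex. The sketch below checks an HTML string and a dictionary of response headers using only the standard library. The class and function names are hypothetical, and the check is deliberately simple: it does not handle user-agent-prefixed X-Robots-Tag values such as "googlebot: noindex".

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() in {"robots", "googlebot"}:
            for token in attributes.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def has_noindex(html: str, headers: dict) -> bool:
    """Return True if noindex appears in a robots meta tag or in X-Robots-Tag."""
    parser = RobotsMetaParser()
    parser.feed(html)
    header_value = ""
    for name, value in headers.items():
        # Header names are case-insensitive.
        if name.lower() == "x-robots-tag":
            header_value = value
    header_tokens = {token.strip().lower() for token in header_value.split(",")}
    return "noindex" in parser.directives or "noindex" in header_tokens


# Hypothetical page markup and headers, for illustration only.
page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex(page, {"Content-Type": "text/html"}))
```

In a real audit the HTML and headers would come from fetching the live URL; they are passed in as plain values here so the check stays independent of any HTTP client.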
What mistakes should you absolutely avoid with robots.txt and noindex?
Never block with robots.txt a page that contains a noindex. This is the most common configuration that leads to residual indexing. Google cannot read your noindex directive if robots.txt prevents it from accessing the page. The result is that the page remains indexed with a minimal description if it receives backlinks.
Also avoid using robots.txt to block intermediate pages in your strategic internal linking. These pages serve as hubs to distribute PageRank to your target pages. If you block them, you break the flows of SEO juice and create algorithmic dead ends. Regularly audit your robots.txt to identify these accidental blocks that sabotage your architecture.
How to audit your site to detect these issues?
Use an SEO crawler (Screaming Frog, Oncrawl, Botify) set to ignore robots.txt and compare the result with a crawl that respects the directives. The URLs present only in the first crawl are blocked but potentially discoverable by Google through backlinks. Cross-reference this list with your link profile data (Ahrefs, Majestic, Semrush) to identify blocked pages receiving external juice.
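The comparison described above boils down to a set difference followed by a lookup. The sketch below assumes you have already exported two URL lists from your crawler and a mapping of URL to backlink count from your link tool; the function name and sample data are hypothetical, and real exports would need to be loaded from their own file formats.

```python
def blocked_pages_with_backlinks(crawl_ignoring_robots, crawl_respecting_robots, backlink_counts):
    """Return (url, backlinks) pairs for blocked URLs that receive external links."""
    # URLs found only when robots.txt is ignored are the blocked ones.
    blocked = set(crawl_ignoring_robots) - set(crawl_respecting_robots)
    flagged = {
        url: backlink_counts[url]
        for url in blocked
        if backlink_counts.get(url, 0) > 0
    }
    # Most linked first, then alphabetical for a stable order.
    return sorted(flagged.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical exports, for illustration only.
crawl_all = ["/a", "/private/x", "/private/y", "/b"]
crawl_allowed = ["/a", "/b"]
backlinks = {"/private/x": 12, "/a": 40, "/private/y": 0}

audit = blocked_pages_with_backlinks(crawl_all, crawl_allowed, backlinks)
print(audit)
```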
In Search Console, check the Coverage report and filter for the "Submitted URL blocked by robots.txt" error. These pages are often in your XML sitemap but forbidden from crawling, a classic conflicting signal. Clean up your sitemap to no longer submit these URLs, or remove them from robots.txt if they should be indexed.
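The same sitemap conflict can be caught before Google reports it. The following sketch parses a sitemap with the standard library and lists every submitted URL that the robots.txt rules disallow. The sitemap, the rules, and the function name are hypothetical, and Python's robots.txt parser is only an approximation of Google's matching behavior.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls_blocked(sitemap_xml: str, robots_txt: str, user_agent: str = "Googlebot"):
    """Return the sitemap URLs that robots.txt forbids user_agent to crawl."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    root = ET.fromstring(sitemap_xml)
    urls = [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical inputs, for illustration only.
SITEMAP = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/private/report.html</loc></url>
</urlset>
"""
ROBOTS = "User-agent: *\nDisallow: /private/\n"

conflicts = sitemap_urls_blocked(SITEMAP, ROBOTS)
print(conflicts)
```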
- Remove from robots.txt any page you really want to deindex
- Add a noindex to these pages and check the crawl in Search Console
- Never combine robots.txt blocking with a noindex directive
- Regularly audit blocked pages that receive backlinks
- Clean XML sitemaps of URLs blocked by robots.txt
- Monitor deindexing through targeted site: searches
❓ Frequently Asked Questions
Can you effectively disavow links pointing to pages blocked by robots.txt?
Will a page blocked by robots.txt but carrying a noindex be deindexed?
Can the PageRank received by a blocked page then be redistributed to other pages?
Should you remove the robots.txt file entirely to avoid these problems?
How long does it take for a blocked page to disappear completely from the index?
🎥 From the same video 9
Other SEO insights extracted from this same Google Search Central video · duration 1h02 · published on 07/03/2017