Official statement
Google states that a URL blocked by robots.txt cannot be crawled, thus the noindex tag remains invisible to the engine. The URL Removal Tool does not affect crawling or indexing; it is merely a temporary cache. Essentially, you must choose: either block access (robots.txt) or allow crawling for Google to read your indexing directives (noindex). The two approaches are incompatible.
What you need to understand
What's the difference between blocking with robots.txt and deindexing with noindex?
The robots.txt file acts as a barrier upstream: it outright denies a crawler access to a URL. Googlebot stops before even downloading the HTML content. The result: no analysis of the page, no reading of meta tags, no detection of noindex or canonical directives.
The noindex directive, on the other hand, requires the bot to access the page and read either the HTTP response headers (X-Robots-Tag) or the HTML itself (meta robots tag). It is a post-crawl instruction: "Okay, you can read this page, but don't index it." If you block the URL upstream, Google will never see this instruction. The page may remain indexed: orphaned, stagnant, with the snippet "A description for this result is not available because of this site's robots.txt."
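To make the precedence concrete, here is a minimal Python sketch (hypothetical URLs and rules) using the standard library's robots.txt parser: once a path is disallowed, a compliant crawler never downloads the HTML, so any noindex inside it can never be read.

```python
from urllib import robotparser

# Hypothetical robots.txt served at https://example.com/robots.txt
robots_txt = """User-agent: *
Disallow: /staging/
"""

rp = robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

url = "https://example.com/staging/old-page.html"

if not rp.can_fetch("Googlebot", url):
    # The crawler stops here: the HTML is never downloaded, so a
    # <meta name="robots" content="noindex"> inside the page can never
    # be read. The URL may stay indexed with a bare snippet.
    print("Blocked by robots.txt: any noindex in the HTML is invisible")
else:
    print("Crawl allowed: the noindex directive can be discovered")
```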
Why doesn't the URL Removal Tool change anything about indexing?
The URL Removal Tool in Search Console only hides URLs temporarily, for 90 days. It removes a URL from the search results without altering crawling or the underlying index status. It's an emergency band-aid, not a long-term solution.
Google will continue to crawl the URL according to its usual schedule unless a robots.txt or noindex clearly instructs it otherwise. The tool does not change either the crawl frequency or the actual indexing status. Once the 90 days are up, if nothing has changed on the server side, the page reappears in the SERPs.
What happens if I use robots.txt AND noindex simultaneously?
You create a technical conflict. The robots.txt blocks access, so Google never reads the noindex. The page potentially remains indexed with a degraded snippet. This is a common scenario on poorly configured sites: old staging URLs blocked by robots.txt, carrying a noindex in HTML that Google never gets to read.
Google always prioritizes robots.txt first. If the file says "Disallow," the crawler won't go any further. The noindex becomes moot. To properly deindex, you must allow crawling (remove the Disallow line) and let the bot discover the noindex directive over a few crawl cycles.
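A quick way to verify that a noindex is actually discoverable once the Disallow line is gone is to fetch the URL yourself and check the two places Google looks at. A minimal sketch, standard library only, with a hypothetical URL (a real audit tool would also handle redirects, HTTP errors and arbitrary attribute order in the meta tag):

```python
import re
import urllib.request

def noindex_is_visible(url: str) -> bool:
    """Return True if a crawler that is allowed to fetch this URL
    would find a noindex directive on it."""
    req = urllib.request.Request(url, headers={"User-Agent": "seo-audit-sketch"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        # 1. noindex sent as an HTTP response header (X-Robots-Tag)
        if "noindex" in resp.headers.get("X-Robots-Tag", "").lower():
            return True
        # 2. noindex sent as a meta robots tag in the HTML
        html = resp.read().decode("utf-8", errors="replace")
    # Simplified pattern: assumes name="robots" appears before content="..."
    meta = re.search(
        r'<meta[^>]+name=["\']robots["\'][^>]+content=["\']([^"\']*)["\']',
        html, re.IGNORECASE)
    return bool(meta and "noindex" in meta.group(1).lower())

# Hypothetical staging URL that should be deindexed:
# print(noindex_is_visible("https://example.com/staging/old-page.html"))
```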
- The robots.txt blocks crawl access, preventing any HTML reading.
- The noindex requires a crawl to be detected and applied.
- The removal tool is temporary (90 days) and affects neither crawling nor the underlying index status.
- Combining robots.txt + noindex creates a technical conflict where the noindex remains invisible.
- To properly deindex: allow crawl, let Google read the noindex, then block if necessary.
SEO Expert opinion
Does this statement reflect real-world observations?
Yes, and it's even a classic in SEO audits. We regularly find sites with thousands of URLs blocked by robots.txt but still indexed, displaying the infamous snippet “Description not available.” Google can't read the noindex, so it keeps the URL indexed by default — especially if there are backlinks pointing to it.
The problem gets worse during migrations or redesigns. A URL blocked by robots.txt for months, then unblocked, can take several weeks to be recrawled if the crawl budget is tight. In the meantime, it stays indexed with outdated or empty content. The result: index pollution, potential cannibalization, crawl budget dilution.
When is the robots.txt justifiable for blocking indexing?
Rarely. The robots.txt is mainly used to save crawl budget: infinite facets, dynamic URL parameters, admin areas, unnecessary resources. But to deindex an indexable page (legitimate content you simply don’t want in the SERPs), a noindex is cleaner.
The only case where robots.txt plus residual indexing is tolerable: PDFs or downloadable files you want to keep visible in the index without letting Google crawl their content. External links pointing to them still contribute to the URL's authority. [To be verified] depending on the type of site: some sectors (legal, medical) prefer to block crawling of sensitive documents entirely, even if it sacrifices SEO.
What should I do if a page blocked by robots.txt remains indexed?
First step: remove the Disallow directive from the robots.txt for that URL or directory. Next, add a clean noindex (meta tag or HTTP header X-Robots-Tag). Wait for Googlebot to recrawl — this can take anywhere from a few days to several weeks depending on the crawl budget.
In parallel, use the URL Removal Tool to speed up removal from the visible SERPs, but never rely on it alone. Check in Search Console that the status changes to "Excluded by 'noindex' tag". If nothing changes after 4-6 weeks, request a recrawl via the URL Inspection tool or submit an XML sitemap containing the URL (counterintuitive, but it works to trigger a quick crawl).
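As a sketch of the sitemap trick mentioned above, here is a minimal script that writes an XML sitemap listing the (hypothetical) URLs you want recrawled; you would then submit the file in Search Console's Sitemaps report or reference it in robots.txt with a Sitemap: line.

```python
from xml.sax.saxutils import escape

# Hypothetical URLs that now carry a noindex and need a quick recrawl
urls_to_recrawl = [
    "https://example.com/staging/old-page.html",
    "https://example.com/old-category/",
]

entries = "\n".join(
    f"  <url><loc>{escape(u)}</loc></url>" for u in urls_to_recrawl
)
sitemap = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
    f"{entries}\n"
    "</urlset>\n"
)

# Write the file, then submit it in Search Console or reference it in robots.txt
with open("recrawl-sitemap.xml", "w", encoding="utf-8") as f:
    f.write(sitemap)
```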
Practical impact and recommendations
How to audit robots.txt / noindex conflicts on a site?
Export the list of URLs blocked by robots.txt from your file (or via Screaming Frog in "List" mode). Cross-reference this list with the URLs actually indexed in Google: use a site:example.com query and filter manually, or combine a Search Console "Coverage > Excluded" export with a Screaming Frog crawl run with "Respect robots.txt" turned off.
Look for URLs that appear both as "Blocked by robots.txt" and as "Indexed": these are your critical conflicts. Check whether a noindex is present in the HTML or the HTTP headers; if so, it is currently invisible to Google. Then decide for each URL: proper deindexing (remove the Disallow rule, keep the noindex) or permanent blocking (keep the robots.txt rule, accept residual indexing).
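One possible way to automate this cross-referencing, sketched in Python with hypothetical file and site names: read an export of indexed URLs (here assumed to be a one-column indexed.csv), test each URL against the live robots.txt, and flag those that are both indexed and blocked. Note that urllib.robotparser only does prefix matching and ignores Google's `*` and `$` wildcard extensions, so wildcard rules must still be reviewed by hand.

```python
import csv
from urllib import robotparser

# Hypothetical site; the robots.txt is fetched and parsed directly.
rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

conflicts = []
# indexed.csv: one indexed URL per row, first column (e.g. built from a
# Search Console export or a site: sample).
with open("indexed.csv", newline="", encoding="utf-8") as f:
    for row in csv.reader(f):
        if not row:
            continue
        url = row[0].strip()
        if url.startswith("http") and not rp.can_fetch("Googlebot", url):
            # Indexed AND blocked: any noindex on this page is invisible
            conflicts.append(url)

print(f"{len(conflicts)} indexed URLs blocked by robots.txt")
for url in conflicts:
    print(" -", url)
```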
Which method to choose based on content type?
For sensitive or private content: password-protected, paywall, or server-side blocking (401/403). Never rely solely on robots.txt or noindex — a link leak can index the page. For duplicate or low-value content: noindex + canonical if relevant, never robots.txt (Google needs to read your directives).
For technical resources (CSS, JS, images): block NOTHING with robots.txt since 2015 — Google needs these resources for rendering and Core Web Vitals. For facets or URL parameters: use robots.txt if crawl budget is tight + canonical on the main version. For obsolete or archived pages: 301 or 410 depending on the case, never just robots.txt.
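Before deploying a robots.txt meant to save crawl budget, a small sanity check like the following (hypothetical rules and URLs) can confirm that facet paths are blocked while CSS/JS assets stay crawlable. It assumes facets live under a path prefix such as /filter/; parameter-based facets need Google's `*` wildcard syntax, which urllib.robotparser does not evaluate.

```python
from urllib import robotparser

# Hypothetical candidate robots.txt, prefix rules only
candidate_robots = """User-agent: *
Disallow: /filter/
Disallow: /search/
"""

rp = robotparser.RobotFileParser()
rp.parse(candidate_robots.splitlines())

# Expected crawlability per URL: facets blocked, assets and normal pages allowed
checks = {
    "https://example.com/filter/red/size-42/": False,
    "https://example.com/assets/app.css": True,
    "https://example.com/assets/app.js": True,
    "https://example.com/category/shoes/": True,
}

for url, should_be_allowed in checks.items():
    allowed = rp.can_fetch("Googlebot", url)
    status = "OK" if allowed == should_be_allowed else "REVIEW"
    print(f"{status:6} allowed={allowed!s:5} {url}")
```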
What critical errors should be absolutely avoided?
Never block a URL with robots.txt if you want to deindex it properly: that is the recipe for zombie indexing. Never use the URL Removal Tool as a long-term solution: it hides results for 90 days, it is not an indexing directive.
Avoid blanket-blocking /wp-admin/ or /wp-includes/ on WordPress: some plugins load critical CSS/JS from these directories, and blocking them can degrade mobile rendering. Finally, never remove a robots.txt line without assessing the crawl impact: unblocking 50,000 facet URLs at once can saturate your server and dilute the crawl budget away from strategic pages.
- Export URLs blocked by robots.txt and cross-reference with indexed URLs (Search Console + crawl)
- To deindex: remove robots.txt, add noindex, wait for recrawl, check status “Excluded by noindex”
- To block crawl without deindexing: accept residual indexing or use a real server restriction (401/403)
- Never block critical CSS/JS/images — Google needs them for rendering and Core Web Vitals
- Use the URL Removal Tool only as an urgent temporary measure, never as a long-term strategy
- Test all robots.txt changes on a sample before global deployment (crawl budget risk)
❓ Frequently Asked Questions
Can robots.txt be used to deindex a page quickly?
Does the URL Removal Tool replace noindex?
What should you do if a URL blocked by robots.txt is still indexed after several months?
Does blocking CSS/JS resources with robots.txt impact SEO?
Can robots.txt and a canonical be combined on the same URL?
Source: Google Search Central video (43 min), published on 23 August 2019, available on YouTube.