Official statement
Google cannot see the noindex tag if you block the URL in robots.txt, so the page can remain indexed despite your directive. This classic configuration error creates a technical conflict: crawling is denied before Googlebot can even read the HTML. Google's suggested alternative is the URL Parameters Tool, which controls crawling without compromising de-indexing.
What you need to understand
What is the technical conflict between robots.txt and noindex?
The robots.txt acts like a door lock: it prevents Googlebot from entering a URL. If you block access, the bot never downloads the HTML of the page.
However, the noindex directive lives in the page response itself: either a meta robots tag in the <head> of the HTML or an X-Robots-Tag HTTP header. To read it, Google must first crawl the page. Blocking in robots.txt is like locking the door before the bot can read the "do not index" sign inside.
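To make the two locations of the directive concrete, here is a minimal sketch that looks for noindex in both places. The names `RobotsMetaParser` and `has_noindex` are illustrative, not part of any Google tooling, and the check is deliberately simplified (it ignores, for example, per-crawler scoping rules).

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            content = attr.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]


def has_noindex(html, headers=None):
    """Return True if noindex appears in the HTML or in an X-Robots-Tag header."""
    parser = RobotsMetaParser()
    parser.feed(html)
    if "noindex" in parser.directives:
        return True
    # Simplified header check: any X-Robots-Tag value mentioning noindex counts.
    for name, value in (headers or {}).items():
        if name.lower() == "x-robots-tag" and "noindex" in value.lower():
            return True
    return False
```

Either way, the directive only exists inside the response, which is why the crawler has to be allowed to fetch the page before it can obey it.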
What happens concretely if we combine both directives?
Googlebot encounters the robots.txt block and stops before fetching the page. If the URL is known from links, Google can still index it, and Search Console flags it as blocked by robots.txt. The page then typically appears in the results without a snippet or a proper title, just the naked URL.
Worse: if the URL was already indexed before blocking, it may remain indefinitely. Google will not come back to check the noindex tag since you are denying access. The status stays frozen.
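The order of operations can be sketched with Python's standard `urllib.robotparser`. This is only an illustration of the logic described above, not a model of Googlebot itself: the function name `crawl_outcome`, the example domain, and the returned labels are assumptions made for the sketch.

```python
from urllib.robotparser import RobotFileParser


def crawl_outcome(robots_txt, url, page_has_noindex, user_agent="Googlebot"):
    """Describe what a robots.txt-respecting crawler can learn about a URL."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    # robots.txt is evaluated first: a disallowed URL is never downloaded,
    # so whatever the page says about indexing stays unread.
    if not rules.can_fetch(user_agent, url):
        return "blocked: noindex never read"
    if page_has_noindex:
        return "crawled: noindex read, page dropped from index"
    return "crawled: page eligible for indexing"


robots = "User-agent: *\nDisallow: /private/\n"
print(crawl_outcome(robots, "https://example.com/private/a.html", True))
print(crawl_outcome(robots, "https://example.com/public/a.html", True))
```

Note that `page_has_noindex` has no influence on the first branch: once the URL is disallowed, the directive might as well not exist.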
What alternative does Google propose to reduce crawling?
The URL Parameters Tool in Search Console allows you to inform Google that a parameter does not generate unique content. Example: ?sessionID=, ?utm_source=, ?color=.
You indicate that these variations do not need to be crawled intensively. Google adjusts its behavior without blocking access — the noindex directive remains readable if it exists. It’s a fine-tuning of crawl budget, not a brute lock.
- Robots.txt blocks access before reading the HTML — the noindex tag becomes invisible
- Noindex alone allows crawling but prohibits indexing — this is the correct configuration
- URL Parameters reduce crawling of variants without preventing reading of directives
- A page blocked by robots.txt may appear in the index with the naked URL, without a snippet
- If an indexed URL is then blocked by robots.txt, it may remain indefinitely without an update
SEO Expert opinion
Is this statement consistent with real-world observations?
Yes, and it’s a classic of SEO audits. Regularly, sites layer Disallow and noindex on the same URLs — often out of overzealousness. The webmaster wants to "be sure" that the page is not indexed, so they pile on the directives.
The problem is that Google Search Console keeps reporting these URLs as "Blocked by robots.txt" in the coverage report. These pages sometimes appear in the SERP as naked URLs, a classic symptom of this conflict. The crawl logs confirm it: Googlebot fetches robots.txt, finds the Disallow rule, and never requests the page itself, so there is no hit in which it could have read the HTML.
Is the URL Parameters Tool really the optimal solution?
It’s a recommendation from Google, but how actively this tool is maintained remains to be checked. Google has already deprecated several Search Console tools without warning, and the URL Parameters Tool has not evolved in years.
In practice, canonical tags and clean sitemaps often do a better job. If you have 50,000 pagination or facet URLs, a consistent canonicalization strategy is better than a hack in a Google tool that could vanish overnight. The tool remains useful for edge cases — sessions, tracking, minor variants — but it’s not a magic wand.
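As an illustration of what a consistent canonicalization rule can look like, the sketch below strips parameters that do not change the content before the URL is written into a canonical tag or sitemap. The list in `IGNORED_PARAMS` is an assumption for the example; which parameters are safe to drop depends entirely on your site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of parameters that never change the page content.
IGNORED_PARAMS = {"sessionid", "utm_source", "utm_medium", "utm_campaign"}


def canonical_url(url):
    """Return the URL without ignored parameters and without a fragment."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


print(canonical_url("https://example.com/shoes?sessionID=abc&page=2&utm_source=news"))
```

Parameters that do change the content, such as `page` here, are kept, so each real variant still gets its own canonical URL.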
In what cases does this rule not apply?
If you want to permanently remove a URL from the index AND prevent crawling, the correct sequence is: (1) keep the page accessible with noindex while Google removes it, (2) monitor Search Console until complete disappearance, (3) block in robots.txt afterwards if necessary.
Another case: sensitive files (personal data PDFs, admin areas, etc.). Here, robots.txt is not enough: you need an X-Robots-Tag: noindex HTTP header combined with server authentication. Relying solely on robots.txt to protect sensitive content is a mistake, because the URL can be discovered through other means (backlinks, shares) and appear in the index without the content being crawled.
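A minimal WSGI sketch of that combination might look like the following. The prefixes in `SENSITIVE_PREFIXES` are hypothetical, and the authentication check is reduced to "an Authorization header is present", which is a placeholder and not real credential validation.

```python
# Hypothetical path prefixes that should never be indexed or publicly readable.
SENSITIVE_PREFIXES = ("/admin/", "/exports/")


def app(environ, start_response):
    """Tiny WSGI app: noindex header plus an authentication gate on sensitive paths."""
    path = environ.get("PATH_INFO", "/")
    headers = [("Content-Type", "text/plain; charset=utf-8")]
    if path.startswith(SENSITIVE_PREFIXES):
        # The header works for any file type, including PDFs with no <head>.
        headers.append(("X-Robots-Tag", "noindex"))
        # Placeholder check only: a real app must validate the credentials.
        if "HTTP_AUTHORIZATION" not in environ:
            headers.append(("WWW-Authenticate", 'Basic realm="private"'))
            start_response("401 Unauthorized", headers)
            return [b"Authentication required\n"]
    start_response("200 OK", headers)
    return [b"OK\n"]
```

The authentication gate is what actually protects the content; the header only tells a crawler that manages to reach the URL not to index it.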
Practical impact and recommendations
What should you do if you combine robots.txt and noindex?
First, identify the affected URLs. In Google Search Console, open the Coverage report and look for pages "Blocked by robots.txt". Export the list, then cross-reference it with your sitemap or CMS to spot the URLs that also carry a noindex.
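The cross-referencing step is a simple set intersection. In this sketch the column name `URL` is an assumption about the export format (check the header of your own file), and `noindex_urls` stands for whatever list your CMS or crawler can produce.

```python
import csv
import io


def load_urls(csv_text, column="URL"):
    """Read one column of URLs from CSV text; the column name is an assumption."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[column].strip() for row in reader if row.get(column)}


def conflicting_urls(blocked_csv, noindex_urls):
    """URLs that are both blocked by robots.txt and marked noindex."""
    return sorted(load_urls(blocked_csv) & set(noindex_urls))


blocked_export = "URL,Last crawled\nhttps://example.com/a,2020-10-01\nhttps://example.com/b,2020-10-02\n"
noindex_pages = ["https://example.com/b", "https://example.com/c"]
print(conflicting_urls(blocked_export, noindex_pages))
```

The resulting list is the set of URLs where the noindex is currently unreadable and the robots.txt rule needs to be lifted.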
Next, remove the robots.txt block for these URLs. Leave the noindex in place. Submit the URLs via the Search Console URL inspection tool to force a re-crawl. Monitor the coverage report: the pages should move from "Blocked" to "Excluded (noindex)" within 2 to 4 weeks depending on crawl frequency.
What mistakes should you avoid during the fix?
Do not remove all your robots.txt rules at once if you have thousands of them. Proceed in segments: identify patterns (e.g., /admin/*, /?sessionid=*) and test with a sample before global deployment.
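One way to test a segment before deploying it is to compare the current and the proposed robots.txt against a sample of URLs. The sketch below uses Python's standard `urllib.robotparser`, which matches plain path prefixes and, to my knowledge, does not implement Google's `*` and `$` wildcard extensions, so it is only suitable for prefix rules such as `/admin/`. The function name `diff_access` and the example rules are illustrative.

```python
from urllib.robotparser import RobotFileParser


def _parse(robots_txt):
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def diff_access(old_robots, new_robots, sample_urls, user_agent="Googlebot"):
    """Return the sample URLs whose crawlability changes between two rule sets."""
    old, new = _parse(old_robots), _parse(new_robots)
    changes = {}
    for url in sample_urls:
        before = old.can_fetch(user_agent, url)
        after = new.can_fetch(user_agent, url)
        if before != after:
            changes[url] = "unblocked" if after else "newly blocked"
    return changes


old_rules = "User-agent: *\nDisallow: /admin/\nDisallow: /search/\n"
new_rules = "User-agent: *\nDisallow: /admin/\n"
sample = [
    "https://example.com/search/q",
    "https://example.com/admin/users",
    "https://example.com/blog/",
]
print(diff_access(old_rules, new_rules, sample))
```

If the diff contains URLs you did not expect to change, the segment is wider than you thought and should be narrowed before going live.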
Another pitfall: removing the noindex too soon. If you lift the robots.txt block AND remove the noindex simultaneously, Google will index pages you wanted to exclude. Keep the noindex active, lift robots.txt, wait for complete de-indexing, then decide if you want to allow indexing or maintain the noindex.
How can you verify that your configuration is correct?
Use the robots.txt testing tool in Search Console: paste a URL, check that it is not blocked. Then inspect the URL with the inspection tool: Google should be able to crawl the page and detect the noindex tag in the "Coverage" tab.
On the server logs side, filter Googlebot requests: if you see 200 OK with Googlebot user-agent but the URL remains "Blocked" in Search Console, it means there is a cache delay or a dynamic robots.txt rule causing the issue. Compare robots.txt locally vs. what Googlebot sees (using tools like Screaming Frog + rendering).
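For the log check, a small filter is usually enough. This sketch assumes the common "combined" access log format, where the user agent is the last quoted field; adapt the `LOG_LINE` pattern if your server logs differently. It also trusts the user-agent string, which can be spoofed, so treat the result as a first pass.

```python
import re

# Assumes the "combined" log format: "METHOD path PROTO" status size "referer" "agent"
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) .*"(?P<agent>[^"]*)"$'
)


def googlebot_hits(log_lines):
    """Return (path, status) for every request whose user agent mentions Googlebot."""
    hits = []
    for line in log_lines:
        match = LOG_LINE.search(line.strip())
        if match and "Googlebot" in match.group("agent"):
            hits.append((match.group("path"), int(match.group("status"))))
    return hits


sample_log = [
    '66.249.66.1 - - [29/Oct/2020:10:00:00 +0000] "GET /private/page.html HTTP/1.1" 200 5123 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [29/Oct/2020:10:00:05 +0000] "GET /blog/ HTTP/1.1" 200 812 "-" "Mozilla/5.0"',
]
print(googlebot_hits(sample_log))
```

A URL that Search Console still reports as blocked but that shows up here with a 200 is the case worth investigating; a blocked URL that never appears in the hits is behaving as robots.txt tells it to.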
- Export the "Blocked by robots.txt" URLs from Search Console and cross-reference with noindex pages
- Remove the robots.txt block for noindex URLs, without touching the noindex itself
- Submit a sample of URLs via the inspection tool to quickly force a re-crawl
- Monitor the coverage report: expected transition from "Blocked" to "Excluded (noindex)" within 2-4 weeks
- Never combine Disallow and noindex on the same URLs — choose one or the other based on the goal
- Test with the robots.txt tool in Search Console + URL inspection to validate the configuration
❓ Frequently Asked Questions
Can you block a URL in robots.txt if it already carries a noindex?
What happens if I block an already indexed page in robots.txt?
Does the URL Parameters Tool replace robots.txt for managing crawl budget?
How do you force Google to remove a URL blocked by robots.txt from the index?
Does a noindex in the HTTP header work if the page is blocked by robots.txt?
Source: Google Search Central video · duration 53 min · published on 29/10/2020