Official statement
What you need to understand
This statement reveals a little-known behavior of Googlebot: pages returning a 410 (Gone) HTTP status code are not permanently ignored by the crawler. Contrary to the common belief that a 410 tells Google a page is deleted forever, the bot continues to recheck these URLs periodically to see whether they have come back.
The case described illustrates an extreme situation: 2.4 million requests for a single URL, a crawl volume comparable to a DDoS attack. The problem originated in the accidental exposure of parameterized URLs through a JSON payload automatically generated by Next.js. Although these URLs were inaccessible and marked 410, Google discovered them and then attempted to re-crawl them at massive scale.
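To make the leak mechanism concrete, here is a minimal sketch of a pages-router Next.js page, using a hypothetical `?feature=` parameter: everything returned from `getServerSideProps` is serialized into the `<script id="__NEXT_DATA__">` JSON blob of the rendered HTML, so Googlebot can discover such URLs even when no visible link exists.

```tsx
// pages/products.tsx — a hypothetical page illustrating the leak.
// Everything returned from getServerSideProps is serialized into the
// <script id="__NEXT_DATA__"> JSON blob of the rendered HTML, so these
// URLs are visible to Googlebot even if they never appear as <a> links.
import type { GetServerSideProps } from "next";

type Props = { relatedUrls: string[] };

export const getServerSideProps: GetServerSideProps<Props> = async () => {
  // Hypothetical parameterized URLs that all respond with 410 Gone:
  const relatedUrls = [
    "/products?feature=alpha&id=123",
    "/products?feature=beta&id=456",
  ];
  return { props: { relatedUrls } };
};

export default function Products({ relatedUrls }: Props) {
  // The page never renders the URLs, yet they leak via the JSON payload.
  return <main>{relatedUrls.length} related items</main>;
}
```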
The instinctive reaction, blocking via robots.txt, carries major risks. A misconfigured robots.txt can prevent important pages from rendering properly, particularly in modern JavaScript architectures where certain blocked resources are critical for display.
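As an illustration of that failure mode, consider a hypothetical robots.txt for the site sketched above. The first rule does what was intended; the second, superficially similar rule would block the framework assets Googlebot needs to render every page:

```text
User-agent: Googlebot
# Intended: stop crawling of the parameterized 410 URLs.
Disallow: /*?feature=

# Dangerous: /_next/ also contains the JS and CSS bundles, so this rule
# would prevent Googlebot from rendering any JavaScript page on the site.
Disallow: /_next/
```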
- The 410 status code does not permanently prevent crawling, contrary to expectations
- Accidental exposure of URLs (JSON, sitemaps, internal links) can trigger massive crawling
- Crawl budget can be wasted on useless pages even with 410 status
- Blocking via robots.txt requires thorough analysis of dependencies
- The correlation between excessive crawling and visibility loss is not automatic
SEO Expert opinion
Mueller's analysis is consistent with what I've observed for years: Google maintains periodic verification even on content marked as permanently deleted. This makes sense from an algorithmic perspective - sites can make mistakes, have configuration errors, or reactivate content. The search engine prefers to check rather than permanently lose track of potentially relevant content.
However, one aspect of Mueller's message deserves particular emphasis: he urges us not to stop at a superficial explanation. This is fundamental. In 90% of the cases I have analyzed where a publisher attributes a traffic drop to Googlebot's behavior, the real cause lies elsewhere: lost links, content cannibalization, an algorithm update, or quality issues. Excessive crawling is often a symptom, not the disease.
Practical impact and recommendations
- Audit the origin of exposure: Search your JSON payloads, built JavaScript files, XML/HTML sitemaps, and internal links for references to the problematic URLs (see the audit sketch after this list)
- Clean up exposure sources: Configure Next.js/Nuxt so these URL patterns no longer end up in builds or serialized payloads, remove the URLs from your sitemaps, and eliminate orphaned internal links
- Do NOT immediately block via robots.txt: First test the impact on rendering with the URL Inspection tool in Search Console for each pattern you're considering blocking
- Slow the crawl only as a last resort: Googlebot ignores the robots.txt crawl-delay directive, and the legacy crawl-rate limiter in Search Console has been retired. If crawl volume poses a real infrastructure problem, Google's documented fallback is to temporarily answer Googlebot with 429 or 503 responses, which makes it slow down (see the middleware sketch after this list)
- Analyze the real cause of traffic loss: In Google Analytics, compare the pages that lost traffic against those being over-crawled, and look for temporal correlations with algorithm updates (Core Updates, Helpful Content)
- Post-cleanup monitoring: After removing exposure sources, crawling should naturally decrease over 15-30 days (the log-monitoring sketch after this list can help you track it). If not, then consider targeted blocking
- Parameterized URL patterns: If your 410s come from parameters (?feature=, ?id=), note that Search Console's URL Parameters tool has been retired, so you can no longer declare parameters to ignore there. Focus on removing the exposure sources and, once the rendering checks above are done, on targeted robots.txt patterns
- Implement noindex before 410: For future mass deletions, first go through a noindex phase (a few weeks) before switching to 410; this reduces Google's interest in those pages (see the phased sketch after this list)
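The sketches below illustrate four of the steps above; every path, threshold, and name in them is an illustrative assumption, not something from Mueller's statement. First, the audit step as a small Node/TypeScript scanner; the `.next` and `public` directories and the `?feature=` pattern are assumptions to adapt to your own build:

```ts
// audit-exposure.ts — a minimal scanning sketch, not an official tool.
// Recursively searches build output and static assets for references
// to a problematic URL pattern.
import { existsSync, readdirSync, readFileSync, statSync } from "node:fs";
import { join } from "node:path";

const PATTERN = /\?feature=/;      // hypothetical parameter from the incident
const ROOTS = [".next", "public"]; // assumed locations; adjust to your build

function scan(dir: string): void {
  for (const entry of readdirSync(dir)) {
    const path = join(dir, entry);
    if (statSync(path).isDirectory()) {
      scan(path);
    } else if (/\.(js|json|html|xml|txt)$/.test(entry)) {
      if (PATTERN.test(readFileSync(path, "utf8"))) {
        console.log(`exposure candidate: ${path}`);
      }
    }
  }
}

ROOTS.filter(existsSync).forEach((root) => scan(root));
```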
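Next, the emergency brake. This is a sketch of a Next.js middleware that answers excess Googlebot traffic with 429, assuming a single long-lived process (a real deployment would need a shared counter in Redis or similar); the window and threshold are arbitrary tuning assumptions:

```ts
// middleware.ts — an emergency relief valve, not a fix. Google documents
// that Googlebot slows down when it receives 429 or 503 responses.
import { NextRequest, NextResponse } from "next/server";

const WINDOW_MS = 60_000;       // 1-minute window (tuning assumption)
const MAX_GOOGLEBOT_HITS = 600; // arbitrary threshold; tune to capacity

let windowStart = Date.now();
let hits = 0;

export function middleware(request: NextRequest) {
  const userAgent = request.headers.get("user-agent") ?? "";
  if (!userAgent.includes("Googlebot")) return NextResponse.next();

  const now = Date.now();
  if (now - windowStart > WINDOW_MS) {
    windowStart = now; // start a new counting window
    hits = 0;
  }
  if (++hits > MAX_GOOGLEBOT_HITS) {
    // 429 asks the crawler to back off; Retry-After is advisory.
    return new NextResponse(null, {
      status: 429,
      headers: { "Retry-After": "3600" },
    });
  }
  return NextResponse.next();
}
```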
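For post-cleanup monitoring, a sketch that counts Googlebot requests per URL from combined-format access logs read on stdin; log field positions vary by server, so treat the regex as an assumption to adapt:

```ts
// crawl-monitor.ts — counts Googlebot requests per URL from access logs
// in the common "combined" format, read line by line from stdin.
import { createInterface } from "node:readline";

const counts = new Map<string, number>();
// Captures the request path and the user-agent string; adapt as needed.
const LINE = /"(?:GET|HEAD) (\S+) HTTP[^"]*" \d+ \S+ "[^"]*" "([^"]*)"/;

const rl = createInterface({ input: process.stdin });
rl.on("line", (raw) => {
  const match = LINE.exec(raw);
  if (match && match[2].includes("Googlebot")) {
    counts.set(match[1], (counts.get(match[1]) ?? 0) + 1);
  }
});
rl.on("close", () => {
  const top = [...counts.entries()].sort((a, b) => b[1] - a[1]).slice(0, 20);
  for (const [url, n] of top) console.log(`${n}\t${url}`);
});
```

Run it as, for example, `cat access.log | npx tsx crawl-monitor.ts` and compare the per-URL totals week over week against the 15-30 day window.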
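Finally, the phased retirement. A sketch of a pages-router route that serves a `noindex` header during the first phase and flips to 410 after a cutover date; the route and the date are assumptions:

```tsx
// pages/legacy/[slug].tsx — hypothetical route for retired content.
// Phase 1 serves the page with a noindex header; phase 2 flips to
// 410 Gone once Google has had a few weeks to process the noindex.
import type { GetServerSideProps } from "next";

const SWITCH_TO_410 = new Date("2025-06-01"); // assumed cutover date

export const getServerSideProps: GetServerSideProps = async ({ res }) => {
  if (Date.now() >= SWITCH_TO_410.getTime()) {
    res.statusCode = 410; // phase 2: permanently gone
  } else {
    res.setHeader("X-Robots-Tag", "noindex"); // phase 1: deindex first
  }
  return { props: {} };
};

export default function Legacy() {
  return <main>This content has been retired.</main>;
}
```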