Why does Google keep crawling outdated 404 URLs on your site?


Official statement

It is normal for Google to check old URLs that return 404 from time to time, even after years. This is not a sign of a problem, just the systems ensuring nothing is missed. Old sites naturally accumulate more 404s over time. No need to worry.

51:24

🎥 Source video

Extracted from a Google Search Central video

⏱ 1h01 💬 EN 📅 05/02/2021 ✂ 48 statements


Official statement from February 5, 2021 (5 years ago)

⚠ A more recent statement exists on this topic: "Could removing obsolete URLs from your sitemaps actually boost your SEO?" (Google, February 9, 2023)

TL;DR

Google regularly crawls old 404 URLs even years after they've been removed. This behavior is intentional: Google's systems periodically check to see if these pages have been restored or redirected. For SEO, this means that these requests in logs are normal and do not require corrective action — unless they create abnormal server load.

What you need to understand

Does Google really crawl dead URLs for years?

Yes. John Mueller has confirmed that this behavior is normal and expected: Google's crawlers sporadically revisit URLs that have returned a 404, even when those URLs have been gone for a long time.

The logic is simple: Google cannot know whether a page that is dead today might come back tomorrow. A deleted URL could be restored, redirected to a new resource, or become active again after a redesign. The crawling systems therefore run periodic checks to detect any status change. Google gives no schedule; treat something like once a quarter, or even once a year for very old URLs, as an order of magnitude rather than a documented figure.

Does this behavior unnecessarily consume crawl budget?

Not really, or at least not significantly for most sites. In practice, Google appears to lower the crawl frequency of 404s as they age and to weigh it against how often the site publishes new URLs, although it documents no rule for this.

An old editorial content site — let’s say a media outlet that has been publishing for 15 years — naturally accumulates thousands of 404s: deleted pages, moved content, abandoned categories. Google continues to ping them, but at a reduced frequency that does not impact the crawling of active pages. This is not a problem unless your server is undersized or misconfigured.

Should you actively clean up 404s in Search Console?

No, and that is exactly what Mueller points out: there is no need to worry about these errors or to correct them. Search Console shows the 404s it has detected, but Google does not consider them critical errors.

That said, there is an important nuance: if a 404 URL still receives backlinks or referral traffic, it is a missed opportunity. In that specific case, a 301 redirect to an equivalent page makes sense (fall back to the homepage only when nothing closer exists, since an unrelated target can be treated as a soft 404). For an old URL with no traffic and no incoming links, forget about it.

Crawling old 404s is normal and does not indicate a malfunction.
Google adjusts the check frequency based on the age of the URL and the site's history.
No need to clean up Search Console: these errors do not impact the ranking of active pages.
Redirecting a 404 only makes sense if it still receives traffic, incoming links, or external mentions.
Old sites naturally accumulate more 404s — it's inevitable and Google knows it.

SEO Expert opinion

Is this statement consistent with real-world observations?

Absolutely. Server logs confirm that Googlebot sporadically revisits dead URLs, often without a predictable pattern. We see 404s being crawled once every 3-6 months, sometimes with unexplained spikes.

But there’s a detail that Mueller doesn’t clarify: the recrawl frequency of 404s also depends on the internal structure of the site. If a dead URL remains present in the XML sitemap or is linked from active pages, Google will crawl it more often. So if you see 404s being crawled weekly, first check your internal linking and your sitemap — that’s often where the issue lies.

When does this behavior become problematic?

When the volume of crawled 404s exceeds your server's capacity, or when it cannibalizes the crawl budget of active pages. On a site with several million indexed URLs, a poorly distributed crawl budget can delay the discovery of new content.

In practical terms? If your server shows load spikes due to Googlebot requests on 404s, that's a warning sign. The solution is not to remove 404s — it's to optimize the server response (cache, CDN, Apache/Nginx configuration) so that these requests don't take a toll. A well-configured 404 should be served in under 50 ms.

Does Google reveal everything about the check frequency?

[To be verified] Mueller remains vague about the exact criteria that trigger a recrawl of 404s. He talks about “periodic checks,” but provides neither frequency nor threshold.

According to real-world observations, several factors seem to play a role: the age of the URL, the presence of historical backlinks (even if they no longer point anywhere), the frequency of site updates, and likely an algorithm of “rediscovery” based on the past behavior of the URL. But this is inference — Google does not disclose the exact algorithm, and that's normal.

Warning: if you notice a sudden, massive crawl of 404s, check that it is not caused by an internal linking issue or a misconfigured XML sitemap. Abnormal crawling of 404s is often a symptom of a structural problem, not an arbitrary decision by Google.

Practical impact and recommendations

What should you do concretely with these crawled 404s?

Nothing, in most cases. If a URL has been dead for years, receives no traffic, has no active backlinks, and is not listed in your sitemap, leave it alone. Google will crawl it, see the 404, and move on.

However — and this is where many go wrong — check the 404s that regularly appear in your logs. If a URL is crawled every week, it is still referenced somewhere: sitemap, internal linking, or external link. In this case, take action: redirect or remove the internal reference.
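The decision rules described above can be sketched as a small triage function. This is only an illustration: the `Dead404` record, its field names, and the action labels are hypothetical, and the inputs (sitemap presence, link counts, visits) are assumed to come from your own crawl and analytics exports.

```python
from dataclasses import dataclass


@dataclass
class Dead404:
    """Hypothetical record describing one URL that returns 404."""
    url: str
    in_sitemap: bool = False
    internal_links: int = 0   # active pages still linking to this URL
    backlinks: int = 0        # external links still pointing to this URL
    monthly_visits: int = 0   # referral or direct traffic still arriving


def triage(page: Dead404) -> list[str]:
    """Return the actions the article recommends for one dead URL."""
    actions = []
    if page.in_sitemap:
        actions.append("remove from sitemap")
    if page.internal_links > 0:
        actions.append("fix internal links")
    if page.backlinks > 0 or page.monthly_visits > 0:
        actions.append("301 redirect to an equivalent page")
    # No reference, no traffic, no links: let Google keep seeing the 404.
    return actions or ["leave as 404"]
```

A URL with no references and no traffic falls through to "leave as 404", which matches the default advice of doing nothing.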

What mistakes should you absolutely avoid?

Do not redirect all your 404s en masse to the homepage. Google discourages this practice and may treat such redirects as soft 404s, especially if the destination page has no thematic relation to the original URL.

Another classic mistake: blocking 404s in robots.txt. This achieves nothing and prevents Google from seeing that the page no longer exists. The result: the URL can stay in Google's records as a known URL, because the crawler is never allowed to fetch it and receive the 404. Let Google see the 404; it's the only clean way to signal the death of a page.

How to check if your site handles 404s correctly?

Analyze your server logs with a tool like Oncrawl, Screaming Frog Log File Analyser, or a custom Python script. Identify the 404 URLs that Googlebot crawls more than 5 times a month: those are the ones that deserve your attention.
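As a rough idea of what such a custom script could look like, here is a minimal sketch. It assumes access logs in the Apache/Nginx "combined" format and identifies Googlebot by user-agent string only, which can be spoofed; a production audit would also verify the requesting IP. The function name and the default threshold of 5 are illustrative choices taken from the paragraph above.

```python
import re
from collections import Counter

# Combined Log Format, for example:
# 66.249.66.1 - - [05/Feb/2021:10:00:00 +0000] "GET /old HTTP/1.1" 404 153 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; ...)"
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[(?P<day>\d{2})/(?P<month>\w{3})/(?P<year>\d{4}):[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def recurring_googlebot_404s(lines, threshold=5):
    """Return {(month, path): hits} for 404 paths that a Googlebot
    user agent requested more than `threshold` times in one month."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        if match["status"] != "404" or "Googlebot" not in match["agent"]:
            continue
        month = f'{match["year"]}-{match["month"]}'
        hits[(month, match["path"])] += 1
    return {key: count for key, count in hits.items() if count > threshold}
```

Typical usage would be to pass an open log file, for example `recurring_googlebot_404s(open("access.log"))`, where the file name is a placeholder for your own log path.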

Then, cross-reference these URLs with your XML sitemap and your internal linking. If a 404 is present in the sitemap, remove it immediately. If it's linked from an active page, correct the link or redirect to an equivalent resource. Finally, check your server's performance: a 404 should be served quickly, without unnecessary database requests.
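The sitemap cross-check can be sketched with the Python standard library. This assumes a standard sitemaps.org XML sitemap already loaded as a string, and a list of dead URLs expressed as absolute URLs (log files usually contain paths only, so you would prepend your own host first). The function names are illustrative.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps.org sitemap files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml):
    """Return the set of <loc> URLs found in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return {
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    }


def dead_urls_in_sitemap(sitemap_xml, dead_urls):
    """Return the dead URLs that are still listed in the sitemap, sorted."""
    return sorted(sitemap_urls(sitemap_xml) & set(dead_urls))
```

Any URL returned by `dead_urls_in_sitemap` is a candidate for immediate removal from the sitemap.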

Audit server logs to identify recurring crawled 404s.
Verify that 404s are not included in the XML sitemap.
Correct any internal links pointing to a 404 page.
Optimize the server response for 404s (cache, response time < 50 ms).
Only redirect 404s that are still receiving traffic or active backlinks.
Never redirect all 404s en masse to the homepage.

404s crawled by Google are normal and require no action in most cases. Focus on dead URLs that are still linked or present in the sitemap, and ensure that your server handles 404s efficiently. For high-volume sites or complex architectures, a thorough technical audit may reveal crawl budget inefficiencies; in that context, working with a specialized SEO agency can help identify and correct these structural issues with appropriate tools and methodologies.

❓ Frequently Asked Questions

How long does Google keep crawling a URL that returns 404?

There is no fixed duration. Google can keep checking a dead URL for years, at a decreasing frequency. A URL with no backlinks and no internal references will be crawled less and less often.

Should you remove the 404s shown in Search Console?

No, it is not necessary. Google does not treat 404s as critical errors. Seeing 404s in Search Console is normal, especially for an old site.

Do 404s affect the SEO of active pages?

No, not directly. A 404 URL does not affect the ranking of other pages. However, if 404s consume too much crawl budget, this can slow the indexing of new content.

Should I redirect all my old 404s?

Only those that still receive traffic, active backlinks, or external mentions. Redirecting dead 404s for no reason can create unnecessary redirect chains.

How do I know whether my 404s consume too much crawl budget?

Analyze your server logs. If Googlebot requests to 404s account for more than 20-30% of the total crawl and your new pages are slow to get indexed, that is a warning sign.
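The ratio mentioned in this answer is simple to compute once Googlebot requests have been extracted from the logs. The sketch below assumes you already have the list of HTTP status codes for those requests; the function names and the 20% default are illustrative, and the threshold is the article's rule of thumb, not a figure published by Google.

```python
def googlebot_404_share(statuses):
    """Return the fraction of Googlebot requests that ended in a 404.

    `statuses` is an iterable of integer HTTP status codes, one per
    Googlebot request found in the server logs.
    """
    statuses = list(statuses)
    if not statuses:
        return 0.0
    return sum(1 for status in statuses if status == 404) / len(statuses)


def is_warning_sign(statuses, threshold=0.20):
    """True when the 404 share exceeds the rule-of-thumb threshold."""
    return googlebot_404_share(statuses) > threshold
```

A high share alone is not conclusive: as the answer notes, it matters mainly when new pages are also slow to get indexed.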
