Why does Google keep crawling 404 URLs that are years old?

Official statement

Google may continue attempting to crawl URLs that existed 7-8 years ago, even if they have returned 404 or 410 for a long time. These URLs are kept in a low-priority queue and are occasionally retried.

🎥 Source video

Extracted from a Google Search Central video (in English) published on February 26, 2021. The statement appears at 144:15.

TL;DR

Google remembers dead URLs for at least 7 to 8 years and occasionally retries them, even if they consistently return 404 or 410. These URLs end up in a low-priority queue and consume a tiny portion of the crawl budget. For an SEO practitioner, this means that URLs removed long ago can still appear in server logs, and that managing old redirects remains relevant over time.

What you need to understand

What is the actual lifespan of a URL in Google's memory?

John Mueller reveals that Google keeps track of crawled URLs for at least 7 to 8 years, even if they no longer exist. This is far longer than most practitioners imagine. In practice, a page deleted in 2016 can still receive sporadic crawl attempts years later.

These URLs join a low-priority queue where Google occasionally checks whether the content has returned. The search engine does not abandon a URL at the first 404: it marks it as inactive but does not forget it completely. One plausible explanation lies in how the index has historically worked: Google prefers to keep a record rather than delete it permanently.

How does this low-priority queue function?

The exact mechanism is not documented, but field observations indicate that Google gradually spaces out its crawl attempts on URLs that consistently return 404 or 410. For example, a URL may be retried once a week at first, then once a month, then roughly once a quarter; these intervals are indicative, not official figures.
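To make the idea of spaced-out retries concrete, here is a toy Python model. It assumes the weekly, monthly, then quarterly progression described above purely for illustration: Google does not publish its real intervals, and `RETRY_INTERVALS_DAYS` and `retry_dates` are hypothetical names, not anything from Googlebot.

```python
from datetime import date, timedelta

# Hypothetical retry intervals, in days, for a URL that keeps returning
# 404/410: weekly, then monthly, then quarterly. Illustrative values only,
# NOT documented Googlebot settings. The last interval repeats forever.
RETRY_INTERVALS_DAYS = [7, 30, 90]


def retry_dates(first_404: date, years: int) -> list[date]:
    """List the dates on which this toy model would retry a dead URL."""
    end = first_404 + timedelta(days=365 * years)
    attempts: list[date] = []
    current = first_404
    step = 0
    while True:
        # Use the interval for this step, or the last one once exhausted.
        interval = RETRY_INTERVALS_DAYS[min(step, len(RETRY_INTERVALS_DAYS) - 1)]
        current += timedelta(days=interval)
        if current > end:
            break
        attempts.append(current)
        step += 1
    return attempts


if __name__ == "__main__":
    attempts = retry_dates(date(2016, 1, 1), years=8)
    print(f"{len(attempts)} retries over 8 years, first on {attempts[0]}")
```

Under these made-up intervals, a URL deleted at the start of 2016 is retried only a few dozen times in eight years, which illustrates how a very long memory can coexist with a marginal crawl load.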

This low-priority queue consumes only a marginal fraction of the total crawl budget. However, on a site with a heavy history (redesigns, multiple migrations, mass removals), the cumulative volume can become visible in logs. These crawl attempts do not directly penalize your SEO, but they reveal how long Google’s memory is.

Why does Google maintain this persistence on dead URLs?

The search engine does not want to miss a content resurrection. If a historical URL with a good backlink profile comes back online, Google wants to detect it quickly. This logic applies especially to URLs that had visibility, inbound links, or significant traffic in the past.

Moreover, Google knows that some sites go through temporary outages or poorly managed migrations, after which 404 URLs may come back months later. Rather than erasing all traces, the engine prefers to keep a list of URLs to watch. It is an insurance policy against false negatives.

Google remembers URLs for 7-8 years minimum, even after permanent deletion
Dead URLs join a low-priority queue with spaced crawl attempts
This persistence aims to detect potential content resurrections, especially if the URL had weight
The crawl volume consumed remains marginal but can be visible on historically heavy sites
HTTP codes 410 (Gone) and 404 (Not Found) are treated similarly in the long term

SEO Expert opinion

Is this statement consistent with field observations?

Absolutely. Server logs from sites that have undergone multiple redesigns confirm that Googlebot regularly attempts to crawl URLs deleted for years. Hits are frequently observed on paths dating back to 2015-2017, with a low but consistent frequency. Mueller merely officially confirms what technical SEOs have seen in their logs for a long time.

However, the exact duration of 7-8 years remains a ballpark figure, not a strict rule. Some sites report attempts on even older URLs, while others see attempts stop after 3-4 years. The initial priority of the URL, its link profile, and its traffic history likely play a role in this retention duration. [To Verify]: No official data specifies the exact criteria for prioritization in this queue.

Should we treat 404s and 410s differently to expedite forgetting?

Let's be honest: the distinction between 404 (Not Found) and 410 (Gone) is theoretically clear, but in practice, Google treats them very similarly in the long run. The 410 is supposed to signal a permanent deletion, but Mueller clarifies that even these URLs remain in the low-priority queue.

Using a 410 can slightly speed up the initial deindexing, but it does not guarantee that Google stops its crawl attempts entirely. The difference is mainly in the first weeks after deletion. After that point, both codes converge towards the same treatment: retained in memory with spaced attempts. Don’t rely on the 410 as a magic erase button.

What are the hidden implications for managing crawl budget?

On a medium-sized site with a comfortable crawl budget, this persistence has no measurable impact. Googlebot dedicates most of its resources to active and fresh URLs. Attempts on old dead URLs represent a negligible portion, often less than 1% of the total crawl.

The problem emerges on massive sites with a history of multiple migrations or thousands of deleted URLs. If your crawl budget is already stretched (low crawl frequency, important pages updated slowly), every hit on a dead URL is a hit that does not go to active content. In these specific cases, monitoring logs and identifying old URLs that are still being crawled can help diagnose inefficiencies. But let’s be pragmatic: optimizing the site’s current structure will have far more impact than trying to erase Google’s memory.

Practical impact and recommendations

What to do with old URLs that linger in the logs?

First step: measure the actual crawl volume consumed by these dead URLs. Parse your server logs (Screaming Frog Log File Analyser, Botify, OnCrawl, or a custom script) and filter Googlebot hits on URLs returning 404 or 410. If that volume is below 2-3% of total crawl, ignore it: this is not where your SEO performance is at stake.
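If you go the custom-script route, the sketch below computes the share of Googlebot hits that land on 404 or 410 URLs. It assumes logs in the Apache/Nginx "combined" format and identifies Googlebot by its user-agent string only; Google documents a reverse DNS check to confirm genuine Googlebot traffic, which this sketch does not perform. `dead_url_crawl_share` is a hypothetical helper name.

```python
import re
from collections import Counter

# Apache/Nginx "combined" log format:
# host ident user [time] "METHOD path PROTOCOL" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "\S+ (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def dead_url_crawl_share(lines):
    """Return (share of Googlebot hits on 404/410, Counter of hits per dead path).

    Identification relies on the user-agent string alone, which can be
    spoofed; verify Googlebot via reverse DNS before trusting the numbers.
    """
    googlebot_hits = 0
    dead_hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or "Googlebot" not in match.group("ua"):
            continue  # unparsable line, or not (claimed) Googlebot
        googlebot_hits += 1
        if match.group("status") in ("404", "410"):
            dead_hits[match.group("path")] += 1
    share = sum(dead_hits.values()) / googlebot_hits if googlebot_hits else 0.0
    return share, dead_hits


if __name__ == "__main__":
    bot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    sample = [
        f'66.249.66.1 - - [26/Feb/2021:10:00:00 +0000] "GET /old-page HTTP/1.1" 404 0 "-" "{bot}"',
        f'66.249.66.1 - - [26/Feb/2021:10:05:00 +0000] "GET /products HTTP/1.1" 200 5120 "-" "{bot}"',
    ]
    # With real data, pass an open log file object instead of this list.
    share, dead_hits = dead_url_crawl_share(sample)
    print(f"Googlebot hits on 404/410: {share:.1%}")
    for path, hits in dead_hits.most_common(10):
        print(hits, path)
```

The per-path counter also gives you the most-crawled dead URLs, which is the list worth examining if the share crosses the thresholds discussed here.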

If the volume is significant (>5% of crawl), dig deeper. Do these URLs still have active backlinks? If yes, 301 redirect them to the most relevant page. If not, leave the 404 in place and focus on optimizing active content. Don’t waste time cleaning up URLs that only consume a marginal fraction of the budget.
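The triage in this paragraph can be written as a small function. It is a sketch that assumes you already have two inputs: Googlebot hits per dead URL (for instance from a log analysis) and a count of referring domains per URL, exported from whatever backlink tool you use. `plan_dead_urls`, `referring_domains`, and `best_target` are hypothetical names, and choosing the redirect target remains a manual decision.

```python
def plan_dead_urls(dead_hits, referring_domains, best_target):
    """Suggest an action for each dead URL, most-crawled first.

    dead_hits:         {path: Googlebot hits on that 404/410 path}
    referring_domains: {path: number of domains still linking to it}
    best_target:       {path: most relevant live page, picked by a human}
    """
    plan = []
    ranked = sorted(dead_hits.items(), key=lambda item: item[1], reverse=True)
    for path, hits in ranked:
        if referring_domains.get(path, 0) > 0:
            # Still linked from elsewhere: worth a 301 to the closest match.
            target = best_target.get(path)
            action = f"301 -> {target}" if target else "301, target to choose"
        else:
            # No backlinks left: leave the 404 and spend the time elsewhere.
            action = "keep 404"
        plan.append((path, hits, action))
    return plan


if __name__ == "__main__":
    hits = {"/old-guide": 40, "/promo-2016": 12, "/tmp-test": 3}
    links = {"/old-guide": 8, "/promo-2016": 0}
    targets = {"/old-guide": "/guides/"}
    for row in plan_dead_urls(hits, links, targets):
        print(row)
```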

Should you block these URLs in robots.txt to force forgetting?

No. Blocking 404 URLs in robots.txt is a classic mistake that worsens the situation. If Googlebot can no longer crawl the URL, it cannot confirm that it actually returns 404 — so it remembers it indefinitely, in a “blocked” status. You replace an occasional crawl with a permanent uncertainty.

The only exception concerns sensitive URLs that you absolutely want to disappear from the index. In this case, keep them accessible as 404/410 until Google fully deindexes them, then possibly block them. But for ordinary dead URLs, robots.txt adds no value. Let Google see the 404 and naturally space out its attempts.
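To check that you are not making this mistake, you can test your list of dead URLs against your robots.txt. The sketch below uses Python's standard `urllib.robotparser` module. Note that this parser only does simple prefix matching: it does not implement Google's `*` and `$` wildcard extensions or Google's exact rules for choosing between user-agent groups, so treat its answer as an approximation. `blocked_dead_urls` is a hypothetical helper.

```python
from urllib.robotparser import RobotFileParser


def blocked_dead_urls(robots_txt, dead_urls, user_agent="Googlebot"):
    """Return the dead URLs that robots.txt stops user_agent from crawling.

    Any URL returned here can no longer be fetched, so Google cannot see
    its 404/410 response again.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in dead_urls if not parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /old-shop/\n"
    dead = [
        "https://example.com/old-shop/item-1",
        "https://example.com/blog/2016/old-post",
    ]
    print(blocked_dead_urls(robots, dead))
```

Apart from the sensitive-URL exception above, every URL this returns is a candidate for removing the matching `Disallow` rule.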

How to manage migrations and redesigns to limit this effect over the long term?

During a redesign, properly map all old URLs to their equivalents via 301. Even if some pages no longer have a direct equivalent, redirect to the closest category or parent page. A well-thought-out 301 is always preferable to a 404, especially if the old URL had backlinks or traffic.
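One way to apply this rule systematically is to resolve each old path first against an explicit mapping, then against the nearest live parent directory. The Python sketch below assumes URLs are organized hierarchically by path segments and that you have the set of live paths; `resolve_redirect`, `explicit_map`, and `live_paths` are hypothetical names. Falling back to the homepage is deliberately left out, as an assumption of this sketch, because a blanket redirect to the homepage is rarely the "closest" page.

```python
def resolve_redirect(old_path, explicit_map, live_paths):
    """Return the 301 target for old_path, or None to leave it as a 404.

    1. Use the hand-made mapping if the page has a direct equivalent.
    2. Otherwise walk up the path (/a/b/c -> /a/b/ -> /a/) and return the
       first parent that is still live.
    3. Stop before the homepage: no close match means the URL stays a 404.
    """
    if old_path in explicit_map:
        return explicit_map[old_path]
    segments = [s for s in old_path.strip("/").split("/") if s]
    for depth in range(len(segments) - 1, 0, -1):
        parent = "/" + "/".join(segments[:depth]) + "/"
        if parent in live_paths:
            return parent
    return None


if __name__ == "__main__":
    explicit = {"/shop/red-shoes": "/shop/shoes/red/"}
    live = {"/shop/", "/shop/shoes/", "/blog/"}
    for old in ["/shop/red-shoes", "/shop/shoes/model-2015", "/events/2016/gala"]:
        print(old, "->", resolve_redirect(old, explicit, live))
```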

For truly obsolete URLs (discontinued products without replacements, closed sections), accept the 404. But document these choices: keep a list of URLs intentionally removed to later justify why they weren't redirected. This avoids nasty surprises when, three years later, someone asks why a frequently crawled URL returns 404.
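A plain CSV file is enough for such a list. The sketch below assumes a hypothetical register with `path`, `removed_on`, and `reason` columns, loads it with Python's `csv` module, and lists the crawled 404 paths that nobody documented, which are the ones most likely to raise that question later. `load_register` and `undocumented_404s` are hypothetical names.

```python
import csv
import io


def load_register(csv_text):
    """Map each intentionally removed path to its row (path, removed_on, reason)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["path"]: row for row in reader}


def undocumented_404s(crawled_404_paths, register):
    """Return crawled 404/410 paths that are missing from the register."""
    return sorted(set(crawled_404_paths) - set(register))


if __name__ == "__main__":
    # Hypothetical register content; in practice, read it from your CSV file.
    register_csv = (
        "path,removed_on,reason\n"
        "/promo-2016,2017-01-15,campaign ended\n"
        "/product/old-model,2018-06-01,discontinued without replacement\n"
    )
    register = load_register(register_csv)
    crawled = ["/promo-2016", "/blog/draft-typo", "/product/old-model"]
    print(undocumented_404s(crawled, register))
```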

Analyze your server logs to quantify the actual crawl on dead URLs (if < 3%, ignore)
Identify old URLs with active backlinks and 301-redirect them to relevant content
Never block 404 URLs in robots.txt — this prevents Google from confirming their status
During a redesign, systematically map old URLs to their equivalents or parent pages
Document URLs intentionally left as 404 to justify these choices in the long run
Monitor logs after migration to detect any abnormal crawl patterns
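For the last point, one simple way to spot abnormal patterns after a migration is to compare each day's count of Googlebot 404/410 hits with a recent baseline. The sketch below assumes you have already aggregated daily counts (for example from your log analyzer); the 7-day window and the factor of 3 are arbitrary starting values, not recommendations from Google, and `flag_spikes` is a hypothetical name.

```python
from datetime import date, timedelta
from statistics import median


def flag_spikes(daily_counts, window=7, factor=3.0):
    """Return days whose 404/410 hit count exceeds factor x the median
    of the previous `window` days.

    daily_counts: {date: Googlebot hits on 404/410 URLs that day}
    """
    days = sorted(daily_counts)
    flagged = []
    for i in range(window, len(days)):
        baseline = median(daily_counts[d] for d in days[i - window:i])
        today = daily_counts[days[i]]
        # A zero baseline is skipped to avoid flagging every first hit.
        if baseline > 0 and today > factor * baseline:
            flagged.append(days[i])
    return flagged


if __name__ == "__main__":
    start = date(2021, 3, 1)
    counts = {start + timedelta(days=i): 10 for i in range(14)}
    counts[start + timedelta(days=9)] = 60  # e.g. old URLs exposed by a migration
    print(flag_spikes(counts))
```

The median is used rather than the mean so that a single earlier spike does not inflate the baseline and hide the next one.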

Google's persistence on old URLs is normal behavior that does not directly impact your SEO unless your crawl budget is already tight. Focus on clean redirect management during migrations and let Google naturally space out its attempts on dead URLs. If your site has a complex history with multiple redesigns and you want to fine-tune crawl budget distribution, these analyses can get technical. In this case, partnering with an SEO agency specialized in crawling and architecture can help you prioritize truly impactful actions rather than wasting time on marginal optimizations.

❓ Frequently Asked Questions

How long does Google keep a 404 URL in memory?

At least 7 to 8 years according to John Mueller, sometimes longer depending on the URL’s original profile. These URLs join a low-priority queue, with crawl attempts spaced out progressively.

Does a 410 really speed up the removal of a URL from Google’s index?

A 410 can slightly speed up the initial deindexing, but Google keeps trying to crawl the URL for years, just as with a 404. In the long run, the difference is minimal.

Do these crawl attempts on old URLs consume a lot of crawl budget?

No, they generally represent less than 1-3% of total crawl. The problem only becomes visible on massive sites with a history of multiple migrations and an already tight crawl budget.

Should you block 404 URLs in robots.txt to force Google to forget them?

Never. Blocking a 404 URL in robots.txt prevents Google from confirming its status, which keeps it in memory indefinitely. Let Googlebot see the 404 and space out its attempts naturally.

How should you handle old URLs that still receive active backlinks?

Redirect them with a 301 to the most relevant page or to the parent category. A well-planned 301 preserves part of the link equity and improves the user experience, while normalizing how the URL is crawled.

