
Official statement

Google keeps old 404 URLs in its systems and periodically rechecks them (sometimes once a year) to ensure they still return 404. This is not a problem. On older sites, the number of 404 URLs naturally increases over the years. This is normal behavior.
🎥 Source video

Extracted from a Google Search Central video

⏱ 1h01 💬 EN 📅 05/02/2021 ✂ 48 statements
Watch on YouTube (51:54) →
Other statements from this video (47)
  1. 2:42 Are dynamic-content e-commerce pages penalized by Google?
  2. 2:42 Does variable content on e-commerce pages hurt SEO?
  3. 4:15 Why does Google penalize e-commerce categories that are too broad or inconsistent?
  4. 4:15 Why does Google penalize category pages that lack strict thematic coherence?
  5. 6:24 How does Google choose the display order of images on a single page?
  6. 6:24 Does Google Images favor image quality over display order on the page?
  7. 8:00 Is machine learning on images really a secondary SEO factor?
  8. 8:29 Can machine learning really replace text for ranking your images?
  9. 11:07 Why does Google Discover traffic disappear overnight?
  10. 11:07 Why does Google Discover traffic collapse overnight without warning?
  11. 13:13 Do Google penalties really work page by page, without fixed levels?
  12. 13:13 Does Google really apply granular page-by-page penalties rather than site-wide ones?
  13. 15:21 Can Google hide one of your sites if they look too much alike?
  14. 15:21 Why does Google omit certain sites from its results even though they are unique?
  15. 17:29 Can a low-quality page contaminate your entire site?
  16. 17:29 Can a poorly optimized homepage really penalize a whole site?
  17. 18:33 How does Google measure Core Web Vitals on your AMP and non-AMP pages?
  18. 18:33 Does Google really track Core Web Vitals separately for AMP and non-AMP pages?
  19. 20:40 Core Web Vitals: which version actually counts for ranking when Google serves the AMP page?
  20. 22:18 Do you absolutely need to match the query in the title to rank well?
  21. 22:18 Should you favor an exact-match title or a user-optimized one?
  22. 24:28 Do user comments really influence your pages' rankings?
  23. 24:28 Do user comments really count for organic search?
  24. 28:00 Are intrusive interstitials really a negative ranking factor?
  25. 28:09 Can intrusive interstitials really drop your Google rankings?
  26. 29:09 Why does Google convert your SVGs to PNGs, and how does that impact your image SEO?
  27. 29:43 Why does Google convert your SVGs to pixel images internally?
  28. 31:18 Should you optimize UX first, before tackling SEO?
  29. 31:44 Should you really use rel=canonical for syndicated content?
  30. 32:24 Is a rel=canonical to the source really enough to protect syndicated content?
  31. 34:29 Should you create broad thematic content to strengthen your authority in Google's eyes?
  32. 34:29 Should you create related content to strengthen your topical reputation?
  33. 36:01 How long do you really have to wait for a link-related manual action to be lifted?
  34. 36:01 Why can link-related manual actions drag on for months without a response?
  35. 39:12 Does PageSpeed Insights really reflect what Google sees of your site?
  36. 39:44 Why do PageSpeed Insights and Googlebot show different results for your site?
  37. 41:20 Core Web Vitals: why your PageSpeed Insights tests don't reflect what Google actually measures
  38. 44:59 Do you really have to wait 30 days to see the impact of your Core Web Vitals optimizations in PageSpeed Insights?
  39. 45:59 Core Web Vitals: why does only field data count for ranking?
  40. 45:59 Why does Google ignore your Lighthouse scores when ranking your site?
  41. 46:43 How does Google actually group your pages to evaluate Core Web Vitals?
  42. 47:03 How does Google group your pages to measure Core Web Vitals?
  43. 51:24 Why does Google keep crawling obsolete 404 URLs on your site?
  44. 57:06 Do 301 redirects really pass 100% of PageRank and link signals?
  45. 57:06 Do 301 redirects really transfer all ranking signals without loss?
  46. 59:51 Is the text-to-HTML ratio really useless for Google SEO?
  47. 59:51 Is the text-to-HTML ratio really useless for SEO?
TL;DR

Google retains all URLs that have returned a 404, even years after their discovery, and periodically rechecks them (sometimes once a year). This behavior is normal and does not penalize your site. For SEO, an increasing number of 404 URLs in Search Console is not alarming for an older site, but it's essential to distinguish these historical errors from recent 404s that may indicate real linking or migration issues.

What you need to understand

Why does Google remember URLs that no longer exist?

The search engine operates by data accumulation. Every discovered URL — whether found through crawling, a sitemap, or a backlink — is recorded in Google's systems. Even if this URL returns a 404 code, it is not immediately removed.

Google adopts a periodic verification strategy. The engine recrawls these URLs at irregular intervals to ensure they have not been restored or redirected. This frequency varies depending on the site's authority, the age of the URL, and the availability of crawl budget. On some domains, this cycle can extend over 12 months or more.

Does this accumulation of 404 URLs harm SEO?

No. John Mueller is clear: this is normal behavior. On a site that has been evolving for several years, the number of 404 error URLs in Search Console inevitably increases. Have you removed outdated pages? Reorganized categories? Changed your CMS? Each operation generates dead URLs that Google continues to check.

The real problem is when these 404s concern pages that are still referenced in your internal linking or in active sitemaps. There, you signal to Google that these pages exist, even though they return an error. It is this inconsistency that can degrade crawl experience, not the volume of historical 404s.

How long does Google keep these 404 URLs?

There is no fixed duration. Google can keep track of a URL for years, especially if it had backlinks or an indexing history. The engine periodically reevaluates the relevance of recrawling these URLs based on external signals (new links pointing to the dead URL, mentions on the web).

As long as a 404 URL does not receive new signals of interest, the frequency of rechecking decreases. But it never completely disappears from the systems. That’s why you might see 404 error URLs in Search Console that are several years old — they are simply recrawled from time to time to confirm they are still dead.

  • Google indefinitely retains 404 URLs in its systems and periodically rechecks them.
  • The frequency of rechecking varies (sometimes once a year), depending on the site's authority and the availability of crawl budget.
  • An increasing number of 404 URLs is normal on an older site and does not affect ranking.
  • The real risk: 404s pointed to by your active internal linking or XML sitemaps.
  • It is impossible to force Google to forget these URLs — the only option is to 301 redirect them if they still receive traffic or links.

SEO Expert opinion

Is this statement consistent with observed practices in the field?

Absolutely. For years, Search Console has been reporting very old 404 URLs, sometimes stemming from migrations that occurred 5 or 10 years ago. These URLs sporadically reappear in coverage reports, even when server logs show they have not been recrawled in between.

Two hypotheses: either Google uses ultra-long crawl cycles for these low-priority URLs, or it tests them via secondary systems without going through the main Googlebot. In either case, this confirms that the engine retains memory of far more URLs than it actively indexes.

What nuances should be added to this advice?

Mueller says it's "normal," but there is a difference between normal and optimal. If you have 50,000 404 URLs in Search Console and 20,000 of them are still linked from your navigation, you have an editorial coherence problem. Google crawls these pages because you are telling it they exist.

The raw volume of 404s is not a penalty signal. But the ratio of 404s to active pages can reveal significant technical debt. A site with 500 pages and 10,000 error URLs likely indicates poorly managed migrations or undocumented structural changes. [To verify]: Google might adjust the crawl budget of a site that generates massive 404s through its internal linking, even if no official communication confirms this.

In what cases does this rule not apply?

If you manage an e-commerce site with thousands of product listings that disappear each season, you cannot afford to let Google indefinitely recrawl dead URLs. The best practice: 301 redirect to a category or equivalent page, or return a 410 (Gone) code to explicitly signal that the URL is permanently removed.

The 410 does not necessarily speed up the forgetting process, but it is semantically more accurate than a 404. On high-volume sites, this distinction can help optimize crawl budget by clearly indicating to Google that there is no reason to recheck this URL.
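As a rough sketch of this choice, assuming a hypothetical list of intentionally removed paths (the names below are illustrative, not any real framework's API), the decision between 404 and 410 boils down to a simple lookup:

```python
# Hypothetical inventory of URL paths that were deliberately and permanently
# removed (e.g. discontinued seasonal product listings).
PERMANENTLY_REMOVED = {
    "/product/winter-coat-2019",
    "/product/summer-sandals-2020",
}

def status_for_missing_page(path: str) -> int:
    """Pick the response code for a URL that no longer resolves to content."""
    if path in PERMANENTLY_REMOVED:
        return 410  # Gone: the removal is intentional and permanent
    return 404      # Not Found: generic, no claim about permanence

print(status_for_missing_page("/product/winter-coat-2019"))  # → 410
print(status_for_missing_page("/product/unknown"))           # → 404
```

The point is not the code itself but the semantics: a 410 asserts the removal is deliberate, while a 404 leaves the question open.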

Attention: If you notice an abnormal volume of 404s in Search Console after a migration or redesign, do not assume that "it's normal." First, check your redirects, XML sitemap, and internal linking. Historical 404s are normal; recent massive 404s signal a technical problem.

Practical impact and recommendations

What should you concretely do with these historical 404 URLs?

Nothing, in the majority of cases. If these URLs no longer have backlinks, do not generate traffic, and are not linked anywhere on your site, leave them as 404. Google will recrawl them from time to time, will see that they are still dead, and will continue on its way. You do not need to waste time redirecting or removing them from Search Console.

That said, triage them intelligently. Export the list of 404 URLs from Search Console and cross-check it with your server logs and your backlink analysis tools. Identify those that still receive visits or have quality incoming links. Those deserve a 301 redirect to an equivalent page or relevant category.
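The triage described above can be sketched in a few lines, assuming you have already exported the three lists (404 URLs from Search Console, URLs still receiving hits from your logs, URLs with quality backlinks from your backlink tool — the data here is illustrative):

```python
def triage_404s(gsc_404s, urls_with_traffic, urls_with_backlinks):
    """Split 404 URLs into 'redirect with 301' vs 'leave as 404'."""
    redirect, leave = [], []
    for url in gsc_404s:
        if url in urls_with_traffic or url in urls_with_backlinks:
            redirect.append(url)  # still valuable: 301 to an equivalent page
        else:
            leave.append(url)     # dead weight: let Google recheck and move on
    return redirect, leave

redirect, leave = triage_404s(
    gsc_404s={"/old-blog/post-1", "/old-shop/item-9", "/tmp/test"},
    urls_with_traffic={"/old-blog/post-1"},
    urls_with_backlinks={"/old-shop/item-9"},
)
print(sorted(redirect))  # → ['/old-blog/post-1', '/old-shop/item-9']
print(leave)             # → ['/tmp/test']
```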

How to distinguish harmless historical 404s from problematic ones?

Segment your 404 errors by last detected date in Search Console. URLs that have not been crawled for over 6 months are probably historical residues. Those that appear regularly (every month or week) signal an active issue: broken internal link, improperly configured sitemap, or recent backlink.
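A minimal sketch of this segmentation, assuming a hypothetical export mapping each URL to the date Google last detected the 404 (the 6-month threshold is the one suggested above):

```python
from datetime import date, timedelta

def segment_404s(errors, today, threshold_days=180):
    """Split 404s into likely-historical vs likely-active by last crawl date.

    `errors` maps URL -> date of last detection, as you might assemble it
    from a Search Console export (column names vary; this is an assumption).
    """
    cutoff = today - timedelta(days=threshold_days)
    historical = {url for url, seen in errors.items() if seen < cutoff}
    active = {url for url, seen in errors.items() if seen >= cutoff}
    return historical, active

historical, active = segment_404s(
    {"/legacy/page": date(2020, 1, 10), "/broken/link": date(2021, 1, 20)},
    today=date(2021, 2, 5),
)
print(historical)  # → {'/legacy/page'}
print(active)      # → {'/broken/link'}
```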

Use a tool like Screaming Frog or Botify to cross-reference the 404 URLs with your internal linking. If an error URL is still linked from your navigation, footer, or articles, fix the link. If it appears in your XML sitemap, remove it immediately. Google should never discover a 404 through a file you voluntarily submit to it.
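The sitemap check can be sketched with the standard library alone, assuming an inline sitemap fragment and a hypothetical list of known 404 URLs; any intersection means you are submitting dead URLs to Google yourself:

```python
import xml.etree.ElementTree as ET

# Illustrative sitemap fragment; in practice you would fetch your real file.
SITEMAP_XML = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/live-page</loc></url>
  <url><loc>https://example.com/deleted-page</loc></url>
</urlset>"""

# Hypothetical list of URLs known to return 404 (e.g. from Search Console).
KNOWN_404S = {"https://example.com/deleted-page"}

ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
root = ET.fromstring(SITEMAP_XML)
submitted = {loc.text for loc in root.findall("sm:url/sm:loc", ns)}

# URLs that must be removed from the sitemap immediately.
to_remove = submitted & KNOWN_404S
print(sorted(to_remove))  # → ['https://example.com/deleted-page']
```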

Should you massively clean up 404s after a migration?

Yes, but methodically. After a site migration, you have two types of 404s: those you have intentionally deleted (outdated pages, duplicates), and those resulting from redirection errors. The former can remain as 404. The latter should be redirected with a 301 to their closest equivalent.

Never redirect all your 404s to the homepage in bulk. Google detects this pattern as an attempt at manipulation and typically treats such redirects as soft 404s, so the signals are lost anyway. Better to leave a URL as a 404 than to redirect it to a thematically unrelated page.
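Before deploying a redirect plan, it can be sanity-checked for homepage fallbacks; the sketch below uses an illustrative mapping, not any real server configuration:

```python
# Hypothetical 301 redirect plan: dead URL -> target page. Each target
# should be thematically close to the page it replaces.
REDIRECT_PLAN = {
    "/old/red-shoes": "/category/shoes",
    "/old/blue-coat": "/category/coats",
    "/old/misc-page": "/",  # suspicious: bulk homepage fallback
}

def homepage_fallbacks(plan, homepage="/"):
    """Return source URLs whose redirect target is the homepage."""
    return sorted(src for src, dst in plan.items() if dst == homepage)

print(homepage_fallbacks(REDIRECT_PLAN))  # → ['/old/misc-page']
```

Any URL flagged by such a check should either get a more relevant target or simply stay a 404.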

  • Export 404 URLs from Search Console and cross-reference with server logs.
  • Identify 404s that still receive traffic or backlinks and redirect them with a 301.
  • Remove 404 URLs from all active XML sitemaps and internal linking.
  • Use the 410 (Gone) code for permanently removed pages on high-volume sites.
  • Never redirect massively to the homepage — better a 404 than an incoherent redirection.
  • Monitor new appearances of 404s in Search Console to detect redesign or migration errors.
Historical 404 URLs do not harm SEO, but active 404s — linked in your navigation or your sitemaps — degrade the crawl experience and may signal technical issues. Regularly auditing your 404 errors, coupled with a targeted redirect strategy, optimizes your crawl budget without wasting time on URLs that have been dead for years.

If your site has undergone multiple migrations or redesigns and you can no longer distinguish legitimate 404s from structural errors, a specialized SEO agency can help you restore order to your architecture and maximize your crawl potential.

❓ Frequently Asked Questions

Should you delete 404 URLs from the Search Console report?
No. You cannot force Google to forget these URLs. Even if you mark them as fixed in Search Console, the engine will eventually recrawl them to verify they still return 404.
Can a high number of 404 URLs penalize my site?
No, as long as these 404s are historical residue. However, if your 404s come from broken internal links or from pages listed in your sitemaps, they degrade crawl quality and can indirectly hurt your rankings.
Is the 410 code more effective than a 404 for removing a URL from the index?
The 410 explicitly signals that the page is permanently removed, but Google treats it much like a 404. It does not necessarily speed up deindexing, but it can help optimize crawl budget on high-volume sites.
Does Google crawl all 404 URLs at the same frequency?
No. The frequency depends on the site's authority, the URL's age, its backlinks, and the available crawl budget. Some URLs may be rechecked once a year, others more often if they receive new signals.
How do you prevent Google from discovering new 404 URLs after a migration?
Set up an exhaustive 301 redirect plan before the migration, test each URL with a crawler, and remove all old URLs from your XML sitemaps. Then monitor the Search Console reports to quickly fix any residual errors.
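The pre-migration redirect check mentioned in the last answer can be sketched offline, assuming hypothetical crawl results already collected as a mapping from old URL to HTTP status and Location header:

```python
# Illustrative crawl results: old URL -> (HTTP status, Location header or None).
CRAWL_RESULTS = {
    "https://example.com/old-a": (301, "https://example.com/new-a"),
    "https://example.com/old-b": (404, None),                       # missing redirect
    "https://example.com/old-c": (302, "https://example.com/new-c"), # temporary, not 301
}

def migration_issues(results):
    """Flag old URLs that do not return a permanent (301) redirect."""
    return sorted(url for url, (status, _target) in results.items()
                  if status != 301)

print(migration_issues(CRAWL_RESULTS))
# → ['https://example.com/old-b', 'https://example.com/old-c']
```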
🏷 Related Topics
Domain Age & History Domain Name

