
Official statement

Google occasionally continues to crawl old URLs (returning 404) for years, especially if they had backlinks or were important. This happens at low priority and does not block normal crawling of the site.
46:46
🎥 Source video

Extracted from a Google Search Central video

⏱ 53:08 💬 EN 📅 29/10/2020 ✂ 26 statements
Watch on YouTube (46:46) →
Other statements from this video (25)
  1. 1:41 Should you really use cross-domain canonicals to consolidate several thematic sites?
  2. 2:00 Do 302 redirects pass PageRank the way 301s do?
  3. 2:00 Does the canonical tag really transfer 100% of PageRank without any loss?
  4. 14:00 Should you really avoid setting all your outbound links to nofollow?
  5. 14:10 Should you really avoid setting all your outbound links to nofollow?
  6. 16:16 The URL Parameters tool in Search Console: living dead or still useful for your SEO?
  7. 16:36 Does Google's URL Parameters tool still work despite its broken interface?
  8. 20:01 Why does blocking a page in robots.txt prevent noindex from working?
  9. 22:03 Are Core Web Vitals really the only speed criterion that matters for ranking?
  10. 23:03 Core Web Vitals: why does Google ignore other performance metrics for Page Experience?
  11. 25:15 Do PageSpeed tests lie about your Core Web Vitals?
  12. 26:50 Is alt text really decisive for your visibility in Google Images?
  13. 26:50 Does image alt text really help organic search?
  14. 28:26 Do 302 redirects really pass as much PageRank as 301s?
  15. 30:17 Should you really hide cookie consent banners from Googlebot?
  16. 30:57 Should you really block cookie banners for Googlebot?
  17. 34:46 Why does Google still display old content in your meta descriptions?
  18. 34:46 Why does Google sometimes display your old meta descriptions in the SERPs?
  19. 36:57 Should you really show cookie banners to Googlebot?
  20. 37:56 Do 302 redirects really become 301s over time?
  21. 40:01 Should you really return a 404 for permanently unavailable products?
  22. 40:01 Should you return a 404 or a 200 on an out-of-stock product page?
  23. 43:37 Should you synchronize visible dates and technical dates to boost your crawl?
  24. 43:38 Should you really distinguish the visible date from the one in structured data?
  25. 47:09 Why does Google keep crawling your old 404 URLs?
📅 Official statement from 29/10/2020
TL;DR

Google continues to crawl 404-returning URLs for years, especially if they had backlinks or historical significance. This behavior is normal, operates at low priority, and does not impact the crawl budget allocated to your site's active pages. So there's no need to panic when you see these requests in your logs: they don't block anything.

What you need to understand

Does Google really crawl dead pages for years?

Yes, it is well documented. Googlebot periodically revisits URLs that return a 404 code, even after the content has been permanently deleted. The reason? These URLs have left a mark in the index: external backlinks, historical mentions, accumulated authority signals.

The engine keeps track of these URLs and occasionally checks if they are back online. This is not a bug; it's a deliberate mechanism to detect a potential restoration of content. Specifically, if you delete a high-authority page and then republish it six months later, Google should be able to rediscover it.

Does crawling these old URLs consume my crawl budget?

No. John Mueller is clear: this crawl occurs at low priority. The resources allocated to crawling your active pages are not diverted to these dead URLs. Google clearly distinguishes between priority crawling (new pages, updates, important content) and opportunistic crawling (sporadic checks, historical URLs).

In your server logs, these requests do appear, but they do not warrant any urgent corrective action. If your site generates enough fresh content, the total crawl budget remains mostly allocated to live pages.
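
If you want to put a number on this, a quick pass over your access logs is enough. Below is a minimal sketch, not a definitive tool: it assumes the Apache/nginx "combined" log format, a hypothetical file named access.log, and a simple user-agent match (a rigorous check would also verify Googlebot hits via reverse DNS):

```python
import re
from collections import Counter

# Minimal parser for the Apache/nginx "combined" log format.
# Fields: ip - - [time] "METHOD /path HTTP/x" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "(?:GET|HEAD) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

def googlebot_404_share(log_path):
    """Return Googlebot's 404 share and the most-hit 404 paths."""
    total, not_found = 0, Counter()
    with open(log_path, encoding="utf-8", errors="replace") as f:
        for line in f:
            m = LOG_LINE.match(line)
            if not m or "Googlebot" not in m.group("ua"):
                continue  # UA match only; strict verification needs reverse DNS
            total += 1
            if m.group("status") == "404":
                not_found[m.group("path")] += 1
    share = sum(not_found.values()) / total if total else 0.0
    return share, not_found.most_common(20)

share, top = googlebot_404_share("access.log")  # hypothetical path
print(f"Googlebot requests hitting 404s: {share:.1%}")
for path, hits in top:
    print(f"{hits:5d}  {path}")
```

The most-crawled 404s in that output are almost always the ones with the strongest backlink profiles, which is exactly the pattern Mueller describes.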

Should I block these URLs in robots.txt to clean up the logs?

That's a bad idea. Blocking a 404 URL in robots.txt prevents Google from noticing that the page no longer exists. Result: the URL remains indefinitely in the index with an uncertain status, instead of being properly deindexed.

Allowing the 404 to occur enables the engine to confirm the permanent disappearance of the content and, ultimately, to remove the URL from the index. Blocking the crawl artificially prolongs the ghostly presence of these pages. Counterproductive.

  • Google crawls historical 404s for years if they had backlinks or importance
  • This behavior is normal and intentional, not a malfunction
  • The crawl occurs at low priority and does not penalize the budget allocated to active pages
  • Blocking these URLs in robots.txt hampers proper deindexation
  • The server logs reflect this traffic, but it requires no corrective action

SEO Expert opinion

Is this statement consistent with observed practices on the ground?

Absolutely. Log analysts have long noted that Googlebot revisits URLs deleted years ago. What often surprises people is how long this persistence lasts: some 404 URLs keep receiving requests five, six, even ten years after their disappearance.

The key variable? The backlink profile. A URL with 50 quality external links will be crawled much longer than a page without any incoming links. Google clearly applies a cost/benefit logic: as long as there is a non-zero probability that the page may reappear, occasional crawling remains justified.

What nuances should be added to this assertion?

Mueller talks about a low-priority crawl but doesn't quantify it. What proportion of the total crawl budget? How many requests exactly? [To be verified]. Without figures, it's difficult to assess the real impact on very large sites (millions of pages) with a massive history of deleted URLs.

Another vague point: the definition of an "important" URL. The statement distinguishes backlinks from importance, but never spells out which other signals keep a dead page on Google's revisit list.

Practical impact and recommendations

What should you actually do with this information?

First, don’t panic when you see 404 URLs in your server logs. If these pages were historically significant, it’s normal for them to be revisited. Focus on crawling your active pages: as long as your new content is being discovered quickly, everything is fine.

Next, ensure your HTTP codes are correct. A 404 should be a real 404, not a soft 404 ("not found" page served as 200). Google needs to formally acknowledge the disappearance of content to adjust its long-term crawling behavior.
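
To catch soft 404s before Google does, you can spot-check the URLs you deleted on purpose. A minimal sketch using Python and the requests library; the URL list is a hypothetical placeholder:

```python
import requests

# Hypothetical sample of URLs that were deleted on purpose.
DELETED_URLS = [
    "https://example.com/old-product",
    "https://example.com/2019/discontinued-page",
]

for url in DELETED_URLS:
    # allow_redirects=False also exposes blanket 301s to the homepage.
    # Some servers mishandle HEAD; switch to requests.get if results look odd.
    resp = requests.head(url, allow_redirects=False, timeout=10)
    if resp.status_code == 404:
        verdict = "OK: real 404"
    elif resp.status_code == 200:
        verdict = "soft 404 risk: content served with HTTP 200"
    elif resp.status_code in (301, 302):
        verdict = f"redirects to {resp.headers.get('Location')}"
    else:
        verdict = "unexpected status"
    print(f"{resp.status_code}  {url}  ({verdict})")
```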

What mistakes should you absolutely avoid?

Never block in robots.txt the URLs you want to deindex. This common practice is counterproductive: it freezes the URL in an uncertain state and delays its definitive removal from the index. Let the 404 express itself freely.
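
You can verify that none of your deleted URLs are accidentally disallowed, so Googlebot can actually see the 404. A small sketch using only Python's standard library; example.com and the URL list are placeholders for your own site:

```python
from urllib.robotparser import RobotFileParser

# Fetch and parse the live robots.txt (hypothetical domain).
robots = RobotFileParser("https://example.com/robots.txt")
robots.read()

deleted = [
    "https://example.com/old-product",
    "https://example.com/legacy/page",
]
for url in deleted:
    if robots.can_fetch("Googlebot", url):
        print(f"OK       {url} (Googlebot can see the 404)")
    else:
        print(f"BLOCKED  {url} (robots.txt hides the 404, deindexing will stall)")
```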

Also avoid massively transforming your 404s into generic 301 redirects to the homepage. Some do this to "clean up" the logs, but it creates a chaotic signal: hundreds of disparate URLs redirecting to unrelated content. Google detects this pattern and may treat these redirects as disguised soft 404s.

How to optimize the management of your deleted URLs?

If you delete a page with backlinks, ask yourself: is there equivalent content on the site? If so, redirect with a 301 to that page. If not, own the 404 and let Google naturally acknowledge the disappearance.

For migrations or redesigns, plan a comprehensive mapping of redirects. Each historical URL should point to its most relevant equivalent, not to a catch-all destination. Yes, it’s tedious on large sites, but it’s what preserves your accumulated authority.
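
Once the mapping exists, it's worth validating that every redirect target actually resolves. A rough sketch assuming a hypothetical redirect_map.csv with one old-URL,new-URL pair per line, using Python and requests:

```python
import csv
import requests

# Hypothetical mapping file: one "old_url,new_url" pair per line, no header.
with open("redirect_map.csv", newline="", encoding="utf-8") as f:
    pairs = [row for row in csv.reader(f) if len(row) == 2]

for old_url, new_url in pairs:
    # Every 301 target should itself answer 200: no chains, no broken endpoints.
    resp = requests.get(new_url, allow_redirects=False, timeout=10)
    if resp.status_code == 200:
        verdict = "OK"
    elif resp.status_code in (301, 302, 307, 308):
        verdict = f"CHAIN -> {resp.headers.get('Location')} (flatten this)"
    else:
        verdict = f"BROKEN ({resp.status_code})"
    print(f"{verdict:40s} {old_url} -> {new_url}")
```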

  • Analyze your server logs to identify the most crawled 404 URLs (strong backlinks = persistent crawl)
  • Ensure your 404s return a true 404 code, not a soft 404 with code 200
  • Never block these URLs in robots.txt — let the 404 speak
  • When migrating, create precise 301 redirects to equivalent content, not to the homepage
  • Monitor the proportion of the crawl budget consumed by 404s: if it exceeds 10-15%, audit your redirects
  • For deleted pages without equivalents, own the 404 and do not create artificial redirects

Google crawls your old 404 URLs for years if they had backlinks. This is normal and poses no danger to your active crawl budget. Don't block this crawl: the 404 is what allows clean deindexing. Focus on the quality of your redirects during migrations and on the consistency of your HTTP codes. If your site has undergone multiple redesigns or complex migrations, these optimizations can quickly become time-consuming. In that context, relying on a specialized SEO agency helps you audit your logs precisely, map strategic redirects, and avoid costly mistakes over the long run.

❓ Frequently Asked Questions

How long does Google keep crawling a 404 URL?
It depends mainly on the URL's backlink profile. A page with many quality external links can be crawled for years, even a decade. Without backlinks, crawling generally stops after a few months.
Does this 404 crawling impact my ranking?
No, not directly. 404 crawling happens at low priority and doesn't divert resources from your active pages. However, a massive number of 404s without appropriate redirects can signal poor site management.
Should I remove 404 URLs from Search Console?
No, it's unnecessary. Search Console reports these errors for information, but they don't penalize your site. If the URL was deliberately deleted, the 404 is the correct response. Focus on unintentional 404s (broken internal links).
Can I speed up the deindexing of a 404 URL?
Yes, by requesting removal via the dedicated tool in Search Console. But if the URL has strong backlinks, Google may keep crawling it occasionally even after formal deindexing.
Are 301 redirects better than 404s for deleted URLs?
Only if they point to genuinely equivalent content. A 301 redirect to unrelated content is counterproductive and will be treated as a soft 404. If no equivalent exists, the 404 is the honest and appropriate response.

