
Official statement

Google continues to occasionally crawl old URLs that return 404, particularly if they had backlinks or were otherwise significant. This crawl runs at very low priority and does not block the crawling of new pages. This is normal behavior.
🎥 Source: Google Search Central video (53:08, English, published 29/10/2020) · statement at 47:09 · 26 statements extracted
TL;DR

Google occasionally crawls URLs that return 404, especially if they had backlinks or were deemed important. This crawl runs at very low priority and does not eat into the budget allocated to new pages. This is normal engine behavior, not a warning signal.

What you need to understand

Why does Googlebot insist on dead pages?

The behavior may seem counterintuitive: why crawl 404 URLs when they don't return any usable content? The answer lies in Google's long memory. When a page has accumulated significant backlinks or has played a role in the site's historical architecture, the engine keeps it in its monitoring index.

Googlebot periodically checks if these URLs have come back online. A site may restore an important page, merge content, or correct a technical error. The crawler thus maintains a reminder list for these URLs — but with minimal priority.

Does this crawl eat away at the budget allocated to active pages?

No. This is Mueller's key assertion. Google uses an internal prioritization system that clearly separates the resources allocated to active content from those dedicated to peripheral monitoring. Historical 404s fall into a distinct queue, crawled at widely spaced intervals.

In practical terms, if your site publishes 50 new URLs per day, the occasional visit to 200 old 404s does not reduce the number of times Googlebot will visit those new pages. The two processes coexist without competing for crawl budget.

What URLs are affected by this behavior?

Not all 404s receive this residual attention. Google prioritizes those that had authority signals: volume of backlinks, historical traffic, position in the internal link structure of the time. A product page that generated 1000 visits/month for 3 years will remain monitored, while a typo corrected six months ago will be forgotten quickly.

The crawl continues as long as the external backlinks remain active. If these links disappear or are corrected, Google eventually stops monitoring. The exact timeframe remains opaque — probably several months to several years depending on the page's history.

  • 404s with active backlinks are crawled periodically to detect possible restoration
  • This crawl uses a separate queue at very low priority and does not affect new pages
  • The behavior is normal and requires no corrective action if your logs show this pattern
  • The duration of monitoring depends on the page's history and the persistence of incoming links
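To see whether your own logs show this pattern, a minimal sketch of the extraction step might look like the following. The log format, field order, and user-agent matching are assumptions (a Combined Log Format is assumed here); adapt the pattern to your server's configuration.

```python
import re
from collections import Counter

# Assumes Apache/Nginx Combined Log Format; adjust the pattern to your server.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<date>[^\]]+)\] "(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_404s(log_lines):
    """Return a Counter of URLs that Googlebot requested and that returned 404."""
    hits = Counter()
    for line in log_lines:
        m = LOG_PATTERN.match(line)
        if m and m.group("status") == "404" and "Googlebot" in m.group("agent"):
            hits[m.group("url")] += 1
    return hits
```

Note that user-agent strings can be spoofed; for a rigorous audit, also verify the requesting IPs with a reverse DNS lookup against Google's published ranges.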

SEO Expert opinion

Is this statement consistent with on-the-ground observations?

Yes, and it’s actually one of the rare cases where the official communication perfectly aligns with what we see in the server logs. Crawl audits consistently reveal that Googlebot visits historical 404 URLs — often old product listings, migrated categories, or expired campaign pages. The frequency remains very low: once every 15-45 days for moderately important URLs.

The distinction between priority and secondary queues is also confirmed. When analyzing the temporal distribution of the crawl, 404s appear in distinct time slots, often during off-peak hours. The engine seems to effectively manage two parallel routes.

What nuances need to be added to this assertion?

Mueller speaks of “very low priority,” but this concept remains relative to the size of the site. On a small site of 500 pages, crawling 200 old 404s every 3 weeks still represents 40% of the total URL volume. The impact may not be direct on the budget, but it pollutes the logs and complicates the analysis.

Another point: the definition of “important old URLs” lacks precision. No quantitative threshold is provided for the number of backlinks required or for the retention period in the monitoring queue. Is a single link from a DR20 site enough? How long after the backlinks disappear does Googlebot actually stop?

In what cases can this behavior become problematic?

On sites that have undergone multiple migrations or major redesigns, the volume of historical 404s can become massive. I’ve seen logs where 30-40% of Googlebot requests targeted URLs that have been dead for 2-3 years. Even though theoretically this doesn't affect the crawl of active pages, it generates unnecessary server load and muddles monitoring metrics.

Another problematic case: e-commerce sites that massively deindex seasonal products. If these pages retain backlinks (buying guides, comparison sites), they remain on Google’s radar for months. The cumulative load can become significant on catalogs of 50,000+ items.

Attention: If your logs show intensive 404 crawling (>20% of total volume), this is NOT the normal behavior described by Mueller. Look for chained redirects, broken internal links, or a polluted sitemap that forces Googlebot to re-crawl these URLs in a loop.

Practical impact and recommendations

Should you take action on these old 404 URLs?

In most cases, no action is required. If logs confirm spaced crawling (once every 2-4 weeks) and the volume remains marginal (<10% of total hits), this is the normal behavior described by Mueller. You can ignore these lines in your reports.

However, if some of these historical URLs point to content you have moved or merged, now is the time to implement 301 redirects. You capitalize on existing backlinks instead of letting Google monitor a 404 indefinitely. Bonus: you recover lost SEO juice.
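The redirect logic itself can be sketched as a simple mapping. The URLs below are hypothetical examples, and in production the 301s would live in your server or CMS configuration rather than application code; the sketch only illustrates the decision rule: redirect when an equivalent target exists, otherwise serve a clean 404.

```python
# Hypothetical map: old 404 URLs with live backlinks -> equivalent content.
REDIRECTS = {
    "/old-category/widget-2018": "/widgets/widget",
    "/blog/migrated-post": "/guides/migrated-post",
}

def resolve(path):
    """Return (status, location): a 301 with a target when a mapping exists,
    otherwise a clean 404 (never a blanket redirect to the homepage)."""
    target = REDIRECTS.get(path)
    if target:
        return 301, target
    return 404, None
```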

How do you distinguish normal crawling from a technical problem?

Analyze the distribution of crawl in your logs. A healthy pattern shows: intense crawling on active pages, sporadic visits to historical 404s, and total absence of crawling on recent 404s (created less than 3 months ago). If Googlebot is hammering fresh 404 URLs, it’s discovering them from somewhere — sitemap, internal links, broken redirects.

Also check the temporal distribution. Historical 404s should appear sparsely, never in a massive block on the same day. A clustered crawl suggests that Google has rediscovered these URLs via an external source (new influx of backlinks, exploring a web archive).
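The temporal check above can be automated with a rough heuristic. The 50% single-day threshold is an assumption, not a Google-documented value; tune it to your site's crawl volume.

```python
from collections import Counter

def is_clustered(hit_dates, max_share=0.5):
    """Flag a crawl pattern where one day carries more than `max_share`
    of all 404 hits -- a hint that Google rediscovered the URLs in bulk
    (new backlinks, archive exploration) rather than revisiting sporadically."""
    if not hit_dates:
        return False
    per_day = Counter(hit_dates)
    return max(per_day.values()) / len(hit_dates) > max_share
```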

What optimizations should be put in place concretely?

Start with an audit of backlinks pointing to your 404s. Tools: Ahrefs, Majestic, Search Console (Links section). For each URL receiving more than 5 quality backlinks, decide: redirect to equivalent content, restore the page, or contact the source site for link update.

Next, clean up your internal linking. Even if Google theoretically separates the queues, each broken internal link forces unnecessary crawling. A crawler like Screaming Frog detects these links in 10 minutes. Fix or remove them.

  • Extract the list of 404 URLs crawled in the last 30 days (server logs or Search Console)
  • Cross-reference with a backlink audit to identify those retaining active incoming links
  • Implement 301 redirects to equivalent content when appropriate
  • Verify the absence of these URLs in sitemap.xml and robots.txt
  • Audit the internal linking to remove any links pointing to 404s
  • Monitor the evolution of the 404/crawl total ratio over 3 months
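The last monitoring step can be sketched as a per-month ratio computation. The input shape (month key, HTTP status) is an assumption; in practice you would feed it from the parsed log lines.

```python
from collections import defaultdict

def monthly_404_ratio(hits):
    """hits: iterable of (month_key, status) tuples, e.g. ("2020-10", 404).
    Returns {month: share of crawl hits that returned 404} so the trend
    can be tracked over the 3-month window."""
    total = defaultdict(int)
    errors = defaultdict(int)
    for month, status in hits:
        total[month] += 1
        if status == 404:
            errors[month] += 1
    return {m: errors[m] / total[m] for m in total}
```

A flat or declining ratio confirms the benign pattern Mueller describes; a rising ratio points back to the audit steps above.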

The occasional crawl of your old 404s is normal and requires no panicked intervention. Focus your efforts on the URLs that retain exploitable backlinks, via 301 redirects, and make sure your internal linking never points to dead pages. These optimizations require fine-grained log analysis and in-depth knowledge of the site's history. If your team lacks the resources or expertise to conduct this audit, support from a specialized SEO agency can speed up diagnosis and the implementation of targeted fixes.

❓ Frequently Asked Questions

How long does Google keep crawling a URL that returns 404?
As long as backlinks point to it or it retains a history of importance. The exact timeframe is not disclosed, but it likely extends over several months or even years depending on the page's profile.
Does this 404 crawling consume my crawl budget?
No, according to Mueller. Google uses a separate, very low priority queue for these URLs, which does not affect the crawling of active and new pages.
Should you block these URLs in robots.txt or set them to noindex?
No, that is useless and counterproductive. A clean 404 is the correct response. Blocking in robots.txt prevents Google from confirming that the page no longer exists, potentially prolonging the monitoring.
How do I know whether my 404s are being crawled normally or too often?
Analyze your server logs. Normal crawling means spaced visits (every 2-4 weeks) on less than 10% of total volume. Beyond that, look for broken internal links or a polluted sitemap.
Is it better to systematically redirect all 404s to the homepage?
No, that is bad practice. Redirect only to relevant equivalent content. An honest 404 is better than an irrelevant redirect that degrades user experience and dilutes PageRank.

