Why does Googlebot stubbornly recrawl your 404 pages?


Official statement

Pages returning a 404 code can be recrawled from time to time by Googlebot, especially if new links to these pages appear. This is done to check if they exist again, as they may return one day.

4:17

🎥 Source video

Extracted from a Google Search Central video

⏱ 1h00 💬 EN 📅 14/08/2015 ✂ 9 statements


Official statement from August 14, 2015 (10 years ago)

⚠ A more recent statement exists on this topic Should You Worry if Google Keeps Crawling Your 404 Pages? John Mueller · March 24, 2026 View statement →

TL;DR

Googlebot regularly recrawls 404 pages, especially when new links point to them. Google wants to check if these resources are back online. This mechanic impacts your crawl budget and can reveal structural issues with internal linking or broken backlinks that need to be identified and resolved quickly.

What you need to understand

Does Googlebot really have a soft spot for dead pages?

Google's logic is based on a simple observation: the web is volatile. A missing page can come back. A temporarily broken URL can be restored. Googlebot adopts an opportunistic strategy rather than a definitive one.

In practical terms, the bot does not immediately classify a 404 as permanently dead. It schedules recrawl attempts at widening intervals to check whether the resource has re-emerged. The frequency increases when new links to the URL appear, which signals that someone, somewhere, believes the page still exists.

What triggers these repeated recrawls?

Two main factors drive this behavior. The first is the appearance of new backlinks: if a third-party site creates a link to your 404, Googlebot interprets this as a sign that the page could be coming back. The second is the past popularity of the URL: a page that previously generated a lot of traffic or links stays on the radar longer.

The bot is not foolish. It regulates its efforts based on the probability of the page's resurrection. An old 404 without new signals ends up being crawled less frequently until it almost fades into oblivion.

What impact does this have on crawl budget?

Each request to a 404 consumes crawl budget without indexing anything. On a small site, the impact is negligible. On a large e-commerce catalog with thousands of archived references, it can quickly become a burden.

Google does not provide specific figures on the portion of budget wasted, but field observations suggest that sites where 30-40% of the Googlebot requests in their logs return a 404 can see their crawl efficiency drop drastically. The bot spends time on empty pages instead of discovering your new strategic pages.

Googlebot recrawls 404s to check for their potential return, especially when new links appear
This behavior consumes crawl budget without providing immediate indexable value
The intensity of recrawling depends on the past popularity of the URL and recent external signals
Old 404s without new signals gradually become neglected
On large sites, a high proportion of 404s can seriously degrade crawl efficiency

SEO Expert opinion

Does this statement align with field observations?

Yes, and it's quite reassuring to see Google admitting this officially. Log audits have confirmed this for years: Googlebot never completely lets go of a 404, especially if it has had a rich past. We observe spaced recrawls (weekly, monthly) on dead URLs for 18 months, sometimes longer.

It gets interesting when we correlate these recrawls with the appearance of new backlinks. Sites that neglect their link building, and keep earning links to pages that now return a 404, fuel this unnecessary cycle. Google comes back to check, finds the same error, and returns a few weeks later.

What nuances should be added to Mueller's statement?

Mueller remains vague on the exact frequency and the thresholds that trigger these recrawls. [To verify]: Google never specifies how many new links are sufficient to restart an intensive cycle or how long a 404 stays on the active radar. These parameters probably vary depending on the overall authority of the site.

Another blurry point: the treatment difference between a true 404 (page never recreated) and a temporary 404 that actually returns. Google claims to check “just in case,” but does not say if it learns from its mistakes. If a URL returns 404 for 3 years without interruption, the recrawl should logically become very rare. Data is lacking for a definitive answer.

In what cases does this logic pose problems?

On sites with high content turnover (media, seasonal e-commerce, classifieds), the volume of 404s naturally skyrockets. Thousands of dead URLs keep accumulating external backlinks for months. The result: Googlebot spends its time checking corpses at the expense of active pages.

Another problematic case: poorly managed migrations. If you haven't properly redirected your old URLs and they continue to receive links, Google will stubbornly crawl them indefinitely. This is wasted crawl budget, while these visits could have been used to explore your new content.

Warning: A log audit revealing more than 25% of Googlebot requests to 404s signals a serious structural problem: broken internal linking, missing redirects, or broken backlinks that were never cleaned up. This is not normal and it hampers your SEO.

Practical impact and recommendations

What concrete actions can be taken to limit wasted crawl budget?

The first action: clean up your broken backlinks. Use Search Console to identify error URLs that still receive external links. If these pages had value, redirect them with a 301 to a relevant equivalent. If they had no value, contact the source sites and ask them to remove or update the link.

The second action: audit your internal linking. Tools like Screaming Frog or Oncrawl detect internal links pointing to 404s. Correct them immediately. Each internal link to an error invites Googlebot to waste a request. You create the problem, not Google.
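The internal linking audit described above can be sketched in a few lines once you have crawl data. This is a minimal illustration, not how Screaming Frog or Oncrawl work internally: the function name `broken_internal_links` and both input structures (a map of each page to its outgoing internal links, and a map of each URL to its status code) are assumptions standing in for whatever export your crawler provides.

```python
def broken_internal_links(outlinks, status_by_url):
    """List (source, target) pairs where an internal link points at a 404 or 410.

    outlinks:      hypothetical crawl export, {source_url: [target_url, ...]}
    status_by_url: hypothetical crawl export, {url: http_status_code}
    """
    broken = []
    for source, targets in outlinks.items():
        for target in targets:
            # Targets missing from status_by_url are skipped, not assumed broken.
            if status_by_url.get(target) in (404, 410):
                broken.append((source, target))
    return sorted(broken)
```

Each pair in the result names a page to edit (the source) and the dead link to fix or remove (the target).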

When should the 410 code really be used instead of 404?

The 410 (Gone) code indicates to Google that the page is permanently dead and will never return. Theoretically, this should speed up the abandonment of recrawling. In practice, the effect remains marginal according to field reports—Google seems to treat 404s and 410s quite similarly in the medium term.

Use the 410 mainly for sensitive content that you want to disappear quickly from the index (products removed for legal reasons, high-stakes outdated pages). For everything else, a standard 404 suffices, provided no internal links or active backlinks keep pointing to it.
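The status code choice recommended in this article (301 when a relevant equivalent exists, 410 for content that must disappear for good, 404 for everything else) can be written as a small decision function. This is a sketch under assumptions: `response_for_removed_url`, `redirect_map` and `permanently_gone` are hypothetical names, and the real logic would live in your web server or application framework.

```python
def response_for_removed_url(path, redirect_map, permanently_gone):
    """Return (status_code, location) for a URL that no longer serves content.

    redirect_map:     hypothetical {old_path: equivalent_new_path}
    permanently_gone: hypothetical set of paths removed for good
    """
    if path in redirect_map:
        # A relevant equivalent exists: send a permanent redirect.
        return 301, redirect_map[path]
    if path in permanently_gone:
        # Sensitive or definitively removed content: 410 Gone.
        return 410, None
    # Default: an honest 404, with no redirect to the homepage.
    return 404, None
```

The redirect check comes first so that a URL listed in both structures still sends visitors and link equity to its equivalent page.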

How can you check if your site is overusing crawl budget on 404s?

Download your server logs for a minimum of 30 days. Filter Googlebot requests and calculate the proportion that hits 4xx codes. If you exceed 15-20%, you have a problem. Identify the most crawled error URLs and prioritize them: redirect those with value, correct the internal links pointing to them, and ask source sites to fix or remove broken backlinks.
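The log calculation above can be sketched as follows. It assumes your server writes the common "combined" access log format (Apache and Nginx both support it); adjust the pattern if your format differs. The function name `googlebot_error_share` is illustrative, and Googlebot is identified by user-agent string only, which can be spoofed.

```python
import re
from collections import Counter

# Combined log format:
# ip - - [date] "METHOD path HTTP/x" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?:\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_error_share(lines, top_n=10):
    """Return (Googlebot hits, share of those hits answered with 4xx, top error URLs)."""
    total = 0
    errors = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        # User-agent check only; a strict audit would also verify the IP
        # with a reverse DNS lookup. Unparseable lines are ignored.
        if not match or "Googlebot" not in match["agent"]:
            continue
        total += 1
        if match["status"].startswith("4"):
            errors[match["path"]] += 1
    share = sum(errors.values()) / total if total else 0.0
    return total, share, errors.most_common(top_n)
```

Call it with an open log file, for example `googlebot_error_share(open("access.log"))` where `access.log` is a placeholder path. The returned share is the figure to compare against the 15-20% threshold, and the URL list gives the order in which to fix errors.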

Complement this with Search Console: open the Page indexing report (formerly Coverage) and look at the reasons pages are not indexed. URLs listed under “Not found (404)” with a recent last-crawled date show that Google is still recrawling them. Cross-check with your backlinks to understand why these pages stay on the radar.

Audit your server logs to measure the proportion of crawl consumed by 404s
301 redirect old strategic URLs that still receive backlinks
Clean all internal links pointing to 404 errors
Contact source sites for broken backlinks for updates or removal
Use the 410 code only for definitively removed high-stakes content
Monitor Search Console to identify recently crawled 404s and act quickly

Managing 404s directly impacts your SEO efficiency. A site that lets hundreds of errors accumulate, fueled by broken internal linking or obsolete backlinks, wastes its crawl budget. Google will recrawl these pages indefinitely as long as signals suggest they might return. Clean up methodically, redirect smartly, and monitor your logs regularly. These optimizations require detailed analysis and sometimes complex technical interventions. If your site accumulates several thousand 404s or if your logs reveal chronic crawl budget waste, seeking assistance from a specialized SEO agency can be a wise choice to accurately diagnose the causes and deploy a structured action plan.

❓ Frequently Asked Questions

Should you remove 404s from Search Console to prevent them from being recrawled?

No. Removing them from Search Console does not change Googlebot's behavior. The bot crawls according to its own rules, not according to what you validate or hide in the interface. Address the cause: redirect the URLs or clean up the links.

Can a 404 page consume more crawl budget than an active page?

No. Each request costs roughly the same in crawl budget. The problem is that the 404 yields nothing, since it has no indexable content. You trade a useful request for an empty one.

How long does Google keep recrawling a 404 without new signals?

Google gives no precise duration. Field observations show that without new backlinks or internal links, recrawls become progressively less frequent over several months or even years, but never stop completely.

Does the 410 code really speed up deindexing compared with a 404?

Marginally. Google treats the two codes similarly in the medium term. A 410 may slightly speed up the abandonment of recrawling on pages with a rich history, but the effect remains difficult to measure precisely.

Are 301 redirects from 404s to the homepage good practice?

No, this is a classic bad practice. Redirect only to a thematically equivalent page. Otherwise, leave the 404 in place: at least it honestly tells Google that the resource no longer exists.
