Official statement

Google continues to occasionally crawl old URLs that return 404, particularly if they had backlinks or were significant. This crawl is done at very low priority and does not block the crawl of new pages. This is normal behavior.
47:09
🎥 Source video

Extracted from a Google Search Central video

⏱ 53:08 💬 EN 📅 29/10/2020 ✂ 26 statements
Other statements from this video 25
  1. 1:41 Should you really use cross-domain canonicals to consolidate multiple thematic sites?
  2. 2:00 Do 302 redirects really pass PageRank like 301 redirects?
  3. 2:00 Does the canonical tag really transfer 100% of PageRank without any loss?
  4. 14:00 Should you really avoid putting all your outbound links in nofollow?
  5. 14:10 Should you really avoid setting all your outbound links to nofollow?
  6. 16:16 Is the URL Parameters Tool in Search Console a zombie or still useful for your SEO?
  7. 16:36 Does Google's URL Parameters tool still work even when its interface is broken?
  8. 20:01 Why does blocking robots.txt prevent noindex from working?
  9. 22:03 Are Core Web Vitals really the only speed criterion that counts for ranking?
  10. 23:03 Core Web Vitals: Why does Google ignore other performance metrics for Page Experience?
  11. 25:15 Do PageSpeed tests really mislead you about your Core Web Vitals?
  12. 26:50 Is alt text truly crucial for your visibility in Google Images?
  13. 26:50 Does alternative text for images really enhance SEO?
  14. 28:26 Do 302 redirects really pass as much PageRank as 301s?
  15. 30:17 Should you really hide cookie consent banners from Googlebot?
  16. 30:57 Should you really block cookie banners for Googlebot?
  17. 34:46 Why does Google still display old content in your meta descriptions?
  18. 34:46 Why does Google sometimes show your old meta descriptions in the SERPs?
  19. 36:57 Should you really show cookie banners to Googlebot?
  20. 37:56 Do 302 redirects really turn into 301s over time?
  21. 40:01 Should you really return a 404 for products that are permanently unavailable?
  22. 40:01 Should you return a 404 or a 200 on a product page that's out of stock?
  23. 43:37 Should you sync visible and technical dates to enhance your crawl?
  24. 43:38 Should you really differentiate between the visible date and the structured data date?
  25. 46:46 Why does Google still crawl your deleted old URLs?
TL;DR

Google occasionally recrawls URLs that return 404, especially if they had backlinks or were considered important. This crawl runs at very low priority and does not hold back the crawling of new pages. It is normal engine behavior, not a warning sign.

What you need to understand

Why does Googlebot insist on dead pages?

The behavior may seem counterintuitive: why crawl 404 URLs when they don't return any usable content? The answer lies in Google's long memory. When a page has accumulated significant backlinks or has played a role in the site's historical architecture, the engine keeps it in its monitoring index.

Googlebot periodically checks whether these URLs have come back online. A site may restore an important page, merge content, or correct a technical error. The crawler therefore keeps a reminder list for these URLs, but at minimal priority.

Does this crawl eat away at the budget allocated to active pages?

No. This is Mueller's key assertion. Google uses an internal prioritization system that separates the resources allocated to active content from those dedicated to peripheral monitoring. Historical 404s fall into a distinct queue that is crawled at long intervals.

In practical terms, if your site publishes 50 new URLs per day, the occasional visit to 200 old 404s does not reduce the number of times Googlebot will visit those new pages. The two processes coexist without competing for crawl budget.

What URLs are affected by this behavior?

Not all 404s receive this residual attention. Google prioritizes those that carried authority signals: volume of backlinks, historical traffic, position in the internal link structure at the time. A product page that generated 1,000 visits per month for 3 years will remain monitored, while a typo URL corrected six months ago will be forgotten quickly.

The crawl continues as long as the external backlinks remain active. If these links disappear or are corrected, Google eventually stops monitoring. The exact timeframe remains opaque: probably several months to several years, depending on the page's history.

  • 404s with active backlinks are crawled periodically to detect possible restoration
  • This crawl uses a separate queue at very low priority and does not affect new pages
  • The behavior is normal and requires no corrective action if your logs show this pattern
  • The duration of monitoring depends on the page's history and the persistence of incoming links

SEO Expert opinion

Is this statement consistent with on-the-ground observations?

Yes, and it’s actually one of the rare cases where the official communication perfectly aligns with what we see in the server logs. Crawl audits consistently reveal that Googlebot visits historical 404 URLs — often old product listings, migrated categories, or expired campaign pages. The frequency remains very low: once every 15-45 days for moderately important URLs.
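The revisit frequency mentioned above can be measured directly from server logs. The following is a minimal sketch, assuming access logs in the common "combined" format and identifying Googlebot by user-agent string only (which can be spoofed; verifying the crawler's IP is out of scope here). The function name `googlebot_404_intervals` and the sample lines are hypothetical illustrations, not taken from the source.

```python
import re
from collections import defaultdict
from datetime import datetime
from statistics import median

# Assumed input: Apache/Nginx "combined" log format.
LOG_LINE = re.compile(
    r'\[(?P<ts>[^\]]+)\] "(?:GET|HEAD) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

def googlebot_404_intervals(lines):
    """Return {path: median number of days between Googlebot hits answered with 404}."""
    hits = defaultdict(list)
    for line in lines:
        m = LOG_LINE.search(line)
        # Keep only 404 responses served to a user agent claiming to be Googlebot.
        if not m or m["status"] != "404" or "Googlebot" not in m["ua"]:
            continue
        hits[m["path"]].append(datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z"))

    intervals = {}
    for path, stamps in hits.items():
        stamps.sort()
        gaps = [(b - a).total_seconds() / 86400 for a, b in zip(stamps, stamps[1:])]
        if gaps:  # URLs seen only once have no measurable interval
            intervals[path] = median(gaps)
    return intervals

# Hypothetical sample data for illustration only.
UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
SAMPLE = [
    f'66.249.66.1 - - [01/Oct/2020:03:12:45 +0000] "GET /old-product HTTP/1.1" 404 512 "-" "{UA}"',
    f'66.249.66.1 - - [22/Oct/2020:03:12:45 +0000] "GET /old-product HTTP/1.1" 404 512 "-" "{UA}"',
    f'66.249.66.1 - - [12/Nov/2020:03:12:45 +0000] "GET /old-product HTTP/1.1" 404 512 "-" "{UA}"',
    f'66.249.66.1 - - [12/Nov/2020:04:00:00 +0000] "GET /new-page HTTP/1.1" 200 2048 "-" "{UA}"',
    '203.0.113.7 - - [12/Nov/2020:05:00:00 +0000] "GET /old-product HTTP/1.1" 404 512 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    for path, days in googlebot_404_intervals(SAMPLE).items():
        print(f"{path}: revisited roughly every {days:.0f} days")
```

A median is used rather than a mean so that a single burst of requests does not distort the picture. Intervals in the range of a few weeks would match the pattern described in this section.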

The distinction between priority and secondary queues is also confirmed. When analyzing the temporal distribution of the crawl, 404s appear in distinct time slots, often during off-peak hours. The engine appears to run two parallel queues.

What nuances need to be added to this assertion?

Mueller speaks of “very low priority,” but this concept remains relative to the size of the site. On a small site of 500 pages, crawling 200 old 404s every 3 weeks still represents 40% of the total URL volume. The crawl budget may not be affected directly, but these requests pollute the logs and complicate the analysis.

Another point: the definition of “important old URLs” lacks precision. [To be verified] No quantitative threshold is provided regarding the number of backlinks or the duration of retention in the monitoring queue. Is a link from a DR20 site sufficient? How long after backlinks disappear does Googlebot really stop?

In what cases can this behavior become problematic?

On sites that have undergone multiple migrations or major redesigns, the volume of historical 404s can become massive. I’ve seen logs where 30-40% of Googlebot requests targeted URLs that have been dead for 2-3 years. Even though theoretically this doesn't affect the crawl of active pages, it generates unnecessary server load and muddles monitoring metrics.

Another problematic case: e-commerce sites that massively deindex seasonal products. If these pages retain backlinks (buying guides, comparison sites), they remain on Google’s radar for months. The cumulative load can become significant on catalogs of 50,000+ items.

Warning: If your logs show intensive 404 crawling (>20% of total volume), this is NOT the normal behavior described by Mueller. Look for chained redirects, broken internal links, or a polluted sitemap that forces Googlebot to re-crawl these URLs in a loop.
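The share of Googlebot requests that end in a 404 can be computed from the same logs. This is a rough sketch under the same assumptions as any log-based check (combined log format, Googlebot detected by user-agent string). The 20% alert threshold is this article's rule of thumb, not a figure published by Google, and the names `googlebot_404_share` and `assess` are hypothetical.

```python
import re

# Matches the status code and user agent at the end of a combined-format log line.
STATUS_AND_UA = re.compile(r'" (?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"')

def googlebot_404_share(lines):
    """Return the fraction of Googlebot requests that received a 404 (0.0 if none seen)."""
    total = not_found = 0
    for line in lines:
        m = STATUS_AND_UA.search(line)
        if not m or "Googlebot" not in m["ua"]:
            continue
        total += 1
        if m["status"] == "404":
            not_found += 1
    return not_found / total if total else 0.0

def assess(share, alert_threshold=0.20):
    """Classify the share against the article's threshold (an assumption, adjustable)."""
    return "investigate" if share > alert_threshold else "expected"
```

If the result is "investigate", the next step suggested by the text is to check internal links, redirect chains, and the sitemap for references to the dead URLs.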

Practical impact and recommendations

Should you take action on these old 404 URLs?

In most cases, no action is required. If logs confirm spaced crawling (once every 2-4 weeks) and the volume remains marginal (

❓ Frequently Asked Questions

How long does Google keep crawling a URL that returns 404?
As long as backlinks point to it or it retains a history of importance. The exact timeframe is not disclosed, but it probably extends from several months to several years depending on the page's profile.
Does this 404 crawling consume my crawl budget?
No, according to Mueller. Google uses a separate, very low-priority queue for these URLs, which does not affect the crawling of active and new pages.
Should you block these URLs in robots.txt or set them to noindex?
No, that is unnecessary and counterproductive. A clean 404 is the correct response. Blocking in robots.txt prevents Google from seeing that the page no longer exists, potentially prolonging the monitoring.
How do you know whether your 404s are crawled normally or too often?
Analyze your server logs. Normal crawling means spaced-out visits (every 2-4 weeks) covering <10% of total volume. Beyond that, look for broken internal links or a polluted sitemap.
Is it better to systematically redirect all 404s to the homepage?
No, that is bad practice. Redirect only to relevant, equivalent content. An honest 404 is better than an irrelevant redirect that degrades the user experience and dilutes PageRank.
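To check the robots.txt point from the FAQ, one option is to test whether any known dead URLs are accidentally disallowed for Googlebot. The sketch below uses Python's standard `urllib.robotparser`; the function name `blocked_for_googlebot`, the rules, and the URLs are hypothetical examples. Note that Python's parser does not necessarily interpret every rule exactly as Google does, so treat the result as a first-pass check.

```python
from urllib.robotparser import RobotFileParser

def blocked_for_googlebot(robots_txt, urls):
    """Return the URLs that the given robots.txt content disallows for Googlebot."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch("Googlebot", url)]

# Hypothetical robots.txt content and dead URLs, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /old-catalog/
"""

DEAD_URLS = [
    "https://example.com/old-catalog/item-1",
    "https://example.com/old-product",
]

if __name__ == "__main__":
    for url in blocked_for_googlebot(ROBOTS_TXT, DEAD_URLS):
        print(f"Blocked, so Googlebot cannot see the 404: {url}")
```

Any URL reported here is one where the crawler cannot observe the 404 response, which is the situation the FAQ answer warns against.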