Official statement
Google remembers dead URLs for at least 7 to 8 years and occasionally retries them, even if they consistently return 404 or 410. These URLs end up in a low-priority queue and consume a tiny portion of the crawl budget. For an SEO practitioner, this means that URLs removed a long time ago can still appear in server logs and managing old redirects remains relevant over time.
What you need to understand
What is the actual lifespan of a URL in Google's memory?
John Mueller states that Google keeps track of crawled URLs for at least 7 to 8 years, even if they no longer exist. That is far longer than most practitioners assume. Concretely, a page deleted in 2016 can still receive sporadic crawl attempts.
These URLs join a low-priority queue where Google occasionally checks whether the content has returned. The search engine doesn’t abandon a URL at the first 404: it marks it as inactive but doesn’t forget it completely. This persistence follows from how the index has historically worked: Google prefers to keep a record rather than delete it permanently.
How does this low-priority queue function?
The exact mechanism remains unclear, but field observations confirm that Google gradually spaces out its crawl attempts on URLs that consistently return 404 or 410. For example, a URL may be retried once a week at first, then once a month, and eventually once a quarter.
This low-priority queue consumes only a marginal fraction of the total crawl budget. However, on a site with a heavy history (redesigns, multiple migrations, massive removals), the cumulative volume can become visible in logs. These crawl attempts do not directly penalize SEO but reveal Google’s long memory.
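The spacing described above can be checked against your own logs. The sketch below is only an illustration: it assumes you have already extracted the timestamps of Googlebot hits on a single dead URL, in the usual access-log format, and the function name and sample dates are hypothetical. It says nothing about how Google schedules retries internally.

```python
from datetime import datetime


def retry_gaps_in_days(timestamps):
    """Return the gaps, in whole days, between consecutive crawl attempts."""
    parsed = sorted(
        datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z") for stamp in timestamps
    )
    return [(later - earlier).days for earlier, later in zip(parsed, parsed[1:])]


# Hypothetical Googlebot hits on one dead URL (illustrative data only).
hits = [
    "01/Jan/2021:08:00:00 +0000",
    "08/Jan/2021:08:00:00 +0000",
    "07/Feb/2021:08:00:00 +0000",
    "08/May/2021:08:00:00 +0000",
]

gaps = retry_gaps_in_days(hits)
# True when each gap is at least as long as the previous one.
is_widening = all(a <= b for a, b in zip(gaps, gaps[1:]))
print(gaps, is_widening)
```

If the gaps keep widening, the URL is behaving as the low-priority queue would predict. A gap that suddenly shrinks may be worth a closer look, for example a new backlink pointing at the old path.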
Why does Google maintain this persistence on dead URLs?
The search engine does not want to miss a content resurrection. If a historical URL with a good backlink profile comes back online, Google wants to detect it quickly. This logic applies especially to URLs that had visibility, inbound links, or significant traffic in the past.
Moreover, Google knows that some sites go through temporary downtime or poorly managed migrations, after which 404 URLs may come back months later. Rather than erasing all traces, the engine prefers to keep a watch list of URLs. It’s an insurance policy against false negatives.
- Google remembers URLs for 7-8 years minimum, even after permanent deletion
- Dead URLs join a low-priority queue with spaced crawl attempts
- This persistence aims to detect potential content resurrections, especially if the URL had weight
- The crawl volume consumed remains marginal but can be visible on historically heavy sites
- HTTP codes 410 (Gone) and 404 (Not Found) are treated similarly in the long term
SEO Expert opinion
Is this statement consistent with field observations?
Absolutely. Server logs from sites that have undergone multiple redesigns show Googlebot regularly attempting to crawl URLs deleted years earlier. Hits are frequently observed on paths dating back to 2015-2017, at a low but consistent frequency. Mueller is simply giving official confirmation of what technical SEOs have long seen in their logs.
However, the exact duration of 7-8 years remains a ballpark figure, not a strict rule. Some sites report attempts on even older URLs, while others see attempts stop after 3-4 years. The initial priority of the URL, its link profile, and its traffic history likely play a role in this retention duration. [To Verify]: No official data specifies the exact criteria for prioritization in this queue.
Should we treat 404s and 410s differently to expedite forgetting?
Let's be honest: the distinction between 404 (Not Found) and 410 (Gone) is theoretically clear, but in practice, Google treats them very similarly in the long run. The 410 is supposed to signal a permanent deletion, but Mueller clarifies that even these URLs remain in the low-priority queue.
Using a 410 can slightly speed up the initial deindexing, but it does not guarantee that Google stops its crawl attempts entirely. The difference is mainly in the first weeks after deletion. After that point, both codes converge towards the same treatment: retained in memory with spaced attempts. Don’t rely on the 410 as a magic erase button.
What are the hidden implications for managing crawl budget?
On a medium-sized site with a comfortable crawl budget, this persistence has no measurable impact. Googlebot dedicates most of its resources to active and fresh URLs. Attempts on old dead URLs represent a negligible portion, often less than 1% of the total crawl.
The problem emerges on massive sites with a history of multiple migrations or thousands of deleted URLs. If your crawl budget is already stretched (low crawl frequency, important pages updated slowly), every hit on a dead URL is a hit that isn’t going to active content. In these specific cases, monitoring logs and identifying old URLs still being crawled can help diagnose inefficiencies. But let’s be pragmatic: optimizing the current structure of the site will have far more impact than trying to erase Google’s memory.
Practical impact and recommendations
What to do with old URLs that linger in the logs?
First step: identify the actual crawl volume consumed by these dead URLs. Parse your server logs (Screaming Frog Log File Analyser, Botify, OnCrawl, or a custom script) and filter Googlebot hits on URLs returning 404 or 410. If the volume is less than 2-3% of total crawl, ignore them: this isn’t where your SEO performance is at stake.
If the volume is significant (>5% of crawl), dig deeper. Do these URLs still have active backlinks? If yes, 301 redirect them to the most relevant page. If not, leave the 404 in place and focus on optimizing active content. Don’t waste time cleaning up URLs that only consume a marginal fraction of the budget.
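For the custom-script route, a minimal sketch could look like the following. It assumes logs in Combined Log Format and identifies Googlebot by user-agent string alone, which can be spoofed; verifying the crawler by reverse DNS is left out. The function names, sample log lines, and the "monitor" label for the band between 3% and 5% are assumptions made for illustration, since the thresholds above do not cover that band.

```python
import re
from collections import Counter

# Matches the request, status, size, referrer and user agent of a
# Combined Log Format line (assumed format; adapt to your server).
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def dead_url_crawl_share(lines):
    """Return (share of Googlebot hits answered 404/410, hits per dead path)."""
    total = 0
    dead = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match or "Googlebot" not in match["agent"]:
            continue
        total += 1
        if match["status"] in {"404", "410"}:
            dead[match["path"]] += 1
    share = sum(dead.values()) / total if total else 0.0
    return share, dead


def triage(share):
    """Apply the thresholds from the text; the middle band is an assumption."""
    if share < 0.03:
        return "ignore"
    if share > 0.05:
        return "investigate"
    return "monitor"


# Hypothetical log lines, for illustration only.
BOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
sample = [
    f'66.249.66.1 - - [26/Feb/2021:10:00:00 +0000] "GET /shoes/ HTTP/1.1" 200 5120 "-" "{BOT}"',
    f'66.249.66.1 - - [26/Feb/2021:10:01:00 +0000] "GET /blog/ HTTP/1.1" 200 4096 "-" "{BOT}"',
    f'66.249.66.1 - - [26/Feb/2021:10:02:00 +0000] "GET /old/page-2016 HTTP/1.1" 404 0 "-" "{BOT}"',
    f'66.249.66.1 - - [26/Feb/2021:10:03:00 +0000] "GET /old/promo HTTP/1.1" 410 0 "-" "{BOT}"',
    '203.0.113.7 - - [26/Feb/2021:10:04:00 +0000] "GET /old/page-2016 HTTP/1.1" 404 0 "-" "Mozilla/5.0"',
]

share, dead = dead_url_crawl_share(sample)
print(f"{share:.0%} of Googlebot hits land on dead URLs -> {triage(share)}")
print(dead.most_common(10))
```

The per-path counter is the useful part in practice: the most frequently hit dead paths are the first ones to check for active backlinks before deciding between a 301 and leaving the 404 in place.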
Should you block these URLs in robots.txt to force forgetting?
No. Blocking 404 URLs in robots.txt is a classic mistake that worsens the situation. If Googlebot can no longer crawl the URL, it cannot confirm that it actually returns 404 — so it remembers it indefinitely, in a “blocked” status. You replace an occasional crawl with a permanent uncertainty.
The only exception concerns sensitive URLs that you absolutely want to disappear from the index. In this case, keep them accessible as 404/410 until Google fully deindexes them, then possibly block them. But for ordinary dead URLs, robots.txt adds no value. Let Google see the 404 and naturally space out its attempts.
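To see why a Disallow rule hides the 404 from the crawler, the sketch below uses Python's standard `urllib.robotparser` as a stand-in. It is not Google's own robots.txt parser, and the rules and URL are hypothetical, but the principle is the same: a disallowed URL is never requested, so its status code is never observed.

```python
from urllib.robotparser import RobotFileParser


def googlebot_can_fetch(robots_txt, url):
    """Return True if these robots.txt rules let Googlebot request the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Googlebot", url)


# Hypothetical rules: the first blocks the dead section, the second does not.
blocking_rules = "User-agent: *\nDisallow: /old-catalog/\n"
open_rules = "User-agent: *\nDisallow: /admin/\n"

dead_url = "https://example.com/old-catalog/item-42"

blocked = googlebot_can_fetch(blocking_rules, dead_url)
reachable = googlebot_can_fetch(open_rules, dead_url)
print(blocked, reachable)
```

With the blocking rules the crawler cannot request the URL at all, so it has no way to learn that the page now returns 404. With the open rules it can fetch the URL and see the status code.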
How to manage migrations and redesigns to limit this effect over the long term?
During a redesign, properly map all old URLs to their equivalents via 301. Even if some pages no longer have a direct equivalent, redirect to the closest category or parent page. A well-thought-out 301 is always preferable to a 404, especially if the old URL had backlinks or traffic.
For truly obsolete URLs (discontinued products without replacements, closed sections), accept the 404. But document these choices: keep a list of URLs intentionally removed to later justify why they weren't redirected. This avoids nasty surprises when, three years later, someone asks why a frequently crawled URL returns 404.
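The mapping and documentation steps above can be kept in one place. The following is a minimal sketch rather than a production redirect layer: every path, section name, and removal reason is hypothetical, and the parent-page fallback simply strips the last path segment.

```python
# Hypothetical explicit mappings: old path -> closest equivalent.
REDIRECTS = {
    "/shop/red-shoes-2016": "/shoes/red-sneakers",
}

# Hypothetical log of URLs intentionally left as 404, with the reason.
INTENTIONALLY_REMOVED = {
    "/shop/discontinued-widget": "product discontinued, no replacement",
}

# Hypothetical sections that still exist and can serve as fallback targets.
LIVE_SECTIONS = {"/shop", "/blog"}


def resolve(old_path):
    """Return (status, target) for a legacy path; target is None for a 404."""
    if old_path in REDIRECTS:
        return 301, REDIRECTS[old_path]
    if old_path in INTENTIONALLY_REMOVED:
        return 404, None
    parent = old_path.rsplit("/", 1)[0]
    if parent in LIVE_SECTIONS:
        return 301, parent
    return 404, None


for path in ("/shop/red-shoes-2016", "/shop/blue-hat-2015",
             "/shop/discontinued-widget", "/legacy/page"):
    print(path, resolve(path))
```

Checking the documented removals before the parent fallback is deliberate: a URL that was removed on purpose should keep returning 404 even if its parent section is still live, and the stored reason answers the question when someone asks about it later.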
- Analyze your server logs to quantify the actual crawl on dead URLs (if < 3%, ignore)
- Identify old URLs with active backlinks and 301-redirect them to relevant content
- Never block 404 URLs in robots.txt — this prevents Google from confirming their status
- During a redesign, systematically map old URLs to their equivalents or parent pages
- Document URLs intentionally left as 404 to justify these choices in the long run
- Monitor logs after migration to detect any abnormal crawl patterns
❓ Frequently Asked Questions
How long does Google keep a 404 URL in memory?
Does the 410 code really speed up the removal of a URL from Google's index?
Do these crawl attempts on old URLs consume much crawl budget?
Should you block 404 URLs in robots.txt to force Google to forget them?
How should you handle old URLs that still receive active backlinks?
🎥 From the same video 38
Other SEO insights extracted from this same Google Search Central video · duration 985h14 · published on 26/02/2021