Official statement
Redirecting 404 pages to the homepage—even with a 5-second meta-refresh—creates soft 404s that Google will continue to crawl unnecessarily. Users get lost, bots waste crawl budget, and your site sends inconsistent signals. The solution? A proper user-friendly 404 page with a clean HTTP 404 code.
What you need to understand
What is a soft 404 and why does Google detect it?
A soft 404 occurs when the server returns an HTTP 200 (success) code even though the requested resource no longer exists. Google sees an ‘active’ page, but its content resembles an error: often generic, text-poor, and lacking added value.
The engine detects these inconsistencies through heuristic signals: lack of unique content, identical layout to other ‘empty’ pages, and standardized title/meta tags. Result: Google marks the page as soft 404 in Search Console and continues to crawl it regularly to check if it has changed.
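To make the idea of heuristic detection concrete, here is a minimal sketch of a home-made soft 404 check. It is not Google's algorithm: the phrase list, the word-count threshold, and the function names are all assumptions chosen for illustration.

```python
import re

# Hypothetical phrases that often appear on error-like pages
# (an assumption, not Google's actual signal list).
ERROR_PHRASES = ("page not found", "no longer exists", "404", "not available")


def visible_text(html: str) -> str:
    """Crudely strip scripts, styles and tags, then collapse whitespace."""
    text = re.sub(r"<(script|style)\b.*?</\1>", " ", html, flags=re.S | re.I)
    text = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def looks_like_soft_404(status: int, html: str, min_words: int = 50) -> bool:
    """Flag a 200 response whose body reads like an error page."""
    if status != 200:
        return False  # a real 404/410 is not a *soft* 404
    text = visible_text(html).lower()
    has_error_phrase = any(phrase in text for phrase in ERROR_PHRASES)
    is_thin = len(text.split()) < min_words  # min_words is an arbitrary cut-off
    return has_error_phrase and is_thin
```

Running this over the HTML your own server returns for deleted URLs gives a rough idea of which pages could be flagged, before Search Console reports them.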
Why don’t meta-refreshes resolve anything?
Adding a 5-second delay before redirecting doesn’t change the diagnosis. Google largely ignores meta-refreshes for indexing: it analyzes the initial content served to the bot, not the page the browser loads once the refresh delay expires.
The user lands on a page that doesn’t meet their expectations, waits a few seconds without understanding, then ends up on a homepage unrelated to their initial query. The bounce rate skyrockets, and the UX signal sent to Google is catastrophic.
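To spot this pattern in your own templates, the sketch below scans an HTML document for a meta-refresh tag and reports its delay and target. It relies only on Python's standard `html.parser`. The class and function names are hypothetical, and the parsing of the `content` attribute is deliberately simplified.

```python
from html.parser import HTMLParser
from typing import Optional, Tuple


class MetaRefreshFinder(HTMLParser):
    """Records the first <meta http-equiv="refresh"> found in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.refresh: Optional[Tuple[float, Optional[str]]] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.refresh is not None:
            return
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("http-equiv", "").lower() != "refresh":
            return
        # The content attribute looks like "5; url=/" or just "5".
        delay_part, _, rest = attr.get("content", "").partition(";")
        try:
            delay = float(delay_part.strip())
        except ValueError:
            return  # malformed delay: ignore the tag
        rest = rest.strip()
        target = None
        if rest.lower().startswith("url="):
            target = rest[4:].strip().strip("'\"")
        self.refresh = (delay, target)


def find_meta_refresh(html: str) -> Optional[Tuple[float, Optional[str]]]:
    """Return (delay_in_seconds, target_url) or None if no meta-refresh exists."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    return finder.refresh
```

A result such as `(5.0, "/")` on an error template is the exact configuration described above: a timed refresh that sends the visitor to the homepage.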
How does this concretely affect crawl budget?
Every soft 404 keeps an ambiguous status on Google’s side. Google recrawls it regularly to determine whether the page has returned or is still a disguised error. On a site with thousands of poorly managed historical URLs, this can represent hundreds of wasted crawl requests each week.
A true 404 code is understood immediately: the page is dead, no need to return frequently. Google adjusts its crawl frequency accordingly and concentrates its budget on active resources.
- Soft 404s unnecessarily consume crawl budget by forcing frequent recrawls
- The HTTP 200 code on an empty page creates an inconsistency that Google has to resolve through heuristics and repeated recrawls
- Meta-refreshes are not considered for indexing—only the initial content counts
- A real 404 page allows Google to quickly de-index and optimize its resources
- User experience severely degrades with redirects to the homepage without context
SEO Expert opinion
Does this recommendation contradict widespread historical practices?
Yes, and that’s precisely where many sites still fail. For years, redirecting 404 → homepage was considered a ‘best practice’ to ‘not lose the visitor.’ Some mainstream CMS platforms even integrated it by default.
However, this logic completely ignores the crawl perspective and the medium-term SEO impact. We optimize for a hypothetical visitor at the expense of clear structural signals for the search engine. Field observations consistently show an inflation of the number of soft 404s in Search Console on these configurations.
In what cases is a redirect from a 404 still acceptable?
There are legitimate exceptions: if a product page is deleted but a direct and relevant alternative exists in the same category, a 301 redirect to that alternative makes sense. The user finds a close answer, and Google understands the substitution.
But the key is contextual relevance. Redirecting /nike-air-max-2018 to /nike-shoes works; redirecting it to the generic homepage does not. [To be verified] Google has never published a quantitative threshold for the ratio of soft 404s to total pages that would trigger a crawl penalty, but field feedback suggests that beyond roughly 10-15% of soft 404s in Search Console, overall crawl frequency begins to drop.
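The rule above (301 only when a relevant substitute exists, 404 otherwise) can be sketched as a small lookup. The mapping reuses the example URLs from this paragraph. The dictionary and function names are hypothetical, and a real site would likely generate the mapping from its catalogue.

```python
from typing import Optional, Tuple

# Hypothetical mapping of deleted URLs to a close, relevant replacement.
RELEVANT_REDIRECTS = {
    "/nike-air-max-2018": "/nike-shoes",
}


def respond_to_deleted_url(path: str) -> Tuple[int, Optional[str]]:
    """Return (status, Location) for a URL whose page has been removed."""
    target = RELEVANT_REDIRECTS.get(path)
    if target is not None and target != "/":
        return 301, target  # a relevant substitute exists
    return 404, None  # no substitute: never fall back to the homepage
```

The explicit `target != "/"` guard is a design choice: even if someone adds a homepage entry to the mapping by mistake, the function still answers with a 404.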
What is the real value of a well-designed 404 page?
A user-friendly 404 page does not just display ‘page not found.’ It offers a built-in search engine, links to main sections, and even contextual suggestions based on the requested URL. It’s an opportunity to regain engagement rather than a dead end.
From an SEO perspective, it sends a clear signal: the server returns an HTTP 404 code, Google quickly de-indexes without ambiguity, and crawl budget is no longer wasted. Some well-optimized e-commerce sites even show measurable conversion rates from their 404 pages thanks to intelligent design.
Practical impact and recommendations
What should you prioritize checking on your site?
Start by auditing the HTTP codes actually served. Use a crawler like Screaming Frog, Oncrawl, or Botify in ‘URL list’ mode with a sample of old deleted pages. Compare the returned HTTP code (server response header) with what Google sees in Search Console.
Next, check the ‘Coverage’ or ‘Pages’ report in Search Console: look for the ‘Excluded’ section and filter for ‘Soft 404.’ If you find hundreds or thousands of URLs, it’s a red flag. These pages siphon crawl budget for nothing.
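If you prefer a script to a commercial crawler for the HTTP audit, the following sketch checks what the server really returns for URLs known to be deleted. It uses the standard `http.client` module, which does not follow redirects, so the raw status stays visible. The labels returned by `classify` and the `status-audit-sketch` user agent are assumptions made for this example.

```python
import http.client
from urllib.parse import urlsplit


def fetch_status(url: str, timeout: float = 10.0):
    """Return (status, Location header) without following redirects."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path, headers={"User-Agent": "status-audit-sketch"})
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def classify(status: int, location, homepage: str = "/") -> str:
    """Label the response of a URL that is known to be deleted."""
    if status in (404, 410):
        return "ok"
    if status in (301, 302, 307, 308):
        is_home = location is not None and (urlsplit(location).path or "/") == homepage
        return "redirect-to-homepage" if is_home else "redirect"
    if status == 200:
        return "possible-soft-404"
    return "other"
```

Looping over a list of old URLs and printing `classify(*fetch_status(url))` for each gives a quick table to compare against the Search Console report.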
How do you set up a genuinely effective 404 page?
From a technical standpoint, ensure your server returns an HTTP 404 code in the response, not a 200 and not a 302. Test with curl, with browser DevTools (Network tab), or with an online tool like HTTP Status Code Checker.
Content-wise, design a branded 404 page with: a clear message (‘this page no longer exists’), a built-in search engine, links to the main sections of the site, and contextual suggestions based on the URL (e.g., if the URL contains ‘shoes’, suggest the shoes category). Avoid an impersonal tone—some humor or empathy improves UX.
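The contextual suggestions mentioned above can be as simple as matching words in the requested URL against known categories. This is a minimal sketch: the keyword map and the category URLs are hypothetical, and a real site would derive them from its own taxonomy.

```python
import re

# Hypothetical keyword-to-category map.
CATEGORY_KEYWORDS = {
    "shoes": "/shoes/",
    "jackets": "/jackets/",
}


def suggest_links(requested_path: str, limit: int = 3):
    """Suggest category links whose keyword appears in the requested URL."""
    tokens = set(re.split(r"[^a-z0-9]+", requested_path.lower()))
    hits = [url for keyword, url in CATEGORY_KEYWORDS.items() if keyword in tokens]
    return hits[:limit]
```

The 404 template can render these links next to the search box, while the response itself keeps its 404 status.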
What critical mistakes should absolutely be avoided?
Never use meta-refreshes or client-side JavaScript redirects to ‘improve’ a 404. Google crawls the initial HTML and ignores these tricks, so you’ll just create more soft 404s.
Second trap: wildcard DNS entries or catch-all server configurations that serve the homepage with a 200 code for any unknown URL. This is common on some poorly configured shared hosting. Result: thousands of soft 404s generated automatically.
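As an illustration of a catch-all that behaves correctly, here is a minimal WSGI sketch in which every unknown URL gets friendly content and a real 404 status. The page table and HTML bodies are hypothetical placeholders, and your CMS or web server will have its own way of expressing the same rule.

```python
# Hypothetical set of real pages on the site.
KNOWN_PAGES = {
    "/": b"<h1>Home</h1>",
    "/shoes/": b"<h1>Shoes</h1>",
}

NOT_FOUND_BODY = b"<h1>This page no longer exists</h1><a href='/'>Back to home</a>"


def app(environ, start_response):
    """Serve known pages with 200 and everything else with a real 404."""
    path = environ.get("PATH_INFO", "/")
    body = KNOWN_PAGES.get(path)
    if body is not None:
        status = "200 OK"
    else:
        # Catch-all branch: the content is friendly, but the status stays 404.
        status, body = "404 Not Found", NOT_FOUND_BODY
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]
```

For a local test, `app` can be served with `wsgiref.simple_server.make_server` from the standard library. Requesting any made-up path should then show a 404 in the DevTools Network tab.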
- Audit HTTP codes with a crawler or curl on a sample of deleted URLs
- Check the ‘Soft 404’ section of the Search Console ‘Coverage’ (or ‘Pages’) report
- Configure the server to return a true HTTP 404 code on non-existent pages
- Create a user-friendly 404 page with internal search and contextual navigation
- Remove all meta-refresh or JavaScript redirects from 404s
- Regularly test with DevTools and HTTP tools to confirm server codes
❓ Frequently Asked Questions
Is a 410 Gone code preferable to a 404 for permanently deleted pages?
Can soft 404s trigger an algorithmic penalty?
How should you handle the old URLs of deleted e-commerce products?
Should you block 404s in robots.txt to save crawl budget?
How long does Google keep crawling a 404 page after first detecting it?
Other SEO insights extracted from this same Google Search Central video · duration 57 min · published on 23/06/2020