Official statement
Other statements from this video (7)
- Can JavaScript really control the entire lifecycle of a Single Page App for SEO?
- 2:05 Why does Googlebot refuse geolocation, and how do you avoid indexing errors tied to code paths?
- 2:38 Why does Googlebot systematically miss your pages if the URL does not change?
- 2:38 How do you make a single-page app crawlable by Google without losing its indexing?
- 3:09 Why does Google insist on unique titles and meta descriptions for each view?
- 4:47 How do you correctly handle HTTP error codes in a single-page app?
- 4:47 Do JavaScript redirects to error pages actually trigger an error signal for Googlebot?
Google insists: even in a single-page app, each error (404, 410, 500) must return its appropriate HTTP status code, not a 200. A server that consistently responds with 200 OK prevents Googlebot from distinguishing valid content from dead pages, resulting in wasted crawl budget and polluted indexing. The responsibility for managing these status codes now lies with the client-side developer, no longer just the server.
What you need to understand
Why are HTTP status codes still critical in modern architecture?
The rise of single-page applications (SPA) has shifted routing logic from the server to the browser. Historically, an Apache or Nginx server would automatically return a 404 Not Found if a URL did not exist. With React, Vue, or Angular, the server often serves a single index.html file that always responds HTTP 200, leaving it to JavaScript to display "Page not found."
Googlebot interprets this 200 as a success signal. It indexes the page, attempts to crawl it again, and consumes your crawl budget on content that does not exist. The status code Googlebot records comes from the initial HTTP response, not from the error message JavaScript renders later, unless you have implemented SSR (server-side rendering) or pre-rendering that emits the correct status code from the server.
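To make the failure mode concrete, here is a minimal sketch of the catch-all most SPA servers ship with (the dist folder and port are illustrative): every URL, valid or not, receives index.html with a 200.

```js
// Classic SPA catch-all (Express): this is exactly what creates soft 404s.
const express = require('express');
const path = require('path');

const app = express();
app.use(express.static('dist')); // compiled SPA assets

// Any path, including /this-page-does-not-exist, gets the shell with 200 OK;
// the "Page not found" view only exists later, in client-side JavaScript.
app.get('*', (req, res) => {
  res.sendFile(path.join(__dirname, 'dist', 'index.html'));
});

app.listen(3000);
```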
What are the real consequences of a bad status code?
A soft 404 (an error page returning 200) causes several cumulative problems. Googlebot repeatedly crawls dead URLs instead of exploring your new strategic pages. Search Console alerts you to these pages, but offers no automatic mechanism to remove them from the index quickly.
The second impact affects the perceived quality of the site. Google detects that your pages have an abnormal bounce rate or empty content and adjusts its overall assessment of the site. Users click, hit an error masked as a 200, and leave, which degrades your behavioral signals. A clean 404, on the other hand, is understood by all parties: bot, browser, CDN, analytics.
How to manage status codes in a SPA without SSR?
If your stack does not allow for SSR (technical constraints, legacy, budget), you need to implement a server-side middleware that inspects the URL before serving the HTML. A simple Express.js router or a Cloudflare edge worker can check if the route exists in your sitemap or database and return 404 or 410 Gone accordingly.
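A minimal Express sketch of that middleware, assuming the whitelist can be built from your sitemap or database (the routes below are placeholders):

```js
// Sketch: return real 404/410 codes for unknown paths before serving the SPA.
const express = require('express');
const path = require('path');

const app = express();

// Placeholders: in practice, load these from your sitemap, router, or DB.
const validRoutes = new Set(['/', '/products', '/about', '/contact']);
const goneRoutes = new Set(['/old-category']); // permanently removed pages

app.use(express.static('dist'));

app.get('*', (req, res) => {
  const shell = path.join(__dirname, 'dist', 'index.html');
  if (goneRoutes.has(req.path)) return res.status(410).sendFile(shell);
  if (!validRoutes.has(req.path)) return res.status(404).sendFile(shell);
  res.sendFile(shell); // valid route: 200 + SPA shell as usual
});

app.listen(3000);
```

The SPA still renders its own error view; the only change is that the HTTP envelope now tells Googlebot the truth.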
Alternatively, use a pre-rendering service (Prerender.io, Rendertron) that generates HTML snapshots with the correct status codes specifically for Googlebot. This hybrid solution preserves the SPA experience for humans while serving correctly configured static HTML to bots. Be cautious, though: Google treats serving different content to users and bots as cloaking, a guideline violation; dynamic rendering is tolerated precisely because only the rendering method and status codes differ, not the content.
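Vendor SDKs handle the plumbing for you, but the underlying pattern is simple enough to sketch without one (the PRERENDER_ENDPOINT below is a hypothetical placeholder for your snapshot service):

```js
// Sketch of dynamic rendering: bots receive a pre-rendered snapshot with its
// real status code; humans fall through to the normal SPA. Node 18+ (global fetch).
const BOT_PATTERN = /googlebot|bingbot|duckduckbot|baiduspider/i;
const PRERENDER_ENDPOINT = 'https://prerender.example.com/render?url='; // placeholder

function botRouter(req, res, next) {
  const ua = req.headers['user-agent'] || '';
  if (!BOT_PATTERN.test(ua)) return next(); // humans: normal SPA flow

  fetch(PRERENDER_ENDPOINT + encodeURIComponent(req.originalUrl))
    .then(async (snapshot) => {
      // Forward the snapshot's status code (200, 404, 410…) unchanged.
      res.status(snapshot.status).send(await snapshot.text());
    })
    .catch(next);
}

module.exports = botRouter; // mount with app.use(botRouter) before static routes
```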
- Soft 404: error page returning HTTP 200, endlessly crawled by Googlebot
- Crawl budget: the number of pages Google is willing to crawl per day on your domain, limited and precious
- SSR / Pre-rendering: techniques that generate HTML on the server side with the correct status codes before sending to the client
- Server middleware: software layer that intercepts requests to inject the appropriate status codes before SPA rendering
- Edge workers: JavaScript functions executed at the CDN level (Cloudflare, Fastly) to manipulate HTTP responses on the fly
SEO expert opinion
Does this statement align with real-world observations?
Absolutely, and it’s one of the few points where Google's official doctrine perfectly matches reality. Technical audits regularly reveal dozens, if not hundreds, of soft 404 errors on poorly configured SPA sites. The Search Console flags them as "Excluded: Page not found (404)" while the server returns 200 — evidence that Google detects the inconsistency but does not automatically correct it.
Tests with tools like Screaming Frog or OnCrawl confirm that Googlebot recrawls these URLs multiple times a week, artificially inflating server logs and consuming crawl budget that is missing elsewhere. On a site with 50,000 pages and 10% soft 404s, that represents 5,000 phantom pages unnecessarily capturing budget.
What nuances should be considered regarding this recommendation?
Martin Splitt does not address a crucial edge case: empty result pages. Should a product search page with 0 results, or a category whose products are temporarily out of stock, return 404 or 200? The answer depends on your strategy. If the category will be back in stock within 48 hours, a 200 with alternative content (similar products, newsletter) preserves indexing. If the category is permanently empty, a 410 Gone is more appropriate than a 404.
Another nuance concerns 500/503 errors in SPAs. A client-side JavaScript crash should not produce a 200, but the server does not necessarily know the crash happened. Active monitoring (Sentry, LogRocket) should detect these crashes and signal the server to temporarily return 503 Service Unavailable to avoid severe deindexing. [To verify]: Google has never specified how long it tolerates recurring 500s before deindexing; observations suggest around 7 consecutive days, without official confirmation.
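Since Google documents no such mechanism, the sketch below is purely illustrative: a hypothetical webhook, called by your monitoring when client-side crashes spike, flips a flag that makes the server answer 503 with a Retry-After header until the fix ships.

```js
// Illustrative only: gate the whole SPA behind a health flag. The
// /monitoring/crash-alert webhook is hypothetical, not a Sentry API.
const express = require('express');
const app = express();

let clientAppHealthy = true;

// Your alerting tool could call this when client-side error rates spike.
app.post('/monitoring/crash-alert', (req, res) => {
  clientAppHealthy = false;
  res.sendStatus(204);
});

app.use((req, res, next) => {
  if (clientAppHealthy) return next();
  res.set('Retry-After', '3600'); // ask crawlers to retry in an hour
  res.status(503).send('Service temporarily unavailable');
});

app.get('*', (req, res) => res.send('SPA shell would be served here'));
app.listen(3000);
```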
In what cases does this rule not strictly apply?
Offline-first PWAs represent a unique case. A Progressive Web App that operates offline can legitimately return 200 with cached content even if the server is unreachable. Googlebot does not crawl offline, so this situation does not arise for it — but it creates a theoretical inconsistency that Google currently tolerates.
Paywalls and restricted content pose another dilemma. Should a premium article that is inaccessible without a subscription return 401 Unauthorized, 403 Forbidden, or 200 with truncated content? Google recommends 200 plus paywall structured data, since a 401/403 prevents indexing. This is the explicit exception to the rule "always return the actual status code", and it deserves to be documented in your SEO strategy.
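The markup Google documents for this exception is JSON-LD with isAccessibleForFree set to false; a minimal example (the .paywall selector must match the CSS class that actually wraps your gated content):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "NewsArticle",
  "headline": "Premium article title",
  "isAccessibleForFree": false,
  "hasPart": {
    "@type": "WebPageElement",
    "isAccessibleForFree": false,
    "cssSelector": ".paywall"
  }
}
</script>
```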
Practical impact and recommendations
What should you prioritize auditing on your SPA site?
Start with a complete crawl using Screaming Frog with "JavaScript rendering" mode enabled. Compare the HTTP status codes returned by the server (before JS rendering) with the final displayed content. Any page showing "404" or "Page not found" in the DOM but returning 200 OK in HTTP is a soft 404 that needs immediate correction.
Cross-reference this data with the Search Console, section "Coverage" → "Excluded". URLs marked as "Not Found (404)" while your server responds 200 are alarm signals. Export the list and check if they are pages that have actually been deleted (in which case the code needs to be corrected) or false positives detected by Google's semantic analysis (in which case the content needs enhancement).
What technical modifications should be implemented concretely?
If you are using Next.js or Nuxt.js, the solution is native: getServerSideProps() or asyncData() can be used to define the HTTP status code on the server side before rendering. For example, in Next.js: res.statusCode = 404 in getServerSideProps for a not found page. The framework manages the rest automatically.
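A minimal sketch for the Next.js pages router (fetchProduct is a hypothetical data-layer call): the status code is decided on the server, so Googlebot sees the 404 in the HTTP response itself.

```js
// pages/products/[slug].js — sketch, Next.js pages router.
import { fetchProduct } from '../../lib/products'; // hypothetical data layer

export async function getServerSideProps({ params, res }) {
  const product = await fetchProduct(params.slug);

  if (!product) {
    // Shortcut: Next.js renders its 404 page and sends a real 404 status.
    return { notFound: true };
    // Equivalent manual form: res.statusCode = 404; then render an error view.
  }

  return { props: { product } };
}

export default function ProductPage({ product }) {
  return <h1>{product.name}</h1>;
}
```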
For a pure SPA (React, Vue, Angular without SSR), add an Express.js middleware or a Cloudflare edge worker that inspects the URL before serving index.html. Create a whitelist of valid routes (from your client-side router or an API) and return 404 for everything else. On nginx, the equivalent is a map block that matches $request_uri against a regex of known routes and flags the unknown ones, combined with an if { return 404; } in the server block.
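The same whitelist idea at the edge, sketched as a Cloudflare Worker (module syntax; the route list is again a placeholder for data generated from your sitemap):

```js
// Cloudflare Worker sketch: answer 404 at the edge for unknown paths.
const VALID_ROUTES = new Set(['/', '/products', '/about']); // placeholder

export default {
  async fetch(request) {
    const { pathname } = new URL(request.url);

    // Let static assets (paths with a file extension) pass through untouched.
    if (pathname.includes('.')) return fetch(request);

    if (!VALID_ROUTES.has(pathname)) {
      // Serve the SPA shell from the origin, but with an honest status code.
      const shell = await fetch(new URL('/', request.url));
      return new Response(shell.body, { status: 404, headers: shell.headers });
    }

    return fetch(request); // known route: normal 200 from the origin
  },
};
```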
How to verify that your implementation works correctly?
Run curl -I https://yoursite.com/nonexistent-page from the command line to check the raw status code without JavaScript. If you get HTTP/1.1 200 OK instead of 404, the problem persists. Also test with the URL inspection tool in Search Console: it shows the HTTP code Googlebot captured, separately from the final rendering.
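To run the same check across many URLs at once, a short Node script (18+, for the global fetch; save as check-status.mjs; the URLs are illustrative) can compare actual and expected codes:

```js
// check-status.mjs — verify raw status codes before any JavaScript runs.
const checks = [
  { url: 'https://yoursite.com/', expected: 200 },
  { url: 'https://yoursite.com/nonexistent-page', expected: 404 },
  { url: 'https://yoursite.com/old-category', expected: 410 },
];

for (const { url, expected } of checks) {
  // HEAD request, no redirect following: we want the raw first response.
  const res = await fetch(url, { method: 'HEAD', redirect: 'manual' });
  const verdict = res.status === expected ? 'OK  ' : 'FAIL';
  console.log(`${verdict} ${res.status} (expected ${expected}) ${url}`);
}
```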
Set up an automated alert in your server logs (Cloudflare Analytics, AWS CloudWatch, Datadog) to detect an abnormal ratio of 200 on URLs containing /404, /error, /not-found. A sudden spike often indicates a regression after deployment. Also monitor the soft 404 rate in the Search Console: it should trend towards 0% after corrections.
These technical optimizations, although critical for the SEO health of your SPA, require sharp expertise in web architecture and ongoing rigorous monitoring. If your team lacks resources or you suspect blind spots in your configuration, an audit conducted by a specialized SEO agency can identify invisible weaknesses and secure your crawl budget long-term.
- Crawl the site in JavaScript rendering mode and extract all HTTP codes vs DOM content
- Check in Search Console section "Coverage" for excluded pages due to soft 404
- Implement a server middleware or edge worker that returns 404/410 for invalid routes
- Test the raw HTTP codes before JS rendering with curl -I and Google's URL inspection tool
- Set up automatic alerts for abnormal 200 ratios in server logs
- Document the valid routes and the expected status code mapping in a playbook (see the sketch below)
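Such a playbook can be as small as one versioned file that the middleware, the edge worker, and the monitoring all read; a hypothetical sketch:

```js
// routes-playbook.js — hypothetical single source of truth for status codes.
module.exports = {
  '/': { status: 200, note: 'home' },
  '/products/:slug': { status: 200, note: '404 if the slug is unknown' },
  '/old-category': { status: 410, note: 'removed permanently in 2020' },
  '*': { status: 404, note: 'default for any unlisted route' },
};
```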
❓ Frequently Asked Questions
Can a soft 404 hurt the ranking of my valid pages?
Should you use a 404 or a 410 for a permanently deleted page?
Can a soft 404 be fixed with just a meta robots noindex tag?
Do client-side JavaScript errors affect the HTTP status codes Google sees?
Can a CDN interfere with the HTTP status codes returned to the bot?
🎥 From the same video (7)
Other SEO insights extracted from this same Google Search Central video · duration 5 min · published on 14/10/2020