Official statement
Other statements from this video
- 4:26 How do you redirect a page that has been split into several new URLs without losing its PageRank?
- 5:43 Do plain-text links really pass PageRank?
- 8:22 Should you really limit the number of hreflang versions to consolidate SEO signals?
- 18:53 Does a noindex tag eventually kill your links for good?
- 29:01 Should you really exclude all internal search results pages from indexing?
- 34:04 Should you reverse canonical tags with mobile-first indexing?
- 37:00 Should you really worry about 404 errors on your site?
- 42:42 Why do your rankings fluctuate even without a confirmed algorithm update?
- 48:49 Are alt tags really useful for classic web SEO?
Google automatically slows down its crawl when a site repeatedly generates 500 errors to avoid overloading a seemingly failing server. This means that a technical issue on the server side can quickly degrade your indexing, even if your content is excellent. The real issue is the vague definition of 'repeated': how many errors, over what period, and with what tolerance based on the size of the site?
What you need to understand
What does 'repeated 500 errors' really mean for Google?
Google does not crawl your site with infinite kindness. Every Googlebot request consumes server resources: CPU, RAM, bandwidth. When the bot encounters 500 errors (Internal Server Error), it interprets this as a signal of a struggling server.
The term 'repeated' remains deliberately vague. No official threshold is communicated. Based on field experience, a pattern of systematic failure on a section of the site (a 500-error rate of 10-15% over a day, for example) is enough to trigger throttling. A one-time incident lasting 5 minutes is not a problem; it is the recurrence that activates the protection mechanism.
How does Google actually adjust the crawl rate?
The mechanism is gradual. Google does not suddenly cut the crawl to zero. It starts by spacing out the requests, then reduces the number of parallel threads. If errors persist, the time between visits can stretch from a few seconds to several minutes or even hours.
This adjustment occurs by section of the site, not globally. If your internal search module generates 500 errors, Google may only slow down on those URLs while maintaining normal crawling on your product pages. The bot is smarter than we think: it maps out problematic areas.
Why does Google take this cautious approach?
The answer is simple: responsibility. Google crawls billions of pages every day. Overloading an already fragile server could lead to a complete crash, affecting human visitors. This is a reputational and technical risk that Google refuses to take.
Moreover, crawling an unstable server generates unreliable indexing data. It is better to slow down and obtain clean data than to force through and index partial, corrupted, or outdated content. This logic prioritizes the quality of the index over the quantity of pages crawled.
- Failure pattern: Google analyzes the error/success ratio over a sliding time window, likely 24-72 hours
- Granular adjustment: Throttling applies by section/type of URL, not necessarily site-wide
- Recovery time: Once errors are resolved, the normal crawl rate may take 3-7 days to fully recover
- Indirect quality signal: Frequent 500 errors suggest an undersized infrastructure, which can affect overall user experience
- Impact on freshness: Less crawling = increased delay between publication and indexing, critical for news or e-commerce pricing
SEO Expert opinion
Does this statement truly reflect observed behavior in the field?
Yes, and it’s one of the rare instances where Google communicates a mechanism that can be easily verified in logs. Apache/Nginx log analyses clearly show a correlation between spikes in 500 errors and a drop in the number of Googlebot requests in the following 24-48 hours. This is not theory; it's measurable.
The problem is the lack of transparency regarding thresholds. 'Repeated' can mean 5 errors for a small site of 100 pages, or 500 for a giant with 10 million URLs. Google likely adapts its tolerance based on the crawl budget allocated to the site, which in turn is based on its popularity, authority, and update frequency. This opacity makes diagnosis difficult: it’s hard to know if you are just above the threshold or far below it.
What nuances should we consider regarding this rule?
First nuance: not all 500 errors are equal. A 30-second timeout followed by a 500 can be perceived differently from an instant 500. The bot also analyzes the response time before the error. A server that crashes after 10 seconds signals an overload, while an instant 500 may indicate a misconfigured application.
Second nuance: the context of the site matters significantly. A news site publishing 200 articles a day requires aggressive crawling. A 500 error that slows down this crawl directly impacts ranking on fresh queries. A corporate site that is static and updated once a month can absorb a reduction in crawl without visible consequences. The urgency of response thus depends on your publishing model.
What should you do if Google does not specify exact thresholds?
This is where it gets tricky. The absence of official metrics forces us to infer thresholds through empirical observation [To be verified]. The standard recommendation is to aim for a 5xx error rate below 0.5% of total crawled requests. However, this figure has never been validated by Google; it is a professional convention.
Another annoying point: there is no indication of how long the slowdown lasts. After fixing the errors, how long before the crawl rate returns to normal? Field observations suggest 3 to 10 days, but this can vary significantly. A site with a history of stability recovers faster than a chronically unstable one. Google seems to apply some form of 'infrastructure trust score', which is never documented.
Practical impact and recommendations
How can I identify if my 500 errors are already impacting my crawl?
Start by cross-referencing Search Console and your server logs. In Search Console, go to 'Settings' > 'Crawl statistics' and look at the evolution of the total crawl requests and server response rate. A downward graph correlated with an increase in server errors confirms the diagnosis.
On the logs side, extract all Googlebot requests that returned a 500 code. Analyze their temporal distribution: errors grouped over 2-3 hours suggest a one-off incident, while errors spread over several days indicate a structural problem. Use tools like GoAccess, AWStats, or a homemade Python script to automate this analysis. If you identify recurring patterns (the same URLs failing at the same times), that gives you a debugging lead.
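As an illustration, here is a minimal sketch of such a script. It assumes the default Apache/Nginx "combined" log format and identifies Googlebot by User-Agent string only (a strict audit would also confirm the bot via reverse DNS); the file path and the 0.5% flagging threshold are assumptions to adapt to your own setup.

```python
#!/usr/bin/env python3
"""Count Googlebot requests and 5xx errors per day from an access log.

Minimal sketch: assumes the standard Apache/Nginx "combined" log format and
matches Googlebot by User-Agent only. Adapt the path and the threshold.
"""
import re
import sys
from collections import defaultdict
from datetime import datetime

# combined format: ip - - [day/Mon/year:time zone] "request" status size "referer" "user-agent"
LINE_RE = re.compile(
    r'\[(?P<day>\d{2}/\w{3}/\d{4}):[^\]]*\] "[^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

def main(path: str = "access.log") -> None:
    totals = defaultdict(int)   # Googlebot requests per day
    errors = defaultdict(int)   # Googlebot 5xx responses per day
    with open(path, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            m = LINE_RE.search(line)
            if not m or "Googlebot" not in m.group("ua"):
                continue
            day = datetime.strptime(m.group("day"), "%d/%b/%Y").date()
            totals[day] += 1
            if m.group("status").startswith("5"):
                errors[day] += 1
    for day in sorted(totals):
        rate = 100 * errors[day] / totals[day]
        flag = "  <-- above the 0.5% convention" if rate > 0.5 else ""
        print(f"{day}  googlebot_requests={totals[day]:6d}  5xx={errors[day]:5d}  rate={rate:.2f}%{flag}")

if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else "access.log")
```

Run it against your raw access log (for example `python crawl_errors.py /var/log/nginx/access.log`): a rising 5xx rate on the same days where the total number of Googlebot requests drops is exactly the correlation described above.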
What urgent actions should be taken to limit damage?
First priority: identify the source of 500 errors and fix it. Obvious, but too often neglected in favor of workarounds. Common causes include: saturated database, poorly configured PHP/Python timeout, unanticipated load spike, Redis/Memcached cache issue, or a poorly optimized SQL query blocking the application.
If fixing it takes time, temporarily add these problematic URLs to robots.txt as Disallow. This prevents Googlebot from crawling these failing sections while leaving the rest accessible. Be careful: this solution is a band-aid, not a cure. URLs in Disallow gradually drop out of the index if they were already there. Use it only on non-critical sections (filters, internal search, deep pagination pages).
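If you go that route, it is worth sanity-checking the rule before deploying it. The sketch below uses only the Python standard library; the /internal-search/ path and the example URLs are placeholders for the failing section of your own site.

```python
"""Sanity-check a temporary Disallow rule before deploying it (stdlib only).

The /internal-search/ path and the example URLs are placeholders.
"""
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /internal-search/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in ("https://www.example.com/internal-search/?q=test",  # should be blocked
            "https://www.example.com/product/123"):             # must stay crawlable
    verdict = "allowed" if parser.can_fetch("Googlebot", url) else "blocked"
    print(f"{verdict:8s} {url}")
```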
How to prevent this problem in the long run?
Set up proactive monitoring of returned HTTP codes. Tools like UptimeRobot, Pingdom, or custom solutions via Prometheus/Grafana can alert you as soon as a 5xx error threshold is crossed. Configure differentiated alerts: warning at 1% errors over 1 hour, critical at 5% over 30 minutes.
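If you prefer an in-house check rather than a SaaS monitor, a rolling-window counter is enough to implement those two thresholds. The sketch below is a minimal in-process example, assuming your application middleware or a log tailer calls record() for every response; the alert() function is a placeholder to wire to Slack, email, or your pager.

```python
"""Rolling-window 5xx alerting sketch: warning at >1% over 1 hour,
critical at >5% over 30 minutes. record() must be called for every
response; alert() is a placeholder for your notification channel."""
import time
from collections import deque

class ErrorRateMonitor:
    def __init__(self):
        # (timestamp, is_5xx) pairs, pruned to the last hour
        self.events = deque()

    def record(self, status, now=None):
        now = now if now is not None else time.time()
        self.events.append((now, status >= 500))
        while self.events and self.events[0][0] < now - 3600:
            self.events.popleft()
        self._check(now)

    def _rate(self, now, window_seconds):
        recent = [is_err for ts, is_err in self.events if ts >= now - window_seconds]
        return sum(recent) / len(recent) if recent else 0.0

    def _check(self, now):
        if self._rate(now, 1800) > 0.05:      # critical: >5% 5xx over 30 minutes
            alert("CRITICAL: 5xx rate above 5% over the last 30 minutes")
        elif self._rate(now, 3600) > 0.01:    # warning: >1% 5xx over the last hour
            alert("WARNING: 5xx rate above 1% over the last hour")

def alert(message):
    print(message)  # placeholder: replace with Slack/email/pager integration
```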
Then, audit your infrastructure to identify bottlenecks. 500 errors are rarely related to application code alone: insufficient RAM, undersized PHP/Gunicorn workers, limited DB connections, lack of CDN to absorb spikes. A load test with Apache Bench or Locust simulates aggressive crawling and reveals weaknesses before Googlebot discovers them.
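As an illustration, a hypothetical Locust file simulating crawler-like traffic could look like the sketch below. The URL list, the custom User-Agent, and the pacing are assumptions to adjust to your own structure, and it should be pointed at a staging copy rather than production.

```python
"""Hypothetical locustfile simulating crawler-like load: many cheap GETs on
deep URLs with aggressive pacing. Adjust CRAWL_PATHS and run against staging."""
from locust import HttpUser, task, between

CRAWL_PATHS = ["/", "/category/a", "/category/b", "/search?q=test", "/product/123"]

class CrawlerLikeUser(HttpUser):
    wait_time = between(0.1, 0.5)  # bot-like pacing, much faster than a human visitor

    @task
    def crawl(self):
        for path in CRAWL_PATHS:
            # catch_response lets us flag 5xx explicitly in the Locust stats
            with self.client.get(path, name=path,
                                 headers={"User-Agent": "LoadTest-CrawlSim"},
                                 catch_response=True) as response:
                if response.status_code >= 500:
                    response.failure(f"server error {response.status_code}")
                else:
                    response.success()
```

Launched with something like `locust -f locustfile.py --host https://staging.example.com -u 50 -r 10`, it shows at which request rate 5xx responses start appearing, which is the weakness Googlebot would otherwise find for you.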
- Enable detailed logs (error.log PHP/Apache + slow query log MySQL) to diagnose root causes
- Set up real-time monitoring of HTTP codes with alert thresholds (>0.5% 5xx errors = warning)
- Implement a CDN with an origin shield to absorb crawling load variations
- Optimize slow DB queries (>1s) that generate application timeouts
- Size application workers (PHP-FPM, Gunicorn, Puma) based on observed crawl rate, not just user traffic
- Test resilience with an automated weekly load test simulating 10x the normal crawl rate
❓ Frequently Asked Questions
How many 500 errors does it take to trigger a drop in crawl rate?
Are intermittent 500 errors as penalizing as permanent ones?
Does a reduction in crawl rate directly impact rankings?
How can I tell if Google has reduced my crawl because of 500 errors?
Should you return a 503 rather than a 500 during maintenance?