Official statement
Other statements from this video
- 2:49 Why does Google render your pages almost systematically before indexing them?
- 3:52 Should the two-wave indexing model be abandoned?
- 7:35 Does Google use a sandbox or a honeymoon period for new sites?
- 8:02 Does Google really guess where to rank a new site before it even has any data?
- 9:07 Why do new sites go through a roller coaster in the SERPs?
- 13:59 Do you really need to worry about crawl budget for your site?
- 15:37 Should you really worry about crawl budget under one million URLs?
- 16:09 Does crawl budget really exist, or is it just an SEO myth?
- 17:42 Does Google deliberately throttle its crawl to spare your servers?
- 20:24 How do you detect a real crawl budget problem on your site?
- 21:57 Does pruning thin content really improve crawl budget?
- 22:28 Should you sacrifice server speed to save crawl budget?
- 23:32 Why are your API requests blowing up your crawl budget without you realizing it?
- 24:36 Crawl budget: do all your URLs really count as much as Google claims?
- 25:39 Should you really worry about Googlebot's aggressive caching of your static resources?
Google confirms that Googlebot automatically slows down its crawl if the site returns HTTP 429 or 50x codes, or if response times deteriorate. If these signals persist, crawling can stop completely. For an SEO, this means that a poorly configured or undersized infrastructure can literally cause pages to disappear from the index, regardless of content quality or internal linking.
What you need to understand
What are the back-off signals that Google monitors?
Google uses three main indicators to decide to slow down or stop crawling: HTTP 429 (Too Many Requests) codes, 50x (server error) codes, and deteriorating response times. These signals tell the bot that the server is struggling.
The 429 code is particularly interesting — it is a code that some site publishers voluntarily send to regulate crawling when they detect too high a load. Google respects it and backs off. The 50x codes, on the other hand, are unintentional errors that reflect a real technical failure.
How does Googlebot decide to slow down or stop?
The decision is based on the persistence of the signal. An isolated spike of 503s during planned maintenance only triggers a temporary slowdown. But if Google detects repeated errors over several hours or days, it interprets this as a structural issue and may suspend crawling to avoid overloading the server.
In practice, Googlebot does not crawl linearly — it adjusts its pace according to what the site can handle. It’s a sort of dynamic regulation that protects both the server and Google's resources.
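To make this dynamic regulation concrete, here is a deliberately simplified sketch of the back-off principle in Python. It is an illustration only, not Google's actual algorithm: the crawler halves its request rate whenever it sees a 429, a 50x, or a slow response, and only ramps back up gradually once responses are healthy again.

```python
# Illustrative back-off model (NOT Google's real algorithm): the crawl rate
# is halved on every error or slow response, and recovers slowly otherwise.

def adjust_crawl_rate(rate, status_code, response_time_ms,
                      slow_threshold_ms=1000, min_rate=0.1, max_rate=10.0):
    """Return the new crawl rate (requests/second) after one observed response."""
    back_off_signal = (status_code == 429 or status_code >= 500
                       or response_time_ms > slow_threshold_ms)
    if back_off_signal:
        return max(min_rate, rate * 0.5)   # back off sharply
    return min(max_rate, rate * 1.1)       # recover gradually

# Example run: a healthy phase, then a burst of 503s, then recovery.
rate = 5.0
for status, latency in [(200, 300), (200, 280), (503, 2500), (503, 2400),
                        (503, 2600), (200, 320), (200, 310), (200, 290)]:
    rate = adjust_crawl_rate(rate, status, latency)
    print(f"status={status} latency={latency}ms -> {rate:.2f} req/s")
```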
What happens if crawling stops completely?
A total halt in crawling means that no new pages are discovered, no content updates are taken into account, and existing pages risk stagnating in the index. On a news site or an e-commerce platform with a high turnover of products, this is catastrophic.
The stoppage is never permanent — Google periodically returns to test the site. However, the recovery time can vary from a few hours to several days depending on the severity and recurrence of the errors. During this time, the site remains invisible for any new queries.
- 429 and 50x codes: trigger a slowdown or stop in crawling if repeated
- Response times: increased latency causes Googlebot to back off even without HTTP errors
- Signal persistence: it is the duration and frequency of errors that determine the severity of the reaction
- Total stoppage: halts discovery, indexing of updates, and may last several days
- Recovery: Google periodically retests the site but with no guarantee of timing
SEO Expert opinion
Is this statement consistent with real-world observations?
Absolutely. For years, we’ve observed that sites with fragile infrastructures see their crawl budget collapse after traffic spikes or poorly managed server migrations. Crawl logs clearly show that Googlebot backs off when responses exceed 500-800 ms repeatedly, even without a 50x error.
What is less documented is the exact threshold of degradation that triggers back-off. Gary Illyes does not provide a figure — and this is probably intentional. Google likely adjusts this threshold according to the category of the site: a major news site is likely to have more tolerance than a personal blog. [To be verified] with controlled tests across different types of sites.
What nuances should be added to this rule?
First point: not all 50x codes are equal. A 502 Bad Gateway occurring during a server restart is generally tolerated. A 500 Internal Server Error affecting 20% of crawled URLs for three consecutive days is another story.
Second nuance — and this is critical — sites with high authority and stable history probably benefit from a wider margin of error. Google knows that a site like Le Monde or Amazon isn't going to stay down for three weeks. It waits, tests, returns. A new or lesser-known site won’t have that patience. [To be verified] but consistent with the logic of differentiated crawl budget.
In what cases does this rule not apply completely?
Critical pages such as homepages or main categories may still be crawled as a priority even if the rest of the site returns errors. Google maintains a minimum crawl on strategic URLs to monitor the site's overall availability.
Another case: sites with actively submitted XML sitemaps via Search Console may sometimes trigger targeted recrawls even if the automated crawl is slowed. However, this is not a guarantee — if the server continues to return errors, even the URLs in the sitemap will be temporarily ignored.
Practical impact and recommendations
What concrete steps should you take to avoid back-off?
First action: actively monitor server response times and HTTP codes in real-time. Set up alerts for critical thresholds — for example, alert if more than 5% of responses exceed 1 second, or if more than 10 requests per minute return a 50x.
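A minimal sketch of that kind of alerting, assuming requests are already collected as (timestamp, HTTP status, response time) tuples from your logs or your APM; the thresholds are the ones given above and should be tuned to your own infrastructure:

```python
from datetime import datetime, timedelta

def check_back_off_risk(requests, now, window=timedelta(minutes=1)):
    """Return alert messages based on the thresholds described above.

    Each request is a (timestamp, http_status, response_time_ms) tuple.
    """
    recent = [r for r in requests if now - r[0] <= window]
    alerts = []
    if recent:
        slow = sum(1 for _, _, ms in recent if ms > 1000)
        if slow / len(recent) > 0.05:
            alerts.append(f"{slow}/{len(recent)} responses over 1 second in the last minute")
    server_errors = sum(1 for _, status, _ in recent if 500 <= status < 600)
    if server_errors > 10:
        alerts.append(f"{server_errors} 50x responses in the last minute")
    return alerts

# Hypothetical sample: 12 errors and several slow responses within one minute.
now = datetime(2020, 12, 9, 12, 0, 0)
sample = [(now - timedelta(seconds=i), 503 if i < 12 else 200,
           1500 if i % 10 == 0 else 250) for i in range(60)]
print(check_back_off_risk(sample, now))
```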
Second lever: configure intelligent rate limiting that sends 429 codes before the server saturates. It’s better to slow down Googlebot cleanly with a 429 than to allow it to cause cascading 503 errors. Some CDNs and reverse proxies can detect Googlebot and apply specific rules to it.
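As a sketch of this second lever, here is a hypothetical WSGI middleware that returns a 429 with a Retry-After header once too many requests are in flight. The threshold and the Retry-After value are purely illustrative, and in practice this logic usually lives in the CDN or reverse proxy rather than in the application code:

```python
import threading

class LoadSheddingMiddleware:
    """Answer 429 Too Many Requests when too many requests are in flight.

    Illustrative threshold only; real deployments usually shed load at the
    CDN or reverse proxy, with Googlebot verified via reverse DNS.
    """

    def __init__(self, app, max_in_flight=50, retry_after_seconds=120):
        self.app = app
        self.max_in_flight = max_in_flight
        self.retry_after = str(retry_after_seconds)
        self.in_flight = 0
        self.lock = threading.Lock()

    def __call__(self, environ, start_response):
        with self.lock:
            overloaded = self.in_flight >= self.max_in_flight
            if not overloaded:
                self.in_flight += 1
        if overloaded:
            start_response("429 Too Many Requests",
                           [("Retry-After", self.retry_after),
                            ("Content-Type", "text/plain")])
            return [b"Server busy, please retry later.\n"]
        try:
            return self.app(environ, start_response)
        finally:
            # Decremented as soon as the wrapped app returns; good enough for a sketch.
            with self.lock:
                self.in_flight -= 1

# Usage (hypothetical WSGI entry point):
# application = LoadSheddingMiddleware(application, max_in_flight=50)
```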
How can you check if your site is currently experiencing back-off?
Analyze the crawl logs from the last 30 days. Look at the trend in the number of Googlebot requests per day and the error code rate. If you notice a sharp drop in crawl volume without major editorial changes, back-off is the likely explanation.
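A minimal sketch of that analysis, assuming a standard combined-format access log; the file path and the regular expression will need adapting to your server, and the Googlebot filter below relies on the user-agent string only, without a reverse-DNS check:

```python
import re
from collections import defaultdict
from datetime import datetime

# Combined log line example:
# 66.249.66.1 - - [09/Dec/2020:12:00:00 +0000] "GET /page HTTP/1.1" 200 1234 "-" "Googlebot/2.1"
LINE_RE = re.compile(r'\[(\d{2}/\w{3}/\d{4}):[^\]]+\] "[A-Z]+ \S+ [^"]*" (\d{3}) .*"([^"]*)"$')

daily_hits = defaultdict(int)
daily_errors = defaultdict(int)

with open("access.log") as log:               # hypothetical log file path
    for line in log:
        match = LINE_RE.search(line)
        if not match:
            continue
        raw_day, status, user_agent = match.groups()
        if "Googlebot" not in user_agent:     # naive filter, no reverse-DNS verification
            continue
        day = datetime.strptime(raw_day, "%d/%b/%Y").date()
        daily_hits[day] += 1
        if int(status) == 429 or int(status) >= 500:
            daily_errors[day] += 1

for day in sorted(daily_hits):
    error_rate = daily_errors[day] / daily_hits[day]
    print(f"{day}: {daily_hits[day]} Googlebot hits, {error_rate:.1%} back-off signals")
```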
In Google Search Console, the Crawl Stats report (Settings > Crawl stats) provides a complementary view: it tracks the number of crawl requests, the average response time, and host availability over the last 90 days. A sharp drop in requests or a rising share of server errors there corroborates what the logs show.
❓ Frequently Asked Questions
Is a 429 code preferable to a 503 for regulating Googlebot's crawl?
How long does it take for Googlebot to resume normal crawling after the errors are resolved?
Can slow response times without HTTP errors really stop crawling?
Can submitting an XML sitemap compensate for a crawl slowed down by back-off?
Are high-authority sites exempt from Googlebot's back-off?
🎥 From the same video
Other SEO insights extracted from this same Google Search Central video · duration 31 min · published on 09/12/2020