
Official statement

The crawl budget depends on two main factors: 1) Google's need (overall site quality, actual frequency of content changes) which determines how much Google wants to crawl, and 2) the server's capacity (response time, server errors) which determines how much Google can crawl without causing issues. Google automatically adjusts between these two limits.
🎥 Source video

Extracted from a Google Search Central video

⏱ 55:06 💬 EN 📅 14/08/2020 ✂ 17 statements
Watch on YouTube (45:35) →
Other statements from this video (16)
  1. 1:33 Does a hierarchical structure really improve SEO compared to a flat architecture?
  2. 2:38 Does a navigation redesign really cause ranking losses?
  3. 3:44 Why does Google keep 404 URLs in Search Console for years?
  4. 4:24 Can video markup be injected via JavaScript without an SEO penalty?
  5. 4:44 Does Google automatically crop your recipe images if you don't provide the right formats?
  6. 5:42 How does Google adapt AMP display to the browser's technical capabilities?
  7. 5:45 Do you really need to fill in modification dates in your XML sitemaps?
  8. 8:42 Are iframes really neutral for SEO, or should you be wary of them?
  9. 9:03 Can Google point your competitors' backlinks to your PDF?
  10. 12:26 Is cross-domain duplicate content really risk-free for your SEO?
  11. 17:20 Do you really need to delete your old content to improve your SEO?
  12. 42:28 Should you limit the number of outbound links to a single domain to avoid a Google penalty?
  13. 43:33 Why does Google take longer to index a simple title change?
  14. 47:48 Why does Google index only one language if your site switches languages via JavaScript?
  15. 50:53 Should you worry when the number of indexed pages fluctuates by 50% in a few days?
  16. 53:32 Does nofollow really prevent Google from crawling your links?
📅 Official statement from 14/08/2020 (5 years ago)
TL;DR

Google determines the crawl budget along two axes: what it wants to crawl (site quality, actual update frequency) and what it can crawl (server performance, error rates). The algorithm automatically adjusts the crawl frequency between these two limits. In practice, a site with high-quality, frequently updated content but a slow server will be crawled less often than an average site with fast infrastructure.

What you need to understand

What does Google mean by "the need" to crawl a website?

Google's need rests on two pillars: overall site quality and the actual frequency of content changes. Google doesn't decide "I'll visit every X days"; it assesses whether returning is worthwhile.

The overall quality encompasses content relevance, user experience, page freshness, and likely authority signals. A site with thin, duplicate, or outdated content will see its crawl need decrease drastically. Conversely, a site that regularly publishes original, engaging content justifies frequent visits.

The actual frequency of change is crucial — and this is where many get it wrong. Google doesn't rely on your XML sitemap declaring "lastmod" every day if the content remains the same. It detects real changes: new articles, substantial updates, not just a modified date in the footer.
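
To make this concrete, here is a minimal sketch of a truthful approach to lastmod: the date only moves when the rendered content actually changes. The URLs, the `pages` dictionary, and the stored hashes are hypothetical stand-ins for your own CMS data.

```python
import hashlib
from datetime import date

# Only bump <lastmod> when the page body actually changes.
# `pages` and `stored` are hypothetical stand-ins for your own CMS data.
pages = {
    "https://example.com/guide-crawl-budget": "<html>...rendered page body...</html>",
}
stored = {}  # previously saved {url: (content_hash, lastmod)}

def sitemap_entries(pages, stored):
    entries = []
    today = date.today().isoformat()
    for url, body in pages.items():
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        prev_hash, prev_lastmod = stored.get(url, (None, today))
        # Unchanged content keeps its previous date instead of pretending to be fresh.
        lastmod = today if digest != prev_hash else prev_lastmod
        stored[url] = (digest, lastmod)
        entries.append(f"<url><loc>{url}</loc><lastmod>{lastmod}</lastmod></url>")
    return entries

print("\n".join(sitemap_entries(pages, stored)))
```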

Why does server capacity limit crawling?

Google doesn't want to break your infrastructure. Response time and the server error rate (especially 5xx errors) act as regulators. If your server takes 3 seconds per page, Googlebot slows its pace to avoid overloading it.

This isn't altruism: a server that crashes during a crawl session forces Google to restart, so it optimizes its own resources. Repeated 503 errors are a major alarm signal that triggers an immediate reduction in crawl budget.

Sites on low-end shared hosting or running a poorly optimized CMS (heavy database queries, no caching) are structurally disadvantaged. It's not a matter of editorial intent; it's purely technical.

How does Google mediate between need and capacity?

The adjustment is automatic and dynamic. Google doesn't set a fixed quota of 1000 pages/day — it tests, observes, adapts. If your server responds well, it speeds up. If it detects latency, it slows down.

This logic explains why two similarly sized sites can have radically different crawl budgets. A news outlet publishing 50 new articles a day on AWS infrastructure gets preferential treatment. A corporate site with 200 static pages unchanged for six months will be crawled out of courtesy, even if its server is ultra-fast.
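
Google has never published the actual mechanics, but the behavior described above can be illustrated with a deliberately simplified toy model: the crawl rate ramps up while the server stays fast and error-free, backs off when it struggles, and is capped by how much the site "deserves" to be crawled. Every threshold and multiplier below is invented for illustration; this is not Google's algorithm.

```python
# Toy illustration of the "need vs. capacity" adjustment described above.
# NOT Google's algorithm: all numbers and rules are invented for clarity.

def adjust_crawl_rate(current_rate, avg_response_ms, error_5xx_ratio,
                      demand_rate, min_rate=10):
    """Return the next crawl rate (URLs/day) after one observation window."""
    if error_5xx_ratio > 0.01 or avg_response_ms > 1500:
        next_rate = current_rate * 0.5   # server struggling: back off quickly
    elif avg_response_ms < 500:
        next_rate = current_rate * 1.2   # healthy server: ramp up gradually
    else:
        next_rate = current_rate         # stable: keep the current pace
    # Never crawl more than the site "deserves" (demand), nor below a floor.
    return max(min_rate, min(next_rate, demand_rate))

rate = 1000
observations = [(300, 0.0), (300, 0.0), (2500, 0.03), (400, 0.0)]
for day, (ms, err) in enumerate(observations, 1):
    rate = adjust_crawl_rate(rate, ms, err, demand_rate=5000)
    print(f"day {day}: {rate:.0f} URLs/day")
```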

  • The crawl budget is not an entitlement: it must be earned through quality and negotiated with technical performance.
  • Google optimizes its own costs: crawling is resource-intensive, and it only does so if it's profitable for them.
  • The two factors are inseparable: a perfect site on a poor server will be under-crawled and vice versa.
  • The adjustment is continuous: Google does not set your budget once and for all; it continually reevaluates it.
  • The actual frequency of change takes precedence over declarations: lying in your sitemap is pointless.

SEO Expert opinion

Is this statement consistent with real-world observations?

Overall, yes. Server log audits confirm that a site with many 5xx errors sees its crawl drop in just a few days. Similarly, publishing fresh, quality content mechanically increases Googlebot's crawl rate.

However, Google remains deliberately vague about thresholds. At what response time does crawling decrease? What is the relative weight between quality and performance? We don't know. [To be verified]: the exact impact of a shift from 500ms to 1s on crawl budget is undocumented.

One annoying point: Google talks about "overall quality" without defining the precise metrics. Is it bounce rate? Session duration? CTR in the SERPs? Probably a mix, but it remains completely opaque.

What nuances need to be added to this logic?

The statement implies that Google crawls what "deserves" to be crawled. However, this logic creates a self-reinforcing bias: a site that is crawled less gets its new content indexed more slowly, so it generates less traffic, is perceived as lower quality, and is crawled even less.

Large sites with millions of pages have a specific problem: even with a generous crawl budget, some deep sections are never visited. Internal linking then becomes critical — it's your only real leeway to prioritize what needs to be prioritized.

Another nuance: Google says it "automatically adjusts," but does not specify the reaction latency. If you fix your server errors today, the crawl does not bounce back instantly; it can take days or even weeks for the algorithm to confirm that the improvement is sustainable.

When does this rule not really apply?

Sites of very high authority (Wikipedia, Amazon, government sites) likely receive special treatment. Their pages are crawled almost in real-time, even if the content changes little. The "need × capacity" rule matters less for them.

News sites indexed via Google News have a dedicated circuit: crawling is actively triggered as soon as they submit a new article, regardless of the classic crawl budget. This is not the same queue.

Domain migrations or technical overhauls can temporarily disrupt these automatic adjustments. Google has to relearn the site, and during this phase the crawl budget can be erratic, with unexplained peaks and troughs. Patience is required.

Practical impact and recommendations

What should you prioritize optimizing to improve your crawl budget?

Server performance is the quick win. If your pages respond in 200-300 ms instead of 1-2 s, you mechanically unlock more capacity. Invest in good hosting, enable server-side caching, optimize database queries: the ROI is immediate.
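
Before spending on infrastructure, it helps to establish a baseline. Here is a minimal sketch using Python's requests library; the URLs are placeholders to replace with a representative sample of your own pages.

```python
import statistics
import requests

# Quick response-time baseline for a sample of your own URLs (placeholders here).
urls = [
    "https://example.com/",
    "https://example.com/blog/",
    "https://example.com/products/",
]

timings = []
for url in urls:
    r = requests.get(url, timeout=10)
    timings.append(r.elapsed.total_seconds() * 1000)  # time until the response arrived, in ms
    print(f"{r.status_code} {timings[-1]:.0f} ms  {url}")

print(f"median: {statistics.median(timings):.0f} ms")
```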

Next, eliminate server errors. A 5xx error rate above 1% is toxic. Monitor your logs, set up alerts, and fix errors quickly: each one is a URL Google failed to fetch that could have been useful.
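
As an illustration, a small Python sketch that counts 5xx responses served to Googlebot in a combined-format access log. The log path and regex are assumptions to adapt to your own setup, and serious monitoring should also verify that the user agent really is Googlebot (reverse DNS check).

```python
import re
from collections import Counter

# Count 5xx responses served to Googlebot in a combined-format access log.
# Log path and format are assumptions; adapt the regex to your own server.
LOG = "/var/log/nginx/access.log"
line_re = re.compile(r'"(?:GET|POST|HEAD) (?P<path>\S+) HTTP/[^"]*" (?P<status>\d{3}) .*Googlebot')

errors = Counter()
total = 0
with open(LOG, encoding="utf-8", errors="replace") as fh:
    for line in fh:
        m = line_re.search(line)
        if not m:
            continue
        total += 1
        if m.group("status").startswith("5"):
            errors[m.group("path")] += 1

ratio = sum(errors.values()) / total if total else 0.0
print(f"Googlebot hits: {total}, 5xx ratio: {ratio:.2%}")
for path, count in errors.most_common(10):
    print(f"{count:5d}  {path}")
```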

Finally, focus on genuine content quality. There's no point publishing mediocre content every day to "show activity": Google detects it and penalizes it. Better two solid articles a month than 30 thin ones.

What common mistakes hinder crawl budget?

Duplicate content is a drain: if Google crawls 10 variations of the same page (poorly managed canonical URLs, filter facets, session IDs), it wastes its budget on nothing. The same impact occurs with poorly structured pagination pages or meaningless blog archives.
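
One way to spot this waste is to group crawled URLs by their parameter-free version. A rough sketch, assuming a list of URLs exported from a crawl or from your logs; the parameter names below are only common examples.

```python
from collections import defaultdict
from urllib.parse import urlsplit, parse_qsl

# Group URLs by their parameter-free version to spot facet/session-ID duplicates.
# The tracking-parameter names are just common examples; use your own list.
TRACKING_PARAMS = {"sessionid", "sid", "utm_source", "utm_medium", "sort", "color", "size"}

def canonical_key(url):
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k.lower() not in TRACKING_PARAMS]
    return parts._replace(query="&".join(f"{k}={v}" for k, v in kept), fragment="").geturl()

crawled = [
    "https://example.com/shoes?color=red&sessionid=abc",
    "https://example.com/shoes?color=blue",
    "https://example.com/shoes",
]

groups = defaultdict(list)
for url in crawled:
    groups[canonical_key(url)].append(url)

for key, variants in groups.items():
    if len(variants) > 1:
        print(f"{len(variants)} variants crawled for {key}")
```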

Redirect chains (A → B → C → D) are costly to crawl: each hop counts as a separate request. The same goes for temporary 302/307 redirects used instead of permanent 301s: Google rechecks them on every visit.
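
A quick way to audit this is to replay a URL and list the hops Googlebot would follow. A minimal sketch with the requests library; the start URL is a placeholder, and anything beyond one intermediate hop is a chain worth flattening.

```python
import requests

# List the redirect hops a crawler would follow from a given start URL.
def redirect_chain(url, timeout=10):
    r = requests.get(url, allow_redirects=True, timeout=timeout)
    # r.history holds the intermediate responses, in order; r is the final one.
    return [(h.status_code, h.url) for h in r.history] + [(r.status_code, r.url)]

for status, url in redirect_chain("https://example.com/old-page"):
    print(status, url)
```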

Leaving obsolete or useless pages crawlable (old promotions, expired content, test pages) dilutes your budget. If they no longer have any SEO value, return a 404 (or 410) or add noindex, and free up room for what matters.
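
If you control the application, returning the right status code is straightforward. A minimal Flask sketch, where the framework choice and the retired-URL list are assumptions, serving 410 Gone, which the FAQ below recommends for permanently removed pages:

```python
from flask import Flask, abort

app = Flask(__name__)

# URLs that are permanently retired (hypothetical list; load yours from a file or DB).
GONE = {"/promo-2019", "/old-landing-page"}

@app.route("/<path:page>")
def serve(page):
    if f"/{page}" in GONE:
        abort(410)  # "Gone": clearer than 404 for content removed on purpose
    return f"Content for /{page}"

if __name__ == "__main__":
    app.run()
```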

How to verify that your infrastructure makes the best use of crawling?

Analyze your server logs with a tool like Screaming Frog Log File Analyzer or OnCrawl. Identify which sections are over-crawled (often low-value) and which are ignored (sometimes your best content). Adjust your internal linking accordingly.
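
Dedicated tools do this best, but a rough cross-check is easy to script: which of your important URLs never received a Googlebot hit in the analyzed window? The log path, log format, and URL list file below are assumptions.

```python
import re

# Cross-check: which important URLs (e.g. exported from your sitemap) never
# received a Googlebot hit in the analyzed log window? Paths are assumptions.
LOG = "/var/log/nginx/access.log"
SITEMAP_URLS = "important-urls.txt"   # one path per line

path_re = re.compile(r'"GET (?P<path>/[^ "]*) HTTP/[^"]*" \d{3} .*Googlebot')
googlebot_paths = set()
with open(LOG, encoding="utf-8", errors="replace") as fh:
    for line in fh:
        m = path_re.search(line)
        if m:
            googlebot_paths.add(m.group("path").split("?")[0])

with open(SITEMAP_URLS, encoding="utf-8") as fh:
    important = {line.strip() for line in fh if line.strip()}

never_crawled = sorted(important - googlebot_paths)
print(f"{len(never_crawled)} important URLs never crawled in this window")
for path in never_crawled[:20]:
    print(path)
```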

Check the coverage report in Google Search Console: a mass of "Discovered - currently not indexed" pages signals insufficient crawl budget to index everything. Prioritize via the sitemap and internal links.

Test your server's response time under load with a tool like Loader.io or Apache Bench. If your response time skyrockets at 50 requests per second, that is exactly the kind of load Googlebot can generate, and exactly what will limit your crawl.
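
A very rough load probe can also be scripted in a few lines; only run it against a server you own, and prefer the dedicated tools above for anything serious. The URL and concurrency level are placeholders.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
import requests

# Rough load probe: fire N parallel requests at one of your own URLs and
# watch how the response time degrades. URL and concurrency are placeholders.
URL = "https://example.com/"
CONCURRENCY = 50

def timed_get(_):
    start = time.perf_counter()
    requests.get(URL, timeout=15)
    return (time.perf_counter() - start) * 1000  # total request time in ms

with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    timings = sorted(pool.map(timed_get, range(CONCURRENCY)))

p95 = timings[max(0, int(0.95 * len(timings)) - 1)]  # rough 95th percentile
print(f"median {statistics.median(timings):.0f} ms, ~p95 {p95:.0f} ms")
```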

  • Migrate to high-performance hosting (scalable cloud or dedicated server)
  • Implement a caching system (Varnish, Redis, CDN like Cloudflare)
  • Clean up unnecessary pages: 404 or noindex on expired content
  • Correct all 5xx errors detected in the logs
  • Simplify redirects: always a single permanent 301, never a chain
  • Structure internal linking to push strategic pages
  • Declare only important pages in the XML sitemap
The crawl budget is not a fate you simply have to accept: you can influence it directly through technical and editorial decisions. That said, finely optimizing the interplay between server performance, information architecture, and content quality requires cross-disciplinary expertise. If log analysis, server tuning, or strategic prioritization of sections seems too complex to handle internally, a specialized SEO agency can provide a precise diagnosis and an action plan suited to your context.

❓ Frequently Asked Questions

Is crawl budget a problem for small sites?
No. For a site with fewer than 10,000 pages and decent infrastructure, crawl budget is generally not a limiting factor. Google crawls more than enough to index everything worth indexing.
Does an XML sitemap increase crawl budget?
Not directly. The sitemap helps Google discover URLs, but it does not increase the total crawl volume. Its main value is prioritizing important pages and signaling updates.
Do 404 errors consume crawl budget?
Yes. If Google keeps crawling URLs that return 404, it wastes budget. It is better to return 410 (Gone) for permanently deleted pages, or to fix the internal links pointing to those 404s.
Can you force Google to increase the crawl budget?
No, you cannot force it. You can influence it, however, by improving server performance, publishing quality content regularly, and cleaning up useless sections. Google then adjusts automatically.
Does Search Console show my site's crawl budget?
Not explicitly, but the "Crawl stats" report shows the number of pages crawled per day, kilobytes downloaded, and average response time. It is a useful proxy for tracking how your crawl budget evolves.
🏷 Related Topics
Content Crawl & Indexing

