Official statement

Googlebot tries to be courteous by limiting itself to a certain number of pages each day, automatically adjusting this number based on the recognized capacity of a site.
🎥 Source: Google Search Central video, published 28/02/2018 (statement at 0:32, video duration 1:34)
TL;DR

Google claims that Googlebot voluntarily limits the number of pages crawled each day and adjusts this quota according to the detected capacity of the site. This self-regulation of the crawl budget means that slow or unstable servers see their crawl automatically reduced. For SEOs, the challenge becomes twofold: optimizing server speed AND prioritizing truly strategic resources to avoid wasting a precious quota.

What you need to understand

What does Google really mean by 'courtesy' in crawling?

Googlebot does not slam into your infrastructure like a bulldozer. 'Courtesy' refers to respecting the technical limits of your server to avoid causing crashes, slowdowns, or unavailability. Google continuously monitors response times, 5xx errors, and timeouts, and adjusts its pace accordingly.

Concretely, if your server takes 2 seconds to respond instead of 200 milliseconds, Googlebot will automatically slow its pace. This mechanism protects your infrastructure but creates a perverse effect: a subpar server mechanically limits your crawl, and therefore your indexing potential. A technically hobbled site cannot compensate through editorial quality alone.
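To make the mechanism concrete, here is a toy sketch of how a 'polite' crawler might adapt its request rate to the latency and error rate it observes. The thresholds and the back-off/ramp-up logic are invented for illustration; Google has never published its actual host-load algorithm.

```python
# Toy illustration only: a hypothetical "polite" crawler adapting its request
# rate to observed server health. Thresholds are invented, not Google's.

def adjust_crawl_rate(current_rate, avg_latency_ms, error_rate_5xx):
    """Return a new requests-per-minute budget based on observed health."""
    if error_rate_5xx > 0.02 or avg_latency_ms > 2000:
        return max(1, current_rate // 2)   # back off hard on a struggling host
    if avg_latency_ms < 300 and error_rate_5xx < 0.005:
        return int(current_rate * 1.2)     # ramp up slowly on a fast, stable host
    return current_rate                    # otherwise hold steady

rate = 60  # requests per minute
for latency_ms, err_rate in [(180, 0.0), (220, 0.001), (1900, 0.0), (2500, 0.03)]:
    rate = adjust_crawl_rate(rate, latency_ms, err_rate)
    print(f"latency={latency_ms}ms errors={err_rate:.1%} -> {rate} req/min")
```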

How does Googlebot determine this so-called 'capacity'?

Google does not publish a precise algorithm, but several signals come into play. HTTP response times (TTFB, time to first byte) are the primary indicator: a server that responds in under 200ms on 95% of requests signals higher capacity. The rate of 503 errors (service unavailable) also weighs heavily.

Stability over time matters as much as raw performance. A server that fluctuates between 100ms and 5 seconds depending on the hour confuses Googlebot more than a server that consistently runs at 800ms. Load spikes during the crawl itself also reveal your limits: if your CPU hits 95% as soon as 5 bots arrive simultaneously, Google records this fragility.
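If you want to see how stable your own TTFB is before Googlebot judges it, a quick sampling script is enough. This sketch assumes Python with the `requests` library and uses `response.elapsed` (time until response headers arrive) as a rough TTFB proxy measured from your own location, not from Google's crawlers; the URL and sampling interval are placeholders.

```python
# Rough self-check of TTFB stability over time (assumes `requests` is installed).
import time
import statistics
import requests

URL = "https://www.example.com/"  # placeholder: use a representative page
samples = []
for _ in range(20):
    r = requests.get(URL, timeout=10)
    samples.append(r.elapsed.total_seconds() * 1000)  # milliseconds to headers
    time.sleep(30)  # spread samples out; increase to cover hours, not minutes

samples.sort()
print(f"median: {statistics.median(samples):.0f} ms")
print(f"p95:    {samples[int(len(samples) * 0.95) - 1]:.0f} ms")
print(f"max:    {max(samples):.0f} ms")
```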

Why does this automatic adjustment pose a problem in SEO?

Because it creates a vicious cycle that is hard to break. A slow site gets less crawl, so its new pages take longer to be indexed, therefore traffic doesn’t increase, leaving a limited budget to improve infrastructure. Technical migrations become nightmares: redesigning a site with 50,000 redirected URLs? If Googlebot can only crawl 500 pages a day due to a sluggish server, you will wait 100 days for a full reindexing.
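The migration math above is easy to verify; a back-of-the-envelope calculation with the illustrative figures from the example shows why crawl capacity dictates migration timelines:

```python
# Back-of-the-envelope estimate (figures are illustrative, from the paragraph above).
urls_to_recrawl = 50_000          # redirected URLs in the migration

for crawled_per_day in (500, 2_000):
    days = urls_to_recrawl / crawled_per_day
    print(f"{crawled_per_day:>5} URLs/day -> ~{days:.0f} days for a full recrawl")
# 500 URLs/day  -> ~100 days
# 2000 URLs/day -> ~25 days
```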

The opacity of the system makes things worse. Google does not communicate a precise quota or a performance threshold to reach. You have to interpret signals in Search Console (the Crawl Stats report) and guess whether your problem stems from server speed, site depth, or content quality. This statement confirms the principle but offers no quantifiable lever of action.

  • Googlebot calibrates its pace based on observed HTTP responses (latency, 5xx errors)
  • A slow server mechanically limits the number of pages crawled daily
  • This limit directly impacts the speed of indexing new content and updates
  • The absence of public metrics makes crawl budget optimization partially blind
  • Large technical sites (e-commerce, media) suffer the most from this threshold effect

SEO Expert opinion

Does this statement align with field observations?

Overall yes, with important nuances. Correlations between server speed and crawl frequency have been documented for years in Apache/Nginx logs. A shift from an average TTFB of 1.2s to 300ms systematically leads to an increase in daily crawl within 10-15 days. Search Console data confirms this in 80% of CDN/hosting migrations I have monitored.

But Google is intentionally simplifying. 'Capacity' does not only depend on the server: the size/weight of pages, the quality of HTML code, and the generation time of PHP/Python/Node on the application side all play roles. A powerhouse server generating 5MB pages stuffed with JavaScript will receive less crawl than a modest server serving light HTML. [To verify]: Google does not publicly distinguish between network latency, server latency, and application latency in its adjustments.

What limitations does this automatic logic present?

The system does not always differentiate between intentional slowdown and technical incapacity. If you throttle Googlebot via robots.txt (Crawl-delay) or server rules to protect a fragile database, Google might interpret this as a low capacity rather than a strategic choice. The result: an unintended double penalty.

Another blind spot: highly seasonal sites. An e-commerce site that experiences 10x its usual traffic in November-December sees its servers slow precisely when it launches new product lines. Googlebot detects this degradation and reduces crawl at the worst possible moment. The 'automatic' adjustment lacks business context; it is purely reactive to technical signals.

In what cases does this rule not really apply?

Highly authoritative sites receive differentiated treatment. A national media outlet with millions of daily visitors keeps a guaranteed crawl floor even if its TTFB temporarily spikes. Google prioritizes editorial freshness for these players. Conversely, a small site can have an ultra-efficient server yet remain capped at 200 URLs/day simply because Google deems its content less important.

Popularity and update frequency count as much as server speed. A blog that publishes daily will be recrawled more often than a static brochure site with identical server performance. The algorithm blends technical capacity AND editorial relevance without revealing the weights. This statement by Mueller artificially isolates the infrastructure dimension, even though crawl budget is multifactorial.

Beware: Do not overestimate the impact of isolated server optimization. A shift from 800ms to 200ms TTFB may only increase crawl by 15-25% if Google judges your content to be non-strategic or your structure poorly organized. Infrastructure is necessary but not sufficient.

Practical impact and recommendations

How can you diagnose if your crawl is constrained by server capacity?

Search Console > Crawl Stats remains your primary tool. Look at the 'Total crawl requests' curve: a plateau despite the regular addition of content often indicates throttling. Compare it with the 'Average response time' graph (in milliseconds): if it consistently exceeds 500ms, you probably have a bottleneck.

Dive into your raw server logs (Nginx/Apache). Calculate the median and 95th-percentile TTFB for Googlebot hits over the past 30 days. If the P95 exceeds 1 second, Google is almost certainly throttling you. Cross-reference with HTTP codes: more than 2% of 5xx errors or timeouts on bot requests? Your infrastructure is sending a clear 'limited capacity' signal.
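As a starting point for that log audit, here is a minimal Python sketch. It assumes an Nginx combined log format extended with `$request_time` (seconds) as the last field; Apache users would log `%D` (microseconds) instead. The log path is a placeholder, so adapt the regex to your own format before trusting the numbers.

```python
# Sketch of a Googlebot log audit: median/p95 response time and 5xx rate.
import re
import statistics

LOG = "/var/log/nginx/access.log"  # placeholder path
line_re = re.compile(r'" (\d{3}) \S+ ".*?" "(.*?)" ([\d.]+)$')
# groups: HTTP status, User-Agent, request time in seconds (last field)

times_ms, statuses = [], []
with open(LOG, encoding="utf-8", errors="replace") as fh:
    for line in fh:
        m = line_re.search(line)
        if not m or "Googlebot" not in m.group(2):
            continue
        statuses.append(int(m.group(1)))
        times_ms.append(float(m.group(3)) * 1000)

if times_ms:
    times_ms.sort()
    p95 = times_ms[int(len(times_ms) * 0.95) - 1]
    err_rate = sum(1 for s in statuses if s >= 500) / len(statuses)
    print(f"Googlebot hits: {len(times_ms)}")
    print(f"median: {statistics.median(times_ms):.0f} ms, p95: {p95:.0f} ms")
    print(f"5xx rate: {err_rate:.2%}")
else:
    print("no matching Googlebot lines - check the log format and regex")
```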

Which optimizations yield the fastest ROI?

Server-side caching (Redis, Varnish, CDN) generates the most immediate impact. Cache static or nearly-static pages: you go from dynamic generation at 800ms to a cached response at 50ms. Googlebot detects this change within 48-72 hours and adjusts crawl upwards. I have seen daily crawl double in 10 days after activating Cloudflare APO on WordPress.
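A quick way to verify that caching actually responds at cache speed is to hit a page twice and compare TTFB and cache headers. This sketch assumes the `requests` library; the headers checked (cf-cache-status, x-cache, Age) are common for Cloudflare, Varnish, and most CDNs, but the exact names depend on your stack, and the URL is a placeholder.

```python
# Hit a page twice: the first request may prime the cache, the second should be a HIT.
import requests

url = "https://www.example.com/"  # placeholder: use a page that should be cached
for attempt in ("cold?", "warm?"):
    r = requests.get(url, timeout=10)
    ttfb_ms = r.elapsed.total_seconds() * 1000
    cache_headers = {k: v for k, v in r.headers.items()
                     if k.lower() in ("cf-cache-status", "x-cache", "age", "cache-control")}
    print(f"{attempt:5s} ttfb={ttfb_ms:.0f} ms  {cache_headers}")
```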

The second lever: reducing HTML weight and the number of external requests. A 300KB page with 15 third-party resources (analytics, ads, widgets) slows rendering even if TTFB is good. Googlebot measures the full download time. Aim to keep the initial HTML under 100KB and limit dependencies: the crawl effect emerges in 2-3 weeks.
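A rough audit of HTML weight and external dependencies can be scripted with the standard library plus `requests`. The URL below is a placeholder, and the count only covers scripts, stylesheets, and images declared in the initial HTML, not resources injected later by JavaScript.

```python
# Measure initial HTML size and count referenced external resources.
from html.parser import HTMLParser
import requests

class ResourceCounter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.resources = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if (tag == "script" and attrs.get("src")) or \
           (tag == "link" and attrs.get("rel") == "stylesheet") or \
           (tag == "img" and attrs.get("src")):
            self.resources += 1

url = "https://www.example.com/"  # placeholder: use a strategic page
r = requests.get(url, timeout=10)
html_kb = len(r.content) / 1024

parser = ResourceCounter()
parser.feed(r.text)

print(f"initial HTML: {html_kb:.0f} KB (target < 100 KB)")
print(f"referenced scripts/styles/images: {parser.resources}")
```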

What must you absolutely avoid doing?

Never throttle Googlebot artificially via Crawl-delay or aggressive rate limiting unless absolutely necessary (dying server). You are encoding a low capacity that Google will remember for a long time. If your infra cannot keep up, solve the problem at its core: upgrade your server, optimize your database, cache your application.

Avoid migrating to a new host without a prior testing phase. A new server that is poorly configured (undersized PHP-FPM, unoptimized MySQL) can degrade TTFB despite superior CPU/RAM specs. Google will detect the regression and reduce crawl right when you are planning your redesign. Test first on a mirrored subdomain with simulated bot traffic.
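For that mirror test, here is a minimal sketch of simulated bot traffic: a handful of concurrent requests with a Googlebot-like User-Agent, reporting the latency distribution and error count. The hostname, paths, and concurrency level are hypothetical placeholders; point it only at a staging environment, never at production.

```python
# Fire concurrent Googlebot-like requests at a mirror and report latency percentiles.
import statistics
from concurrent.futures import ThreadPoolExecutor
import requests

MIRROR = "https://staging.example.com"             # hypothetical mirror host
PATHS = ["/", "/category/shoes", "/product/123"]   # sample URLs to hit
UA = {"User-Agent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"}

def timed_get(path):
    try:
        r = requests.get(MIRROR + path, headers=UA, timeout=15)
        return r.status_code, r.elapsed.total_seconds() * 1000
    except requests.RequestException:
        return 599, 15_000.0  # count timeouts/connection errors as failures

with ThreadPoolExecutor(max_workers=5) as pool:     # ~5 parallel "bots"
    results = list(pool.map(timed_get, PATHS * 20)) # 60 requests total

latencies = sorted(ms for _, ms in results)
errors = sum(1 for status, _ in results if status >= 500)
print(f"median: {statistics.median(latencies):.0f} ms")
print(f"p95:    {latencies[int(len(latencies) * 0.95) - 1]:.0f} ms")
print(f"5xx/timeouts: {errors}/{len(results)}")
```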

  • Monitor Googlebot’s median TTFB < 300ms via server logs
  • Keep 5xx error rates for Googlebot below 0.5%
  • Activate server caching (Varnish/Redis) or CDN with edge caching
  • Reduce strategic page HTML weight below 100KB
  • Avoid Crawl-delay in robots.txt unless in absolute emergency
  • Test new infrastructures on a mirror environment before switching
The automatic adjustment of crawl according to server capacity transforms technical infrastructure into a direct SEO lever. A TTFB below 300ms and an error rate under 0.5% usually unlock crawl potential. These optimizations touch system, application, and network layers simultaneously: precise diagnostics and clean implementation often require an external expert perspective. Engaging a specialized technical SEO agency helps quickly identify bottlenecks (server, database, application code) and prioritize projects based on their real impact on crawl budget, without wasting months on trial and error.

❓ Frequently Asked Questions

Does Google communicate the exact crawl budget allocated to a site?
No, Google does not publish a numeric quota. Search Console shows the number of pages crawled daily but not the theoretical limit. You have to infer the ceiling by watching for plateaus in the crawl statistics.
Does an ultra-fast server guarantee unlimited crawl?
No. Server speed is necessary but not sufficient. Google also adjusts according to update frequency, domain authority, and editorial quality. A fast but static or low-relevance site remains capped.
Do 5xx errors impact crawl budget immediately?
Yes, quickly. A rate above 1-2% of 503 errors or timeouts triggers a crawl reduction within 24-48 hours. Google prioritizes server stability to avoid overloading a failing infrastructure.
Can you ask Google to manually increase crawl?
Officially, no. The URL Inspection tool lets you submit individual pages, but not negotiate a global quota. Only technical improvement will convince Googlebot to speed up.
Does switching to HTTPS or HTTP/2 increase crawl budget?
Indirectly, yes. HTTP/2 reduces the latency of multiple requests, improving perceived TTFB. HTTPS is now required for certain features (optimal mobile-first indexing), which indirectly favors a more complete crawl.