Official statement
Google cannot crawl an entire large site in a single day. The crawl budget forces the engine to choose between discovering new content and refreshing existing pages. As a result, a complete site can take 3 to 6 months to be fully re-crawled, with priority given to pages deemed important by the algorithm.
What you need to understand
John Mueller presents a figure that challenges some preconceived notions: a large site can wait 3 to 6 months before Google has fully re-crawled all its pages. This timeframe is not a bug; it is a direct consequence of the crawl budget.
The crawl budget is the amount of crawling resources Google allocates to your site. The larger the site, the more Google has to choose between re-crawling existing pages and exploring new URLs, and that is where the problem starts. To put an order of magnitude on it: if Googlebot fetches, say, 50,000 URLs a day from a 5-million-URL site, a full pass takes around 100 days before new content is even factored in, which lands squarely in the 3-to-6-month range Mueller describes.
What does Google mean by "large site"?
Mueller does not provide a specific threshold. Generally, we talk about sites with several tens of thousands of indexable pages. An e-commerce site with 50,000 product listings, a media site with 200,000 articles, a directory with millions of URLs — all are affected.
The sheer volume of URLs is not the only criterion. Crawl depth, quality of internal links, server response time, and the perceived freshness of content also influence how often Googlebot visits.
Why can't Google crawl everything quickly?
Let’s be honest: Google is not going to mobilize infinite servers for your site. Crawling has a cost — bandwidth, computation, storage. Google optimizes its visits based on the site’s popularity, response speed, and the expected freshness of content.
A site that publishes 10 articles a day will be crawled more than a dormant one. A fast site (TTFB < 200 ms) will be visited more often than a slow one. And when internal linking concentrates PageRank on a few strategic pages, those pages absorb most of the crawl.
How does Google prioritize which pages to crawl?
Mueller refers to "prioritizing important pages". In practice, Google cross-references several signals: PageRank (both internal and external), the update frequency observed over time, incoming links, and user popularity (CTR, time on page, engagement signals).
A bestseller product page updated every week will be re-crawled more often than a blog article published three years ago and never touched again. It’s an algorithmic optimization — Google seeks to maximize the freshness of its index without wasting resources.
- The crawl budget is finite: Google cannot crawl everything in one day, even on a medium-sized site.
- Priority goes to important pages: PageRank, freshness, and user popularity direct the crawl.
- A complete site can take 3 to 6 months to be fully refreshed — this is normal, not a malfunction.
- Server speed matters: a fast TTFB boosts the allocated crawl budget.
- New content takes precedence: Google balances discovery and refreshment.
SEO Expert opinion
This statement is consistent with field observations, but it stays deliberately vague on several critical points. Mueller does not say at what page count a site starts to qualify as a "large site", nor how Google concretely calculates the "priority" of a page.
The 3 to 6 months referenced align with what is observed on e-commerce sites with 50,000+ pages. However, this figure conceals a more nuanced reality: some pages are re-crawled daily, while others wait several months. The average can be misleading.
Is this statement consistent with observed practices?
Yes. On large sites, there are regularly discrepancies of 2 to 4 months between when a deep page is modified and when it is actually re-crawled. Orphan pages or those with low internal PageRank may wait much longer — or may never be re-crawled if they do not receive any links.
Sites that optimize their internal linking and server speed see a measurable increase in their crawl budget. A site that goes from 800 ms to 150 ms TTFB may see its daily crawl doubled or tripled. This is not trivial.
What nuances should be added?
Mueller talks about a "complete refresh", but Google does not re-crawl every page with the same depth. Some URLs are merely fetched to confirm they still return HTTP 200, without the content being reprocessed. Others go through a full JavaScript render, which is far more resource-intensive. [To be verified]
The figure of 3 to 6 months does not apply to news sites or sites with a high update rate. A media site publishing 50 articles a day enjoys a much more aggressive crawl. Google adjusts its behavior to the detected publishing pace.
When does this rule not apply?
Small sites (< 10,000 pages) are generally re-crawled much faster, often within a few weeks. Sites with high traffic and strong user engagement also receive more crawl. As for IndexNow, it lets you push changes in real time to the engines that support it (Bing, Yandex and a few others), but Google does not currently use the protocol, so do not count on it to bypass Googlebot's crawl budget.
Be careful: a slow site (TTFB > 1 s) or one with frequent server errors will see its crawl budget drastically reduced. Google does not insist on a site that is costly to crawl. In such cases the refresh delay can balloon: we have seen sites wait 9 to 12 months for certain pages to be revisited.
Practical impact and recommendations
What should you do to optimize the crawl budget?
First lever: server speed. A fast TTFB (< 200 ms) mechanically increases the number of pages that Google can crawl in the same period. Optimize your hosting, enable a CDN, compress responses (Brotli or Gzip), and avoid costly database queries on priority pages.
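To make the TTFB target actionable, here is a minimal Python sketch (standard library only) that approximates time-to-first-byte by timing the gap between sending a request and reading the first byte of the response. The URL list is a placeholder and the 200 ms threshold simply reuses the figure quoted above; the measurement also includes DNS and TLS setup, so treat it as a rough check rather than a lab-grade metric.

```python
import time
import urllib.request

# Hypothetical pages to check; replace with your own priority URLs.
URLS = [
    "https://www.example.com/",
    "https://www.example.com/category/best-sellers",
]

TTFB_TARGET_MS = 200  # assumed target, matching the figure used in this article

def measure_ttfb(url: str, timeout: float = 10.0) -> float:
    """Return an approximate time-to-first-byte in milliseconds."""
    request = urllib.request.Request(url, headers={"User-Agent": "ttfb-check"})
    start = time.perf_counter()
    with urllib.request.urlopen(request, timeout=timeout) as response:
        response.read(1)  # first byte received: stop the clock here
    return (time.perf_counter() - start) * 1000

if __name__ == "__main__":
    for url in URLS:
        ttfb = measure_ttfb(url)
        status = "OK" if ttfb <= TTFB_TARGET_MS else "slow"
        print(f"{url}: {ttfb:.0f} ms ({status})")
```

Run it from a location close to your hosting, and compare the results with the average response time reported in Search Console's Crawl Stats.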
Second lever: internal linking. Orphan pages, or pages more than 5 clicks from the homepage, are seldom crawled. Reinforce links to your strategic pages, create thematic hubs, and keep pagination clean so that deep content stays reachable in a few clicks (a quick click-depth check is sketched below).
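As an illustration of the click-depth point, the sketch below runs a breadth-first search over an internal link graph exported from a crawl of your own site. The graph, the start page and the 5-click threshold are illustrative assumptions.

```python
from collections import deque

# Hypothetical internal link graph: page -> pages it links to,
# typically exported from a crawl of your own site.
LINKS = {
    "/": ["/category-a", "/category-b"],
    "/category-a": ["/product-1", "/product-2"],
    "/category-b": ["/product-3"],
    "/product-1": [],
    "/product-2": [],
    "/product-3": ["/old-landing-page"],
    "/orphan-page": [],  # never linked from anywhere
}

MAX_DEPTH = 5  # threshold mentioned in the article

def click_depths(graph: dict, start: str = "/") -> dict:
    """Breadth-first search returning the click depth of each reachable page."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in graph.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

if __name__ == "__main__":
    depths = click_depths(LINKS)
    for page in sorted(LINKS):
        depth = depths.get(page)
        if depth is None:
            print(f"{page}: orphan (unreachable from the homepage)")
        elif depth > MAX_DEPTH:
            print(f"{page}: {depth} clicks deep, rarely crawled")
        else:
            print(f"{page}: {depth} clicks")
```

Pages that never appear in the result are orphans: they will only be discovered through sitemaps or external links, if at all.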
What mistakes should be avoided to prevent wasting the crawl budget?
Don’t let Google crawl unnecessary URLs: infinite facets, sorting parameters, empty result pages, duplicates. Block them in robots.txt if the goal is to save crawl; a noindex tag keeps them out of the index, but Googlebot still has to fetch the page to see it. Every unnecessary URL crawled is a useful URL that has to wait.
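Before shipping new robots.txt rules, you can replay them locally with Python's standard robotparser module, as in the sketch below. The rules and test URLs are made up, and note that this parser only understands simple path prefixes, not Google's wildcard extensions.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules; adjust to your own URL structure.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /facets/
Disallow: /cart
"""

# URLs whose crawlability you want to confirm before deploying the file.
TEST_URLS = [
    "https://www.example.com/search?q=shoes",      # should be blocked
    "https://www.example.com/facets/color-red",    # should be blocked
    "https://www.example.com/category/shoes",      # should stay crawlable
]

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in TEST_URLS:
    verdict = "allowed" if parser.can_fetch("Googlebot", url) else "blocked"
    print(f"{verdict:7} {url}")
```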
Avoid redirect chains (3xx → 3xx → 200): each hop consumes budget. Also avoid massive volumes of 404 errors, since Google eventually reduces its crawl of unstable sites. And watch out for redirect loops: they block Googlebot and kill your budget.
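One way to surface redirect chains is to follow each redirect and count the hops. The sketch below relies on the third-party requests library (an assumption about your tooling, not something mentioned in the video); response.history lists every intermediate 3xx response in order.

```python
import requests

# Hypothetical URLs to audit, e.g. legacy URLs that were migrated twice.
URLS = [
    "http://example.com/old-page",
    "https://www.example.com/current-page",
]

MAX_HOPS = 1  # more than one redirect in a row is a chain worth fixing

for url in URLS:
    response = requests.get(url, allow_redirects=True, timeout=10)
    hops = response.history  # intermediate 3xx responses, in order
    if len(hops) > MAX_HOPS:
        chain = " -> ".join(f"{r.status_code} {r.url}" for r in hops)
        print(f"CHAIN ({len(hops)} hops): {chain} -> {response.status_code} {response.url}")
    elif hops:
        print(f"single redirect: {hops[0].url} -> {response.url}")
    else:
        print(f"direct {response.status_code}: {url}")
```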
How can I check if my site is being crawled correctly?
Use the "Crawl Stats" section in Search Console. Look at the number of pages crawled per day, the average download time, and crawling errors. A sudden drop in crawl signals a technical issue or a loss of priority.
Cross-check with your server logs: you will see which pages Google is actually crawling, how often, and how much budget it allocates to unnecessary URLs. Tools like Oncrawl or Botify can cross-reference logs and Search Console for precise diagnostics.
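Here is a minimal sketch of that log analysis in Python: it parses an access log in the Apache/Nginx combined format, keeps hits whose user-agent claims to be Googlebot, and counts fetches per URL and per status code. The file path and regex are assumptions about your setup, and a real audit should also confirm Googlebot hits via reverse DNS, since the user-agent string is easy to spoof.

```python
import re
from collections import Counter

LOG_FILE = "access.log"  # hypothetical path to your combined-format access log

# Simplified pattern for the Apache/Nginx combined log format.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<date>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

crawled = Counter()
statuses = Counter()

with open(LOG_FILE, encoding="utf-8", errors="replace") as handle:
    for line in handle:
        match = LINE_RE.match(line)
        if not match or "Googlebot" not in match["agent"]:
            continue
        crawled[match["path"]] += 1
        statuses[match["status"]] += 1

print("Most crawled URLs by Googlebot:")
for path, hits in crawled.most_common(20):
    print(f"{hits:6d}  {path}")

print("\nStatus codes served to Googlebot:", dict(statuses))
```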
- Optimize TTFB (< 200 ms) to maximize crawl per session
- Strengthen internal linking to strategic pages
- Block unnecessary URLs (facets, duplicates, parameters) via robots.txt
- Monitor Crawl Stats in Search Console
- Analyze server logs to detect crawl budget wastage
- Use IndexNow to notify supporting engines (Bing, Yandex) of changes in real time, and keep your XML sitemaps fresh for Google (see the sketch after this list)
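For completeness, the IndexNow bullet boils down to a single JSON POST. The sketch below uses the requests library with a placeholder host and key; the shared api.indexnow.org endpoint relays the ping to the engines that have adopted the protocol (Bing, Yandex and a few others), and, as noted above, Google is not among them today.

```python
import requests

# Placeholder values: use your own domain and the key file hosted at its root.
HOST = "www.example.com"
KEY = "0123456789abcdef0123456789abcdef"
UPDATED_URLS = [
    "https://www.example.com/product-1",
    "https://www.example.com/category/new",
]

payload = {
    "host": HOST,
    "key": KEY,
    "keyLocation": f"https://{HOST}/{KEY}.txt",
    "urlList": UPDATED_URLS,
}

# Shared endpoint that forwards the notification to participating engines.
response = requests.post("https://api.indexnow.org/indexnow", json=payload, timeout=10)
print(response.status_code)  # 200 or 202 means the submission was accepted
```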
The crawl budget is a real constraint on large sites. Optimizing it requires a combination of technical performance (server speed, architecture), content strategy (prioritization, freshness), and continuous monitoring (logs, Search Console).
These optimizations can be complex to orchestrate alone, especially on heavy technical platforms. If your site exceeds 20,000 pages or if you notice an abnormally low crawl, specialized support can accelerate results — a technical SEO agency can audit your architecture, identify budget leaks, and establish a tailored optimization plan.
❓ Frequently Asked Questions
How many pages can Google crawl per day on my site?
How can I tell if my site lacks crawl budget?
Do XML sitemaps increase the crawl budget?
Should you use IndexNow to get around the crawl budget?
Does a CDN really improve the crawl budget?
Source: Google Search Central video published on 13/11/2020.