
Official statement

Every crawled URL counts against the crawl budget: alternate language versions, CSS files, images. Even 170 language variations of a page all consume budget; they are not exempt.
🎥 Source video

Extracted from a Google Search Central video

⏱ 31:53 💬 EN 📅 09/12/2020 ✂ 16 statements
Watch on YouTube (24:36) →
Other statements from this video (15)
  1. 2:49 Why does Google almost systematically render your pages before indexing them?
  2. 3:52 Should the two-waves-of-indexing model be abandoned?
  3. 7:35 Does Google use a sandbox or honeymoon period for new sites?
  4. 8:02 Does Google really guess where to rank a new site before it even has data?
  5. 9:07 Why do new sites ride a roller coaster in the SERPs?
  6. 13:59 Should you really worry about crawl budget for your site?
  7. 15:37 Should you really worry about crawl budget under one million URLs?
  8. 16:09 Does crawl budget really exist, or is it just an SEO myth?
  9. 17:42 Does Google deliberately throttle its crawl to spare your servers?
  10. 18:51 Can Googlebot really stop crawling your site because of server error codes?
  11. 20:24 How can you detect a real crawl budget problem on your site?
  12. 21:57 Does pruning thin content really improve crawl budget?
  13. 22:28 Should you sacrifice server speed to save crawl budget?
  14. 23:32 Why are your API requests blowing up your crawl budget without you knowing it?
  15. 25:39 Should you really worry about Googlebot's aggressive caching of your static resources?
TL;DR

Gary Illyes is categorical: every crawled URL consumes crawl budget, without exception. Language versions, CSS files, images – everything matters. If you have 170 versions of the same page, Google has to crawl all of them, which depletes your available budget. In concrete terms, this means that a poorly structured multilingual site can literally choke its own crawl and slow down the indexing of strategic pages.

What you need to understand

What is crawl budget and why is everyone talking about it?

The crawl budget is the number of pages that Googlebot is willing to crawl on your site over a given period. Google does not have the time or resources to crawl the entire web continuously. Therefore, it allocates a quota to each site based on its popularity, freshness, and technical health.

Most small sites don’t need to worry. But as soon as you exceed a few thousand pages — e-commerce, media, multilingual portals — crawl budget becomes a strategic issue. A non-crawled URL will never be indexed, period.

Why is Gary Illyes' statement so blunt?

Because it debunks a persistent myth: the idea that there are “free” URLs that do not consume crawl budget. Some SEOs believed that CSS files, images, or hreflang versions were treated differently, like secondary resources that do not count against the quota.

Gary Illyes cuts through this: every crawled URL counts, without discrimination. Your French page? It counts. The German version? It counts. The loaded CSS? It counts. The hero image? It also counts. 170 language variants of a page = 170 URLs that will nibble away at your budget.

What does this mean for a multilingual or multi-regional site?

This is where it gets tricky. A site available in 20 languages with 10,000 pages potentially generates 200,000 URLs to crawl. If your architecture is not optimized — chaotic pagination, unmanaged URL parameters, duplicated content — you are wasting your crawl budget on low-value pages.
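The back-of-the-envelope math above can be sketched in a few lines. The page count and language count come from the example in the text; the daily crawl rate is a pure illustration, not a figure Google publishes:

```python
# Rough crawl-inventory estimate for a multilingual site.
pages = 10_000        # canonical HTML pages (example from the text)
languages = 20        # hreflang variants per page (example from the text)

html_urls = pages * languages          # every language variant is a distinct URL
print(f"{html_urls:,} HTML URLs to crawl")   # 200,000 HTML URLs to crawl

# Assuming Googlebot fetches ~5,000 URLs/day on this host (illustrative only),
# a single full pass over the HTML inventory alone takes:
crawl_rate = 5_000
print(f"~{html_urls // crawl_rate} days per full pass")  # ~40 days per full pass
```

And this counts only HTML: every CSS file, script, and image fetched on top of that lengthens the pass further.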

Sites that multiply versions without a strong editorial strategy find themselves in a deadlock: Google crawls thousands of almost identical variations while strategic pages wait their turn. Crawl prioritization then becomes a decisive lever to maintain control.

  • Every crawled URL consumes budget, including CSS, images, and alternative language versions.
  • Multilingual sites must streamline their architecture to avoid wasting crawl resources.
  • The multiplication of variants without added value slows down the indexing of priority pages.
  • Crawl budget is not infinite: it should be managed like a strategic asset.
  • Small sites (< 5,000 pages) are rarely affected by this issue.

SEO Expert opinion

Is this statement consistent with what we observe in the field?

Yes and no. In principle, it’s consistent: server logs show that Googlebot crawls everything — CSS, JS, images, PDFs. No glaring exceptions. But reality is more nuanced than the official discourse.

First, not all URLs carry the same strategic weight. Google itself has internal prioritization mechanisms: it crawls more often the popular, fresh pages or those linked from the homepage. Saying that “everything counts the same” is technically true at the raw quota level, but strategically reductive. [To be verified]: Google has never published an official weighting between an HTML page and a CSS file in the budget calculation.

What nuances should we add to this statement?

The first nuance is that crawling does not mean indexing. Google can crawl your 170 language versions, but it will only index those that it deems relevant. The real problem isn’t so much crawling as the waste of resources on pages that will never serve as SEO entry points.

The second point is that some files — especially static resources like CSS or JS — are often cached on Google's side. Once crawled, they are not re-crawled at every page visit. This is a substantial gain for well-architected sites. But Gary Illyes doesn’t mention this, which makes his argument somewhat alarmist.

In what cases does this rule pose a real problem?

Let’s be honest: this rule becomes problematic when your architecture generates unnecessary URLs. E-commerce filter facets, poorly managed pagination, obsolete AMP versions, duplicated images... all of this consumes budget for zero SEO return.

The sites that suffer the most are those that have accumulated technical debt: chain redirects, uncleaned soft 404s, automatically generated content without value. Google crawls, crawls, crawls... and indexes nothing. The crawl budget thus becomes a symptom of deeper structural issues.

Warning: If you have a site with more than 50,000 URLs and notice unusual indexing delays, check your server logs. You will probably discover that Googlebot is wasting time on low-value areas.

Practical impact and recommendations

What should you do concretely to optimize your crawl budget?

Your first reflex should be to audit your server logs. You need to know which URLs Google is actually crawling, how often, and with what HTTP status. A good log analysis tool often reveals surprises: hundreds of 404s crawled daily, old URLs redirecting in chains, unnecessary resources repeatedly requested.
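As a minimal sketch of such a log audit (assuming the common Apache/Nginx "combined" log format; the sample lines and the Googlebot filter-by-user-agent shortcut are illustrative — a production audit should also verify Googlebot IPs):

```python
import re
from collections import Counter

# Assumed line shape: IP - - [date] "GET /path HTTP/1.1" status size "ref" "UA"
LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3}) .*"(?P<ua>[^"]*)"$'
)

def audit(lines):
    """Count Googlebot hits per HTTP status and per URL path."""
    by_status, by_path = Counter(), Counter()
    for line in lines:
        m = LINE.search(line)
        if not m or "Googlebot" not in m.group("ua"):
            continue  # keep only Googlebot requests
        by_status[m.group("status")] += 1
        by_path[m.group("path")] += 1
    return by_status, by_path

sample = [
    '66.249.66.1 - - [01/Jan/2024] "GET /fr/produit HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
    '66.249.66.1 - - [01/Jan/2024] "GET /old-page HTTP/1.1" 404 0 "-" "Googlebot/2.1"',
    '10.0.0.7 - - [01/Jan/2024] "GET /fr/produit HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]
status, paths = audit(sample)
print(status)  # the human visit is excluded; one 200 and one 404 remain
```

Run monthly over your real logs, the `by_path` counter is where the surprises show up: the 404s and redirect targets that Googlebot hammers daily.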

Next, streamline your multilingual architecture. If you have 170 versions of a page, ask yourself: do they all deserve to exist? Do some languages or regions generate traffic? If a version has never had a visitor in 6 months, it consumes budget for nothing. Consolidate, merge, or remove.

What mistakes should you absolutely avoid?

Don't let unmanaged URL parameters proliferate. Each variation (sort, filter, session ID) is seen as a distinct URL, and Google will crawl them all if you don't correctly configure parameter handling in Search Console or don't canonicalize.
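Server-side, the same canonicalization logic can be sketched like this: collapse sort, session, and tracking parameters so that variants map back to one canonical URL. The list of "noise" parameters here is an assumption to adapt to your own site:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Parameters that only create duplicate variants (assumed list — adapt it).
NOISE_PARAMS = {"sort", "order", "sessionid", "utm_source", "utm_medium"}

def canonical_url(url: str) -> str:
    """Strip noise parameters and sort the rest so variants collapse together."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k.lower() not in NOISE_PARAMS]
    kept.sort()  # /p?a=1&b=2 and /p?b=2&a=1 become the same URL
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

print(canonical_url("https://example.com/shoes?sort=price&color=red&sessionid=abc"))
# → https://example.com/shoes?color=red
```

The resulting URL is what belongs in your `rel="canonical"` tag and your sitemap.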

Another classic mistake: leaving old versions of content accessible. Obsolete separate mobile versions, deprecated AMP pages, old domains that are still crawlable... all of this drains your budget. If a URL no longer has a reason to exist, do a proper 410 on it.

How can you verify that your site is well-optimized?

Keep an eye on the “Crawl Stats” report in Search Console. Watch the evolution of the number of pages crawled per day, the average download time, and host responses. A sudden drop in crawl volume or a spike in errors is a warning sign.
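The Crawl Stats report exposes crawled-pages-per-day as a daily series you can export. A minimal drop detector over such a series might look like this; the 7-day window and the 50% threshold are arbitrary assumptions, not Google guidance:

```python
def crawl_drop_alerts(daily_crawled, window=7, drop_ratio=0.5):
    """Flag days where the crawled-page count falls below `drop_ratio`
    (assumed threshold) of the trailing `window`-day average."""
    alerts = []
    for i in range(window, len(daily_crawled)):
        baseline = sum(daily_crawled[i - window:i]) / window
        if baseline > 0 and daily_crawled[i] < baseline * drop_ratio:
            alerts.append((i, daily_crawled[i], round(baseline)))
    return alerts

# Illustrative series: steady ~8,000 pages/day, then a sudden collapse.
series = [8000, 8200, 7900, 8100, 8000, 8300, 7800, 8100, 2500, 2400]
print(crawl_drop_alerts(series))  # flags the last two days
```

An alert is a prompt to check your logs and server health, not a verdict on its own.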

Also compare the number of crawled URLs with the number of truly strategic URLs. If Google crawls 80,000 pages but you only have 10,000 that are genuinely useful, you have a content governance problem. Ask yourself, where are those 70,000 parasitic URLs?
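Once you have the two inventories — crawled URLs from your logs, strategic URLs from your sitemap — the comparison is a simple set operation. The sample URLs below are made up for illustration:

```python
# Crawled URLs observed in logs vs. URLs you actually want indexed.
crawled = {"/p/1", "/p/2", "/p/2?sort=asc", "/old/a", "/session/xyz"}
strategic = {"/p/1", "/p/2", "/p/3"}

parasitic = crawled - strategic   # crawl spent on non-strategic URLs
ignored = strategic - crawled     # strategic pages Googlebot never fetched

print(sorted(parasitic))  # ['/old/a', '/p/2?sort=asc', '/session/xyz']
print(sorted(ignored))    # ['/p/3']
```

`parasitic` measures the waste; `ignored` is the more alarming list, since those are the pages whose indexing the waste is delaying.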

  • Analyze your server logs monthly to identify areas of crawl waste.
  • Set up URL management in Search Console to exclude unnecessary variations.
  • Consistently canonicalize duplicate or nearly duplicate pages.
  • Remove (410) or noindex old versions of content without SEO value.
  • Optimize server response time to maximize the number of pages crawled per session.
  • Limit chain redirects to a maximum of 1 hop.
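The last check — capping redirect chains at one hop — can be run offline against an exported redirect map. Here the map is a plain dict for illustration; a real audit would load it from your server config or a crawler export:

```python
def chain_length(url, redirects, limit=10):
    """Count hops from `url` through a redirect map, guarding against loops."""
    hops = 0
    seen = {url}
    while url in redirects and hops < limit:
        url = redirects[url]
        hops += 1
        if url in seen:   # redirect loop: treat as worst case
            return limit
        seen.add(url)
    return hops

# Hypothetical redirect map: old URL -> target.
redirects = {"/a": "/b", "/b": "/c", "/c": "/final", "/x": "/final"}
too_long = {u for u in redirects if chain_length(u, redirects) > 1}
print(sorted(too_long))  # ['/a', '/b'] — these should point straight to /final
```

The fix for every flagged URL is the same: rewrite the rule so it redirects directly to the final destination.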
The crawl budget is not a theoretical concept — it is a real constraint that directly impacts your ability to index your strategic pages. Optimizing your architecture, cleaning up parasite URLs, and monitoring your logs are prerequisites. These technical projects can be complex to manage internally, especially on high-volume sites. Enlisting a specialized SEO agency can provide a precise diagnosis, prioritized recommendations, and support for implementation — often more cost-effective than fumbling around alone for months.

❓ Frequently Asked Questions

Does crawl budget affect all sites the same way?
No. Small sites (under 5,000 pages) are rarely limited by crawl budget; Google generally crawls all of their content without issue. Large sites, on the other hand (e-commerce, media, multilingual portals), must actively manage this budget to guarantee the indexing of their priority pages.
Do CSS and JavaScript files really consume as much crawl budget as an HTML page?
Technically yes: every crawled URL counts. In practice, however, Google often caches static resources after a first crawl, which reduces their real impact. Problems mainly arise when these files change frequently or are needlessly duplicated.
How do I know whether my site has a crawl budget problem?
Check the "Crawl Stats" report in Search Console. If Google is massively crawling worthless URLs (old pages, parameters, duplicates) while your new strategic pages take weeks to get indexed, you have a crawl prioritization problem.
Should you block certain URLs in robots.txt to save crawl budget?
Blocking in robots.txt prevents crawling, but Google then cannot discover the links on those pages or assess their quality. Prefer noindex (crawlable but not indexable) for low-value content, and a 410 to permanently remove useless URLs.
Do hreflang versions consume budget even when they point to identical content?
Yes. Every language or regional version is a distinct URL that Google has to crawl. With 50 languages across 10,000 pages, that is potentially 500,000 URLs to manage. Hence the importance of enabling hreflang versions only for genuinely strategic languages and regions.


