Official statement
Crawl statistics in Search Console are not limited to indexable HTML pages: they encompass all requests through the Googlebot infrastructure (images, CSS, JS, server responses, checks of advertising and e-commerce landing pages). Therefore, a crawl budget that seems excessive compared to the number of pages on the site is not necessarily an alarm signal. Understanding this distinction helps avoid erroneous diagnoses and focus efforts on genuine crawl issues.
What you need to understand
What do we mean by "all crawled URLs" exactly?
When John Mueller talks about "all URLs", he is not limiting the scope to traditional HTML pages. The crawl statistics in Search Console aggregate every HTTP request passing through the Googlebot infrastructure: HTML pages of course, but also images, CSS stylesheets, JavaScript scripts, JSON files, API responses, fonts, and even obscure resources you may not have realized you were hosting.
This comprehensiveness explains why a site with 500 indexable pages can show 5,000 or 10,000 requests per day in Search Console. This is not an anomaly. A modern site easily loads 10 to 20 resources per page (images, CSS, JS). Multiply that by the number of crawled pages, add orphaned or redundant resources, and you get a volume that can surprise at first glance.
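To make the order of magnitude concrete, here is a quick back-of-the-envelope sketch in Python; every figure in it (resources per page, pages crawled per day) is an illustrative assumption, not data from Google or Search Console.

```python
# Back-of-the-envelope sketch: why request counts dwarf page counts.
# All figures are illustrative assumptions, not Google metrics.
indexable_pages = 500
resources_per_page = 15       # images + CSS + JS, a mid-range estimate
pages_crawled_per_day = 400   # Googlebot rarely refetches every page daily

daily_requests = pages_crawled_per_day * (1 + resources_per_page)
print(f"{daily_requests} requests/day for {indexable_pages} indexable pages")
# -> 6400 requests/day: the same order of magnitude as the 5,000-10,000
#    mentioned above, without anything abnormal going on.
```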
Why are advertising landing page checks included in these stats?
Google uses the Googlebot infrastructure to validate advertising destinations: landing pages for Google Ads campaigns, e-commerce landing pages for Shopping, URLs promoted through Performance Max. These technical checks ensure that the pages comply with advertising policies, are accessible, and do not mislead the user.
These requests have nothing to do with classic organic indexing. They do not "consume" crawl budget in the traditional sense - they do not reduce Googlebot's capacity to crawl your pages for indexing. But they do appear in Search Console statistics, artificially inflating the total displayed volume. If you are running active advertising campaigns with thousands of landing pages, expect to see this volume explode without it reflecting an SEO issue.
Does this statement change our understanding of crawl budget?
Not fundamentally, but it clarifies a common misunderstanding. Many novice SEOs panic when they see an "excessive" crawl volume in Search Console, thinking that Googlebot is wasting time on unnecessary resources. The reality: these numbers do not solely reflect the crawling of indexable HTML pages. They include everything that passes through Googlebot, including technical requests unrelated to organic indexing.
What to remember: a high crawl budget is not a problem in itself. What matters is the ratio of crawled pages to indexed pages, the server error rate, and the proportion of strategic pages that are actually crawled. If Googlebot crawls 10,000 URLs per day but only indexes 50 relevant pages, then yes, there is an issue. If the 10,000 URLs include 8,000 images and JS/CSS files essential for rendering, that’s perfectly normal.
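As a minimal sketch of that reasoning, the snippet below compares the composition of the crawl with the indexed/crawled-HTML ratio; the breakdown is hypothetical sample data, while in practice the figures come from the Search Console crawl stats report ("By file type") and your coverage data.

```python
# Minimal sketch: judge crawl activity by its composition and by the
# indexed/crawled-HTML ratio rather than by the raw total.
# The breakdown below is hypothetical sample data.
crawl_by_type = {"html": 1500, "image": 6800, "css_js": 1500, "other": 200}
indexed_pages = 1200

total_requests = sum(crawl_by_type.values())
html_requests = crawl_by_type["html"]

print(f"{total_requests} requests/day, of which {html_requests} for HTML")
print(f"indexed / crawled HTML: {indexed_pages / html_requests:.0%}")
# A total dominated by images and CSS/JS is normal; a persistently low
# indexed/crawled-HTML ratio is the signal worth investigating.
```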
- Search Console statistics account for all Googlebot requests, not just indexable HTML pages.
- Images, CSS, JS, fonts, and other resources appear in the total crawl volume.
- Checks for advertising landing pages (Ads, Shopping, Performance Max) are included in the stats, without impacting organic crawl budget.
- A high crawl volume does not mean a problem: analyze the ratio of crawled/indexed pages and the error rate.
- Never block JS and CSS in robots.txt in an attempt to save crawl budget - Google needs them to understand your pages' rendering.
SEO Expert opinion
Is this statement consistent with what we observe on the ground?
Absolutely. For years, on client sites, we have seen Search Console crawl volumes that bear no relation to the number of HTML pages. A site with 2,000 pages can display 20,000 to 30,000 requests daily. Analyzing server logs confirms this: a large portion of these requests concern static resources (images, CSS, JS) and technical endpoints (APIs, JSON files, a sitemap.xml fetched several times a day).
What’s new here is the explicit mention of checks for advertising landing pages. Few SEOs were aware that these checks pass through Googlebot infrastructure and appear in Search Console. This is a major source of confusion for those managing e-commerce sites with thousands of products promoted through Shopping or Performance Max. Crawl volume can spike without indicating an architectural or internal linking problem.
What nuances should we add to this statement?
Google does not say that all these requests "count" the same way in crawl budget. [To be verified]: it is likely that advertising checks and some static resources are treated differently than indexable HTML pages. Google’s internal crawl budget algorithm probably prioritizes the crawling of new or modified content on strategic pages over images or advertising checks.
Another nuance: Search Console aggregates the data but does not always break down the types of requests. To really understand what is happening, you need to cross-reference it with a server log analysis. There, you will see precisely which URLs are crawled, how often, and with which user-agent Google requests them. Without that, you are left guessing.
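As an illustration of that log-based approach, here is a hedged sketch that isolates Googlebot requests from a standard "combined" access log and groups them by resource type and by URL; the file name and regex are assumptions to adapt, and since the user-agent string alone can be spoofed, a rigorous audit would also verify requesters with a reverse DNS lookup.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Sketch: group Googlebot hits from a combined-format access log by
# resource type and by URL. Adapt the file name and regex to your server.
LOG_LINE = re.compile(r'"(?:GET|POST|HEAD) (?P<url>\S+) HTTP/[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"')

EXTENSIONS = {
    ".css": "css", ".js": "js", ".png": "image", ".jpg": "image",
    ".jpeg": "image", ".webp": "image", ".svg": "image",
    ".woff": "font", ".woff2": "font", ".json": "json", ".xml": "xml",
}

def resource_type(url: str) -> str:
    path = urlsplit(url).path.lower()
    for ext, label in EXTENSIONS.items():
        if path.endswith(ext):
            return label
    return "html/other"

by_type, by_url = Counter(), Counter()
with open("access.log", encoding="utf-8", errors="replace") as log:
    for line in log:
        match = LOG_LINE.search(line)
        if not match or "Googlebot" not in match.group("ua"):
            continue  # keep only (self-declared) Googlebot requests
        by_type[resource_type(match.group("url"))] += 1
        by_url[match.group("url")] += 1

print("Googlebot hits by resource type:", by_type.most_common())
print("Most recrawled URLs:", by_url.most_common(10))
```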
In which cases does this rule not apply?
If your site runs no Google advertising campaigns, promotes no products through Shopping, and runs nothing through Performance Max, then advertising landing page checks are not inflating your stats. In that case, an excessive crawl volume probably points to another issue: massive duplicate content, unmanaged URL parameters, crawling of infinite filter facets, or an excessive number of outdated static resources still present on the server.
Another borderline case: sites that block JS and CSS in robots.txt (an outdated practice still observed). These resources will not appear in the crawl stats, but Google will be unable to understand the rendering of the page, which severely undermines indexing and ranking. The crawl volume will seem artificially low, but it’s a trap: you’ve sabotaged your SEO.
Practical impact and recommendations
How can you distinguish a healthy crawl budget from a problematic crawl budget?
Look beyond the raw volume displayed in Search Console. The absolute figure means nothing if you don’t relate it to the actual size of your site and the nature of your resources. A site with 500 pages and 5,000 daily requests can be perfectly healthy if 80% of those requests relate to images, CSS, and JS needed for rendering.
On the other hand, if you notice that Googlebot is heavily crawling low-value pages (filter facets, deep pagination pages, unnecessary URL parameters), then yes, you have a problem. Cross-reference the Search Console data with your server logs to pinpoint the URLs being crawled. That’s where you’ll see if Googlebot is wasting time or not.
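To go one step further, a sketch like the one below can flag likely low-value URLs among what Googlebot actually fetched (for instance, the URL list produced by the log parser shown earlier); the parameter names and sample URLs are assumptions to adapt to your own site.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# Sketch: flag likely low-value crawl targets (facets, sorting, sessions)
# among URLs Googlebot requested. Parameter names are assumptions to adapt.
SUSPECT_PARAMS = {"sort", "order", "filter", "color", "size", "sessionid"}

def classify(url: str) -> str:
    params = set(parse_qs(urlsplit(url).query))
    if params & SUSPECT_PARAMS:
        return "facet/parameter"
    if len(params) >= 3:
        return "many parameters"
    return "plain URL"

crawled_urls = [                      # replace with the list from your logs
    "/produits/chaussures?color=rouge&size=42&sort=prix",
    "/produits/chaussures",
    "/blog/article-seo",
]

print(Counter(classify(url) for url in crawled_urls))
# If facet/parameter URLs dominate Googlebot's activity while strategic
# pages are rarely fetched, that is a genuine crawl budget problem.
```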
Should you reduce the number of resources to save crawl budget?
No, unless those resources are outdated or redundant. Never block JS and CSS in robots.txt in an effort to save on crawl budget—this has been counterproductive since at least 2015. Google needs these resources to understand page rendering, evaluate Core Web Vitals, and determine the relevance of the content visible to users.
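If you want to check quickly whether your robots.txt blocks rendering resources, a small sketch using Python's standard urllib.robotparser can do it; the domain and sample asset URLs are placeholders to replace with your own.

```python
from urllib.robotparser import RobotFileParser

# Sketch: verify that robots.txt does not block Googlebot from CSS/JS.
# The domain and sample asset URLs are placeholders.
robots = RobotFileParser()
robots.set_url("https://www.example.com/robots.txt")
robots.read()

sample_assets = [
    "https://www.example.com/assets/app.js",
    "https://www.example.com/assets/main.css",
]
for asset in sample_assets:
    verdict = "OK" if robots.can_fetch("Googlebot", asset) else "BLOCKED"
    print(f"{verdict:8} {asset}")
# Any BLOCKED line means Google may render your pages without that
# resource, which is exactly the trap described above.
```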
However, clean up orphaned resources: unlinked images, old JS/CSS files from an outdated version of the site, unused fonts. These files clutter the server and can be crawled by Googlebot even if they no longer appear in the HTML code of your current pages. Regular technical audits help identify and remove these deadweights.
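As an illustration only, here is a rough sketch of such an audit: it lists files in a static directory that no page in a local HTML export still references. Directory names are placeholders, matching is done on file names only, and a real audit should also cross-check against crawl logs before deleting anything.

```python
import os
import re

# Rough sketch of an orphaned-asset audit. Directory names are placeholders;
# matching by file name only, so treat the output as candidates, not verdicts.
STATIC_DIR = "static"
HTML_EXPORT_DIR = "rendered_pages"

ASSET_REF = re.compile(r'(?:src|href)="([^"]+?\.(?:css|js|png|jpe?g|webp|svg|woff2?))["?]')

referenced = set()
for root, _, files in os.walk(HTML_EXPORT_DIR):
    for name in files:
        if not name.endswith(".html"):
            continue
        with open(os.path.join(root, name), encoding="utf-8", errors="replace") as page:
            referenced.update(ref.rsplit("/", 1)[-1] for ref in ASSET_REF.findall(page.read()))

for root, _, files in os.walk(STATIC_DIR):
    for name in files:
        if name not in referenced:
            print("possibly orphaned:", os.path.join(root, name))
```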
How can you truly optimize your crawl budget if necessary?
Focus on the classic causes of crawl budget wastage: excessive pagination, unblocked filter facets, duplicate content, non-canonicalized URL parameters, chain redirects. That’s where the real gains are made. If you have 10,000 URL variants for 500 products due to combinable filters, you have a problem. If you have 5,000 daily requests because your site loads 10 images per page, that’s normal.
Use canonical tags to consolidate URL variants, implement clean, crawlable pagination or an SEO-friendly infinite scroll system (keep in mind that Google no longer uses rel=prev/next as an indexing signal), and block unnecessary parameters via robots.txt or Search Console. Monitor the server error rate (5xx) and response time: a slow or unstable server mechanically reduces the crawl budget allocated by Google.
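To monitor those two signals from Googlebot's point of view, a sketch along these lines can be run on your access logs; it assumes the response time is logged at the end of each line (for example nginx's $request_time), which is an assumption to adapt to your actual log format.

```python
import re
from statistics import mean

# Sketch: 5xx rate and response time as experienced by Googlebot.
# Assumes a combined log with the response time (in seconds) appended at the
# end of each line, e.g. nginx "$request_time"; adapt to your own format.
LINE = re.compile(r'" (?P<status>\d{3}) .*"(?P<ua>[^"]*)" (?P<rtime>[\d.]+)$')

statuses, times = [], []
with open("access.log", encoding="utf-8", errors="replace") as log:
    for raw in log:
        match = LINE.search(raw.rstrip())
        if not match or "Googlebot" not in match.group("ua"):
            continue
        statuses.append(int(match.group("status")))
        times.append(float(match.group("rtime")))

if statuses:
    error_rate = sum(code >= 500 for code in statuses) / len(statuses)
    print(f"Googlebot hits: {len(statuses)}")
    print(f"5xx rate: {error_rate:.1%}, average response time: {mean(times):.2f}s")
# A rising 5xx rate or slower responses is what actually pushes Google to
# throttle its crawling, far more than the raw number of resources.
```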
These optimizations can be complex to implement alone, especially on large-scale sites or with specific technical architectures. Engaging a specialized SEO agency provides a detailed audit, tailored recommendations, and support in implementing fixes—without the risk of breaking existing structures or creating new problems.
- Analyze your server logs to pinpoint exactly which URLs Googlebot has crawled and their types (HTML, images, CSS, JS, API).
- Cross-reference Search Console data with the actual number of indexable pages to assess the crawl/indexation ratio.
- Never block JS and CSS in robots.txt—Google needs them to understand your pages' rendering.
- Clean orphaned resources (old CSS/JS files, unlinked images) that clutter the server and can be crawled unnecessarily.
- Consolidate URL variants with canonical tags and block unnecessary parameters (filters, sorting, sessions) via robots.txt or Search Console.
- Monitor server error rates (5xx) and response times—a slow server reduces the crawl budget allocated by Google.
❓ Frequently Asked Questions
My crawl budget is 10 times the number of pages on my site: is that a problem?
Do advertising landing page checks consume real crawl budget?
How do you distinguish a healthy crawl budget from a problematic one?
Should you block JS and CSS resources to save crawl budget?
Do images appear in the Search Console crawl statistics?