Official statement
Other statements from this video (13)
- 9:53 Is crawl budget really irrelevant for small sites?
- 15:14 How does Google decide which pages to crawl first on your site?
- 25:55 What is crawl demand and how does Google actually calculate it?
- 33:45 How does Google calculate the crawl rate so it doesn't crash your servers?
- 37:38 Does crawl budget really increase with your server's speed?
- 41:11 Why does a slow site kill your Google crawl rate?
- 43:17 Can you really limit Google's crawl rate without risking your rankings?
- 46:04 Is crawl budget simply a combination of crawl rate and crawl demand?
- 61:43 Why does Google restrict the Crawl Stats report to domain properties only?
- 77:09 Does response time really exclude page rendering in Search Console?
- 82:21 Why can a sharp drop in crawl requests reveal a robots.txt or response-time problem?
- 87:00 Does server response time really influence Googlebot's crawl rate?
- 101:16 Why can a 503 on robots.txt block crawling of your entire site?
Google does not account for resources hosted outside of your domain in the Search Console crawl statistics. Images served via a CDN and third-party scripts therefore appear nowhere in those reports. For an SEO, this means your view of the crawl budget actually consumed is partial, and comparing Search Console figures with server logs will inevitably reveal massive discrepancies.
What you need to understand
What does Google mean by "external resources"?
An external resource refers to any file — image, JavaScript script, CSS stylesheet, web font — hosted on a domain different from that of your main HTML page. If your site is on example.com but your images are served from cdn.cloudflare.net, those images are external.
The important nuance: Google is referring to crawl statistics in Search Console, not the crawl itself. The bot does explore these resources to understand and render your page, but it does not count them in the metrics you check. Your dashboard displays an incomplete volume of requests.
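To make the distinction concrete, here is a minimal Python sketch (the page URL and resource list are hypothetical) that classifies the resources a page references as internal or external by comparing their hostname with that of the HTML page:

```python
from urllib.parse import urlparse

PAGE_URL = "https://www.example.com/product/123"  # hypothetical page

# Hypothetical resources referenced by that page
resources = [
    "https://www.example.com/css/main.css",
    "https://cdn.cloudflare.net/img/hero.jpg",
    "https://fonts.gstatic.com/s/roboto/v30/font.woff2",
]

page_host = urlparse(PAGE_URL).hostname

for url in resources:
    host = urlparse(url).hostname
    # Same hostname as the HTML page -> visible in your Crawl Stats report;
    # any other hostname -> absent from your property's statistics.
    status = "internal" if host == page_host else "external"
    print(f"{status:8} {url}")
```

Note that with a strict hostname comparison a subdomain such as cdn.example.com also comes out as external, which is exactly the gray area discussed further down.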
Why does Google exclude these resources from the stats?
Search Console reports data by verified property. If you have verified example.com, you will see information pertaining to that domain only. Requests to cdn.cloudflare.net fall under another property — which you probably do not control.
It’s a matter of technical architecture: Google segments its reports by domain so that you only see what belongs to you. However, in reality, a modern page loads dozens of cross-domain resources. Therefore, the figure displayed in Search Console is structurally partial.
What are the practical implications for an SEO?
First point: if you use Search Console stats as a proxy for crawl budget, you underestimate the actual load on your origin servers and on Google’s infrastructure. Images, analytics scripts, fonts — all consume requests, rendering time, and bandwidth on the Googlebot side.
Second point: to obtain a complete view, you must cross-check server logs with Search Console data. Logs capture all requests, including those to CDNs if you control the domain. Without this dual reading, you are flying blind on how your crawl budget is actually spent.
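To quantify that blind spot, a back-of-the-envelope calculation is enough (the daily totals below are made up): divide the requests reported in the Crawl Stats report by the Googlebot hits counted in your logs.

```python
# Hypothetical daily totals
search_console_requests = 5_000   # crawl requests shown in the Crawl Stats report
log_requests = 50_000             # Googlebot hits found in server + CDN logs

visible_share = search_console_requests / log_requests
print(f"Search Console reflects about {visible_share:.0%} of observed Googlebot activity")
```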
- Resources hosted off-domain (CDN, third-party services) do not appear in Search Console stats.
- Search Console only displays requests to the verified properties that you control.
- For a comprehensive view of the crawl, you must analyze server logs in conjunction with Google reports.
- This exclusion concerns the statistics, not the actual crawl: Googlebot does explore these resources to understand the page.
- Images served via CDN often make up the majority of file volume on a modern page — their invisibility in the stats creates a massive blind spot.
SEO Expert opinion
Is this statement consistent with field observations?
Absolutely. Any audit comparing Apache/Nginx logs with Search Console reports reveals this gap. On an e-commerce site with 200 images per page served from a Cloudflare or Akamai CDN, Search Console shows only a fraction of the actual Googlebot traffic. The logs, however, record thousands of daily hits on those images.
The issue: many junior SEOs blindly rely on Search Console numbers to diagnose crawl budget issues. They see 5,000 requests/day and conclude that everything is fine, while the logs show 50,000 actual requests when including the external assets. This misunderstanding distorts the analysis.
What are the gray areas of this statement?
Google says "hosted outside your site," but what about subdomains? If your images are on cdn.example.com and your site is on www.example.com, technically these are two distinct properties in Search Console. You need to verify cdn.example.com separately to see its stats — something that 90% of sites never do. [To be verified]: Does Google treat a subdomain as "external" in this context? The documentation remains unclear.
Another nuance: resources loaded via JavaScript after the initial render. If an image is dynamically injected by a third-party script, it can escape both Search Console stats AND classic server logs if you don’t trace requests on the CDN side. In that case, you are completely blind.
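One way to surface those dynamically injected requests is to render the page in a headless browser and record every network request it fires. A minimal sketch with Playwright (the URL is hypothetical; it assumes `pip install playwright` followed by `playwright install chromium`):

```python
from urllib.parse import urlparse
from playwright.sync_api import sync_playwright

PAGE_URL = "https://www.example.com/article"  # hypothetical

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()

    requests = []
    # Record every network request fired during rendering,
    # including resources injected later by third-party scripts.
    page.on("request", lambda req: requests.append(req.url))

    page.goto(PAGE_URL, wait_until="networkidle")
    browser.close()

page_host = urlparse(PAGE_URL).hostname
external = [u for u in requests if urlparse(u).hostname != page_host]
print(f"{len(external)} of {len(requests)} requests go to other domains")
```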
When does this rule become critical?
On sites with a high volume of images or videos — online news, e-commerce, galleries, portfolios — where the external resources / HTML ratio skyrockets. I have seen news sites with 95% of page weight served by third-party CDNs: the Search Console stats then reflect only 5% of the actual Googlebot activity on the content.
Another case: sites relying on third-party services for fonts, analytics, and ads. Google Fonts, Typekit, GTM, Facebook pixels: all of these generate requests that are invisible in your reports. If you optimize crawl time without accounting for these dependencies, you are missing the bigger picture.
Practical impact and recommendations
How can you gain a comprehensive view of your site’s crawl?
First step: activate server log analysis. Tools like Oncrawl, Botify, or Screaming Frog Log Analyzer show you all Googlebot requests, including those to external resources if they pass through your infrastructure. Cross-reference this data with Search Console reports to identify discrepancies.
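Before deploying a dedicated tool, a quick first pass over a raw access log already tells you a lot. A minimal sketch, assuming a standard combined-format Nginx/Apache log at a hypothetical path, that counts Googlebot hits by file type:

```python
import re
from collections import Counter
from pathlib import Path

LOG_FILE = Path("/var/log/nginx/access.log")  # hypothetical path

# Combined log format: '... "GET /path HTTP/1.1" status size "referer" "user-agent"'
request_re = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[^"]*"')

hits_by_type = Counter()
for line in LOG_FILE.read_text(errors="replace").splitlines():
    if "Googlebot" not in line:   # crude user-agent filter; verify IPs (see next sketch)
        continue
    match = request_re.search(line)
    if not match:
        continue
    path = match.group(1).split("?", 1)[0]        # drop query string
    filename = path.rsplit("/", 1)[-1]
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else "html"
    hits_by_type[ext] += 1

for ext, count in hits_by_type.most_common():
    print(f"{ext:6} {count}")
```

Cross-check these per-type totals against the Crawl Stats breakdown to see where the two views diverge.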
Second step: if you use a CDN, request the CDN logs. Cloudflare, Fastly, and Akamai provide detailed reports on bot hits. You will see exactly how often Googlebot loads your images, scripts, and fonts. These numbers can be 10 to 50 times higher than the Search Console stats.
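CDN logs also attract plenty of fake "Googlebot" user agents, so it is worth validating IPs before trusting the counts. A minimal sketch of the standard reverse-then-forward DNS check (the IP below is only an example):

```python
import socket

def is_real_googlebot(ip: str) -> bool:
    """Reverse-DNS the IP, then resolve the hostname forward again:
    genuine Googlebot IPs resolve to googlebot.com or google.com,
    and the forward lookup must point back to the same IP."""
    try:
        hostname = socket.gethostbyaddr(ip)[0]
    except OSError:
        return False
    if not hostname.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        forward_ips = socket.gethostbyname_ex(hostname)[2]
    except OSError:
        return False
    return ip in forward_ips

print(is_real_googlebot("66.249.66.1"))  # example address from a known Googlebot range
```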
Should you migrate resources to your main domain?
It depends. Hosting assets on your domain increases visibility in the stats and simplifies diagnostics, but it also puts more load on your own infrastructure and can degrade performance if you do not have a high-performing CDN in front of it. Third-party CDNs like Cloudflare offer low global latency and virtually unlimited bandwidth.
The real criterion: if you have proven crawl budget issues — strategic pages not crawled, progressive deindexing — and your external assets are being crawled massively, consider lazy-loading them or serving them from a non-priority separate domain. But do not migrate everything by default: you risk degrading the Core Web Vitals.
What mistakes should you avoid in interpreting the stats?
Never use Search Console numbers alone to estimate the crawl budget consumed. This is the most common mistake. A site can display 2,000 requests/day in the console while Googlebot makes 30,000 when counting the external assets. You underestimate the actual weight of your site in Google’s eyes.
Another trap: comparing two sites without considering their CDN architecture. A site with 100% of assets internal may have inflated Search Console stats compared to an identical site with externalized assets — but the actual crawl might be the same. Do not jump to conclusions based on inter-site comparisons.
- Activate server log analysis to capture all Googlebot requests, including those to external resources.
- Request the logs from your CDN to quantify the crawling of images, scripts, and fonts hosted off-domain.
- Check subdomains separately in Search Console if they host critical assets (cdn.example.com).
- Never base a crawl budget strategy solely on Search Console stats — cross-reference with actual logs.
- If you migrate assets to your main domain, expect an artificial spike in Search Console stats and do not panic over it.
- Lazy-load non-critical images to reduce the number of external requests during the initial crawl.
❓ Frequently Asked Questions
Are images served via a third-party CDN indexed by Google?
If I verify my CDN subdomain in Search Console, will I see its stats?
Are third-party JavaScript scripts loaded on my page counted?
Does this exclusion affect how Google calculates crawl budget?
Should I move my images back to my main domain to improve SEO?