Official statement
Other statements from this video (13)
- 9:53 Is crawl budget really irrelevant for small sites?
- 15:14 How does Google decide which pages to crawl first on your site?
- 25:55 What is crawl demand and how does Google actually calculate it?
- 33:45 How does Google calculate the crawl rate so it doesn't crash your servers?
- 37:38 Does crawl budget really increase with your server's speed?
- 41:11 Why does a slow site kill your Google crawl rate?
- 43:17 Can you really limit Google's crawl rate without risking your rankings?
- 46:04 Is crawl budget simply a combination of crawl rate and crawl demand?
- 61:43 Why does Google restrict the Crawl Stats report to domain properties only?
- 77:09 Does response time really exclude page rendering in Search Console?
- 82:21 Why can a sharp drop in crawl requests reveal a robots.txt or response-time problem?
- 87:00 Does server response time really influence Googlebot's crawl rate?
- 101:16 Why can a 503 on robots.txt block crawling of your entire site?
Google does not account for resources hosted outside of your domain in the Search Console crawl statistics. Images served via a CDN and third-party scripts therefore appear nowhere in those reports. For an SEO, this means your view of the crawl budget actually consumed is partial, and comparing Search Console figures with server logs will inevitably reveal massive discrepancies.
What you need to understand
What does Google mean by "external resources"?
An external resource refers to any file — image, JavaScript script, CSS stylesheet, web font — hosted on a domain different from that of your main HTML page. If your site is on example.com but your images are served from cdn.cloudflare.net, those images are external.
The important nuance: Google is referring to crawl statistics in Search Console, not the crawl itself. The bot does explore these resources to understand and render your page, but it does not count them in the metrics you check. Your dashboard displays an incomplete volume of requests.
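To make the distinction concrete, here is a minimal Python sketch (the page URL and resource list are hypothetical) that classifies the resources a page references as internal or external by comparing their hostname with that of the HTML page:

```python
from urllib.parse import urlparse

PAGE_URL = "https://www.example.com/product/123"  # hypothetical page

# Hypothetical resources referenced by that page
resources = [
    "https://www.example.com/css/main.css",
    "https://cdn.cloudflare.net/img/hero.jpg",
    "https://fonts.gstatic.com/s/roboto/v30/font.woff2",
]

page_host = urlparse(PAGE_URL).hostname

for url in resources:
    host = urlparse(url).hostname
    # Same hostname as the HTML page -> visible in your Crawl Stats report;
    # any other hostname -> absent from your property's statistics.
    status = "internal" if host == page_host else "external"
    print(f"{status:8} {url}")
```

Note that with a strict hostname comparison a subdomain such as cdn.example.com also comes out as external, which is exactly the gray area discussed further down.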
Why does Google exclude these resources from the stats?
Search Console reports data by verified property. If you have verified example.com, you will see information pertaining to that domain only. Requests to cdn.cloudflare.net fall under another property — which you probably do not control.
It’s a matter of technical architecture: Google segments its reports by domain so that you only see what belongs to you. However, in reality, a modern page loads dozens of cross-domain resources. Therefore, the figure displayed in Search Console is structurally partial.
What are the practical implications for an SEO?
First point: if you use Search Console stats as a proxy for crawl budget, you underestimate the actual load on your origin servers and on Google’s infrastructure. Images, analytics scripts, fonts — all consume requests, rendering time, and bandwidth on the Googlebot side.
Second point: to obtain a complete view, you must cross-check server logs with Search Console data. Logs capture all requests, including those to CDNs if you control the domain. Without this dual reading, you are flying blind on how your crawl budget is actually spent.
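To quantify that blind spot, a back-of-the-envelope calculation is enough (the daily totals below are made up): divide the requests reported in the Crawl Stats report by the Googlebot hits counted in your logs.

```python
# Hypothetical daily totals
search_console_requests = 5_000   # crawl requests shown in the Crawl Stats report
log_requests = 50_000             # Googlebot hits found in server + CDN logs

visible_share = search_console_requests / log_requests
print(f"Search Console reflects about {visible_share:.0%} of observed Googlebot activity")
```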
- Resources hosted off-domain (CDN, third-party services) do not appear in Search Console stats.
- Search Console only displays requests to the verified properties that you control.
- For a comprehensive view of the crawl, you must analyze server logs in conjunction with Google reports.
- This exclusion concerns the statistics, not the actual crawl: Googlebot does explore these resources to understand the page.
- Images served via CDN often make up the majority of file volume on a modern page — their invisibility in the stats creates a massive blind spot.
SEO Expert opinion
Is this statement consistent with field observations?
Absolutely. Any audit comparing Apache/Nginx logs with Search Console reports reveals this gap. On an e-commerce site with 200 images per page served from a Cloudflare or Akamai CDN, Search Console shows only a fraction of the actual Googlebot traffic. The logs, however, record thousands of daily hits on those images.
The issue: many junior SEOs blindly rely on Search Console numbers to diagnose crawl budget issues. They see 5,000 requests/day and conclude that everything is fine, while the logs show 50,000 actual requests when including the external assets. This misunderstanding distorts the analysis.
What are the gray areas of this statement?
Google says "hosted outside your site," but what about subdomains? If your images are on cdn.example.com and your site is on www.example.com, technically these are two distinct properties in Search Console. You need to verify cdn.example.com separately to see its stats — something that 90% of sites never do. [To be verified]: Does Google treat a subdomain as "external" in this context? The documentation remains unclear.
Another nuance: resources loaded via JavaScript after the initial render. If an image is dynamically injected by a third-party script, it can escape both Search Console stats AND classic server logs if you don’t trace requests on the CDN side. In that case, you are completely blind.
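One way to surface those dynamically injected requests is to render the page in a headless browser and record every network request it fires. A minimal sketch with Playwright (the URL is hypothetical; it assumes `pip install playwright` followed by `playwright install chromium`):

```python
from urllib.parse import urlparse
from playwright.sync_api import sync_playwright

PAGE_URL = "https://www.example.com/article"  # hypothetical

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()

    requests = []
    # Record every network request fired during rendering,
    # including resources injected later by third-party scripts.
    page.on("request", lambda req: requests.append(req.url))

    page.goto(PAGE_URL, wait_until="networkidle")
    browser.close()

page_host = urlparse(PAGE_URL).hostname
external = [u for u in requests if urlparse(u).hostname != page_host]
print(f"{len(external)} of {len(requests)} requests go to other domains")
```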
When does this rule become critical?
On sites with a high volume of images or videos — online news, e-commerce, galleries, portfolios — where the external resources / HTML ratio skyrockets. I have seen news sites with 95% of page weight served by third-party CDNs: the Search Console stats then reflect only 5% of the actual Googlebot activity on the content.
Another case: sites relying on third-party services for fonts, analytics, and ads. Google Fonts, Typekit, GTM, Facebook pixels: all of these generate requests that are invisible in your reports. If you optimize crawl time without accounting for these dependencies, you are missing the bigger picture.
Practical impact and recommendations
How can you gain a comprehensive view of your site’s crawl?
First step: activate server log analysis. Tools like Oncrawl, Botify, or Screaming Frog Log Analyzer show you all Googlebot requests, including those to external resources if they pass through your infrastructure. Cross-reference this data with Search Console reports to identify discrepancies.
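Before deploying a dedicated tool, a quick first pass over a raw access log already tells you a lot. A minimal sketch, assuming a standard combined-format Nginx/Apache log at a hypothetical path, that counts Googlebot hits by file type:

```python
import re
from collections import Counter
from pathlib import Path

LOG_FILE = Path("/var/log/nginx/access.log")  # hypothetical path

# Combined log format: '... "GET /path HTTP/1.1" status size "referer" "user-agent"'
request_re = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[^"]*"')

hits_by_type = Counter()
for line in LOG_FILE.read_text(errors="replace").splitlines():
    if "Googlebot" not in line:   # crude user-agent filter; verify IPs (see next sketch)
        continue
    match = request_re.search(line)
    if not match:
        continue
    path = match.group(1).split("?", 1)[0]        # drop query string
    filename = path.rsplit("/", 1)[-1]
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else "html"
    hits_by_type[ext] += 1

for ext, count in hits_by_type.most_common():
    print(f"{ext:6} {count}")
```

Cross-check these per-type totals against the Crawl Stats breakdown to see where the two views diverge.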
Second step: if you use a CDN, request the CDN logs. Cloudflare, Fastly, and Akamai provide detailed reports on bot hits. You will see exactly how often Googlebot loads your images, scripts, and fonts. These numbers can be 10 to 50 times higher than the Search Console stats.
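CDN logs also attract plenty of fake "Googlebot" user agents, so it is worth validating IPs before trusting the counts. A minimal sketch of the standard reverse-then-forward DNS check (the IP below is only an example):

```python
import socket

def is_real_googlebot(ip: str) -> bool:
    """Reverse-DNS the IP, then resolve the hostname forward again:
    genuine Googlebot IPs resolve to googlebot.com or google.com,
    and the forward lookup must point back to the same IP."""
    try:
        hostname = socket.gethostbyaddr(ip)[0]
    except OSError:
        return False
    if not hostname.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        forward_ips = socket.gethostbyname_ex(hostname)[2]
    except OSError:
        return False
    return ip in forward_ips

print(is_real_googlebot("66.249.66.1"))  # example address from a known Googlebot range
```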
Should you migrate resources to your main domain?
It depends. Hosting assets on your domain increases visibility in the stats and simplifies diagnostics, but it also puts more load on your own infrastructure and can degrade performance if you do not have a high-performing CDN in front of it. Third-party CDNs like Cloudflare offer low global latency and virtually unlimited bandwidth.
The real criterion: if you have proven crawl budget issues — strategic pages not crawled, progressive deindexing — and your external assets are being crawled massively, consider lazy-loading them or serving them from a non-priority separate domain. But do not migrate everything by default: you risk degrading the Core Web Vitals.
What mistakes should you avoid in interpreting the stats?
Never use Search Console numbers alone to estimate the crawl budget consumed. This is the most common mistake. A site can display 2,000 requests/day in the console while Googlebot makes 30,000 when counting the external assets. You underestimate the actual weight of your site in Google’s eyes.
Another trap: comparing two sites without considering their CDN architecture. A site with 100% of assets internal may have inflated Search Console stats compared to an identical site with externalized assets — but the actual crawl might be the same. Do not jump to conclusions based on inter-site comparisons.
- Activate server log analysis to capture all Googlebot requests, including those to external resources.
- Request the logs from your CDN to quantify the crawling of images, scripts, and fonts hosted off-domain.
- Check subdomains separately in Search Console if they host critical assets (cdn.example.com).
- Never base a crawl budget strategy solely on Search Console stats — cross-reference with actual logs.
- If you migrate assets to your main domain, expect an artificial spike in Search Console stats and do not panic over it.
- Lazy-load non-critical images to reduce the number of external requests during the initial crawl.
❓ Frequently Asked Questions
Are images served via a third-party CDN indexed by Google?
If I verify my CDN subdomain in Search Console, will I see its stats?
Are third-party JavaScript scripts loaded on my page counted?
Does this exclusion affect how Google calculates crawl budget?
Should I move my images back to my main domain to improve SEO?