
Official statement

The Core Web Vitals use data from the Chrome User Experience Report (CrUX), based on real user experiences. The data accumulates even if Googlebot is blocked by robots.txt or noindex, as it originates from the Chrome browsers of real users.
🎥 Source video

Extracted from a Google Search Central video

⏱ 1h07 💬 EN 📅 28/01/2021 ✂ 28 statements
Watch on YouTube (14:54) →
Other statements from this video (27)
  1. 13:31 Can slow pages drag down your entire site's rankings?
  2. 13:33 Do Core Web Vitals really affect your whole site, or only your slow pages?
  3. 13:33 Can you block Core Web Vitals collection with robots.txt or noindex?
  4. 15:50 Page Experience: is Google lying about its real weight in rankings?
  5. 16:36 Is page experience really a secondary ranking signal?
  6. 17:28 Does LCP really measure the speed perceived by the user?
  7. 19:57 Are Core Web Vitals really computed throughout the whole browsing session?
  8. 20:04 Do Core Web Vitals really keep evolving after the initial page load?
  9. 21:22 How does Google estimate your Core Web Vitals when CrUX data is missing?
  10. 22:22 How does Google estimate the Core Web Vitals of a page without CrUX data?
  11. 27:07 How does Google now attribute AMP cache CrUX data to the origin?
  12. 29:47 Is AMP still required to rank in Top Stories on mobile?
  13. 32:31 How can server logs help detect 4xx errors in Search Console?
  14. 34:34 Why do new sites see extreme volatility in indexing and ranking?
  15. 34:34 Do you really need to analyze server logs to diagnose 4xx errors in Search Console?
  16. 34:34 Why does your new site bounce around like a yo-yo in the SERPs?
  17. 40:03 Should you really report content copied from your site through Google's spam form?
  18. 40:20 How do you effectively report copied-content spam to Google?
  19. 43:43 Are your franchise pages doorway pages in Google's eyes?
  20. 45:46 Is duplicate content really harmless for your rankings?
  21. 45:46 Is duplicate content really penalty-free for your SEO?
  22. 45:46 Are your franchise pages seen as doorway pages by Google?
  23. 51:52 Does the http:// or https:// namespace in an XML sitemap really influence crawling?
  24. 52:00 Does an https namespace in your XML sitemap hurt your rankings?
  25. 55:56 Should you really include both mobile and desktop versions in your XML sitemap?
  26. 56:00 Should you really submit both the mobile AND desktop versions in your sitemap?
  27. 61:54 Should you drop AMP if you use GA4 to measure your performance?
TL;DR

Google confirms that the Chrome User Experience Report (CrUX) data powering the Core Web Vitals comes from real users' Chrome browsers, not from Googlebot. As a result, even if you block crawling entirely with robots.txt or block indexing with noindex, your CWV metrics continue to be collected and used for ranking. For an SEO, this means you cannot escape the Core Web Vitals by manipulating crawl: the only way out is to have zero Chrome traffic.

What you need to understand

Where does CrUX data actually come from?

CrUX data does not come from Googlebot activity, but directly from the Chrome browsers of real users. When a Chrome user visits your site (with the usage-statistics sharing option enabled), the browser automatically reports performance metrics: Largest Contentful Paint, First Input Delay, Cumulative Layout Shift.

This data is anonymously aggregated and forms the basis of the Chrome User Experience Report. Google then uses this dataset to feed the Core Web Vitals that impact search ranking. There is no connection to crawling: if your site receives Chrome traffic, it generates CrUX data, period.

What happens if I block Googlebot via robots.txt or noindex?

Blocking Googlebot via robots.txt prevents the bot from crawling your pages. Adding a noindex tag prevents indexing. But neither stops CrUX collection, as it relies on real visitors using Chrome.
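As a reminder, the two blocking mechanisms the statement refers to look like this. This is a minimal sketch; neither directive has any effect on CrUX collection, because the metrics come from visitors' Chrome browsers, not from crawlers:

```
# robots.txt: stops Googlebot from crawling, but not CrUX collection
User-agent: Googlebot
Disallow: /
```

The equivalent indexing block is the `<meta name="robots" content="noindex">` tag (or the `X-Robots-Tag` HTTP header): it keeps the page out of the index, yet a Chrome visitor loading the page still reports its performance metrics.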

In practical terms: even if your page is not crawled or indexed, as long as it receives Chrome traffic, its performance metrics keep feeding into CrUX. Google can therefore hold Core Web Vitals data for a non-indexed URL, which looks paradoxical but is technically consistent.

Why did Google consider it important to clarify this point?

Because many SEO practitioners still confuse data collection and crawling activity. Some believed they could escape the Core Web Vitals by blocking the bot, thinking that without crawling, there would be no metrics.

This statement puts an end to that: the Core Web Vitals are a user-based signal, not a bot-based signal. Google reminds us that CrUX is a ground measurement device, independent of crawl infrastructure. The confusion also stemmed from the fact that other Search metrics (freshness, links, content) do depend on crawling.

  • CrUX collects data via Chrome, not via Googlebot
  • A robots.txt or noindex block does not prevent the collection of Core Web Vitals
  • CWV metrics can exist for non-indexed pages if they receive Chrome traffic
  • The only way to escape CrUX is no Chrome traffic or users who have disabled statistics sharing
  • This architecture clearly separates user signals from crawl signals in Google's algorithm

SEO Expert opinion

Is this statement consistent with what we see in practice?

Absolutely. SEOs tracking their PageSpeed Insights or Search Console (Core Web Vitals report) have noticed that pages blocked from crawling sometimes display CrUX metrics. This seemed strange two years ago — today, it's clear.

Google has always emphasized that the Core Web Vitals measure real experiences, not a bot simulation. This statement confirms what was suspected: CrUX is a completely separate pipeline from crawling. The two systems do not communicate for collection — they only intersect at the time of ranking.

Are there nuances or edge cases to be aware of?

First nuance: for a URL to show up in CrUX, it requires a sufficient volume of Chrome traffic. Google applies popularity thresholds to protect anonymity. If your page receives only a few visits per month, it won't appear in the public CrUX dataset, even if technically the data is being collected.

Second nuance: Chrome users can disable sharing of usage statistics. In this case, their visits do not contribute to CrUX. The exact opt-out rate is not public [To be verified], but it's believed to remain a minority. The majority of default Chrome installations send data.

Third point: this statement pertains to CrUX and the Core Web Vitals. It says nothing about other metrics or signals that Google might correlate with crawling. In other words, blocking Googlebot still has an impact (indexing, content discovery, freshness), but not on CWV.

What is the real strategic consequence for an SEO?

Let's be honest: this clarification crushes any ambition to bypass the Core Web Vitals through robots.txt maneuvering. Some sites considered temporarily blocking crawl on slow sections, hoping to escape CWV penalties. No chance.

On the other hand, it raises an interesting question for restricted access sites (paywalls, members, intranets). If these pages are not indexed but generate Chrome traffic, their CrUX metrics exist somewhere on Google's servers. Does Google use them for the ranking of a non-indexed URL? No, since the URL is not in the index. But if the page becomes public and indexable later, the historical CrUX data could theoretically be mobilized immediately — [To be verified] with real-world testing, Google has never detailed this point.

Attention: Do not confuse "CrUX collected data" with "CrUX data used for ranking." Google collects data on all pages with sufficient Chrome traffic, but only ranks indexed pages. A page blocked with noindex will not be ranked, even if its CWV are excellent.

Practical impact and recommendations

What should you do concretely if you want to optimize your Core Web Vitals?

First rule: forget about robots.txt band-aids. You won't bypass CrUX by manipulating crawl. The only method is to actually improve user experience: reduce resource weight, optimize rendering, limit blocking third-party scripts, stabilize layout.
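These levers can be illustrated with a few hypothetical markup excerpts (the file names are placeholders); each line targets one of the three 2021-era metrics:

```html
<!-- Defer non-critical JavaScript so it does not block rendering (LCP, FID) -->
<script src="/js/analytics.js" defer></script>

<!-- Preload the hero image so the largest element paints early (LCP) -->
<link rel="preload" as="image" href="/img/hero.webp">

<!-- Declare dimensions and lazy-load below-the-fold images (CLS, LCP) -->
<img src="/img/photo.webp" width="800" height="450" loading="lazy" alt="Product photo">
```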

Second rule: monitor Search Console (Core Web Vitals report) and PageSpeed Insights in "Field Data" mode. These two tools display real CrUX metrics, meaning those that Google uses for ranking. The "lab" data from Lighthouse is useful for diagnosis, but does not reflect what your Chrome visitors experience.
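The "good / needs improvement / poor" buckets that both tools display can be reproduced from Google's published thresholds for the three 2021-era metrics. A minimal sketch in Python (the function names are ours, not an official API):

```python
# Classify p75 field metrics against Google's published Core Web Vitals
# thresholds. A page "passes" when all three metrics are "good" at the
# 75th percentile, which is the bar Google documents for field data.

THRESHOLDS = {
    "lcp_ms": (2500, 4000),   # Largest Contentful Paint, milliseconds
    "fid_ms": (100, 300),     # First Input Delay, milliseconds
    "cls":    (0.10, 0.25),   # Cumulative Layout Shift, unitless
}

def classify(metric: str, p75: float) -> str:
    """Return the CWV bucket for a 75th-percentile field value."""
    good, needs_improvement = THRESHOLDS[metric]
    if p75 <= good:
        return "good"
    if p75 <= needs_improvement:
        return "needs improvement"
    return "poor"

def page_passes(p75_values: dict) -> bool:
    """True only when every metric is 'good' at p75."""
    return all(classify(m, v) == "good" for m, v in p75_values.items())

if __name__ == "__main__":
    field = {"lcp_ms": 2300, "fid_ms": 40, "cls": 0.18}
    for metric, value in field.items():
        print(metric, classify(metric, value))
    print("passes:", page_passes(field))
```

Note that one "needs improvement" metric (here a CLS of 0.18) is enough to keep the page out of the passing bucket, even when LCP and FID are excellent.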

What mistakes should be avoided in managing CrUX?

Classic mistake: optimizing only for Lighthouse and ignoring CrUX data. Lighthouse simulates a mid-range mobile on a 4G connection. Your real users may have poor connections, old Androids, or conversely high-speed fiber and recent machines. CrUX reflects this diversity — Lighthouse does not.

Another mistake: believing a noindexed page generates no data. It does generate data, but Google does not use it for ranking since the page is not in the index. If you later remove the noindex, the accumulated CrUX metrics could, although Google has never officially confirmed this, be taken into account quickly.

Finally, do not neglect non-Chrome traffic. CrUX only captures Chrome, so if 40% of your audience is on Firefox or Safari, their metrics do not get reported. You can have excellent CWV measured by CrUX and a degraded experience on other browsers. Use RUM (Real User Monitoring) tools to cover the entire spectrum.

How can you check that your site is measured correctly by CrUX?

Go to PageSpeed Insights, enter your URL. If the "Field Data" section shows up with LCP, FID, CLS metrics, it means your page has enough Chrome traffic to be in CrUX. If you see "No data available," either your traffic is too low, or your page is too recent.

You can also directly query the CrUX API or consult the public BigQuery dataset for finer analyses (percentile distribution, temporal evolution). Search Console aggregates data at the site level and by URL groups, making large-scale diagnostics easier.
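For instance, the CrUX API exposes a `records:queryRecord` endpoint that returns field metrics for a URL. A sketch in Python, assuming you have an API key from the Google Cloud console (the key below is a placeholder); the response parsing follows the API's documented `percentiles.p75` shape:

```python
# Query the CrUX API for a URL's p75 field metrics (sketch).
import json
from urllib import request

API_KEY = "YOUR_API_KEY"  # placeholder: create one in the Google Cloud console
ENDPOINT = ("https://chromeuxreport.googleapis.com/v1/"
            f"records:queryRecord?key={API_KEY}")

def build_query(url: str, form_factor: str = "PHONE") -> bytes:
    """JSON request body for records:queryRecord."""
    return json.dumps({"url": url, "formFactor": form_factor}).encode()

def extract_p75(response: dict) -> dict:
    """Pull the p75 value of each metric out of a queryRecord response."""
    metrics = response["record"]["metrics"]
    return {name: float(data["percentiles"]["p75"])
            for name, data in metrics.items()
            if "percentiles" in data}

def query_crux(url: str) -> dict:
    """POST the query and return {metric_name: p75}. Raises on HTTP errors,
    including the 404 the API returns when the URL has no CrUX data."""
    req = request.Request(ENDPOINT, data=build_query(url),
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return extract_p75(json.load(resp))
```

A 404 from this endpoint is the API-level equivalent of "No data available" in PageSpeed Insights: the URL did not clear the popularity threshold.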

  • Check your CrUX metrics through Search Console and PageSpeed Insights (field data)
  • Optimize actual performance: reduce weight, defer non-critical scripts, stabilize layout
  • Do not rely on robots.txt or noindex to escape CrUX collection
  • Use RUM tools to also monitor non-Chrome browsers
  • Test your pages on varied connections and devices, not just under Lighthouse conditions
  • If you remove a noindex, immediately monitor CWV — CrUX data may already exist
In summary: CrUX collects Core Web Vitals via the Chrome browsers of your real visitors, regardless of any robots.txt or noindex directives. You cannot bypass these metrics through crawl tricks. The only viable strategy is to genuinely improve your pages' performance. These optimizations — caching, CDN, lazy loading, compression, elimination of render-blocking — can quickly become complex to orchestrate alone, especially on a site with thousands of pages. Engaging a specialized SEO agency in web performance allows you to benefit from an in-depth technical audit, a prioritized roadmap, and tailored support to sustainably transform your Core Web Vitals.

❓ Frequently Asked Questions

Does blocking Googlebot via robots.txt prevent CrUX data collection?
No. CrUX data comes from real users' Chrome browsers, not from Googlebot. Blocking the bot has no impact on Core Web Vitals collection.
Can a noindexed page generate CrUX data?
Yes, if it receives enough Chrome traffic. The noindex tag prevents indexing but not CrUX collection, which relies on real visitors.
How much Chrome traffic does a page need to appear in CrUX?
Google applies a popularity threshold to protect anonymity but does not disclose the exact figure. In practice, very low-traffic pages do not appear in the public CrUX dataset.
Does CrUX data cover all browsers?
No, only Chrome (desktop and mobile). Firefox, Safari, and Edge users do not contribute to CrUX, except where Edge runs the Chromium engine with sharing enabled.
If I remove a noindex, is the historical CrUX data immediately used for ranking?
Google has never officially confirmed this. In theory, the accumulated CrUX metrics could be used as soon as the page is indexed, but this remains to be verified through field testing.
🏷 Related Topics
Crawl & Indexing Web Performance

