Official statement
Google states that the Core Web Vitals data comes from the Chrome User Experience Report (CrUX), measured from actual user experiences. Blocking crawling via robots.txt or noindexing a page does not prevent the collection of these metrics. In fact, even a page completely blocked for Googlebot can generate CWV data if it receives Chrome traffic.
What you need to understand
Where does Core Web Vitals data actually come from?
The Core Web Vitals are not metrics calculated by Google's bots during crawling. They come from the Chrome User Experience Report (CrUX), a public database that aggregates the real performance measured in the Chrome browser.
When a user visits your site with Chrome — and they have consented to share usage statistics — the browser records the LCP, CLS, and INP of each page visited. This data is then sent back anonymously to CrUX, regardless of any SEO directives.
Why do robots.txt and noindex have no effect on CrUX?
The robots.txt file controls what Googlebot can crawl. The noindex tag prevents a URL from being indexed in search results. However, neither of these actions affects the Chrome browser of your visitors.
If a page receives traffic, even minimal, and those visitors are using Chrome, the performance data will be collected. You can therefore have a page that is completely blocked for Googlebot and invisible in Search Console, yet still generates Core Web Vitals in CrUX.
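To make the distinction concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content and the `www.example.com` URLs are hypothetical. It only shows what a rule-abiding crawler is told; nothing in it can stop a browser from loading the same URL.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that hides a staging directory from all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /staging/
"""


def crawler_may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a crawler that honors robots.txt may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


blocked_url = "https://www.example.com/staging/checkout"

# Crawler view: the rule applies, so the URL is off limits.
print(crawler_may_fetch(ROBOTS_TXT, "Googlebot", blocked_url))  # False

# Browser view: Chrome never consults robots.txt before loading a page.
# A visitor who opens blocked_url directly still gets the page, and the
# visit can still contribute field data.
```

The check returns a crawl permission only. Whether the page is reachable over HTTP is a separate question, and that is the one that matters for CrUX.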
How can a page that is invisible to Google impact my SEO?
This is where it gets tricky. Google uses CrUX data to evaluate user experience at the domain level (CrUX itself aggregates by origin, meaning scheme plus hostname). If you have hidden pages (dev, staging, internal pages) served from that same origin that receive internal traffic with poor performance, they can potentially pollute your overall metrics.
Worse yet: a page blocked by robots.txt but accessible directly (via shared link, bookmark) can generate bad CWV signals that impact how Google perceives your domain, even if this page never appears in the SERPs.
- CrUX collects data from the browser side, not from the server or crawler
- Robots.txt and noindex have no influence over this collection
- Any accessible page with Chrome traffic generates Core Web Vitals
- Polluted metrics can affect the overall reputation of the domain
- Publicly accessible dev/staging environments are frequent sources of pollution
SEO Expert opinion
Is this statement consistent with real-world observations?
Yes, and it is often a source of confusion among practitioners. Clients regularly find themselves perplexed by CWV data in Search Console for URLs they believe they have blocked. The problem stems from a fundamental misunderstanding: CrUX and indexing are two completely separate streams.
I have seen sites with publicly accessible preproduction environments (protected only by URL obscurity) pollute their domain metrics for months. Internal traffic from QA, developers, and testing clients is enough to generate CrUX data if these users are on Chrome. [To verify]: Google has never clarified the minimum traffic threshold needed for a URL to appear in CrUX, but observation suggests that a few dozen visits per month is sufficient.
What are the gray areas of this statement?
Google remains deliberately vague on several critical points. First, how are the page-level versus domain-level metrics actually aggregated and weighted in the algorithm? We know that Google looks at CWV at the domain level, but with what weight for each URL?
Next, the collection and aggregation period: CrUX compiles on a rolling 28-day basis, but how does Google smooth out fluctuations? Does a page with a spike of poor performance over 3 days permanently pollute the overall score? [To verify]: impossible to get a clear answer from Google on this point.
Finally, the case of authentication-protected pages: technically, if they receive Chrome traffic, they generate CrUX data. But are they truly taken into account in the SEO evaluation? No certainty.
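The question about a 3-day spike inside a 28-day window can be explored with a toy model. The sketch below assumes every page load in the window is pooled into one distribution and summarized by its 75th percentile. That pooling rule, the sample values, and the traffic volumes are assumptions for illustration, not a description of how Google smooths the data.

```python
import math


def nearest_rank_percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def window_p75(daily_samples: list[list[float]]) -> float:
    """Pool every page load in the window and return the 75th percentile."""
    pooled = [value for day in daily_samples for value in day]
    return nearest_rank_percentile(pooled, 75)


# Hypothetical LCP samples in milliseconds.
normal_day = [2000.0] * 100         # 100 page loads at 2.0 s
spike_day = [6000.0] * 100          # same traffic, 6.0 s loads
heavy_spike_day = [6000.0] * 1000   # ten times the traffic, 6.0 s loads

# Case 1: 3 bad days out of 28 with equal traffic (about 11% of page loads).
equal_traffic = [normal_day] * 25 + [spike_day] * 3
print(window_p75(equal_traffic))   # 2000.0

# Case 2: the 3 bad days carry most of the window's page loads.
heavy_traffic = [normal_day] * 25 + [heavy_spike_day] * 3
print(window_p75(heavy_traffic))   # 6000.0
```

Under this model, the share of page loads that were slow is what moves the p75, not the number of bad days. A short spike only dominates if it accounts for more than a quarter of the pooled loads.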
When does this rule become problematic?
The main pitfall concerns sites with multiple environments: dev, staging, UAT, pre-prod. If these environments are accessible without strong authentication (IP whitelisting, VPN), they generate unwanted CrUX data. I have seen domains penalized by poor Core Web Vitals, 80% of which came from a poorly optimized staging environment.
Another insidious case: old migrated URLs that are still bookmarked or shared internally. They can remain active in CrUX for months after migration if they continue to receive residual traffic, dragging obsolete metrics along.
Practical impact and recommendations
How can you identify the pages that are polluting your Core Web Vitals?
The first step is to audit all publicly accessible URLs on your domain, including those you think are "hidden". Use a crawler configured to ignore robots.txt (Screaming Frog in "ignore robots.txt" mode) and check which pages respond with a 200 status.
Then, test each one in PageSpeed Insights. If CrUX data appears (in the "Discover what your real users are experiencing" section), it means this page is generating metrics. Cross-reference with your server logs to identify the source of traffic: internal, partners, residual links.
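The same check can be scripted against the public CrUX API instead of testing pages one by one. This is a minimal sketch using only the standard library. The helper names are hypothetical and you need your own API key. It assumes the API answers with HTTP 404 when it holds no record for a URL.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

CRUX_ENDPOINT = "https://chromeuxreport.googleapis.com/v1/records:queryRecord"


def build_crux_request(page_url: str, api_key: str) -> urllib.request.Request:
    """Build the POST request asking CrUX for page-level data on one URL."""
    query = urllib.parse.urlencode({"key": api_key})
    body = json.dumps({"url": page_url}).encode("utf-8")
    return urllib.request.Request(
        f"{CRUX_ENDPOINT}?{query}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_p75(response: dict) -> dict:
    """Map each metric that reports percentiles to its p75 value."""
    metrics = response.get("record", {}).get("metrics", {})
    return {
        name: data["percentiles"]["p75"]
        for name, data in metrics.items()
        if "percentiles" in data
    }


def has_crux_data(page_url: str, api_key: str) -> bool:
    """Return True if CrUX holds page-level field data for this URL."""
    request = build_crux_request(page_url, api_key)
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return bool(extract_p75(json.load(response)))
    except urllib.error.HTTPError as error:
        if error.code == 404:  # assumed meaning: no record for this URL
            return False
        raise
```

Calling `has_crux_data(url, api_key)` for every URL that returned a 200 in the crawl gives a shortlist of "hidden" pages that are producing field data.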
What concrete actions can block CrUX pollution?
Robots.txt and noindex are useless here. The only effective method is to block HTTP access itself. For dev/staging environments, implement HTTP Basic authentication (htpasswd) or, better yet, strict IP whitelisting.
For internal pages accessible to employees, consider an SSO authentication system or access via VPN. The goal: no "standard" Chrome traffic should reach these pages. Caution: even cookie-restricted access can generate CrUX data if the browser is not configured to block usage reporting.
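Access control of this kind is usually configured in the web server or proxy, but the logic is easy to see as application code. The sketch below is a hypothetical WSGI middleware, not a production setup. The class name, network range, and credentials are placeholders. It admits a request if it comes from an allowlisted network or carries valid Basic credentials, and answers 401 otherwise.

```python
import base64
import hmac
import ipaddress


class AccessGate:
    """WSGI middleware: allowlisted IP or valid HTTP Basic credentials required."""

    def __init__(self, app, allowed_networks, username, password):
        self.app = app
        self.networks = [ipaddress.ip_network(net) for net in allowed_networks]
        token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
        self.expected_header = b"Basic " + token

    def _ip_allowed(self, remote_addr: str) -> bool:
        # Behind a reverse proxy REMOTE_ADDR is the proxy's address, so the
        # real client IP would have to come from a trusted forwarded header.
        try:
            address = ipaddress.ip_address(remote_addr)
        except ValueError:
            return False
        return any(address in network for network in self.networks)

    def _auth_ok(self, header: str) -> bool:
        # Constant-time comparison avoids leaking how much of the header matched.
        return hmac.compare_digest(header.encode("utf-8"), self.expected_header)

    def __call__(self, environ, start_response):
        if self._ip_allowed(environ.get("REMOTE_ADDR", "")) or self._auth_ok(
            environ.get("HTTP_AUTHORIZATION", "")
        ):
            return self.app(environ, start_response)
        start_response(
            "401 Unauthorized",
            [
                ("WWW-Authenticate", 'Basic realm="staging"'),
                ("Content-Type", "text/plain"),
            ],
        )
        return [b"Authentication required\n"]
```

A request that stops at the 401 never renders the page, so there is nothing for the browser to measure. Wrapping an app would look like `AccessGate(app, ["10.0.0.0/8"], "qa", "secret")`, with real credentials loaded from configuration rather than hard-coded.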
How can you monitor the evolution of your domain metrics?
Set up automated CrUX monitoring via the public API or tools like CrUX Dashboard. Pay close attention to degradation spikes that your deployments on the main pages do not explain. A CWV score that declines without apparent changes on your priority URLs may signal pollution from a hidden page.
Regularly check the list of URLs reported in Search Console (Core Web Vitals report). If blocked or outdated pages appear, it means they are still receiving Chrome traffic. Trace the source and cut it off at the root.
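A monitoring job can be as simple as comparing two snapshots of p75 values and flagging any metric that dropped into a worse rating band. The sketch below uses the published Core Web Vitals thresholds. The snapshot format (metric name mapped to p75, with CLS possibly given as a string) and the function names are assumptions for illustration.

```python
# (good, poor) boundaries: at or below "good" is good, above "poor" is poor.
THRESHOLDS = {
    "largest_contentful_paint": (2500, 4000),   # milliseconds
    "interaction_to_next_paint": (200, 500),    # milliseconds
    "cumulative_layout_shift": (0.1, 0.25),     # unitless score
}

RATING_ORDER = ["good", "needs improvement", "poor"]


def rate(metric: str, p75) -> str:
    """Classify a p75 value into its Core Web Vitals rating band."""
    good, poor = THRESHOLDS[metric]
    value = float(p75)  # accepts numeric values or numeric strings
    if value <= good:
        return "good"
    if value <= poor:
        return "needs improvement"
    return "poor"


def find_regressions(previous: dict, current: dict) -> list:
    """List (metric, before, after) for metrics whose rating got worse."""
    regressions = []
    for metric in THRESHOLDS:
        if metric in previous and metric in current:
            before = rate(metric, previous[metric])
            after = rate(metric, current[metric])
            if RATING_ORDER.index(after) > RATING_ORDER.index(before):
                regressions.append((metric, before, after))
    return regressions


# Hypothetical snapshots taken a few weeks apart.
last_month = {"largest_contentful_paint": 2300, "cumulative_layout_shift": "0.05"}
this_month = {"largest_contentful_paint": 4200, "cumulative_layout_shift": "0.08"}
print(find_regressions(last_month, this_month))
```

A flagged regression with no matching deployment on the priority URLs is the cue to look for a hidden page receiving Chrome traffic.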
- Audit all publicly accessible URLs, including those "hidden" by robots.txt
- Test each suspicious page in PageSpeed Insights to check if it generates CrUX data
- Block HTTP access to dev/staging environments via authentication or IP whitelisting
- Protect internal pages with SSO or VPN rather than just robots.txt
- Set up automated CrUX monitoring to detect unexplained degradations
- Monitor the Core Web Vitals report in Search Console to identify problem URLs
❓ Frequently Asked Questions
If I block a page with robots.txt, can Google still use its Core Web Vitals?
Can a noindexed page affect my overall Core Web Vitals?
How can I completely prevent CrUX data collection on certain pages?
Can my staging environment, reachable only through an obscure URL, affect my SEO?
How can I tell whether a blocked page is still generating CrUX data?
Source: Google Search Central video · duration 1h07 · published on 28/01/2021