Official statement

Core Web Vitals data comes from the Chrome User Experience Report (CrUX) based on real users. Blocking Google via robots.txt or noindex does not prevent the collection of this data, as it is measured from the user's side.
🎥 Source video

Extracted from a Google Search Central video

⏱ 1h07 💬 EN 📅 28/01/2021 ✂ 28 statements
Watch on YouTube (13:33) →
Other statements from this video (27)
  1. 13:31 Can slow pages drag down the ranking of your entire site?
  2. 13:33 Do Core Web Vitals really impact your whole site or only your slow pages?
  3. 14:54 Why does CrUX collect your Core Web Vitals even if you block Googlebot?
  4. 15:50 Page Experience: is Google lying about its real weight in ranking?
  5. 16:36 Is page experience really a secondary ranking signal?
  6. 17:28 Does LCP really measure the speed perceived by the user?
  7. 19:57 Are Core Web Vitals really computed throughout the whole browsing session?
  8. 20:04 Do Core Web Vitals really change after the initial page load?
  9. 21:22 How does Google estimate your Core Web Vitals when CrUX data is missing?
  10. 22:22 How does Google estimate the Core Web Vitals of a page without CrUX data?
  11. 27:07 How does Google now attribute CrUX data from the AMP cache to the origin?
  12. 29:47 Is AMP still necessary to rank in Top Stories on mobile?
  13. 32:31 How can you use server logs to detect 4xx errors in Search Console?
  14. 34:34 Why do new sites experience extreme volatility in indexing and ranking?
  15. 34:34 Do you really need to analyze server logs to diagnose 4xx errors in Search Console?
  16. 34:34 Why does your new site bounce around like a yo-yo in the SERPs?
  17. 40:03 Should you really report content copied from your site through Google's spam form?
  18. 40:20 How can you effectively report copied-content spam to Google?
  19. 43:43 Are your franchise pages doorway pages in Google's eyes?
  20. 45:46 Is duplicate content really harmless for your SEO?
  21. 45:46 Is duplicate content really penalty-free for your SEO?
  22. 45:46 Are your franchise pages perceived as doorway pages by Google?
  23. 51:52 Does the http:// or https:// namespace in an XML sitemap really influence crawling?
  24. 52:00 Does an https namespace in your XML sitemap hurt your SEO?
  25. 55:56 Should you really include both the mobile and desktop versions in your XML sitemap?
  26. 56:00 Should you really submit both the mobile AND desktop versions in your sitemap?
  27. 61:54 Should you drop AMP if you use GA4 to measure your performance?
TL;DR

Google states that the Core Web Vitals data comes from the Chrome User Experience Report (CrUX), measured from actual user experiences. Blocking crawling via robots.txt or noindexing a page does not prevent the collection of these metrics. In fact, even a page completely blocked for Googlebot can generate CWV data if it receives Chrome traffic.

What you need to understand

Where does Core Web Vitals data actually come from?

The Core Web Vitals are not metrics calculated by Google's bots during crawling. They come from the Chrome User Experience Report (CrUX), a public database that aggregates real-world performance as measured in the Chrome browser.

When a user visits your site with Chrome — and they have consented to share usage statistics — the browser records the LCP, CLS, and INP of each page visited. This data is then sent back anonymously to CrUX, regardless of any SEO directives.
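
If you want to see what CrUX actually holds for a given page, you can query the public CrUX API directly. The sketch below is a minimal example, assuming the third-party requests package and a Google API key with the Chrome UX Report API enabled; the URL and form factor are placeholders.

```python
# Minimal sketch: query the CrUX API for URL-level field data.
# Assumes the third-party "requests" package and a Google API key with the
# Chrome UX Report API enabled. URL and form factor are placeholders.
import requests

API_KEY = "YOUR_API_KEY"  # placeholder
ENDPOINT = "https://chromeuxreport.googleapis.com/v1/records:queryRecord"

response = requests.post(
    ENDPOINT,
    params={"key": API_KEY},
    json={
        "url": "https://www.example.com/some-page",  # page to check
        "formFactor": "PHONE",                       # PHONE, TABLET or DESKTOP
    },
    timeout=30,
)

if response.status_code == 404:
    # A 404 means CrUX has no record for this URL (not enough Chrome traffic)
    print("No CrUX data for this URL")
else:
    response.raise_for_status()
    metrics = response.json()["record"]["metrics"]
    # The p75 values are what CrUX reports as the real-user experience
    for name, data in metrics.items():
        print(name, data.get("percentiles", {}).get("p75"))
```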

Why do robots.txt and noindex have no effect on CrUX?

The robots.txt file controls what Googlebot can crawl. The noindex tag prevents a URL from being indexed in search results. However, neither of these directives affects the Chrome browser of your visitors.

If a page receives even minimal traffic and those visitors are using Chrome, performance data will be collected. You can therefore have a page that is completely blocked for Googlebot and invisible in Search Console, yet still generates Core Web Vitals in CrUX.

How can a page that is invisible to Google impact my SEO?

This is where it gets tricky. Google uses the CrUX data to evaluate user experience at the domain level. If you have hidden pages (dev, staging, internal pages) that receive internal traffic with poor performance, they can potentially pollute your overall metrics.

Worse yet: a page blocked by robots.txt but accessible directly (via shared link, bookmark) can generate bad CWV signals that impact how Google perceives your domain, even if this page never appears in the SERPs.

  • CrUX collects data from the browser side, not from the server or crawler
  • Robots.txt and noindex have no influence over this collection
  • Any accessible page with Chrome traffic generates Core Web Vitals
  • Polluted metrics can affect the overall reputation of the domain
  • Publicly accessible dev/staging environments are frequent sources of pollution

SEO Expert opinion

Is this statement consistent with real-world observations?

Yes, and it is often a source of confusion among practitioners. Clients regularly find themselves perplexed by CWV data in Search Console for URLs they believe they have blocked. The problem stems from a fundamental misunderstanding: CrUX and indexing are two completely separate streams.

I have seen sites with publicly accessible preproduction environments (protected only by URL obscurity) pollute their domain metrics for months. Internal traffic from QA, developers, and client testers is enough to generate CrUX data if those users are on Chrome. [To verify]: Google has never clarified the minimum traffic threshold needed for a URL to appear in CrUX, but observation suggests that a few dozen visits per month are enough.

What are the gray areas of this statement?

Google remains deliberately vague on several critical points. First, how are the page-level versus domain-level metrics actually aggregated and weighted in the algorithm? We know that Google looks at CWV at the domain level, but with what weight for each URL?

Next, the collection and aggregation period: CrUX compiles data over a rolling 28-day window, but how does Google smooth out fluctuations? Does a three-day spike of poor performance permanently pollute the overall score? [To verify]: it is impossible to get a clear answer from Google on this point.

Finally, the case of authentication-protected pages: technically, if they receive Chrome traffic, they generate CrUX data. But are they truly taken into account in the SEO evaluation? No certainty.

When does this rule become problematic?

The main pitfall concerns sites with multiple environments: dev, staging, UAT, pre-prod. If these environments are accessible without strong authentication (IP whitelisting, VPN), they generate unwanted CrUX data. I have seen domains penalized by poor Core Web Vitals, 80% of which came from a poorly optimized staging environment.

Another nasty case: old migrated URLs that are still bookmarked or shared internally. They can remain active in CrUX for months after a migration if they continue to receive residual traffic, dragging obsolete metrics along with them.

Warning: if you have sensitive pages (admin, internal tools) that are publicly accessible but "hidden" by robots.txt, they are probably feeding CrUX. Check in PageSpeed Insights whether they return field data, and if so, protect them with authentication or an IP restriction.

Practical impact and recommendations

How can you identify the pages that are polluting your Core Web Vitals?

The first step is to audit all publicly accessible URLs on your domain, including those you think are "hidden". Use a crawler configured to ignore robots.txt (Screaming Frog in "ignore robots.txt" mode) and check which pages respond with a 200 status.
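
If you would rather script the check than run a full crawl, the sketch below takes a hand-built list of candidate URLs (hypothetical examples here) and reports which ones answer 200 to an anonymous request; plain HTTP requests ignore robots.txt by nature, which is exactly the point.

```python
# Minimal sketch: check which candidate URLs are publicly reachable.
# Plain HTTP requests ignore robots.txt by nature, which is the point here.
# The URL list is a hypothetical example; build yours from logs, sitemaps, DNS, etc.
import requests

CANDIDATE_URLS = [
    "https://staging.example.com/",
    "https://www.example.com/internal-tools/",
    "https://www.example.com/old-landing-page/",
]

for url in CANDIDATE_URLS:
    try:
        resp = requests.get(url, timeout=10, allow_redirects=True)
        if resp.status_code == 200:
            print(f"PUBLIC  {url}  (final URL: {resp.url})")
        else:
            print(f"{resp.status_code}     {url}")
    except requests.RequestException as exc:
        print(f"ERROR   {url}  ({exc})")
```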

Then, test each one in PageSpeed Insights. If CrUX data appears (under the "Discover what your real users are experiencing" tab), it means this page is generating metrics. Cross-reference with your server logs to identify the source of traffic: internal, partners, residual links.
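
The same check can be automated against the PageSpeed Insights API: the loadingExperience block in the response carries the page-level CrUX data. This is a minimal sketch under a few assumptions; the API key and URL are placeholders, and the origin_fallback flag is used to tell URL-level data apart from origin-level fallback.

```python
# Minimal sketch: check whether a specific URL has its own CrUX field data
# via the PageSpeed Insights API. API key and URL are placeholders.
import requests

API_KEY = "YOUR_API_KEY"
PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"

def has_url_level_crux(url: str) -> bool:
    resp = requests.get(
        PSI_ENDPOINT,
        params={"url": url, "key": API_KEY, "strategy": "mobile"},
        timeout=60,
    )
    resp.raise_for_status()
    field_data = resp.json().get("loadingExperience", {})
    # When PSI has no URL-level data it may fall back to origin-level data
    # and flag it with "origin_fallback".
    return bool(field_data.get("metrics")) and not field_data.get("origin_fallback")

print(has_url_level_crux("https://www.example.com/hidden-page/"))
```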

What concrete actions can block CrUX pollution?

Robots.txt and noindex are useless here. The only effective method is to block HTTP access itself. For dev/staging environments, implement HTTP Basic authentication (htpasswd) or, better yet, strict IP whitelisting.
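
Once the protection is in place, it is worth confirming from outside your network that the environment really rejects anonymous requests; if an unauthenticated Chrome user cannot load the page, it cannot feed CrUX. A minimal sketch with a hypothetical staging URL:

```python
# Minimal sketch: verify that a protected environment rejects anonymous requests.
# The staging URL is hypothetical; run this from outside the whitelisted network.
import requests

STAGING_URL = "https://staging.example.com/"

resp = requests.get(STAGING_URL, timeout=10)
if resp.status_code in (401, 403):
    print("OK: staging rejects anonymous traffic, so it cannot feed CrUX")
else:
    print(f"WARNING: staging answered {resp.status_code} without credentials")
```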

For internal pages accessible to employees, consider an SSO authentication system or access via VPN. The goal: no "standard" Chrome traffic should reach these pages. Caution: even cookie-restricted access can generate CrUX data if the browser is not configured to block usage reporting.

How can you monitor the evolution of your domain metrics?

Set up automated CrUX monitoring via the public API or tools like the CrUX Dashboard. Pay close attention to degradation spikes that are not explained by deployments on your main pages. A CWV score that declines without apparent changes to your priority URLs may signal pollution from a hidden page.
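
If you do not want a dedicated tool, a scheduled script (cron, CI job) that queries the CrUX API for your origin and appends the p75 values to a CSV is enough to make degradations visible over time. A minimal sketch under those assumptions; the API key and origin are placeholders.

```python
# Minimal sketch: log origin-level CrUX p75 values to a CSV on each run.
# Intended to be scheduled (cron, CI); API key and origin are placeholders.
import csv
import datetime
import requests

API_KEY = "YOUR_API_KEY"
ORIGIN = "https://www.example.com"
ENDPOINT = "https://chromeuxreport.googleapis.com/v1/records:queryRecord"
METRICS = ["largest_contentful_paint", "cumulative_layout_shift", "interaction_to_next_paint"]

resp = requests.post(
    ENDPOINT,
    params={"key": API_KEY},
    json={"origin": ORIGIN, "formFactor": "PHONE", "metrics": METRICS},
    timeout=30,
)
resp.raise_for_status()
record = resp.json()["record"]["metrics"]

# One row per run: date plus the p75 of each monitored metric
row = [datetime.date.today().isoformat()] + [
    record.get(m, {}).get("percentiles", {}).get("p75") for m in METRICS
]

with open("crux_history.csv", "a", newline="") as f:
    csv.writer(f).writerow(row)

print("Logged:", row)
```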

Regularly check the list of URLs reported in Search Console (Core Web Vitals report). If blocked or outdated pages appear, it means they are still receiving Chrome traffic. Trace the source and cut it off at the root.

  • Audit all publicly accessible URLs, including those "hidden" by robots.txt
  • Test each suspicious page in PageSpeed Insights to check if it generates CrUX data
  • Block HTTP access to dev/staging environments via authentication or IP whitelisting
  • Protect internal pages with SSO or VPN rather than just robots.txt
  • Set up automated CrUX monitoring to detect unexplained degradations
  • Monitor the Core Web Vitals report in Search Console to identify unexpected URLs
CrUX data collection completely bypasses classic SEO directives. Only strict HTTP access control prevents pollution from pages not intended for the public. This can quickly become complex to manage, especially in multi-environment ecosystems. If you identify sources of pollution, or if your current infrastructure does not allow this level of granularity, a specialized SEO agency can help you map the risks and implement an isolation strategy suited to your technical context.

❓ Frequently Asked Questions

If I block a page with robots.txt, can Google still use its Core Web Vitals?
Yes. Core Web Vitals come from CrUX, which is fed by the Chrome browsers of real users. Robots.txt only blocks Googlebot, not browser-side collection. If the page receives Chrome traffic, it generates CWV metrics.
Can a noindexed page impact my overall Core Web Vitals?
Absolutely. Noindex prevents indexing in search results, but it does not prevent Chrome from collecting performance data. If that page has poor CWV and receives traffic, it can pollute your domain's metrics.
How can I completely prevent CrUX data collection on certain pages?
The only reliable method is to block HTTP access: authentication (htpasswd, SSO), IP restriction, or VPN. If no Chrome browser can load the page, it generates no CrUX data.
Can my staging environment, accessible via an obscure URL, affect my SEO?
Yes, if Chrome users access it regularly (internal team, client testers). Even unindexed, it generates Core Web Vitals that can affect Google's overall perception of your domain.
How do I know whether a blocked page still generates CrUX data?
Test it in PageSpeed Insights. If the field data (CrUX) section shows metrics, the page receives enough Chrome traffic to feed the dataset. Then check your server logs to identify the source of that traffic.
🏷 Related Topics
Crawl & Indexing AI & SEO Web Performance
