Official statement

If you are blocking access to pages based on the referrer, consider noindexing them. For confidential content, server-side authentication is recommended.
23:44
🎥 Source video

Extracted from a Google Search Central video

⏱ 47:39 💬 EN 📅 12/01/2016 ✂ 25 statements
Other statements from this video (24)
  1. 2:06 Is rel=canonical really enough to handle A/B tests in SEO?
  2. 2:06 Should you really use rel=canonical on your A/B test pages?
  3. 3:07 Panda integrated into the core algorithm: what does it really change for your SEO?
  4. 5:07 Is Panda really integrated into Google's core ranking?
  5. 5:51 Why does Google suddenly discover thousands of new URLs on your site?
  6. 6:14 Why can a sudden surge of URLs trigger a warning in Google Search Console?
  7. 6:49 Do Google updates really roll out in real time?
  8. 9:26 Do you really need to force all your internal links to dofollow in order to rank?
  9. 12:07 Are automated dofollow links to your own content allowed by Google after all?
  10. 12:29 Can you really merge several sites into one using rel="canonical"?
  11. 13:29 Are Google updates really real-time, or is that an SEO myth?
  12. 13:51 Should you use rel=canonical between a subdomain and the main domain to handle duplicate content?
  13. 15:38 Are mobile interstitials really penalized by Google?
  14. 16:55 Do you really need to validate your AMP pages for Google to take them into account?
  15. 19:06 Does search history really skew your SEO ranking tests?
  16. 21:37 Do Google's algorithms really work the same way in every language?
  17. 22:00 Is adding the date to WordPress content really enough for Google to recognize an update?
  18. 22:56 Can shared hosting really penalize your SEO?
  19. 25:58 Do mobile interstitials really hurt Google rankings?
  20. 31:46 Does search history really skew your SEO analyses?
  21. 32:22 Why does Google almost never warn you when an algorithm penalizes you?
  22. 36:59 Does shared hosting really hurt your site's SEO?
  23. 40:25 Does duplicate content really lead to a Google penalty?
  24. 48:29 Panda integrated into the core: does that really mean real time?
📅
Official statement from (10 years ago)
TL;DR

Google recommends noindexing pages whose access depends on the HTTP referrer. For truly confidential content, noindex is not enough: server-side authentication is essential. The referrer is an easily bypassed filter, unsuitable for protecting sensitive data but usable for light display restrictions.

What you need to understand

Is the HTTP referrer a reliable security mechanism?

The HTTP referrer is the URL of the page a visitor came from, sent by the browser in the Referer request header (the misspelling is part of the HTTP specification). Some sites block access to pages if the referrer does not match an expected domain. This method aims to prevent direct access or access from unauthorized third-party sites.

Let's be clear: this is not security. The referrer can be spoofed in seconds using a browser extension, a proxy, or a simple curl command. Any configured crawler can ignore or modify it. Google itself can send requests with or without a referrer as needed.
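To make the point concrete, here is a minimal Python sketch showing that the referrer is just a request header the client fills in. The URLs and the helper name build_request are hypothetical, chosen only for illustration; nothing is sent over the network.

```python
from typing import Optional
from urllib.request import Request


def build_request(url: str, referrer: Optional[str] = None) -> Request:
    """Build a GET request, optionally claiming an arbitrary referrer."""
    headers = {}
    if referrer is not None:
        # The client decides what goes in this header; the server cannot verify it.
        headers["Referer"] = referrer
    return Request(url, headers=headers)


# Hypothetical URLs: the same page requested with a forged referrer and with none.
spoofed = build_request(
    "https://example.com/members/report",
    referrer="https://example.com/dashboard",
)
plain = build_request("https://example.com/members/report")

print(spoofed.get_header("Referer"))
print(plain.has_header("Referer"))
```

A server that only checks this header would treat the first request as legitimate internal navigation, even though it was built by hand.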

Why does Google suggest noindexing these pages?

If Googlebot encounters a page blocked based on the referrer, it cannot access the content. Indexing becomes random: sometimes the bot arrives with a valid referrer (internal navigation), sometimes it does not (discovered via an external link). The result: orphaned pages, inaccessible content, conflicting signals.

The noindex directive avoids this instability. It clearly indicates to Google not to index the page, even if it occasionally manages to access it. It is a clean stance: either the page is indexable and accessible, or it is not.

How does server-side authentication differ?

Server authentication (session, JWT token, OAuth) verifies the true identity of the user before serving the content. It systematically blocks Googlebot with an HTTP 401 or 403 code. No ambiguity: the content remains out of index.

This is the only viable method for confidential content (client area, private documents, sensitive data). The referrer protects nothing: the check relies on a header that the client supplies and can set to any value, which is insufficient as soon as confidentiality is at stake.
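As a rough illustration of the difference, the sketch below decides the response status from credentials only and never looks at the referrer. It assumes a bearer token scheme with a hard-coded token set (VALID_TOKENS) and a plain dictionary of headers with exact-case keys. Both are simplifications for the example, not a recommended production design.

```python
import hmac
from typing import Mapping, Tuple

# Hypothetical token store; a real application would validate a session or a signed token.
VALID_TOKENS = {"s3cr3t-demo-token"}


def authorize(headers: Mapping[str, str]) -> Tuple[int, str]:
    """Return (status_code, body) for a request to a confidential page."""
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    if scheme != "Bearer" or not token:
        # No credentials at all: ask the client to authenticate.
        return 401, "Authentication required"
    # Constant-time comparison against each known token.
    known = any(
        hmac.compare_digest(token.encode(), valid.encode()) for valid in VALID_TOKENS
    )
    if not known:
        return 403, "Forbidden"
    return 200, "Confidential content"


# A "good" referrer without credentials is still rejected.
print(authorize({"Referer": "https://example.com/dashboard"}))
print(authorize({"Authorization": "Bearer wrong-token"}))
print(authorize({"Authorization": "Bearer s3cr3t-demo-token"}))
```

A crawler that sends no credentials always lands in the 401 branch, which is what keeps the content out of the index without any ambiguity.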

  • HTTP Referrer = light filtering, easily bypassed, unsuitable for sensitive data
  • Noindex = clear instruction to Google to avoid indexing pages blocked by the referrer
  • Server Authentication = the only real protection for confidential content, blocks Googlebot cleanly
  • Googlebot can arrive with or without a referrer depending on the context of the URL discovery
  • Mixing referrer blocking with standard indexing creates conflicting signals

SEO Expert opinion

Does this recommendation truly reflect observed real-world practices?

Yes, and that is rare enough to be worth noting. We regularly see sites that block pages based on the referrer while keeping them indexable. The result: pages that appear and then disappear from the index, crawl rates skyrocketing on inaccessible URLs, and skewed acquisition channels in Analytics.

Google’s recommendation is consistent with what we observe: either you accept indexing and make the page accessible, or you noindex properly. Intermediate situations create noise in logs, waste crawl budget, and trigger repeated soft 404 errors.

When is blocking by referrer still relevant?

The referrer still has a use for filtering display without blocking access. For example: displaying a lightbox or interstitial depending on the source, adapting the UI for direct vs referral traffic, or limiting embedding via iframe. It's UX control, not security.

But as soon as it comes to truly preventing access (paid content, member space, confidential documents), the referrer fails. A trainee with Firefox Developer Edition can bypass this in 30 seconds. For these cases, server authentication is the only viable option.

Is noindex enough to protect sensitive content?

No, and that’s where the nuance matters. Noindex prevents indexing, not access. If the URL is discovered (external link, sharing, aggressive scanning), anyone can access it directly if the only filter is the referrer. The content remains exposed.

For truly confidential content, server authentication is non-negotiable. Noindex merely clarifies the SEO stance of a page that is already filtered by referrer. If the page contains sensitive data, it must return a 401/403 before even serving the HTML [To verify: impact on discoverability of legitimate linked pages].

Practical impact and recommendations

What should you audit on an existing site?

Start by identifying all pages subject to referrer blocking. Look in the server code (Apache .htaccess, Nginx conf, application middleware) for rules that test HTTP_REFERER. Cross-reference with the URLs indexed in Google Search Console to detect inconsistencies.
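The audit described above can be sketched in a few lines of Python. The pattern covers the Apache variable HTTP_REFERER and the Nginx names $http_referer, $invalid_referer and valid_referers; application middleware would need its own search. The function names, the sample configuration and the URL lists are hypothetical, and the indexed URLs are assumed to come from a Search Console export you load yourself.

```python
import re
from typing import Iterable, List, Tuple

# Apache: %{HTTP_REFERER}; Nginx: $http_referer, $invalid_referer, valid_referers.
REFERER_RULE = re.compile(r"http_referer|invalid_referer|valid_referers", re.IGNORECASE)


def find_referer_rules(config_text: str) -> List[Tuple[int, str]]:
    """Return (line_number, line) for each non-comment line that tests the referrer."""
    hits = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # skip commented-out rules
        if REFERER_RULE.search(stripped):
            hits.append((number, stripped))
    return hits


def indexed_and_gated(indexed_urls: Iterable[str], gated_prefixes: Iterable[str]) -> List[str]:
    """Return indexed URLs that fall under a referrer-gated URL prefix."""
    prefixes = tuple(gated_prefixes)
    return sorted(url for url in indexed_urls if url.startswith(prefixes))


# Hypothetical .htaccess content.
sample = r"""RewriteEngine On
# RewriteCond %{HTTP_REFERER} ^$
RewriteCond %{HTTP_REFERER} !^https://example\.com/ [NC]
RewriteRule ^private/ - [F]
"""

for number, line in find_referer_rules(sample):
    print(number, line)

conflicts = indexed_and_gated(
    ["https://example.com/private/a", "https://example.com/blog/b"],
    ["https://example.com/private/"],
)
print(conflicts)
```

Any URL returned by indexed_and_gated is both indexed and behind a referrer rule, which is exactly the inconsistency the audit is looking for.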

Then, classify these pages according to their nature: public content but conditionally displayed (lightbox, interstitial), semi-private content (limited access but not confidential), sensitive content (client area, personal data). The strategy differs radically depending on the case.

How do you fix an indexed page that is blocked by referrer?

If the page needs to remain accessible and indexable, remove the referrer block. Either it’s public and you accept direct access, or it’s not and you switch to server authentication. No halfway measures.

If it should not be indexed, add the noindex directive in meta robots and keep the referrer block only if it’s for UX, not for security. Then check in GSC that Google is gradually deindexing these URLs. Crawling will continue, but the index will clean itself.
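Noindex can be expressed either as a robots meta tag in the HTML or as an X-Robots-Tag response header. The sketch below shows both in Python. The helper names are hypothetical, and the HTML helper assumes the page contains a literal lowercase <head> tag, which a real template may not.

```python
from typing import Dict

NOINDEX_META = '<meta name="robots" content="noindex">'


def with_noindex_header(headers: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the response headers carrying a noindex directive."""
    updated = dict(headers)
    updated["X-Robots-Tag"] = "noindex"
    return updated


def add_noindex_meta(html: str) -> str:
    """Insert the robots meta tag right after <head>, unless it is already there."""
    if NOINDEX_META in html:
        return html
    return html.replace("<head>", "<head>\n  " + NOINDEX_META, 1)


page = "<html>\n<head>\n  <title>Partner offer</title>\n</head>\n<body></body>\n</html>"
print(add_noindex_meta(page))
print(with_noindex_header({"Content-Type": "text/html; charset=utf-8"}))
```

Either form only works if the crawler actually receives the response, so a referrer rule that rejects requests without a referrer would hide the directive from it.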

Which critical mistakes must you avoid?

Never block Googlebot by referrer while hoping for normal indexing. This creates a zombie index: pages discovered via sitemap or internal links, but inaccessible to crawl. Google eventually marks them as errors or deindexes them without warning.
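One way to catch this situation early is to request each URL twice, once with a trusted referrer and once without, and compare the status codes. The sketch below takes the HTTP client as a parameter so the logic can be shown without network access. The fake_fetch function and the URLs are hypothetical stand-ins for a real client and a real site.

```python
from typing import Callable, Optional

# A fetcher takes (url, referrer) and returns the HTTP status code.
Fetcher = Callable[[str, Optional[str]], int]


def is_referrer_gated(fetch: Fetcher, url: str, trusted_referrer: str) -> bool:
    """True when the URL answers 200 only if the trusted referrer is sent."""
    status_without = fetch(url, None)
    status_with = fetch(url, trusted_referrer)
    return status_with == 200 and status_without != 200


def fake_fetch(url: str, referrer: Optional[str]) -> int:
    """Stand-in for a real HTTP client: simulates a site gating /private/ by referrer."""
    if "/private/" in url and referrer != "https://example.com/":
        return 403
    return 200


gated = is_referrer_gated(fake_fetch, "https://example.com/private/doc", "https://example.com/")
public = is_referrer_gated(fake_fetch, "https://example.com/blog/post", "https://example.com/")
print(gated, public)
```

Run against the URLs listed in the sitemap, a check like this flags pages that a crawler arriving without a referrer would see as errors.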

The second trap: using the referrer as the sole protection for paid or confidential content. It’s a sieve. Any scraping tool can send whatever referrer it likes, so the check stops no one who is motivated. If the content is valuable or needs to remain private, server authentication is not optional.

  • Audit server rules filtering HTTP_REFERER and cross-check with Google index
  • Classify blocked pages: light UX, semi-private, or truly confidential
  • Add noindex to any page blocked by referrer intended to remain out of index
  • Migrate to server authentication (401/403) for any sensitive or paid content
  • Check in GSC for progressive deindexing after adding noindex
  • Never mix referrer blocking with standard indexing for public content

Referrer blocking remains a UX tool, not a security measure. To clarify the SEO stance, noindex filtered pages. To truly protect content, switch to server authentication. These technical choices can quickly become complex on a medium-sized site or one with a heavy history. If the situation requires a thorough audit and a redesign of access architecture, consulting a specialized SEO agency can speed up compliance and avoid costly visibility errors.

❓ Frequently Asked Questions

Does Googlebot always send a referrer when crawling?
No. Googlebot can arrive without a referrer (discovery via sitemap, external link that passes no referrer, direct access). Referrer blocking can therefore prevent indexing unpredictably.
Does noindex prevent access to a page's content?
No, it only prevents the page from being indexed in search results. Anyone who knows the URL can still reach it if no server-side authentication is in place.
Can I block Googlebot by referrer while still getting the page indexed via a sitemap?
Technically yes, but it is inconsistent. Google will discover the URL through the sitemap but will not be able to crawl the content. The page will be flagged as an error or ignored.
What is the difference between 401, 403, and referrer blocking from an SEO standpoint?
401 and 403 are HTTP status codes that block Googlebot cleanly and prevent indexing. Referrer blocking depends on a header the client controls, is easily bypassed, and sends ambiguous signals to the crawler.
Does referrer blocking affect crawl budget?
Yes. If Googlebot repeatedly tries to crawl pages blocked by referrer, it wastes crawl budget on inaccessible URLs at the expense of pages that are actually indexable.