Official statement
The nofollow, sponsored, and UGC attributes block the transfer of signals (PageRank, anchors) but do not guarantee that Google will ignore the link during crawling. To completely prevent Googlebot from following a URL, robots.txt remains the go-to tool. A hybrid technique is to route these links through a directory blocked by robots.txt, giving granular control over crawl budget without multiplying rules in the robots.txt file.
What you need to understand
What’s the difference between blocking signals and blocking crawling?
When you add rel="nofollow" (or its variants sponsored/UGC) to a link, you are asking Google not to transfer PageRank or use the anchor text as a relevance signal. It’s a directive about signal handling, not a crawling instruction.
But here’s the catch: Googlebot can still discover and crawl the target URL. The bot explores the web opportunistically — it sees a URL, it notes it, and based on its schedule, it may decide to visit it. Nofollow is not a technical lock that physically denies access.
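To make the distinction concrete, here is a minimal sketch (the markup and URLs are hypothetical) showing that rel="nofollow" and its variants are just metadata attached to the anchor: nothing in the attribute prevents a crawler that has discovered the href from requesting it.

```python
from html.parser import HTMLParser

# Hypothetical page fragment with the three rel variants on outbound links.
SAMPLE_HTML = """
<a href="https://example.com/partner" rel="sponsored">Partner offer</a>
<a href="https://example.com/forum-post" rel="ugc">User comment link</a>
<a href="/category?sort=price" rel="nofollow">Sort by price</a>
"""

class LinkCollector(HTMLParser):
    """Collects (href, rel) pairs from <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attr_map = dict(attrs)
            self.links.append((attr_map.get("href"), attr_map.get("rel", "")))

collector = LinkCollector()
collector.feed(SAMPLE_HTML)

for href, rel in collector.links:
    # rel only restricts how the link's signals are used; it does not stop
    # a crawler from requesting the href once the URL has been discovered.
    print(f"{href} rel={rel!r}: signals restricted, crawling still possible")
```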
Why does this nuance cause practical problems?
Because many SEOs believe that nofollow = URL invisible to Google. The result: pages thought to be off the radar end up indexed, eat into crawl budget, or reveal URL structures you would have preferred to keep private.
Specifically? If you apply nofollow to links leading to facet filters, sort pages, or session URLs, Google can still crawl them. You save PageRank, sure, but you do not protect your technical architecture.
How can you actually block a link from being crawled?
The official method: robots.txt. You declare a directory or URL pattern as Disallow, and Googlebot will respect the directive. The one caveat: a blocked URL can still appear in the index if strong external backlinks point to it, because Google can index a URL it never crawls.
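A quick way to check what a Disallow rule actually covers is Python's standard urllib.robotparser; the robots.txt content and the URLs in this sketch are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; in production this content is served at /robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /blocked-crawl/
Disallow: /print/
"""

robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())

for url in (
    "https://www.example.com/blocked-crawl/logout",
    "https://www.example.com/print/article-42",
    "https://www.example.com/guide-seo",
):
    verdict = "crawlable" if robots.can_fetch("Googlebot", url) else "blocked by robots.txt"
    print(f"{url} -> {verdict}")
```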
Google also suggests a clever intermediate approach — redirecting your “questionable” links to a path blocked by robots.txt (e.g., /blocked-crawl/). The HTML link remains clickable for users if needed, but the bot comes to an immediate halt. This is particularly useful for utility links (logout, filters, printable versions) where nofollow alone is not sufficient.
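A minimal sketch of that hybrid pattern, using Flask for brevity; the /blocked-crawl/go route, its to parameter, and the target map are hypothetical choices, not something Google prescribes. The HTML link points into the blocked directory, so a compliant bot never requests it, while human visitors are transparently redirected to the real destination.

```python
from flask import Flask, abort, redirect, request

app = Flask(__name__)

# Hypothetical map of hop names to real internal destinations.
ALLOWED_TARGETS = {
    "logout": "/account/logout",
    "print": "/articles/42/print",
}

@app.route("/blocked-crawl/go")
def crawl_safe_hop():
    # The HTML link points here, e.g. <a href="/blocked-crawl/go?to=logout">.
    # robots.txt disallows /blocked-crawl/, so a compliant bot never requests
    # this URL; a human click is 302-redirected to the real page instead.
    target = request.args.get("to", "")
    if target not in ALLOWED_TARGETS:
        abort(404)
    return redirect(ALLOWED_TARGETS[target], code=302)
```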
- Nofollow/sponsored/UGC: blocks the transfer of signals (PageRank, anchors) but not crawling
- Robots.txt: blocks crawling but does not prevent indexing if strong external backlinks exist
- Redirecting to a blocked directory: hybrid solution for granular crawl control without polluting robots.txt
- A nofollow link can still appear in server logs — it has been crawled even if not utilized for ranking
- The choice between these methods depends on your goal: saving PageRank vs. protecting crawl budget vs. masking URLs
SEO Expert opinion
Is this statement consistent with field observations?
Absolutely. Server logs have confirmed this for years: we regularly see Googlebot crawling nofollow URLs, especially if they are present on high-crawl pages (homepage, main categories). Nofollow has never been a barrier to crawling — it’s just that many practitioners confused the two mechanisms.
The real question is why Google crawls these links despite the nofollow. A likely hypothesis: the bot wants to map the entire link graph to detect manipulation patterns, identify site networks, or simply discover new URLs before deciding whether to index them. Nofollow means "don't exploit this signal," not "ignore this URL."
Is the redirect technique risk-free?
On paper, redirecting to a directory blocked by robots.txt seems clean. In practice, it adds a layer of complexity — you create artificial 301/302 redirects, which can slow down the user experience if poorly implemented (think about logout links, for instance).
Another point: if you redirect to /blocked-crawl/ and then block that directory, Google will not crawl the final target… but it will still see the initial redirect. It remains in the logs as a crawl attempt. For pure crawl budget reasons, it’s effective. To completely mask a URL? Less certain. [To be checked]: the exact impact on crawl budget when thousands of links point to blocked redirects — Google could consider this noise.
When should you really care about this distinction?
Let’s be honest: for 80% of sites, the difference between nofollow and crawl blocking is negligible. If you have a WordPress blog with a few nofollow pages, Google may crawl them once a month. Nothing to panic about.
It becomes critical on large sites: e-commerce with millions of facets, UGC platforms with duplicate content, sites with infinite category trees. There, every unnecessary crawled URL = wasted budget. In these cases, combining nofollow (for signals) and robots.txt (for crawling) becomes a complete SEO architecture strategy.
Practical impact and recommendations
What should you audit on your site right now?
First action: analyze your server logs or Search Console (Crawl Stats report) to identify URLs that get crawled even though you only link to them with nofollow. You will probably discover that Google is visiting pages you thought were protected: sort filters, internal search result pages, session URLs.
Next, cross-check this data with your actual crawl budget (pages crawled per day vs. strategic pages). If you find that 30% of the crawl is going to non-priority URLs despite nofollow, that’s a signal to switch to robots.txt or the redirect technique.
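A rough sketch of that log audit, assuming a combined-format access log and a few hypothetical URL patterns that your templates only ever link with nofollow. Filtering on the Googlebot user-agent string alone is a simplification: real verification requires a reverse DNS check.

```python
import re
from collections import Counter

# Hypothetical combined-log-format lines; replace with your real access log.
LOG_LINES = [
    '66.249.66.1 - - [14/Aug/2020:10:02:11 +0200] "GET /category?sort=price HTTP/1.1" 200 5123 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [14/Aug/2020:10:02:15 +0200] "GET /guide-seo HTTP/1.1" 200 18234 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

# Hypothetical URL patterns that are only ever linked with rel="nofollow"
# (facets, sorts, session IDs, internal search).
NOFOLLOW_PATTERNS = [re.compile(p) for p in (r"[?&]sort=", r"[?&]sessionid=", r"^/search\?")]
REQUEST_RE = re.compile(r'"GET (?P<path>\S+) HTTP')

counts = Counter()
for line in LOG_LINES:
    if "Googlebot" not in line:
        continue
    match = REQUEST_RE.search(line)
    if not match:
        continue
    path = match.group("path")
    hit = any(p.search(path) for p in NOFOLLOW_PATTERNS)
    counts["nofollow_targets" if hit else "other"] += 1

total = sum(counts.values()) or 1
print(f"Googlebot hits on nofollow-only URLs: {counts['nofollow_targets']} "
      f"({100 * counts['nofollow_targets'] / total:.0f}% of crawled requests)")
```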
How do you choose between nofollow, robots.txt, and blocked redirection?
Use the nofollow/sponsored/UGC attribute when your goal is to avoid passing on PageRank or to prevent a manual penalty (affiliate links, sponsored content, comments). It’s sufficient for Google compliance and signal management.
Switch to robots.txt if you want to save crawl budget on entire non-strategic sections (/admin/, /api/, /print/). This is the industrial solution for high volumes.
Reserve redirection to a blocked directory for hybrid cases: links that need to remain clickable for UX (e.g., logout, currency switch) but that you want to exclude from crawling entirely. This is a solution for SEO architects, not a patch to apply everywhere.
What errors should you absolutely avoid?
NEVER block a URL you want to de-index via robots.txt: Google won't be able to crawl the page, so it will never see the noindex tag. It's the most common pitfall, especially after a migration or a duplicate-content cleanup.
Avoid mixing nofollow and canonical on the same link. If A points to B with nofollow, but B canonicalizes to C, you create contradictory signals. Google will likely sort it out, but you lose clarity and control.
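A small helper to catch the robots.txt/noindex contradiction before it ships; the function name, sample robots.txt, and sample HTML below are hypothetical, and the meta-tag regex is deliberately simplistic.

```python
import re
from urllib.robotparser import RobotFileParser

def is_contradictory(robots_txt: str, url: str, page_html: str) -> bool:
    """True if the URL is disallowed in robots.txt while its HTML carries a
    noindex robots meta tag: Google cannot crawl the page, never sees the
    noindex, and the URL may stay (or become) indexed anyway."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    blocked = not parser.can_fetch("Googlebot", url)
    has_noindex = bool(re.search(
        r'<meta[^>]+name=["\']robots["\'][^>]+content=["\'][^"\']*noindex',
        page_html, re.I))
    return blocked and has_noindex

# Hypothetical example: a page we want de-indexed but accidentally blocked from crawling.
robots = "User-agent: *\nDisallow: /old-catalogue/\n"
html = '<head><meta name="robots" content="noindex, follow"></head>'
print(is_contradictory(robots, "https://www.example.com/old-catalogue/p1", html))  # True
```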
- Audit server logs to identify URLs crawled despite nofollow
- Ensure that robots.txt does not block pages with noindex tags (technical contradiction)
- Test the redirect technique on a sample before mass deployment
- Document your crawl strategy: which directories are nofollow, which ones are in robots.txt, and why
- Monitor changes in crawl budget after adjustments (Search Console, Crawl Stats report)
- For sites with complex architecture, map out the paths of prioritized vs. secondary crawls
❓ Frequently Asked Questions
If I set a link to nofollow, can Google still index the target URL?
What is the difference between nofollow, sponsored, and UGC when it comes to crawling?
Does the redirect-to-a-blocked-directory technique slow down my site?
Can I block an already indexed URL via robots.txt to make it disappear from Google?
How can I check whether Google crawls my nofollow links?