Official statement
Google clearly states that URL fragments (everything after the #) are never crawled or indexed: content reachable only through a hash route is invisible to the engine. For temporary content like a sports event, it is essential to use clean routes without a hash. After the event, a 404 is perfectly acceptable: Google naturally removes these URLs from the index without penalty.
What you need to understand
Are hash URLs truly invisible to Googlebot?
Yes, and this is a fundamental principle of how the web operates. The hash (#) has historically been used for client-side intra-page navigation: the browser never sends it to the server during an HTTP request. Googlebot, which operates like a standard HTTP client, never sees what follows the hash.
In practical terms for Google, example.com/match#live123 and example.com/match are strictly identical. The engine crawls the URL without the fragment, renders the page, but cannot distinguish the different states of a JavaScript application that relies on hashes to create distinct "routes".
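This behavior can be observed directly with the WHATWG URL parser available in Node.js and browsers; `example.com/match` is just an illustrative URL:

```javascript
// The fragment is a purely client-side construct: the parser exposes it,
// but it is never part of the request target sent to the server.
const url = new URL('https://example.com/match#live123');

console.log(url.pathname); // "/match"
console.log(url.hash);     // "#live123"

// What the server (and therefore Googlebot) actually receives as the
// HTTP request target: path plus query string, fragment excluded.
const requestTarget = url.pathname + url.search;
console.log(requestTarget); // "/match" (the hash is gone)
```

This is why the two URLs in the paragraph above are strictly identical from the server's point of view.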
Why do so many sites still use hashes to route content?
Because it's simple, quick to implement, and older JavaScript frameworks popularized it (AngularJS in hashbang mode, React Router in hash mode). There's no server configuration needed, no rewrites to manage: everything happens on the client side.
The problem? This technical convenience comes at a cost to SEO visibility. A site generating thousands of hash URLs for product pages, articles, or temporary events creates invisible content for search engines. It's a complete blind spot.
What really happens to temporary content after the event?
Martin Splitt highlights a crucial point: returning a 404 after the event is normal and acceptable. Google then removes the URL from the index without treating it as an error or a negative signal. This matches the ephemeral nature of the content.
This approach is healthy: it avoids accumulating dead pages in the index and clarifies the intent. A match on March 15 no longer needs to be indexed on March 20. The 404 is an honest response, not a technical flaw.
- Hash URLs (#) are never sent to the server so Googlebot cannot crawl or index them
- For indexable temporary content, you need to use clean routes without a hash (e.g., /match/live-123)
- A 404 after the event is a healthy practice: Google naturally removes the URL from the index
- Modern JavaScript frameworks support history mode (HTML5 pushState) which generates clean URLs
- Never confuse ease of development with SEO compatibility: hashes represent a complete blind spot
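The contrast between the two routing modes can be sketched in a few lines; `serverVisiblePath` is a hypothetical helper for illustration, not a framework API:

```javascript
// Hypothetical helper: which path does the server (and Googlebot) actually
// receive for a given client-side route, depending on the routing mode?
function serverVisiblePath(origin, route, mode) {
  if (mode === 'hash') {
    // Hash mode: the route lives in the fragment, which is never sent.
    return new URL(origin + '#' + route).pathname;
  }
  // History mode (HTML5 pushState): the route is a real server-side path.
  return new URL(origin + route).pathname;
}

console.log(serverVisiblePath('https://example.com', '/match/live-123', 'hash'));
// "/" (Googlebot only ever sees the root URL)
console.log(serverVisiblePath('https://example.com', '/match/live-123', 'history'));
// "/match/live-123" (a crawlable, indexable route)
```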
SEO expert opinion
Is this statement consistent with field observations?
Yes, absolutely. This is one of the few points where Google is perfectly clear and aligned with the technical reality of the web. Hashes are never sent in HTTP requests: this is a W3C standard, not an arbitrary decision by Google.
I have audited hundreds of SPA (Single Page Application) sites, and the pattern is always the same: clients who have left hashes in their routes generate invisible content. When switching to history mode, indexing skyrockets within a few weeks. It is reproducible, measurable, and systematic.
What nuances should be applied to this statement?
The only nuance concerns hash fragments used for Ajax crawling (the hashbang #! system that Google proposed and then abandoned in 2015). Some very old sites still use it, but it is technically obsolete and Google no longer officially supports it.
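For historical context, that deprecated scheme mapped `#!` URLs onto an `_escaped_fragment_` query parameter that the crawler requested instead. This small sketch (a hypothetical helper, shown only to illustrate the obsolete mapping) reproduces the transformation:

```javascript
// Reproduces the deprecated AJAX crawling scheme: the fragment after "#!"
// is moved into an "_escaped_fragment_" query parameter. Obsolete since 2015.
function escapedFragmentUrl(hashbangUrl) {
  const url = new URL(hashbangUrl);
  if (!url.hash.startsWith('#!')) return hashbangUrl; // not a hashbang URL
  const fragment = url.hash.slice(2); // drop the "#!" prefix
  url.hash = '';
  url.searchParams.set('_escaped_fragment_', fragment);
  return url.toString();
}

console.log(escapedFragmentUrl('https://example.com/match#!/live-123'));
// "https://example.com/match?_escaped_fragment_=%2Flive-123"
```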
The second point: Google can theoretically execute JavaScript that manipulates the URL after the initial rendering. But in practice, if your content relies on a hash to be displayed, the probability that it is correctly indexed is nearly zero. Don't count on hypothetical heuristics.
In what cases does this rule pose problems in production?
The classic case: a React/Vue app that uses hash mode by default to avoid server configuration. The dev team delivers quickly, the content works perfectly in client-side navigation, but Google sees nothing. The SEO team discovers the problem three months after launch.
Another common situation: dynamically generated temporary events (webinars, matches, flash sales). If the technical team took the hash shortcut to avoid touching the backend infrastructure, the content remains invisible. Result: zero organic traffic during the event, exactly when search demand peaks.
Practical impact and recommendations
What concrete steps should be taken to make temporary content indexable?
First step: audit your current routes. If you are using a JavaScript framework, check the configured routing mode. React Router, Vue Router, Angular: all offer a history mode (HTML5 pushState) that generates clean URLs without a hash.
Next, configure your server so that all routes return your main HTML (index.html or equivalent). This is the principle of universal fallback: Nginx, Apache, Netlify, Vercel all have standard configurations for this. Without this fallback, a manual refresh on /match/live-123 returns a server 404.
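On Nginx, this fallback is typically a one-liner. A minimal sketch, assuming the compiled app lives under the server's `root` and the entry point is `index.html`:

```nginx
location / {
    # Serve the requested file if it exists, otherwise fall back to the SPA
    # shell, so /match/live-123 survives a direct hit or a manual refresh.
    try_files $uri $uri/ /index.html;
}
```

Apache (`mod_rewrite`), Netlify (`_redirects`), and Vercel (`rewrites`) have equivalent standard patterns.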
How to properly manage the end of life of temporary content?
The approach recommended by Martin Splitt is simple: return a true 404 once the event is over. Not a soft 404, not a redirect to the homepage: a clear HTTP 404.
You can soften the user experience with a custom 404 page suggesting similar upcoming events, but the HTTP code must be 404. Google will interpret the signal correctly and will remove the URL from the index within a few days or weeks, depending on your site's crawl frequency.
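That lifecycle can be sketched as a pure function; `eventResponse` is a hypothetical name and the date handling is deliberately minimal:

```javascript
// Hypothetical handler logic: serve the event page while it is live, then
// return a genuine HTTP 404 (not a soft 404) once the event has expired.
function eventResponse(event, now = new Date()) {
  if (now <= new Date(event.endsAt)) {
    return { status: 200, body: `Live: ${event.name}` };
  }
  // The custom 404 body can suggest upcoming events to users, but the
  // status code is the signal Google reads to deindex the URL.
  return { status: 404, body: 'Event over: see upcoming events' };
}

const match = { name: 'Final', endsAt: '2020-03-15T22:00:00Z' };
console.log(eventResponse(match, new Date('2020-03-15T20:00:00Z')).status); // 200
console.log(eventResponse(match, new Date('2020-03-20T12:00:00Z')).status); // 404
```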
What mistakes should absolutely be avoided in this context?
Mistake number 1: using hashes for content you want to index. It’s a complete blind spot; no heuristic will come to your rescue. Don't rely on Google's JavaScript execution to circumvent this structural problem.
Mistake number 2: allowing temporary content URLs to return 200 with a message "event ended". This is a soft 404; Google will take longer to deindex, and you pollute your index with pages of no value. A 404 is an honest and effective response.
- Check your JavaScript framework's routing mode (hash vs history)
- Configure the server fallback so that all routes return the main HTML
- Test direct navigation (manual refresh) on a deep route to validate the configuration
- Return a clear HTTP 404 once the temporary content has expired
- Monitor deindexing through Search Console to validate that Google is indeed removing the URLs
- Document the logic of generation/expiration of temporary URLs for the tech teams
❓ Frequently Asked Questions
Are hash URLs really never crawled by Google?
Can hashes be used for in-page navigation without any SEO impact?
How do you switch from hash routing to clean history routing?
Does a 404 after a temporary event hurt the site's overall SEO?
How long does Google take to deindex a URL that returns a 404?
🎥 From the same video
Other SEO insights extracted from this same Google Search Central video · duration 28 min · published on 01/07/2020