
Official statement

Google accepts the use of non-English words in URLs, especially for sites targeting non-English-speaking regions. Domain names can include non-Latin characters via punycode encoding. The rest of the URL can use Unicode, and both versions are treated equally by Google. Use hyphens to separate words in your URLs.
🎥 Source: Google Search Central video (2:09, English, 21 March 2018); the statement appears at 0:36.
TL;DR

Google treats URLs containing Unicode characters equally to their encoded versions, whether in punycode for domain names or percent-encoding for the rest of the URL. For sites targeting non-English-speaking markets, this flexibility allows for optimizing local readability without ranking penalties. The real challenge is verifying the actual impact on CTR and any technical limitations during crawling or indexing in some edge cases.

What you need to understand

What does Google really mean by 'equivalence' between Unicode and encoding?

When Google states that both versions of a URL are treated equivalently, it means the ranking algorithm neither favors nor penalizes either form. A URL containing 'café' reaches the server as 'caf%C3%A9', and both spellings identify the same resource.

This equivalence mainly relates to the indexing process and relevance calculation. Technically, browsers and crawlers automatically convert Unicode characters into their encoded representation during HTTP requests. The key point: Google normalizes these variations to prevent the creation of artificial duplicate content.
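This equivalence can be sketched with Python's standard library: `quote` produces the percent-encoded form a browser actually sends, and `unquote` recovers the original, so both spellings name the same resource. The 'café' slug is just an illustration.

```python
from urllib.parse import quote, unquote

# Raw Unicode path segment and its percent-encoded (UTF-8) form.
raw = "café"
encoded = quote(raw)   # what actually travels in the HTTP request
print(encoded)         # 'caf%C3%A9'

# Decoding the encoded form recovers the original: both spellings
# identify the same resource, which is what Google normalizes on.
assert unquote(encoded) == raw
```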

How does punycode work for domain names?

Punycode is a domain-specific encoding that represents non-ASCII characters in domain labels, including subdomains and extensions. For instance, 'münchen.de' becomes 'xn--mnchen-3ya.de' at the DNS level.

This conversion occurs seamlessly for the end user in the address bar. From an SEO perspective, the tricky part: backlinks can point to either version, but Google generally consolidates them. However, be cautious with analysis tools that don't always handle this duality correctly.
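You can reproduce the conversion with Python's built-in `idna` codec; a sketch only, since production IDN handling often relies on the third-party `idna` package for IDNA 2008 compliance:

```python
# Python's built-in "idna" codec converts each domain label to and
# from its punycode representation.
domain = "münchen.de"
puny = domain.encode("idna").decode("ascii")
print(puny)  # 'xn--mnchen-3ya.de'

# The round trip is lossless, mirroring how Google consolidates
# both forms of the host name.
assert puny.encode("ascii").decode("idna") == domain
```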

Why does Google insist on hyphens as separators?

Hyphens remain the separator the algorithm recognizes for isolating keywords in a URL, regardless of language. This has been a constant for years. Underscores do not reliably fulfill this role.

In a multilingual context, this rule carries even more weight. If you use Chinese or Arabic characters in your slugs, natural spaces between words may not always exist. The hyphen then becomes the only means of clearly indicating semantic segmentation to the algorithm.
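A minimal slug builder along these lines might look as follows; the `slugify` helper is hypothetical, and real CMS slug rules are usually stricter:

```python
import re
import unicodedata

def slugify(text: str) -> str:
    """Build a hyphen-separated slug while keeping non-ASCII letters.

    A minimal sketch: hyphens as the only separator, underscores
    and whitespace collapsed, punctuation dropped.
    """
    text = unicodedata.normalize("NFC", text.strip().lower())
    # Replace runs of whitespace and underscores with a single hyphen.
    text = re.sub(r"[\s_]+", "-", text)
    # Drop characters that are neither letters, digits, nor hyphens.
    text = re.sub(r"[^\w-]", "", text)
    return text.strip("-")

print(slugify("Café de la Gare"))  # 'café-de-la-gare'
```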

  • Google normalizes Unicode and encoding to avoid technical duplicate content
  • Punycode is mandatory for domain names with non-ASCII characters
  • Hyphens remain the standard for separating keywords across all alphabets
  • No ranking penalty is linked to the choice between Unicode and encoding according to this statement
  • URL readability can impact CTR in SERPs in certain markets

SEO Expert opinion

Is this treatment equivalence truly complete in all cases?

On paper, Mueller's assertion aligns with what we've observed for several years. Tests indeed show that Google indexes and ranks Unicode URLs correctly. However, total equivalence deserves nuance.

The first point: third-party tools and some social platforms handle encoded URLs poorly. When you share a URL containing %C3%A9, it often remains in this awkward form in the shared link. The second point, more technical: some older servers or CDNs may misinterpret the encoding, creating sporadic 404 errors. The real impact of these edge cases on crawl budget in complex environments remains to be verified.

Does using Unicode actually improve CTR in practice?

The theory: a URL readable in the local language should enhance CTR from the SERPs. The few available A/B tests show mixed results, highly dependent on the market. In languages such as Japanese or Russian, the effect seems marginal.

The fundamental issue: Google sometimes displays the encoded URL in the breadcrumb shown in the SERP, even if the source URL is in Unicode. Result: the anticipated UX advantage does not always materialize. Without consolidated large-scale data, it's hard to make a definitive conclusion. My field advice: test on a limited sample of pages before migrating en masse.

What technical risks are underestimated with non-ASCII URLs?

The main risk: the fragmentation of backlink signals. Some sites will link to the Unicode version, while others link to the encoded version. Google claims to consolidate them, but in reality, tools like Ahrefs or Majestic may count them separately, distorting your analyses.

The second risk: migrations and redirections become more complex. If you need to transition from one URL structure to another, managing regex with Unicode characters in .htaccess or Nginx files can quickly become a nightmare. Mapping errors are common. Finally, some CMS or e-commerce frameworks encode in unpredictable ways based on their local configuration.
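One way to work around the tool-side fragmentation is to normalize both spellings to a single Unicode form before comparing backlink exports. The `normalize_url` helper below is a sketch that assumes UTF-8 percent-encoding and well-formed punycode labels:

```python
from urllib.parse import urlsplit, urlunsplit, unquote

def normalize_url(url: str) -> str:
    """Collapse punycode hosts and percent-encoded paths into one
    canonical Unicode form, so both spellings compare equal.

    Sketch only: assumes UTF-8 percent-encoding and valid IDNA labels.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    if "xn--" in host:
        # Decode punycode labels back to their Unicode form.
        host = host.encode("ascii").decode("idna")
    return urlunsplit((parts.scheme, host, unquote(parts.path),
                       unquote(parts.query), parts.fragment))

a = normalize_url("https://xn--mnchen-3ya.de/caf%C3%A9")
b = normalize_url("https://münchen.de/café")
assert a == b
```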

Attention: If you manage a multilingual site with hreflang, ensure your Unicode URLs are correctly declared in the XML sitemap. Some automatic generators encode everything by default, creating inconsistencies with your canonical tags.

Practical impact and recommendations

Should you migrate your existing URLs to Unicode or stick with ASCII?

If your site is already functioning well with transliterated ASCII URLs (for example, 'moskva' instead of 'москва'), don't change anything without a clear strategic reason. The ROI of a URL migration is rarely obvious, especially if you lose historical signals along the way.

However, if you are launching a new website or a new section targeting a strong local market (Russia, Japan, Arab countries), it makes sense to opt for Unicode directly for alignment with user queries. Test first on a limited section and measure the impact on organic traffic and actual CTR before generalizing.

How can you manage Unicode URLs without breaking your technical infrastructure?

Your first reflex: check that your technical stack supports UTF-8 end-to-end. Database, web server, CMS, CDN—all must be configured to handle encoding without wild conversions. Inconsistencies create sneaky bugs that clutter logs and crawling.

Next, standardize your approach in sitemaps and configuration files. Choose one representation (Unicode or encoded) and stick to it in all your XML files, robots.txt, and hreflang declarations. Google normalizes, of course, but it's better to avoid giving it unnecessary work. Finally, test your 301 redirections with tools that handle encoding correctly.
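A quick audit along these lines: flag sitemap entries that appear in both raw-Unicode and percent-encoded form. The `mixed_representations` helper and the example.com URLs are illustrative only:

```python
from urllib.parse import unquote

def mixed_representations(urls: list[str]) -> set[str]:
    """Return the decoded URLs that appear under more than one
    spelling (raw Unicode vs percent-encoded) in the input list.

    A sketch for auditing sitemap exports before submission.
    """
    seen: dict[str, str] = {}
    conflicts: set[str] = set()
    for url in urls:
        key = unquote(url)  # canonical decoded form
        if key in seen and seen[key] != url:
            conflicts.add(key)
        seen.setdefault(key, url)
    return conflicts

urls = ["https://example.com/café",
        "https://example.com/caf%C3%A9",
        "https://example.com/shop"]
print(mixed_representations(urls))  # {'https://example.com/café'}
```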

What mistakes should you absolutely avoid with multilingual URLs?

A classic mistake: mixing hyphens and underscores in local slugs. Some developers believe that underscores work better in certain languages. Incorrect. Hyphens remain the universal standard for keyword segmentation, regardless of language.

The second trap: forgetting to declare encoding in HTTP headers. If your server doesn't explicitly send charset=UTF-8, some browsers or bots may misinterpret special characters. The third mistake: not monitoring 404 errors related to encoding issues. Set up alerts for suspicious patterns in your server logs.
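A rough pattern for such an alert, assuming combined-format access logs; the sample lines are fabricated for illustration:

```python
import re

# Match 404 responses whose request path contains a percent-escape,
# a common symptom of encoding mishandling.
LOG_404_ENCODED = re.compile(
    r'"GET (\S*%[0-9A-Fa-f]{2}\S*) HTTP/[\d.]+" 404')

lines = [
    '1.2.3.4 - - [10/May/2024] "GET /caf%C3%A9 HTTP/1.1" 404 0',
    '1.2.3.4 - - [10/May/2024] "GET /shop HTTP/1.1" 200 512',
]

hits = []
for line in lines:
    m = LOG_404_ENCODED.search(line)
    if m:
        hits.append(m.group(1))

print(hits)  # ['/caf%C3%A9']
```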

  • Ensure that the entire technical stack natively supports UTF-8
  • Standardize the representation of URLs in sitemaps and hreflang
  • Test redirections with tools managing Unicode encoding
  • Use exclusively hyphens as word separators
  • Monitor 404 errors related to encoding issues
  • Validate consistency between canonicals and declared URLs
The use of non-English characters in URLs is technically supported by Google without penalty, but it requires a heightened technical rigor and meticulous validation of each element of your infrastructure. The UX and SEO benefits remain to be proven on a case-by-case basis depending on your market. For complex multilingual sites or delicate technical migrations, support from a specialized SEO agency can help avoid costly mistakes in crawl budget and loss of historical signals, especially if your internal technical team isn't accustomed to managing Unicode encoding at scale.

❓ Frequently Asked Questions

Does Google penalize URLs with accented or non-Latin characters?
No. According to Mueller, Google treats Unicode URLs as equivalent to their encoded versions, with no ranking penalty. The algorithm normalizes these variations automatically.
Should I encode my URLs manually or leave raw Unicode characters?
Both approaches work. Browsers and crawlers automatically convert Unicode into its encoded form during HTTP requests. Choose the version most consistent with your technical infrastructure.
Do backlinks to the encoded version count for the Unicode URL?
Google says it consolidates both versions, but some third-party analysis tools may count them separately. This can skew your link-profile metrics without affecting actual rankings.
Does punycode for domain names affect SEO differently?
No. Punycode is simply the technical representation of non-ASCII characters in domain names. Google treats "münchen.de" and "xn--mnchen-3ya.de" as identical at the ranking level.
Do Unicode URLs improve CTR in the SERPs?
Field data is mixed. Google sometimes displays the encoded URL even when the source is Unicode, cancelling the theoretical UX advantage. Test on a sample before rolling out widely.