Official statement

To disavow URLs with non-Latin characters, generate a UTF-8 file with these characters or use the punycode version of the domain name.
Source: extracted from a Google Search Central video (statement at 1:36).

⏱ 57:44 💬 EN 📅 10/01/2017 ✂ 11 statements
Other statements from this video (10)
  1. 3:51 Do you really need to respect the case and syntax of noindex and nofollow tags?
  2. 4:49 Does a .com really handicap your international geotargeting?
  3. 6:54 Content relevance and quality: does Google really evaluate them separately?
  4. 8:27 Do localized words in your URLs really influence your Google ranking?
  5. 13:18 Blog on a subdomain or in a subdirectory: what is the real impact on SEO?
  6. 18:20 Can mobile interstitials really hurt your ranking?
  7. 24:39 Does switching to HTTPS really fix Panda filter problems?
  8. 26:10 Does structured data really influence Google ranking?
  9. 27:48 Can subdirectories be penalized independently of the rest of your site?
  10. 46:24 Does mobile-first indexing really change your SEO strategy?
TL;DR

Google clarifies that to disavow URLs containing non-Latin characters (Cyrillic, Chinese, Arabic, etc.), you must either encode your file in UTF-8 with the native characters or use the punycode version of the domain name. This technical clarification prevents processing errors in the disavow file that could make the disavowal ineffective. An important detail when cleaning up a link profile polluted by questionable international domains.

What you need to understand

Why is this technical detail about disavow file encoding important?

The disavow file operates on a simple principle: you list the URLs or domains that you want Google to ignore in your link profile calculation. However, technical reality becomes complex when these URLs contain non-Latin characters: Russian Cyrillic, Chinese ideograms, Arabic alphabet, Greek, Thai characters, etc.

Without proper encoding, Google cannot correctly interpret the domain you are trying to disavow. The file will be technically accepted, but incorrectly encoded lines will be simply ignored. You think you've cleaned up your profile, but toxic links continue to be counted.

What is the difference between UTF-8 and punycode?

UTF-8 is a character encoding standard that stores native characters directly (e.g., мойсайт.рф). It's the most human-readable method. You copy-paste the domain as it appears in your backlink tool, save the text file as UTF-8 (not ANSI, and not plain ASCII, which cannot represent these characters), and Google processes it correctly.

Punycode is the ASCII representation of internationalized domain names (IDN). It transforms мойсайт.рф into xn--80arbjktj.xn--p1ai. This version works everywhere without the risk of encoding errors, but it is harder to check visually. Both methods are accepted by Google; it's a matter of technical preference.
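To make the relationship between the two forms concrete, here is a minimal sketch of the conversion in Python. It assumes the built-in `idna` codec is good enough for your domains (it implements the older IDNA 2003 rules, so unusual names may need a dedicated library); the helper names `to_punycode` and `to_unicode` are illustrative only.

```python
def to_punycode(domain: str) -> str:
    """Return the ASCII (punycode) form of an internationalized domain name."""
    # The built-in "idna" codec encodes each dot-separated label on its own
    # and leaves labels that are already plain ASCII untouched.
    return domain.encode("idna").decode("ascii")


def to_unicode(domain: str) -> str:
    """Return the native-character form of a punycode domain name."""
    return domain.encode("ascii").decode("idna")


print(to_punycode("мойсайт.рф"))
print(to_unicode("xn--p1ai"))  # prints "рф", the Russian IDN top-level domain
```

Converting in both directions is a quick way to double-check a domain before it goes into the file: the round trip should give back exactly what you started with.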

In what contexts do we typically encounter this problem?

This case typically arises during negative SEO campaigns targeting a site with thousands of spam links from automatically created Russian, Chinese, or Arabic domains. You obtain an export from Search Console or Ahrefs that contains these exotic domains, and you build your disavow file.

The classic error: opening the file in Notepad on Windows, copy-pasting the domains, and saving without checking the encoding. Older versions of Notepad save in ANSI by default, so non-Latin characters become question marks or squares, and the disavow fails silently. You wait weeks thinking Google is processing your request, while technically nothing is happening.

  • Disavow files must be encoded in UTF-8 to preserve non-Latin characters
  • The punycode format (xn--) is a safe alternative that avoids encoding issues
  • A poorly encoded disavow is accepted without error by Google but remains ineffective on corrupted lines
  • The affected domains are typically Russian (.ru, .рф), Chinese (.cn, .中国), Arabic, or using non-Latin alphabets
  • This issue mainly affects sites that are victims of mass international spam
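The corruption described above is easy to reproduce. The sketch below assumes the "ANSI" code page is Windows-1252 (the Western European default; other Windows locales use other code pages) and that the editor replaces unrepresentable characters instead of refusing to save.

```python
domain_line = "domain:мойсайт.рф"

# Windows-1252 has no Cyrillic letters, so a lossy save replaces each one
# with a question mark. The original characters cannot be recovered.
ansi_bytes = domain_line.encode("cp1252", errors="replace")
print(ansi_bytes.decode("cp1252"))  # prints "domain:???????.??"

# UTF-8 can represent every Unicode character, so the line round-trips intact.
utf8_bytes = domain_line.encode("utf-8")
print(utf8_bytes.decode("utf-8") == domain_line)  # prints "True"
```

The damaged line is still syntactically plausible text, which is consistent with the file being accepted while the intended domain is never matched.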

SEO Expert opinion

Is this recommendation aligned with field observations?

Absolutely. SEOs who manage multilingual sites or who have faced negative SEO attacks from Russian or Asian networks know this pitfall well. Silent failures on poorly encoded disavow files are common, especially among less technical practitioners using basic text editors.

Mueller's clarification is welcome, but it comes too late. This problem has existed since the launch of the disavow tool, and Google has never provided an explicit error message when encoding fails. Search Console accepts the file, confirms the submission, and you only discover the issue months later when analyzing why your toxic link profile hasn’t decreased.

Which method should be preferred between UTF-8 and punycode?

Punycode is objectively more reliable technically. It eliminates any ambiguity in encoding and works on any system, regardless of the text editor used. This is the method I consistently recommend in agencies to avoid human errors.

UTF-8 remains valid if you perfectly control your processing chain: data extraction, file editing, encoding verification before upload. But in an automated workflow or with junior collaborators, the risk of error is real. One weak link (a poorly configured Excel export, a copy-paste into the wrong editor) and you lose weeks.

What are the unmentioned limits of this statement?

Mueller does not specify how to verify that your file is correctly encoded before submission. There is no official Google validator to test the file in advance. You submit blindly and wait for processing to occur, with no feedback on potentially ignored lines. [To verify]: no official data on the processing time for a disavow file containing thousands of international domains.

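Another point: the statement assumes you correctly identify the domains to disavow. However, with non-Latin characters, the risk of typosquatting or visual confusion between similar characters (homoglyphs) increases dramatically. A Latin "a" (U+0061) and a Cyrillic "а" (U+0430), for example, look identical in most fonts but are different characters, so they belong to different domains.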
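One rough way to surface possible homoglyph domains is to flag any label that mixes letters from more than one script. The sketch below is a heuristic under stated assumptions, not a complete confusable-character check: it derives the script from the first word of each Unicode character name, and the function names are hypothetical.

```python
import unicodedata


def scripts_in_label(label: str) -> set[str]:
    """Return the rough script names used by the letters of one domain label."""
    scripts = set()
    for ch in label:
        if ch.isalpha():
            # Unicode names start with the script, for example
            # "LATIN SMALL LETTER A" or "CYRILLIC SMALL LETTER A".
            scripts.add(unicodedata.name(ch, "UNKNOWN").split(" ")[0])
    return scripts


def is_mixed_script(domain: str) -> bool:
    """Flag domains where a single label mixes letters from several scripts."""
    return any(len(scripts_in_label(label)) > 1 for label in domain.split("."))


# "ex\u0430mple.com" contains a Cyrillic "а" in place of the Latin "a".
print(is_mixed_script("ex\u0430mple.com"))  # prints "True"
print(is_mixed_script("example.com"))       # prints "False"
```

A flagged domain is not necessarily malicious, but it deserves a manual look before you decide which spelling actually links to you.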

Practical impact and recommendations

How to technically prepare a disavow file with non-Latin characters?

Recommended method: use an advanced text editor like Notepad++, Sublime Text, or VS Code that clearly displays the file encoding. Avoid the standard Windows Notepad or Mac TextEdit unless you have checked their encoding settings. When saving, explicitly select UTF-8 without BOM: the byte order mark adds three invisible bytes at the start of the file, which may interfere with parsing of the first line.

If you choose punycode, use an online IDN converter (punycoder.com or equivalent) to transform each domain before incorporating it into the file. This method eliminates any risk of encoding errors and ensures maximum compatibility. You get a purely ASCII file that any system can read, without dependency on encoding.
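If you would rather generate the file with a script than by hand, a minimal sketch could look like this. It assumes a domain-level disavow (`domain:` lines, with `#` for comments); the function name `write_disavow`, its parameters, and the comment text are illustrative, and the punycode conversion relies on Python's built-in `idna` codec.

```python
def write_disavow(domains: list[str], path: str, use_punycode: bool = True) -> None:
    """Write a domain-level disavow file as UTF-8 without a byte order mark."""
    lines = ["# Disavow file generated from a link audit"]
    for domain in domains:
        if use_punycode:
            # ASCII-only output: nothing left for an editor to corrupt.
            domain = domain.encode("idna").decode("ascii")
        lines.append(f"domain:{domain}")
    # encoding="utf-8" writes no BOM (unlike "utf-8-sig"); newline="\n"
    # keeps line endings identical on every operating system.
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.write("\n".join(lines) + "\n")
```

Calling `write_disavow(["мойсайт.рф", "example.com"], "disavow.txt")` would produce a pure ASCII file, while passing `use_punycode=False` keeps the native characters in UTF-8.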

What mistakes should be absolutely avoided?

The number one mistake: copy-pasting directly from Excel or Google Sheets into a text file without checking the encoding. Spreadsheets handle special characters poorly during text export, and you risk a silent corruption of non-Latin characters. If you work from a spreadsheet, export in CSV UTF-8, then open this CSV in an appropriate editor to create your disavow file.
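Reading the CSV export with a script sidesteps the copy-paste step entirely. The sketch below assumes the export has a header row with a column named `Domain` (a hypothetical name; adjust it to your tool's export) and that the file really is UTF-8.

```python
import csv


def read_domains(csv_path: str, column: str = "Domain") -> list[str]:
    """Read one column of domains from a UTF-8 CSV export."""
    # "utf-8-sig" decodes UTF-8 and drops the byte order mark that Excel's
    # "CSV UTF-8" export adds. Bytes that are not valid UTF-8 raise
    # UnicodeDecodeError instead of being silently replaced.
    with open(csv_path, encoding="utf-8-sig", newline="") as handle:
        reader = csv.DictReader(handle)
        return [row[column].strip().lower() for row in reader if row.get(column)]
```

The loud failure is the point: if the export was saved in the wrong encoding, you find out immediately instead of weeks after submission.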

The second common mistake: mixing UTF-8 and punycode formats in the same file. Technically, Google should handle both, but to avoid any ambiguity, choose one method only and apply it consistently across the whole file. Consistency also facilitates future maintenance of the file when you need to add or remove domains.
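If your source lists already mix both forms, one option is to normalize everything to punycode before building the file, which also removes duplicates that differ only in notation. This is a sketch using Python's built-in `idna` codec; `normalize_domains` is a hypothetical helper name.

```python
def normalize_domains(domains: list[str]) -> list[str]:
    """Convert every domain to punycode and drop duplicates, keeping order."""
    seen = set()
    result = []
    for domain in domains:
        # Lowercase first so "Example.com" and "example.com" compare equal.
        ascii_form = domain.strip().lower().encode("idna").decode("ascii")
        if ascii_form and ascii_form not in seen:
            seen.add(ascii_form)
            result.append(ascii_form)
    return result


mixed = ["мойсайт.рф", "xn--p1ai", "рф", "Example.com", "example.com"]
print(normalize_domains(mixed))
```

Here the native and punycode spellings of the same name collapse into a single entry, so the final file lists each domain once.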

How can you check if the disavow has been correctly processed?

No explicit confirmation is provided by Google beyond the generic message acknowledging receipt of the file. The only reliable method: monitor the evolution of your link profile in Search Console over 4 to 8 weeks. If disavowed domains still appear in link reports with the same volume, there has likely been a processing issue.

Also, test your file locally: open it in several different editors and check that non-Latin characters display correctly everywhere. If you see question marks, squares, or strange symbols, the encoding is incorrect. Fix it before submitting to Google. This manual check takes two minutes and saves you weeks of unnecessary waiting.
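The visual check can be complemented by a small script that inspects the raw bytes. The sketch below is not an official validator; it only looks for the corruption symptoms discussed in this article, and the function name `check_disavow_bytes` and its messages are illustrative.

```python
def check_disavow_bytes(data: bytes) -> list[str]:
    """Return a list of human-readable warnings for a disavow file's raw bytes."""
    problems = []
    if data.startswith(b"\xef\xbb\xbf"):
        problems.append("file starts with a UTF-8 byte order mark")
    try:
        # "utf-8-sig" tolerates the BOM reported above so the checks go on.
        text = data.decode("utf-8-sig")
    except UnicodeDecodeError as exc:
        return problems + [f"not valid UTF-8 (first bad byte at offset {exc.start})"]
    for number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no domains
        if "\ufffd" in line:
            problems.append(f"line {number}: contains a replacement character")
        elif line.startswith("domain:") and "?" in line:
            problems.append(f"line {number}: '?' in a domain, likely a lost character")
    return problems


print(check_disavow_bytes("domain:мойсайт.рф\n".encode("utf-8")))  # prints "[]"
print(check_disavow_bytes(b"domain:???????.??\n"))
```

An empty list does not prove Google will process the file as intended, but any warning is a strong hint that the file should be regenerated before upload.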

  • Use a text editor that explicitly supports UTF-8 without BOM (Notepad++, Sublime Text, VS Code)
  • Convert all IDN domains to punycode using an online converter to eliminate encoding risks
  • Never export directly from Excel/Sheets without checking the encoding of the resulting file
  • Maintain format consistency: either all in native UTF-8 or all in punycode, never a mix
  • Test the file display in multiple editors before submission to catch corruptions
  • Document the version (punycode vs UTF-8) used to facilitate future updates of the disavow file
Managing a disavow file with non-Latin characters requires advanced technical expertise in character encoding, parsing international domains, and format validation. Silent errors can compromise months of link profile cleaning work. For sites facing mass international spam or managing complex multilingual markets, working with a specialized technical SEO agency can secure this critical step and avoid implementation pitfalls that delay recovery after a penalty.

❓ Frequently Asked Questions

Can I mix UTF-8 and punycode in the same disavow file?
Technically yes: Google should handle both formats, but mixing them is not recommended because it invites ambiguity. Choose one method and apply it consistently across the whole file to ease maintenance and reduce the risk of errors.
How do I convert an IDN domain to punycode manually?
Use an online converter such as punycoder.com or idn-convert.com. Enter the domain with its native characters (for example, мойсайт.рф) and the tool generates the punycode version (xn--80arbjktj.xn--p1ai), which you can copy into your disavow file.
What happens if my disavow file is badly encoded?
Google accepts the file without an error message, but lines with corrupted characters are silently ignored. The affected domains continue to count in your link profile. You only discover the problem weeks later, when you notice the disavowal has had no effect.
Is Notepad on Windows suitable for creating a UTF-8 disavow file?
Not recommended. Older versions of standard Notepad save in ANSI by default, which corrupts non-Latin characters. Use Notepad++ (free), which lets you explicitly select UTF-8 without BOM when saving.
Do backlink tools always display IDN domains correctly?
No, they are inconsistent. Some tools display punycode, others native UTF-8. When you merge exports from multiple sources (Ahrefs + Majestic + Search Console), you risk duplicates or inconsistencies. Normalize all domains to punycode before building your disavow file to guarantee uniqueness.