Official statement
Other statements from this video (10):
- 3:51 Do you really need to respect case and syntax in noindex and nofollow tags?
- 4:49 Does a .com really handicap your international geotargeting?
- 6:54 Relevance and content quality: does Google really evaluate them separately?
- 8:27 Do localized words in your URLs really influence your Google ranking?
- 13:18 Blog in a subdomain or a subdirectory: what is the real impact on SEO?
- 18:20 Can mobile interstitials really harm your ranking?
- 24:39 Does switching to HTTPS really solve Panda filter problems?
- 26:10 Does structured data really influence Google rankings?
- 27:48 Can subdirectories be penalized independently of the rest of your site?
- 46:24 Does mobile-first indexing really change your SEO strategy?
Google clarifies that to disavow URLs containing non-Latin characters (Cyrillic, Chinese, Arabic, etc.), you must either encode your file in UTF-8 with the native characters or use the punycode version of the domain name. This clarification helps prevent processing errors in the disavow file that could make the disavowal ineffective, an important detail when cleaning up a link profile polluted by questionable international domains.
What you need to understand
Why is this technical detail about disavow file encoding important?
The disavow file works on a simple principle: you list the URLs or domains that you want Google to ignore when it evaluates your link profile. Things get more complicated when these URLs contain non-Latin characters: Russian Cyrillic, Chinese ideographs, the Arabic alphabet, Greek, Thai, and so on.
Without proper encoding, Google cannot correctly interpret the domain you are trying to disavow. The file is still accepted, but incorrectly encoded lines are simply ignored. You think you have cleaned up your profile, while the toxic links continue to be counted.
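To make the format concrete, here is a minimal Python sketch that assembles disavow file content. It assumes the format Google documents for the tool: one entry per line, a `domain:` prefix to disavow a whole domain, a full URL to disavow a single page, and `#` for comment lines. The `build_disavow` helper and the example domains are hypothetical.

```python
from typing import Iterable


def build_disavow(domains: Iterable[str], urls: Iterable[str] = ()) -> str:
    """Assemble disavow file text (hypothetical helper, for illustration only)."""
    lines = ["# Spam domains found during the backlink audit"]
    # Whole domains get the "domain:" prefix; set() drops duplicates, sorted() keeps output stable.
    for domain in sorted(set(domains)):
        lines.append(f"domain:{domain}")
    # Individual pages are listed as full URLs, one per line.
    for url in sorted(set(urls)):
        lines.append(url)
    return "\n".join(lines) + "\n"


print(build_disavow(["мойсайт.рф", "spam.example"], ["http://spam2.example/page.html"]), end="")
```

The native Cyrillic entry is exactly the kind of line whose survival depends on the encoding used when the text is finally saved.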
What is the difference between UTF-8 and punycode?
UTF-8 is a character encoding standard that allows for the direct display of native characters (e.g., мойсайт.рф). It's the most human-readable method. You copy-paste the domain as it appears in your backlink tool, save the text file in UTF-8 (not ANSI or ASCII), and Google processes it correctly.
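If you generate the file with a script rather than an editor, state the encoding explicitly instead of relying on the platform default. A minimal Python sketch, with a hypothetical helper and file name: `encoding="utf-8"` writes UTF-8 without a BOM, and reading the bytes back shows that each Cyrillic letter takes two bytes.

```python
import tempfile
from pathlib import Path


def save_disavow_utf8(text: str, path: Path) -> None:
    """Write disavow text as UTF-8 (Python's "utf-8" codec adds no BOM)."""
    # newline="\n" prevents Windows from translating line endings to "\r\n".
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write(text)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "disavow.txt"  # hypothetical file name
    save_disavow_utf8("domain:мойсайт.рф\n", target)
    raw = target.read_bytes()

print(len("domain:мойсайт.рф\n"), len(raw))  # 18 characters, 27 bytes
```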
Punycode is the ASCII representation of internationalized domain names (IDN). It transforms мойсайт.рф into xn--80arbjktj.xn--p1ai. This version works universally without any risk of encoding errors, but it is harder to check visually. Google accepts both methods; the choice is a matter of technical preference.
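Python's standard library can perform this conversion through its built-in `idna` codec, which follows the older IDNA 2003 rules (the third-party `idna` package implements IDNA 2008; for an ordinary Cyrillic name like this one both should agree, but edge cases can differ). A minimal sketch; the helper names are hypothetical:

```python
def to_punycode(domain: str) -> str:
    """Unicode domain -> ASCII "xn--" form, label by label (built-in IDNA 2003 codec)."""
    return domain.encode("idna").decode("ascii")


def from_punycode(domain: str) -> str:
    """ASCII "xn--" domain -> Unicode form, handy for checking entries visually."""
    return domain.encode("ascii").decode("idna")


ascii_form = to_punycode("мойсайт.рф")
print(ascii_form)                 # xn--80arbjktj.xn--p1ai
print(from_punycode(ascii_form))  # мойсайт.рф
```

Plain ASCII domains pass through unchanged, so the same function can be applied to every entry in the list.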
In what contexts do we typically encounter this problem?
This case typically arises during negative SEO campaigns targeting a site with thousands of spam links from automatically created Russian, Chinese, or Arabic domains. You obtain an export from Search Console or Ahrefs that contains these exotic domains, and you build your disavow file.
The classic error: opening Notepad on Windows, pasting the domains, and saving without checking the encoding. Older versions of Notepad save in ANSI by default, so non-Latin characters become question marks or squares and the disavow fails silently. You wait weeks thinking Google is processing your request, when technically nothing is happening.
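The failure is easy to reproduce. This Python sketch assumes a Western European system whose ANSI code page is Windows-1252 (`cp1252`); `errors="replace"` mimics an editor substituting `?` for every character the code page cannot store. On a Cyrillic locale (`cp1251`) Russian names would survive, but Chinese or Arabic ones would not. The helper name is hypothetical.

```python
def simulate_ansi_save(text: str, codepage: str = "cp1252") -> str:
    """Round-trip text through a legacy code page, as an ANSI save would (simulation only)."""
    # Characters the code page cannot represent are replaced by "?" and are lost for good.
    return text.encode(codepage, errors="replace").decode(codepage)


print(simulate_ansi_save("domain:мойсайт.рф"))    # domain:???????.??
print(simulate_ansi_save("domain:spam.example"))  # domain:spam.example
```

The corrupted line still looks like an ordinary entry, which is why nothing draws your attention to it.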
- Disavow files must be encoded in UTF-8 to preserve non-Latin characters
- The punycode format (xn--) is a safe alternative that avoids encoding issues
- A poorly encoded disavow is accepted without error by Google but remains ineffective on corrupted lines
- The affected domains are typically Russian (.ru, .рф), Chinese (.cn, .中国), Arabic, or using non-Latin alphabets
- This issue mainly affects sites that are victims of mass international spam
SEO Expert opinion
Is this recommendation aligned with field observations?
Absolutely. SEOs who manage multilingual sites or who have faced negative SEO attacks from Russian or Asian networks know this pitfall well. Silent failures caused by poorly encoded disavow files are common in practice, especially among less technical practitioners who use basic text editors.
Mueller's clarification is welcome, but it comes too late. This problem has existed since the launch of the disavow tool, and Google has never provided an explicit error message when encoding fails. Search Console accepts the file, confirms the submission, and you only discover the issue months later when analyzing why your toxic link profile hasn’t decreased.
Which method should be preferred between UTF-8 and punycode?
Punycode is objectively more reliable technically. It eliminates any ambiguity in encoding and works on any system, regardless of the text editor used. This is the method I consistently recommend in agencies to avoid human errors.
UTF-8 remains valid if you perfectly control your processing chain: data extraction, file editing, encoding verification before upload. But in an automated workflow or with junior collaborators, the risk of error is real. One weak link (a poorly configured Excel export, a copy-paste into the wrong editor) and you lose weeks.
What are the unmentioned limits of this statement?
Mueller does not specify how to verify that your file is correctly encoded before submission. There is no official Google validator to test the file in advance. You submit blindly and wait for processing to occur, with no feedback on potentially ignored lines. [To verify]: no official data on the processing time for a disavow file containing thousands of international domains.
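Another point: the statement assumes you correctly identify the domains to disavow. With non-Latin characters, however, the risk of typosquatting and of visual confusion between look-alike characters (homoglyphs) rises sharply. A Latin "a" (U+0061) and a Cyrillic "а" (U+0430), for example, look identical on screen but are different characters, so they produce different domains.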
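One way to surface look-alike characters before they reach the file is to flag domain labels that mix letters from several scripts. Python's standard library has no direct script property, so this sketch approximates the script with the first word of each character's Unicode name (`LATIN`, `CYRILLIC`, `CJK`, and so on). It is a heuristic, the function names are hypothetical, and it should be applied to the Unicode form of each domain (decode `xn--` entries first).

```python
import unicodedata


def label_scripts(label: str) -> set[str]:
    """Approximate the scripts used by the letters of one domain label."""
    scripts = set()
    for ch in label:
        if ch.isalpha():
            # e.g. "CYRILLIC SMALL LETTER A" -> "CYRILLIC", "LATIN SMALL LETTER A" -> "LATIN"
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def mixed_script_labels(domain: str) -> list[str]:
    """Return the labels of a Unicode domain that mix letters from more than one script."""
    return [label for label in domain.split(".") if len(label_scripts(label)) > 1]


genuine = "example.com"
lookalike = "ex\u0430mple.com"  # Cyrillic "а" (U+0430) in place of Latin "a"
print(genuine == lookalike)               # False: different strings, hence different domains
print(mixed_script_labels(lookalike))     # the first label is flagged
print(mixed_script_labels("мойсайт.рф"))  # [] (a consistently Cyrillic name is not flagged)
```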
Practical impact and recommendations
How to technically prepare a disavow file with non-Latin characters?
Recommended method: use an advanced text editor such as Notepad++, Sublime Text, or VS Code, which clearly displays the file encoding. Avoid the standard Windows Notepad or Mac TextEdit unless you have configured them first. When saving, explicitly select UTF-8 without BOM: a byte order mark can cause parsing issues, so it is safer to leave it out.
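A BOM is simply three bytes (`EF BB BF`) at the start of the file. A naive UTF-8 reader keeps it as an invisible `U+FEFF` character glued to the first entry, so that line no longer starts with `domain:`. The source does not say how Google's parser handles this; the Python sketch below only shows the difference and one way to strip the mark (the `strip_bom` helper is hypothetical).

```python
import codecs

entry = "domain:мойсайт.рф\n"
with_bom = entry.encode("utf-8-sig")  # "utf-8-sig" prepends the BOM
without_bom = entry.encode("utf-8")   # plain "utf-8" does not

print(with_bom[:3] == codecs.BOM_UTF8)                 # True
print(with_bom.decode("utf-8").startswith("domain:"))  # False: U+FEFF comes first


def strip_bom(raw: bytes) -> bytes:
    """Drop a leading UTF-8 BOM, if present, before saving or uploading."""
    return raw[len(codecs.BOM_UTF8):] if raw.startswith(codecs.BOM_UTF8) else raw


print(strip_bom(with_bom) == without_bom)              # True
```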
If you choose punycode, use an online IDN converter (punycoder.com or equivalent) to transform each domain before incorporating it into the file. This method eliminates any risk of encoding errors and ensures maximum compatibility. You get a purely ASCII file that any system can read, without dependency on encoding.
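For long lists, the conversion can also be scripted instead of pasting domains into an online converter one by one. This minimal sketch relies on Python's built-in `idna` codec: it converts `domain:` entries and the host part of URL entries, and keeps comments unchanged. Non-ASCII characters in URL paths are left as they are, so a file containing such URLs still has to be saved in UTF-8. The function names are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def host_to_ascii(host: str) -> str:
    """IDN host -> "xn--" form via the built-in IDNA 2003 codec; ASCII hosts are unchanged."""
    return host.encode("idna").decode("ascii")


def line_to_punycode(line: str) -> str:
    """Convert one disavow line; comments and blank lines pass through."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return stripped
    if stripped.startswith("domain:"):
        return "domain:" + host_to_ascii(stripped[len("domain:"):])
    # Otherwise treat the line as a URL and convert only its host (path left untouched).
    parts = urlsplit(stripped)
    netloc = host_to_ascii(parts.hostname or "")
    if parts.port is not None:
        netloc += f":{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


sample = ["# IDN spam", "domain:мойсайт.рф", "http://мойсайт.рф/page.html", "domain:spam.example"]
print("\n".join(line_to_punycode(line) for line in sample))
```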
What mistakes should be absolutely avoided?
The number one mistake: copy-pasting directly from Excel or Google Sheets into a text file without checking the encoding. Depending on the export format and the system locale, a spreadsheet may write text in a legacy code page rather than UTF-8, silently corrupting non-Latin characters. If you work from a spreadsheet, export as CSV UTF-8, then open that CSV in a suitable editor to build your disavow file.
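Reading the export programmatically is another way to make the encoding explicit. This Python sketch assumes a CSV UTF-8 export with a column named `Domain` (a hypothetical header; adjust it to your tool's export). Decoding with `utf-8-sig` accepts files with or without a BOM, and strict decoding fails loudly on a file saved in a legacy code page instead of corrupting it silently.

```python
import csv
import io


def domains_from_csv(raw: bytes, column: str = "Domain") -> list[str]:
    """Extract one column from a CSV UTF-8 export (hypothetical column name)."""
    # "utf-8-sig" strips a leading BOM if present; strict decoding raises
    # UnicodeDecodeError on bytes that are not valid UTF-8.
    text = raw.decode("utf-8-sig")
    reader = csv.DictReader(io.StringIO(text, newline=""))
    domains = []
    for row in reader:
        value = (row.get(column) or "").strip()
        if value:
            domains.append(value)
    return domains


export = b"\xef\xbb\xbf" + "Domain,Links\nмойсайт.рф,120\nspam.example,3\n".encode("utf-8")
print(domains_from_csv(export))  # ['мойсайт.рф', 'spam.example']
```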
The second common mistake: mixing UTF-8 and punycode formats in the same file. Technically, Google should handle both, but to avoid any ambiguity, choose one method only and apply it consistently across the whole file. Consistency also facilitates future maintenance of the file when you need to add or remove domains.
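A quick automated check can enforce that rule before each upload. This Python sketch (hypothetical function) treats an entry as native if it contains any non-ASCII character and as punycode if it contains an `xn--` label; comments and blank lines are ignored. It is a heuristic, not an official validation.

```python
def mixes_formats(lines: list[str]) -> bool:
    """True if the file contains both native Unicode entries and punycode (xn--) entries."""
    has_native = has_punycode = False
    for line in lines:
        entry = line.strip().lower()
        if not entry or entry.startswith("#"):
            continue  # blank lines and comments do not count
        if not entry.isascii():
            has_native = True
        if "xn--" in entry:
            has_punycode = True
    return has_native and has_punycode


print(mixes_formats(["domain:мойсайт.рф", "domain:xn--80arbjktj.xn--p1ai"]))  # True
print(mixes_formats(["domain:мойсайт.рф", "domain:spam.example"]))            # False
```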
How can you check if the disavow has been correctly processed?
Google provides no explicit confirmation beyond the generic message acknowledging receipt of the file. The only available signal is indirect: monitor how your link profile evolves in Search Console over 4 to 8 weeks. Read it with caution, though: Google has indicated that disavowed links can remain visible in Search Console link reports, so seeing them there does not by itself prove a processing issue. Treat it as a prompt to re-check the file's encoding rather than as a verdict.
Also, test your file locally: open it in several different editors and check that non-Latin characters display correctly everywhere. If you see question marks, squares, or strange symbols, the encoding is incorrect. Fix it before submitting to Google. This manual check takes two minutes and saves you weeks of unnecessary waiting.
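Part of this check can also be automated. The Python sketch below is a hypothetical local pre-check, not a Google tool: it rejects bytes that are not valid UTF-8, reports a leading BOM, and flags replacement characters or question marks inside `domain:` entries, the typical traces of an ANSI save. It complements, rather than replaces, opening the file and reading it.

```python
import codecs


def check_disavow_bytes(raw: bytes) -> list[str]:
    """Return warnings for a disavow file's raw bytes (an empty list means nothing was found)."""
    problems = []
    if raw.startswith(codecs.BOM_UTF8):
        problems.append("file starts with a UTF-8 BOM")
        raw = raw[len(codecs.BOM_UTF8):]
    try:
        text = raw.decode("utf-8")  # strict: bytes saved in a legacy code page usually fail here
    except UnicodeDecodeError as err:
        return problems + [f"not valid UTF-8 (byte offset {err.start})"]
    for number, line in enumerate(text.splitlines(), start=1):
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue
        if "\ufffd" in entry:
            problems.append(f"line {number}: contains the U+FFFD replacement character")
        # "?" is legitimate in URL query strings, so only domain entries are checked.
        if entry.startswith("domain:") and "?" in entry:
            problems.append(f"line {number}: '?' in a domain entry (characters probably lost)")
    return problems


print(check_disavow_bytes(b"domain:???????.??\n"))
```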
- Use a text editor that explicitly supports UTF-8 without BOM (Notepad++, Sublime Text, VS Code)
- Convert all IDN domains to punycode using an online converter to eliminate encoding risks
- Never export directly from Excel/Sheets without checking the encoding of the resulting file
- Maintain format consistency: either all in native UTF-8 or all in punycode, never a mix
- Test the file display in multiple editors before submission to catch corruptions
- Document the version (punycode vs UTF-8) used to facilitate future updates of the disavow file
❓ Frequently Asked Questions
Can I mix UTF-8 and punycode in the same disavow file?
How do I convert an IDN domain to punycode manually?
What happens if my disavow file is incorrectly encoded?
Is Windows Notepad suitable for creating a UTF-8 disavow file?
Do backlink tools always display IDN domains correctly?
🎥 From the same video 10
Other SEO insights extracted from this same Google Search Central video · duration 57 min · published on 10/01/2017