Official statement

When assessing the damage caused by a hack, connect to the file system to check for modified or added files. Compare the checksums of current files with those from a valid backup to detect changes.
Source: Google Search Central video (5:20, in English, published 12/03/2013, 3 statements extracted). This statement starts at 1:06.
Other statements from this video (2):
  1. 2:40 Why are server logs and .htaccess critical for assessing the damage from an SEO attack?
  2. 3:44 How do you clean up a hacked WordPress site without hurting its SEO?
TL;DR

Google recommends using checksums to pinpoint the files compromised by a hack. This systematic approach separates legitimate changes from malicious alterations, so you avoid deleting healthy content. Comparing against a clean backup remains the most reliable way to map the real extent of the damage on a hacked site.

What you need to understand

Why are checksums more reliable than visual inspection?

A sophisticated SEO hack doesn't just add visible spam pages. Hackers often inject obfuscated code into legitimate system files: .htaccess, wp-config.php, functions.php, or even third-party JavaScript libraries.

Manual inspection quickly becomes impossible on a site containing thousands of files. A checksum (MD5 or SHA-256) computes a fingerprint for each file, and any modification, even a single character, changes that fingerprint. Prefer SHA-256 for security audits, since MD5 is no longer collision-resistant.
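To see why a single-character change is enough, here is a minimal Python sketch using the standard hashlib module. The template line is a made-up example, not taken from a real site.

```python
import hashlib


def sha256_of_bytes(data: bytes) -> str:
    """Return the hex SHA-256 digest of raw bytes."""
    return hashlib.sha256(data).hexdigest()


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in fixed-size chunks so large files never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Hypothetical template line before and after a single appended character.
    clean = b"<?php get_footer(); ?>"
    tampered = clean + b" "
    print(sha256_of_bytes(clean))
    print(sha256_of_bytes(tampered))
    print(sha256_of_bytes(clean) == sha256_of_bytes(tampered))  # False
```

In practice you would run sha256_of_file over every file in the tree and keep the results as your baseline.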

This method detects invisible backdoors, conditional redirects that only trigger for Googlebot, and base64-encoded link injections, as long as they are stored in files. A file whose checksum no longer matches the one from the clean backup has definitely been altered, although not necessarily by an attacker.

What should you do if you don't have a reference backup?

Critical case: you discover a hack, but your backups are corrupted or too old. The strategy changes completely.

You then need to compare your files against a clean installation of the exact same CMS version (WordPress 6.4.2, Drupal 10.1.5, etc.). Core files that differ point to suspicious modifications. For custom files (themes, proprietary plugins), rebuild a baseline from your Git history or ask the developers for the clean sources.
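The comparison against a clean install can be sketched in Python with only the standard library. This is a minimal illustration, assuming you have extracted the exact same CMS version into a separate directory; the function names and paths are hypothetical.

```python
import hashlib
import sys
from pathlib import Path


def hash_tree(root: Path) -> dict:
    """Map each file's path (relative to root, POSIX style) to its SHA-256 digest."""
    hashes = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            hashes[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return hashes


def compare_to_reference(site_root: Path, reference_root: Path) -> dict:
    """Compare a live tree with a clean install of the same CMS version."""
    site = hash_tree(site_root)
    reference = hash_tree(reference_root)
    shared = site.keys() & reference.keys()
    return {
        # Shipped by the CMS but different on the site: suspicious.
        "modified": sorted(p for p in shared if site[p] != reference[p]),
        # Not shipped by the CMS: custom code or injected files.
        "added": sorted(site.keys() - reference.keys()),
        # Shipped by the CMS but absent from the site.
        "missing": sorted(reference.keys() - site.keys()),
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical paths): python compare_tree.py /var/www/html /tmp/wordpress-clean
    report = compare_to_reference(Path(sys.argv[1]), Path(sys.argv[2]))
    for status, paths in report.items():
        for rel in paths:
            print(f"{status}\t{rel}")
```

Expect the added list to include your own themes, plugins and uploads: those files need the Git or developer baseline described above.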

This approach takes longer and cannot tell recent legitimate customizations apart from malicious edits: both show up as differences. That is why daily incremental backups become non-negotiable as soon as a site generates significant organic traffic.

How do you interpret the results from a checksum comparison?

You run a verification script and get a list of 347 modified files. Not all are compromised. Cache files, logs, and user session files change constantly.

You need to sort by criticality: PHP files in wp-admin/, .htaccess at the root, JavaScript files loaded on all pages. A modified log.txt file doesn't carry the same urgency as a header.php injected with hidden links.

  • Modified core files: absolute priority, almost always malicious
  • Configuration files (.htaccess, wp-config.php, robots.txt): check line by line
  • Templates and includes: look for iframes, external scripts, PHP redirects
  • JavaScript/CSS files: inspect minified code for suspicious third-party domains
  • Base64-encoded or obfuscated code: treat as malicious until proven otherwise
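The priority list above can be turned into a first triage pass over the modified files. A minimal Python sketch, assuming WordPress-style relative paths; the glob rules, noise patterns and red-flag regular expressions are illustrative assumptions, not a complete malware signature set.

```python
import re
from fnmatch import fnmatchcase

# Illustrative WordPress-style rules mirroring the priority list; lower number = handle first.
PRIORITY_RULES = [
    (1, "core file", ["wp-admin/*", "wp-includes/*"]),
    (2, "configuration", [".htaccess", "*/.htaccess", "wp-config.php", "robots.txt"]),
    (3, "template/include", ["wp-content/themes/*.php"]),
    (4, "front-end asset", ["*.js", "*.css"]),
]

# Paths that change constantly and are usually noise (cache, logs, sessions).
NOISE_PATTERNS = ["wp-content/cache/*", "*.log", "*/sessions/*"]

# Content red flags: obfuscation, hidden iframes, external scripts.
RED_FLAGS = {
    "base64_decode": re.compile(r"base64_decode\s*\("),
    "eval": re.compile(r"\beval\s*\("),
    "iframe": re.compile(r"<iframe\b", re.IGNORECASE),
    "external_script": re.compile(r"<script[^>]+src=[\"']?(https?:)?//", re.IGNORECASE),
}


def triage(path: str, content: str = "") -> tuple:
    """Return (priority, label, red_flags) for one modified file; 99 means noise."""
    flags = [name for name, pattern in RED_FLAGS.items() if pattern.search(content)]
    # Red flags are checked first so a backdoor dropped into a cache folder is not dismissed.
    if not flags and any(fnmatchcase(path, p) for p in NOISE_PATTERNS):
        return (99, "noise", [])
    priority, label = 5, "other"
    for rule_priority, rule_label, patterns in PRIORITY_RULES:
        if any(fnmatchcase(path, p) for p in patterns):
            priority, label = rule_priority, rule_label
            break
    if flags:
        # Obfuscated or injected-looking code jumps to the top of the queue.
        priority = min(priority, 1)
    return (priority, label, flags)


if __name__ == "__main__":
    # Hypothetical output of a checksum comparison: path -> current content.
    changed = {
        "wp-content/cache/index.html": "",
        "wp-content/themes/demo/header.php": "<iframe src='//spam.example'></iframe>",
        "wp-includes/version.php": "",
        ".htaccess": "",
    }
    for path in sorted(changed, key=lambda p: triage(p, changed[p])[0]):
        print(triage(path, changed[path]), path)
```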

SEO Expert opinion

Does this recommendation really cover all attack vectors?

Let's be honest: focusing solely on files ignores half the problem. Many modern SEO hacks store their payload directly in the database: wp_posts tables polluted with thousands of spam pages, wp_usermeta entries injected with scripts, autoloaded wp_options rows stuffed with redirects.

A careful attacker can leave the files intact and work exclusively through SQL queries. Your checksum comparison then comes back clean while 15,000 Viagra pages saturate your Google index. [To be verified] Google's statement does not specify how to audit the database in this context.

Combine the file approach with a full SQL diff: export the current database, compare it with a clean backup, and look for suspicious patterns (URLs with random query strings, content in foreign languages, identical meta descriptions). Tools such as WP-CLI (wp db export, wp db query) or Adminer support this level of analysis.
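A SQL diff can start as simply as fingerprinting rows in both snapshots. The Python sketch below is a minimal illustration that uses an in-memory SQLite table standing in for wp_posts; a real WordPress database runs on MySQL or MariaDB, so you would load the clean and current exports instead. The reduced schema, the helper names and the spam patterns are assumptions for the example.

```python
import hashlib
import re
import sqlite3

# Reduced stand-in for WordPress's wp_posts table (the real one has many more columns).
SCHEMA = "CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_title TEXT, post_content TEXT)"

# Illustrative spam patterns; extend them with what you actually see in the index.
SUSPICIOUS = [
    re.compile(r"viagra|casino|replica", re.IGNORECASE),  # classic spam keywords
    re.compile(r"<script[^>]+src=", re.IGNORECASE),       # injected external scripts
    re.compile(r"\?[a-z0-9]{8,}=", re.IGNORECASE),        # random-looking query strings
]


def row_hashes(conn: sqlite3.Connection) -> dict:
    """Fingerprint every post row so two snapshots can be compared cheaply."""
    rows = conn.execute("SELECT ID, post_title, post_content FROM wp_posts")
    return {
        row_id: hashlib.sha256(f"{title}\x00{content}".encode()).hexdigest()
        for row_id, title, content in rows
    }


def diff_posts(clean: sqlite3.Connection, current: sqlite3.Connection) -> dict:
    """Report rows added, changed or deleted since the clean snapshot, and flag spam."""
    before, after = row_hashes(clean), row_hashes(current)
    added = sorted(after.keys() - before.keys())
    changed = sorted(i for i in after.keys() & before.keys() if after[i] != before[i])
    flagged = []
    for row_id in added + changed:
        title, content = current.execute(
            "SELECT post_title, post_content FROM wp_posts WHERE ID = ?", (row_id,)
        ).fetchone()
        if any(p.search(f"{title} {content}") for p in SUSPICIOUS):
            flagged.append(row_id)
    return {
        "added": added,
        "changed": changed,
        "deleted": sorted(before.keys() - after.keys()),
        "flagged": flagged,
    }


def make_db(rows) -> sqlite3.Connection:
    """Build an in-memory snapshot from (ID, title, content) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO wp_posts VALUES (?, ?, ?)", rows)
    return conn


if __name__ == "__main__":
    clean = make_db([(1, "About us", "Welcome"), (2, "Contact", "Email us")])
    current = make_db([
        (1, "About us", "Welcome"),
        (2, "Contact", "Email us <script src='https://evil.example/x.js'></script>"),
        (3, "Cheap viagra", "Buy now"),
    ])
    print(diff_posts(clean, current))
```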

Do checksums detect polymorphic infections?

Some sophisticated malware regenerates its own code at regular intervals, changing its fingerprint as often as every hour to evade signature-based detection. Today's checksum then matches neither the backup nor yesterday's scan.

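In this case, comparison against a clean backup still flags the file as changed, but hash-based signatures and scan-to-scan comparisons become useless, and the file can be rewritten again right after cleanup. You need to add behavioral detection: monitoring suspicious outgoing connections, tracking file changes in real time, and looking for abnormal SQL queries. Security tools such as Sucuri or Wordfence combine signature scanning with this kind of monitoring.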
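A polling loop is the simplest approximation of file-change tracking. The Python sketch below is a minimal illustration, assuming you choose the watched paths yourself; a production setup would typically rely on an event-based mechanism such as Linux inotify rather than polling, and the function names are hypothetical.

```python
from __future__ import annotations

import hashlib
import time
from pathlib import Path


def snapshot(paths: list[Path]) -> dict[str, str | None]:
    """Hash each watched file; None marks a file that is absent."""
    return {
        str(p): hashlib.sha256(p.read_bytes()).hexdigest() if p.is_file() else None
        for p in paths
    }


def changes(old: dict[str, str | None], new: dict[str, str | None]) -> list[str]:
    """Watched paths created, modified or deleted between two snapshots."""
    return sorted(p for p in new if old.get(p) != new[p])


def watch(paths: list[Path], interval: float = 5.0, rounds: int | None = None):
    """Poll the files every `interval` seconds, yielding the changed paths each time."""
    previous = snapshot(paths)
    done = 0
    while rounds is None or done < rounds:
        time.sleep(interval)
        current = snapshot(paths)
        changed = changes(previous, current)
        if changed:
            yield changed
        previous = current
        done += 1
```

For example, for changed in watch([Path("wp-config.php"), Path(".htaccess")], interval=10): print("changed:", changed) runs until interrupted and reports every rewrite, which is exactly what a regenerating infection produces.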

What is the acceptable margin of error when cleaning up?

You identify 89 suspicious files. Deleting one legitimate file breaks the site. Leaving one infected file keeps the backdoor active. The risk is asymmetrical.

The cautious methodology: isolate the site (maintenance mode, crawlers blocked via robots.txt), test each deletion in a staging environment, and confirm that critical features still work. Only then deploy to production. On a large site this can take hours or even days.

This is why some SEOs prefer to start from a clean installation and reinject only validated content. It is more radical, but it eliminates backdoors hidden in files, as long as the reinjected content and database are themselves clean. The right trade-off depends on your risk tolerance and the quality of your backups.

Practical impact and recommendations

What tools should you use to generate and compare checksums?

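On a Linux server, the md5sum or sha256sum commands generate the fingerprints. For example, run from the site root: find . -type f -exec sha256sum {} + | sort -k 2 > /tmp/checksums_current.txt lists every file's hash, sorted by path so that two runs line up. Writing the output outside the scanned directory keeps the manifest from hashing itself. Generate a matching file from the root of your clean backup the same way, then compare the two with diff /tmp/checksums_backup.txt /tmp/checksums_current.txt.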
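A raw diff works, but sorting its result into added, removed and modified files makes triage easier. A minimal Python sketch, assuming both manifests were produced by md5sum or sha256sum from the same root directory; GNU coreutils escapes unusual filenames (backslashes, newlines), which this parser does not handle. The script name in the usage comment is hypothetical.

```python
import sys
from pathlib import Path


def parse_manifest(text: str) -> dict:
    """Parse md5sum/sha256sum output lines ("<hash>  <path>") into {path: hash}."""
    entries = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        digest, rest = line.split(" ", 1)
        # rest[0] is " " in text mode or "*" in binary mode; the path follows it.
        entries[rest[1:]] = digest
    return entries


def diff_manifests(backup: dict, current: dict) -> dict:
    """Classify paths as modified, added or removed relative to the backup."""
    return {
        "modified": sorted(p for p in backup.keys() & current.keys() if backup[p] != current[p]),
        "added": sorted(current.keys() - backup.keys()),
        "removed": sorted(backup.keys() - current.keys()),
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python diff_manifests.py /tmp/checksums_backup.txt /tmp/checksums_current.txt
    result = diff_manifests(
        parse_manifest(Path(sys.argv[1]).read_text()),
        parse_manifest(Path(sys.argv[2]).read_text()),
    )
    for status, paths in result.items():
        for path in paths:
            print(f"{status}\t{path}")
```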

For WordPress specifically, WP-CLI simplifies the process: wp core verify-checksums compares your core files with the official checksums published by WordPress.org, and wp plugin verify-checksums --all does the same for plugins hosted in the WordPress.org directory. Premium plugins, custom themes and other custom files still require a manual comparison.

Tools such as Integrity Checker (a WordPress plugin) or AIDE (Advanced Intrusion Detection Environment, a command-line tool usually scheduled with cron) automate this monitoring and alert you when a critical file changes. Initial setup takes about 2-3 hours; the time saved during an incident is considerable.

How can you prioritize cleanup to minimize SEO impact?

Every minute counts when Google is actively crawling your spam pages. The optimal sequence: block bot access via robots.txt or .htaccess, remove the most visible infected files (those generating indexed URLs), clean the database, then restore access gradually.

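Monitor Search Console during and after the cleanup. If you submit URL removal requests, spam pages should typically drop out of search results within 48-72 hours, but the Removals tool only hides them temporarily (about six months): the URLs must also return 404 or 410 to be dropped for good. A spike in 404 errors is normal, since these pages existed only because of the hack. Do not redirect them to legitimate pages, as this would dilute your authority.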
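To confirm that cleaned-up spam URLs really return 404 or 410 rather than a redirect, here is a minimal standard-library Python sketch. The URL list would come from Search Console or your logs, and the function names are hypothetical. It sends HEAD requests without following redirects, so a server that answers HEAD differently from GET may need a GET-based variant.

```python
import http.client
from urllib.parse import urlsplit


def status_of(url: str, timeout: float = 10.0) -> int:
    """Send a HEAD request without following redirects; return the HTTP status."""
    parts = urlsplit(url)
    conn_cls = http.client.HTTPSConnection if parts.scheme == "https" else http.client.HTTPConnection
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("HEAD", target)
        return conn.getresponse().status
    finally:
        conn.close()


def audit_spam_urls(urls: list) -> dict:
    """Sort former spam URLs by how the server now answers them."""
    report = {"gone": [], "redirected": [], "still_live": [], "other": []}
    for url in urls:
        code = status_of(url)
        if code in (404, 410):
            report["gone"].append(url)        # the desired outcome
        elif 300 <= code < 400:
            report["redirected"].append(url)  # should return 404/410 instead
        elif 200 <= code < 300:
            report["still_live"].append(url)  # spam content may still be served
        else:
            report["other"].append(url)       # 5xx, 405, etc.: check manually
    return report
```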

Once the site is clean, submit a reconsideration request (or a review request from the Security Issues report in Search Console) if you received a manual action or a security warning. Document the corrective actions precisely: files deleted, passwords changed, vulnerabilities patched. A detailed request is easier for Google to verify.

Should you always consult a specialist?

A basic hack (link injections in a footer, a few spam pages) can be handled internally if you are familiar with SSH and databases. A sophisticated hack (multiple backdoors, obfuscated code, automatic reinfection) requires specialized expertise.

Warning signs: the site is reinfected within 24 hours of cleanup, you find code you don't understand, or Search Console shows thousands of unknown indexed pages. At this point, a professional audit becomes cost-effective. A specialized SEO agency can bring advanced forensic tools and handle communication with Google, which can help speed up the deindexing of spam pages and restore your reputation.

  • Generate a snapshot of checksums for all your critical files every week (automatable via cron)
  • Maintain at least 3 complete backups (files + database) in distinct locations
  • Update CMS, themes, and plugins within 48 hours of a security patch
  • Enable real-time monitoring of file changes (.htaccess, wp-config.php, functions.php)
  • Document every legitimate change precisely to avoid false positives during comparisons
  • Test restoration from backup at least once a quarter (an untested backup does not exist)
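The first item of this checklist, a weekly checksum snapshot run by cron, could look like the Python sketch below. The list of critical paths and the output naming scheme are assumptions to adapt to your own site.

```python
from __future__ import annotations

import hashlib
import sys
from datetime import date
from pathlib import Path

# Hypothetical list of critical entries (files or directories) for a WordPress site.
CRITICAL = (".htaccess", "wp-config.php", "robots.txt", "wp-content/themes")


def iter_files(root: Path, entries: tuple[str, ...]):
    """Yield existing files for each entry, expanding directories recursively."""
    for entry in entries:
        target = root / entry
        if target.is_file():
            yield target
        elif target.is_dir():
            yield from sorted(p for p in target.rglob("*") if p.is_file())


def write_snapshot(root: Path, out_dir: Path, entries: tuple[str, ...] = CRITICAL) -> Path:
    """Write '<sha256>  <relative path>' lines (sha256sum layout) to a dated file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    out = out_dir / f"checksums_{date.today().isoformat()}.txt"
    lines = [
        f"{hashlib.sha256(p.read_bytes()).hexdigest()}  {p.relative_to(root).as_posix()}"
        for p in iter_files(root, entries)
    ]
    out.write_text("\n".join(lines) + "\n")
    return out


if __name__ == "__main__" and len(sys.argv) == 3:
    print(write_snapshot(Path(sys.argv[1]), Path(sys.argv[2])))
```

A crontab line such as 0 3 * * 1 /usr/bin/python3 /opt/scripts/snapshot.py /var/www/html /var/backups/checksums would run it every Monday at 03:00 (all paths hypothetical). Because the layout matches sha256sum output, running sha256sum -c on a snapshot from the site root re-verifies those files.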
The checksum approach forms the technical foundation of an effective cleanup, but it is not sufficient on its own. It must be part of a comprehensive strategy that includes database audits, behavioral monitoring, and proactive communication with Google. The increasing complexity of SEO attacks often makes a specialized agency worthwhile, especially for sites that generate significant revenue from organic traffic.

❓ Frequently Asked Questions

Can checksums detect a zero-day hack that is still unknown?
Yes, that is precisely their strength. Unlike antivirus tools that look for known signatures, checksums detect any file modification, whatever the nature of the malware, as long as the attack touches files on disk. An altered file produces a different fingerprint, even if the exploit is brand new.
Should you compare checksums only for PHP files, or also for JS/CSS?
All files executed on the server or the client. Hackers frequently inject malicious code into JavaScript files to redirect visitors or steal data. CSS files can contain imports from malicious domains.
How often should checksums be verified on a production site?
Automate a daily check of core files and a weekly check of the whole site. Trigger an immediate check after any Search Console alert or suspicious traffic spike. Real-time monitoring remains the ideal for critical sites.
Is a file legitimately modified by an update a problem?
Only if your baseline is stale: the file will be flagged as modified. That is why you must update your checksum baseline after every legitimate change (update, new plugin, customization). Otherwise, the true positives get drowned in a flood of false alerts.
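Refreshing the baseline selectively keeps the signal clean: only files you have reviewed and approved get new hashes. A minimal Python sketch, assuming the baseline is held as a {relative_path: sha256} dictionary (a hypothetical format, for example parsed from a sha256sum manifest).

```python
import hashlib
from pathlib import Path


def refresh_baseline(baseline: dict, root: Path, approved: list) -> dict:
    """Return a new baseline in which only explicitly approved files are re-hashed.

    Every other entry keeps its old hash, so unreviewed changes still show up as diffs.
    """
    updated = dict(baseline)  # never mutate the caller's baseline
    for rel in approved:
        path = root / rel
        if path.is_file():
            updated[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
        else:
            updated.pop(rel, None)  # approved deletion: drop it from the baseline
    return updated
```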
Does shared hosting allow this level of file access?
Yes, via FTP/SFTP or the cPanel File Manager, but running automated scripts may be restricted. VPS and dedicated servers offer full control. Some hosts provide built-in integrity-checking tools.