Official statement
Google recommends using checksums to precisely identify compromised files after a hack. This systematic approach allows you to distinguish legitimate changes from malicious alterations, thus avoiding the removal of healthy content. Comparing with a clean backup remains the only reliable method to map the actual extent of damage on a hacked site.
What you need to understand
Why are checksums more reliable than visual inspection?
A sophisticated SEO hack doesn't just add visible spam pages. Hackers often inject obfuscated code into legitimate system files: .htaccess, wp-config.php, functions.php, or even third-party JavaScript libraries.
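Manual inspection quickly becomes impossible on a site containing thousands of files. A hash function such as SHA-256 (or the older MD5) computes a fingerprint for each file that is, in practice, unique to its exact contents: change even a single character and the fingerprint changes. Prefer SHA-256, since MD5 collisions can be engineered deliberately.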
This method detects invisible backdoors, conditional redirects (which trigger only for Googlebot), and base64-encoded link injections. A file whose checksum no longer matches the one recorded in the clean backup has definitely changed since that backup, though not necessarily maliciously.
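Here is a minimal Python sketch of the fingerprinting idea. It uses only the standard hashlib module. The .htaccess line is a made-up example; the point is that a one-character difference yields a completely different SHA-256 digest.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 fingerprint of a byte string as 64 hex characters."""
    return hashlib.sha256(data).hexdigest()


def sha256_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in fixed-size chunks so large files never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    clean = b"RewriteEngine On\n"     # hypothetical line from a clean .htaccess
    tampered = b"RewriteEngine on\n"  # same line, one character changed
    print(sha256_hex(clean))
    print(sha256_hex(tampered))
    print(sha256_hex(clean) == sha256_hex(tampered))  # False
```

If your tooling expects MD5, hashlib.md5 is used the same way.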
What should you do if you don't have a reference backup?
Critical case: you discover a hack, but your backups are corrupted or too old. The strategy changes completely.
You then need to compare your files with a clean installation of the same CMS (WordPress 6.4.2, Drupal 10.1.5, etc.). Core files that differ signal suspicious modifications. For custom files (themes, proprietary plugins), reconstruct a baseline by examining Git commits or requesting the clean sources from developers.
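The comparison against a clean installation can be sketched as follows. It assumes you have unpacked the exact same CMS version into a separate directory. The directory names in the usage comment are hypothetical, and the sketch reads each file whole, which is acceptable for typical CMS trees.

```python
import hashlib
import sys
from pathlib import Path


def hash_tree(root: str) -> dict[str, str]:
    """Map every file under root (relative POSIX path) to its SHA-256 digest."""
    base = Path(root)
    return {
        path.relative_to(base).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(base.rglob("*"))
        if path.is_file()
    }


def compare_trees(reference: dict[str, str], live: dict[str, str]) -> dict[str, list[str]]:
    """Classify differences between a clean reference tree and the live site."""
    return {
        # Same path, different content: injection or legitimate customization.
        "modified": sorted(p for p in reference.keys() & live.keys() if reference[p] != live[p]),
        # Present only on the live site: dropped backdoors often land here.
        "added": sorted(live.keys() - reference.keys()),
        # Present only in the reference: deleted or renamed core files.
        "missing": sorted(reference.keys() - live.keys()),
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical paths): python compare_trees.py clean-wordpress/ public_html/
    report = compare_trees(hash_tree(sys.argv[1]), hash_tree(sys.argv[2]))
    for category, paths in report.items():
        for path in paths:
            print(f"{category}\t{path}")
```

On a WordPress site, expect some files to show up as "added": wp-config.php (the download ships only wp-config-sample.php) and everything you have put under wp-content/. The "modified" entries under wp-admin/ and wp-includes/ are where to start.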
This approach takes more time, and every legitimate customization made since installation will surface as a difference that you must review by hand. That's why maintaining daily incremental backups becomes non-negotiable as soon as a site generates significant organic traffic.
How do you interpret the results from a checksum comparison?
You run a verification script and get a list of 347 modified files. Not all are compromised. Cache files, logs, and user session files change constantly.
You need to sort by criticality: PHP files in wp-admin/, .htaccess at the root, JavaScript files loaded on all pages. A modified log.txt file doesn't carry the same urgency as a header.php injected with hidden links.
- Modified core files: absolute priority, almost always malicious
- Configuration files (.htaccess, wp-config.php, robots.txt): check line by line
- Templates and includes: look for iframes, external scripts, PHP redirects
- JavaScript/CSS files: inspect minified code for suspicious third-party domains
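- Base64 or obfuscated code (eval(base64_decode(...)), gzinflate chains): treat as malicious until proven otherwise, since a few legitimate plugins also embed base64 data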
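One way to apply this triage to a long list of modified files is a simple ranking function. The sketch assumes WordPress paths. Both the path rules and the obfuscation regex are illustrative assumptions, not an exhaustive detector. A lower rank means the file should be inspected sooner.

```python
import re
from pathlib import PurePosixPath

# Illustrative rules only; adapt them to your CMS and hosting layout.
CORE_DIRS = {"wp-admin", "wp-includes"}
CONFIG_FILES = {".htaccess", "wp-config.php", "robots.txt"}
OBFUSCATION = re.compile(
    rb"eval\s*\(\s*(?:base64_decode|gzinflate|str_rot13)\s*\(", re.IGNORECASE
)


def triage_rank(path: str, content: bytes = b"") -> int:
    """Rank a modified file: 0 = inspect first, 6 = probably harmless noise."""
    p = PurePosixPath(path)
    if OBFUSCATION.search(content):
        return 0  # decoded-and-executed payloads: malicious until proven otherwise
    if p.parts and p.parts[0] in CORE_DIRS:
        return 1  # modified core files
    if p.name in CONFIG_FILES:
        return 2  # configuration: review line by line
    if p.suffix == ".php":
        return 3  # templates and includes: iframes, external scripts, redirects
    if p.suffix in {".js", ".css"}:
        return 4  # assets: look for third-party domains in minified code
    if p.suffix == ".log" or "cache" in p.parts[:-1]:
        return 6  # logs and cache files change constantly
    return 5


def triage(files: list[tuple[str, bytes]]) -> list[str]:
    """Return paths ordered from most to least urgent (ties broken alphabetically)."""
    return [path for path, _ in sorted(files, key=lambda f: (triage_rank(f[0], f[1]), f[0]))]
```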
SEO Expert opinion
Does this recommendation really cover all attack vectors?
Let's be honest: focusing solely on files ignores half the problem. Many modern SEO hacks store their payload directly in the database: wp_posts tables polluted with thousands of spam pages, wp_usermeta rows injected with scripts, autoloaded wp_options entries stuffed with redirects.
An experienced hacker leaves the files intact and works exclusively through SQL queries. Your checksum comparison comes back clean while 15,000 Viagra pages saturate your Google index. [To be verified] Google never specifies how to audit the database in this context.
It is necessary to combine the file approach with a comprehensive SQL diff: export the current database, compare with a clean backup, and look for suspicious patterns (URLs with random query strings, content in foreign languages, identical meta descriptions). Tools like WP-CLI or Adminer allow for this level of analysis.
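Here is a sketch of the diff-then-pattern step. It assumes you have already loaded post IDs and content from the backup and live wp_posts tables into Python dictionaries, for instance from two SQL dumps made with wp db export. It also assumes meta descriptions are available as a dictionary, if your SEO plugin stores them in wp_postmeta. The keyword list, regular expressions, and duplicate threshold are hypothetical starting points, and foreign-language detection is left out.

```python
import re
from collections import Counter

# Hypothetical heuristics: tune keywords and thresholds to your own site.
SPAM_TERMS = re.compile(r"\b(?:viagra|cialis|casino|replica)\b", re.IGNORECASE)
RANDOM_QUERY = re.compile(r"\?[a-z0-9]{1,5}=[a-z0-9]{12,}", re.IGNORECASE)
EMBEDS = re.compile(r"<(?:script|iframe)\b", re.IGNORECASE)


def audit_posts(
    backup: dict[int, str],
    current: dict[int, str],
    meta_descriptions: dict[int, str],
    dup_threshold: int = 5,
) -> dict[int, list[str]]:
    """Return {post_id: reasons} for every row that is new or changed since the backup."""
    meta_counts = Counter(meta_descriptions.values())
    findings: dict[int, list[str]] = {}
    for post_id, content in current.items():
        if backup.get(post_id) == content:
            continue  # identical to the clean backup: skip
        reasons = ["new since backup" if post_id not in backup else "changed since backup"]
        if SPAM_TERMS.search(content):
            reasons.append("spam keyword")
        if RANDOM_QUERY.search(content):
            reasons.append("random query string")
        if EMBEDS.search(content):
            reasons.append("script or iframe")
        meta = meta_descriptions.get(post_id)
        if meta and meta_counts[meta] >= dup_threshold:
            reasons.append("duplicated meta description")
        findings[post_id] = reasons
    return findings
```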
Do checksums detect polymorphic infections?
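Some sophisticated malware regenerates its code at regular intervals, sometimes every hour, to evade signature-based detection. The infected file's checksum today matches neither the backup nor yesterday's scan.
In this case, signature matching fails because there is no stable malicious hash to look up. A comparison against the clean backup still flags the file as modified. However, it cannot tell you which hidden component keeps rewriting the file, so the infection returns after each cleanup. You need to add behavioral detection: monitoring suspicious outgoing connections, tracking file changes in real time, and looking for abnormal SQL queries. Two kinds of tools complement file-level checks. A security plugin such as Wordfence logs live traffic and blocks malicious requests. An external scanner such as Sucuri SiteCheck inspects what visitors and bots actually receive.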
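Successive snapshots still help here if you read them differently. Instead of looking for a known bad hash, look for files that keep changing or that reappear after being deleted. This sketch assumes each snapshot is a {path: sha256} dictionary produced on a schedule, for example hourly. The two-change threshold is arbitrary.

```python
def churning_files(snapshots: list[dict[str, str]], min_changes: int = 2) -> dict[str, int]:
    """Count content changes and reappearances per file across chronological snapshots.

    Each snapshot maps a relative path to its hash. A file whose hash keeps changing,
    or that comes back after being deleted, suggests code that regenerates itself.
    """
    changes: dict[str, int] = {}
    seen: set[str] = set(snapshots[0]) if snapshots else set()
    for previous, current in zip(snapshots, snapshots[1:]):
        for path, digest in current.items():
            if path in previous:
                if previous[path] != digest:
                    changes[path] = changes.get(path, 0) + 1  # content rewritten
            elif path in seen:
                changes[path] = changes.get(path, 0) + 1  # deleted, then recreated
        seen.update(current)
    return {path: count for path, count in changes.items() if count >= min_changes}
```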
What is the acceptable margin of error when cleaning up?
You identify 89 suspicious files. Deleting one legitimate file breaks the site. Leaving one infected file keeps the backdoor active. The risk is asymmetrical.
The cautious methodology: isolate the site (maintenance mode returning an HTTP 503 status, temporary crawler block via robots.txt), test each deletion in a staging environment, and confirm that critical functionality still works. Only then deploy to production. This process takes hours, even days on a large site.
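The validation step after each deletion can be partly automated with a smoke test against staging. Below is a minimal sketch using only the Python standard library. The list of critical URLs is yours to supply, for example hypothetical staging addresses such as https://staging.example.com/contact/. A 200 status only proves a page renders, not that forms, checkout, or login flows work.

```python
import urllib.error
import urllib.request
from typing import Callable


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the final HTTP status for url (redirects followed), or 0 if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses are raised as HTTPError
    except OSError:
        return 0  # DNS failure, refused connection, timeout, etc.


def smoke_test(urls: list[str], fetch: Callable[[str], int] = fetch_status) -> dict[str, int]:
    """Return {url: status} for every critical URL that did not answer 200."""
    failures: dict[str, int] = {}
    for url in urls:
        status = fetch(url)
        if status != 200:
            failures[url] = status
    return failures
```

The fetch parameter lets you plug in a custom checker, for instance one that also verifies the response body contains an expected marker.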
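This is why some SEOs prefer to start from a clean installation and reinject only validated content. It is more radical, but it eliminates file-level backdoors, provided the reinjected content and database are themselves clean and every password and secret key has been rotated. The right trade-off depends on your risk tolerance and the quality of your backups.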
Practical impact and recommendations
What tools should you use to generate and compare checksums?
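On a Linux server, the md5sum or sha256sum commands generate the fingerprints (prefer SHA-256). For example, run this from the site root: find . -type f -exec sha256sum {} + | sort -k 2 > /tmp/checksums_current.txt. It lists the hash of every file, sorted by path so that two listings line up. Writing the output outside the scanned directory keeps the listing from including itself. Generate /tmp/checksums_backup.txt the same way from the root of your clean backup, then compare the two with diff /tmp/checksums_backup.txt /tmp/checksums_current.txt.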
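If you prefer a summary to raw diff output, the two listings can be parsed and classified. This sketch assumes the files were produced by sha256sum (or md5sum) over paths found with find ., so every path starts with ./. The file names in the usage comment are hypothetical. Filenames containing newlines or backslashes, which the GNU tools escape, are not handled.

```python
import sys


def parse_manifest(text: str) -> dict[str, str]:
    """Parse lines like '<hex>  ./path' (text mode) or '<hex> *./path' (binary mode)."""
    entries: dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        digest, _, rest = line.partition(" ")
        entries[rest.lstrip(" *")] = digest  # safe because find paths start with "./"
    return entries


def diff_manifests(backup: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Order-independent comparison of two {path: digest} manifests."""
    return {
        "modified": sorted(p for p in backup.keys() & current.keys() if backup[p] != current[p]),
        "added": sorted(current.keys() - backup.keys()),
        "missing": sorted(backup.keys() - current.keys()),
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python diff_manifests.py checksums_backup.txt checksums_current.txt
    with open(sys.argv[1], encoding="utf-8") as old, open(sys.argv[2], encoding="utf-8") as new:
        report = diff_manifests(parse_manifest(old.read()), parse_manifest(new.read()))
    for category, paths in report.items():
        for path in paths:
            print(f"{category}\t{path}")
```

Because the comparison works on dictionaries, it does not depend on the two listings being sorted the same way.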
For WordPress specifically, WP-CLI simplifies the process. wp core verify-checksums compares your core files against the checksums published by WordPress.org, and wp plugin verify-checksums --all does the same for plugins hosted in the WordPress.org directory. Themes, premium plugins, and custom code still require a manual comparison.
Monitoring tools such as the Integrity Checker WordPress plugin or AIDE (Advanced Intrusion Detection Environment, a command-line tool usually run from cron) automate this surveillance. They can alert you as soon as a critical file changes. Initial setup: 2-3 hours. Time saved during an incident: considerable.
How can you prioritize cleanup to minimize SEO impact?
Every minute counts when Google is actively crawling your spam pages. The optimal sequence: temporarily block bot access via robots.txt or .htaccess, remove the most visible infected files (those generating indexed URLs), clean the database, then restore access gradually.
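Monitor Search Console during and after the cleanup. The Removals tool hides spam URLs from search results quickly, but only temporarily (about six months). They drop out for good once Googlebot recrawls them and receives a 404 or 410, so lift any crawl block as soon as the infected files are gone. A spike in 404 errors is normal: these pages existed only for the hack. Do not redirect them to legitimate pages. Google tends to treat mass redirects to unrelated pages as soft 404s, and a plain 404 or 410 is the clearer signal that the spam is gone.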
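To build the list of URLs that should keep returning 404 or 410, you can compare what is indexed with what you actually publish. This sketch assumes two things. First, you have exported the indexed URLs, for example from the Search Console Pages report. Second, your XML sitemap lists every legitimate page; sitemap index files are not handled here. Anything outside the sitemap is only a candidate to check by hand, since pagination or filter URLs may be legitimate.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> set[str]:
    """Extract every <loc> from a standard <urlset> sitemap."""
    root = ET.fromstring(xml_text)
    return {loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text}


def spam_candidates(indexed_urls: list[str], legitimate_urls: set[str]) -> list[str]:
    """Indexed URLs that the sitemap does not know about, sorted for manual review."""
    return sorted(set(indexed_urls) - legitimate_urls)
```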
Once the site is clean, request a review from the Security Issues report in Search Console, or file a reconsideration request if you received a manual action. Document the corrective actions precisely: files deleted, passwords changed, vulnerabilities patched. A detailed request gives reviewers what they need to confirm the site is clean.
Should you always consult a specialist?
A basic hack (link injections in a footer, a few spam pages) can be handled internally if you are familiar with SSH and databases. A sophisticated hack (multiple backdoors, obfuscated code, automatic reinfection) requires specialized expertise.
Warning signs: the site is reinfected within 24 hours of cleanup, you find code you don't understand, or Search Console shows thousands of unknown indexed pages. At this point, a professional audit becomes cost-effective. A specialist, whether a security firm or an experienced SEO agency, brings forensic tools and familiarity with the Search Console review process. That can shorten the time needed to deindex the spam and clear the security warning.
- Generate a snapshot of checksums for all your critical files every week (automatable via cron)
- Maintain at least 3 complete backups (files + database) in distinct locations
- Update CMS, themes, and plugins within 48 hours of a security patch
- Enable real-time monitoring of file changes (.htaccess, wp-config.php, functions.php)
- Precisely document any legitimate changes to avoid false positives during comparisons
- Test restoration from backup at least once a quarter (an untested backup does not exist)
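The weekly snapshot and the change log work best together: the comparison only raises an alert for changes nobody documented. This sketch assumes snapshots are {path: sha256} dictionaries (a weekly cron job could produce them). It also assumes your change log records glob patterns for legitimate updates; the plugin path in the test is hypothetical.

```python
from fnmatch import fnmatchcase


def unexplained_changes(
    previous: dict[str, str], current: dict[str, str], documented_patterns: list[str]
) -> list[str]:
    """Paths that changed, appeared, or vanished and match no documented change pattern."""
    changed = {p for p in previous.keys() & current.keys() if previous[p] != current[p]}
    changed |= current.keys() - previous.keys()  # new files
    changed |= previous.keys() - current.keys()  # deleted files
    return sorted(
        path for path in changed
        # fnmatchcase: case-sensitive on every OS; "*" also matches "/" here
        if not any(fnmatchcase(path, pattern) for pattern in documented_patterns)
    )
```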
❓ Frequently Asked Questions
Can checksums detect a zero-day hack that is still unknown?
Should you compare checksums for PHP files only, or for JS/CSS files too?
How often should you verify checksums on a production site?
Is a file legitimately modified by an update a problem?
Does shared hosting allow this level of file access?
Other SEO insights extracted from this same Google Search Central video · duration 5 min · published on 12/03/2013