Official statement
Other statements from this video
- 0:05 How does Google Search Console detect 'error template' malware infections on your site?
- 0:05 How does Google Search Console actually detect malware infections on your site?
- 0:35 How can 404 error pages become malware vectors on your site?
- 1:37 Why modify the ErrorDocument directives in .htaccess after a malware infection?
- 1:37 How do you clean an infected .htaccess file without losing your SEO redirects?
Google explicitly recommends using wget or curl to inspect suspicious URLs infected by malicious templates, rather than opening them in a browser. This practice protects your machine while allowing you to analyze the raw source code. For an SEO managing compromised sites, this is the standard low-risk investigation method.
What you need to understand
What exactly is an error template infection?
An error template infection involves the injection of malicious code into the error template files (404, 500, etc.) of a CMS. Attackers take advantage of the fact that these files are rarely monitored and can serve thousands of infected pages without immediately arousing suspicion.
The classic trap: these infected pages load differently depending on the user-agent. When you open them in Chrome or Firefox, the script detects a human visitor and triggers malicious redirects, cloaking, or worse. Wget and curl, on the other hand, retrieve the raw HTML code without executing JavaScript, allowing you to examine the actual structure without triggering the infection mechanisms.
Why does a standard browser expose your machine?
Modern browsers execute JavaScript, load iframes, follow redirects, and interpret the entire DOM. On an infected URL, this means that malicious code activates and might attempt to exploit vulnerabilities, install tracking scripts, or redirect to phishing pages.
Command-line tools like wget or curl simply retrieve the raw content of the HTTP response. No interpretation, no execution. You get exactly what the server returns, which helps identify SEO spam injections, suspicious 302 redirects, or hidden meta refresh tags. One difference to keep in mind: curl does not follow redirects unless you pass -L, whereas wget follows them by default (--max-redirect=0 turns this off).
How do wget and curl fit into an SEO hack audit?
When Google notifies you of a hacked content issue via Search Console, the first step is to identify the scope of the infection. Wget allows you to quickly crawl a sample of suspicious URLs and extract common patterns: presence of external links to pharmaceutical domains, modified title tags, injected scripts in the head.
Curl is ideal for inspecting HTTP headers and spotting conditional redirects (based on referer or IP). Combine the two with grep, and you can automate the detection of malicious signatures across hundreds of URLs in just a few minutes. This is the standard workflow for any professional facing a massive hack.
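As a rough sketch of the header check described above, the shell functions below compare what the server returns with and without a search-engine referer. The function names `summarize_headers` and `probe` and the example URL are illustrative assumptions, not something taken from Google's statement; only standard curl options (`-I`, `-e`, `-sS`, `--max-time`) are used.

```shell
# Reduce a raw HTTP header dump (read on stdin) to "STATUS LOCATION".
# Prints "-" when no Location header is present.
summarize_headers() {
  tr -d '\r' | awk '
    /^HTTP\//                  { status = $2 }
    tolower($1) == "location:" { location = $2 }
    END { print status, (location == "" ? "-" : location) }
  '
}

# Hypothetical helper: request headers only (-I), without following
# redirects, optionally sending a referer (-e).
probe() {
  local url="$1" referer="${2:-}"
  if [ -n "$referer" ]; then
    curl -sS -I --max-time 15 -e "$referer" "$url" | summarize_headers
  else
    curl -sS -I --max-time 15 "$url" | summarize_headers
  fi
}

# Usage (not executed here):
#   probe "https://example.com/infected-page"
#   probe "https://example.com/infected-page" "https://www.google.com/"
```

If the two probes report a different status or Location value, that is a hint of a referer-conditional redirect worth investigating further.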
- Wget/curl prevent the execution of malicious code when inspecting compromised URLs
- Standard browsers trigger infection mechanisms (JavaScript, redirects, iframes)
- These tools allow analysis of raw HTTP headers and source code without interpretation
- They integrate into automated audit scripts to handle large volumes of suspicious URLs
- Google officially recommends this approach to protect the machines of SEO professionals
SEO Expert opinion
Does this recommendation reflect the real practices of experienced SEOs?
Absolutely. Any professional who has managed a hacked site knows that the first rule is to never open suspicious URLs directly from your working machine. The use of wget or curl was already documented in serious resources on cleaning SEO hacks long before Google formalized it.
What is interesting here is that Google explicitly mentions it in the context of error templates. This suggests that they have observed an increase in this specific attack vector and that too many webmasters continue to inspect these pages manually, contaminating their machines in the process. [To verify] whether this recommendation is accompanied by increased detection on the Search Console side.
What limitations should you keep in mind with this approach?
Wget and curl show what the server sends to a command-line user-agent. The issue: modern infections are often contextual. They serve clean content to Googlebot and malicious content to human visitors, or only to IPs outside the US, or only after the first click.
If the malware detects a curl/wget user-agent, it may serve a clean version. In these cases, you should force the Googlebot user-agent or use tools like Puppeteer in headless mode with varied proxies. Google's recommendation remains valid for an initial inspection, but it does not cover cases of advanced cloaking. A complete audit requires multiple combined approaches.
What to do if the infection persists despite visible cleaning?
This is the classic situation: you clean the templates, you run wget on 50 sample URLs, everything seems clean, and Google continues to report hacked content. The malware has probably spread in the database (custom fields, widgets, menus), in the .htaccess with complex RewriteCond rules, or worse, in a plugin or obfuscated theme.
Wget/curl only detect what transits via HTTP. They do not scour your PHP files for base64-encoded backdoors. You must complement them with server scanners like maldet, file permission audits, and a manual code review. If wget returns clean output while the symptoms persist, the infection is hiding elsewhere.
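A minimal sketch of the kind of server-side check that complements the HTTP inspection, assuming GNU grep (for `-r` and `--include`) and shell access to the web root. The function names `scan_php` and `scan_htaccess` are hypothetical, and the patterns are only examples: legitimate code also calls `base64_decode`, so every hit is a candidate for manual review, not proof of infection.

```shell
# List PHP files that contain calls commonly used to hide payloads.
scan_php() {
  local root="$1"
  grep -rlE --include='*.php' \
    'eval[[:space:]]*\(|base64_decode[[:space:]]*\(|gzinflate[[:space:]]*\(|str_rot13[[:space:]]*\(' \
    "$root"
}

# Show .htaccess lines that set error documents or branch on the
# referer or user-agent, the directives mentioned in this article.
scan_htaccess() {
  local root="$1"
  grep -rnE --include='.htaccess' \
    'ErrorDocument|RewriteCond.*HTTP_(REFERER|USER_AGENT)' \
    "$root"
}

# Usage (not executed here; the path is an assumption):
#   scan_php /var/www/html
#   scan_htaccess /var/www/html
```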
Practical impact and recommendations
How can you concretely use wget and curl to audit infected URLs?
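Start by retrieving a list of sample URLs from Search Console (Security section or Hacked content issues). Export 20 to 50 suspicious URLs. Launch wget with the --user-agent option to simulate Googlebot: wget --user-agent="Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -O output.html "https://example.com/infected-page". If the URL returns an error status, as an infected 404 template will, add --content-on-error so that wget saves the response body instead of discarding it.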
With curl, first check the HTTP headers: curl -I "https://example.com/infected-page". Look for suspicious 301/302 redirects, added X-Redirect headers, or unusual Set-Cookie headers. Then, retrieve the full body: curl -A "Googlebot" "https://example.com/infected-page" > page.html and grep for typical patterns (iframe, document.write, eval, pharma keywords).
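The body check described above can be sketched as two small shell functions. The names `fetch_raw` and `flag_patterns`, the output file name, and the pattern list are assumptions chosen for illustration; adapt the patterns to the signatures you actually observe.

```shell
# Full Googlebot desktop user-agent string.
UA_GOOGLEBOT='Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'

# Save the raw response body of a URL to a file, without rendering it.
# curl keeps the body even when the server answers 404 or 500.
fetch_raw() {
  local url="$1" out="$2"
  curl -sS --max-time 20 -A "$UA_GOOGLEBOT" -o "$out" "$url"
}

# Print matching lines (with line numbers) for typical injection markers.
# Returns non-zero when nothing matches.
flag_patterns() {
  grep -inE '<iframe|document\.write|eval\(|viagra|casino' "$1"
}

# Usage (not executed here):
#   fetch_raw "https://example.com/infected-page" page.html
#   flag_patterns page.html
```

The saved file should only be opened in a text editor or piped through grep, never in a browser.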
What mistakes should you avoid during the investigation?
Never test directly from the client's network if the server is compromised. Some malware whitelists local IPs and only serves the infection to external visitors. Use a VPN or test from a third-party server to get a realistic view.
Another trap: relying on a single wget test. Conditional infections change behavior depending on time, referer, or visit count. Run multiple passes with varied user-agents (Desktop Chrome, Mobile Safari, Googlebot, Bingbot) and compare the results. If the responses differ in substance, beyond timestamps or session tokens, you are most likely looking at active cloaking.
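One possible way to run the multi-user-agent comparison, as a sketch only. The user-agent strings are illustrative examples (real browser versions vary), and `fetch_variants` and `distinct_bodies` are hypothetical helper names. Dynamic pages can differ for harmless reasons, so a count above 1 is a cue to diff the files, not a verdict.

```shell
# Illustrative user-agent strings (assumptions; adjust as needed).
UA_CHROME='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
UA_SAFARI_MOBILE='Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Mobile/15E148 Safari/604.1'
UA_GOOGLEBOT='Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'
UA_BINGBOT='Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)'

# Save the same URL once per user-agent into outdir.
fetch_variants() {
  local url="$1" outdir="$2"
  mkdir -p "$outdir"
  curl -sS --max-time 20 -A "$UA_CHROME"        -o "$outdir/chrome.html"    "$url"
  curl -sS --max-time 20 -A "$UA_SAFARI_MOBILE" -o "$outdir/safari.html"    "$url"
  curl -sS --max-time 20 -A "$UA_GOOGLEBOT"     -o "$outdir/googlebot.html" "$url"
  curl -sS --max-time 20 -A "$UA_BINGBOT"       -o "$outdir/bingbot.html"   "$url"
}

# Count distinct response bodies in a directory (by CRC and byte size).
distinct_bodies() {
  cksum "$1"/*.html | awk '{ print $1, $2 }' | sort -u | wc -l | tr -d ' '
}

# Usage (not executed here):
#   fetch_variants "https://example.com/infected-page" ./variants
#   distinct_bodies ./variants
#   diff ./variants/chrome.html ./variants/googlebot.html
```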
How to automate detection across hundreds of URLs?
Create a bash script that loops through your list of URLs, executes wget for each, and filters the results with grep. Example: search for all occurrences of "viagra" or "casino" in the titles. You can also parse the HTML with xmllint or pup to automatically extract suspicious tags.
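A minimal sketch of such a loop, assuming a plain-text file with one URL per line and GNU wget 1.14 or later (for `--content-on-error`). The helper names, the work directory, and the two keywords are illustrative; the title extraction is deliberately naive and only handles a `<title>` written on a single line.

```shell
UA='Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'

# Extract the text of the first <title> tag from a saved HTML file.
page_title() {
  grep -io '<title[^>]*>[^<]*' "$1" | head -n 1 | sed 's/<[^>]*>//'
}

# Succeeds when the title contains one of the example spam keywords.
is_spam_title() {
  page_title "$1" | grep -qiE 'viagra|casino'
}

# Download every URL in the list and report those with a spammy title.
audit_urls() {
  local list="$1" workdir="$2" n=0 url
  mkdir -p "$workdir"
  while IFS= read -r url; do
    [ -z "$url" ] && continue
    n=$((n + 1))
    # --content-on-error keeps the body of 404/500 responses, which is
    # where an error template infection lives.
    wget -q --timeout=20 --tries=1 --content-on-error \
         -U "$UA" -O "$workdir/$n.html" "$url"
    [ -s "$workdir/$n.html" ] || continue
    if is_spam_title "$workdir/$n.html"; then
      printf 'SPAM\t%s\t%s\n' "$url" "$(page_title "$workdir/$n.html")"
    fi
  done < "$list"
}

# Usage (not executed here):
#   audit_urls urls.txt ./audit
```

The exit status of wget is ignored on purpose, since it is non-zero for error responses even when the body was saved; the script checks for a non-empty file instead.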
For large sites, combine with Screaming Frog in custom extraction mode: configure regex to detect malicious patterns in the source code. By default Screaming Frog crawls in text-only mode and does not execute JavaScript unless JavaScript rendering is enabled, so it is safer than a standard browser, though wget/curl remain the benchmark for critical cases.
- Retrieve the list of suspicious URLs from Search Console (Security section)
- Use wget/curl with Googlebot user-agent to obtain the raw server rendering
- Compare responses with different user-agents to detect cloaking
- Parse the HTML with grep, xmllint, or Python scripts to identify injections
- Check HTTP headers (redirects, cookies, suspicious custom headers)
- Never test from the network of the compromised server (the malware may whitelist its IPs)
❓ Frequently Asked Questions
Are wget and curl sufficient to detect all SEO infections?
What should you do if wget returns a clean page while Google still reports hacked content?
Can you run wget/curl from the infected server itself?
How do you force wget to behave exactly like Googlebot?
How many sample URLs should you test to assess the extent of an infection?
Other SEO insights extracted from this same Google Search Central video · duration 1 min · published on 12/03/2013