Official statement
Google recommends not deleting old CSS, JavaScript, and image files immediately after a deployment, as Googlebot may still reference these resources from its cached HTML. Deleting them too soon breaks rendering on the bot's side, which impacts indexing. In practical terms: keep your old assets until your updated pages have been completely recrawled.
What you need to understand
Why does Googlebot need old files after a deployment?
When you deploy a new version of your site using Rails Asset Pipeline (or any similar system generating file names with hashes), your new files have different names — for example, app-a3f2b1.css becomes app-d8e9c4.css. The updated HTML points to the new files.
The problem: Googlebot does not instantly recrawl all your pages. It can cache the old HTML that still references app-a3f2b1.css for several days. If you delete this file, the bot tries to load it, gets a 404, and the rendering fails. Google then sees a broken page, without critical styles or scripts to display the content.
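To make the mechanism concrete, here is a minimal sketch of content-hash fingerprinting; the function name and file path are illustrative, not taken from any particular pipeline.

```python
import hashlib
from pathlib import Path

def fingerprinted_name(path: Path) -> str:
    """Build a fingerprinted filename such as app-a3f2b1.css from the file's content."""
    digest = hashlib.sha256(path.read_bytes()).hexdigest()[:6]
    return f"{path.stem}-{digest}{path.suffix}"

# Any change to app.css produces a different digest, so the new build ships
# app-<newhash>.css while HTML cached by Googlebot still points to the old name.
print(fingerprinted_name(Path("app.css")))
```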
What does broken rendering change for indexing?
Broken rendering means that Googlebot does not see the content as a user would. If your CSS hides non-critical content or your JavaScript injects structural elements (navigation, internal links, text sections), their absence distorts Google's understanding of the page.
In extreme cases — Single Page Applications where all content depends on JS — a missing script = a blank page for the bot. Even on traditional sites, broken layouts can disrupt the detection of main content areas versus ads or navigation.
How long should these old files be kept?
Google provides no specific figure — it’s the usual ambiguity. The duration depends on the crawl frequency of your pages: a high-traffic site with daily crawls could afford 7-10 days. A site less prioritized by Googlebot may need 3-4 weeks.
Martin Splitt refers to “temporarily” without defining a threshold. In practice, most modern asset pipelines (Webpack, Vite, Sprockets) keep by default 2-3 previous versions — a good compromise between SEO safety and storage management.
- Keep old assets until Googlebot has recrawled the updated HTML pages
- The duration varies depending on your site's crawl frequency — monitor your server logs
- Hash versioning systems (fingerprinting) create this problem by generating new file names at each build
- Broken rendering affects Google's understanding of the content, not just aesthetics
- This recommendation also applies to critical images referenced in the old HTML
SEO Expert opinion
Is this recommendation consistent with what we observe in the field?
Yes, and it’s even a well-documented issue for years. E-commerce platforms that deploy several times a day have learned the hard way that immediately removing old JavaScript bundles causes spikes in 404 errors in the Search Console, followed by temporary visibility drops on pages that have not changed in content.
What's more revealing: Google implicitly admits that its HTML cache infrastructure and resource crawling are not synchronized. The bot can keep an HTML snapshot while the rendering service tries to fetch assets in real-time. This is a rarely acknowledged architectural limitation.
What nuances should we add to this directive?
First point: this recommendation mainly concerns assets referenced directly in the HTML (<link>, <script>, and critical <img> tags). Resources loaded dynamically by JavaScript after the initial render pose less of a problem: if the main JS loads, the rest follows.
Second nuance: the severity depends on your rendering architecture. A site with Server-Side Rendering where textual content is present in the initial HTML can survive a missing CSS — Google will see raw but indexable text. A client-side site where React/Vue injects everything via JS? Immediate catastrophe if the main bundle 404s.
[To be verified] Google does not specify how it handles modern fallbacks: if you serve critical CSS inline in the <head> and only the non-critical external CSS fails, does the rendering remain valid? Field reports suggest that it does, but no official confirmation is provided.
When does this rule become secondary?
If you use a semantic versioning system without hashes (e.g., app.v2.css that you never delete), the problem disappears. Similarly, if you always serve the same file names while overwriting their content — no reference break.
For sites with static pre-rendering (Gatsby, Next.js static export) where Google crawls directly from complete HTML with inline content or minimal external assets, the impact is negligible. The real risk concerns complex hybrid architectures with multiple layers of cache and dynamic rendering.
Practical impact and recommendations
What should be done during a deployment?
Configure your build pipeline to keep at least the last 2-3 versions of assets. With Webpack, clean-webpack-plugin's cleanOnceBeforeBuildPatterns option accepts negated patterns to exclude files you want to keep from the pre-build cleanup. With the Rails Asset Pipeline, the rake assets:clean task removes only outdated compiled assets and keeps the most recent versions (unlike assets:clobber, which deletes everything).
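If your pipeline has no built-in retention option, a deploy-time cleanup step can enforce one. Below is a minimal sketch, assuming fingerprinted files like app-a3f2b1.css in a public/assets directory; the path, the keep count, and the grouping rule are assumptions to adapt to your setup.

```python
import re
from pathlib import Path

ASSETS_DIR = Path("public/assets")   # assumed output directory
KEEP_VERSIONS = 3                    # keep the 3 most recent builds per asset

# Group fingerprinted files by logical name, e.g. "app-a3f2b1.css" -> "app.css".
pattern = re.compile(r"^(?P<name>.+)-[0-9a-f]{6,}(?P<ext>\.\w+)$")

groups: dict[str, list[Path]] = {}
for f in ASSETS_DIR.iterdir():
    m = pattern.match(f.name)
    if m and f.is_file():
        groups.setdefault(m["name"] + m["ext"], []).append(f)

for logical_name, files in groups.items():
    # Newest first, by modification time; delete everything past KEEP_VERSIONS.
    files.sort(key=lambda f: f.stat().st_mtime, reverse=True)
    for old in files[KEEP_VERSIONS:]:
        print(f"removing {old} (old version of {logical_name})")
        old.unlink()
```

An age-based rule (keep anything newer than your longest observed recrawl interval) works just as well; the point is that cleanup must lag behind Googlebot.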
Set up 404 monitoring on assets through your server logs or your CDN. Specifically filter requests coming from Googlebot (user-agent Googlebot) — if you see spikes of 404s on old CSS/JS files after a deployment, it signals that the bot has not yet recrawled your HTML pages.
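As a starting point, a small log-scanning sketch like the one below can surface these 404s. It assumes a standard combined access log at a hypothetical path and uses a simple substring match on the user-agent, so adapt it to your log format.

```python
import re
from collections import Counter

LOG_FILE = "/var/log/nginx/access.log"   # assumed path and combined log format
ASSET_RE = re.compile(
    r'"GET (?P<path>\S+\.(?:css|js|png|jpg|webp|svg)\S*) HTTP/[^"]+" (?P<status>\d{3})'
)

googlebot_404s = Counter()
with open(LOG_FILE, encoding="utf-8", errors="replace") as log:
    for line in log:
        if "Googlebot" not in line:          # crude user-agent filter
            continue
        m = ASSET_RE.search(line)
        if m and m["status"] == "404":
            googlebot_404s[m["path"]] += 1

# Assets Googlebot still requests but can no longer find: candidates to restore.
for path, count in googlebot_404s.most_common(20):
    print(f"{count:5d}  {path}")
```

Verifying that the hits really come from Googlebot (reverse DNS lookup) is stricter, but a user-agent filter is usually enough to spot a post-deployment spike.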
Use Search Console to check rendering: test a few major URLs with the "URL Inspection" tool > "Test live URL". Compare Google's rendering screenshot with your actual site. If Google shows a broken page while your browser displays it correctly, you likely have obsolete assets returning 404.
What mistakes should absolutely be avoided?
Never configure a deployment script that automatically purges all old assets immediately after the push. This is a common practice in traditional development (to “clean” the server) but disastrous for SEO. Add a delay or logic based on file age.
Avoid combining long HTML cache lifetimes with aggressive asset deletion. If Googlebot caches an HTML page served with Cache-Control: max-age=86400 but you delete the referenced assets after 12 hours, you create a risk window during which rendering can break.
Do not rely solely on sitemaps to force a quick recrawl. Submitting a sitemap after deployment does not guarantee that Google will recrawl all pages in the following hours — crawl priority remains determined by multiple factors (popularity, historical update frequency, crawl budget).
How to check if my infrastructure is compliant?
Check your server logs to identify the average duration between two Googlebot crawls on your main pages. If it’s 5 days, keep your assets for at least 7-10 days. If it’s 48 hours, a week will suffice. Adjust retention based on this real data.
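A rough way to measure this interval is to extract Googlebot hits per URL from the same access logs and compute the gap between successive visits. The sketch below assumes the standard combined log format and a hypothetical log path.

```python
import re
from collections import defaultdict
from datetime import datetime

LOG_FILE = "/var/log/nginx/access.log"   # assumed path, combined log format
LINE_RE = re.compile(r'\[(?P<ts>[^\]]+)\] "GET (?P<path>\S+) HTTP/')

visits = defaultdict(list)
with open(LOG_FILE, encoding="utf-8", errors="replace") as log:
    for line in log:
        if "Googlebot" not in line:
            continue
        m = LINE_RE.search(line)
        if m:
            ts = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")
            visits[m["path"]].append(ts)

# Average gap (in days) between two Googlebot crawls of the same URL.
for path, stamps in sorted(visits.items()):
    if len(stamps) < 2:
        continue
    stamps.sort()
    gaps = [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]
    print(f"{sum(gaps) / len(gaps) / 86400:6.1f} days  {path}")
```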
Test a deployment in a staging environment accessible to Googlebot (via a separate Search Console property): deploy, immediately delete old assets, and observe coverage reports and rendering in URL inspection. If errors appear, you have proof of the issue before it affects production.
Automate alerts on 404 assets via your monitoring (Datadog, New Relic, Cloudflare logs): if the 404 rate on /assets/* exceeds a threshold for more than 24 hours, trigger a notification. This allows you to react before the SEO impact becomes evident in rankings.
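If you have no dedicated monitoring tool, even a cron job can implement this check. Here is a minimal sketch, assuming assets live under /assets/ and that the log slice covers the period you want to watch; the path and threshold are placeholders.

```python
import re
import sys

LOG_FILE = "/var/log/nginx/access.log"   # assumed path; in practice, feed it only the last 24h
THRESHOLD = 0.05                          # alert if more than 5% of asset requests return 404

ASSET_RE = re.compile(r'"GET /assets/\S* HTTP/[^"]+" (?P<status>\d{3})')

total = errors = 0
with open(LOG_FILE, encoding="utf-8", errors="replace") as log:
    for line in log:
        m = ASSET_RE.search(line)
        if m:
            total += 1
            errors += m["status"] == "404"

rate = errors / total if total else 0.0
print(f"{errors}/{total} asset requests returned 404 ({rate:.1%})")
sys.exit(1 if rate > THRESHOLD else 0)    # nonzero exit -> let cron or your monitoring notify
```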
- Configure retention of at least 2-3 versions of assets in your build pipeline
- Monitor 404s on assets filtered by user-agent Googlebot
- Test Google rendering via Search Console after each major deployment
- Measure the actual crawl frequency of your pages to calibrate retention duration
- Document the deployment procedure so that the entire technical team follows the same logic
- Avoid immediate CDN purges — prefer gradual invalidation or a grace period
❓ Frequently Asked Questions
How long exactly should old CSS and JavaScript files be kept?
Does this recommendation also apply to images?
Does a missing CSS file really prevent Google from indexing textual content?
Can a quick recrawl be forced via Search Console to avoid this problem?
Do CDNs with automatic purging pose the same problem?
Source: Google Search Central video · duration 6 min · published on 16/03/2020