Official statement
Google confirms that HTML serves as the foundation for crawling and content discovery. Without a clear HTML structure, bots struggle to identify internal links and map the site's architecture. For SEO practitioners, relying solely on JavaScript or modern frameworks without an HTML fallback carries a major risk of incomplete indexing and lost visibility.
What you need to understand
What does Mueller's statement really mean?
Mueller highlights a technical reality often overlooked: Google's bots primarily analyze the HTML code to understand a site's structure. The HTML parser identifies the <a href> tags, builds the internal link graph, and plans the next URLs to crawl.
Without accessible HTML, Googlebot has to wait for the complete JavaScript rendering to discover links. This delay consumes crawl budget, slows the discovery of new content, and weakens indexing on large sites. Frameworks like React, Vue, or Angular often generate client-side content that is invisible on the bot's first pass.
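The link discovery described above can be approximated with a minimal sketch that reads only the raw HTML, without executing any JavaScript. It is an illustration rather than a description of how Googlebot is implemented; the `LinkExtractor` class, the `extract_links` helper, and the `example.com` URLs are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags in raw (unrendered) HTML."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Skip anchors with no href and pseudo-links that are not crawlable URLs.
        if not href or href.startswith(("javascript:", "mailto:", "tel:")):
            return
        # Resolve relative URLs and drop the #fragment part.
        absolute, _fragment = urldefrag(urljoin(self.base_url, href))
        if absolute not in self.links:
            self.links.append(absolute)


def extract_links(html, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


sample = """
<nav><a href="/category/shoes">Shoes</a> <a href="/about#team">About</a></nav>
<div id="app"></div>
<span onclick="go('/hidden')">Hidden</span>
"""
print(extract_links(sample, "https://example.com/"))
# ['https://example.com/category/shoes', 'https://example.com/about']
```

The `onclick` target never appears in the result: a parser that only reads markup has no way to know that `/hidden` exists.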
Why is this clarification coming now?
The rise of Single Page Applications (SPAs) and headless architectures has created a generation of sites with skeletal initial HTML. Developers rely on JavaScript to display everything, including links.
Google has indeed improved its ability to execute JS, but Mueller emphasizes that this layer remains secondary and costly. JavaScript rendering uses additional server resources, introduces latency, and does not guarantee comprehensive discovery. A dynamically generated link may well escape the bot if JS execution fails or times out.
What is the difference between crawling and indexing in this context?
Crawling refers to the discovery and traversal of URLs. Indexing occurs afterwards, when Google analyzes the content and decides whether to store it. This statement specifically concerns the discovery phase: if links are absent from the initial HTML, the bot may not find the pages at all.
A site can have quality content and strong signals, but if the links are not accessible in the initial HTML, those pages are effectively orphaned. They will only be crawled if an XML sitemap references them or if an external backlink points directly to them, which remains rare for most internal pages.
- Initial HTML: the bot instantly reads the links, structures the crawl graph, and plans the next visits without delay.
- JavaScript required: rendering delay, increased resource consumption, risk of timeout or execution failure, partial discovery.
- Recommended hybrid architecture: serve HTML containing at least critical internal linking, then enhanced by JS for interactivity.
- Critical case: e-commerce sites with thousands of dynamically generated product pages, where the absence of initial HTML blocks the discovery of entire sections of the catalog.
- Diagnostic tools: compare the source HTML (curl or View Page Source) with the rendered DOM (Inspect Element) to identify discrepancies.
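The source-versus-rendered comparison from the last bullet can be sketched as a set difference. This assumes you have already saved both versions as strings, for example the curl output and the markup copied from the browser's element inspector; the `collect_hrefs` and `js_only_links` helpers and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Gather the href of every <a> tag found in an HTML string."""

    def __init__(self):
        super().__init__()
        self.hrefs = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.add(href)


def collect_hrefs(html):
    collector = HrefCollector()
    collector.feed(html)
    return collector.hrefs


def js_only_links(source_html, rendered_html):
    """Links present in the rendered DOM but missing from the source HTML."""
    return sorted(collect_hrefs(rendered_html) - collect_hrefs(source_html))


# Hypothetical example: the product links only exist after JS has run.
source_html = '<a href="/">Home</a><div id="app"></div>'
rendered_html = (
    '<a href="/">Home</a><div id="app">'
    '<a href="/products/1">P1</a><a href="/products/2">P2</a></div>'
)
print(js_only_links(source_html, rendered_html))
# ['/products/1', '/products/2']
```

Any URL in the output depends on JavaScript execution for discovery and is a candidate for moving into the server-delivered HTML.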
SEO Expert opinion
Is this position consistent with real-world observations?
Absolutely. Audits consistently reveal discovery issues on poorly configured SPA or headless sites. The pages exist, the content is relevant, but Google does not crawl them due to the lack of links accessible in initial HTML.
Tests with Google Search Console (URL inspection, coverage report) show glaring discrepancies between the URLs submitted via sitemap and those actually crawled. Server log analysis shows that Googlebot visits HTML-linked pages far more often than pages whose links require JS. The data aligns fully with this statement.
What nuances should be added to this assertion?
Google can crawl JavaScript-only sites, that is a fact. But it requires more time, more resources, and offers no guarantees. On a small site of 50 pages, the risk remains manageable. On a portal of 100,000 URLs, the absence of initial HTML becomes catastrophic.
Another nuance: some modern frameworks (Next.js, Nuxt) offer Server-Side Rendering (SSR) or static generation. These approaches serve complete HTML from the first load while retaining the SPA experience on the client side. The problem does not lie with JavaScript itself but with the chosen architecture. A React SSR site poses no crawling issues.
In which cases does this rule become critical?
E-commerce sites and classifieds portals are the first affected. With thousands of dynamically generated product listings or articles behind JavaScript filter navigation, the bot discovers only a fraction of the catalog when HTML links are missing. Organic traffic losses can amount to tens of thousands of visits per month.
Media sites with infinite pagination or scroll-loading encounter the same problem. Articles beyond the first page remain invisible if no classic HTML link connects them. The result: recent content that is not crawled and never appears in the SERPs. [To be verified]: Google claims to be continuously improving JS rendering, but tests show that the priority remains on initial HTML, and no public roadmap specifies a timeline for total parity.
Practical impact and recommendations
What should you prioritize checking on your site?
Run a crawlability audit by comparing the source HTML (curl or View Page Source) with the final DOM (Inspect Element after complete loading). If critical links only appear post-JS execution, you have a problem. Use Screaming Frog in "HTML only" mode to simulate a basic bot, then compare with a complete crawl including JS.
Check the server logs to identify real crawl patterns. Does Googlebot visit all sections of the site evenly, or are some categories under-crawled? Discrepancies often reveal missing links in HTML. Correlate this data with Google Search Console: URLs not crawled despite their presence in the sitemap indicate a deficiency in HTML linking.
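The log check can be sketched as a count of Googlebot requests per top-level section. This minimal example assumes the common "combined" access log format and identifies Googlebot by its user-agent string only; the `googlebot_hits_by_section` helper and the sample log lines are hypothetical. Because user agents can be spoofed, a stricter audit would also verify the requesting IP addresses.

```python
import re
from collections import Counter

# Assumes the "combined" log format; adjust the pattern for other formats.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits_by_section(lines):
    """Count Googlebot requests grouped by the first path segment."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match or "Googlebot" not in match.group("agent"):
            continue
        path = match.group("path").split("?", 1)[0]  # drop the query string
        first_segment = path.strip("/").split("/", 1)[0]
        hits["/" + first_segment] += 1
    return hits


GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
sample_log = [
    f'66.249.66.1 - - [10/Jun/2026:10:00:00 +0000] "GET /blog/post-1 HTTP/1.1" 200 5120 "-" "{GOOGLEBOT_UA}"',
    f'66.249.66.1 - - [10/Jun/2026:10:00:05 +0000] "GET /blog/post-2 HTTP/1.1" 200 4800 "-" "{GOOGLEBOT_UA}"',
    f'66.249.66.1 - - [10/Jun/2026:10:00:09 +0000] "GET /products/42?ref=x HTTP/1.1" 200 900 "-" "{GOOGLEBOT_UA}"',
    '203.0.113.7 - - [10/Jun/2026:10:01:00 +0000] "GET /products/1 HTTP/1.1" 200 900 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]
print(googlebot_hits_by_section(sample_log))
# Counter({'/blog': 2, '/products': 1})
```

A section with many URLs in the sitemap but few Googlebot hits is a good place to start looking for links that are missing from the HTML.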
What technical errors should be absolutely avoided?
Never generate the entire internal linking structure via JavaScript only. Menus, breadcrumbs, pagination, and contextual links must all exist in native HTML. Libraries like React Router render valid <a> links, but in a client-side-only application those links exist only after JavaScript has run, which is too late for the bot's first pass.
Avoid onClick links without an href attribute. A JavaScript button that triggers navigation is not a link for Googlebot. Even with event listeners, ensure a true <a href="URL"> exists in the initial HTML. Overlays, modals, and dropdowns must contain classic HTML links, not just JS handlers.
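The patterns above can be flagged automatically with a small audit of the initial HTML. The sketch below is a heuristic, not a definitive rule set: an `onclick` on a non-link element may be legitimate (opening a modal, for instance), so each finding is a candidate for manual review. The `NavigationAuditor` class and `audit_navigation` helper are hypothetical names.

```python
from html.parser import HTMLParser


class NavigationAuditor(HTMLParser):
    """Flag elements that look like navigation but expose no crawlable URL."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)  # attribute names arrive lowercased
        href = (attributes.get("href") or "").strip()
        line = self.getpos()[0]
        if tag == "a":
            # An <a> with no href, "#", or a javascript: URL gives the bot nothing to follow.
            if not href or href == "#" or href.lower().startswith("javascript:"):
                self.problems.append(("a without usable href", line))
        elif "onclick" in attributes:
            self.problems.append((f"<{tag}> with onclick only", line))


def audit_navigation(html):
    auditor = NavigationAuditor()
    auditor.feed(html)
    return auditor.problems


sample = """<a href="/pricing">Pricing</a>
<a onClick="goTo('/blog')">Blog</a>
<a href="#" onclick="goTo('/docs')">Docs</a>
<div onclick="goTo('/contact')">Contact</div>
"""
for problem, line in audit_navigation(sample):
    print(f"line {line}: {problem}")
# line 2: a without usable href
# line 3: a without usable href
# line 4: <div> with onclick only
```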
How to implement a sustainable solution?
Adopt a hybrid architecture: SSR (Server-Side Rendering) or SSG (Static Site Generation) to serve complete HTML from the first request, followed by progressive hydration for interactivity. Next.js, Nuxt, SvelteKit, Astro all provide this approach. The bot receives immediately usable HTML, while users benefit from a smooth SPA experience.
For existing sites relying solely on CSR (Client-Side Rendering), implement at least prerendering or dynamic rendering (serving static HTML to bots and JS to visitors). Solutions like Prerender.io or Rendertron, although debated, remain acceptable if the served content is strictly identical. Google tolerates this approach as long as there is no cloaking.
- Audit the initial HTML with curl or View Page Source and list all <a href> links present
- Compare with the final DOM after JS to identify links generated dynamically only
- Analyze server logs to spot under-crawled sections despite their presence in the sitemap
- Migrate to SSR/SSG if the site currently relies on pure CSR, or implement prerendering for bots
- Ensure all navigation elements (menu, pagination, filters) exist in native HTML with valid href attributes
- Regularly test with Google Search Console (URL inspection) to confirm that crawled content matches initial HTML
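To tie the checklist together, the sketch below looks for URLs that are listed in the sitemap but cannot be reached by following links in the initial HTML. It assumes you have already built a link graph (each page mapped to the links found in its raw HTML) from your own crawl; the `reachable_urls` and `sitemap_only_urls` helpers and the sample data are hypothetical.

```python
from collections import deque


def reachable_urls(link_graph, start_url):
    """Breadth-first traversal of the HTML link graph from the start URL."""
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        for target in link_graph.get(url, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def sitemap_only_urls(sitemap_urls, link_graph, start_url):
    """URLs declared in the sitemap that no chain of HTML links leads to."""
    return sorted(set(sitemap_urls) - reachable_urls(link_graph, start_url))


# Hypothetical data: /products/2 is only linked through JavaScript.
link_graph = {
    "/": ["/products", "/blog"],
    "/products": ["/products/1"],
    "/blog": ["/"],
}
sitemap_urls = ["/", "/products", "/products/1", "/products/2", "/blog"]
print(sitemap_only_urls(sitemap_urls, link_graph, "/"))
# ['/products/2']
```

URLs in the output depend entirely on the sitemap or on JavaScript rendering for discovery, which is the situation the checklist aims to eliminate.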
❓ Frequently Asked Questions
Does Google really crawl pure JavaScript sites less effectively?
Is Server-Side Rendering enough to solve all crawlability problems?
Can you rely on an XML sitemap alone, without HTML internal links?
Does Google consider dynamic rendering to be cloaking?
How can I check whether my links are accessible in the initial HTML?
🎥 From the same video 5
Other SEO insights extracted from this same Google Search Central video · duration 25 min · published on 15/06/2026