Official statement
Googlebot crawls up to 15 megabytes of HTML per page. Beyond that threshold, content gets truncated and ignored for indexing. Google claims this limit doesn't affect most websites, but certain specific use cases can run into this barrier.
What you need to understand
Google enforces a strict technical limit: 15 MB of HTML per page. This constraint applies to the raw HTML document itself, not external resources (images, CSS, JavaScript). If your page exceeds this threshold, Googlebot stops downloading and indexes only the portion it was able to retrieve.
This statement from John Mueller aims to clarify a technical parameter of the crawling process that is often misunderstood. Unlike the crawl budget, which governs how many pages get crawled, this limit concerns the size of each individual HTML document.
What counts in these 15 MB?
Only the HTML source code is affected. JavaScript files, CSS, images, videos, and other external resources loaded via separate requests don't factor into the calculation. We're talking about the document as returned by the server during the initial request.
In practice, this means server-generated HTML, inline content, and any data embedded in the initial response all count. If you inject large volumes of content or JSON data into your <script> tags, that adds to the weight.
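To get a sense of how much of a page's weight comes from these inline payloads, you can compare the total HTML bytes against the bytes inside inline <script> tags. A minimal sketch, assuming Node 18+ run as an ES module; the regex is a rough approximation rather than a real HTML parser, and the fallback URL is a placeholder:

```typescript
// Estimate how much of a page's raw HTML weight comes from inline <script>
// payloads such as embedded JSON state. Rough approximation only: the regex
// below is not a full HTML parser.
const url = process.argv[2] ?? "https://example.com/";

const res = await fetch(url);
const html = await res.text();
const totalBytes = Buffer.byteLength(html, "utf8");

// Match inline <script> bodies (scripts without a src attribute).
const inlineScripts = [
  ...html.matchAll(/<script\b(?![^>]*\bsrc=)[^>]*>([\s\S]*?)<\/script>/gi),
];
const scriptBytes = inlineScripts.reduce(
  (sum, m) => sum + Buffer.byteLength(m[1], "utf8"),
  0,
);

console.log(`Raw HTML: ${(totalBytes / 1024).toFixed(1)} KB`);
console.log(`Inline <script> payloads: ${(scriptBytes / 1024).toFixed(1)} KB`);
console.log(`Share: ${((scriptBytes / totalBytes) * 100).toFixed(1)}%`);
```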
Why does Google impose this limit?
It comes down to resources and performance. Crawling the web at Google's scale requires balancing exploration depth against efficiency. Downloading tens of megabytes per page would slow down the process and consume considerable bandwidth for marginal benefit.
Google operates on the principle that if your HTML page weighs more than 15 MB, either you have an architecture problem or the excess content provides nothing to the user experience or semantic understanding of the page.
Which websites risk being impacted?
The vast majority of websites don't even come close to this limit. A typical editorial page weighs between 50 KB and 500 KB of HTML. Even complex pages with lots of content rarely exceed 2-3 MB.
The at-risk cases? E-commerce sites with thousands of products hardcoded into the DOM, single-page applications (SPAs) that embed their entire state in the initial HTML, or infinite scroll pages generated server-side with poorly implemented lazy loading.
- 15 MB of raw HTML: only the source document counts, not external resources
- Truncated crawl: beyond the limit, content is ignored for indexing
- Rare cases: mainly affects heavy or poorly optimized web architectures
- No penalty: Google doesn't penalize; it simply stops downloading
SEO expert opinion
Is this limit consistent with real-world observations?
Yes. This statement aligns with what we've been observing for years. Google has always had implicit technical limits, and this is merely a public formalization of a constraint already in place. Technical audits regularly reveal pages whose content near the bottom is never indexed, often because the HTML is too heavy or the response time exceeds Googlebot's patience thresholds.
That said, we should nuance this: the 15 MB limit is probably not the only factor at play. Other mechanisms (server timeout, DOM depth, processing time) can cut off crawling well before reaching this ceiling.
What is this statement really telling us?
Let's be honest: this limit is an indirect signal. Google is telling us between the lines that if your HTML exceeds 15 MB, you have an architecture problem. No human user should have to load such a massive amount of code just to display a webpage.
The implicit message: optimize your HTML generation, defer loading secondary content, implement proper lazy loading on the client side, and separate your data from your presentation. [To be verified]: we lack concrete data on how frequently this limit is exceeded and its actual SEO impact. Google claims that "most sites" aren't affected, but no precise metrics are provided.
What remains unclear?
The statement is vague on several points. What exactly happens to content located after the 15 MB mark? Is it completely ignored, or can Google revisit it in a subsequent crawl? No official answer.
Another question: does this limit apply the same way to JavaScript rendering? If Googlebot executes the JS and the resulting DOM exceeds 15 MB, is there a second limit? Again, complete silence.
Practical impact and recommendations
How do I check if my site is affected?
Start by measuring the raw HTML weight of your strategic pages. Use Chrome DevTools (Network tab, filter by "Doc") or curl to get the initial document size; note that curl -I only fetches headers, so measure the actual body instead, for example with curl -so /dev/null -w '%{size_download}' followed by the URL. Focus on high-content pages: product sheets, category pages, long-form articles.
If you're over 5 MB, investigate. Beyond 10 MB, you're in the red zone. Inspect the source code: look for embedded JSON data blocks, inline JavaScript variables, and excessive metadata.
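As a starting point, here is a minimal sketch of that check, assuming Node 18+ run as an ES module; the URLs are placeholders, and note that fetch transparently decompresses gzip/brotli responses, so this measures the decompressed document:

```typescript
// Fetch the raw HTML document of each strategic page and flag those
// approaching Googlebot's 15 MB ceiling.
const MB = 1024 * 1024;
const pages = [
  "https://example.com/",
  "https://example.com/category/large-listing",
  "https://example.com/product/heavy-page",
];

for (const url of pages) {
  const res = await fetch(url);
  const html = await res.text();
  const bytes = Buffer.byteLength(html, "utf8");

  // Thresholds from above: investigate past 5 MB, red zone past 10 MB.
  const status =
    bytes > 10 * MB ? "RED ZONE" : bytes > 5 * MB ? "investigate" : "ok";
  console.log(`${(bytes / MB).toFixed(2)} MB  [${status}]  ${url}`);
}
```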
What should you do if you're approaching the limit?
Fragment your content. If you're hardcoding thousands of products into your HTML, switch to server-side pagination or proper lazy loading. Move bulky data into separate JSON files retrieved via AJAX after initial load.
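A browser-side sketch of that last idea, where the /data/products.json endpoint and the Product shape are hypothetical; keep in mind that content loaded this way is only seen by Google if it renders the JavaScript, so reserve it for data that doesn't need to rank:

```typescript
// Instead of embedding a huge product array in the initial HTML, fetch it
// as a separate JSON resource once the document has been parsed.
interface Product {
  id: string;
  name: string;
  price: number;
}

async function loadProducts(): Promise<void> {
  const res = await fetch("/data/products.json"); // hypothetical endpoint
  if (!res.ok) throw new Error(`Failed to load products: ${res.status}`);
  const products: Product[] = await res.json();

  const list = document.querySelector("#product-list");
  if (!list) return;
  for (const p of products) {
    const li = document.createElement("li");
    li.textContent = `${p.name}: ${p.price} €`;
    list.appendChild(li);
  }
}

// Defer the data request until the initial document has been parsed.
document.addEventListener("DOMContentLoaded", () => void loadProducts());
```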
Clean up unnecessary code: verbose HTML comments, redundant tags, oversized inline scripts. Minify your HTML in production—every byte counts. Push as much logic as possible to post-load JavaScript rather than generating everything server-side.
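As one possible build step, the html-minifier-terser npm package handles both minification and comment stripping; a sketch, with the file paths and option set as assumptions to adapt to your own pipeline:

```typescript
// Minify a generated HTML file before deployment.
import { readFile, writeFile } from "node:fs/promises";
import { minify } from "html-minifier-terser";

const html = await readFile("dist/page.html", "utf8");

const minified = await minify(html, {
  collapseWhitespace: true, // drop redundant whitespace between tags
  removeComments: true,     // strip verbose HTML comments
  minifyJS: true,           // compress inline scripts
  minifyCSS: true,          // compress inline styles
});

await writeFile("dist/page.min.html", minified);
console.log(
  `Saved ${Buffer.byteLength(html) - Buffer.byteLength(minified)} bytes`,
);
```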
What critical mistakes should you avoid?
Don't attempt to artificially circumvent the limit by splitting a page into multiple hidden fragments that you later load via JS. Google detects these manipulations and you risk devaluation for cloaking or hidden content.
Also avoid overloading your pages with content intended only for search engines. If no human reads the 12,000-word auto-generated product description on your page, Google won't index it either, and you'll have bloated your HTML for nothing.
- Measure the raw HTML weight of your strategic pages using DevTools or curl
- Identify data blocks (JSON, JS variables) that bloat your document
- Fragment long lists with server-side pagination or client-side lazy loading
- Externalize bulky resources (product data, configurators) into separate files
- Minify HTML in production and strip out unnecessary comments
- Test rendering in Google Search Console to verify all content is properly indexed
- Monitor server logs for incomplete crawls (206 codes, connection interruptions), as in the sketch below
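For that last point, a log-scanning sketch, assuming the common combined log format and Node 18+ run as an ES module; adjust the user-agent check and the parsing to your server's actual log format:

```typescript
// Scan an access log for Googlebot requests that may indicate incomplete
// crawls: 206 responses or transfers at or above the 15 MB cap.
import { readFile } from "node:fs/promises";

const LIMIT = 15 * 1024 * 1024; // Googlebot's documented 15 MB ceiling

const log = await readFile("access.log", "utf8");

for (const line of log.split("\n")) {
  if (!line.includes("Googlebot")) continue;

  // Combined format: ... "GET /path HTTP/1.1" <status> <bytes> ...
  const m = line.match(/"(?:GET|HEAD) (\S+) [^"]*" (\d{3}) (\d+|-)/);
  if (!m) continue;

  const [, path, status, bytes] = m;
  const size = bytes === "-" ? 0 : Number(bytes);

  if (status === "206" || size >= LIMIT) {
    console.warn(`Possible truncated crawl: ${status} ${size} B ${path}`);
  }
}
```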
❓ Frequently Asked Questions
Do the 15 MB include inline JavaScript and CSS?
What happens if my page exceeds 15 MB?
Does this limit apply to JavaScript rendering?
Is my e-commerce site with 500 products per page affected?
How can I precisely measure the HTML weight of my pages?
Source: Google Search Central video published on 28/09/2022.