Why does Google allow PDFs to be 32 times larger than HTML pages before hitting the crawl limit?

Official statement

For PDF files, Google Search applies a crawl limit of approximately 64 megabytes, significantly higher than the standard 2 MB for HTML. This higher limit is necessary because PDFs are naturally larger in size.

🎥 Source video

Extracted from a Google Search Central video

💬 EN 📅 12/03/2026 ✂ 12 statements

Official statement from March 12, 2026 (1 month ago)

⚠ A more recent statement exists on this topic: "Does Googlebot really stop crawling after 15 MB per URL?" (Martin Splitt, March 30, 2026)

TL;DR

Google applies a crawl limit of approximately 64 MB to PDF files, which is 32 times higher than the 2 MB limit for HTML. The difference exists because PDFs are inherently larger: they embed fonts, images, and metadata. Any content beyond this threshold will not be crawled by Googlebot.

What you need to understand

What exactly is this 64 MB limit announced by Gary Illyes?

Google applies a crawl limit of approximately 64 megabytes to indexable PDF files. In practical terms, if your PDF exceeds this size, Googlebot stops reading at that threshold and does not index the remaining content.

This restriction applies solely to the crawl process — in other words, the robot's ability to read the file. It does not concern the initial download of the PDF by a user, but rather Google's capacity to extract and index its text content.

Why is this limit 32 times higher than the HTML limit?

HTML files are limited to approximately 2 MB by Google, a constraint known for years. PDFs, on the other hand, carry far more than plain text: embedded fonts, high-resolution images, structured metadata, annotations, interactive forms.

A technical documentation PDF or an annual report can quickly reach several dozen megabytes without being "heavy" in a functional sense. Google calibrated this limit to 64 MB to accommodate the majority of legitimate professional use cases, while preventing Googlebot from crawling excessively large files.

Does this limit apply to all types of PDFs?

Yes, without distinction. Whether it's a native text PDF (generated from Word or InDesign), a scanned PDF with OCR, or a pure image PDF, the 64 MB limit applies uniformly.

Be aware, however: a scanned PDF without OCR will not be text-indexable by Google regardless of size. The crawl limit only comes into play if the content is technically extractable.

PDF crawl limit: ~64 MB (full file volume)
HTML crawl limit: ~2 MB (source code only)
Beyond the threshold: excess content is not indexed
PDF type: the limit applies to all, but only text/OCR PDFs are indexable
Practical implications: monitor the weight of PDFs that are strategic for SEO
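
The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration: the exact byte values are an assumption (the statement says "approximately", and 1 MB is treated here as 1024 * 1024 bytes), and `bytes_beyond_limit` is a hypothetical helper name.

```python
# Approximate limits quoted in the article. The exact byte values are an
# assumption: 1 MB is treated as 1024 * 1024 bytes here.
MB = 1024 * 1024
CRAWL_LIMITS = {"html": 2 * MB, "pdf": 64 * MB}


def bytes_beyond_limit(size_bytes: int, kind: str) -> int:
    """Return how many bytes fall past the crawl limit (0 if the file fits)."""
    return max(0, size_bytes - CRAWL_LIMITS[kind])


ratio = CRAWL_LIMITS["pdf"] // CRAWL_LIMITS["html"]
print(ratio)  # 32
print(bytes_beyond_limit(80 * MB, "pdf") // MB)  # 16
```

Under these assumptions, an 80 MB catalog would leave roughly its last 16 MB outside the crawled portion.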

SEO Expert opinion

Does this 64 MB limit really match what's observed in practice?

Let's be honest: until Gary Illyes made this announcement, most SEO professionals were unaware that a specific limit existed for PDFs. Real-world observation shows that PDFs ranging from 10 to 30 MB generally index without difficulty, which is consistent with a 64 MB ceiling.

However, for very large files — product catalogs at 80 MB, illustrated annual reports at 100 MB — partial indexation problems have often been encountered without always being able to identify the cause. This official clarification finally explains these edge cases. [To verify]: Google has not specified whether this limit has always applied or was introduced recently.

Does the different treatment of HTML vs. PDF create a fairness issue?

Not really. Comparing the 2 MB limit for HTML with the 64 MB limit for PDF only makes sense if you compare like with like. Modern HTML is rarely "heavy" in itself: it is the external resources (CSS, JS, images) that increase a page's weight, and these are not counted within the 2 MB limit.

A PDF, on the other hand, packages everything: fonts, images, metadata. The 64 MB therefore includes the equivalent of HTML plus all its resources. In this sense, Google applies a relatively consistent logic between formats.

Watch out for multi-language or multi-chapter PDFs: If you consolidate multiple documents into one large file for practical reasons (e.g., complete product catalog), you risk crossing the threshold. It's often better to break them into several thematic PDFs of reasonable size.

Should you worry if a strategic PDF exceeds 64 MB?

It depends on your context. If the PDF in question is a critical SEO asset — for example, a technical guide you want to rank for long-tail queries — then yes, it's a problem.

If it's a supplementary document made available for download (detailed financial report, historical archive), the SEO impact will be limited. But in that case, why keep it indexable? An X-Robots-Tag: noindex HTTP response header would solve the issue. A robots.txt rule is the other common option, but note that it blocks crawling rather than indexing, so a blocked URL can still appear in results if other pages link to it.
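
To confirm that a PDF is actually served with a noindex directive, the response headers can be inspected. The sketch below is a simplified check, not a full parser: `has_noindex` is a hypothetical helper, it takes the headers as a plain dictionary, and it ignores which user agent a directive is scoped to.

```python
def has_noindex(headers: dict) -> bool:
    """Return True if an X-Robots-Tag header carries a noindex directive."""
    for name, value in headers.items():
        if name.lower() != "x-robots-tag":
            continue
        # Directives are comma-separated; an optional "useragent:" prefix
        # (for example "googlebot: noindex") may precede a directive.
        for part in value.split(","):
            directive = part.split(":")[-1].strip().lower()
            # "none" is shorthand for "noindex, nofollow".
            if directive in ("noindex", "none"):
                return True
    return False


print(has_noindex({"X-Robots-Tag": "noindex"}))          # True
print(has_noindex({"Content-Type": "application/pdf"}))  # False
```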

Practical impact and recommendations

What should you do concretely if your PDFs exceed or approach this limit?

First step: audit the PDFs currently indexed on your site. Use a crawler (Screaming Frog, OnCrawl, Botify) configured to extract PDF file URLs and their weights. Filter those exceeding 50 MB — that's your red zone.

For each large PDF, ask yourself: does this document have an SEO purpose? If yes, you need to reduce its weight. If no, de-index it properly.
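
If a dedicated crawler is not available, a rough version of this audit can be scripted. The sketch below assumes the server reports a `Content-Length` header on HEAD requests (not all do), treats 1 MB as 1024 * 1024 bytes, and uses hypothetical helper names; the 50 MB "red zone" and 64 MB limit are the figures from this article.

```python
import urllib.request
from typing import Optional

MB = 1024 * 1024          # assumption: binary megabytes
RED_ZONE = 50 * MB        # audit threshold suggested in the article
CRAWL_LIMIT = 64 * MB     # approximate PDF crawl limit


def classify_pdf(size_bytes: int) -> str:
    """Bucket a PDF by size relative to the red zone and the crawl limit."""
    if size_bytes > CRAWL_LIMIT:
        return "over limit"
    if size_bytes > RED_ZONE:
        return "red zone"
    return "ok"


def remote_size(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return Content-Length from a HEAD request, or None if it is absent."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        length = response.headers.get("Content-Length")
    return int(length) if length is not None else None
```

A typical use would be to call `remote_size` on each PDF URL from a sitemap or crawl export and pass the result to `classify_pdf`, skipping URLs for which no size is reported.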

What techniques can you use to reduce PDF weight without losing quality?

The optimization levers are well known but often overlooked: compress embedded images (150-200 dpi instead of 300 dpi is enough for the web), remove unused fonts, strip unnecessary metadata, and flatten or remove embedded Photoshop layers.

Tools like Adobe Acrobat Pro, Ghostscript, or online services like Smallpdf enable effective compression. In most cases, an 80 MB PDF can be reduced to 30-40 MB without noticeable degradation on screen.
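
As an illustration of the Ghostscript route, the sketch below builds a typical compression command and runs it. It assumes Ghostscript is installed and available as `gs` on the PATH (the executable has a different name on Windows), and it uses the `/ebook` preset, which downsamples images to roughly 150 dpi. The function names and file names are placeholders, and the achievable size reduction depends entirely on the document.

```python
import subprocess


def ghostscript_command(src: str, dst: str, preset: str = "/ebook") -> list:
    """Build a Ghostscript command that rewrites src as a smaller PDF at dst."""
    return [
        "gs",                        # assumption: Ghostscript binary on PATH
        "-sDEVICE=pdfwrite",         # write a PDF
        f"-dPDFSETTINGS={preset}",   # quality preset, e.g. /screen, /ebook, /printer
        "-dNOPAUSE",
        "-dBATCH",
        "-dQUIET",
        f"-sOutputFile={dst}",
        src,
    ]


def compress_pdf(src: str, dst: str, preset: str = "/ebook") -> None:
    """Run Ghostscript; raises CalledProcessError if the conversion fails."""
    subprocess.run(ghostscript_command(src, dst, preset), check=True)
```

Always compare the output visually with the original before publishing, since aggressive presets can visibly degrade charts and photos.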

How can you verify that Google crawls your PDFs in full?

Use Google Search Console to inspect the PDF URL and check the version rendered by Googlebot. If the content appears truncated or incomplete, that's a red flag.

Another technique: search Google for a text phrase located near the end of the PDF (last page, appendices). If it doesn't appear in the index while the document's beginning is well indexed, you likely have a crawl limit issue.
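
The end-of-document check can be made repeatable by generating the search query from a phrase you copy out of the last pages. This is a minimal sketch: `end_of_document_query` is a hypothetical helper, and it simply combines an exact-match phrase with the `site:` and `filetype:` search operators.

```python
from urllib.parse import urlsplit


def end_of_document_query(phrase: str, pdf_url: str) -> str:
    """Build an exact-phrase query restricted to the PDF's host."""
    host = urlsplit(pdf_url).netloc
    # Drop double quotes and collapse whitespace so the phrase stays one exact match.
    cleaned = " ".join(phrase.replace('"', " ").split())
    return f'"{cleaned}" site:{host} filetype:pdf'


print(end_of_document_query(
    "Appendix  C: warranty terms",
    "https://www.example.com/docs/catalog.pdf",
))
# "Appendix C: warranty terms" site:www.example.com filetype:pdf
```

Choose a phrase that is distinctive and appears only near the end of the file; a missing result is a hint, not proof, of truncation.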

Audit all indexable PDFs and identify those > 50 MB
Compress embedded images (150-200 dpi is sufficient for web)
Remove non-essential embedded fonts
Break down large documents into multiple thematic files if appropriate
Test the crawled version via Google Search Console (URL inspection)
Verify indexation of end-of-document content with targeted searches
De-index non-strategic PDFs (X-Robots-Tag: noindex or robots.txt)
Implement recurring monitoring of published PDF weights
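
The recurring monitoring step can be as simple as a scheduled script. The sketch below assumes the published PDFs are available in a local directory tree (for example a deployment folder); `oversized_pdfs` is a hypothetical helper, and the default threshold mirrors the 50 MB red zone used above.

```python
from pathlib import Path


def oversized_pdfs(root: str, threshold_bytes: int = 50 * 1024 * 1024) -> list:
    """Return (path, size) pairs for PDFs above the threshold, largest first."""
    found = []
    # Note: the "*.pdf" pattern is case-sensitive on most Linux filesystems.
    for path in Path(root).rglob("*.pdf"):
        size = path.stat().st_size
        if size > threshold_bytes:
            found.append((str(path), size))
    return sorted(found, key=lambda item: item[1], reverse=True)
```

Run from a scheduler such as cron, the returned list can feed an alert or a report whenever a newly published file crosses the threshold.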

Optimizing PDF weight for SEO requires precise auditing, appropriate compression tools, and a clear editorial strategy that distinguishes documents intended for SEO purposes from downloadable archives. If your site regularly publishes strategic PDF content (technical documentation, industry guides, catalogs), implementing a rigorous optimization process can quickly become complex. In this context, support from an SEO-specialized agency helps structure a sustainable strategy, automate checks, and avoid costly indexation errors.

❓ Frequently Asked Questions

Does the 64 MB limit apply to the file's size on the server or to its size once decompressed?

Google measures the size of the file as served by the server, before any decompression by the browser. What counts, then, is the weight of the PDF stored on your infrastructure.

If I split an 80 MB PDF into two 40 MB files, will Google index both of them in full?

Yes, provided each file stays under the 64 MB limit and is technically crawlable. Take care, however, to maintain editorial consistency and internal links between the two parts where necessary.

Is a PDF hosted on an external CDN or host (such as Google Drive or Dropbox) subject to the same limit?

Yes, the limit applies regardless of hosting. What matters is that Googlebot can access the file and that it is technically crawlable. The server's location has no bearing on this rule.

Are dynamically generated PDFs (invoices, personalized reports) affected by this limit?

If they are accessible through a URL that Googlebot can crawl, yes. In most cases, though, these documents should not be indexable (private content, behind authentication, or blocked via robots.txt).

Does Google plan to raise this limit in the future as infrastructure evolves?

There is no official indication on this point. The 64 MB limit appears calibrated to cover current legitimate uses while preserving crawl resources. Any future change will probably depend on the publishing practices observed across the web.
