Should you really show the complete content to Googlebot if the paywall blocks users?

Official statement

Googlebot must be able to see the complete content to understand the ranking topic, AND see the structured paywall markup. Users do not need to see this markup, but it is essential for Google to see it, especially if Googlebot receives the full content while users see the paywall.

Source: Google Search Central video (1h14, in English, published December 11, 2020), statement at 44:03.

Official statement from December 11, 2020

A more recent statement exists on this topic: "How Can You Properly Manage Paywalls to Avoid Google Penalties?" (John Mueller, December 21, 2020).

TL;DR

Google requires that Googlebot can access the full content to understand the topic it should rank for, while also seeing the structured paywall markup. This dual requirement calls for precise technical handling: users see the paywall, but the bot must be able to crawl the entire article AND the Schema.org NewsArticle or CreativeWork markup indicating the restriction. Without this markup, Google may treat the difference between what the bot and users see as cloaking.

What you need to understand

Why does Google need to see both the content AND the paywall?

Google's logic relies on a delicate balance. The engine must analyze the complete content to determine its thematic relevance, writing quality, and legitimacy in search results. If Googlebot only sees a snippet or teaser, it cannot properly assess the topic or the depth of treatment.

At the same time, the structured paywall markup signals to Google that access for users is restricted. Without this markup, showing the complete content to the bot while blocking internet users resembles classic cloaking — a punishable practice. The markup legitimizes this difference in treatment by formalizing the existence of a paid business model.

What specific paywall markup does Google expect?

Google primarily recognizes Schema.org NewsArticle with the property isAccessibleForFree set to false, combined with hasPart pointing to a WebPageElement with a cssSelector targeting the locked content. For non-news content, CreativeWork with isAccessibleForFree also works.

The JSON-LD structure must be clean and complete. Google ignores rough or incomplete implementations. The cssSelector must point to the area that is actually hidden from non-subscribers — not to a fictional or decorative element. This technical precision avoids false positives and cloaking penalties.
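As an illustration, the structure described above can be generated with a few lines of Python. This is a minimal sketch, not a reference implementation: the helper name build_paywall_jsonld, the headline, and the selector ".article-content--premium" are placeholders chosen for the example, and a production page would normally carry more properties (author, dates, images).

```python
import json


def build_paywall_jsonld(headline: str, css_selector: str,
                         schema_type: str = "NewsArticle") -> dict:
    """Return a Schema.org object that flags one page section as paywalled."""
    return {
        "@context": "https://schema.org",
        "@type": schema_type,
        "headline": headline,
        # The article as a whole is not free to read.
        "isAccessibleForFree": False,
        "hasPart": {
            "@type": "WebPageElement",
            "isAccessibleForFree": False,
            # Must match the element that is actually hidden from non-subscribers.
            "cssSelector": css_selector,
        },
    }


# Hypothetical values, for illustration only.
markup = build_paywall_jsonld("Example headline", ".article-content--premium")
script_tag = ('<script type="application/ld+json">'
              + json.dumps(markup, indent=2)
              + "</script>")
print(script_tag)
```

Generating the block from a single function rather than hand-editing it per template keeps the selector in one place, which makes it harder for the markup and the real page structure to drift apart.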

Does this markup influence the ranking of paid content?

Mueller does not say so explicitly, but field experience shows that Google ranks paid content similarly to free content, provided that the markup is correct. The paywall itself is not a downgrading factor — quite the opposite: Google wants to index premium content to offer a diversity of results.

However, users see a "Subscription" tag in the SERPs for results with this markup. This can affect the CTR — users often filter out paid content unless the brand's reputation compensates. Ranking remains possible, but user engagement may be lower.

Googlebot must scan the entire content to evaluate the topic and quality, not just a snippet or teaser.
The Schema.org markup (NewsArticle or CreativeWork) with isAccessibleForFree=false legitimizes the difference in treatment between the bot and user.
Without this markup, showing complete content to the bot resembles cloaking and exposes sites to manual or algorithmic penalties.
The cssSelector must point to the area that is truly locked — not a fictitious element — for Google to validate the markup.
The "Subscription" tag in the SERPs informs users but may reduce CTR if the brand is not well-known.

SEO Expert opinion

Is this statement consistent with field observations?

Overall yes, but with gray areas. Premium news sites applying this model, such as Le Monde, The New York Times, and Les Échos, do maintain their visibility in SERPs on competitive informational queries. Their paid content ranks alongside free sources, which suggests that Google applies the rule as stated.

On the other hand, markup quality remains a point of friction. We regularly observe sites with invalid or incomplete JSON-LD markup (missing cssSelector, poorly structured hasPart) that do not trigger immediate penalties but end up with partial or erratic indexing. Google seems to tolerate this for a while, then gradually downgrades. [To check]: the exact extent of this tolerance and the timelines before penalties are never communicated.

What nuances should be added to this rule?

Mueller talks about "complete content", but Google tolerates a teaser area visible to everyone, typically the first 3-4 paragraphs or 20-30% of the article. This practice, known as a "lead-in" (as opposed to a "metered paywall", which grants a quota of free articles per period), lets users judge the quality before subscribing while Googlebot still gets full access to the rest.

The problem: Google never specifies the acceptable teaser/locked content ratio. Publishers are navigating by trial and error. Some show 50% and fly under the radar, while others block 90% and get penalized for "insufficient content". The cautious recommendation remains 20-30% visible, but it's empirical — not officially documented. [To check]: the exact thresholds and tolerance criteria.
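To know where a given article sits relative to that empirical 20-30% range, the visible share can simply be measured. The sketch below counts words per paragraph. The function name visible_share and the sample article are invented for the example, and word count is only one possible way to define the ratio.

```python
def visible_share(paragraphs: list[str], free_count: int) -> float:
    """Fraction of the article's words found in the first free_count paragraphs."""
    total = sum(len(p.split()) for p in paragraphs)
    if total == 0:
        return 0.0
    free = sum(len(p.split()) for p in paragraphs[:free_count])
    return free / total


# Hypothetical article: ten paragraphs of four words each, three of them free.
article = ["one two three four"] * 10
share = visible_share(article, 3)
print(f"{share:.0%} of the text is visible before the paywall")
```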

In what cases does this rule not apply or pose problems?

Technical B2B content, market reports, or studies pose a challenge. Showing everything to Googlebot exposes you to scraping risks by competitors or aggregators posing as the bot. The added value of this content lies in its exclusivity — making it accessible for crawling weakens the business model.

Similarly, SaaS sites or online training with gated content (PDF downloads, exclusive videos) do not fit naturally into the NewsArticle schema. Using CreativeWork works theoretically, but Google's support is less documented and tested. There is a lack of consolidated feedback on these verticals.

Warning: If you serve full content to Googlebot without valid paywall markup, you are effectively cloaking. A manual inspection by Google can lead to severe penalties — downgrading or partial de-indexing. Always test with the URL inspection tool and validate your JSON-LD with the Rich Results Test.

Practical impact and recommendations

What should be done concretely to meet these requirements?

First step: implement the JSON-LD NewsArticle or CreativeWork markup with isAccessibleForFree set to false. The hasPart property must point to a WebPageElement object whose cssSelector precisely targets the locked div or section. Example: "cssSelector": ".article-content--premium". This selector must exactly correspond to the area hidden from the user.

Next, configure the server to detect Googlebot (user-agent "Googlebot", confirmed by a reverse DNS check, since the user-agent alone is easy to spoof) and serve it the complete HTML, markup included. Non-authenticated users receive the teaser and the paywall. If you instead ship the full text to everyone and merely hide it with CSS or JavaScript, it stays readable in the page source. Whichever approach you choose, the text must be present in the HTML that Googlebot receives, not loaded through an Ajax call that only fires after authentication.
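The reverse DNS check mentioned above can be sketched as follows: resolve the IP to a hostname, check that the hostname belongs to a Google domain, then resolve the hostname back and confirm it returns the same IP. The function name is_verified_googlebot and the injectable lookup parameters are assumptions made so the example can be run without network access. The default lookups rely on the standard socket module and cover IPv4 only, and the two domain suffixes are the ones this sketch assumes for Googlebot.

```python
import socket
from typing import Callable

# Assumed suffixes for hostnames that belong to Googlebot.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(
    ip: str,
    reverse_lookup: Callable[[str], str] = lambda addr: socket.gethostbyaddr(addr)[0],
    forward_lookup: Callable[[str], list] = lambda host: socket.gethostbyname_ex(host)[2],
) -> bool:
    """Reverse-resolve the IP, check the domain, then forward-confirm the hostname."""
    try:
        host = reverse_lookup(ip)
        if not host.endswith(GOOGLE_SUFFIXES):
            return False
        # The hostname must resolve back to the IP that made the request.
        return ip in forward_lookup(host)
    except OSError:
        # DNS failure: treat the client as unverified.
        return False


# Offline demonstration with stand-in resolvers (hypothetical values).
fake_reverse = lambda addr: "crawl-66-249-66-1.googlebot.com"
fake_forward = lambda host: ["66.249.66.1"]
print(is_verified_googlebot("66.249.66.1", fake_reverse, fake_forward))
```

DNS lookups are slow, so a real deployment would likely cache the verdict per IP rather than resolve on every request.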

What errors should be avoided during this implementation?

Never serve different content to Googlebot without valid paywall markup. This is the red line. If your CMS displays the full article to the bot but hides it from users, and the JSON-LD is absent or invalid, you are in violation. Google may tolerate this for a few days, but a manual inspection exposes you to a penalty.

Another common pitfall: the cssSelector pointing to a non-existent or cosmetic element. Some sites indicate ".paywall-banner" while the actual content is in ".article-body". Google detects the inconsistency and ignores the markup. The result: the site thinks it's compliant but remains exposed to the risk of cloaking. Always validate that the selector targets the correct area.
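A mismatch like this can be caught automatically before deployment. The sketch below uses only the Python standard library and handles the simplest case, a single-class selector such as ".article-body". The function name selector_matches and the sample page are invented for the example. It confirms that the selector exists in the HTML, not that it wraps the locked text, so a manual look at the page is still needed.

```python
import json
from html.parser import HTMLParser


class _ClassCollector(HTMLParser):
    """Collect every CSS class name that appears in the document."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())


def selector_matches(html: str, jsonld: str) -> bool:
    """True if the markup's cssSelector (a single .class) exists in the HTML."""
    selector = json.loads(jsonld)["hasPart"]["cssSelector"]
    if not selector.startswith("."):
        raise ValueError("this sketch only handles single-class selectors")
    collector = _ClassCollector()
    collector.feed(html)
    return selector[1:] in collector.classes


# Hypothetical page and markup, for illustration only.
page = '<div class="article-body article-content--premium">Locked text</div>'
good = '{"hasPart": {"cssSelector": ".article-content--premium"}}'
bad = '{"hasPart": {"cssSelector": ".paywall-content"}}'
print(selector_matches(page, good), selector_matches(page, bad))
```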

How can I check that my implementation is correct?

Use the URL inspection tool in Google Search Console and request a live crawl. Check the rendered HTML and look for your JSON-LD — it should appear complete and valid. Copy-paste the JSON block into Google's Rich Results Test to verify it is recognized as NewsArticle or CreativeWork with a paywall.

Then, test in incognito mode from several IPs to confirm that non-authenticated users can indeed see the paywall and that the full content is not accessible via source code inspection. If the full text is readable in plain view in the DOM even for an anonymous visitor, you risk scraping — consider conditional server rendering or partial obfuscation.
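The source-code part of this check can be scripted. The sketch below takes the HTML served to an anonymous visitor and a few sentences known to sit behind the paywall, and reports which of them are still present in the page, including text hidden by CSS or embedded in script blocks. The function name leaks_locked_text and the sample pages are invented for the example, and fetching the HTML is left out so the code runs offline.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Gather all text nodes, including the contents of script and style tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def _normalize(text: str) -> str:
    # Collapse whitespace and ignore case so formatting differences do not hide a match.
    return re.sub(r"\s+", " ", text).strip().lower()


def leaks_locked_text(anonymous_html: str, locked_sentences: list[str]) -> list[str]:
    """Return the locked sentences that still appear in the anonymous HTML."""
    extractor = _TextExtractor()
    extractor.feed(anonymous_html)
    extractor.close()
    page_text = _normalize(" ".join(extractor.chunks))
    return [s for s in locked_sentences if _normalize(s) in page_text]


# Hypothetical pages: one hides the text with CSS, one omits it server-side.
hidden = '<p>Teaser.</p><div style="display:none"><p>The secret figure is 42.</p></div>'
clean = '<p>Teaser.</p><div class="paywall">Subscribe to read on.</div>'
locked = ["The secret figure is 42."]
print(leaks_locked_text(hidden, locked), leaks_locked_text(clean, locked))
```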

Implement the JSON-LD NewsArticle or CreativeWork with isAccessibleForFree=false and hasPart pointing to the locked content.
Verify that the cssSelector precisely targets the hidden area for non-subscribers, not a decorative element.
Configure the server to detect Googlebot (user-agent + reverse DNS) and serve it the complete HTML with the markup.
Validate the JSON-LD with Google's Rich Results Test and the URL inspection tool in Search Console.
Test in incognito mode that the full content is not accessible in plain view in the DOM to avoid scraping.
Regularly monitor server logs for any crawl anomalies or unintended differences in treatment.

Compliance requires technical coordination between development, SEO, and editorial teams. The Schema.org markup, Googlebot detection, and continuous validation are non-negotiable prerequisites. These optimizations can be complex to orchestrate alone, especially for custom CMS or hybrid architectures. A specialized SEO agency can audit the implementation, correct inconsistencies, and monitor indexing over time. This kind of support helps avoid costly mistakes and speeds up compliance.

❓ Frequently Asked Questions

Can I block Googlebot entirely if I want a strict paywall without indexing?

Yes, but then your content will not be indexed at all. Google only ranks pages that Googlebot can access. If you block the bot via robots.txt or by user-agent, you give up organic visibility for that content.

Is the paywall markup mandatory even if I show a free excerpt visible to everyone?

If Googlebot sees the complete content while users see an excerpt followed by a paywall, the markup remains mandatory. It legitimizes this difference in treatment and avoids being classified as cloaking.

What percentage of content can I show for free without weakening my paid model?

Google does not specify an official ratio. Common practice is 20-30% visible as a teaser, with the rest locked. Beyond 50%, you risk users no longer needing the subscription, but Google imposes no technical limit.

If my paywall JSON-LD is invalid, is there a risk of an immediate penalty?

Rarely immediate, but Google can classify the site as cloaking during a manual inspection. Penalties range from partial downgrading to de-indexing. It is better to validate the markup before deployment with the Rich Results Test.

Does paid content rank as well as free content in the SERPs?

In theory yes, if quality and relevance are equivalent. In practice, the "Subscription" tag can reduce CTR, especially for lesser-known brands. The site's reputation and authority often offset this drawback.
