
Official statement

If you want to index two sites that share content, make sure each site has meaningful unique content. Otherwise, use canonical tags to indicate your indexing preference.
🎥 Source video

Extracted from a Google Search Central video (statement at 29:12)

⏱ 1h20 💬 EN 📅 25/08/2017 ✂ 13 statements
Watch on YouTube (29:12) →
Other statements from this video (12)
  1. 1:37 Can the canonical tag really block doorway pages?
  2. 3:09 Do duplicate URLs really hurt the crawl budget of large sites?
  3. 5:06 How do internal links actually influence the crawl and ranking of your pages?
  4. 6:06 Do alt and title attributes really influence the ranking of the pages they link to?
  5. 7:18 How many footer links is really too many for Google?
  6. 14:46 Should you really avoid multiplying links in page footers?
  7. 30:09 How does Google actually handle duplicate content in its index?
  8. 34:14 Is organization markup really enough to guarantee a Knowledge Panel?
  9. 40:55 Do mobile interstitials really kill your organic rankings?
  10. 45:23 Should you really remove .html extensions from your URLs to improve SEO?
  11. 64:46 How do you create content "significantly better" than your competitors, according to Google?
  12. 65:57 Can structured data markup kill your rich snippets without affecting your rankings?
📅 Official statement from 25/08/2017
TL;DR

Google tolerates shared content between two sites as long as each domain provides enough unique content elsewhere. If not, canonical tags should point to the preferred version. The goal is to avoid diluting the crawl budget or creating algorithmic confusion about which version to rank.

What you need to understand

Why does Google refer to "shared content" instead of duplicate content?

The distinction is important. Shared content refers to a voluntary situation: two legitimate sites under different control share content for business or technical reasons. It's not scraping.

Google recognizes that some business models require shared content: franchises, distributors, OEMs. The presence of duplicate content is not a penalty in itself, but a signal of ambiguity that Google resolves by choosing the version it finds most relevant.

What qualifies as "meaningfully unique content" in Google's eyes?

Google does not provide a numeric threshold, which is frustrating. Meaningful does not mean "a few added lines". We are talking about truly distinct pages, a clean structure, and content that offers differentiated value.

In practice, a site with 80% duplicate content and 20% unique does not meet the threshold. Field experience shows that an inverted ratio is preferable: at least 60-70% unique content for Google to consider the site legitimate to index in parallel.
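As a rough way to measure that ratio from a crawl export, here is a minimal sketch using Python's standard-library difflib. The 0.9 similarity threshold and the sample page texts are illustrative assumptions, not values Google publishes:

```python
from difflib import SequenceMatcher

def is_near_duplicate(a: str, b: str, threshold: float = 0.9) -> bool:
    """Treat two page texts as duplicates above an assumed similarity threshold."""
    return SequenceMatcher(None, a, b).ratio() >= threshold

def unique_content_ratio(site_pages: list[str], other_site_pages: list[str]) -> float:
    """Share of pages on one site with no near-duplicate on the other site."""
    if not site_pages:
        return 0.0
    unique = sum(
        1 for page in site_pages
        if not any(is_near_duplicate(page, other) for other in other_site_pages)
    )
    return unique / len(site_pages)

# Hypothetical page texts from two crawl exports:
site_a = ["Product X spec sheet: 500W motor, steel frame.", "Our company history and team."]
site_b = ["Product X spec sheet: 500W motor, steel frame.", "Regional buying guide and sizing advice."]
print(f"Site B unique ratio: {unique_content_ratio(site_b, site_a):.0%}")  # → 50%
```

A ratio below the ~60-70% discussed above would put the site in the risk zone.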

When should a canonical tag actually be used?

The canonical tag comes into play when you cannot create enough unique content. Typical example: a reseller publishing the manufacturer's product sheets word for word. No differentiated value.

In this case, pointing to the source site via a canonical tag prevents Google from wasting crawl time on identical versions. Note: the canonical is a hint, not a binding directive. Google remains free to ignore it if other signals (backlinks, freshness, authority) lean towards the other version.
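In HTML, the canonical is a single link element in the page's head. As an illustration, this sketch extracts it with Python's standard-library html.parser; the manufacturer.example URL is a made-up placeholder:

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Collect href values of <link rel="canonical"> tags in a page."""
    def __init__(self):
        super().__init__()
        self.canonicals: list[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        if tag == "link":
            d = dict(attrs)
            if d.get("rel", "").lower() == "canonical" and "href" in d:
                self.canonicals.append(d["href"])

page = """<html><head>
<link rel="canonical" href="https://manufacturer.example/product-x">
</head><body>Reseller copy of the product sheet.</body></html>"""

finder = CanonicalFinder()
finder.feed(page)
print(finder.canonicals)  # → ['https://manufacturer.example/product-x']
```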

  • Shared content is not a penalty, but a risk of indexing cannibalization
  • Each site must justify its indexing with a sufficient volume of unique content (no official number; field experience suggests > 60%)
  • The canonical is a directive, not an order: Google can ignore it if other signals contradict your choice
  • No precise threshold communicated on what constitutes "meaningful content", hence the need to test and monitor
  • The overall architecture matters: two sites with 30% duplication but different structures, targets, and contexts pose less of a problem than a 90% clone

SEO Expert opinion

Is this directive consistent with field observations?

Yes, but with nuances that Google intentionally omits. In practice, the indexing battle between two sites sharing content relies on secondary signals: domain authority, link profile, user engagement, freshness.

I have seen cases where site B (with a canonical tag pointing to A) still ranks because its backlink profile was significantly stronger. The canonical is a weak signal against strong contradictory signals, and Google never specifies its relative weight versus those other signals, which makes its application unpredictable.

What are the gray areas of this statement?

Mueller says nothing about timing. If you launch site B with 80% duplicate content from site A, how long before Google decides which to prioritize? Weeks, months?

Another silence: what happens to the crawl budget? Two sites with shared content = two redundant crawls. On a large catalog, this can slow the discovery of new unique content. Google never quantifies this impact, but experience shows that duplicated sites see their crawl frequency gradually decrease.

In what cases do these recommendations fail?

Multi-regional sites with translated content: Google sometimes considers translations as unique content (hreflang), sometimes as quasi-duplicate if the translation is automatic and of poor quality. The ambiguity remains.

Another case: marketplaces where multiple sellers offer the same product. Each product listing has unique content (reviews, prices, stock), but the central description is the same. Google often favors the listing with the best engagement history, not necessarily the one with the canonical tag. UX signals take precedence.

Note: If you manage two sites with shared content and you set up cross-canonical links (A points to B on page X, B points to A on page Y), Google risks detecting an inconsistency and ignoring all your directives. The logic must be unambiguous.
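Such contradictions are easy to detect mechanically. Here is a minimal sketch that flags mutual canonical pairs in a crawled page-to-canonical mapping; the site-a.example and site-b.example URLs are placeholders:

```python
def cross_canonical_conflicts(canonical_map: dict[str, str]) -> list[tuple[str, str]]:
    """Report URL pairs that canonicalize to each other (A -> B and B -> A)."""
    conflicts = []
    for page, target in canonical_map.items():
        # A mutual pair is a contradiction Google may resolve by ignoring both hints
        if canonical_map.get(target) == page and (target, page) not in conflicts:
            conflicts.append((page, target))
    return conflicts

mapping = {
    "https://site-a.example/x": "https://site-b.example/x",
    "https://site-b.example/x": "https://site-a.example/x",  # contradicts the line above
    "https://site-b.example/y": "https://site-a.example/y",  # consistent one-way hint
}
print(cross_canonical_conflicts(mapping))
# → [('https://site-a.example/x', 'https://site-b.example/x')]
```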

Practical impact and recommendations

What should you do if you manage two sites with shared content?

First, audit the duplication/unique ratio. Use Screaming Frog or Sitebulb to identify duplicate pages between the two domains. If the ratio exceeds 40% duplication, you are in a risk zone.

Then decide which version should be canonical based on business and SEO signals: which domain has the best link profile? Which one converts better? Which one are you investing in the most? The canonical should point to the strategic site, not the one that "deserves" it theoretically.

What critical mistakes should be avoided at all costs?

Never put a canonical tag hoping that Google will still index both versions. The canonical explicitly says, "ignore this page, go look at the other one." If you want both indexed, you need to create differentiated content, not cheat with directives.

Another pitfall: adding 2-3 cosmetic paragraphs to make a page "unique". Google detects near-duplicate content through shingling algorithms. If 90% of the text is identical, your additions will have no impact. Aim for at least 40-50% truly different content to avoid the filter.
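Word-level shingling is straightforward to approximate yourself. This sketch (plain Python, word 5-grams and Jaccard similarity; the shingle size and synthetic texts are assumptions, not Google's actual parameters) shows why cosmetic additions barely move the similarity score:

```python
def shingles(text: str, k: int = 5) -> set[tuple[str, ...]]:
    """Word-level k-shingles (overlapping k-word windows) of a text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def similarity(a: str, b: str) -> float:
    """Jaccard similarity between the shingle sets of two texts."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

original = " ".join(f"spec{i}" for i in range(100))  # stand-in for a copied product sheet
tweaked = "a short new intro here . " + original     # cosmetic addition only
print(f"{similarity(original, tweaked):.2f}")        # → 0.94: still a near-duplicate
```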

How can you check if your strategy is working?

Monitor indexing over time via site: queries and Google Search Console. If you have set canonical tags from site B to A, indexing for B should gradually decline. If it stagnates or increases, Google is ignoring your directives.

Also check the impressions in GSC: if both sites continue to generate impressions for the same queries, they are cannibalizing each other. One of them should clearly dominate after a few weeks. Otherwise, dig into the contradictory signals (backlinks, engagement).
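If you export the query reports for both properties as CSV, the overlap is easy to compute. A minimal sketch, assuming a simplified export with Query and Impressions columns (real GSC exports name their columns slightly differently depending on locale):

```python
import csv
from io import StringIO

def impressions_by_query(csv_text: str) -> dict[str, int]:
    """Parse a simplified query export with Query/Impressions columns."""
    reader = csv.DictReader(StringIO(csv_text))
    return {row["Query"]: int(row["Impressions"]) for row in reader}

def cannibalized_queries(site_a: dict[str, int], site_b: dict[str, int]) -> list[str]:
    """Queries where both sites earn impressions, i.e. neither clearly dominates."""
    return sorted(q for q in site_a if q in site_b and site_a[q] and site_b[q])

# Hypothetical exports for the two properties:
a = impressions_by_query("Query,Impressions\nproduct x,1200\nbrand a,300\n")
b = impressions_by_query("Query,Impressions\nproduct x,900\nbrand b,150\n")
print(cannibalized_queries(a, b))  # → ['product x']
```

Queries appearing in this list for weeks are the ones to investigate for contradictory signals.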

  • Audit the duplication rate between the two sites with a crawler (goal: < 40%)
  • Define a clear strategy: either canonical tags to a master site, or massive creation of unique content on each site
  • Implement canonical tags in HTML, not just via HTTP headers (more reliable for Google)
  • Monitor indexing monthly via GSC and site: queries to detect inconsistencies
  • Analyze impressions by site: only one should dominate each query cluster, otherwise active cannibalization
  • Plan a unique content strategy over 6-12 months if you want to maintain indexed status for both sites in parallel (significant editorial production)
Managing duplicate content between two sites requires a clear and coherent strategy. If you do not have the resources to produce enough unique content on each domain, a canonical to a master site is the safest solution. Otherwise, commit to an ambitious editorial plan and monitor indexing closely.

These technical and editorial trade-offs can quickly become complex, especially on large catalogs or multi-domain architectures. If the situation seems confusing, or you notice contradictory signals in GSC, consulting a specialized SEO agency can clarify the strategy and avoid costly visibility mistakes.

❓ Frequently Asked Questions

Can you index two identical sites if each targets a different country?
Yes, provided you use hreflang to signal the geo-targeted relationship and each version has local adaptations (language, currency, regional content). Google then treats them as variants, not duplicates.
Should you set a canonical even if both sites have 60% unique content?
No. If each site has enough unique content and a legitimate reason to exist (different audiences, different offers), let Google index both. The canonical is only useful when you cannot justify two indexations.
Does Google penalize a site that duplicates content from another domain?
There is no direct algorithmic penalty, but Google will pick a canonical version by default. The duplicating site risks not being indexed, or having a reduced crawl budget, which amounts to de facto invisibility.
Is a canonical in the HTTP header as effective as one in the HTML?
In theory yes, but in practice the HTML version is more reliable because Google detects it systematically during rendering. The HTTP version can be ignored if the redirect chain is complex.
How long before Google takes a newly added canonical into account?
It depends on the site's crawl frequency. On a site crawled daily, a few days to two weeks. On a lower-priority site, several weeks or even months before indexing adjusts.
🏷 Related Topics
Content · Crawl & Indexing · AI & SEO

