Official statement

Errors in content structure within hreflang tags can lead to article fragmentation, which may cause crawling issues for Google.
🎥 Source video: extracted from a Google Search Central video, at 46:48

⏱ 59:16 💬 EN 📅 19/06/2018 ✂ 9 statements
Other statements from this video (8)
  1. 5:48 Should you choose subdirectories or separate domains for a multilingual site?
  2. 8:34 Do you really need to geotarget your subdomains and subdirectories in Search Console?
  3. 10:44 Does the hreflang attribute really work one-way, or must you always create bidirectional links?
  4. 13:08 Are country-code domains (ccTLDs) really essential for international SEO?
  5. 19:47 Do you really need to geotarget a site with an international audience?
  6. 25:02 Bidirectional hreflang: why does Google ignore your international annotations?
  7. 44:06 Do spelling mistakes in comments hurt SEO rankings?
  8. 53:04 Does Google apply different algorithms depending on your niche?

TL;DR

Google states that errors in the content structure indicated by hreflang can cause article fragmentation, disrupting crawl. Specifically, if your hreflang tags point to truncated, paginated, or poorly structured URLs, the bot may miss entire sections of your content. It's essential to audit the integrity of target pages and ensure that each alternative URL presents a complete, coherent, and crawlable version of the article.

What you need to understand

What does "fragmented content" actually mean in this context?

Google refers to article fragmentation when a translated or alternative page is split across multiple URLs instead of remaining unified. Typically, you have a long article in French at /article-complet, but the English version is divided into /en/article-part-1, /en/article-part-2, etc. If your hreflang tag points to /en/article-part-1, the bot only discovers the first fragment.

This technical fragmentation is not always intentional: misconfigured pagination, AMP with truncated articles, lightweight mobile versions, or redirects to cached snippets can cause it. The issue is that Googlebot assumes the complete alternative version resides at the URL specified in hreflang. If it doesn't, it might index an incomplete piece or completely ignore the alternative.
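
To see which URLs a page actually declares as alternatives, the annotations can be pulled out of the HTML head. The following is a minimal sketch using only Python's standard library; the class name, the helper `extract_hreflang`, and the example.com URLs are illustrative assumptions, not part of any Google tooling.

```python
from html.parser import HTMLParser


class HreflangParser(HTMLParser):
    """Collects <link rel="alternate" hreflang="..." href="..."> annotations."""

    def __init__(self):
        super().__init__()
        self.alternates = {}  # hreflang value -> target URL

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if "alternate" in rel and attr.get("hreflang") and attr.get("href"):
            self.alternates[attr["hreflang"].lower()] = attr["href"]


def extract_hreflang(html):
    """Return a dict mapping each declared hreflang value to its URL."""
    parser = HreflangParser()
    parser.feed(html)
    return parser.alternates


# Hypothetical page head reproducing the fragmented example from the text.
sample = """
<head>
  <link rel="alternate" hreflang="fr" href="https://example.com/article-complet">
  <link rel="alternate" hreflang="en" href="https://example.com/en/article-part-1">
</head>
"""
print(extract_hreflang(sample))
```

Here the English alternative resolves to a "part-1" URL, which is the kind of target worth inspecting by hand.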

Why does this cause crawl errors?

Crawl errors occur when the bot attempts to reconcile conflicting signals. You declare via hreflang that page A in French corresponds to page B in English, but B contains only 30% of A's content. Googlebot detects a structural inconsistency: the semantic signals do not match, the internal anchors differ, and the depth of content varies drastically.

In this case, the engine might raise a warning in Search Console ('incorrect hreflang tag'), de-index isolated fragments, or worse, continuously crawl the fragments in search of a complete version that does not exist. This wastes crawl budget unnecessarily and delays the indexing of truly priority pages.

How can I detect if my site is affected?

The first step is to open Search Console and check its hreflang reporting (historically the International Targeting report, which Google has since retired). If you see alerts such as 'Alternative URL not found' or 'Alternative URL not crawlable', you are likely affected. Next, download your server log files and look for Googlebot requests to the alternative URLs listed in your hreflang tags.

If you observe a low crawl rate (less than 10% of alternative URLs crawled in a month) or those URLs return HTTP 404, 410, or 302 responses, that is a red flag. Complement this with a technical audit: list all hreflang URLs, crawl them with Screaming Frog or Oncrawl, and ensure that each targeted page presents complete, non-truncated content.

  • Article Fragmentation: translated pages cut into pieces or paginated without correct rel=next/prev tags.
  • Crawl Errors: inconsistencies detected by Googlebot between the declared hreflang structure and the actual content of target pages.
  • Impact on Indexing: partial de-indexing, wasted crawl budget, ignored international signals.
  • Hreflang Audit: ensure complete integrity of each alternative URL (HTTP codes, content completeness, semantic consistency).
  • Logs and Search Console: essential tools for detecting failed crawls or hreflang warnings.

SEO Expert opinion

Is this statement consistent with on-the-ground observations?

Yes, completely. We regularly observe during audits of multilingual sites that hreflang points to truncated AMP versions or lightweight mobile pages that contain only the introduction. Googlebot crawls the alternative URL, detects the content gap with the source version, and eventually ignores the hreflang tag. The result: loss of international visibility, unresolved duplicates, or worse, cannibalization between language versions.

I saw a recent case where an e-commerce site declared hreflang for product pages split across multiple tabs ('Description', 'Reviews', 'Specs'). Each tab had its own URL, but hreflang pointed only to the 'Description' tab. Google indexed only that tab for the alternative versions, leaving out 70% of the product content. International organic traffic dropped by 40% in three months.

What nuances should be addressed?

Google does not specify what threshold of fragmentation triggers crawl errors. If your French article is 2,000 words and the English version is 1,800 words (slightly condensed), is that a problem? [To verify] We lack official data on the acceptable ratio. From experience, as long as the semantic structure remains consistent (same H2s, same main sections), a gap of 10-15% is usually fine.

However, if the alternative version is an excerpt or a summary (less than 50% of the source content), expect problems. Google is likely comparing semantic representations of the two pages: if they diverge too much, the hreflang tag is ignored. Another point: pagination with rel=next/prev does not replace hreflang. If your content is paginated, hreflang must point to the consolidated page (view-all) or to the first page of a series with complete rel=next/prev tags. Note that Google announced in 2019 that it no longer uses rel=next/prev as an indexing signal, so the consolidated page is the safer target.

In what cases does this rule not apply?

If you use hreflang for geolocated landing pages with intentionally different content (specific offers by country, prices in local currencies), fragmentation is not a bug but a feature. Google tolerates substantial differences if they are justified by the geographical context. For example, /fr/promotions/ and /de/promotions/ can present completely different products without triggering a hreflang error.

Another case: news sites that publish short versions for mobile and long versions for desktop. If your annotations clearly distinguish the contexts (for separate mobile URLs, rel="alternate" with a media attribute alongside hreflang, or distinct URLs with coherent signals), Google does not view that as problematic fragmentation. However, be careful: the risk of confusion remains high, so document your logic thoroughly in an XML sitemap with exhaustive hreflang annotations.

Note: Hreflang errors are often silent. You may not see an alert in Search Console, but your alternative pages will remain under-indexed for months. Regularly audit your logs to detect failed crawls on hreflang URLs.

Practical impact and recommendations

What practical steps should I take to avoid fragmentation?

The first action is to map your hreflang URLs in a spreadsheet. Column A: source URL, Column B: alternative URL by language/region, Column C: HTTP status, Column D: word count, Column E: H2/H3 headings present. Compare row by row. If an alternative URL has less than 70% of the source content or lacks key sections, that's a red flag.
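
The row-by-row comparison described above can be sketched in a few lines of Python. This is an illustrative assumption of how such a check might look: the dictionary keys, the `audit_row` helper, and the sample figures are hypothetical, and comparing H2 counts is only a rough proxy for "key sections", since translated headings cannot be matched by text.

```python
def audit_row(row, min_ratio=0.70):
    """Return the red flags for one spreadsheet row (hypothetical column names)."""
    flags = []
    if row["status"] != 200:
        flags.append(f"HTTP {row['status']}")
    # Share of the source word count that survives in the alternative page.
    ratio = row["alt_words"] / row["source_words"] if row["source_words"] else 0.0
    if ratio < min_ratio:
        flags.append(f"content ratio {ratio:.0%} is below {min_ratio:.0%}")
    # Fewer H2 headings than the source suggests missing sections.
    missing = row["source_h2"] - row["alt_h2"]
    if missing > 0:
        flags.append(f"{missing} H2 section(s) missing")
    return flags


# Made-up rows: one healthy alternative, one fragment.
rows = [
    {"alt_url": "/en/article-complet", "status": 200,
     "source_words": 2000, "alt_words": 1800, "source_h2": 6, "alt_h2": 6},
    {"alt_url": "/en/article-part-1", "status": 200,
     "source_words": 2000, "alt_words": 600, "source_h2": 6, "alt_h2": 2},
]
for row in rows:
    print(row["alt_url"], audit_row(row) or "ok")
```

The 70% default mirrors the threshold suggested in the text; it is an experience-based figure, not an official Google limit.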

Next, crawl all alternative URLs using a tool like Screaming Frog (for example in list mode, fed with the URLs from your spreadsheet). Check that each page returns a 200 OK, has textual content (not just an image or a JavaScript spinner), and delivers a complete HTML structure (not a fragment filled in later via Ajax). If you use client-side rendering, ensure that Googlebot can execute the JavaScript and see the complete content.
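
One quick way to spot a page that ships only a JavaScript shell is to count the words visible in the raw HTML, before any script runs. The sketch below uses Python's standard `html.parser`; the class name, the `visible_word_count` helper, and the sample documents are assumptions made for illustration, and a low count is only a hint that rendering should be checked, not proof of a problem.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Counts words outside of script, style, and similar non-content tags."""

    SKIPPED = {"script", "style", "noscript", "template", "title"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.words = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.words += len(data.split())


def visible_word_count(html):
    """Return the number of words present in the HTML without running JavaScript."""
    parser = VisibleTextParser()
    parser.feed(html)
    return parser.words


# Hypothetical responses: a client-side shell and a server-rendered page.
shell = ('<html><head><title>Loading</title>'
         '<script>var msg = "many words inside a script";</script></head>'
         '<body><div id="app"></div></body></html>')
full = ('<html><body><h1>Complete article</h1>'
        '<p>Ten words of real text live in this server-rendered paragraph.</p>'
        '</body></html>')
print(visible_word_count(shell), visible_word_count(full))
```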

What technical errors should I absolutely avoid?

Never point hreflang at a URL that is canonicalized to a different URL. If /en/article-part-1/ has a canonical tag pointing to /en/article-complet/, your hreflang should point directly to /en/article-complet/, not to the fragment. Googlebot follows canonicals, and a chain of redirects or conflicting annotations can leave the hreflang signal unresolved.
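
Once the canonical tag of every hreflang target has been collected (for instance from a crawl export), the conflict described above is easy to detect. The following is a minimal sketch under that assumption; `find_canonical_conflicts` and the sample URLs are hypothetical names chosen for the example.

```python
def find_canonical_conflicts(hreflang_targets, canonicals):
    """Return {hreflang target: canonical} for targets canonicalized elsewhere.

    hreflang_targets: iterable of URLs referenced by hreflang annotations.
    canonicals: dict mapping a URL to the canonical URL it declares.
    A URL with no entry in `canonicals` is treated as self-canonical.
    """
    conflicts = {}
    for url in hreflang_targets:
        canonical = canonicals.get(url, url)
        # Ignore a trailing-slash difference, which is not a real conflict here.
        if canonical.rstrip("/") != url.rstrip("/"):
            conflicts[url] = canonical
    return conflicts


# Made-up data matching the example in the text.
targets = ["/en/article-part-1/", "/de/artikel/"]
canonicals = {
    "/en/article-part-1/": "/en/article-complet/",
    "/de/artikel/": "/de/artikel/",
}
print(find_canonical_conflicts(targets, canonicals))
```

Each URL reported this way should be replaced in the hreflang annotations by the canonical it points to.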

Also avoid incomplete self-referential hreflang. Each page must declare hreflang to all its alternatives AND to itself (including x-default). If your French page declares hreflang to English but the English page does not return to French, Google considers the hreflang cluster broken and ignores the annotations. Use a hreflang validator (like the Hreflang Tags Testing Tool) to detect missing reciprocities.
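
The reciprocity rule can be checked mechanically once the annotations of every page in a cluster are known. This is a minimal sketch, assuming the annotations have already been extracted into a dictionary; the `check_cluster` helper and the example.com URLs are hypothetical.

```python
def check_cluster(annotations):
    """Return a list of (page, problem) tuples for one hreflang cluster.

    annotations: dict mapping a page URL to its {hreflang value: target URL}.
    A target that is absent from `annotations` (for example, not crawled)
    is reported as providing no return link.
    """
    problems = []
    for page, alternates in annotations.items():
        targets = set(alternates.values())
        if page not in targets:
            problems.append((page, "missing self-reference"))
        for target in sorted(targets - {page}):
            if page not in annotations.get(target, {}).values():
                problems.append((page, f"no return link from {target}"))
    return problems


# Made-up cluster: the English page does not link back to the French page.
fr = "https://example.com/fr/article"
en = "https://example.com/en/article"
cluster = {
    fr: {"fr": fr, "en": en, "x-default": en},
    en: {"en": en, "x-default": en},
}
for page, problem in check_cluster(cluster):
    print(page, "->", problem)
```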

How can I check that my implementation is correct?

Install the hreflang Tag Checker extension (Chrome/Firefox) and navigate your multilingual pages. The extension displays the detected hreflang tags and reports any reciprocity errors or 404 URLs. Complement this with a Search Console audit: in the 'Coverage' section, filter by 'Excluded' and look for URLs that mention 'Alternative not found' or 'Alternative URL does not point to this page.'

Finally, analyze your server logs over a 30-day window. Extract all Googlebot crawls (User-Agent containing 'Googlebot') on the URLs listed in your hreflang tags. If less than 80% of these URLs have been crawled at least once, you have a discoverability or crawl budget issue. Prioritize correcting uncrawled URLs: check robots.txt, server response times, and the presence of internal links to these pages.
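
The log analysis above can be sketched as follows. The example assumes access logs in the common "combined" format and matches Googlebot by User-Agent substring only; a production audit would also verify the requester (for example by reverse DNS), since the User-Agent can be spoofed. The function name, regular expression, and sample lines are illustrative assumptions.

```python
import re

# Captures the request path and status code from a combined-format log line.
LINE = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')


def googlebot_coverage(log_lines, hreflang_paths):
    """Return (coverage ratio, uncrawled paths, {path: red-flag statuses})."""
    wanted = set(hreflang_paths)
    crawled = set()
    bad = {}
    for line in log_lines:
        if "Googlebot" not in line:  # User-Agent match only, see the note above
            continue
        match = LINE.search(line)
        if not match or match["path"] not in wanted:
            continue
        crawled.add(match["path"])
        status = int(match["status"])
        if status in (302, 404, 410):
            bad.setdefault(match["path"], set()).add(status)
    coverage = len(crawled) / len(wanted) if wanted else 1.0
    return coverage, sorted(wanted - crawled), bad


# Made-up log lines for illustration.
UA = '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
logs = [
    f'66.249.66.1 - - [01/Jun/2018:10:00:00 +0000] "GET /en/article-complet HTTP/1.1" 200 5120 "-" {UA}',
    f'66.249.66.1 - - [02/Jun/2018:10:00:00 +0000] "GET /en/article-part-1 HTTP/1.1" 404 312 "-" {UA}',
    '203.0.113.7 - - [02/Jun/2018:11:00:00 +0000] "GET /de/artikel HTTP/1.1" 200 4800 "-" "Mozilla/5.0"',
]
paths = ["/en/article-complet", "/en/article-part-1", "/de/artikel"]
coverage, uncrawled, bad = googlebot_coverage(logs, paths)
print(f"{coverage:.0%} crawled, uncrawled: {uncrawled}, red flags: {bad}")
```

A coverage value under 0.8 over a 30-day window corresponds to the threshold suggested in the text.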

  • Audit all hreflang URLs to verify their completeness (content, structure, HTTP status 200).
  • Compare content depth between source page and alternative pages (minimum ratio 70%).
  • Never point hreflang to paginated fragments without rel=next/prev tags or view-all URLs.
  • Check reciprocity of hreflang tags (each alternative page must refer back to the source).
  • Analyze server logs to detect failed crawls on hreflang URLs.
  • Use Search Console and hreflang validators to identify annotation errors.

Content fragmentation via hreflang is a common technical trap on multilingual or multi-regional sites. Google expects complete, coherent, and easily crawlable alternative URLs. Structural inconsistencies can trigger crawl errors, waste your crawl budget, and undermine your international visibility. Regularly audit your annotations, crawl your alternatives, and ensure that each hreflang URL points to a fully indexable page.

If your international architecture is complex (multiple languages, regional content, distinct AMP or mobile versions), these optimizations can quickly become technical and time-consuming. In such cases, hiring an agency specialized in international SEO can secure the implementation and prevent costly visibility errors.

❓ Frequently Asked Questions

Does hreflang work if my alternative pages have slightly different content?
Yes, as long as the semantic structure remains consistent (same main sections, same topics). A 10-15% gap in length or local adaptations (prices, cultural examples) are tolerated. However, if one of the versions is a mere summary or excerpt (less than 50% of the source content), Google may ignore the hreflang tag.
What happens if my hreflang points to a paginated page?
Google crawls only the URL specified in hreflang. If it is page 1 of a paginated series, the bot discovers only that fragment. You must either point hreflang to a consolidated view-all URL or implement correct rel=next/prev tags on every page of the series.
How do I know whether my hreflang errors affect my crawl budget?
Analyze your server logs: extract all Googlebot crawls of the URLs listed in your hreflang tags. If fewer than 80% are crawled within 30 days, or if you see repeated crawls returning 404/410/302, you are wasting budget. Search Console may also report "alternative URLs not found".
Can hreflang be used for pages that intentionally differ by country?
Yes, it is even recommended for geotargeted pages with distinct content (local offers, prices, promotions). Google tolerates substantial differences if they are justified by the geographic context. But document your logic carefully and make sure each page remains complete and indexable.
Should x-default be included in every hreflang implementation?
Yes, x-default tells Google which URL to show by default when no language/region matches the user's profile. Without x-default, the bot may index a random language version for generic queries. Point x-default to your main language or to a language-selection page.
🏷 Related Topics
Content · Crawl & Indexing · Discover & News · Pagination & Structure · International SEO