Why does Google sometimes index your AMP pages before their canonical HTML version?

Official statement

Google may sometimes discover and index an AMP page before its canonical HTML version, especially if links point directly to the AMP. Once the HTML page is crawled, Google connects the two versions and focuses on the HTML. These situations resolve automatically.

Source: Google Search Central video, statement at 31:45

Duration 37:34 · English · published 12/06/2020 · 18 statements extracted


Other statements from this video (17):

1:06 Why does Google suddenly show more non-indexed URLs in Search Console?
3:11 Crawl budget: why does Google crawl only a fraction of your known pages?
5:17 Core Web Vitals: why are your lab tests useless for ranking?
9:30 Does user-generated content really make the site responsible from an SEO standpoint?
11:03 Should you really include all your pages in a general sitemap?
12:05 Does crawl budget vary depending on where the content comes from?
13:08 Does Googlebot send an HTTP referrer when crawling your site?
14:09 Does image quality really influence ranking in Google web search?
18:15 How does Google really assess the importance of your pages through internal linking?
20:19 Why can a well-ranked site lose relevance without having made any mistake?
21:53 Are Core Web Vitals really a ranking factor or just a smokescreen?
22:57 Does Discover really work without strict technical criteria?
25:02 Can removing pages from a sitemap limit how Google crawls them?
27:08 Should you really use unavailable_after to manage temporary content?
30:11 Does structured data really influence ranking in Google?
33:52 Are Core Web Vitals really decisive for Google ranking?
35:51 Does Google really see content loaded dynamically after a user click?

What you need to understand

How can Google index the AMP before the main HTML page?

The classic sequence suggests that Google first discovers the HTML page, identifies the rel=amphtml link, and then crawls the AMP version. However, this scenario doesn't always reflect the reality of the web.

If a third-party site, an RSS feed, or a direct link points to the AMP URL itself (e.g., https://example.com/article.amp.html), Google may index this version first. The canonical HTML page may not have been crawled yet—or worse, it may be temporarily blocked by a server issue, a robots.txt error, or insufficient crawl budget.
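To check whether this applies to your site, look through your backlink export for links that target the AMP URL itself. The sketch below classifies link targets with a simple heuristic; the three patterns it recognizes (a `.amp.html` suffix, an `/amp/` path segment, an `amp` query parameter) are assumptions about common CMS conventions, not rules Google applies, and the function names are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical AMP URL conventions: adapt them to what your CMS actually generates.
AMP_SUFFIX = ".amp.html"
AMP_PATH_SEGMENT = "amp"
AMP_QUERY_PARAM = "amp"


def is_amp_url(url: str) -> bool:
    """Heuristically decide whether a URL points at an AMP variant."""
    parts = urlsplit(url)
    if parts.path.endswith(AMP_SUFFIX):           # /article.amp.html
        return True
    segments = [seg for seg in parts.path.split("/") if seg]
    if AMP_PATH_SEGMENT in segments:              # /news/article/amp/
        return True
    query = parse_qs(parts.query, keep_blank_values=True)
    return AMP_QUERY_PARAM in query               # /article?amp or /article?amp=1


def amp_link_targets(link_targets):
    """Keep only the inbound link targets that point directly at an AMP URL."""
    return [url for url in link_targets if is_amp_url(url)]


if __name__ == "__main__":
    backlinks = [
        "https://example.com/article.html",
        "https://example.com/article.amp.html",
        "https://example.com/news/article/amp/",
        "https://example.com/article?amp=1",
    ]
    for url in amp_link_targets(backlinks):
        print(url)
```

A high count of AMP-targeted links for a given article is a hint that Google may discover the AMP before the HTML.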

In this case, Google indexes what it has at hand: the AMP page. It appears in search results as the primary URL, even though its rel=canonical tag points to an HTML version that Google has not crawled yet.

What happens once the HTML page is crawled?

When Google eventually discovers the canonical HTML page, it analyzes the rel=amphtml and rel=canonical tags between the two versions. If everything is correctly tagged, it understands that the AMP is a variant and that the HTML version should take priority.
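Both declarations are ordinary `<link>` elements in each page's `<head>`. As an illustration, here is a minimal sketch using Python's standard `html.parser` to pull the `rel=canonical` and `rel=amphtml` targets out of a page; `LinkRelParser` and `extract_link_rels` are hypothetical names, and the sketch deliberately keeps only the first declaration of each type.

```python
from html.parser import HTMLParser


class LinkRelParser(HTMLParser):
    """Hypothetical helper: collect the href of <link rel="canonical"> and <link rel="amphtml">."""

    WANTED = ("canonical", "amphtml")

    def __init__(self):
        super().__init__()
        self.links = {}

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; attribute values may be None.
        # Self-closing <link ... /> tags also reach this method via handle_startendtag.
        if tag != "link":
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        for rel in self.WANTED:
            # Keep only the first declaration of each type.
            if rel in rel_values and href and rel not in self.links:
                self.links[rel] = href


def extract_link_rels(html: str) -> dict:
    """Return {"canonical": ..., "amphtml": ...} for whichever links the page declares."""
    parser = LinkRelParser()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    amp_page = (
        "<html amp><head>"
        '<link rel="canonical" href="https://example.com/article.html">'
        "</head></html>"
    )
    print(extract_link_rels(amp_page))
```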

The engine then shifts its focus: the HTML URL becomes the main indexed version, while the AMP is treated as an alternative variant. This transition may take a few days as Google recrawls both pages and updates its index.

Mueller states that this process resolves automatically, without manual intervention. That holds in theory; in practice, it depends on the consistency of your tags and on your crawl budget.

Why does this situation pose practical problems?

During this transition phase, the AMP URL remains visible in the SERPs. Analytics tools and Search Console may display fragmented data: some traffic arrives on the AMP, the rest on the HTML.

If you are tracking your rankings with third-party tools (SEMrush, Ahrefs, etc.), you might see two distinct URLs ranked for the same query, or artificial fluctuations in positions. Conversions tracked on the HTML side may also understate actual performance if some visitors still land on the AMP.
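For a consolidated view while the transition lasts, you can fold AMP rows into their canonical HTML page before reporting. A minimal sketch, assuming your analytics export can be reduced to `(url, sessions)` pairs and that you maintain an AMP → HTML mapping (for instance built from the rel=canonical of each AMP page); both input shapes and the function name are hypothetical.

```python
from collections import Counter


def consolidate_sessions(rows, amp_to_canonical):
    """Fold AMP rows into their canonical HTML URL.

    rows: iterable of (url, sessions) pairs (hypothetical export shape).
    amp_to_canonical: {amp_url: html_url} mapping you maintain yourself.
    Returns {canonical_url: (total_sessions, share_of_sessions_on_amp)}.
    """
    totals = Counter()
    via_amp = Counter()
    for url, sessions in rows:
        canonical = amp_to_canonical.get(url, url)  # non-AMP URLs map to themselves
        totals[canonical] += sessions
        if canonical != url:
            via_amp[canonical] += sessions
    return {
        url: (total, via_amp[url] / total if total else 0.0)
        for url, total in totals.items()
    }


if __name__ == "__main__":
    rows = [
        ("https://example.com/article.html", 120),
        ("https://example.com/article.amp.html", 80),
    ]
    mapping = {"https://example.com/article.amp.html": "https://example.com/article.html"}
    print(consolidate_sessions(rows, mapping))
    # {'https://example.com/article.html': (200, 0.4)}
```

The AMP share per article also tells you how much of the conversion gap may come from visitors landing on the AMP.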

Another issue: if your HTML page has conversion elements missing from the AMP (complex forms, chat, widgets), users landing on the AMP will have a degraded experience. And Google doesn't care if your conversion funnel is broken during this timeframe.

Google may index the AMP first if external links point directly to this version
The reconnection with the canonical HTML page happens automatically after crawl, but may take several days
During this transition, analytics and ranking data can be fragmented between the two URLs
Users landing on the AMP may have a diminished experience if the page lacks elements present on the HTML
The automatic resolution assumes correct canonical tags — any inconsistency prolongs the issue

SEO Expert opinion

Is this statement consistent with on-the-ground observations?

Yes, and it's actually a common scenario on sites that generate publicly accessible AMP URLs. We regularly observe AMP pages indexed first, especially when they are shared on Twitter, LinkedIn, or aggregated in RSS feeds pointing directly to the .amp.html URL.

The problem is that Mueller presents this as a non-event that resolves itself. In reality, it depends. If your site has a low crawl budget, if the HTML page is buried deep in the site structure, or if it returns temporary errors, Google may take weeks to make the connection. I've seen cases where the AMP remained indexed for over a month before Google switched. Whether this delay is truly “automatic” in all contexts remains [To be verified].

What nuances should be added to this claim?

Mueller says nothing about what happens if the canonical tags are inconsistent. If the AMP points to an HTML URL that itself redirects, or if the HTML page doesn't properly declare the rel=amphtml link, Google may get stuck in an intermediate state.

Another point: he claims that Google “focuses on the HTML” once the connection is established. But this doesn't mean that the AMP disappears from the index. It remains accessible, and in some cases (mobile searches, Top Stories), it is still the AMP that may be served first. The “focus” Mueller speaks of is vague: does it refer to ranking, display, or just the canonical version stored?

Finally, he does not mention the impact on Core Web Vitals. If the AMP is indexed first and Google collects CWV data on this version, then switches to the HTML, which has different performance metrics, will the metrics reset? Or will Google keep a mixed history? How UX signals are handled during this transition remains [To be verified].

In what cases does this rule not apply?

If you have disabled AMP or removed the rel=amphtml/canonical tags, this situation no longer occurs. But beware: old AMP URLs may remain in Google's index for weeks. You'll need to request a recrawl via the URL Inspection tool in Search Console or wait for Google to drop them naturally.

Another case: if you use AMP only for Top Stories and the AMP URLs are never directly linked, the risk of AMP-first indexing is almost negligible. This is mainly an issue for sites that expose AMP URLs in their sitemap, RSS feeds, or social shares.

Warning: if you're migrating from AMP to standard HTML, do not abruptly remove AMP pages without a 301 redirect. Google may keep serving the indexed AMP URLs for weeks, sending visitors to 404 errors. Implement AMP → HTML redirects and let Google recrawl before fully disabling AMP.
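On the server side, the redirect can be a thin layer in front of the application. The sketch below is a generic Python WSGI middleware, assuming your AMP pages follow a hypothetical `article.amp.html` → `article.html` naming; in many stacks you would declare the same rule in the web server or CDN configuration instead.

```python
AMP_SUFFIX = ".amp.html"  # hypothetical naming convention: /article.amp.html -> /article.html


def amp_to_html_path(path: str):
    """Map /article.amp.html to /article.html; return None for non-AMP paths."""
    if path.endswith(AMP_SUFFIX):
        return path[: -len(AMP_SUFFIX)] + ".html"
    return None


def amp_redirect_middleware(app):
    """Wrap a WSGI app so retired AMP URLs answer with a 301 to their HTML page."""

    def wrapped(environ, start_response):
        target = amp_to_html_path(environ.get("PATH_INFO", ""))
        if target is None:
            return app(environ, start_response)  # not an AMP URL: pass through
        query = environ.get("QUERY_STRING", "")
        location = target + ("?" + query if query else "")  # keep tracking parameters
        start_response(
            "301 Moved Permanently",
            [("Location", location), ("Content-Length", "0")],
        )
        return [b""]

    return wrapped


if __name__ == "__main__":
    def site(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"canonical HTML page"]

    def show(status, headers):
        print(status, dict(headers))

    app = amp_redirect_middleware(site)
    app({"PATH_INFO": "/article.amp.html", "QUERY_STRING": "utm_source=rss"}, show)
```

In a real deployment you would hand the wrapped app to your WSGI server (for example `wsgiref.simple_server.make_server`) rather than calling it by hand.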

Practical impact and recommendations

What concrete actions should be taken to avoid this problem?

First, never link directly to your AMP URLs in social shares, RSS feeds, or newsletters. Always use the canonical HTML URL. If your CMS automatically generates links to the AMP, configure it to point to the standard HTML version.
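Where the CMS cannot be reconfigured right away, a stopgap is to rewrite outgoing share URLs at publication time. A minimal sketch, assuming a hypothetical `.amp.html` ↔ `.html` naming convention; `canonical_share_url` is an illustrative name, not a CMS API.

```python
from urllib.parse import urlsplit, urlunsplit

AMP_SUFFIX = ".amp.html"  # hypothetical naming convention


def canonical_share_url(url: str) -> str:
    """Rewrite an AMP URL to its canonical HTML URL; other URLs pass through unchanged."""
    parts = urlsplit(url)
    if parts.path.endswith(AMP_SUFFIX):
        # Swap only the path; scheme, host, query and fragment are preserved.
        parts = parts._replace(path=parts.path[: -len(AMP_SUFFIX)] + ".html")
    return urlunsplit(parts)


if __name__ == "__main__":
    print(canonical_share_url("https://example.com/news/article.amp.html?utm_source=rss"))
    # https://example.com/news/article.html?utm_source=rss
```

Treat this as a safety net; fixing the template that emits the AMP link remains the real correction.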

Next, ensure your rel=canonical and rel=amphtml tags are symmetrical. The AMP page should point to the HTML with rel=canonical, and the HTML should point to the AMP with rel=amphtml. Any inconsistency prolongs Google's confusion.
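The symmetry rule can be expressed as a small check over the link targets already extracted from both pages. This is a sketch under assumptions: the dictionaries use hypothetical `canonical` and `amphtml` keys, the function name is illustrative, and URLs are compared as exact strings, so an `http`/`https` or trailing-slash mismatch is deliberately reported as a problem.

```python
def check_amp_pair(html_url, html_rels, amp_url, amp_rels):
    """List problems in the HTML/AMP link pairing; an empty list means it is symmetric.

    html_rels / amp_rels: {"canonical": ..., "amphtml": ...} dicts holding the
    <link> targets extracted from each page (hypothetical shape).
    URLs are compared as exact strings on purpose.
    """
    problems = []
    if html_rels.get("amphtml") != amp_url:
        problems.append(f"HTML page lacks rel=amphtml -> {amp_url}")
    if amp_rels.get("canonical") != html_url:
        problems.append(f"AMP page lacks rel=canonical -> {html_url}")
    # The HTML page may declare a self-referencing canonical, but no other target.
    own_canonical = html_rels.get("canonical")
    if own_canonical not in (None, html_url):
        problems.append(f"HTML page canonicalizes elsewhere: {own_canonical}")
    return problems


if __name__ == "__main__":
    html_url = "https://example.com/article.html"
    amp_url = "https://example.com/article.amp.html"
    print(check_amp_pair(
        html_url, {"amphtml": amp_url, "canonical": html_url},
        amp_url, {"canonical": "http://example.com/article.html"},
    ))
```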

If you notice that an AMP page is indexed before its HTML version, request indexing of the HTML page via the URL Inspection tool in Search Console. Don't just wait: Google may take weeks to recrawl naturally if your crawl budget is tight.

What mistakes should be avoided in AMP/HTML management?

Do not create redirect chains between AMP and HTML. If the AMP points to an HTML URL that redirects elsewhere, Google may ignore the canonical directive. Each URL should point directly to its final target.
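To catch chains before Google does, follow each canonical target through your redirect rules. The sketch assumes you can export those rules as a simple `{source: target}` map (from server configuration or a crawl export); the function names are hypothetical. Any AMP page whose canonical needs one or more hops is flagged together with the final URL it should point to directly.

```python
def resolve_redirects(url, redirects, max_hops=10):
    """Follow a {source: target} redirect map; return (final_url, hop_count)."""
    seen = {url}
    hops = 0
    while url in redirects:
        url = redirects[url]
        hops += 1
        if url in seen or hops > max_hops:
            raise ValueError(f"redirect loop or overly long chain at {url}")
        seen.add(url)
    return url, hops


def canonical_chain_issues(amp_canonicals, redirects):
    """Map each AMP URL whose rel=canonical target redirects to the final URL it should use."""
    issues = {}
    for amp_url, canonical in amp_canonicals.items():
        final_url, hops = resolve_redirects(canonical, redirects)
        if hops:  # the canonical does not point directly at its final target
            issues[amp_url] = final_url
    return issues


if __name__ == "__main__":
    redirects = {
        "http://example.com/article.html": "https://example.com/article.html",
        "https://example.com/old.html": "https://example.com/mid.html",
        "https://example.com/mid.html": "https://example.com/new.html",
    }
    amp_canonicals = {
        "https://example.com/article.amp.html": "http://example.com/article.html",
        "https://example.com/new.amp.html": "https://example.com/new.html",
        "https://example.com/old.amp.html": "https://example.com/old.html",
    }
    print(canonical_chain_issues(amp_canonicals, redirects))
```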

Do not block AMP URLs in robots.txt while they remain published and linked. If Google cannot crawl the AMP, it cannot check the consistency of the canonical tags, and the situation stays frozen.
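A quick way to test your rules is Python's standard `urllib.robotparser`, fed with your robots.txt content and a list of AMP URLs. Treat the result as approximate: this parser applies the original prefix-matching rules with first-match precedence and does not implement Google's `*`/`$` wildcards or its most-specific-rule precedence. The function name is illustrative.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt: str, urls, user_agent: str = "Googlebot"):
    """Return the URLs that these robots.txt rules would block for user_agent.

    Approximation only: urllib.robotparser does not support Google's wildcard syntax.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /amp/\n"
    amp_urls = [
        "https://example.com/amp/article.html",
        "https://example.com/article.amp.html",
    ]
    print(blocked_urls(robots, amp_urls))
    # ['https://example.com/amp/article.html']
```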

Do not remove AMP pages without an explicit 301 redirect. Even once the HTML page is indexed, Google may keep requesting the AMP URLs for months. 404s on those URLs waste crawl budget and can slow down consolidation.

How to verify that your configuration is compliant?

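Use the AMP report in Search Console to list the AMP pages Google has found, and the URL Inspection tool to see which URL Google selected as canonical. If an AMP URL is treated as the main version while the HTML version exists, that's a signal that you need to request a recrawl of the HTML.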

Audit your RSS feeds, XML sitemaps, and social shares to ensure they point to the canonical HTML URLs, never to .amp.html. A simple grep on your templates can reveal AMP links hardcoded by mistake.
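Beyond grepping templates, you can check the generated output itself. A minimal sketch using the standard library's `xml.etree.ElementTree`, assuming a standard sitemap (sitemaps.org 0.9 namespace) and an RSS 2.0 feed, and flagging any URL whose path ends in `.amp.html`; adapt the suffix test if your AMP URLs follow another pattern. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
AMP_SUFFIX = ".amp.html"  # hypothetical naming convention


def _is_amp(url: str) -> bool:
    # Test the path only, so query strings such as ?utm_source=rss do not hide AMP URLs.
    return urlsplit(url).path.endswith(AMP_SUFFIX)


def amp_urls_in_sitemap(xml_text: str):
    """Return the <loc> entries of a sitemap that point at AMP URLs."""
    root = ET.fromstring(xml_text)
    locs = (el.text.strip() for el in root.iter(SITEMAP_NS + "loc") if el.text)
    return [loc for loc in locs if _is_amp(loc)]


def amp_urls_in_rss(xml_text: str):
    """Return the <item><link> values of an RSS 2.0 feed that point at AMP URLs."""
    root = ET.fromstring(xml_text)
    links = (el.text.strip() for el in root.findall("./channel/item/link") if el.text)
    return [link for link in links if _is_amp(link)]


if __name__ == "__main__":
    sitemap = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://example.com/article.html</loc></url>"
        "<url><loc>https://example.com/article.amp.html</loc></url>"
        "</urlset>"
    )
    print(amp_urls_in_sitemap(sitemap))
```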

Monitor your rankings with third-party tools: if you see two URLs (HTML and AMP) ranked for the same query, Google has probably not consolidated them yet. Request a recrawl of the HTML and wait a few days.
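If your rank tracker lets you export `(query, url, position)` rows, this double-ranking check can be automated. A sketch, assuming that hypothetical row shape (SEMrush and Ahrefs exports use their own column names, which you would map first) and an AMP → HTML URL mapping you maintain yourself; the function name is illustrative.

```python
from collections import defaultdict


def unconsolidated_queries(rankings, amp_to_canonical):
    """Return {query: [(amp_url, html_url), ...]} where both variants rank.

    rankings: iterable of (query, url, position) rows (hypothetical export shape).
    amp_to_canonical: {amp_url: html_url} mapping.
    """
    urls_by_query = defaultdict(set)
    for query, url, _position in rankings:
        urls_by_query[query].add(url)
    flagged = {}
    for query, urls in urls_by_query.items():
        pairs = sorted(
            (amp, html)
            for amp, html in amp_to_canonical.items()
            if amp in urls and html in urls  # both variants rank for this query
        )
        if pairs:
            flagged[query] = pairs
    return flagged


if __name__ == "__main__":
    rows = [
        ("amp indexing", "https://example.com/a.html", 4),
        ("amp indexing", "https://example.com/a.amp.html", 7),
        ("canonical tags", "https://example.com/b.html", 3),
    ]
    mapping = {
        "https://example.com/a.amp.html": "https://example.com/a.html",
        "https://example.com/b.amp.html": "https://example.com/b.html",
    }
    print(unconsolidated_queries(rows, mapping))
```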

Check that the rel=canonical (AMP → HTML) and rel=amphtml (HTML → AMP) tags are symmetrical and consistent
Never directly link AMP URLs in your RSS feeds, sitemaps, or social shares
Request a recrawl of the HTML page via Search Console if the AMP is indexed first
Avoid redirect chains between AMP and HTML — always point to the final URL
Regularly audit the AMP report in Search Console to detect inconsistencies
If you disable AMP, set up 301 redirects from AMP → HTML before deleting the pages

Managing the AMP/HTML relationship relies on strict consistency of canonical tags and vigilance over external links. If Google indexes the AMP first, request a recrawl of the HTML and check that your feeds are not propagating AMP URLs. These optimizations may seem simple in theory, but they often involve deep technical adjustments in the CMS, templates, and publishing processes. If your infrastructure is complex or you notice persistent AMP indexation despite your corrections, it may be wise to consult a specialized SEO agency for a complete audit and personalized support on migrating or optimizing your AMP configuration.

❓ Frequently Asked Questions

Can Google permanently index the AMP instead of the HTML page?

No. Once the HTML page has been crawled and the canonical tags validated, Google switches to the HTML version as the main URL. The AMP can remain in the index as a variant, but it is no longer the reference version.

How long does it take for Google to switch from the AMP to the HTML?

It depends on your site's crawl budget and crawl frequency: typically a few days to two weeks. You can speed up the process by requesting a recrawl of the HTML page via the URL Inspection tool in Search Console.

Are the AMP page's rankings transferred to the HTML page after consolidation?

Yes. Once the connection is established, Google treats the HTML and the AMP as variants of the same entity. Ranking signals (backlinks, CTR, etc.) are consolidated on the canonical version.

Should you block AMP URLs in robots.txt to avoid this problem?

No, quite the opposite. If Google cannot crawl the AMP, it cannot check the canonical tags. Leave the AMP accessible and make sure the rel=canonical/amphtml tags are correct.

What happens if I remove AMP without redirects?

The indexed AMP URLs will return 404s, which can waste crawl budget and slow down consolidation. Always set up 301 AMP → HTML redirects before disabling AMP.
