How does Google actually crawl your AMP pages?

Official statement

There is no specific user-agent for crawling AMP pages; they are generally crawled by the usual mobile user-agent. Ensure that the canonical annotation on your AMP pages is correctly configured for Google to find it.

Extracted from a Google Search Central video

⏱ 36:10 💬 EN 📅 30/06/2016 ✂ 7 statements

Official statement from June 30, 2016 (9 years ago)

⚠ A more recent statement exists on this topic: "Is it true that AMP is a speed factor for Google?" (John Mueller, August 21, 2020)

TL;DR

Google does not deploy a distinct user-agent to crawl AMP pages; the standard mobile Googlebot handles this task. The key is to properly configure the canonical link that lets Google connect the AMP version with the regular version. If this annotation is misconfigured, Google may treat your AMP pages as orphaned or duplicate content, which directly affects indexing.

What you need to understand

Why hasn't Google created a dedicated user-agent for AMP?

The logic is straightforward: Google treats AMP as a mobile page format, not as a separate protocol. Creating a separate user-agent would entail unnecessary technical complexity for webmasters and additional management on the infrastructure side. The usual mobile Googlebot (which presents itself with a smartphone user-agent) supports crawling AMP versions just like it crawls any responsive page.

This approach avoids fragmentation: no need to manage two distinct robots.txt configurations or monitor two different crawl profiles in Search Console. Google centralizes mobile crawl behavior, whether for AMP pages, classic mobile pages, or responsive sites.

What is the canonical annotation and why is it critical?

The canonical annotation is the link that connects your AMP page to its classic HTML version. Technically, it is a <link rel="canonical"> tag placed in the <head> of the AMP page, pointing to the URL of the standard version. Conversely, the classic HTML page must contain a <link rel="amphtml"> tag pointing to the AMP URL.

Without this dual annotation, Google cannot understand that these two pages are related. It may then treat them as duplicate content, indexing one while ignoring the other, or worse, not serve the AMP version in contexts where it would be relevant (mobile search, Top Stories carousels). Discoverability relies entirely on this bidirectional configuration.
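
As a minimal sketch of this dual annotation, the snippet below parses two hypothetical pages with Python's standard html.parser module and checks that each one points at the other. The example.com URLs and the helper names (LinkRelParser, link_targets, is_paired) are illustrative assumptions, and the comparison is a plain string match with no URL normalization.

```python
from html.parser import HTMLParser


class LinkRelParser(HTMLParser):
    """Collects href values of <link> tags, keyed by each rel token."""

    def __init__(self):
        super().__init__()
        self.targets = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        href = attr.get("href")
        # rel may hold several space-separated tokens; compare in lowercase
        for rel in (attr.get("rel") or "").lower().split():
            if href:
                self.targets.setdefault(rel, href)


def link_targets(html):
    """Return a dict such as {"canonical": "https://..."} for one HTML document."""
    parser = LinkRelParser()
    parser.feed(html)
    return parser.targets


def is_paired(standard_url, standard_html, amp_html):
    """True when the standard page declares an AMP URL and the AMP canonical points back."""
    amp_url = link_targets(standard_html).get("amphtml")
    canonical = link_targets(amp_html).get("canonical")
    return amp_url is not None and canonical == standard_url


# Hypothetical pages used only for illustration
STANDARD_URL = "https://example.com/article"
STANDARD_HTML = '<html><head><link rel="amphtml" href="https://example.com/amp/article"></head></html>'
AMP_HTML = '<html amp><head><link rel="canonical" href="https://example.com/article"></head></html>'
AMP_HTML_MISSING = "<html amp><head></head></html>"

print(is_paired(STANDARD_URL, STANDARD_HTML, AMP_HTML))          # True
print(is_paired(STANDARD_URL, STANDARD_HTML, AMP_HTML_MISSING))  # False
```

The second call fails because the AMP page carries no canonical, which is the broken-loop case described above.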

How does Google practically discover your AMP pages?

The process begins with the crawl of your classic HTML page. The mobile Googlebot crawls your content, detects the <link rel="amphtml"> tag, and then follows this link to crawl the AMP version. Once on the AMP page, it checks for the presence of the canonical tag pointing back to the original URL.

If this loop is consistent, Google understands that these are two representations of the same content and can decide which to serve depending on the context (device, search feature). If the loop is broken or contradictory, Google generally ignores the AMP version or treats it as an orphan page, thus disrupting your mobile strategy.

No specific AMP user-agent: the standard mobile Googlebot takes care of it.
Mandatory bidirectional annotation: rel="amphtml" on the standard page, rel="canonical" on the AMP page.
Two-step crawling: detection of AMP from the classic page, then verification of the canonical.
Consistency required: any inconsistency between the two annotations blocks the connection of the versions.
Direct impact on indexing: configuration error = loss of mobile visibility or content duplication.

SEO Expert opinion

Is this statement consistent with field observations?

Yes, it matches what we observe in server logs and in Search Console. AMP crawl requests originate from the standard mobile Googlebot, with the same user-agent string as for any classic mobile page. We never see a distinct, identifiable bot for AMP, unlike tools such as Google-InspectionTool, which has its own user-agent.

Where it sometimes gets tricky is that some webmasters expect to see an explicit signal in their monitoring tools. They look for an "AMP" marker in the logs when the only reliable indicator is the crawled URL itself (often in /amp/ or with a parameter ?amp=1). This confusion leads to diagnostic errors when AMP crawl does not appear in their filtered reports.
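
Since the URL is the only reliable marker, a log filter has to combine the user-agent with a URL pattern. The sketch below assumes logs in Combined Log Format and the two URL conventions mentioned above (/amp/ in the path or ?amp=1); the sample lines, IP addresses, and function names are made up for illustration. It only matches the user-agent string, which can be spoofed, so it does not prove a request really came from Google.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Combined Log Format: ... "METHOD path HTTP/x" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def is_amp_url(path):
    """Assumed URL conventions: an /amp/ path segment or an ?amp=1 parameter."""
    parts = urlsplit(path)
    in_path = "/amp/" in parts.path or parts.path.endswith("/amp")
    in_query = parse_qs(parts.query).get("amp") == ["1"]
    return in_path or in_query


def is_mobile_googlebot(user_agent):
    # String match only; this is not a verification of the requester's identity.
    return "Googlebot/" in user_agent and "Mobile" in user_agent


def amp_hits(lines):
    """Yield the paths of AMP URLs requested with a smartphone Googlebot user-agent."""
    for line in lines:
        match = LOG_LINE.search(line)
        if match and is_mobile_googlebot(match["ua"]) and is_amp_url(match["path"]):
            yield match["path"]


# Smartphone Googlebot user-agent, with the Chrome version left as a placeholder
GOOGLEBOT_MOBILE = (
    "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 "
    "(compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)

# Fabricated sample lines for illustration
SAMPLE = [
    f'192.0.2.10 - - [10/Jan/2024:10:00:00 +0000] "GET /amp/article HTTP/1.1" 200 5120 "-" "{GOOGLEBOT_MOBILE}"',
    f'192.0.2.10 - - [10/Jan/2024:10:00:05 +0000] "GET /article HTTP/1.1" 200 9000 "-" "{GOOGLEBOT_MOBILE}"',
    '203.0.113.7 - - [10/Jan/2024:10:00:09 +0000] "GET /article?amp=1 HTTP/1.1" 200 5000 "-" "Mozilla/5.0 (iPhone)"',
]

print(list(amp_hits(SAMPLE)))  # ['/amp/article']
```

If your AMP URLs follow another convention, is_amp_url is the only function that needs to change.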

What common mistakes does this statement help avoid?

The first is unintentionally blocking AMP crawl through robots.txt, thinking a specific user-agent exists. Some sites configure rules for "Googlebot-AMP" that do not exist, or block /amp/ believing they are protecting against bot traffic, while they are simply preventing Google from discovering these pages. The result: orphaned AMP pages that never show up in mobile results.
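
Both variants of this mistake can be reproduced with Python's standard urllib.robotparser. This is only an approximation: that module is not Google's robots.txt parser and its matching rules differ in detail, and the robots.txt content and example.com URLs below are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt combining both mistakes:
# a group for a non-existent "Googlebot-AMP" token, and a blanket /amp/ block.
ROBOTS_TXT = """\
User-agent: Googlebot-AMP
Disallow: /private/

User-agent: *
Disallow: /amp/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The "Googlebot-AMP" group does not apply to the "Googlebot" token,
# so the rule its author intended for AMP crawling has no effect.
private_allowed = parser.can_fetch("Googlebot", "https://example.com/private/page")

# The generic /amp/ rule, however, does block the AMP version of the article.
amp_allowed = parser.can_fetch("Googlebot", "https://example.com/amp/article")
standard_allowed = parser.can_fetch("Googlebot", "https://example.com/article")

print(private_allowed, amp_allowed, standard_allowed)  # True False True
```

The standard page stays crawlable while its AMP version is blocked, which is how the orphaned AMP pages described above come about.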

The second classic mistake is neglecting the canonical tag on the AMP side. Many CMSs automatically generate the rel="amphtml" on the standard page but forget to add the reverse canonical on the AMP. Google then crawls the AMP version, does not find a return link, and considers the content to be duplicated or unlinked. We see this frequently on WordPress sites where the AMP plugin is poorly configured or disabled and then reactivated without a full audit.

Should you still invest in AMP today?

This is the real question, and Google remains deliberately vague. AMP lost its privileged position once Core Web Vitals became the mobile performance standard: since the page experience update of 2021, Top Stories carousels no longer require AMP. Technically, there is no obligation to maintain AMP pages to perform well in mobile SEO.

However: in certain sectors (news, intensive mobile e-commerce), AMP continues to offer a real speed advantage, especially on 3G connections or in geographical areas where network latency is high. If your mobile traffic accounts for 70%+ and your Core Web Vitals struggle to turn green, AMP remains a tactical lever. [To be confirmed]: Google has never officially confirmed whether AMP still benefits from a hidden algorithmic boost, but nothing in the recent patents suggests it.

Caution: maintaining two versions of each page (HTML + AMP) doubles the surface for configuration errors. If you do not have the resources to regularly audit canonical annotations and AMP validity, it is better to invest in optimizing the Core Web Vitals of your classic mobile version.

Practical impact and recommendations

What should you prioritize checking on your existing AMP pages?

Start by auditing the consistency of bidirectional annotations. Crawl your site with Screaming Frog or an equivalent tool in mobile mode, extract all rel="amphtml" tags from your standard pages, and then verify that each targeted AMP URL indeed contains a rel="canonical" pointing back to the original URL. Any asymmetry (AMP without canonical, or canonical pointing to a wrong URL) must be corrected immediately.
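
Once the tags have been exported from a crawler, the symmetry check itself is simple. The sketch below assumes the export has already been loaded into a dict mapping each URL to its amphtml and canonical values; that structure, the function names, and the sample data are hypothetical. It also assumes that a trailing slash and host casing should be ignored when comparing URLs, which may be too lenient for some sites.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Loose comparison key: lowercase scheme and host, drop fragment and trailing slash."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{path}{query}"


def audit_amp_pairs(pages):
    """pages maps URL -> {"amphtml": str or None, "canonical": str or None}.

    Returns a list of (standard_url, problem) tuples.
    """
    by_key = {normalize(url): tags for url, tags in pages.items()}
    problems = []
    for url, tags in pages.items():
        amp_url = tags.get("amphtml")
        if not amp_url:
            continue  # this page declares no AMP version
        amp_tags = by_key.get(normalize(amp_url))
        if amp_tags is None:
            problems.append((url, "AMP URL not found in crawl"))
        elif not amp_tags.get("canonical"):
            problems.append((url, "AMP page has no canonical"))
        elif normalize(amp_tags["canonical"]) != normalize(url):
            problems.append((url, "AMP canonical points elsewhere"))
    return problems


# Made-up crawl data: /a is consistent, /b and /c are asymmetric
CRAWL = {
    "https://example.com/a": {"amphtml": "https://example.com/amp/a", "canonical": "https://example.com/a"},
    "https://example.com/amp/a": {"amphtml": None, "canonical": "https://example.com/a/"},
    "https://example.com/b": {"amphtml": "https://example.com/amp/b", "canonical": "https://example.com/b"},
    "https://example.com/amp/b": {"amphtml": None, "canonical": None},
    "https://example.com/c": {"amphtml": "https://example.com/amp/c", "canonical": "https://example.com/c"},
    "https://example.com/amp/c": {"amphtml": None, "canonical": "https://example.com/amp/c"},
}

for url, problem in audit_amp_pairs(CRAWL):
    print(url, "->", problem)
```

With this data the audit flags /b (AMP without canonical) and /c (canonical pointing to the wrong URL), the two asymmetries named above.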

Then check the indexing status of your AMP pages in Search Console. Google offers a dedicated "AMP" report (if your pages are detected) listing validation errors and indexing issues. If your AMP pages exist but do not appear in this report, the amphtml link is probably not being discovered during the crawl of the standard version.

How can you test if Google is properly crawling your AMP?

Use the URL Inspection tool in Search Console on a standard HTML page containing an amphtml link. Google will indicate if it has detected the associated AMP version. If not, inspect the source code of your page: is the amphtml tag present in the <head>? Does it point to a valid and accessible URL?

Next, inspect the AMP URL directly. Google should confirm that it is indexable and contains a valid canonical. If the tool reports "Alternate page with proper canonical tag", the loop is working correctly. Other statuses (such as "Duplicate, Google chose different canonical than user" or "Discovered - currently not indexed") indicate a configuration problem that needs prompt resolution.

Should you adjust robots.txt or HTTP headers for AMP?

No, and this is precisely the benefit of Google's approach. Since AMP crawl uses the standard mobile Googlebot, no AMP-specific user-agent rules are needed in robots.txt. Rules you already apply to Googlebot (for example to block filters or infinite pagination pages) are path-based, so they also cover the AMP versions of those pages whenever the AMP URLs match the same patterns, which is generally consistent.

On the HTTP headers side, ensure that your AMP pages return Content-Type: text/html, and not an exotic MIME type. Some misconfigured servers serve AMP pages with application/xhtml+xml, which can disrupt rendering or validation. Also check that the other response headers are consistent with your overall SEO strategy: no accidental noindex in an X-Robots-Tag header, and no canonical declared in an HTTP Link header that contradicts the HTML tag.
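
These header checks can be expressed as a small pure function. The sketch below works on a dict of response headers that you have already fetched by whatever means; the function name and sample values are hypothetical. It assumes that noindex is delivered through the X-Robots-Tag header and an HTTP-level canonical through a Link header with rel="canonical".

```python
import re


def header_issues(headers, html_canonical=None):
    """Return a list of problems found in an AMP page's response headers.

    headers: mapping of header name -> value (names compared case-insensitively).
    html_canonical: href of the page's <link rel="canonical">, if known.
    """
    h = {name.lower(): value for name, value in headers.items()}
    issues = []

    # Ignore parameters such as "; charset=utf-8" when comparing the media type
    media_type = h.get("content-type", "").split(";")[0].strip().lower()
    if media_type != "text/html":
        issues.append(f"unexpected Content-Type: {media_type or 'missing'}")

    if "noindex" in h.get("x-robots-tag", "").lower():
        issues.append("X-Robots-Tag contains noindex")

    # Expected shape: Link: <https://example.com/page>; rel="canonical"
    match = re.search(r'<([^>]+)>\s*;\s*rel="?canonical"?', h.get("link", ""), re.I)
    if match and html_canonical and match.group(1) != html_canonical:
        issues.append("Link header canonical differs from the HTML canonical")

    return issues


# Hypothetical responses for illustration
CLEAN = {"Content-Type": "text/html; charset=utf-8"}
BROKEN = {
    "Content-Type": "application/xhtml+xml",
    "X-Robots-Tag": "noindex, nofollow",
    "Link": '<https://example.com/other>; rel="canonical"',
}

print(header_issues(CLEAN, "https://example.com/article"))   # []
print(header_issues(BROKEN, "https://example.com/article"))  # three issues
```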

Audit the presence and consistency of rel="amphtml" and rel="canonical" tags across the site.
Verify in Search Console that AMP pages are detected and indexed without validation errors.
Test with the URL Inspection tool the discovery of the AMP version from the standard page.
Check server logs to ensure that the mobile Googlebot is indeed crawling the AMP URLs.
Ensure that robots.txt does not inadvertently block directories containing AMP pages.
Make sure that the HTTP headers of AMP pages are clean (Content-Type, Cache-Control, no noindex).

AMP crawl relies entirely on the quality of the bidirectional canonical annotation. No specific user-agent, no distinct robots.txt rules: Google uses the standard mobile Googlebot and relies on rel="amphtml" and rel="canonical" links to associate versions. A regular technical audit of these annotations is essential to keep mobile indexing consistent. If managing these mechanisms seems complex or you lack the resources to audit these configurations regularly, a specialized SEO agency can ensure reliable implementation and continuous monitoring of AMP best practices.

❓ Frequently Asked Questions

Can I block AMP crawling in robots.txt without affecting regular mobile crawling?

Not by user-agent, since Google uses the same mobile Googlebot for both. You can only block by path: disallowing /amp/ in robots.txt stops Google from crawling your AMP pages and does not affect the crawl of your other mobile pages.

What happens if I forget the canonical tag on my AMP pages?

Google will not be able to link the AMP version to the standard page. It may treat the AMP page as duplicate or orphaned content, which can lead to exclusion from the index or cannibalization between the two versions.

Does Search Console show separate crawl data for AMP?

No, the overall crawl statistics include AMP pages without distinction. The "AMP" report in Search Console only lists validation errors and indexing issues specific to the AMP pages it has detected.

Do you need a dedicated XML sitemap for AMP pages?

It is not mandatory, but it is recommended on large sites. A separate AMP sitemap makes discovery and monitoring easier, especially if your AMP URLs follow a different structure from your standard pages.

Can an AMP page be served as the main canonical version?

Yes, technically it is possible: the AMP page then points to itself as canonical, and the classic HTML version disappears or redirects to the AMP page. But this setup is rare and generally discouraged for reasons of editorial and advertising flexibility.
