Official statement
Google confirms that producing content in multiple formats (text, images, videos) mechanically increases your chances of appearing in different search experiences, including multimodal searches where queries and answers can mix formats. For SEO, the takeaway is that diversifying formats is no longer a bonus but a full-fledged visibility strategy.
What you need to understand
What does Google mean by "multimodal searches"?
Multimodal searches are queries where the user can combine text, images, or even voice, and where Google responds by mixing formats too. Think Google Lens with an added text query, or a voice search that triggers a video response.
What Google is saying here is that if you produce text only, you're missing out on all the surfaces where an image, video, or visual carousel would be more relevant. The reverse also holds: a video without a transcript or structured markup loses opportunities in standard search.
Why talk about "opportunities" rather than direct ranking?
Sullivan isn't saying that multimodal content improves your #1 position on a given query. He's saying it multiplies your touchpoints with the user: standard results, Images tab, Videos tab, visual featured snippets, carousels, rich results.
Each format becomes a distinct acquisition channel. It's a logic of distribution, not over-optimization of a single piece of content.
What does this change for a site already doing text + images?
If your images are decorative and unoptimized (no relevant alt text, no strong semantic context), they don't really count. Google is talking about intentional formats here: a video designed to answer a question, a structured infographic, a transcribed podcast.
The idea: each format must be able to exist independently in the results, not just illustrate an article.
- Multimodal ≠ decorative multimedia: each format must answer a specific user intent
- Google creates distinct search surfaces for each type of content — text, image, video, products, maps, etc.
- Producing in a single format mechanically limits your exposure surface in the Google ecosystem
- Multimodal searches (Lens + text, voice + visual) are becoming common usage, not a niche
SEO Expert opinion
Is this statement consistent with what we observe in the field?
Yes, and it actually lags behind reality. For years, sites dominating certain niches (DIY, cooking, tech) have done so by saturating all formats. In our experience, a written tutorial + YouTube video + Pinterest-friendly images captures 3 to 5 times more traffic than the same text-only content.
What's new is that Google is formalizing it as a conscious strategy. Before, we diversified opportunistically; now it's official doctrine. That suggests the algorithm likely rewards this diversity, not directly in ranking, but by broadening the contexts in which your content can be relevant.
In what cases does this multimodal logic not work?
On hyper-transactional queries or immediate local searches, the user wants a quick answer, not a buffet of formats. Example: "locksmith Paris 11" — a video or infographic won't add anything.
Similarly, in highly specialized B2B niches with low search volume, investing in 4 formats for 50 monthly visits makes no economic sense. Let's be honest: multimodal content is expensive to produce. You have to make trade-offs.
What nuances should be added to this recommendation?
Google talks about "opportunities," not guarantees. A poorly optimized video or an image without context adds no extra visibility. Format alone isn't enough: you need the technical infrastructure that goes with it, namely schema.org VideoObject markup, transcripts, contextual alt text, and dedicated sitemaps.
And that's where many sites get stuck. The problem isn't creative, it's technical and organizational. Without a structured publication workflow, a multimodal strategy quickly becomes unmanageable.
Practical impact and recommendations
What concretely must be done to leverage this multimodal logic?
First, audit your existing content: what formats do you already produce? Are they indexable and optimized to appear independently in results? If you have 50 YouTube videos but no structured integration on the site, you're losing part of the potential.
Next, prioritize by user intent. Certain queries naturally call for a format: tutorials demand video, comparisons call for tables or infographics, definitions need structured text. Map your top queries and identify format gaps.
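The mapping exercise described above can be sketched in a few lines of Python. This is only an illustration: the intent labels, the `PREFERRED_FORMAT` table, and the sample queries are hypothetical, and the table simplifies "tables or infographics" to a single `table` format.

```python
# Hypothetical mapping from query intent to the format that usually serves it,
# based on the examples in the paragraph above (not an official Google list).
PREFERRED_FORMAT = {
    "tutorial": "video",
    "comparison": "table",
    "definition": "text",
}


def find_format_gaps(queries):
    """Return (query, missing_format) pairs for queries lacking their preferred format."""
    gaps = []
    for q in queries:
        wanted = PREFERRED_FORMAT.get(q["intent"])
        if wanted is not None and wanted not in q["formats"]:
            gaps.append((q["query"], wanted))
    return gaps


# Hypothetical sample data: top queries with the formats already published for each.
top_queries = [
    {"query": "how to tile a bathroom floor", "intent": "tutorial", "formats": {"text", "image"}},
    {"query": "laminate vs vinyl flooring", "intent": "comparison", "formats": {"text", "table"}},
    {"query": "what is grout", "intent": "definition", "formats": {"text"}},
]

gaps = find_format_gaps(top_queries)
print(gaps)
```

In practice the query list would come from your own analytics export, and the intent labels would need manual review rather than a fixed lookup table.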
What technical mistakes kill multimodal visibility?
The first: unoptimized images. No descriptive alt text, no semantic context around them, inadequate dimensions, heavy formats. Google can't guess an image is relevant if you give it no signals.
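A quick way to spot the missing-alt-text problem is to scan a page's HTML for `<img>` tags with no `alt` attribute or an empty one. The sketch below uses only Python's standard library; the `audit_images` helper and the sample markup are illustrative assumptions, and it checks nothing beyond the `alt` attribute (dimensions, file weight, and surrounding context need other tools).

```python
from html.parser import HTMLParser


class ImageAltAudit(HTMLParser):
    """Collect the src of every <img> whose alt attribute is missing or empty."""

    def __init__(self):
        super().__init__()
        self.missing_alt = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        alt = attributes.get("alt")
        # Flag both an absent alt attribute and one that is blank.
        if alt is None or not alt.strip():
            self.missing_alt.append(attributes.get("src", "(no src)"))


def audit_images(html):
    """Return the list of image sources that lack usable alt text."""
    parser = ImageAltAudit()
    parser.feed(html)
    parser.close()
    return parser.missing_alt


# Hypothetical page fragment used only to demonstrate the audit.
sample = """
<article>
  <img src="/img/tiling-step-1.jpg" alt="Spreading thinset mortar with a notched trowel">
  <img src="/img/banner.png">
  <img src="/img/spacer.gif" alt="">
</article>
"""

missing = audit_images(sample)
print(missing)
```

Note that an empty `alt=""` is legitimate for purely decorative images, so flagged entries are candidates for review rather than automatic errors.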
The second: videos hosted elsewhere without integration. A YouTube video that is not embedded on your site, with no transcript and no VideoObject schema, is traffic for YouTube, not for you. Google may display it in video results, but that doesn't help your domain.
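As a rough illustration of the VideoObject markup mentioned above, the following Python sketch builds a JSON-LD snippet for an embedded video. The property names (`name`, `description`, `thumbnailUrl`, `uploadDate`, `embedUrl`, `transcript`) are schema.org VideoObject properties; the helper functions, URLs, and `VIDEO_ID` placeholder are hypothetical, and which properties Google requires for video rich results should be checked against its current documentation.

```python
import json


def video_object(name, description, thumbnail_url, upload_date, embed_url, transcript=None):
    """Build a schema.org VideoObject as a plain dict."""
    data = {
        "@context": "https://schema.org",
        "@type": "VideoObject",
        "name": name,
        "description": description,
        "thumbnailUrl": thumbnail_url,
        "uploadDate": upload_date,
        "embedUrl": embed_url,
    }
    if transcript:
        # Optional: include the transcript so the spoken content is available as text.
        data["transcript"] = transcript
    return data


def as_script_tag(data):
    """Wrap the JSON-LD in the script tag expected in the page's HTML."""
    return '<script type="application/ld+json">\n' + json.dumps(data, indent=2) + "\n</script>"


# Placeholder values for illustration only.
markup = video_object(
    name="How to tile a bathroom floor",
    description="Step-by-step tiling tutorial, from surface preparation to grouting.",
    thumbnail_url="https://example.com/img/tiling-thumbnail.jpg",
    upload_date="2025-01-15",
    embed_url="https://www.youtube.com/embed/VIDEO_ID",
    transcript="First, clean and level the subfloor...",
)
tag = as_script_tag(markup)
print(tag)
```

The generated tag would sit on the page that embeds the video, alongside the visible transcript.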
The third: format silos. Your blog in text, your videos on a /videos/ page, your infographics on Pinterest, all disconnected. Google doesn't see thematic coherence, so it doesn't position you as a multimodal authority on a subject.
How do you verify your multimodal strategy is bearing fruit?
In Google Search Console, segment your performance by search type: Web, Images, Videos. If 100% of your impressions come from standard Web while you're producing videos, there's an indexing or relevance problem.
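Once you have the impression totals for each search type (for example, copied from the Performance report after switching the search type filter), the split can be computed as below. This is a minimal sketch: the figures are invented, and the `impression_share` helper is a hypothetical name, not part of any Google tool.

```python
def impression_share(impressions_by_type):
    """Return each search type's share of total impressions, as a percentage."""
    total = sum(impressions_by_type.values())
    if total == 0:
        return {search_type: 0.0 for search_type in impressions_by_type}
    return {
        search_type: round(100 * count / total, 1)
        for search_type, count in impressions_by_type.items()
    }


# Invented figures for illustration; replace with your own Search Console totals.
impressions = {"web": 9400, "image": 600, "video": 0}

shares = impression_share(impressions)
# Search types with no impressions at all are the ones to investigate first.
zero_types = [search_type for search_type, share in shares.items() if share == 0.0]

print(shares)
print(zero_types)
```

A search type stuck at zero while you actively publish in that format is the signal described above: something is wrong with indexing or relevance.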
Also monitor featured snippets and rich carousels: do you appear in varied formats on your target queries? If not, your competitors are capturing these surfaces instead.
- Audit your existing content: what formats do you produce, are they technically indexable?
- Identify priority queries where a complementary format (video, infographic) could capture additional traffic
- Technically optimize each format: schema.org, alt text, transcriptions, dedicated sitemaps
- Create internal links between formats to strengthen thematic coherence in Google's eyes
- Segment your GSC performance by search type to measure the real impact of each format
- Avoid the over-engineering trap: don't multiply formats on each page, diversify at site level
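For the "dedicated sitemaps" item in the checklist above, here is a hedged sketch of generating a video sitemap with Python's standard library. It assumes Google's video sitemap extension namespace and a minimal set of tags (`thumbnail_loc`, `title`, `description`, `player_loc`); the `build_video_sitemap` helper, entry keys, and URLs are hypothetical, so verify the required tags against Google's current sitemap documentation before relying on it.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
VIDEO_NS = "http://www.google.com/schemas/sitemap-video/1.1"


def build_video_sitemap(entries):
    """Return a video sitemap (as an XML string) for the given page entries."""
    # Register prefixes so the output uses a default namespace and "video:" tags.
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("video", VIDEO_NS)

    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for entry in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = entry["page_url"]
        video = ET.SubElement(url, f"{{{VIDEO_NS}}}video")
        for tag in ("thumbnail_loc", "title", "description", "player_loc"):
            ET.SubElement(video, f"{{{VIDEO_NS}}}{tag}").text = entry[tag]
    return ET.tostring(urlset, encoding="unicode")


# Placeholder entry for illustration only.
sitemap_xml = build_video_sitemap([
    {
        "page_url": "https://example.com/guides/tile-bathroom-floor",
        "thumbnail_loc": "https://example.com/img/tiling-thumbnail.jpg",
        "title": "How to tile a bathroom floor",
        "description": "Step-by-step tiling tutorial, from surface preparation to grouting.",
        "player_loc": "https://www.youtube.com/embed/VIDEO_ID",
    }
])
print(sitemap_xml)
```

Each `url` entry points at the page on your own domain that embeds the video, which ties the video back to your site rather than to the hosting platform.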
Other SEO insights extracted from this same Google Search Central video · published on 17/12/2025