
Official statement

The difference between the figures in Search Console and the site: query is that the site: query is an approximate estimate, while the indexing status from Search Console is more accurate. The sitemap view provides the most precise figures.
Timestamp: 71:25
🎥 Source video

Extracted from a Google Search Central video

⏱ 1h12 💬 EN 📅 02/02/2018 ✂ 12 statements
Watch on YouTube (71:25) →
Other statements from this video (11)
  1. 4:11 Do you really need to stabilize your sitemap files to optimize crawling?
  2. 6:05 Can a CDN kill your crawl budget without warning?
  3. 11:21 Is responsive design really essential to survive mobile-first indexing?
  4. 14:05 Are PWAs really more complex than AMP for SEO?
  5. 15:53 Is AMP still useful for improving your SEO performance?
  6. 23:46 Should you really index all your pagination pages?
  7. 32:21 Does updating publication dates really improve Google rankings?
  8. 38:57 Do hreflang tags really dilute the authority of your main pages?
  9. 52:42 Does URL structure really impact Google rankings?
  10. 59:05 Does Google Ads advertising really influence organic rankings?
  11. 67:49 Is keyword density still an SEO factor in 2025?
📅 Official statement from 02/02/2018 (8 years ago)
TL;DR

Google confirms that the site: query is only a rough estimate of the number of indexed pages, while the indexing report from Search Console provides more reliable data. For maximum accuracy, the sitemap view offers the most precise figures. This hierarchy of reliability fundamentally changes how an SEO should audit a site's indexing status.

What you need to understand

What is the concrete difference between site: and Search Console?

The site:example.com query that every SEO has used for years is just an approximation. Google has always hinted at this, and John Mueller has stated it explicitly: these figures do not reflect the exact reality of your index. The site: command returns a quick estimate, not an exhaustive count.

The indexing status report in Google Search Console relies on actual crawling and processing data. It counts the pages that Googlebot has actually crawled, analyzed, and then decided to index or not. It’s an administrative view, not a SERP approximation.

Why is the sitemap view the most accurate?

The sitemap allows Google to reconcile what you submit with what is actually indexed. In Search Console, each submitted sitemap shows how many URLs have been discovered, crawled, indexed, or excluded. This granularity offers immediate diagnostics.
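To make this reconciliation concrete, the "submitted" side of the comparison is simply the number of URLs a sitemap declares. A minimal sketch, using a hypothetical inline sitemap (Search Console would supply the matching "indexed" figure):

```python
# Minimal sketch: count the <loc> entries a sitemap declares, i.e. the
# "submitted" total that Search Console compares against its index.
# The sitemap content below is a hypothetical example.
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def count_sitemap_urls(xml_text: str) -> int:
    """Return the number of <loc> entries in a <urlset> sitemap."""
    root = ET.fromstring(xml_text)
    return len(root.findall(f"{SITEMAP_NS}url/{SITEMAP_NS}loc"))

example = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/product/1</loc></url>
  <url><loc>https://example.com/product/2</loc></url>
  <url><loc>https://example.com/blog/hello</loc></url>
</urlset>"""

print(count_sitemap_urls(example))  # 3 submitted URLs
```

In practice you would fetch the live sitemap instead of an inline string; the point is that this count is your declared intention, against which the indexed count is diagnosed.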

A clean and segmented sitemap (by page type, language, update frequency) provides ultra-precise indexing reports. You see in real time the URLs accepted versus rejected. It’s the only tool that allows you to cross-check the declared intention (your sitemap) with the actual result (Google’s index).

When is a significant gap normal?

A gap of a few percent between site: and Search Console is common. But a gap of 30%, 50%, or more often reveals a crawl budget or architecture issue. Very large sites (e-commerce sites with millions of listings, media with deep archives) regularly see site: underestimating or overestimating the actual index.

Orphan pages, 301 redirect URLs still temporarily present in the index, and poorly configured mobile/desktop variants are just some of the factors that pollute the site: count without appearing in Search Console. The reverse is rare but possible (recently indexed URLs that have not been crawled).

  • site: is a quick estimate, useful for a glance but never for a precise audit
  • The Search Console indexing report reflects the actual state of your pages’ processing by Googlebot
  • The sitemap view offers maximum granularity for diagnosing gaps between submission and indexing
  • A significant gap often indicates crawl, quality, or duplication issues
  • Never take site: figures as a reference in client reporting or a technical audit

SEO Expert opinion

Is this statement consistent with real-world observations?

Absolutely. Every technical SEO has noticed that site: can vary by 20% from one day to the next without any site changes. This isn't a bug; it's the very nature of this command: it queries a sample of the index, not an exhaustive database. Google never designed site: as an auditing tool, but as a quick help for users.

Search Console, on the other hand, compiles real crawl logs. When it says that a page is excluded due to noindex or canonicalization, it relies on the actual processing of the HTML. The figures may take 24 to 72 hours to stabilize after a change, but they reflect the technical reality.

What nuances should be added to this hierarchy of reliability?

The sitemap view is only accurate if your sitemap is clean and complete. A sitemap filled with 404 URLs, redirects, or noindex pages completely skews the diagnosis. Google might tell you that 50% of your submitted URLs are indexed, but that’s because the remaining 50% are toxic. Check this regularly: the quality of your sitemap determines the quality of the reporting.

Second point: Search Console sometimes lags behind the actual index. A page may be indexed and ranking without appearing as "indexed" in the report for 48 hours. This synchronization delay does not invalidate the overall reliability, but it means you should never audit in real time. Allow 3 to 7 days after a structural change before drawing conclusions.

In what cases does site: remain useful nonetheless?

For a quick check of the presence of a page type, site: remains irreplaceable. For example: site:example.com inurl:"product" to verify that your product listings are indeed indexed. Or site:example.com inurl:"?utm" to identify URLs with UTM parameters indexed by mistake. It’s a filter, not a counter.

Another use: detecting leaks of sensitive URLs. A site:example.com "password" or site:example.com filetype:pdf query can reveal indexed confidential documents. In this case, the accuracy of the figure matters little; what counts is the presence. site: is a detector, while Search Console is a dashboard.

Practical impact and recommendations

What should be done concretely to properly audit indexing?

The first rule: ban site: from your client reports. Always replace it with Search Console data, in the "Pages" section, under the "Why are pages not indexed" tab. This report lists exclusions (noindex, canonicalized, crawled but not indexed, discovered but not crawled) with actual volumes. This is your source of truth.

Segment your sitemaps by type: one for category pages, one for product listings, one for the blog, and one for static pages. In Search Console, each sitemap will have its own indexing report. You will immediately see which section of the site is causing issues. This granularity is impossible with site:.
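As an illustration of this segmentation, the sketch below generates a sitemap index that references one sitemap per page type; the domain, file names, and segment labels are hypothetical examples, not a prescribed naming scheme:

```python
# Hedged sketch: build a sitemap index referencing one sitemap per page
# type, so each segment gets its own indexing report in Search Console.
# Domain and file names are hypothetical.
SEGMENTS = ["categories", "products", "blog", "static"]

def build_sitemap_index(domain: str, segments: list[str]) -> str:
    entries = "\n".join(
        f"  <sitemap><loc>https://{domain}/sitemap-{name}.xml</loc></sitemap>"
        for name in segments
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</sitemapindex>"
    )

print(build_sitemap_index("example.com", SEGMENTS))
```

Each child sitemap referenced here would then appear with its own row in the Search Console sitemaps report, which is what makes the per-section diagnosis possible.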

What errors should be avoided when analyzing gaps?

Don’t panic if site: shows 10,000 pages while Search Console shows 8,500. First, check for legitimate exclusions: canonicalized pages, 301 redirects, intentionally noindexed pages. If these exclusions are intentional, the gap is normal and healthy. The problem arises when strategic pages are excluded for no reason.

A classic error: comparing site: with the number of URLs in the sitemap. These are two different scopes. Your sitemap may contain 12,000 URLs, Google indexes 8,500, and site: shows 9,200. The 3,500 non-indexed URLs could be duplicates, low-quality pages, or URLs blocked by crawl budget constraints. Investigate the exclusions before making corrections.

How to verify that my site conforms to this logic?

Monthly audit of the "Pages" report in Search Console. Export the data, sort exclusions by descending volume. Prioritize crawled but not indexed pages: these are often pages of insufficient quality that you need to improve or intentionally deindex. Discovered but uncrawled pages reveal a crawl budget or depth issue.
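The monthly triage described above can be sketched in a few lines: take the exclusion counts from a Search Console "Pages" export (the reason labels match the report's wording, but the figures here are hypothetical) and rank them by descending volume to decide what to investigate first:

```python
# Sketch of the monthly triage: rank exclusion reasons from a Search
# Console "Pages" export by descending volume. Figures are hypothetical.
exclusions = {
    "Crawled - currently not indexed": 1240,
    "Discovered - currently not indexed": 430,
    "Alternate page with proper canonical tag": 2100,
    "Excluded by 'noindex' tag": 310,
    "Page with redirect": 180,
}

ranked = sorted(exclusions.items(), key=lambda kv: kv[1], reverse=True)
for reason, count in ranked:
    print(f"{count:>6}  {reason}")
```

High canonical or noindex counts may be intentional and healthy; per the priority above, it is the "currently not indexed" buckets that usually deserve the first look.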

Check the consistency between sitemap and index: for each sitemap, divide the number of indexed URLs by the number of submitted URLs. A ratio below 70% on a strategic sitemap (product listings, in-depth articles) warrants immediate investigation. Either your sitemap contains noise, or Google deems those pages irrelevant. In both cases, action is required.
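The ratio check above can be sketched as follows, with hypothetical sitemap names and (submitted, indexed) figures; any strategic segment below the 70% threshold gets flagged for investigation:

```python
# Sketch of the indexed/submitted ratio check per sitemap.
# Sitemap names and (submitted, indexed) figures are hypothetical.
sitemaps = {
    "sitemap-products.xml": (12000, 8500),
    "sitemap-blog.xml": (900, 860),
    "sitemap-categories.xml": (400, 250),
}

THRESHOLD = 0.70

def flag_low_ratios(data, threshold=THRESHOLD):
    """Return the sitemaps whose indexed/submitted ratio is below threshold."""
    flagged = {}
    for name, (submitted, indexed) in data.items():
        ratio = indexed / submitted if submitted else 0.0
        if ratio < threshold:
            flagged[name] = round(ratio, 2)
    return flagged

print(flag_low_ratios(sitemaps))
```

With these sample figures only the categories sitemap falls under the threshold (250/400 = 62.5%); the products sitemap sits just above it at roughly 71%, which is exactly the kind of borderline case worth watching from month to month.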

  • Replace site: with the "Pages" report from Search Console in all your audits and client reports
  • Segment sitemaps by page type to obtain granular reports
  • Analyze exclusions in Search Console monthly, prioritizing "Crawled, currently not indexed"
  • Clean up sitemaps of any URL in 404, redirect, noindex, or canonicalized
  • Never audit indexing in real-time: allow 3 to 7 days of latency after a structural change
  • Calculate the indexed/submitted ratio per sitemap and investigate any ratio < 70% on strategic content
A reliable indexing audit relies on Search Console and properly segmented sitemaps. The site: query remains a quick detection tool, but should never be used as a benchmark metric. For complex sites with high volume or critical indexing concerns, these optimizations require specialized expertise in crawl budget, architecture, and log parsing. Support from a specialized SEO agency can expedite diagnostics and avoid months of trial and error over often counter-intuitive technical issues.

❓ Frequently Asked Questions

Why does the site: query show figures that vary from day to day?
The site: command queries a quick estimate of the index, not an exhaustive count. Google samples the results, which causes natural variations of 10 to 20% without any change to the site.
Can I rely on the Search Console indexing report for a technical audit?
Yes, it is the most reliable source. Search Console compiles real crawl logs and the processing status of each URL. Allow 48 to 72 hours after a change for the data to stabilize.
How do you explain a 40% gap between site: and Search Console?
Check legitimate exclusions first: canonicalizations, 301 redirects, intentional noindex. If the gap persists on strategic pages, look for crawl budget, content quality, or site depth issues.
What is the best practice for segmenting my sitemaps?
Create one sitemap per page type: categories, products, articles, static pages. Each sitemap gets its own report in Search Console, making granular diagnosis of indexing issues easier.
What should I do if Google indexes fewer than 70% of the URLs submitted in my main sitemap?
First clean the sitemap of any URL in error (404, redirect, noindex). If the ratio remains low, analyze content quality and crawl depth. Google may deem these pages irrelevant or lack the budget to crawl them.
🏷 Related Topics
Crawl & Indexing JavaScript & Technical SEO Search Console

