Official statement
Google states that the data collected via the site: command and that of Search Console diverge structurally. For any serious analysis, Search Console remains the reliable source. The site: command was never designed for precise monitoring, and relying on it leads to marketing decisions based on misleading metrics.
What you need to understand
Why do these two sources of data differ?
The site: command queries a simplified index intended for standard user queries. It does not reflect the actual state of indexing as Google sees it for ranking.
Search Console pulls directly from Google's internal logs: actual impressions, real clicks, true index coverage. It's a backend view, not a frontend approximation. The discrepancies observed are not bugs but the result of two systems with distinct objectives.
What does the site: command actually measure?
It returns a sample of indexed URLs, filtered according to varying relevance and freshness criteria. The displayed number fluctuates without apparent logic: today 850 pages, tomorrow 1200, with nothing having changed on the site.
Google guarantees no exhaustiveness. The command can ignore indexed pages or display others that have already been removed from the ranking index. It primarily serves to check that a site is present, not to audit its technical health.
What makes Search Console more reliable?
Search Console reveals actual performance data: each recorded impression corresponds to an occurrence in the SERPs. Each click is traced back to Google's server log.
The tool also provides detailed indexing status: crawled pages, excluded pages with specific reasons (canonicalized, noindex, 4xx errors). It's a forensic report, not an estimate. Coverage alerts help detect partial de-indexing or crawl issues before they impact traffic.
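As an aside, the same index status and coverage reasons can also be queried programmatically through the Search Console URL Inspection API. Below is a minimal Python sketch, assuming a service account with access to the property; the credentials file name and the example.com URLs are placeholders to adapt.

```python
from google.oauth2 import service_account
from googleapiclient.discovery import build

# Placeholder credentials file and URLs: adapt to your own property.
creds = service_account.Credentials.from_service_account_file(
    "service-account.json",
    scopes=["https://www.googleapis.com/auth/webmasters.readonly"],
)
service = build("searchconsole", "v1", credentials=creds)

# Inspect one URL of a verified property and read its index status.
response = service.urlInspection().index().inspect(body={
    "inspectionUrl": "https://www.example.com/some-page/",
    "siteUrl": "https://www.example.com/",
}).execute()

status = response["inspectionResult"]["indexStatusResult"]
print(status.get("verdict"))        # e.g. PASS / NEUTRAL / FAIL
print(status.get("coverageState"))  # e.g. "Submitted and indexed"
print(status.get("lastCrawlTime"))  # last Googlebot crawl of this URL
```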
- site: returns an indicative sample, never exhaustive, with undocumented display logic.
- Search Console provides real backend data: impressions, clicks, indexing status validated by crawlers.
- The gaps between the two sources are structural, not accidental: they originate from distinct systems with different purposes.
- For any serious marketing analysis (traffic tracking, technical auditing, client reporting), Search Console remains the only reference.
- The site: command still serves a purpose for quick checks: domain presence, testing for mass de-indexing, a quick look at displayed snippets.
SEO Expert opinion
Is this statement consistent with observed practices?
Absolutely. Every SEO who has compared the two sources has noticed massive discrepancies: Search Console shows 5000 indexed pages, site: returns 1200 or 8000 depending on the day. Clients panic, juniors freak out, and we waste time explaining that one should never base reporting on site:.
Google has been repeating this for years, but the command remains a reassuring reflex to check that a site exists. The problem is that it gives an illusion of precision: a round number, a list of URLs, and you think you are looking at the truth. Yet that number carries no guarantee whatsoever.
What nuances should be added?
The site: command still has utility for quick and coarse checks: detecting mass de-indexing (drop from 10000 to 0 pages), verifying that a new domain is beginning to be crawled, or auditing the displayed meta snippets in the SERPs.
It also allows for spotting undesired indexed content: test pages, non-canonicalized URL parameters, or internal content that surfaces. But for any precise tracking, it is unusable.
[To be verified]: Google does not communicate the filtering logic of site:. It is unknown whether it prioritizes recently crawled pages, URLs with the most internal PageRank, or a random sample. This opacity makes any interpretation risky.
When does this rule not apply?
If a site does not appear at all in Search Console (property not verified, or no GSC account configured), the site: command remains the only available indicator. It's a stopgap, but at least it confirms that part of the site is indexed.
For recently launched sites, site: may display URLs before Search Console reports any impression data. The processing delay of GSC (24-48 hours) creates a temporary gap where site: can give a first, albeit imprecise, overview.
Practical impact and recommendations
What should you do concretely?
Stop monitoring the site: command. If you run it once a month to check that your site exists, that's fine. But if you log the displayed number in an Excel sheet, you're wasting your time. The variations between two measurements mean nothing.
Set up Search Console for all projects, with all properties (HTTP/HTTPS, www/non-www, subdomains). Ensure that data is reported correctly within 48 hours after adding a new domain. It's your only source of truth for indexing and organic performance.
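If you want to check programmatically that every property variant is registered and that performance data is actually flowing, the Search Console API exposes both. A minimal sketch, assuming a service account added as a user on the properties; the credentials file, property URL and date range are placeholders.

```python
from google.oauth2 import service_account
from googleapiclient.discovery import build

# Placeholder credentials file, property URL and date range: adapt to your setup.
creds = service_account.Credentials.from_service_account_file(
    "service-account.json",
    scopes=["https://www.googleapis.com/auth/webmasters.readonly"],
)
service = build("searchconsole", "v1", credentials=creds)

# 1. List every property the account can see, to confirm that all variants
#    (HTTP/HTTPS, www/non-www, subdomains) are actually registered.
for site in service.sites().list().execute().get("siteEntry", []):
    print(site["siteUrl"], "-", site["permissionLevel"])

# 2. Pull a small performance report to confirm that data is flowing
#    for the property you just added.
report = service.searchanalytics().query(
    siteUrl="https://www.example.com/",
    body={
        "startDate": "2020-08-01",
        "endDate": "2020-08-31",
        "dimensions": ["page"],
        "rowLimit": 10,
    },
).execute()
for row in report.get("rows", []):
    print(row["keys"][0], int(row["clicks"]), int(row["impressions"]))
```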
What mistakes to avoid?
Never justify a drop in traffic by a decrease in the number of pages displayed in site:. The two have no direct causal link. A URL may leave the site: sample while still ranking and generating traffic.
Also, avoid reassuring a client by showing them that site: displays 10000 pages while Search Console indexes 3000. The client may discover the gap later and lose trust. Always present Search Console data first, and explain that site: is a troubleshooting tool, not a dashboard.
How do you check that your site is correctly indexed?
Go to Search Console > Indexing > Pages. Look at the number of indexed URLs, and especially the list of excluded pages with their reasons. This is where you detect real issues: unwanted canonicalizations, forgotten noindex tags, 404 errors on strategic pages.
Cross-reference this data with a Screaming Frog or Oncrawl crawl to ensure that important pages are indeed in the index, and that unnecessary pages (pagination, facets, filters) are excluded. If Search Console indexes 15000 pages while your site only has 5000, you have a problem with duplication or uncontrolled URL parameters.
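One way to run this cross-check is a simple set comparison between a crawl export and a list of URLs pulled from Search Console. A rough sketch, assuming hypothetical file names (internal_html.csv, gsc_pages.csv) and column names ("Address" as in a Screaming Frog export, "page" for the GSC dump):

```python
import csv

def load_urls(path, column):
    """Load a column of URLs from a CSV export and normalize trailing slashes."""
    with open(path, newline="", encoding="utf-8") as f:
        return {
            row[column].strip().rstrip("/")
            for row in csv.DictReader(f)
            if row.get(column)
        }

# Hypothetical exports: a crawler export and a dump of URLs known to GSC.
crawled = load_urls("internal_html.csv", "Address")  # crawl export (e.g. Screaming Frog)
in_gsc = load_urls("gsc_pages.csv", "page")          # GSC performance/coverage export

print("Crawlable pages absent from GSC data:", len(crawled - in_gsc))
print("URLs known to GSC but not found by the crawler:", len(in_gsc - crawled))

# The second bucket is where duplication, orphan pages and uncontrolled
# URL parameters usually show up.
for url in sorted(in_gsc - crawled)[:20]:
    print("  ", url)
```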
- Use Search Console as the sole source for all client reporting and performance analysis.
- Never track the evolution of the number of pages returned by site: over time.
- Set up all domain variants (HTTP/HTTPS, www) in Search Console for a comprehensive view.
- Cross-reference GSC data with a complete crawl to identify indexed pages that are not crawlable, or vice versa.
- Explain to your clients and team that site: is just an indicator of presence, not an indexing KPI.
- Monitor coverage alerts in Search Console to detect partial de-indexing before it impacts traffic.
❓ Frequently Asked Questions
Why does the number of pages shown by site: change every day?
Can you rely on site: to detect a penalty?
Search Console shows 5000 indexed pages, site: returns 1200. Is that a problem?
Does the site: command still have any use?
How do you explain this gap to a client who panics after checking site:?
🎥 Source: Google Search Central video · duration 59 min · published on 03/09/2020