
Official statement

Make sure that the pages you want to see in search results are indexable. Meta noindex tags or other issues can prevent indexing.
🎥 Source video (statement at 6:17)

Extracted from a Google Search Central video

⏱ 1h00 💬 EN 📅 14/12/2017 ✂ 10 statements
Watch on YouTube (6:17) →
Other statements from this video (9)
  1. 7:20 Why does Google recommend JSON-LD for structured data markup?
  2. 7:54 Do you really need to update your job postings sitemap regularly to rank?
  3. 9:20 Why can 503 errors destroy your crawl budget?
  4. 12:52 How does Google now display reviews and salaries in job search results?
  5. 19:32 Job posting markup without location data: valid or not?
  6. 23:45 Why does Google penalize structured markup on your internal search results pages?
  7. 30:06 What do you really risk if Google detects structured markup abuse on your site?
  8. 44:12 Why doesn't job posting schema markup guarantee your ranking in the results?
  9. 49:47 Should you really enrich your structured data with every available field?
📅 Official statement from 14/12/2017 (8 years ago)
TL;DR

Google reminds us that a page can be crawled without being indexable. Meta noindex tags, X-Robots-Tag directives, or even a simple CMS setting can block indexing despite active crawling. The real question is not whether Googlebot visits the page, but whether the page can actually make it into the index. An indexability audit should precede any content strategy.

What you need to understand

What is the difference between crawlable and indexable?

A page can be crawled by Googlebot without ever entering the index. Crawling is the technical visit by the bot. Indexing is the decision to add that page to the catalog of potential results.

Hundreds of sites lose traffic because they confuse the two. Their log files show regular visits from Googlebot, yet Search Console displays “Excluded by noindex tag”. The bot comes, reads, and then discards. In practice, a single noindex directive in the HTML or in the HTTP headers is enough to block indexing even when crawl budget is being spent on the page.

What are the main obstacles to indexing?

The meta robots noindex tag remains the classic culprit, often left in production after a staging phase. But HTTP headers like X-Robots-Tag go unnoticed because they don’t appear in the visible source code.
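
To catch both at once during an audit, a short script can fetch a URL and report the meta robots tag alongside the X-Robots-Tag response header. Below is a minimal sketch using Python and the requests library; the URLs are placeholders and the meta-tag regex assumes the name attribute comes before content, so treat it as an illustration rather than a full HTML parser.

```python
import re
import requests

def check_indexability(url: str) -> dict:
    resp = requests.get(url, timeout=10)
    # X-Robots-Tag lives in the HTTP response, invisible in "view source"
    header_directive = resp.headers.get("X-Robots-Tag", "")
    # Naive meta robots extraction (assumes name="robots" appears before content=...)
    meta = re.search(
        r'<meta[^>]+name=["\']robots["\'][^>]+content=["\']([^"\']+)["\']',
        resp.text, re.IGNORECASE)
    meta_directive = meta.group(1) if meta else ""
    blocked = "noindex" in (header_directive + " " + meta_directive).lower()
    return {"url": url, "status": resp.status_code,
            "x_robots_tag": header_directive or "-",
            "meta_robots": meta_directive or "-",
            "indexable": not blocked}

# Placeholder URLs: swap in the strategic pages you expect to rank
for page in ["https://example.com/", "https://example.com/category/shoes"]:
    print(check_indexability(page))
```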

CMS settings are also problematic. WordPress, Shopify, and Prestashop all have a “Visibility to search engines” type option that adds a global noindex. A distracted developer ticks the box and the site goes live with every page blocked. 301/302 redirects are not indexable either, but they transfer the signal to their target, unless that target is itself blocked.
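
To illustrate the redirect case, the sketch below (same assumptions as the previous one: requests library, placeholder URL) follows the chain and checks whether the final destination is actually able to receive the transferred signal.

```python
import requests

def redirect_target_report(url: str) -> None:
    resp = requests.get(url, allow_redirects=True, timeout=10)
    for hop in resp.history:  # each intermediate 301/302
        print(f"{hop.status_code} {hop.url} -> {hop.headers.get('Location')}")
    x_robots = resp.headers.get("X-Robots-Tag", "").lower()
    indexable = resp.status_code == 200 and "noindex" not in x_robots
    print(f"Final target: {resp.url} ({resp.status_code}), indexable: {indexable}")

redirect_target_report("https://example.com/old-page")  # placeholder URL
```

The meta robots check from the previous sketch applies to the final target as well.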

How does Google detect these blockages?

Googlebot first reads the HTTP headers before even parsing the HTML. If an X-Robots-Tag: noindex appears in the server response, the HTML content isn’t even analyzed for indexing. The bot can still follow internal links to discover other pages, but this one remains excluded.

In Search Console, the “Coverage” or “Pages” tab reports these exclusions with explicit labels: “Excluded by noindex tag,” “Blocked by robots.txt file,” “Page with redirect.” The problem is that these reports can take several days to show up after a deployment. A preventative audit avoids losing a week of visibility.

  • Crawl ≠ indexing: Googlebot can visit a page without adding it to the index
  • Meta robots tags and X-Robots-Tag headers block indexing even if crawling is allowed
  • Search Console reports exclusions, but with a delay of several days
  • Popular CMS (WordPress, Shopify) have hidden options that add global noindexes
  • A preventative audit of robots directives should be part of the production checklist

SEO Expert opinion

Is this statement consistent with observed practices?

Yes, and it’s even a welcome reminder. In practice, 30 to 40% of SEO audits reveal strategic pages blocked by forgotten directives. Teams focus on content, backlinks, structure, but neglect technical indexability. Google can crawl 10,000 pages a day on a site and index only 200 if there are lingering noindex tags.

The confusion also arises from the fact that popular SEO tools (Screaming Frog, Oncrawl, Botify) simulate a crawl but don’t always check the HTTP headers in real conditions. A local scan can miss an X-Robots-Tag configured only in production via Cloudflare or a CDN. Testing with the Search Console URL Inspection tool is necessary, as it shows exactly what Googlebot sees.

What nuances should be added?

Google says “make sure pages are indexable,” but doesn’t specify the delay between a correction and effective indexing. Removing a noindex does not guarantee immediate indexing: recrawling can take days or even weeks on a site with a low crawl budget.

Another unclear point: Google does not detail the impact of poorly configured canonicals. A canonical tag pointing to a different URL is not a strict noindex, but it redirects the indexing signal elsewhere. The result: the original page remains crawled but is never displayed in the SERPs. Technically, it’s not “blocked,” but the effect is the same.
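
A simple way to surface those cases during an audit is to compare each page’s rel=canonical with its own URL. Here is a hedged sketch (requests library, placeholder URL, naive regex that assumes rel comes before href); anything it flags deserves a manual look rather than an automatic fix.

```python
import re
from typing import Optional

import requests

def canonical_target(url: str) -> Optional[str]:
    html = requests.get(url, timeout=10).text
    m = re.search(
        r'<link[^>]+rel=["\']canonical["\'][^>]+href=["\']([^"\']+)["\']',
        html, re.IGNORECASE)
    return m.group(1) if m else None

page = "https://example.com/product-red"  # placeholder
target = canonical_target(page)
if target and target.rstrip("/") != page.rstrip("/"):
    print(f"Canonical mismatch: {page} -> {target} (the page will likely never show in the SERPs)")
else:
    print("Canonical is self-referencing or absent")
```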

In what cases does this rule not apply?

Some pages must remain non-indexable: login pages, shopping carts, internal search results, dynamic faceted filters. An e-commerce site can generate thousands of combinatorial URLs (color + size + price) that are better off blocked to avoid diluting the crawl budget.

The real challenge is to intelligently segment. A global noindex on /search/ is relevant, but a noindex on /category/ is often a mistake. Modern CMS platforms allow conditional rules (noindex if parameter X is present), but their configuration requires a clear vision of the architecture. Poor settings can cost more than an external audit.
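
To make the idea of conditional rules concrete, here is a minimal sketch of the decision logic; the parameter names are assumptions and would have to match your own faceted navigation.

```python
from urllib.parse import urlparse, parse_qs

# Assumed facet/session parameters; adapt to your own architecture
FACET_PARAMS = {"color", "size", "price", "sort", "sessionid"}

def robots_directive(url: str) -> str:
    params = set(parse_qs(urlparse(url).query))
    return "noindex, follow" if params & FACET_PARAMS else "index, follow"

print(robots_directive("https://example.com/category/shoes"))                    # index, follow
print(robots_directive("https://example.com/category/shoes?color=red&size=42"))  # noindex, follow
```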

Practical impact and recommendations

What actions should be taken to audit indexability?

First, export the list of “Excluded” URLs from Search Console > Pages. Sort by exclusion reason: noindex, robots.txt, canonicals, soft 404. Each category reveals a different type of problem. Accidental noindexes are quickly identified if a strategic page appears in the list.
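
If the export is large, a few lines of pandas make the sorting painless. The column names below (“URL”, “Reason”) and the file name are assumptions; adjust them to whatever your Search Console export actually contains.

```python
import pandas as pd

excluded = pd.read_csv("search_console_excluded.csv")  # hypothetical export file

# Volume per exclusion reason: noindex, robots.txt, canonical, soft 404...
print(excluded.groupby("Reason")["URL"].count().sort_values(ascending=False))

# Strategic sections that should never appear in an exclusion list
strategic = excluded[excluded["URL"].str.contains("/category/|/product/", regex=True, na=False)]
print(strategic[["URL", "Reason"]].to_string(index=False))
```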

Next, crawl the site with Screaming Frog in Spider mode, enabling the “Render JavaScript” option if the site is in React/Vue/Angular. Check the “Indexability” column for each URL. Compare this with the actual HTTP headers through the “Response Headers” tab. A mismatch between the HTML (noindex absent) and the headers (X-Robots-Tag present) indicates a server or CDN layer injecting directives.

What mistakes should be avoided during deployment?

Never take a staging environment protected by a password and a noindex and copy its database into production without cleaning up the settings. WordPress, for example, stores the “Discourage search engines” setting in wp_options: a direct SQL import reactivates the global noindex.
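
One way to catch that regression right after an import is to read the flag straight from the database. The sketch below assumes the pymysql package and placeholder credentials; blog_public is the wp_options key behind “Discourage search engines” (0 = discourage, 1 = allow indexing).

```python
import pymysql

# Placeholder connection details
conn = pymysql.connect(host="localhost", user="wp", password="secret", database="wordpress")
try:
    with conn.cursor() as cur:
        cur.execute("SELECT option_value FROM wp_options WHERE option_name = 'blog_public'")
        row = cur.fetchone()
finally:
    conn.close()

if row and str(row[0]) == "0":
    raise SystemExit("blog_public = 0: the staging noindex is still active in production")
print("blog_public = 1: search engine indexing is allowed")
```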

Avoid overly broad robots.txt rules as well. A “Disallow: /*?” blocks all URLs with parameters, including legitimate pagination and language variants (?lang=en). Googlebot will no longer be able to crawl them, which in practice keeps them out of any meaningful indexing. Prefer targeted Disallow rules on unnecessary parameters (utm_, sessionid, sort).
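
Before shipping a rule like that, it is worth testing the pattern against real URLs. The sketch below uses a deliberately simplified matcher that converts Google’s * and $ wildcards into a regex; it is not a full robots.txt parser, and the URLs are placeholders.

```python
import re
from urllib.parse import urlparse

def pattern_to_regex(disallow_pattern: str) -> re.Pattern:
    # Translate the wildcard syntax (* and $) into a regex anchored at the path start
    escaped = re.escape(disallow_pattern).replace(r"\*", ".*").replace(r"\$", "$")
    return re.compile("^" + escaped)

rule = pattern_to_regex("/*?")  # the overly broad rule discussed above

for url in ["https://example.com/category/shoes?lang=en",     # legitimate language variant
            "https://example.com/category/shoes?page=2",       # legitimate pagination
            "https://example.com/search?q=red&sessionid=42"]:  # low-value URL
    parsed = urlparse(url)
    path = parsed.path + ("?" + parsed.query if parsed.query else "")
    print(url, "->", "blocked" if rule.match(path) else "allowed")
```

All three URLs come out blocked, which is exactly the over-blocking the paragraph warns about.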

How can you verify that the corrections are working?

Use the URL Inspection tool in Search Console immediately after removing a noindex. Click “Test live URL” to force a fresh fetch: Google will either confirm the URL is indexable or flag a persistent issue. If everything is green, request indexing.
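
The same verification can be scripted for a list of strategic URLs, which also helps with the weekly monitoring mentioned further down. This sketch assumes the Search Console URL Inspection API (service “searchconsole” v1 via google-api-python-client), a service-account key file, and placeholder property and URL values; the field names follow my reading of the v1 API and should be double-checked against the official documentation. Note that the API reports the indexed state; it does not trigger a live test or an indexing request.

```python
from google.oauth2 import service_account
from googleapiclient.discovery import build

credentials = service_account.Credentials.from_service_account_file(
    "service-account.json",  # hypothetical key file with access to the property
    scopes=["https://www.googleapis.com/auth/webmasters.readonly"])
service = build("searchconsole", "v1", credentials=credentials)

response = service.urlInspection().index().inspect(body={
    "inspectionUrl": "https://example.com/strategic-page",  # placeholder
    "siteUrl": "https://example.com/",                      # the verified property
}).execute()

status = response["inspectionResult"]["indexStatusResult"]
print(status.get("verdict"), "-", status.get("coverageState"))
print("robots.txt:", status.get("robotsTxtState"), "| indexing:", status.get("indexingState"))
```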

Monitor server logs for 48 hours after the correction. If Googlebot doesn’t return, the page has a low crawl priority. You then need to boost the signal: add internal links from already indexed pages, submit an updated XML sitemap, or publish fresh content that links to that URL. Technical indexability is not enough if Google never revisits the page.
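
A small log scan is enough for that 48-hour check. The sketch below assumes a common/combined access-log format; the file path and target path are placeholders, and timezones are ignored for simplicity.

```python
import re
from datetime import datetime, timedelta

LOG_FILE = "access.log"          # placeholder path
TARGET_PATH = "/strategic-page"  # path of the corrected URL
WINDOW = timedelta(hours=48)
now = datetime.now()

hits = []
with open(LOG_FILE, encoding="utf-8", errors="replace") as fh:
    for line in fh:
        if "Googlebot" not in line or TARGET_PATH not in line:
            continue
        # Common log format timestamp, e.g. [14/Dec/2017:06:17:00 +0000]
        m = re.search(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2})", line)
        if m and now - datetime.strptime(m.group(1), "%d/%b/%Y:%H:%M:%S") <= WINDOW:
            hits.append(line.strip())

print(f"{len(hits)} Googlebot hit(s) on {TARGET_PATH} in the last 48 hours")
if not hits:
    print("No recrawl yet: strengthen internal links, resubmit the sitemap, or link the URL from fresh content")
```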

  • Export excluded URLs from Search Console and sort by reason
  • Crawl the site with JavaScript rendering enabled to detect dynamic noindexes
  • Check actual HTTP headers (X-Robots-Tag, canonical) via cURL or developer tools
  • Test each correction with the URL Inspection tool before generalizing
  • Automate Search Console alerts to detect new exclusions each week
  • Document indexability rules in an internal guide to avoid regressions
Indexability is a technical prerequisite often overlooked in favor of content strategy. However, a forgotten noindex can nullify months of SEO efforts. A preventative audit before each deployment, coupled with ongoing monitoring via Search Console and server logs, ensures that strategic pages remain visible. These technical optimizations require sharp expertise and rigorous follow-up. For critical projects or complex migrations, engaging a specialized SEO agency secures indexability from the design phase and helps avoid costly traffic losses.

❓ Frequently Asked Questions

Can a page be crawled without being indexed?
Yes. Googlebot can visit a page regularly (crawl) without ever adding it to the index if a noindex directive, an X-Robots-Tag header, or a canonical pointing to another URL blocks indexing. Crawling measures discovery; indexing measures entry into the catalog of results.
Where do invisible noindex directives hide in the source code?
X-Robots-Tag HTTP headers do not appear in the HTML rendered by the browser. You have to inspect the server response via the dev tools (Network tab) or cURL to detect them. CDNs and proxy servers can also inject these headers in production.
How long after removing a noindex will the page be indexed?
It depends on the crawl budget and the page's priority: from a few hours for a page linked from the homepage to several weeks for a deep page. Forcing a recrawl via the URL Inspection tool speeds things up but does not guarantee immediate indexing.
Do misconfigured canonical tags block indexing?
Not technically, but they redirect the indexing signal to the canonical URL. If page A points to B via canonical, Google will index B and ignore A in the results. The effect is close to a noindex even though the status differs in Search Console.
Should you noindex every page with URL parameters?
No, only those that create duplicate or low-value content (filters, sorts, sessions). Legitimate pagination, language variants, or tracked parameters (utm_) can remain indexable if they carry unique content. Segment with conditional rules rather than a global block.
🏷 Related Topics
Domain Age & History · Crawl & Indexing · AI & SEO

