Why does Google crawl pages but not index them?

Official statement

It's normal for Google to crawl certain URLs without indexing them. If a URL does not bring added value in terms of search, such as index or archive pages, it may be crawled but not indexed.

3:40

🎥 Source video

Extracted from a Google Search Central video

⏱ 58:29 💬 EN 📅 30/11/2018 ✂ 19 statements

Official statement from November 30, 2018 (7 years ago)

TL;DR

Google frequently crawls URLs that it deliberately chooses not to index, especially if they don't offer distinct search value. Archive, pagination, or sorting pages are typically affected. For SEO, this means regular crawling is not a signal of future indexing, and it is essential to actively manage which pages deserve indexing.

What you need to understand

What does Google mean by "added value in terms of search"?

When Google refers to added value in search, it means a page's ability to fulfill a user intent that other pages on your site do not already address. A chronological archive page listing 10 articles that are already individually indexed adds nothing new.

The engine thinks in terms of marginal utility: if indexing this URL does not serve a specific query that existing pages do not satisfy, it is crawled to check its freshness but remains out of the index. This is a matter of resource optimization: why store and rank a redundant page?

Why crawl a page if Google isn't planning to index it?

Crawling serves multiple purposes beyond immediate indexing. Google follows internal links to discover other content, analyzes the site's freshness signals, and checks whether the status of the page has changed (e.g., from thin content to comprehensive content).

A page can be crawled regularly for months without ever entering the index if it remains below the quality threshold or if it is structurally duplicated. This is especially noticeable on e-commerce facets, WordPress tags, or multiple sorting pages that generate almost identical URL combinations.
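To make the scale of near-identical URL combinations concrete, here is a minimal sketch that enumerates the variants a single listing page can produce. The facet names, their values, and the `/shoes/` path are hypothetical, chosen only for illustration; real platforms build these URLs in their own ways.

```python
from itertools import product
from urllib.parse import urlencode

# Hypothetical facets for one category page; None means "facet not applied".
FACETS = {
    "color": [None, "red", "blue", "green"],
    "size": [None, "s", "m", "l"],
    "sort": [None, "price-asc", "price-desc"],
}


def facet_urls(base_path, facets):
    """Return every URL variant a crawler could reach for one listing."""
    names = list(facets)
    urls = []
    for combo in product(*(facets[name] for name in names)):
        # Keep only the facets that are actually applied in this combination.
        params = [(n, v) for n, v in zip(names, combo) if v is not None]
        query = urlencode(params)
        urls.append(f"{base_path}?{query}" if query else base_path)
    return urls


urls = facet_urls("/shoes/", FACETS)
print(len(urls))  # 4 * 4 * 3 = 48 variants of the same product list
```

Three small facets already yield 48 crawlable URLs for one set of products, which is why such pages tend to be crawled without being indexed.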

Are all index and archive pages affected?

Not necessarily. An archive page that offers editorial curation, a unique introduction, or content gathered from a distinct thematic angle can perfectly well be indexed. It's the generic and automated nature that poses a problem.

Well-designed hub pages, with substantial introductions and real context, escape this rule. Conversely, a purely technical archive (/page/2/, /sort/price-asc/) without unique content will be crawled regularly yet kept out of the index.

Crawling and indexing are two distinct processes: one does not automatically imply the other.
Search value is evaluated relative to the pages already indexed on the site, not in absolute terms.
Purely technical pages (pagination, sorting, filters) without unique content are the first to be excluded.
A status of "Crawled - Not Indexed" in Search Console is not necessarily a problem if the page is intentionally secondary.
Google periodically reevaluates these URLs: content improvement can unlock indexing.

SEO Expert opinion

Is this statement consistent with real-world observations?

Absolutely. Audits regularly reveal sites where 60 to 80% of crawled pages are not indexed, especially on poorly configured e-commerce platforms or WordPress sites with multiple taxonomies. Google crawls these URLs to keep its picture of the site up to date but declines to index them.

The problem arises when strategic pages fall into this category. I have seen well-optimized product listings, with unique content, stagnating in "Crawled - Not Indexed" for quarters because they were drowned in a sea of useless facets. The overall site signal contaminated the good pages.

What nuances should be added to Mueller's statement?

Mueller intentionally remains vague about the decision threshold. What tips a page one way or the other? The honest answer: no one outside of Google knows precisely. Patterns can be deduced (duplication, thin content, click depth), but the exact criteria remain opaque. [To be verified]

Second nuance: saying that a page "does not add value" is an algorithmic judgment, not an absolute truth. I have corrected situations where Google underestimated a page's usefulness simply because internal linking was poor or the overall quality signals of the domain were diluted. Improving the technical context was enough to unlock indexing, without touching the content.

In what cases does this rule not apply?

Pages that carry a specific search intent escape this logic. An author-specific archive page on a media site can be indexed if users explicitly search for "articles by [author name]." A well-crafted category page, with dense editorial content, will be indexed even if it lists products that are already individually indexed.

Conversely, I have seen legitimately useful pages denied indexing because the site had a spam history or a disastrous content/code ratio. The overall context of the domain plays a huge role: a clean site with few pages will find it easier to have its archives indexed than a bloated site with 100,000 low-quality URLs.

Practical impact and recommendations

How to identify crawled but deliberately non-indexed pages?

Open Search Console and go to the "Pages" report (formerly "Coverage"). Filter for the status "Crawled - Currently Not Indexed." Export the complete list and segment it by type: facets, pagination, archives, tags, actual content.
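The segmentation step can be sketched in a few lines, assuming the export has been reduced to a plain list of URLs. The path patterns, the facet parameter names, and the example.com URLs below are assumptions to adapt to the audited site; they are not Search Console categories.

```python
import re
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# Hypothetical query parameters that mark a faceted or sorted listing.
FACET_PARAMS = {"sort", "color", "size", "filter"}

# Hypothetical path patterns, checked in order; first match wins.
PATH_RULES = [
    ("facet", re.compile(r"^/sort/")),
    ("pagination", re.compile(r"/page/\d+/?$")),
    ("tag", re.compile(r"^/tag/")),
    ("archive", re.compile(r"^/\d{4}/(\d{2}/)?$")),
]


def classify(url):
    """Assign one URL to a segment; anything unmatched counts as content."""
    parts = urlsplit(url)
    if FACET_PARAMS & set(parse_qs(parts.query, keep_blank_values=True)):
        return "facet"
    for label, pattern in PATH_RULES:
        if pattern.search(parts.path):
            return label
    return "content"


def segment(urls):
    """Count how many exported URLs fall into each segment."""
    return Counter(classify(url) for url in urls)


EXPORTED = [
    "https://example.com/blog/page/2/",
    "https://example.com/sort/price-asc/",
    "https://example.com/shoes/?color=red",
    "https://example.com/tag/seo/",
    "https://example.com/2018/11/",
    "https://example.com/guides/crawl-budget/",
]
print(segment(EXPORTED))
```

The "content" bucket is the one to read by hand: those are the URLs where non-indexing is most likely to be unintended.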

Use a crawler (Screaming Frog, Oncrawl) and cross-reference its output with your analytics. If a page generates direct or referral traffic but is not indexed, that signals it holds value and that Google is mistaken. If it generates nothing and has no backlinks, it's probably best to exclude it cleanly, either with a robots.txt disallow or with a noindex.
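As a rough sketch, that cross-referencing rule can be written as a small triage function. The `PageStats` fields and the sample figures are hypothetical, and the "review manually" outcome for pages with backlinks but no traffic is an added assumption, since the rule above does not cover that case.

```python
from dataclasses import dataclass


@dataclass
class PageStats:
    url: str
    sessions: int   # direct + referral sessions taken from analytics
    backlinks: int  # external links reported by the crawler or a link tool


def triage(page):
    """Suggest a next step for one crawled but non-indexed URL."""
    if page.sessions > 0:
        # Visitors reach the page without Google: it probably holds value.
        return "keep and improve"
    if page.backlinks == 0:
        # No traffic and no links: a candidate for robots.txt or noindex.
        return "exclude"
    # Assumption: links without traffic deserve a human look.
    return "review manually"


PAGES = [
    PageStats("https://example.com/guides/crawl-budget/", sessions=120, backlinks=3),
    PageStats("https://example.com/sort/price-asc/", sessions=0, backlinks=0),
    PageStats("https://example.com/tag/seo/", sessions=0, backlinks=2),
]
for page in PAGES:
    print(page.url, "->", triage(page))
```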

What concrete steps can be taken to reduce this issue?

First, clean your site. Block in robots.txt, or set to noindex, automatic facets, sorting pages, and purely technical archives. Pick one method per URL: Google cannot read a noindex on a page that robots.txt stops it from crawling. Reduce the crawlable surface area to pages that are genuinely intended to be indexed. This focuses the crawl budget on what matters.
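Before deploying new robots.txt rules, it can help to check that they block the noise and nothing else. The sketch below uses Python's standard `urllib.robotparser`; the rules and example.com URLs are hypothetical. One caveat: this parser matches plain path prefixes only, so it does not reproduce Google's handling of `*` and `$` wildcards, and it should be treated as a sanity check rather than a faithful Googlebot simulation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules blocking sorting and filter URLs by path prefix.
ROBOTS_TXT = """\
User-agent: *
Disallow: /sort/
Disallow: /filter/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(url, user_agent="Googlebot"):
    """Return True if the parsed rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


CHECKS = [
    "https://example.com/sort/price-asc/",
    "https://example.com/filter/color-red/",
    "https://example.com/shoes/",
]
for url in CHECKS:
    print(url, "->", "crawlable" if is_crawlable(url) else "blocked")
```

Running the check against a list of strategic URLs as well as a list of noise URLs catches the most costly mistake, a rule that is broader than intended.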

Next, enhance the legitimate pages that are stagnant in "Crawled - Not Indexed." Add unique content, strengthen internal links to them, gain some external backlinks. If a category page deserves indexing, give it the means: a 200-word introduction, filters in structured FAQ, genuine editorial work.

What mistakes should be absolutely avoided?

Do not confuse frequent crawling with guaranteed indexing. Some SEOs think optimizing crawl budget is enough. False. Google can crawl a page every day and refuse to index it indefinitely if it does not pass quality filters.

Another pitfall: leaving thousands of crawled non-indexed pages without action. This dilutes the overall quality signals of the site. Google sees a domain that generates massive low-value URLs, contaminating the perception of strategic pages. It's better to have a site with 500 well-indexed pages than one with 10,000 pages where 9,000 are ignored.

Audit Search Console every quarter to identify new "Crawled - Not Indexed" pages.
Block in robots.txt automatic facets, sorting, and filters without SEO value.
Set to noindex chronological archives without proper editorial content.
Enhance the unique content of category/tag pages you want indexed.
Reduce the click depth of strategic pages to facilitate their indexing.
Monitor the evolution of the ratio of indexed pages to crawled pages month by month.
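The last item of the checklist can be tracked with a simple ratio. This sketch assumes "crawled pages" means indexed pages plus crawled but non-indexed pages, and the monthly counts are made up for illustration (the first month reuses the 10,000-page site with 9,000 ignored pages mentioned above).

```python
def indexed_ratio(indexed, crawled_not_indexed):
    """Share of crawled pages that are indexed (0.0 when nothing is crawled)."""
    total = indexed + crawled_not_indexed
    return indexed / total if total else 0.0


def ratio_trend(monthly_counts):
    """Return (month, ratio) pairs in chronological order, rounded to 3 places."""
    return [
        (month, round(indexed_ratio(*monthly_counts[month]), 3))
        for month in sorted(monthly_counts)
    ]


# Made-up counts: month -> (indexed, crawled but not indexed).
MONTHLY_COUNTS = {
    "2024-01": (1000, 9000),
    "2024-02": (1200, 6000),
    "2024-03": (1500, 1500),
}

for month, ratio in ratio_trend(MONTHLY_COUNTS):
    print(month, f"{ratio:.1%}")
```

A rising ratio after a cleanup suggests the blocked URLs were indeed noise; a flat or falling one is a cue to look at which segments are growing.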

The status "Crawled - Not Indexed" is only a problem if strategic pages stagnate there. Focus your efforts on reducing the noise (blocking unnecessary URLs) and enriching legitimate pages. These architectural optimizations can be complex to orchestrate alone, especially on large sites: engaging a specialized SEO agency can provide a precise diagnosis and a tailored action plan to maximize your indexing effectiveness.

❓ Frequently Asked Questions

Will a crawled but non-indexed page eventually be indexed automatically?

Not necessarily. Google periodically reevaluates these pages, but without improvements to content, internal linking, or quality signals, they can stay out of the index indefinitely.

Should pages you don't want indexed be blocked in robots.txt?

Yes, if they have no crawl value (e.g., infinite facets). Use robots.txt to save crawl budget. If you want them crawled but not indexed, prefer a meta noindex.

Does the "Crawled - Not Indexed" status affect the ranking of indexed pages?

Indirectly, yes. A high ratio of crawled but non-indexed pages can signal to Google that the site's overall quality is low, which dilutes perceived authority and can affect strategic pages.

How can you force Google to index a page stuck in this status?

Improve its unique content, strengthen internal linking, earn a few backlinks, and reduce its click depth. Then request a URL inspection in Search Console. There is no guarantee, but it improves the odds.

Do all pagination pages need to be indexed?

No. Unless they carry unique editorial content or answer a specific query, pagination pages (page/2/, page/3/) can stay in noindex or be blocked from crawling without any problem.
