Out of 150,000 pages analyzed, the number of images (0.95) and semantic optimization score (0.93) dominate ranking correlations, far ahead of exact query in title (0.69). Tables, lists, structured data, and videos have virtually no measurable impact. Caution: correlation is not causation.
This study analyzes the correlations between 23 on-page factors and positions in Google results, across more than 150,000 French pages.
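As a rough illustration of how a factor-versus-position correlation can be obtained, here is a minimal sketch using Spearman's rank correlation. The study's actual method, aggregation level, and sign convention are not described here, so everything below is an assumption: the image counts are invented toy data, and position is negated so that a positive coefficient means "more of the factor goes with a better position".

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

# Hypothetical toy data: one page per position, invented image counts.
positions = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
image_counts = [14, 12, 12, 9, 10, 7, 6, 6, 4, 3]

# Assumed sign convention: negate position so that "better" is larger.
visibility = [-p for p in positions]

rho = spearman(image_counts, visibility)
print(f"Spearman correlation, image count vs. visibility: {rho:.2f}")
```

A coefficient close to 1 on data like this only says that the two orderings agree; it says nothing about why they agree.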
The major finding upends common beliefs: the number of images shows the strongest correlation (0.95), ahead of semantic optimization score (0.93) and word count (0.85). The number of products on e-commerce pages follows at 0.79.
In contrast, practices usually considered important show weaker correlations: exact query at the beginning of the title (0.75), presence of an H1 (0.69), or exact query in the title (0.69). Tables, lists, videos, and structured data have virtually no measurable impact.
Titles containing numbers even show a negative correlation (-0.77). Only 25% of top 10 pages contain the exact query in the title, which calls the dogma of keyword stuffing into question.
The median word count of first-position pages is 736, versus 619 for tenth-position pages, but 28% of pages are under 300 words, suggesting that quantity alone is not decisive once a page is in the top 10.
This analysis relies on statistical correlations, not direct causal relationships. A strong correlation can mask a confounding factor.
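To make the confounding risk concrete, here is a small simulation sketch. It is not derived from the study's data: the hidden variable `effort`, the noise levels, and the sample size are all hypothetical. In this toy model, editorial effort drives both image count and ranking score, while images have no direct effect on the score at all.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def partial_corr(r_xy, r_xz, r_yz):
    """Correlation of x and y after removing the linear effect of z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

rng = random.Random(42)  # fixed seed so the run is reproducible
n = 5000

# Hypothetical hidden factor: overall editorial effort per page.
effort = [rng.gauss(0, 1) for _ in range(n)]
# Both observed variables depend on effort plus independent noise.
images = [e + rng.gauss(0, 0.5) for e in effort]
score = [e + rng.gauss(0, 0.5) for e in effort]

r_images_score = pearson(images, score)
r_images_effort = pearson(images, effort)
r_score_effort = pearson(score, effort)
r_partial = partial_corr(r_images_score, r_images_effort, r_score_effort)

print(f"images vs. score, raw:                    {r_images_score:.2f}")
print(f"images vs. score, controlling for effort: {r_partial:.2f}")
```

In this setup the raw correlation is strong, while the partial correlation is close to zero once effort is controlled for. The catch in a real study is that the confounder is usually unmeasured, so this check cannot simply be run on the published figures.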
The number of images comes first, but no satisfactory explanation emerges. E-commerce pages with many products naturally have more images, yet the number of products only correlates at 0.79. The hypothesis: images could be a proxy for other signals (content richness, editorial effort, user engagement).
The weak correlation of exact queries in titles is surprising, but explainable: on 3-4 word queries, Google understands semantic variations. The algorithm prioritizes contextual relevance over exact matching.
Structured data shows virtually no measurable impact on ranking, which is consistent with Google's official position: it serves rich snippet display, not positioning. Its presence may correlate with CMS age rather than quality.
Major limitation: the study excludes off-page factors (backlinks, domain authority) and behavioral metrics (CTR, time on page), which heavily influence rankings. The results concern only on-page factors in an already competitive context (top 10).
[Opinion] In my view, the correlation of image count remains suspect. Across 150,000 pages, such consistency suggests either an unidentified hidden factor or selection bias. In my experience, massively adding images without editorial relevance doesn't boost positions. I lean toward confounding: sites that invest in iconography also invest elsewhere (UX, content, technical).
[Generalization] The claim "having an H1 has little impact" deserves nuance. In the top 10, 91% of pages have an H1. So it's not a differentiating factor at this level, but probably a prerequisite to get there. I would put it this way: the absence of an H1 excludes you from the top 30, but its presence won't move you from 8th to 3rd place.
[Experience Feedback] The weakness of exact query in title matches my field observations. For the past 2-3 years, semantic variations have ranked as well as exact keywords. Google understands that "best online casinos" and "top casino sites internet" target the same intent.
[To Verify] The negative correlation of numbers in titles (-0.77) seems counter-intuitive given clickbait best practices. This could reflect a bias: listicles perform well on CTR but not necessarily on pure algorithmic relevance. On commercial queries where users do not expect a list, this format would be penalized.
[Opinion] My view on semantic optimization (0.93): it's the only strong actionable factor. Unlike images whose impact remains mysterious, boosting semantic score has a verifiable mechanical effect. However, 50% of first results exceed the recommended score by only 10%, suggesting diminishing returns beyond a threshold.