Is Compression a Google SEO Myth?

Recently, an SEO test attempted to determine whether compression ratio has an impact on search engine rankings, and some people seem to think that higher compression ratios are associated with lower rankings. To understand what role compressibility plays in SEO, it is worth examining the original research on compression ratios before jumping to conclusions about whether it is an SEO myth.

Search Engines Compress Web Pages

In the context of search engines, compressibility refers to the extent to which web pages can be compressed. An example of compression is shrinking a document into a zip file. Search engines compress indexed web pages because it conserves storage space and leads to faster processing. This is a practice employed by all search engines.
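
To make the storage-savings point concrete, here is a minimal sketch that compresses a small HTML snippet with Python's built-in gzip module and compares the byte counts. The snippet and the savings it reports are purely illustrative; they say nothing about how any particular search engine actually stores its index.

```python
import gzip

# A toy HTML page with some repeated markup, purely for illustration.
html = (
    "<html><body>"
    + "<p>Welcome to our store. Browse the catalog of products.</p>" * 50
    + "</body></html>"
).encode("utf-8")

compressed = gzip.compress(html)

print(f"original:      {len(html)} bytes")
print(f"compressed:    {len(compressed)} bytes")
print(f"storage saved: {1 - len(compressed) / len(html):.0%}")
```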

Websites & Host Providers Compress Web Pages

Web page compression is beneficial because it enables search crawlers to access pages more swiftly. Fast page delivery, in turn, signals to Googlebot that crawling won't put a strain on the server and that it is okay to crawl even more pages. Compression also speeds up websites, enhancing the user experience for site visitors. Most web hosts enable compression automatically because it benefits websites, site visitors, and the hosts themselves by reducing bandwidth loads. Everyone benefits from website compression.
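
If you want to check whether your host is actually serving compressed responses, inspecting the Content-Encoding response header is enough. The sketch below is a minimal example using the third-party requests library and a placeholder URL; both are assumptions for illustration, not tooling mentioned in this article.

```python
import requests  # third-party HTTP client: pip install requests

# Placeholder URL; substitute your own site.
response = requests.get(
    "https://example.com/",
    headers={"Accept-Encoding": "gzip, deflate, br"},
)

# requests transparently decompresses the body, but the response header
# still reports which encoding (if any) the server applied.
print(response.headers.get("Content-Encoding", "no compression advertised"))
```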

High Levels of Compression Correlate With Spam

Researchers at a search engine found that highly compressible web pages correlate with low-quality content. The study, Detecting Spam Web Pages Through Content Analysis (PDF), was published in 2006 as a follow-up to the earlier paper Spam, Damn Spam, and Statistics: Using Statistical Analysis to Locate Spam Web Pages; its co-authors include Marc Najork and Dennis Fetterly, two of the world’s leading researchers. Najork currently works at DeepMind as a Distinguished Research Scientist, while Fetterly, a software engineer at Google, has authored many important research papers related to search, content analysis, and other related topics. This research paper is highly significant.

The 2006 research paper revealed that 70% of web pages that compress at a ratio of 4.0 or higher were low-quality pages with a high level of redundant word usage. Typical pages, by contrast, compressed at a ratio of around 2.0.

Here are the compression-ratio statistics for normal web pages reported in the research paper (a rough code sketch of how such a ratio can be computed follows the list):

  • Compression ratio of 2.0: the most frequently occurring compression ratio in the dataset.
  • Compression ratio of 2.1: the median; half of the pages compress at a ratio below 2.1 and half above it.
  • Compression ratio of 2.11: the average (mean) compression ratio of the pages analyzed.
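
The ratio itself is simply the page’s original size divided by its compressed size. For a rough sense of why redundant pages score higher, the sketch below computes a gzip-based ratio for a short hand-written passage versus a block of repeated keywords. The sample strings are invented, and because real pages are much longer, the absolute numbers will not match the paper’s averages; only the relative difference matters here.

```python
import gzip

def compression_ratio(text: str) -> float:
    """Original byte count divided by gzip-compressed byte count; higher means more redundant."""
    raw = text.encode("utf-8")
    return len(raw) / len(gzip.compress(raw))

# Short, varied prose (invented for illustration).
normal_page = (
    "Our spring collection features hiking boots, trail runners, and sandals. "
    "Each product page lists sizing, materials, and customer reviews. "
    "Shipping is free on orders over fifty dollars, and returns are accepted "
    "within thirty days of purchase."
)

# Heavy keyword repetition is far more redundant and compresses much harder.
spam_page = "buy cheap widgets best price free shipping " * 200

print(f"varied page ratio:     {compression_ratio(normal_page):.2f}")
print(f"repetitive page ratio: {compression_ratio(spam_page):.2f}")
```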

Compressibility would be an easy first-pass method for filtering out obvious content spam, so it makes sense that search engines might use it to weed out heavy-handed spam. However, weeding out spam is more complicated than relying on a single simple signal. Search engines use multiple signals because combining them results in a higher level of accuracy.

The researchers reported in 2006 that 70% of pages with a compression ratio of 4.0 or higher were spam, which means the other 30% were not spam. There are always outliers in statistics, and that 30% of non-spam pages is why search engines tend to use more than one signal.

Do Search Engines Use Compressibility?

It is reasonable to assume that search engines might use compressibility to identify heavy-handed, obvious spam. However, it is also reasonable to assume that if search engines do employ it, they use it in conjunction with other signals to increase accuracy. Nobody outside of Google knows for certain whether it uses compressibility.

Impossible to Determine If Google’s Using Compression

The point of this article is that there is no way to prove whether compression ratio as a spam signal is an SEO myth or not.

Here’s why:

  1. If sites triggered the 4.0 compression ratio plus other spam signals, those sites would not be in the search results.
  2. If those sites are not in the search results, there is no way to test the search results to see if Google is using compression ratio as a spam signal.

It would be reasonable to assume that sites with compression ratios of 4.0 or higher were removed. But we don’t know that for certain. So we can’t prove that they were removed.

The only thing we do know is that there is this research paper out there authored by distinguished scientists.

Compressibility Is Not Something to Worry About

Compressibility may or may not be an SEO myth. But one thing is fairly certain: it’s not something that publishers or SEOs who run normal sites should worry about. For example, duplicate pages are entirely normal on dynamic websites like ecommerce sites, and Google simply canonicalizes them and consolidates the PageRank signals to the canonical page. Product pages may also compress at a higher ratio because there might not be a lot of content on them. That’s okay, too. Google is able to rank those.

A signal like compressibility would take abnormal levels of heavy-handed spam tactics to trigger. Add to that the fact that spam signals are not used in isolation because of false positives, and it’s probably not unreasonable to say that the average website does not have to worry about compression ratios.

Featured Image by Shutterstock/Roman Samborskyi
