Does gzip save CO2?
- by Dan Mateas

In this case study we look at a question we have been asked multiple times by our community: we know gzip reduces data transfer and makes pages load faster, but does it really reduce carbon emissions?

Thinking about it, the question is quite intriguing: gzip was not designed specifically to reduce carbon emissions, and before the data is transferred it has to be compressed, and then decompressed again on the end-user device. This obviously consumes energy and thus emits CO2.

So is it really worth it to gzip if you just care about carbon emissions? Let’s dive in!

What do we want to find out?

Research question
Does gzip compression consume more energy than it is able to save through reduced size of the transferred data?
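Stated as an inequality, gzip pays off if the following holds. Here E_gzip is the energy for compression, E_gunzip the energy for decompression on the end-user device, and E_net(S) the network energy for transferring a file of size S (this notation is introduced here for illustration only):

$$E_{\text{gzip}} + E_{\text{gunzip}} < E_{\text{net}}(S_{\text{uncompressed}}) - E_{\text{net}}(S_{\text{compressed}})$$

With the linear network model used in the calculations below, the right-hand side is simply the size difference multiplied by the network energy per kB.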

Calculations

For our case study we will look at two cases. First, we take the JavaScript file from google.com (google.js).

Then we will also look at a CSV file larger than 10 MB (21 MB uncompressed), so that gzip runs for close to a second.

google.js

805 kB uncompressed -> 272 kB compressed

accidents.csv

21 MB uncompressed -> 5.2 MB compressed
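To reproduce compressed sizes like these, a short sketch using Python's standard-library gzip module can be used. Compression level 6 is an assumption, chosen because it is the default of the gzip command-line tool; we do not know which settings produced the numbers above, and results may differ by a few bytes from the gzip tool because the file headers differ. The function names are illustrative.

```python
import gzip
import sys
from pathlib import Path


def compressed_size(data: bytes, level: int = 6) -> int:
    """Return the size in bytes of `data` after gzip compression.

    Level 6 is assumed to match the gzip command-line default.
    """
    return len(gzip.compress(data, compresslevel=level))


def report(path: str) -> None:
    """Print uncompressed size, compressed size and ratio for one file."""
    data = Path(path).read_bytes()
    packed = compressed_size(data)
    ratio = packed / len(data)
    print(f"{path}: {len(data) / 1000:.0f} kB -> {packed / 1000:.0f} kB ({ratio:.0%})")


if __name__ == "__main__":
    # Usage: python sizes.py google.js accidents.csv
    for p in sys.argv[1:]:
        report(p)
```

Repetitive text such as CSV rows compresses particularly well, which is consistent with the CSV file shrinking to about a quarter of its size while google.js shrinks to about a third.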

Now we can calculate the energy (and thus carbon) savings for each of these files using our CO2 Formulas:

google.js

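$$(\underbrace{805\,\text{kB}}_{\text{google.js}} - \underbrace{272\,\text{kB}}_{\text{google.js.gz}}) \times 0.00000006\,\tfrac{\text{kWh}}{\text{kB}} \times 3{,}600{,}000\,\tfrac{\text{J}}{\text{kWh}} = 115.13\,\text{J savings}$$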

CSV File

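$$(\underbrace{21\,\text{MB}}_{\text{CSV}} - \underbrace{5.2\,\text{MB}}_{\text{CSV.gz}}) \times 0.00006\,\tfrac{\text{kWh}}{\text{MB}} \times 3{,}600{,}000\,\tfrac{\text{J}}{\text{kWh}} = 3412.8\,\text{J savings}$$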
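The same calculation as a small Python function, as a sketch for plugging in other file sizes. It uses the network intensity from the formulas above (0.00006 kWh/MB, i.e. 0.00000006 kWh/kB with decimal units, 1 MB = 1000 kB); the function name is illustrative.

```python
# Fixed-line network energy intensity from the formulas above (decimal units).
KWH_PER_KB = 0.00000006  # = 0.00006 kWh/MB
J_PER_KWH = 3_600_000


def transfer_savings_joules(uncompressed_kb: float, compressed_kb: float) -> float:
    """Network energy (J) saved by transferring the compressed file instead."""
    return (uncompressed_kb - compressed_kb) * KWH_PER_KB * J_PER_KWH


print(f"google.js: {transfer_savings_joules(805, 272):.2f} J")
print(f"CSV:       {transfer_savings_joules(21_000, 5_200):.1f} J")
```

Because the model is linear in the number of bytes saved, the savings scale directly with the size difference; per-request overheads are not modelled.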

We see that just by compressing the files we would save 115.13 J and 3412.8 J, respectively, when transferring them over a fixed-line network.

Now let's look at the compute cost of achieving this compression, and of decompressing the data again.

We use perf on Linux together with Intel RAPL to measure the energy for compression, as this is typically a CPU-only task.

When using Intel RAPL, keep in mind that measurements over a time window shorter than the minimum resolution of the RAPL interface can be inaccurate. Currently, these limits are 976 microseconds (time) and 15.3 microjoules (energy).

Also, if your process runs longer than a typical scheduler tick, make sure the base load on the machine is very low, so that your code can run mostly uninterrupted.

To compensate for scheduler overhead, we make 10 runs per scenario.
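A minimal sketch of this procedure, assuming an Intel CPU on Linux with the intel_rapl powercap driver loaded. Instead of perf (which can read the same counters, for example through its power/energy-pkg/ event), it reads the package-0 counter from /sys/class/powercap/intel-rapl:0/energy_uj before and after each run. The path, the need for root permissions and the helper names are assumptions for illustration, not the exact setup behind the table below. Since this measures the whole CPU package, any background load ends up in the result, which is why a low base load matters.

```python
import statistics
import subprocess
import time
from pathlib import Path

# Package-0 RAPL domain exposed by the Linux powercap driver.
# Assumption: Intel CPU with intel_rapl loaded; reading energy_uj usually needs root.
RAPL = Path("/sys/class/powercap/intel-rapl:0")


def read_uj(name: str) -> int:
    """Read an integer microjoule value from a powercap sysfs file."""
    return int((RAPL / name).read_text())


def energy_delta_uj(before: int, after: int, max_range: int) -> int:
    """Energy consumed between two counter readings.

    The counter wraps around at max_range; this assumes at most one
    wraparound per run, which holds for short runs like these.
    """
    if after >= before:
        return after - before
    return after + (max_range - before)


def measure(cmd: list[str], runs: int = 10) -> tuple[float, float, float, float]:
    """Run cmd `runs` times; return mean and stdev of energy (J) and duration (s)."""
    max_range = read_uj("max_energy_range_uj")
    energies: list[float] = []
    durations: list[float] = []
    for _ in range(runs):
        e0, t0 = read_uj("energy_uj"), time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        t1, e1 = time.perf_counter(), read_uj("energy_uj")
        energies.append(energy_delta_uj(e0, e1, max_range) / 1e6)  # uJ -> J
        durations.append(t1 - t0)
    return (statistics.mean(energies), statistics.stdev(energies),
            statistics.mean(durations), statistics.stdev(durations))
```

For example, measure(["gzip", "-kf", "accidents.csv"]) would compress the CSV file ten times: -k keeps the input file and -f overwrites the .gz file left by the previous run.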

Runs (n) | File         | Action | Energy (mean)    | Duration (mean)
---------+--------------+--------+------------------+------------------
10       | CSV          | gzip   | 8.416 J ± 0.062  | 0.7245 s ± 0.002
10       | CSV.gz       | gunzip | 1.43 J ± 0.02    | 0.1245 s ± 0.0002
10       | google.js    | gzip   | 0.399 J ± 0.006  | 0.035 s ± 0.0002
10       | google.js.gz | gunzip | 0.088 J ± 0.0008 | 0.088 s ± 0.0

Each row of the table shows the mean of 10 runs. All values are well above the minimum limits of Intel RAPL.
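As a quick sanity check, this sketch compares each mean from the table with the RAPL limits quoted earlier (976 µs and 15.3 µJ) and prints by what factor each measurement exceeds them. The values are copied from the table; the variable and function names are illustrative.

```python
# Minimum resolution of the Intel RAPL interface quoted above.
MIN_WINDOW_S = 976e-6   # 976 microseconds
MIN_ENERGY_J = 15.3e-6  # 15.3 microjoules

# (file, action, mean energy in J, mean duration in s), copied from the table.
RESULTS = [
    ("CSV", "gzip", 8.416, 0.7245),
    ("CSV.gz", "gunzip", 1.43, 0.1245),
    ("google.js", "gzip", 0.399, 0.035),
    ("google.js.gz", "gunzip", 0.088, 0.088),
]


def above_rapl_limits(energy_j: float, duration_s: float) -> bool:
    """True if a measurement exceeds both the RAPL energy and time resolution."""
    return energy_j > MIN_ENERGY_J and duration_s > MIN_WINDOW_S


for name, action, energy, duration in RESULTS:
    print(f"{name:13} {action:7} window x{duration / MIN_WINDOW_S:6.0f}  "
          f"energy x{energy / MIN_ENERGY_J:9.0f}")
```

Even the shortest run (google.js gzip at 0.035 s) spans more than 35 RAPL time windows.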

Summary

The table shows that the network energy savings from compression (115.13 J for google.js and 3412.8 J for the CSV file) are far higher than the combined compression and decompression costs (~0.5 J and ~10 J, respectively).
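The comparison can be written out explicitly. This sketch assumes one compression and one decompression per transfer, which is the worst case for gzip: static files are often compressed once and then served many times. The numbers are taken from the formulas and the table above; the names are illustrative.

```python
# Network energy saved by transferring the compressed file (from the formulas), in J.
network_savings_j = {"google.js": 115.13, "CSV": 3412.8}

# Mean gzip + gunzip energy from the RAPL measurements, in J.
compute_cost_j = {
    "google.js": 0.399 + 0.088,
    "CSV": 8.416 + 1.43,
}

for name, saved in network_savings_j.items():
    cost = compute_cost_j[name]
    print(f"{name}: saved {saved:.2f} J, spent {cost:.3f} J, "
          f"net {saved - cost:.2f} J, ratio {saved / cost:.0f}x")
```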

Even with this simple test setup, which may contain some inaccuracies (we did not account for hyperthreading, used no pre-burn or pre-idle time, and did not check which C-states our CPU cores were in or whether Intel SpeedStep was enabled), the savings exceed the costs by more than two orders of magnitude, so we can safely conclude: gzip compression saves energy!