Lynne Teaches Tech: Why does compressing a JPEG make it look worse, even though putting it in a ZIP file makes it look the same?

there are many different methods of file compression. one of the simplest methods is run length encoding (RLE). the idea is simple: say you have a file like this:
aaabbbbaaaaa
you could store it as:
a3b4a5
to represent that there are 3 a’s, 4 b’s, etc. you would then simply need a program to reverse the process (decompression). this is seldom used, however, as it has a fairly major flaw:
abcde
becomes
a1b1c1d1e1
which is twice as large! RLE compression is best used for data with lots of repetition, and will generally make more random files *larger* rather than smaller.
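
here’s a minimal sketch of RLE in python, just to make the idea concrete. the function names `rle_encode` and `rle_decode` are made up for this example, and it assumes the input never contains digits (otherwise the decoder couldn’t tell a count apart from a character).

```python
import re
from itertools import groupby


def rle_encode(text: str) -> str:
    # groupby splits the text into runs of identical characters;
    # each run is written as the character followed by the run length
    return "".join(f"{char}{len(list(run))}" for char, run in groupby(text))


def rle_decode(encoded: str) -> str:
    # assumption: the original text had no digits in it, so every
    # "one non-digit followed by digits" pair is a (character, count) pair
    pairs = re.findall(r"(\D)(\d+)", encoded)
    return "".join(char * int(count) for char, count in pairs)


print(rle_encode("aaabbbbaaaaa"))  # a3b4a5
print(rle_encode("abcde"))         # a1b1c1d1e1 (twice as long as the input)
print(rle_decode("a3b4a5"))        # aaabbbbaaaaa
```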

a more complex method is to make a “dictionary” of things contained in the file and use that to make things smaller. for example, you could replace all occurrences of “the united states of america” with “🇺🇸”, and then state that “🇺🇸” refers to “the united states of america” in the dictionary. this would allow you to save a (relatively) huge amount of space if the full phrase appears dozens of times.
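
as a rough sketch of the dictionary idea in python: the names `dict_compress` and `dict_decompress` are hypothetical, the dictionary is written by hand, and it assumes the short token never appears in the original text. real compressors generally build their dictionaries automatically, so treat this as an illustration rather than how any particular format does it.

```python
def dict_compress(text: str, dictionary: dict) -> str:
    # dictionary maps a short token to the long phrase it stands for
    for token, phrase in dictionary.items():
        text = text.replace(phrase, token)
    return text


def dict_decompress(text: str, dictionary: dict) -> str:
    # reverse the substitution; only safe if the tokens never
    # appeared in the original text (an assumption of this sketch)
    for token, phrase in dictionary.items():
        text = text.replace(token, phrase)
    return text


dictionary = {"🇺🇸": "the united states of america"}
original = "i live in the united states of america. " * 12
compressed = dict_compress(original, dictionary)

# compare sizes in bytes, since the emoji takes more than one byte
print(len(original.encode("utf-8")), "bytes before")
print(len(compressed.encode("utf-8")), "bytes after")
print(dict_decompress(compressed, dictionary) == original)  # True
```

note that the dictionary itself has to be stored alongside the compressed text, which is why this only pays off when the phrase shows up many times.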

what i’ve been talking about so far are lossless compression methods. the file is exactly the same after being compressed and decompressed. lossy compression, on the other hand, allows for some data loss. this would be unacceptable in, for example, a computer program or a text file, because you can’t just remove a chunk of text or code and approximate what used to be there. you can, however, do this with an image. this is how JPEGs work. the compression is lossy, which means that some data is removed. this is relatively imperceptible at higher quality settings, but becomes more obvious the more you sacrifice quality for size. PNG files are (almost always) lossless, however. your phone camera takes photos in JPEG instead of PNG, though, because even though some quality is lost, a photo stored as a PNG would be much, much larger.
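
to show what “some data is removed” means, here’s a toy lossy scheme in python. this is NOT how JPEG works (real JPEG compression is far more sophisticated); it just rounds brightness values into buckets, and the names `lossy_compress`, `lossy_decompress`, and `step` are invented for this example.

```python
def lossy_compress(pixels, step=16):
    # keep only which bucket each brightness value (0-255) falls into;
    # the exact value inside the bucket is thrown away for good
    return [p // step for p in pixels]


def lossy_decompress(buckets, step=16):
    # the best we can do is guess the middle of each bucket
    return [b * step + step // 2 for b in buckets]


pixels = [0, 7, 100, 101, 255]
restored = lossy_decompress(lossy_compress(pixels))

print(restored)            # [8, 8, 104, 104, 248]
print(restored == pixels)  # False: close to the original, but not identical
```

a bigger `step` means fewer distinct values to store (smaller file) but a worse approximation, which is the same size-versus-quality trade-off as a JPEG quality slider.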

some examples of file formats that typically use lossy compression are JPEG, MP4, MP3, OGG, and AVI. some examples of lossless compression formats are FLAC, PNG, ZIP, RAR, and ALAC. some examples of lossless, uncompressed files are WAV, TXT, JS, BMP, and TAR. in terms of file size, you’ll almost always find that lossy files are smaller than the lossless files they were created from (unless it’s a horrendously inefficient compression format), and that losslessly compressed files are usually smaller than uncompressed ones.

you’ll find that putting a long text file in a zip makes it much smaller, but putting an MP3 in a zip has far less of an effect. this is because MP3 files are already compressed quite efficiently, so there’s very little repetition left for a lossless algorithm to take advantage of.
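
you can try this yourself with python’s built-in `zlib` module, which implements the same family of compression that zip files commonly use. in this sketch, random bytes from `os.urandom` stand in for an already-compressed file like an MP3 (that’s an assumption for the sake of the demo, not a real MP3), and `ratio` is just a helper name made up here.

```python
import os
import zlib

# very repetitive text, like a long and predictable text file
text = b"the quick brown fox jumps over the lazy dog. " * 200

# random bytes: a stand-in for data with no patterns left to find
noise = os.urandom(len(text))


def ratio(data: bytes) -> float:
    # compressed size divided by original size (smaller is better)
    return len(zlib.compress(data)) / len(data)


print(f"text:  {ratio(text):.2%} of original size")
print(f"noise: {ratio(noise):.2%} of original size")
```

the text should shrink to a tiny fraction of its size, while the random data should stay about the same size or even grow slightly.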

there are benefits to all three types of formats. lossily compressed files are much smaller, losslessly compressed files are perfectly true to the original sound/image/etc while being much smaller, and uncompressed data is very easy for computers to work with, as they don’t have to apply any decompression or compression algorithms to it. this is (partly) why BMP and WAV files still exist, despite PNG and FLAC being much more efficient.

as an example of how dramatic these differences often are, i looked at the file sizes for the downloadable version of master boot record’s album “internet protocol” in three formats: WAV, FLAC, and MP3.

you can see that the file size (shown in megabytes) is nearly 90 megabytes smaller with the FLAC version, and the MP3 version is only ~13% of the size of the WAV version. note that these downloads are in ZIP format – the WAV files would be even larger than shown here. this is not representative of all compression algorithms, nor is it representative of all music – this is just an illustrative example. TV static in particular compresses very poorly, because it’s so random, which makes it hard for algorithms to find patterns. watch a youtube video of TV static to see this in effect – you’ll notice obvious “block” shapes and blurriness that shouldn’t be there as the algorithm struggles to do its job. the compression on youtube is particularly strong to ensure that the servers can keep up with the enormous demand, but not so much so that videos become blurry, unwatchable messes.