§
Introduction
JPEG, named after the Joint Photographic Experts Group that created the standard in 1992, is a popular lossy compression method for digital images, especially photographs taken with digital cameras. The degree of compression is adjustable, so users can trade storage size against image quality. JPEG typically achieves a compression ratio of about 10:1 with little perceptible loss in image quality. Since its introduction it has become the most widely adopted image compression standard and the most commonly used digital image format in the world, with several billion JPEG images produced every day as of 2015. It played a significant role in the spread of digital images and their distribution over the Internet and social media platforms.
§
The JPEG Algorithm
§§
Color Space
Unlike some other compression algorithms, JPEG works on images in the Y'CbCr color space rather than RGB. Y' is the luma (brightness) component, while Cb and Cr are the blue-difference and red-difference chroma components. The following formulas describe the conversion from RGB (Red, Green, Blue) to Y'CbCr:
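Assuming the full-range JFIF convention (derived from ITU-R BT.601) that JPEG files most commonly use, with 8-bit R, G and B in the range 0 to 255, the forward conversion can be written as:

```latex
\begin{aligned}
Y'  &= 0.299\,R + 0.587\,G + 0.114\,B \\
C_b &= 128 - 0.168736\,R - 0.331264\,G + 0.5\,B \\
C_r &= 128 + 0.5\,R - 0.418688\,G - 0.081312\,B
\end{aligned}
```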
And to convert from Y'CbCr to RGB:
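Under the JFIF convention (an assumption, since other Y'CbCr variants exist), the inverse is R = Y' + 1.402 (Cr - 128), G = Y' - 0.344136 (Cb - 128) - 0.714136 (Cr - 128), B = Y' + 1.772 (Cb - 128). A minimal Python sketch of both directions, so the round trip can be checked; the function names are illustrative and not taken from any library:

```python
def rgb_to_ycbcr(r, g, b):
    """Forward conversion, assuming full-range JFIF constants (0..255 inputs)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr):
    """Inverse conversion under the same assumption."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return r, g, b


if __name__ == "__main__":
    pixel = (12, 200, 77)
    print(ycbcr_to_rgb(*rgb_to_ycbcr(*pixel)))  # close to the original pixel
```

The sketch keeps floating-point values; a real encoder would also round and clamp each component to the 0 to 255 range.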
§§
Discrete Cosine Transform
Let's split the image into 8x8 pixel blocks and perform a two-dimensional DCT (Discrete Cosine Transform) on each block. If you don't know what a DCT is, see the "Links" section at the bottom. Each block is turned into a new 8x8 block whose cells hold the DCT coefficients, using the following formula:
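Assuming the DCT-II normalization commonly given for JPEG, the transform of one 8x8 block is usually written as:

```latex
G_{u,v} = \frac{1}{4}\,\alpha(u)\,\alpha(v)
          \sum_{x=0}^{7}\sum_{y=0}^{7} g_{x,y}
          \cos\!\left[\frac{(2x+1)\,u\,\pi}{16}\right]
          \cos\!\left[\frac{(2y+1)\,v\,\pi}{16}\right]
```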
Here G_{u,v} is the DCT coefficient at coordinates (u, v) and g_{x,y} is the pixel value at coordinates (x, y). The arguments of the cosines are in radians. The factor α(n) is defined as:
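Assuming the usual orthonormal DCT-II scaling with an overall factor of 1/4, α(n) is 1/√2 when n = 0 and 1 otherwise. A minimal Python sketch of the transform under that assumption; the names `alpha` and `dct2` are illustrative, and the direct quadruple loop favours clarity over speed:

```python
import math


def alpha(n):
    """Normalizing factor: 1/sqrt(2) for n == 0, otherwise 1."""
    return 1 / math.sqrt(2) if n == 0 else 1.0


def dct2(block):
    """Two-dimensional DCT-II of one 8x8 block given as a list of 8 rows."""
    size = 8
    out = [[0.0] * size for _ in range(size)]
    for u in range(size):
        for v in range(size):
            total = 0.0
            for x in range(size):
                for y in range(size):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * alpha(u) * alpha(v) * total
    return out


if __name__ == "__main__":
    flat = [[10.0] * 8 for _ in range(8)]  # a block with no detail at all
    coeffs = dct2(flat)
    print(round(coeffs[0][0], 6))  # all the energy lands in the (0, 0) entry
```

For a perfectly flat block every coefficient except the one at (0, 0) comes out as zero up to floating-point error, which is the property the later steps rely on.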
§§
Quantization
Small DCT coefficients contribute little to the reconstructed image, so why store them at full precision? The quantization step exists to store as little of this information as possible. JPEG defines a "quantization matrix" Q, and each value in the DCT block is replaced with:
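In the usual formulation each coefficient is divided by the matching entry of Q and rounded to the nearest integer, B(u, v) = round(G(u, v) / Q(u, v)). A minimal Python sketch of that step; the table below is assumed to be the example luminance table from Annex K of the JPEG standard, and encoders are free to use other tables. The name `quantize` is illustrative:

```python
import math

# Example luminance quantization table (assumed here; other tables are valid).
Q_LUMA = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def quantize(dct_block, q=Q_LUMA):
    """Divide each coefficient by its table entry and round to nearest."""
    # floor(x + 0.5) is used instead of round() because Python's round()
    # sends exact halves to the nearest even integer.
    return [[int(math.floor(g / step + 0.5)) for g, step in zip(g_row, q_row)]
            for g_row, q_row in zip(dct_block, q)]


if __name__ == "__main__":
    block = [[0.0] * 8 for _ in range(8)]
    block[0][0] = 80.0   # large low-frequency term survives
    block[0][1] = 5.0    # small term is rounded away to zero
    print(quantize(block)[0][:2])
```

The table entries grow toward the bottom-right corner, so high-frequency coefficients are divided by larger numbers and are more likely to become zero.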
§§
Entropy Coding
We now have a collection of quantized 8x8 blocks. How can we exploit their structure to save memory? Notice that the largest coefficient usually sits at (0, 0), and the farther an entry is from (0, 0), the less it matters; after quantization most of those entries are zero. Let's read each block in zigzag order, starting at (0, 0), and write the values into a new one-dimensional array.
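The zigzag scan can be sketched in Python as follows. The traversal assumed here is the conventional JPEG one: it starts at (0, 0), steps right to (0, 1), and then walks the anti-diagonals in alternating directions. The function names are illustrative:

```python
def zigzag_indices(n=8):
    """Return (row, col) pairs of an n x n block in zigzag order."""
    order = []
    for s in range(2 * n - 1):          # s = row + col is constant on a diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:                  # even diagonals run bottom-left to top-right
            rows = reversed(rows)
        order.extend((r, s - r) for r in rows)
    return order


def zigzag(block):
    """Flatten a square block (list of rows) into a zigzag-ordered list."""
    return [block[r][c] for r, c in zigzag_indices(len(block))]


if __name__ == "__main__":
    small = [[1, 2, 6],
             [3, 5, 7],
             [4, 8, 9]]
    print(zigzag(small))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Because the scan moves from low to high frequencies, the entries that quantization turned into zeros tend to end up together at the tail of the array.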

This way, most of the zeros end up next to each other, which suits RLE (Run-Length Encoding). What is RLE? It is about the simplest way of compressing information. Let's consider a string.
AAAABBBBBAAAA
How can we compress this text in the simplest way? Just write 4A5B4A. Simple and effective! If the text itself can contain digits, as our arrays of coefficients do, counts and characters would run together, so we add separator symbols. For example, consider the text:
111111AAAAAAA222222
Let's say the separator symbols are ";" and ":". Then we can compress this string to:
6:1;7:A;6:2;
Notice this compression method only works for strings with long sequences of repeating characters. For other sequences like:
ABC
The compressed version of the string is bigger than the original:
1A1B1C
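The separator-based scheme above can be sketched in a few lines of Python. The function names are illustrative, and the sketch assumes the input text never contains the separator characters themselves:

```python
from itertools import groupby


def rle_encode(text, count_sep=":", item_sep=";"):
    """Encode each run of equal characters as '<count>:<char>;'."""
    return "".join(f"{len(list(run))}{count_sep}{char}{item_sep}"
                   for char, run in groupby(text))


def rle_decode(encoded, count_sep=":", item_sep=";"):
    """Invert rle_encode (assumes the text contained no separator symbols)."""
    pieces = []
    for item in encoded.split(item_sep):
        if item:  # skip the empty piece after the final separator
            count, char = item.split(count_sep, 1)
            pieces.append(char * int(count))
    return "".join(pieces)


if __name__ == "__main__":
    print(rle_encode("111111AAAAAAA222222"))  # 6:1;7:A;6:2;
    print(rle_encode("ABC"))                  # 1:A;1:B;1:C;
```

With separators, the worst case is even more visible: the three-character string ABC grows to twelve characters.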
In our case, the zigzag step leaves long runs of zeros in each DCT block, which is exactly what RLE handles well. After RLE, the result is compressed further with Huffman coding. I will not discuss Huffman coding here; if you would like to know more about it, see the "Links" section.
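To make the zero-run idea concrete, here is a deliberately simplified Python sketch that turns a zigzagged list of coefficients into (number of preceding zeros, value) pairs and ends with an end-of-block marker when only zeros remain. This is an assumption-laden illustration: the real JPEG format codes the (0, 0) coefficient separately, caps run lengths, and feeds the symbols into Huffman coding. The name `run_length_zeros` and the "EOB" marker string are illustrative:

```python
def run_length_zeros(coeffs):
    """Encode coefficients as (zeros_before, value) pairs plus an optional 'EOB'."""
    pairs = []
    run = 0
    for value in coeffs:
        if value == 0:
            run += 1                 # extend the current run of zeros
        else:
            pairs.append((run, value))
            run = 0
    if run:                          # trailing zeros collapse into one marker
        pairs.append("EOB")
    return pairs


if __name__ == "__main__":
    print(run_length_zeros([5, 0, 0, 3, 0, 0, 0]))  # [(0, 5), (2, 3), 'EOB']
```

A block whose 64 zigzagged values end in dozens of zeros therefore shrinks to a handful of pairs before Huffman coding is applied.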
§§
Decoding
After undoing the entropy coding, we have the quantized DCT block again. First, we multiply it entry by entry with the quantization matrix. Here is the formula:
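Written out, the step is F(u, v) = B(u, v) · Q(u, v), where B is the quantized block (the symbol B is an assumption of this sketch) and Q is the same matrix the encoder used. A minimal Python version, with an illustrative function name and a 2x2 example for brevity:

```python
def dequantize(quantized, q):
    """Entry-by-entry product of a quantized block and its quantization matrix."""
    return [[b * step for b, step in zip(b_row, q_row)]
            for b_row, q_row in zip(quantized, q)]


if __name__ == "__main__":
    b = [[5, 0],
         [-2, 1]]
    q = [[16, 11],
         [12, 12]]
    print(dequantize(b, q))  # [[80, 0], [-24, 12]]
```

The product only recovers multiples of each table entry; whatever rounding removed during quantization is gone for good, which is where JPEG's loss of information comes from.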
Now we need to perform an inverse two-dimensional DCT on F. Here is the formula:
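Assuming the same normalization as the forward transform, with α(n) = 1/√2 for n = 0 and 1 otherwise, the inverse is usually written as follows, where f_{x,y} is the reconstructed pixel value at (x, y) and F_{u,v} is the dequantized coefficient at (u, v):

```latex
f_{x,y} = \frac{1}{4}
          \sum_{u=0}^{7}\sum_{v=0}^{7} \alpha(u)\,\alpha(v)\,F_{u,v}
          \cos\!\left[\frac{(2x+1)\,u\,\pi}{16}\right]
          \cos\!\left[\frac{(2y+1)\,v\,\pi}{16}\right]
```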
That's it! This is how JPEG works. Apply everything described above to each of the three channels, Y', Cb, and Cr, and you have JPEG.
§
Links
https://en.wikipedia.org/wiki/Huffman_coding
https://tutorial.math.lamar.edu/classes/de/fouriercosineseries.aspx