On a pleasant coding afternoon, a friend in our tech group asked me this question:
Does anyone know why some images come out bigger than the originals after being processed with canvas.toDataURL?
Four opinions sprang up immediately:
A. The base64 string is definitely longer than the original data
B. Chrome's processing algorithm is different; blame Chromium
C. It's toDataURL's quality parameter; the default compression quality is the cause
D. It depends on the information in the image itself; sometimes a large size just can't be helped
Which side are you on? Leave your first thoughts in the comments!
People often use toDataURL plus a blob to turn the image in a canvas into a downloadable file. But what journey does an image file take to reach us, and what is really behind this question?
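A minimal sketch of that download flow (the canvas variable and the file name are placeholders):

// Minimal sketch: download the contents of an existing <canvas> element.
const dataURL = canvas.toDataURL('image/png'); // base64-encoded image data
const link = document.createElement('a');
link.href = dataURL;
link.download = 'picture.png'; // hypothetical file name
link.click();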
Where does a picture come from
With these questions in mind, let’s look at how a picture comes into being.
How do you describe a picture
An image is a container of colors. To describe a color, you can use the three RGB channels: by mixing red, green, and blue you can approximate any color you want, just as we do when writing an rgb() color in CSS.
Now that we have a way to describe a color, how do we store the data?
Scheme 1: code one by one
“Just record them one by one; it’s the most straightforward way to think about it.” – BMP
The BMP format was common on older Windows systems. It records the RGB data pixel by pixel, which makes the logic very simple to write.
Scheme 2: Color table
“Don’t save every pixel; build a lookup table instead. Wouldn’t that be beautiful?” – GIF
GIF takes a different approach. In a GIF, the contents of many frames are often similar, so rather than recording every pixel’s color one by one, it builds a dictionary (a color table) and records only an index for each pixel.
For example, a 5-pixel single-color image stored as raw RGB needs three numbers per pixel: 3 * 5 = 15 numbers. Recording indices instead needs one number per pixel plus three numbers for the single color-table entry: 1 * 5 + 3 = 8 numbers. The more repeated colors an image contains, the better this method performs.
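A toy sketch of that bookkeeping in JavaScript (the pixel values are made up for illustration):

// Five identical red pixels, stored raw vs. via a color table.
const pixels = [[255,0,0],[255,0,0],[255,0,0],[255,0,0],[255,0,0]];

const rawNumbers = pixels.length * 3; // 3 * 5 = 15

const table = [];
const indices = pixels.map((p) => {
  let i = table.findIndex((c) => c.join() === p.join());
  if (i === -1) i = table.push(p) - 1; // add a new color to the table
  return i;
});
const tableNumbers = indices.length + table.length * 3; // 1 * 5 + 3 = 8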
The color table is a lossless compression scheme: after compression, the information can be restored to the original exactly. In the example above, matching each pixel’s stored index against the color table does recover every pixel’s RGB value.
The scheme also performs very well; in general the compression ratio can reach 10:1.
So what’s the unusual case?
Suppose we have a string of characters. Think of it as a picture: each letter is a pixel, and the letter is that pixel’s color. Now let’s apply the color-table idea to this string:
AAAAAAAHHCCCCCYYYYYYRRRRRRRRRR
Building an index table
AHCYR
After encoding: 7A2H5C6Y10R (30 characters down to 11)
Great result! The data compresses very efficiently.
But here comes another string:
ABABAABAEDEDDEKKLKKLLK
As before, we build the index table and then encode:
1A1B1A1B2A1B1A1E1D1E2D1E2K1L2K2L1K
This time the encoded data came out longer than before encoding; we would have been better off recording it one by one!
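To make the coding above concrete, here is a minimal run-length encoder in JavaScript (our own sketch of the “count + symbol” scheme used above, not GIF’s actual LZW algorithm):

// Run-length encode a string: "AAAH" -> "3A1H".
const rleEncode = (s) => {
  let out = '';
  for (let i = 0; i < s.length; ) {
    let j = i;
    while (j < s.length && s[j] === s[i]) j++; // find the end of the current run
    out += (j - i) + s[i];
    i = j;
  }
  return out;
};

rleEncode('AAAAAAAHHCCCCCYYYYYYRRRRRRRRRR'); // "7A2H5C6Y10R" (30 -> 11 chars)
rleEncode('ABABAABAEDEDDEKKLKKLLK'); // "1A1B1A1B2A1B1A1E1D1E2D1E2K1L2K2L1K" (22 -> 34 chars)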
Image size is no small matter. Anyone who has run a website knows that image traffic costs real money; every extra 1% of data comes straight out of your pocket.
So lossless compression can actually increase the length of the data in certain special cases. On the surface, information that changes quickly seems to have something to do with this outcome.
Scheme 3: Lossy compression
“Don’t record what you don’t need; watch me cut away all the clutter” – PNG
Lossy compression is the counterpart of lossless compression: the compressed information cannot be fully restored.
It’s just like translating classical Chinese when I was in school:
Word-for-word translation is like lossless compression: you can recover the original classical Chinese from it;
free translation is like lossy compression: the meaning survives, but it’s hard to get back to the exact classical Chinese.
For example, here is how a refusal might read when a friend invites you to take up an official post, in classical Chinese versus the vernacular:
(Figure: lossless versus lossy compression)
So what information did the lossy “no thanks” version discard? Things like the detailed reasons for refusing (limited learning) and the personal feelings (reluctance). This is the less important information, and it is exactly where lossy compression does its “damage”.
Lossy compression: what exactly is being lost?
An important target in images is visual redundancy.
(There is also coding redundancy and more, equally important, which I won’t expand on here.)
Can you tell at a glance the difference between the three pixels here?
There is a difference: the red channel of each block increases a little bit from one to the next.
Some designers might be able to tell them apart, but I certainly can’t, haha~
To make it even harder: if we spread them over a 1080p canvas, which holds about 2 million pixels, could we still tell them apart?
So is this information really that important? They all look the same to us anyway. For the three color blocks above, the color table would normally record 3 colors; but if we decide they all look alike and record just one, no one can see the difference anyway. As the graphics saying goes: “If it looks right, it is right.”
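As a toy sketch of that idea (the tolerance value is arbitrary, just for illustration), we can merge colors whose channels differ by less than some threshold before building the table:

// Treat colors within `tolerance` of an existing table entry as the same color.
const tolerance = 4; // arbitrary "looks the same" threshold
const close = (a, b) => a.every((v, i) => Math.abs(v - b[i]) <= tolerance);

const colors = [[200,60,60],[201,60,60],[202,60,60]]; // made-up near-identical reds
const mergedTable = [];
for (const c of colors) {
  if (!mergedTable.some((t) => close(t, c))) mergedTable.push(c);
}
mergedTable.length; // 1 -- three colors collapse into one table entry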
So going back to our color table problem, how would lossy compression solve this problem?
Raw data:
ABABAABAEDEDDEKKLKKLLK
After smoothing away the unimportant variation, using noise suppression and similar techniques, the data no longer changes so quickly:
AAAAAAAAEEEEEEKKKKKKKK
Final code: 8A6E8K
Compare that with the purely lossless encoding: 1A1B1A1B2A1B1A1E1D1E2D1E2K1L2K2L1K
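Running the rleEncode sketch from earlier over both strings shows the gain:

rleEncode('ABABAABAEDEDDEKKLKKLLK'); // "1A1B1A1B2A1B1A1E1D1E2D1E2K1L2K2L1K" (34 chars)
rleEncode('AAAAAAAAEEEEEEKKKKKKKK'); // "8A6E8K" (6 chars)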
Problem solved! ✿✿ヽ(°▽°)ノ✿
RGB vs YUV
PNG-8 vs PNG-32
PNG implements the ideas above, and along the way the PNG-8/24/32 variants appeared.
The number indicates the color depth, i.e. the size of the color table: 8 bits means 256 colors, for example.
Lossy compression does solve the size problem, but in practice the difference between PNG-8 and PNG-32 is still visible to the naked eye:
RGB has its limits; the harder you squeeze its colors, the more those limits show, unless you break out of RGB’s shackles altogether.
It would be nice to have a color representation in which the values change very little and are mostly the same.
So people looked beyond RGB, hoping for a format whose color expression is “smoother”.
YUV
Then a solution was found: YUV. It expresses color through three channels, Y (luma) together with U and V (chroma), and that solves the problem above. Why?
- Coverage: YUV can represent the same gamut that RGB covers
With Y fixed at 0.5, varying U and V still sweeps across the RGB color gamut
- Frequency characteristics
The U and V components of the colors in a typical image vary very little, which is ideal for lossy compression.
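For reference, one common RGB-to-YUV conversion is the BT.601 formula (there are several standards; this is a sketch of just one, with our own function name):

// BT.601-style conversion, RGB channels in 0..255.
const rgbToYuv = ([r, g, b]) => {
  const y = 0.299 * r + 0.587 * g + 0.114 * b; // luma: a weighted gray value
  const u = 0.492 * (b - y); // chroma: blue difference
  const v = 0.877 * (r - y); // chroma: red difference
  return [y, u, v];
};

rgbToYuv([255, 0, 0]); // pure red -> Y ≈ 76, U ≈ -37, V ≈ 157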
For example, in the following figure, the first image is the original and the rest show the Y, U, and V components in turn. The Y component looks very much like the original; in fact it looks like a grayscale version of it.
The U and V images look worse: some of the mountain detail is missing, and most of the values look the same.
Most of the time, this is a good thing. It means the differences are tiny, so if we tweak them, the change may be invisible to the naked eye. This is exactly what we said we wanted from a color representation: little variation, mostly the same.
And that makes it an excellent target for lossy compression.
Cosine transform and quantization
YUV solves the problem of how to represent the signal.
There is also the problem of identifying unimportant information so that it can be discarded without seriously damaging image quality.
We want to find the pixel information the human eye is least sensitive to, so that operations such as smoothing go unnoticed.
Multiple representations of the same object
Here we introduce a new representation. Just as the same color can be written in either RGB or YUV, the position of a point can also be written in more than one way:
Cartesian coordinates vs polar coordinates
One is the familiar Cartesian system, expressed with the x and y axes.
The other is polar coordinates, expressed as a radius and an angle of rotation.
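The two are linked by x = r·cos θ and y = r·sin θ: the same point, just two ways of writing it down.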
Likewise, an image can be described in more than one way: one is by its color component values, the other is in the frequency domain.
Two-dimensional signal -> frequency signal
Why does the frequency domain help us identify unimportant information? Because we need a way to find colors that are very, very close to one another, so we can smooth them into a single color without anyone noticing.
The frequency domain is very good at expressing how information changes, for example whether a run of pixels changes quickly or slowly. Slow change means the colors differ only slightly.
(This is the hardest part of the whole article, so cheer up! This time we mainly explain the purpose and effect of the DCT; the concrete implementation will be detailed in the next article. If you want it sooner, be sure to like and comment; your support is the biggest encouragement!)
In the frequency-domain representation, the cosine transform (DCT) lets us find where pixels change slowly, which is exactly where we can safely smooth. Below we use a matrix to simulate the U component of YUV; after a DCT (strictly speaking, DCT plus quantization) it becomes frequency-domain data.
The original 3*3 U component needs 9 numbers recorded. After the DCT, thanks to the multiple zeros, only 8 numbers need to be recorded.
With a more “aggressive” DCT, more and more coefficients become zero, leaving us less and less data to store.
DCT acts like a function that takes a matrix of color components and a processing strength, and returns the processed result:
const dct = (inputMatrix, strength) => outputMatrix
The coefficients the DCT “selects” are exactly the data suitable for smoothing, and because the result contains so many zeros, a lossless pass over it compresses extremely well.
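To make the dct signature above concrete, here is a sketch of a one-dimensional DCT-II plus a crude quantization step (the strength knob and the zeroing rule are illustrative, not the real JPEG quantization tables):

// 1D DCT-II: turn a row of samples into frequency coefficients.
const dct1d = (signal) => {
  const N = signal.length;
  return signal.map((_, k) => {
    let sum = 0;
    for (let n = 0; n < N; n++) {
      sum += signal[n] * Math.cos((Math.PI / N) * (n + 0.5) * k);
    }
    return sum;
  });
};

// Crude "quantization": zero out coefficients smaller than `strength`.
const dct = (signal, strength) =>
  dct1d(signal).map((c) => (Math.abs(c) < strength ? 0 : Math.round(c)));

// A slowly varying row of U values: almost all the energy lands in the
// first coefficient, and the rest quantize away to zero.
dct([90, 90, 91, 91, 92, 92, 93, 93], 2); // [732, -6, 0, 0, 0, 0, 0, 0]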
DCT is a very important algorithm and is widely used in video coding; it is at work in the videos we watch every day, alongside techniques such as intra-frame prediction.
Encoding into a file
For an image to become a JPEG, it must pass through all the steps we have just walked through.
How does a picture render
If encoding is the compression process, rendering is the decompression process: most of the encoding steps are inverted one by one.
Chrome renders with GLSL on top of OpenGL, so once the image is decoded back into RGB it is finally drawn by OpenGL and presented to us.
Back to the start
Now that we finally understand how a picture is born and rendered, let’s go back to the original question.
Why does the image get bigger after toDataURL?
A. The base64 string is definitely longer than the original data
Correct. Not only is the base64 string longer, a file stored as base64 grows too
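The arithmetic behind that: base64 encodes every 3 bytes as 4 ASCII characters, so the string is roughly a third larger than the raw bytes, before even counting the data: URL prefix. A quick check (the byte count is hypothetical):

// Base64 size: 4 output characters for every 3 input bytes.
const rawBytes = 3000; // hypothetical raw image size
const base64Chars = 4 * Math.ceil(rawBytes / 3); // 4000 characters, ~33% bigger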
B. Chrome's processing algorithm is different; blame Chromium
Correct. Every vendor's algorithm is different; Photoshop, for example, offers 10 levels of compression
C. It's toDataURL's quality parameter; the default compression quality is the cause
Correct. Chrome's default compression quality is 0.92; passing a lower value can mitigate the problem
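toDataURL accepts an optional second argument, the encoder quality (0 to 1) for lossy formats such as image/jpeg, so you can trade quality for size explicitly:

// Lower the encoder quality to shrink the output (Chrome's default: 0.92).
const smaller = canvas.toDataURL('image/jpeg', 0.7);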
D. It depends on the information in the image itself; sometimes a large size just can't be helped
Correct. The content matters: an all-black picture, for instance, is bound to be tiny
Conclusion
In this article we mainly discussed:
- Common compression in image coding: where lossy and lossless each fit
- What problems different color formats solve
- Why the same image can end up with different file sizes
Which part are you most interested in? Let me know in the comments!