There are three data normalization techniques: Min-max normalization, Z-Score normalization and decimal scale normalization.
The first two are most commonly used in deep learning, and this blog will focus on the first two.
1 Min Max normalized min-max
Minimum-maximum normalization, also known as discrete normalization, is a linear transformation of original data that maps data values to new [min, Max].
Assume that the set A has A range of values [minA, maxA]. How do I map the value of A to the range [new_min A, new_max A] by minimum-maximum normalization?
Minimum-maximum normalization preserves the relationship between the original data values. If the normalized future input case is outside the range of A’s raw data, it encounters an “out of bounds” error.
Give an example from your life
Assume that the minimum and maximum income of set A is $12,000 and $98,000, respectively. We want to map revenue to the range [1, 100]. So what does $73,600 map to?
Take an example from deep learning
Assume that set A is A CT image, and the minimum and maximum values are 0 and 255 respectively. The neural network wants the Tensor to be in the range of 0,1. How do we map it? Suppose one of the pixels has a value of 96. What is it mapped to?
In particular, we can simplify the formula when our new set range is [0, 1]
That’s the normalization formula we’ve seen the most. In deep learning, Pytorch maps images from 0-255 to 0-1 through ToTensor()
How to write min-max normalization in Python
import numpy as np
def min_max(x) :
# x: numpy array
x = (x - np.min(x)) / (np.max(x) - np.min(x))
return x
Copy the code
Test the
np.set_printoptions(4) Keep 4 decimal places
arr = np.array([0.23.45.89.234.255])
arr_norm = min_max(arr)
print(arr)
print(arr_norm)
Copy the code
[0 23 45 89 234 255] [0.0.0902 0.1765 0.349 0.9176 1.]
2 Z-Score standard score
Z-score is also called Stand Score, Z-value, Z-Score, normal Score, and standardized variable.
Z-score is the conversion of an original score into a standard score that makes previously uncomparable values comparable.
For example, Chinese Xiao Wang is 1.75m tall, American James is 1.85m tall, and Japanese Taro is 1.75m tall. Excluding the differences caused by nationality, may I ask which one is taller, Xiao Wang, James and Taro?
In direct numerical terms, of course James is the highest. But what does it mean to exclude differences based on nationality? That is to say, the Japanese may be relatively short across the country (just for example), and the 1.7m in Japan may be equivalent to the 1.75m in China and 1.85m in the US.
So it’s not a direct comparison of numbers, it’s a comparison of a standard height for each person in the context of the national height of their country.
This is where z-Score standardization comes in.
Where x is the original score, z is the converted Z-score, μ is the mean score of the sample space, and σ is the standard deviation of the sample space.
Let’s look at another example:
For example: Xiao Hong English test 90 points, language test 60 points, excuse me xiao Hong English and language which test good? The same situation, if the direct score, of course, is good English.
But an obvious possibility is that the difficulty of the two courses is not the same, maybe the Chinese is more difficult, everyone failed, only Xiao Hong passed; In English, everyone got 100, but Xiao Hong got 90. In this way, it seems that Xiao Hong’s Chinese test is better.
So how do you use z-scores for comparison?
Suppose the Chinese class average score is 40 and the standard deviation is 10, and the English class average score is 98 and the standard deviation is 5. So xiao Hong’s Chinese score “standard score” is (60-40)/10 = 2, and English score “standard score” is (90-98)/5 = -1.6. Such a comparison, The English score is far lower than the Chinese score, it can be seen that Xiao Hong’s Chinese is quite good.
Z-score reference link
How to write z-score in Python
def normalize(x, mean, std) :
x = (x-mean) / std
return x
Copy the code
Application of Z-Score in deep learning
As long as you have used ImageNet pre-trained models, you have definitely used Z-Score. Remember where you used it 🤭
mean = [0.485.0.456.0.406]
std = [0.229.0.224.0.225]
Copy the code
If this mean and variance look familiar.
Right, so this is the mean and variance of the ImageNet dataset, and a lot of times, we normalized the dataset using the mean and variance of the ImageNet dataset.
transforms.Normalize(mean=mean, std=std)
Copy the code
In this case, Normalize is basically doing z-Score normalization.
The article continues to be updated, you can follow the wechat public account [Medical image Artificial Intelligence Field Camp] to get the latest developments, a public account focusing on the frontier technology in the field of medical image processing. Adhere to the practice of the main, hand in hand to take you to do projects, play games, write papers. All original articles provide theoretical explanation, experimental code, experimental data. Only practice can grow faster, pay attention to us, learn and progress together ~
I’m Tina and I’ll see you next time
I work all day and write all night
And then finally, get a thumbs up, comments, favorites. Or three with one button