Anyone who builds risk-control models is familiar with IV and PSI: IV is generally used to measure the importance of variables, while PSI is used to monitor the stability of model scores and features.
Most of us know the calculation formulas by heart, but the underlying principle (why IV can measure the importance of a variable, and why PSI can measure the stability of a feature) may be a little vague. This article starts from the formulas and explains both from the perspective of entropy in information theory. It also offers some personal understanding of the differences and connections between IV and PSI.
IV definition
IV is short for Information Value. It is generally used for feature selection and measures a feature's ability to predict the target:

$$IV=\sum_{i=1}^{n}\left(\frac{Bad_i}{Bad_T}-\frac{Good_i}{Good_T}\right)\ln\frac{Bad_i/Bad_T}{Good_i/Good_T}$$

where $n$ is the number of bins of the feature, $Bad_i$ and $Good_i$ are the numbers of bad and good users in bin $i$, and $Bad_T$ and $Good_T$ are the total numbers of bad and good users. The larger the IV of a feature, the greater the information value it carries. To see why IV represents the information in a feature, let's start with the logarithmic term in the formula.
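To make the formula concrete, here is a toy calculation with made-up counts: 100 bad and 900 good users in total, split into two bins, where bin 1 holds 80 bad and 300 good users, and bin 2 holds the remaining 20 bad and 600 good users:

$$IV=\left(\frac{80}{100}-\frac{300}{900}\right)\ln\frac{80/100}{300/900}+\left(\frac{20}{100}-\frac{600}{900}\right)\ln\frac{20/100}{600/900}\approx 0.47\times 0.88+(-0.47)\times(-1.20)\approx 0.97$$

Note that every bin contributes a non-negative term, since $(P-Q)$ and $\ln(P/Q)$ always share the same sign; IV is therefore always non-negative.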
Extracting WOE from the IV formula
Transform the above formula as follows:

$$IV=\sum_{i=1}^{n}\left(\frac{Bad_i}{Bad_T}-\frac{Good_i}{Good_T}\right)\times WOE_i$$

$$WOE_i=\ln\frac{Bad_i/Bad_T}{Good_i/Good_T}$$

where $n$ is again the number of bins. We find that WOE is exactly the logarithmic term of IV. As the formula shows, WOE as originally defined compares the proportion of bad users in this bin among all bad users with the proportion of good users in this bin among all good users. Rearranging the fraction gives an equivalent reading:

$$WOE_i=\ln\frac{Bad_i/Good_i}{Bad_T/Good_T}$$

i.e., the difference (in log terms) between the bad-to-good odds inside the bin and the overall bad-to-good odds. In conclusion, the larger the WOE (in absolute value), the more the good/bad odds in this score segment deviate from the overall odds, and the stronger the distinguishing power of the feature.
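As a minimal sketch (my own illustration, not from any scorecard library), the per-bin WOE and the resulting IV can be computed directly from bad/good counts; the epsilon guard for empty bins is an assumption of this sketch:

```python
import numpy as np

def woe_iv(bad, good):
    """Per-bin WOE and total IV from bad/good user counts per bin."""
    bad, good = np.asarray(bad, float), np.asarray(good, float)
    eps = 1e-6                                  # guards empty bins against log(0)
    p_bad = (bad + eps) / bad.sum()             # share of all bad users in each bin
    p_good = (good + eps) / good.sum()          # share of all good users in each bin
    woe = np.log(p_bad / p_good)                # the logarithmic term of IV
    iv = float(np.sum((p_bad - p_good) * woe))  # IV = sum((P - Q) * WOE)
    return woe, iv

# Toy counts from the worked example above: two bins.
woe, iv = woe_iv(bad=[80, 20], good=[300, 600])
print(woe)  # roughly [ 0.875, -1.204]
print(iv)   # roughly 0.97
```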
In practice, computing WOE requires discretizing the feature first, which is commonly referred to as "binning". Binning methods fall into supervised and unsupervised categories: common unsupervised methods include equal-frequency, equal-width, and clustering-based binning, while supervised methods include Best-KS and Chi-square binning. These deserve an article or two of their own, and we will discuss them in detail if we have the opportunity.
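For illustration only, here is how the two most common unsupervised binnings look with pandas; the feature values below are randomly generated for the example:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
score = pd.Series(rng.normal(650, 50, size=10_000))  # a made-up feature

equal_freq = pd.qcut(score, q=10)     # equal-frequency: ~same count per bin
equal_width = pd.cut(score, bins=10)  # equal-width: same-size intervals

print(equal_freq.value_counts().sort_index())
print(equal_width.value_counts().sort_index())
```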
Getting back to WOE: in traditional scorecards, after a feature is binned, the bins are further adjusted for explainability to ensure the WOE values are monotonic across bins. In other words, if we define bad users as 1, the predicted probability of being bad should increase or decrease monotonically as the feature value increases.
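A simple way to verify this property after binning is to check that the bin-level WOE sequence is monotonic. A rough sketch of such a check (in practice, non-monotonic adjacent bins are usually merged and WOE recomputed):

```python
import numpy as np

def is_monotonic(woe):
    """Check whether bin-level WOE values only increase or only decrease."""
    d = np.diff(np.asarray(woe, float))
    return bool(np.all(d >= 0) or np.all(d <= 0))

print(is_monotonic([-0.5, -0.1, 0.3, 0.9]))  # True: strictly increasing
print(is_monotonic([-0.5, 0.4, 0.1, 0.9]))   # False: bins need merging/adjusting
```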
Now let's go back to the WOE formula, the logarithmic term of IV. It is in fact a building block of information entropy, called self-information.
WOE and information entropy
Entropy originates in thermodynamics, where its physical meaning is a measure of the degree of disorder of a system. In information theory, how much information a message carries depends on its uncertainty. Consider two messages: the first says there will be a lot of rain in the rainforest this summer; the second says there will be a lot of rain in the desert this summer. Judging from experience, the first is highly certain and needs no additional information to support it, so it carries little information and its entropy is low. The second contradicts common sense and is highly uncertain; if it really happens, we need a lot of external knowledge to verify it, so it carries a lot of information and its entropy is high.
In other words, the amount of information in a message is inversely related to the probability of its occurrence and directly related to its uncertainty: the smaller the probability, the more uncertain the event and the more information it carries, so the greater the entropy; the larger the probability, the more certain the event and the less information it carries, so the lower the entropy.
Consider a discrete random variable $X$. The information function $h(x)$ we are looking for should be a function of the probability $p(x)$ and satisfy the following relations.

Suppose there are two independent, unrelated events $x$ and $y$. Then the amount of information obtained when both happen at the same time equals the sum of the amounts of information of each event on its own:

$$h(x,y)=h(x)+h(y)$$

Meanwhile, the probability that two independent, unrelated events occur simultaneously equals the product of their individual probabilities:

$$p(x,y)=p(x)\,p(y)$$

From these two relations we can easily see that $h(x)$ and $p(x)$ must be related through a logarithm (by the logarithm rule, products turn into sums). So we have:

$$h(x)=-\log_2 p(x)$$

where the negative sign ensures that the amount of information is non-negative. With this in hand, the concept of entropy can be formally introduced: entropy is the expectation of the amount of information $h(x)$ over the distribution $p(x)$.
So here we have the definition of entropy:

$$H(X)=-\sum_{x}p(x)\log_2 p(x)$$

We can see that the more values the random variable can take, i.e., the more states it has, the greater the information entropy. It also follows that for $n$ states the uniform distribution maximizes entropy: $H(X)\le \log_2 n$, with equality when every state is equally likely.
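A quick numerical check of both claims, using a small helper of my own:

```python
import numpy as np

def entropy(p):
    """Shannon entropy H(X) = -sum(p * log2 p), skipping zero-probability states."""
    p = np.asarray(p, float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

print(entropy([0.5, 0.5]))  # 1.0 bit: fair coin, the maximum for 2 states
print(entropy([0.9, 0.1]))  # ~0.47 bits: biased coin, more predictable
print(entropy([0.25] * 4))  # 2.0 bits: uniform over 4 states = log2(4)
```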
Now let's go back to the definition of WOE. $WOE_i$ can be understood as the difference between the bad-to-good odds inside bin $i$ and the overall odds. Within a given bin, only when the two odds are the same, i.e., their ratio is 1 and its logarithm is 0, is the amount of information zero. In every other case, the greater the difference between the odds, the larger the amount of information $|WOE_i|$, and the larger the bin's entropy-like contribution to IV.
Let’s look at PSI
PSI is short for Population Stability Index. It measures the difference between the score distribution of a test sample and that of the modeling sample, and is a common indicator of model stability:

$$PSI=\sum_{i=1}^{n}\left(A_i-E_i\right)\ln\frac{A_i}{E_i}$$

where $n$ is the number of score bins, $A_i$ (actual) is the proportion of the test sample that falls into bin $i$, and $E_i$ (expected) is the proportion of the modeling sample in the same bin. In general, the smaller the PSI, the more stable the model: a PSI below 0.1 indicates good stability; between 0.1 and 0.25, average stability; above 0.25, the model is unstable and needs to be retrained.
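A minimal PSI implementation following the formula above (the epsilon guard and the toy distributions are my own assumptions for the sketch):

```python
import numpy as np

def psi(actual, expected):
    """PSI = sum((A_i - E_i) * ln(A_i / E_i)) over bins of per-bin proportions."""
    a = np.asarray(actual, float) + 1e-6   # epsilon guards empty bins
    e = np.asarray(expected, float) + 1e-6
    return float(np.sum((a - e) * np.log(a / e)))

expected = [0.10, 0.20, 0.40, 0.20, 0.10]  # modeling-sample distribution (made up)
actual   = [0.08, 0.18, 0.38, 0.22, 0.14]  # test-sample distribution (made up)
print(psi(actual, expected))  # ~0.023, below 0.1 -> mild shift, model stable
```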
Conclusion
Through the above analysis, we find that the formulas for IV and PSI can both be expressed in one general form:

$$\sum_{i=1}^{n}\left(P_i-Q_i\right)\ln\frac{P_i}{Q_i}$$

(which, incidentally, is the symmetrized KL divergence $KL(P\|Q)+KL(Q\|P)$ between the two distributions). In the IV definition, $P_i$ is the proportion of bad users in the current bin relative to all bad users, and $Q_i$ is the proportion of good users relative to all good users; IV measures the variable's ability to separate good from bad, and the greater the difference between $P_i$ and $Q_i$, the stronger the feature's discriminating power. In the PSI definition, $P_i$ is the actual proportion in the current bin, and $Q_i$ is the expected proportion from the control (modeling) group; PSI measures the stability of a variable, and the smaller the difference between $P_i$ and $Q_i$, the more stable the feature.
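To make the connection explicit, a single function covers both metrics; the inputs below simply reuse the toy numbers from the earlier examples:

```python
import numpy as np

def j_divergence(p, q):
    """The shared form of IV and PSI: sum((P_i - Q_i) * ln(P_i / Q_i)).

    This is the symmetrized KL divergence between distributions P and Q.
    """
    p = np.asarray(p, float) + 1e-6
    q = np.asarray(q, float) + 1e-6
    return float(np.sum((p - q) * np.log(p / q)))

# IV: P = per-bin share of bad users, Q = per-bin share of good users
print(j_divergence([0.8, 0.2], [1/3, 2/3]))          # ~0.97, the toy IV from earlier

# PSI: P = actual per-bin proportions, Q = expected per-bin proportions
print(j_divergence([0.08, 0.18, 0.38, 0.22, 0.14],
                   [0.10, 0.20, 0.40, 0.20, 0.10]))  # ~0.023, the toy PSI from earlier
```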