
Preface

Before starting, here are definitions of some terms used in this article:

  • NILM (non-intrusive load monitoring): monitoring equipment is installed at the power entrance, and by analyzing the voltage and current signals measured there, the type and operating state of each individual load in a load cluster can be inferred. Once the load information is identified, we can tell which appliances are currently in use and whether any of them is faulty.
  • KNN: the k-nearest-neighbor algorithm, based on Euclidean distance. If, among the K samples most similar to a given sample in feature space (i.e., its nearest neighbors), the majority belong to a certain category, then the sample is assigned to that category. (Source: Baidu Baike; a minimal sketch follows this list.)
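
For readers new to kNN, here is a minimal sketch of the plain (unweighted) algorithm. The names `X_train` (feature matrix) and `y_train` (label array) are mine, and both are assumed to be numpy arrays:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=5):
    """Plain kNN: majority vote among the k nearest training samples."""
    dists = np.linalg.norm(X_train - x, axis=1)  # Euclidean distance to every training sample
    nn_idx = np.argsort(dists)[:k]               # indices of the k nearest neighbors
    return Counter(y_train[nn_idx]).most_common(1)[0][0]  # majority vote on their labels
```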

Article information

  • Title: Non-intrusive load identification method based on an improved kNN algorithm
  • Authors: Yan Fei, Zhang Ruixiang, Sun Yaojie, Tao Yuhui, Huang Guoping, Sun Weitao
  • Journal: Fudan Journal (Natural Science Edition), 2021, 60(02)
  • Keywords: load identification; kNN algorithm; binary V-I trajectory; comprehensive similarity

Article summary

When a kNN data set is imbalanced, classes with many samples interfere with classes with few samples. To address this problem, the paper assigns different weights to the training samples, increasing the voting power of minority-class samples in the classification decision.

The load features selected in the paper are the V-I trajectory and the amplitude, and a load-classification method based on the comprehensive similarity of these two features is proposed.

For the V-I curves, the original V-I data are converted into binary V-I trajectories through normalization and grid mapping.
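
The paper does not spell the mapping out here, but a common construction in the NILM literature is to min-max normalize one steady-state cycle of voltage and current and rasterize the curve onto an N×N grid of 0/1 cells. A sketch under that assumption (grid size and names are mine):

```python
import numpy as np

def binary_vi_trajectory(v, i, n=32):
    """Map one steady-state cycle of (v, i) samples onto an n x n binary grid.

    Assumed construction, not quoted from the paper: min-max normalize both
    signals to [0, 1], then mark every grid cell the trajectory passes through.
    """
    v = (v - v.min()) / (v.max() - v.min())        # normalize voltage to [0, 1]
    i = (i - i.min()) / (i.max() - i.min())        # normalize current to [0, 1]
    grid = np.zeros((n, n), dtype=np.uint8)
    rows = np.minimum((i * n).astype(int), n - 1)  # current -> row index
    cols = np.minimum((v * n).astype(int), n - 1)  # voltage -> column index
    grid[rows, cols] = 1                           # cells touched by the trajectory
    return grid
```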

The evaluation metrics are the macro-averaged F1 score, accuracy, recall, and so on.

Finally, the effectiveness of the improved kNN algorithm is verified on the PLAID data set and on data collected in the laboratory.

Learning record

Disadvantages of the kNN algorithm

The drawback of the kNN algorithm is that, when the data set is imbalanced, training samples from the majority classes are more likely to be selected among the K nearest neighbors, which interferes with the decision for minority classes.

Solutions to the kNN drawback

  1. Undersampling and oversampling: remove majority-class samples and synthesize minority-class samples to eliminate the imbalance in the data set (similar to the synthetic data mentioned in the English-language literature).

How, and according to what rules, are the data synthesized? (To be investigated; one common answer is sketched after the list below.)

  2. Improve the algorithm: assign different weights to the training samples, increasing the voting power of minority-class samples in the classification decision.
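
On the open question above: the most common synthesis rule in the literature is SMOTE, which creates new minority samples by interpolating between a minority sample and one of its nearest minority neighbors. A minimal sketch of that idea (my assumption; the paper does not name a method):

```python
import numpy as np

def smote_like_sample(X_minority, rng=None):
    """Synthesize one new minority-class sample by interpolating between
    a random minority sample and its nearest minority neighbor (SMOTE idea)."""
    rng = rng if rng is not None else np.random.default_rng()
    a = X_minority[rng.integers(len(X_minority))]  # random minority sample
    dists = np.linalg.norm(X_minority - a, axis=1)
    b = X_minority[np.argsort(dists)[1]]           # nearest neighbor other than a itself
    return a + rng.random() * (b - a)              # random point on the segment between them
```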

The paper adopts the second method, and its weight-assignment scheme is fairly simple: weight(i) = 1 / size(i), where size(i) is the number of training samples in class i. This follows the principle that minority classes receive large weights and majority classes small ones.
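
Concretely, the class weights can be computed once from the training labels and then used in the vote. A sketch of the weight(i) = 1/size(i) rule (function and variable names are mine):

```python
import numpy as np
from collections import Counter

def class_weights(y_train):
    """weight(i) = 1 / size(i): minority classes get large voting weights."""
    return {label: 1.0 / n for label, n in Counter(y_train).items()}

def weighted_knn_predict(X_train, y_train, x, k=5):
    """kNN where each neighbor votes with the weight of its class."""
    w = class_weights(y_train)
    dists = np.linalg.norm(X_train - x, axis=1)
    votes = {}
    for j in np.argsort(dists)[:k]:                # k nearest neighbors
        votes[y_train[j]] = votes.get(y_train[j], 0.0) + w[y_train[j]]
    return max(votes, key=votes.get)               # class with the largest weighted vote
```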

Questions and prospects on kNN weight assignment

Is it really sound to assign weights by class size alone? Would a more principled weight-assignment method improve the algorithm's accuracy?

I have checked several articles. In other research areas, weights for kNN are generally assigned with distance-weighted kNN (DS-WKNN) or with KDF-WKNN, a kernel-difference-based reconstruction method, sometimes with correction factors added to these methods to make the weight assignment more reasonable.
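
For contrast, the core idea of the distance-weighted family (DS-WKNN) is that closer neighbors cast heavier votes, e.g. weight = 1 / (distance + ε). A rough sketch of that idea only, not of any specific published method:

```python
import numpy as np

def distance_weighted_knn(X_train, y_train, x, k=5, eps=1e-8):
    """Each of the k nearest neighbors votes with weight 1/(distance + eps)."""
    dists = np.linalg.norm(X_train - x, axis=1)
    votes = {}
    for j in np.argsort(dists)[:k]:
        votes[y_train[j]] = votes.get(y_train[j], 0.0) + 1.0 / (dists[j] + eps)
    return max(votes, key=votes.get)
```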

I mainly searched the papers on CNKI. At present, kNN is not widely used in NILM, and this is the only improvement I found; I do not know whether improving it further could be a research direction.

Comprehensive discrimination method

The idea of the comprehensive discrimination method is not difficult; it can be divided into roughly the following four steps (a code sketch follows the list):

  1. Compute the V-I trajectory similarity and the amplitude similarity between the test sample and every training sample, denoted Sim1 and Sim2 respectively: Sim1 = 1 / (1 + dist1), Sim2 = 1 / (1 + dist2), where dist1 and dist2 are the Euclidean distances between the two samples' V-I trajectories and amplitudes respectively.

  2. Sort the training samples by Sim1 in descending order and take the K samples with the largest Sim1 as the K nearest neighbors of the current test sample.

  3. Compute the comprehensive similarity between the current test sample a and each of its K nearest neighbors Tj:

    Sim(a, Tj) = Sim1(a, Tj) × weight(Tj) × Sim2(a, Tj)

  4. Sum the comprehensive similarities between the test sample and the K nearest neighbors per class, and take the class with the largest total comprehensive similarity as the prediction.
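
Putting the four steps together. This sketch assumes each sample is represented by a flattened binary V-I vector plus a scalar amplitude, and uses the product form of the comprehensive similarity reconstructed above (the paper's exact combination may differ):

```python
import numpy as np

def classify(test_vi, test_amp, train_vi, train_amp, y_train, w, k=5):
    """Comprehensive-similarity kNN, following the four steps above.

    train_vi: (m, d) flattened binary V-I trajectories; train_amp: (m,) amplitudes;
    w: dict of class weights, weight(i) = 1/size(i). Names are mine.
    """
    # Step 1: per-feature similarities Sim = 1 / (1 + Euclidean distance)
    sim1 = 1.0 / (1.0 + np.linalg.norm(train_vi - test_vi, axis=1))
    sim2 = 1.0 / (1.0 + np.abs(train_amp - test_amp))
    # Step 2: K nearest neighbors by descending Sim1
    nn = np.argsort(sim1)[::-1][:k]
    # Steps 3-4: accumulate comprehensive similarity per class, pick the largest
    scores = {}
    for j in nn:
        s = sim1[j] * w[y_train[j]] * sim2[j]   # reconstructed combination
        scores[y_train[j]] = scores.get(y_train[j], 0.0) + s
    return max(scores, key=scores.get)
```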

Evaluation metric

The effectiveness of the algorithm was evaluated using the macro-averaged F1 score.

Reference link for the macro-averaged F1 value: an introduction to the macro average, accuracy, recall, etc.
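
The macro-averaged F1 is the unweighted mean of the per-class F1 scores; with scikit-learn it is a single call (the labels below are made up for illustration):

```python
from sklearn.metrics import f1_score

y_true = [0, 0, 1, 1, 2, 2]   # toy ground-truth labels
y_pred = [0, 1, 1, 1, 2, 0]   # toy predictions
print(f1_score(y_true, y_pred, average="macro"))  # mean of per-class F1 values
```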

Doubts

  1. How is kNN performed on binary V-I trajectories, and how is their Euclidean distance computed? (A common approach is sketched below.)
  2. How are binary V-I trajectories normalized?
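
On doubt 1, a common answer in the literature (an assumption on my part, not something the paper confirms) is to flatten each binary grid into a 0/1 vector and take the ordinary Euclidean distance:

```python
import numpy as np

def vi_distance(grid_a, grid_b):
    """Euclidean distance between two binary V-I grids, flattened to vectors."""
    return np.linalg.norm(grid_a.astype(float).ravel() - grid_b.astype(float).ravel())
```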