Abstract: With the advent of the big data era, the massive data emerging in all walks of life pose new challenges to data processing technology, and feature selection, as a common dimensionality reduction method, has received increasing attention. This paper summarizes the process and classification of feature selection, then reviews the research and application of the different categories of feature selection algorithms along their development and optimization, and on this basis points out future directions for feature selection.
Keywords: feature selection; feature correlation; unsupervised; incomplete information system; imbalanced categories
1 Introduction
Feature selection refers to the process of selecting a subset of features from the original input so that a given evaluation criterion is optimized. In the early stages of its development it was studied mainly from the perspectives of statistics and information processing, and the problems involved usually did not have large numbers of features [1,2,3]. With the development of Internet technology and the growth of data in all industries, feature selection has attracted more and more attention and has been widely studied and applied.
2 Definition and basic process
2.1 Feature selection
Given a sample data set T = {O, F, C}, where F = {f1, f2, …, fm}, C = {c1, c2, …, ck}, and O = {o1, o2, …, on} denote the feature set, the category set, and the set of data samples, respectively, let J: 2^F → [0, 1] be an evaluation function on feature subsets, where a larger value of J(X) means that the feature subset X contains more information. Feature selection problems are usually stated in one of the following three forms: (1) find a feature subset X of F that maximizes J(X); (2) given a threshold J0, find a minimal subset X of F such that J(X) > J0; (3) find a subset X of F such that J(X) is as large as possible while the number of features in X is as small as possible. The three formulations reflect different aspects and emphases of feature selection. The first focuses on the information contained in the selected subset, i.e., as little information as possible should be lost during selection. The second emphasizes choosing a minimal subset that satisfies a given condition. The last seeks a compromise between subset size and amount of information.
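To make the three formulations concrete, the following minimal sketch (Python; the function names are only illustrative, discrete feature values are assumed, and empirical mutual information with the class is used as a toy evaluation function J) enumerates subsets for the first two formulations. It is an illustration of the definitions, not a practical algorithm, since exhaustive enumeration is feasible only for very small feature sets.

```python
import numpy as np
from itertools import combinations
from sklearn.metrics import mutual_info_score

def J(X, y, subset):
    """Toy evaluation function: mean mutual information between each
    selected (discrete) feature column and the class labels."""
    if len(subset) == 0:
        return 0.0
    return float(np.mean([mutual_info_score(X[:, j], y) for j in subset]))

def formulation_1(X, y):
    """Formulation 1: the subset of F that maximizes J (exhaustive search)."""
    feats = range(X.shape[1])
    all_subsets = [s for k in range(1, X.shape[1] + 1)
                   for s in combinations(feats, k)]
    return max(all_subsets, key=lambda s: J(X, y, s))

def formulation_2(X, y, J0):
    """Formulation 2: a minimal subset of F whose score exceeds the threshold J0."""
    feats = range(X.shape[1])
    for k in range(1, X.shape[1] + 1):          # try the smallest subsets first
        for s in combinations(feats, k):
            if J(X, y, s) > J0:
                return s
    return tuple(feats)                          # fall back to the full feature set
```

The third formulation can be approximated in the same framework by maximizing a penalized score such as J(X) - λ·|X| for some λ > 0, which trades information content against subset size.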
2.2 Basic Process Generally speaking, feature selection process consists of initial subset setting, search strategy, subset evaluation and termination conditions.
The setting of the initial subset is the starting point of a feature selection algorithm, and its choice directly affects the subsequent search strategy. If the initial subset S is empty, the search process adds candidate features to the selected subset, i.e., forward search. If the initial subset is the full feature space, the search process continuously removes irrelevant or unimportant features from the subset S, i.e., backward search. If the initial subset is generated at random from the feature set F, the search process adopts a random search strategy that both adds candidate features and deletes selected ones.
The termination condition determines, according to the evaluation score J(S) of the candidate subset or other constraints, whether the current candidate subset S satisfies the preset conditions. If it does, the selection algorithm ends and the subset S is returned as the final result; otherwise the search process continues and new candidate subsets are generated until the termination condition is met. The following termination conditions are commonly used: (1) the number of features in the candidate subset S exceeds a preset threshold; (2) the number of search iterations exceeds a preset threshold; (3) the evaluation function J(S) reaches its highest or an optimal value; (4) the evaluation function J(S) exceeds a preset threshold.
The search strategy and the evaluation criterion are the two key problems of a feature selection algorithm. A good search strategy can speed up selection and find the optimal solution, and a good evaluation criterion can ensure that the selected subset carries much information and little redundancy. An evaluation criterion is the means by which selected features and subsets are judged; it directly determines the output of the selection algorithm and the performance of the resulting classification model. The choice of evaluation criterion has always been a focus of feature selection research, and many criteria have been proposed so far [4,5,6], including distance measures, consistency measures [7,8], dependence measures [9], information measures [10], and classification error measures [11,12].
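The generic loop described above can be summarized in a short schematic sketch (Python; the names, thresholds, and the trivial forward step are all illustrative assumptions, and `evaluate` and `propose` stand for whatever evaluation criterion and search strategy a concrete algorithm plugs in):

```python
def select_features(num_features, evaluate, propose, initial=frozenset(),
                    max_size=50, max_iters=1000, score_threshold=None):
    """Skeleton of the general process: initial subset -> candidate generation
    (search strategy) -> subset evaluation J(S) -> termination test."""
    S = frozenset(initial)
    best_S, best_score = S, evaluate(S)
    for _ in range(max_iters):                         # termination (2): iteration cap
        S = propose(S, num_features)                   # one forward/backward/random step
        score = evaluate(S)                            # evaluation criterion J(S)
        if score > best_score:
            best_S, best_score = S, score
        if len(S) >= max_size:                         # termination (1): subset size
            break
        if score_threshold is not None and best_score > score_threshold:
            break                                      # termination (4): score threshold
    return best_S, best_score

def forward_step(S, num_features):
    """A trivial forward-search step: add the first not-yet-selected feature,
    so the loop explores nested subsets {}, {0}, {0,1}, ... and keeps the best."""
    for f in range(num_features):
        if f not in S:
            return S | {f}
    return S
```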
3 Classification of feature selection algorithms
According to different classification criteria, feature selection algorithms can be roughly divided into the following types:
1) Search strategy: exhaustive, sequential, and random search. The feature selection process is, to some extent, a subset search optimization problem. Exhaustive search evaluates every feature and every feature subset in the feature space; it is usually expensive, takes a long time, and is not suitable for large-scale data processing. In sequential search, features are continuously added to or removed from the current subset in a certain order so as to obtain an optimized subset. Typical sequential search algorithms include forward-backward search [13], floating search [14], bidirectional search [13], and the sequential forward and sequential backward algorithms. Sequential search is easy to implement and has relatively low computational complexity, but it easily falls into local optima. Random search starts from a random candidate subset and approaches the global optimum step by step according to certain heuristic information and rules; examples include the Genetic Algorithm (GA), Simulated Annealing (SA), Particle Swarm Optimization (PSO), and the Immune Algorithm (IA).
2) Evaluation measure: distance, dependence, consistency, information entropy, and classification error. Evaluation measures are the means used to assess the features selected by an algorithm and their subsets. Distance-based measures use criteria such as Euclidean distance and Mahalanobis distance to quantify the correlation between features and between features and categories; in a vector space model, a small distance indicates strong correlation and a large distance indicates weak correlation. Dependence measures assess the importance of features by their statistical correlation with the categories: if two variables are known to be statistically correlated, the value of one can be used to predict the value of the other. Many statistical correlation coefficients exist, such as the t-test, the F-test, and the Pearson correlation coefficient; probability of error and the Fisher score [15] are also used to describe the interdependence between features and between features and categories. Consistency measures: given two samples, if their feature values are the same but they belong to different categories, they are said to be inconsistent; otherwise they are consistent [16]. In other words, inconsistent samples are contradictory, because they have the same attribute values but belong to different categories. The inconsistency rate of a data set is the ratio of inconsistent samples to the total number of samples, and consistency measures use this rate to reflect the importance of features [8]: if removing a feature significantly increases the inconsistency of the data set, the feature is important; otherwise it is not. The advantage of this criterion is that it can obtain a small feature subset, but it is sensitive to noisy data and is only applicable to discrete features. Information measures mainly use quantities such as information entropy to quantify the uncertainty of features and thereby determine their classification information content. Their advantage is that they are non-parametric, nonlinear measures that do not require knowledge of the sample distribution in advance; information entropy quantifies well the uncertainty of a feature relative to the categories. Classification error measures are based on the following principle: in text classification, the purpose of feature selection is to achieve high classification accuracy later on, so if classification error is adopted as the evaluation criterion, the resulting feature subset will perform better. For example, Huang et al. [11] used a hybrid genetic algorithm to obtain feature subsets together with the classifier, which significantly improved the classification performance of the final model, and Neumann et al. [12] used the classification performance of support vector machines as the measure for feature selection. Small sketches of several of these measures are given after this list.
3) Number of features evaluated: single-feature selection methods and multi-feature selection methods. Single-feature selection evaluates the importance of each feature under the assumption that features are independent, without considering correlations between features, whereas multi-feature selection is based on feature correlation: whether a feature becomes a candidate depends both on its own importance and on its influence on the importance of the already selected features.
4) Availability of category information: supervised, semi-supervised, and unsupervised. In supervised feature selection the samples carry category information, and features are selected according to the correlation between features and categories. In unsupervised feature selection there is no category information, and features are grouped according to the correlation among themselves; generally speaking, the stronger a feature's correlation, the higher its importance. In recent years some scholars have begun to study semi-supervised feature selection. Since category information is relatively scarce in text classification while purely unsupervised techniques are not yet mature, many scholars first perform unsupervised clustering and then select features based on the correlation between the resulting clusters and the features.
5) Relationship with the learning algorithm: embedded [17,18,19], filter, wrapper [20,21], and hybrid selection algorithms. Feature selection is widely used and studied in machine learning, and according to its relationship with the learning algorithm it can be divided into several categories. In the embedded architecture, the feature selection algorithm itself is embedded in the learning algorithm as a component; for example, some algorithms for learning logical formulas operate by adding and removing features from the formula [22]. The most typical examples are decision tree algorithms, such as Quinlan's ID3 and C4.5 [17,18] and Breiman's CART [19]: the process of growing the tree is also a process of feature selection. In filter feature selection, the evaluation criterion is independent of the learning algorithm and is obtained directly from the data set; the evaluation depends on the data itself and usually selects features or subsets that are highly related to the objective function, under the general assumption that highly relevant features or subsets yield higher accuracy in the subsequent learning algorithm. Many evaluation methods exist for filter selection, such as inter-class distance, information gain, correlation, and inconsistency. Considering that filter evaluation, being independent of the learning method, can deviate considerably from the behavior of the subsequent classifier, that different learning algorithms prefer different feature subsets, and that the selected subset will eventually be used by the subsequent learner, the performance of the learning algorithm on the selected subset is itself a better evaluation criterion; wrapper feature selection therefore takes the performance of the learning algorithm as the evaluation criterion. A sketch contrasting filter measures with a wrapper measure follows this list.
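To make items 2) and 5) more concrete, the following minimal sketches (Python with NumPy and scikit-learn; the function names are illustrative, numeric NumPy arrays are assumed for the correlation and Fisher scores, discrete values for information gain, and logistic regression is only a placeholder classifier for the wrapper criterion) show one score for each family of filter measures and one wrapper-style criterion:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def pearson_score(x, y):
    """Dependence measure: absolute Pearson correlation between a feature and the class."""
    return float(abs(np.corrcoef(x, y)[0, 1]))

def fisher_score(x, y):
    """Distance-style measure: between-class scatter divided by within-class scatter."""
    classes = np.unique(y)
    overall_mean = x.mean()
    between = sum((y == c).sum() * (x[y == c].mean() - overall_mean) ** 2 for c in classes)
    within = sum((y == c).sum() * x[y == c].var() for c in classes)
    return float(between / within) if within > 0 else 0.0

def information_gain(x_discrete, y):
    """Information measure: H(Y) - H(Y | X) for a discrete-valued feature."""
    def entropy(labels):
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return float(-(p * np.log2(p)).sum())
    cond = sum((x_discrete == v).mean() * entropy(y[x_discrete == v])
               for v in np.unique(x_discrete))
    return entropy(y) - cond

def wrapper_score(X, y, subset):
    """Classification-error (wrapper) measure: cross-validated accuracy of the
    classifier that will eventually consume the selected subset."""
    if not subset:
        return 0.0
    clf = LogisticRegression(max_iter=1000)
    return float(cross_val_score(clf, X[:, list(subset)], y, cv=5).mean())
```

The filter scores look only at the data, while the wrapper score retrains and evaluates a classifier for every candidate subset, which is exactly the accuracy-versus-cost trade-off discussed above.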
4 Development of feature selection algorithms
Feature selection has been studied since the 1960s, a history of more than 50 years. The status and role of feature selection keep changing with the needs of data processing, and changing external needs in turn place new requirements on feature selection technology. To adapt to the constantly renewed data of all walks of life, feature selection technology has also been undergoing qualitative changes, gradually becoming powerful and convenient to use across industries. In general, feature selection algorithms have undergone the following major changes:
4.1 From single, threshold-based feature selection algorithms to combinations of multiple algorithms searching for the optimal feature subset
Single-feature selection based on a threshold is simple to compute, has low complexity and high efficiency, and is suitable for feature selection in text classification. Representative methods include document frequency (DF) [23], information gain (IG) [23], mutual information (MI) [23], CHI [23], expected cross entropy [24], weight of evidence for text [24], odds ratio [24], word frequency coverage [25], principal component analysis [26], and the Focus, Relief, and ReliefF methods; representative studies on feature selection for text classification are those of Yang Yiming [27] and Dunja Mladenic [27]. Combined feature selection refers to using several feature selection algorithms together to select the optimal feature subset: each algorithm has its own advantages and disadvantages and cannot overcome its own defects when used alone, so different algorithms complement each other. Combination modes mainly include the following. a. Serial combination of feature selection algorithms based on information theory and information measures, such as TF-IDF (simple combination, position-based combination), IG-DF, TF-DF, etc. b. Genetic algorithms [23] and tabu search [24]: Chen et al. proposed GATS, a search strategy mixing genetic algorithms and tabu search, and on this basis the feature selection algorithm FSOSGT, which improves the speed of feature selection [28]. c. Genetic algorithms and artificial neural networks [29]: Xie et al. used mathematical statistics to analyze the weight changes of a neural network before and after training, improved the weight-connection pruning algorithm, obtained a non-fully-connected network suited to specific problems, and proposed feature selection based on feature fuzzification and neural networks, whose effectiveness was demonstrated experimentally [29]. d. Sequential search strategies and classification performance criteria: the literature [25], [26], and [30] combined sequential search strategies (SBS, SFS, FSFS) with classification-based evaluation criteria to evaluate the selected features, which also achieved good results and saved time compared with random search strategies. e. Wrapper and random search strategies: the literature [31] proposed a wrapper feature selection method using decision trees and used a genetic algorithm to find feature subsets that minimize the classification error rate of the tree; [27] combined the normalized maximum likelihood model for feature selection and classification; [32] made a similar attempt with genetic algorithms combined with artificial neural networks; [33] adopted a support vector machine (SVM) as the classifier to further improve classification accuracy; and Zhang [34] proposed a series of algorithms combining Filter and Wrapper, such as Relief-Wrapper, principal component analysis with ReCorre, ReSBSW, and Relief-GA-Wrapper. The combination of Filter and Wrapper methods is a hot research topic; a sketch of such a pipeline is given below.
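As an illustration of the Filter+Wrapper combinations mentioned above, the sketch below (Python with scikit-learn; the parameter values and the choice of mutual information and logistic regression are arbitrary assumptions, not the exact components of the cited algorithms) first prunes the feature space with a cheap filter ranking and then refines the subset with a wrapper-style sequential forward search:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def filter_then_wrapper(X, y, pre_k=50, final_k=10):
    """Hybrid selection: a filter ranking (mutual information) prunes the space,
    then a wrapper (cross-validated accuracy) greedily builds the final subset."""
    pre_k = min(pre_k, X.shape[1])
    ranked = np.argsort(mutual_info_classif(X, y, random_state=0))[::-1][:pre_k]
    clf = LogisticRegression(max_iter=1000)
    selected = []
    for _ in range(min(final_k, len(ranked))):
        def cv_acc(f):
            return cross_val_score(clf, X[:, selected + [f]], y, cv=3).mean()
        remaining = [int(f) for f in ranked if f not in selected]
        if not remaining:
            break
        selected.append(max(remaining, key=cv_acc))
    return selected
```

The cheap filter stage is what keeps the wrapper's repeated cross-validation affordable, which is the usual motivation for such hybrids.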
4.2 From feature selection based on complete decision tables to feature selection based on incomplete decision tables
In the early stage of the development of feature selection technology, data were relatively homogeneous and small in volume, so missing values were simply filled in and the data were then treated as defect-free. Feature selection was therefore initially developed for complete decision tables, and several scholars proposed effective reduction algorithms. Hu et al., for example, proposed a good heuristic function [35] and an attribute reduction algorithm based on the positive region; Wang et al. studied knowledge reduction from the viewpoints of information theory and algebra [36], using conditional entropy as the heuristic information for decision table reduction; Liu et al. proposed a complete algorithm based on attribute order and the discernibility matrix [36]; Guan et al. defined an equivalence matrix on the basis of the equivalence relation and characterized rough set computation by matrix computation [37]. These algorithms reduce the time consumption of feature selection in the complete-data setting and improve efficiency. Completing missing values by some criterion allows feature selection algorithms defined for complete data to work normally, but the filled-in values deviate from the actual values, and accurate prediction of missing values requires relatively complex prediction methods, which brings great time cost and complexity to the pre-processing stage. It is therefore important to extract useful features directly from the existing incomplete information system without processing missing values. The equivalence relation of classical rough set theory is no longer applicable, so the complete information system has been extended to the incomplete information system [38,39]. In recent years some authors have made preliminary explorations of feature selection for information systems or decision tables in the incomplete sense [40,41]. Liang et al. gave the definition of rough entropy in incomplete information systems [41] and proposed a knowledge reduction algorithm based on rough entropy; Huang et al. [42] described attribute importance by introducing the amount of information and proposed a heuristic reduction algorithm based on it; Meng et al. [43] proposed a fast algorithm for attribute reduction of incomplete decision tables. However, existing reduction algorithms based on incomplete decision tables are still time-consuming to varying degrees. The positive approximation proposed by Qian and Liang et al. [44,45] is an effective method for describing target concepts; they further studied the positive approximation in the incomplete sense [46] and discussed how to characterize the granular structure of rough sets with it. The idea of positive approximation under dynamic granulation provides a new research angle for granular computing and rough set theory and has also been applied to rule extraction and attribute reduction. A small sketch of the tolerance relation used for incomplete data is given below.
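To illustrate the incomplete-data setting, the following sketch (plain Python; '*' marks a missing value, and the data and attribute names are made up) computes the positive region of a decision under a simple tolerance relation, which is the kind of building block the reduction algorithms above rely on:

```python
def tolerant(x, z, attrs):
    """Tolerance relation for incomplete data: objects x and z are
    indistinguishable on `attrs` if every *known* value agrees
    ('*' stands for a missing value and matches anything)."""
    return all(x[a] == '*' or z[a] == '*' or x[a] == z[a] for a in attrs)

def positive_region(objects, decisions, attrs):
    """Indices of objects whose whole tolerance class carries a single decision;
    the size of this set is a simple measure of attribute-subset quality."""
    pos = set()
    for i, x in enumerate(objects):
        tolerance_class = [j for j, z in enumerate(objects) if tolerant(x, z, attrs)]
        if len({decisions[j] for j in tolerance_class}) == 1:
            pos.add(i)
    return pos

# Example: three condition attributes, one object with a missing value.
objects = [
    {"a": 1, "b": 0, "c": 1},
    {"a": 1, "b": "*", "c": 1},
    {"a": 0, "b": 1, "c": 0},
]
decisions = ["yes", "no", "no"]
print(positive_region(objects, decisions, attrs=["a", "b", "c"]))   # {2}
```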
4.3 From feature selection based on the feature-independence assumption to feature selection based on feature correlation
Feature selection based on the independence assumption presumes that features are unrelated to one another, treating the recognition contribution of a feature set on a document as the linear sum of the contributions of the individual features; algorithms such as support vector machines (SVM) have been applied very successfully under this view, and in the early stage of feature selection development features were generally assumed to be independent. In practice, however, many features are highly correlated and very similar in classification ability. If all of them are taken as candidate features, a large amount of feature redundancy results, which degrades classifier performance. The problem is more prominent when some categories have few training samples, because the evaluation values of features in sparse categories are lower than those in the dominant categories, and traditional feature selection algorithms tend to favor the features of the dominant categories. From the perspective of information theory, the goal of feature selection is to find a feature subset that contains all or most of the information of the original feature set, whose presence reduces the uncertainty of the unselected features to the greatest extent; in terms of classification, it is to find the subset most correlated with the classification categories and least correlated internally. On this basis, scholars have proposed a series of algorithms. Weston introduced a feature selection algorithm based on support vector machines [47], by which features carrying clear classification information can be selected. Qiu et al. proposed a feature selection algorithm based on a linear combination of fuzzy correlation between features and the chi-square statistic [48]. Gao et al. proposed text feature selection based on twin-word association [49]. Jiang et al. proposed feature selection based on a feature similarity measure [50], and Liu et al. proposed a feature selection algorithm based on conditional mutual information [51], which first clusters features to remove noise and then selects the features most correlated with the class, removing irrelevant and redundant features. Zhang proposed an optimal feature selection algorithm based on minimizing the joint mutual information deficit [52], and Grandvalet introduced an algorithm that automatically computes the relationships between attributes [53]. This kind of algorithm considers the correlation between features and effectively reduces the redundancy of the selected subset. In the study of algorithms that account for feature correlation and redundancy, a relatively famous result is the Markov blanket theory. Yao et al. gave the definition of the Markov blanket and proposed a feature selection algorithm based on the approximate Markov blanket and dynamic mutual information [54]; the approximate Markov blanket principle removes redundant features accurately and yields a subset much smaller than the original feature set. The Markov blanket is an important achievement of feature correlation research. Based on it, Cui et al. proposed an approximate Markov blanket feature selection algorithm based on forward selection [55] to obtain an approximately optimal feature subset, and, addressing the problem that large numbers of irrelevant and redundant features can degrade classifier performance [56], Yao et al. applied the algorithm based on the approximate Markov blanket and dynamic mutual information to ensemble learning, obtaining an ensemble feature selection algorithm. A simplified redundancy-removal sketch in this spirit is given below.
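The approximate-Markov-blanket idea can be sketched as follows (Python; numeric features and numeric class labels are assumed, and absolute Pearson correlation is used in place of the symmetric uncertainty or mutual information measures of the cited work, so this is only a rough analogue): a feature is dropped when some already-kept, more relevant feature is more strongly correlated with it than it is with the class.

```python
import numpy as np

def approximate_markov_blanket(X, y):
    """Redundancy removal in the approximate-Markov-blanket spirit: process
    features from most to least class-relevant, and keep a feature only if no
    kept feature 'subsumes' it (correlates with it at least as strongly as it
    correlates with the class)."""
    n_features = X.shape[1]
    relevance = np.nan_to_num(np.array(          # constant features get zero relevance
        [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_features)]))
    kept = []
    for j in np.argsort(relevance)[::-1]:        # strongest features first
        subsumed = any(abs(np.corrcoef(X[:, i], X[:, j])[0, 1]) >= relevance[j]
                       for i in kept)
        if not subsumed:
            kept.append(int(j))
    return kept
```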
4.4 From feature selection for balanced data to feature selection for unbalanced data sets
Feature selection algorithms based on balanced data sets assume by default that the categories in the data set are of equal size, or ignore the effect of category size on the result of feature selection. Most feature selection algorithms, however, prefer large categories and neglect small ones, so algorithms based on this balance assumption perform poorly on data sets whose category sizes differ greatly. Scholars have therefore proposed various feature selection algorithms for unbalanced data sets, assigning different weights to features in categories of different sizes to offset the bias caused by category size. Two kinds of algorithms have been proposed: one selects features according to their classification ability with respect to categories of different sizes, and the other selects features according to category semantics on the basis of a partition of the universe of discourse. The first kind mainly includes CTD (Categorical Descriptor Term) and SCIW (Strong Class Info Words) [57]. Zhou et al. put forward the concept of category-discriminating words [58] and applied a modified multi-class odds ratio together with category-discriminating words to achieve better feature selection. Xu et al. proposed a high-performance feature selection method based on category resolving power [59] and quantified this ability. Zhang proposed taking into account the distribution characteristics of features within and across categories, including the negative class [60], combining the feature distribution with a category-correlation index for evaluation; the work in [61] pointed out that selecting terms with strong category information is the key to improving classification performance on rare categories, analyzed and verified that such terms are generally not high-frequency words and even tend to be rare words, and proposed the DFICF algorithm. Zheng divided feature selection methods into two categories [62]: those selecting only positive features (one-sided methods) and those selecting both positive and negative features (two-sided methods), proposed a way to select features reasonably from positive and negative examples, and obtained good classification results. Forman analyzed negative features and found experimentally that removing them degrades classification performance [63], so negative features are also necessary for high-performance classification. Ji et al. proposed a feature selection method based on category weighting and variance statistics [64] that strengthens the features of small categories through weighting. Xie et al. generalized the traditional F-score, which measures the discriminative ability of a feature between two categories, and proposed an improved F-score that can measure the discriminative ability of a feature among multiple categories. Xu proposed an improved feature selection method (IFSM) based on category distribution [65]; in addition, Wu proposed the feature selection algorithm TF-CDF [66] based on variable precision rough set (VPRS) theory, and Wang et al. proposed a feature selection framework based on category distribution [67]. These algorithms have greatly promoted the development of feature selection for unbalanced categories.
The second kind of algorithm mainly includes the Chinese text feature selection algorithm based on semantic and statistical features proposed by Zhao et al. [48], which extracts feature co-occurrence sets using the idea of the word co-occurrence model; the method of Xu, who used category feature domains to extract the important features of each category [65]; and the unsupervised text feature selection method based on universe partition proposed by Wu et al. [68]. A small sketch of the class-weighting idea used by the first kind of algorithm is given below.
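The weighting idea can be illustrated with the following sketch (Python; the inverse-frequency weights and the one-vs-rest Fisher-style per-class score are illustrative choices, not the exact criteria of the cited papers): each category contributes to a feature's score with a weight inversely proportional to its size, so rare categories are not drowned out.

```python
import numpy as np

def one_vs_rest_score(x, y, c):
    """Per-class discrimination of feature x for category c versus the rest
    (a Fisher-style ratio; any per-class criterion could be plugged in here)."""
    pos, neg = x[y == c], x[y != c]
    denom = pos.var() + neg.var()
    return float((pos.mean() - neg.mean()) ** 2 / denom) if denom > 0 else 0.0

def class_weighted_score(x, y):
    """Weight each category inversely to its size so that small categories
    influence the final feature score as much as the dominant ones."""
    classes, counts = np.unique(y, return_counts=True)
    weights = (1.0 / counts) / (1.0 / counts).sum()
    return float(sum(w * one_vs_rest_score(x, y, c) for w, c in zip(weights, classes)))
```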
4.5 From supervised feature selection to unsupervised feature selection
Supervised feature selection methods are widely used in text classification and can filter out the majority of text feature words without reducing classification performance [69]. However, these mature supervised methods require category information, which is exactly what text clustering lacks. Some relatively mature unsupervised feature selection methods exist, such as document frequency, term weight, term entropy, and term contribution, but they can only filter out about 90% of the noise words; if more noise words are filtered out, the clustering quality drops sharply [70]. Unsupervised feature selection therefore remains a hot topic in text clustering, and with the growth of web data the demand for feature selection increasingly leans toward unsupervised methods. Liu proposed an unsupervised feature selection algorithm based on K-means [71], whose clustering result is close to the ideal result obtained with supervised feature selection. Zhu proposed a heuristic attribute reduction algorithm for information systems without decision attributes [71]. Xu et al. proposed an unsupervised feature selection method based on mutual information (UFS-MI) [72], in which the selection criterion UmRMR (unsupervised minimum redundancy maximum relevance) evaluates the importance of features by considering relevance and redundancy together. A minimal unsupervised ranking sketch is given below.
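The following sketch (Python; term variance is used as the score, and document frequency, term entropy, or term contribution from the list above could be substituted) shows the basic shape of such an unsupervised filter: no class labels appear anywhere.

```python
import numpy as np

def term_variance(column):
    """Unsupervised criterion: variance of a term's weights over documents;
    terms whose weights barely vary across documents carry little information
    for distinguishing them."""
    return float(column.var())

def rank_terms_unsupervised(X, top_k=1000):
    """Rank the columns of a document-term matrix by an unsupervised score
    and keep the top-scoring terms."""
    scores = np.array([term_variance(X[:, j]) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:min(top_k, X.shape[1])]
```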
5 Development directions of feature selection
Judging from the development described above, current feature selection algorithms tend toward exploiting feature correlation, combining multiple algorithms, handling incomplete decision tables, working without supervision, and coping with unbalanced data sets. With the rapid development of the Internet, the variety and richness of data grow daily, and new research directions have emerged in recent years, such as ensemble learning based on feature selection, feature selection based on multi-objective immune optimization combining clonal selection and immune networks, and the combination of reinforcement learning with feature selection. It is difficult to pin down the specific direction feature selection will take, but as Internet data keep growing, feature selection, as an effective dimensionality reduction method, will receive further research and extension, and its application areas will become ever richer.
6 Conclusion
This paper has reviewed the background and development of feature selection, classified feature selection algorithms from different perspectives, and pointed out the difficulties and open problems in both theoretical research and practical application. It then analyzed the various categories of feature selection algorithms in detail along their development, and discussed the future directions and trends of feature selection.
References
[1] Lewis P M. The characteristic selection problem in recognition systems. IRE Transactions on Information Theory, 1962, 8: 171-178.
[2] Kittler J. Feature set search algorithms. Pattern recognition and rough set reducts. The Third International Workshop on Rough Sets and Soft Computing, 1994: 310-317.
[3] Cover T M. The best two independent measurements are not the two best. IEEE Transactions on Systems, Man and Cybernetics, 1974, 4(1): 116-117.
[4] Liu H, Motoda H. Feature Selection for Knowledge Discovery and Data Mining [M]. Boston: Kluwer Academic Publishers, 1998.
[5] Liu H, Yu L. Toward integrating feature selection algorithms for classification and clustering [J]. IEEE Transactions on Knowledge and Data Engineering, 2005, 17(4): 491-502.
[6] Molina L C, Belanche L, Nebot A. Feature selection algorithms: A survey and experimental evaluation [R]. Barcelona, Spain: Universitat Politecnica de Catalunya, 2002.
[7] Dash M, Liu H. Consistency-based search in feature selection [J]. Artificial Intelligence, 2003, 151(1-2): 155-176.
[8] Arauzo-Azofra A, Benitez J M, Castro J L. Consistency measures for feature selection [J]. Journal of Intelligent Information Systems, 2008, 30: 273-292.
[9] Zhang D, Chen S, Zhou Z-H. Constraint Score: A new filter method for feature selection with pairwise constraints [J]. Pattern Recognition, 2008, 41(5): 1440-1451.
[10] Yu L, Liu H. Efficient feature selection via analysis of relevance and redundancy [J]. Journal of Machine Learning Research, 2004, 5: 1205-1224.
[11] Huang J, Cai Y, Xu X. A hybrid genetic algorithm for feature selection wrapper based on mutual information [J]. Pattern Recognition Letters, 2007, 28: 1825-1844.
[12] Neumann J, Schnorr C, Steidl G. Combined SVM-based feature selection and classification [J]. Machine Learning, 2005, 61: 129-150.
[13] Kittler J. Feature set search algorithms. In: Chen C H, ed. Pattern Recognition and Signal Processing. Sijthoff and Noordhoff, 1978: 41-60.
[14] Pudil P, Novovicova J, Kittler J. Floating search methods in feature selection [J]. Pattern Recognition Letters, 1994, 15: 1119-1125.
[15] Devijver P A, Kittler J. Pattern Recognition: A Statistical Approach [M]. London: Prentice Hall, 1982.
[16] Dash M, Liu H. Consistency-based search in feature selection [J]. Artificial Intelligence, 2003, 151(1-2): 155-176.
[17] Quinlan J R. Learning efficient classification procedures and their application to chess end games. In: Machine Learning: An Artificial Intelligence Approach. San Francisco, CA: Morgan Kaufmann, 1983: 463-482.
[18] Quinlan J R. C4.5: Programs for Machine Learning. San Francisco: Morgan Kaufmann, 1993.
[19] Breiman L, Friedman J H, et al. Classification and Regression Trees. Wadsworth International Group, 1984.
[20] John G, Kohavi R, Pfleger K. Irrelevant features and the subset selection problem. In: Cohen W W, Hirsh H, eds. Proceedings of the Eleventh International Conference on Machine Learning. San Francisco: Morgan Kaufmann, 1994: 121-129.
[21] Aha D W, Bankert R L. Feature selection for case-based classification of cloud types: An empirical comparison. In: Aha D W, ed. Working Notes of the AAAI-94 Workshop on Case-Based Reasoning. Menlo Park, CA: AAAI Press, 1994: 106-112.
[22] Blum A L. Learning boolean functions in an infinite attribute space. Machine Learning, 1992, 9(4): 373-386.
[23] Holland J. Adaptation in Natural and Artificial Systems. Ann Arbor: University of Michigan Press, 1975.
[24] Glover F. Future paths for integer programming and links to artificial intelligence. Computers and Operations Research, 1986.
[25] Inza I, Larranaga P, Blanco R. Filter versus wrapper gene selection approaches in DNA microarray domains [J]. Artificial Intelligence in Medicine, 2004, 31(2): 91-103.
[26] Zhou Xiaobo, Wang Xiaodong, Dougherty E R. Gene selection using logistic regressions based on AIC, BIC and MDL criteria [J]. Journal of New Mathematics and Natural Computation, 2005, 1(1): 129-145.
[27] Tabus I, Astola J. On the use of the MDL principle in gene expression prediction [J]. EURASIP Journal on Applied Signal Processing, 2001, 4: 297-303.
[28] An efficient feature selection algorithm for lightweight intrusion detection systems (in Chinese).
[29] Pang. Analysis and improvement of genetic algorithms and artificial neural networks (in Chinese).
[30] Xiong Momiao, Fang Xiangzhong, Zhao Jinying. Biomarker identification by feature wrappers [J]. Genome Research, 2001, 11(11): 1878-1887.
[31] Hsu W H. Genetic wrappers for feature selection in decision tree induction and variable ordering in Bayesian network structure learning [J]. Information Sciences, 2004, 163(1/2/3): 103-122.
[32] Li L, Weinberg C R, Darden T A, et al. Gene selection for sample classification based on gene expression data: Study of sensitivity to choice of parameters of the GA/KNN method [J]. Bioinformatics, 2001, 17(12): 1131-1142.
[33] Shima K, Todoriki M, Suzuki A. SVM-based feature selection of latent semantic features [J]. Pattern Recognition Letters, 2004, 25(9): 1051-1057.
[34] Study on feature selection and ensemble learning based on feature selection for high-dimensional datasets.
[35] Hu Xiao-Hua, Cercone N. Learning in relational databases: A rough set approach. International Journal of Computational Intelligence, 1995, 11(2): 323-338.
[36] Liu Shao-Hui, Sheng Qiu-Jian, Wu Bin, Shi Zhong-Zhi, Hu Fei. Research on efficient algorithms for rough set methods. Chinese Journal of Computers, 2003, 26(5): 524-529 (in Chinese).
[37] Guan Ji-Wen, Bell D A, Guan Z. Matrix computation for information systems. Information Sciences, 2001, 131: 129-156.
[38] Kryszkiewicz M. Rough set approach to incomplete information systems. Information Sciences, 1998, 112: 39-49.
[39] Slowinski R, Vanderpooten D. A generalized definition of rough approximations based on similarity. IEEE Transactions on Knowledge and Data Engineering, 2000, 12(2).
[40] Leung Yee, Wu Wei-Zhi, Zhang Wen-Xiu. Knowledge acquisition in incomplete information systems: A rough set approach. European Journal of Operational Research, 2006, 68: 164-183.
[41] Liang Ji-Ye, Xu Zong-Ben. The algorithm on knowledge reduction in incomplete information systems. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 2002, 10(1): 95-103.
[42] Huang Bing, Zhou Xian-Zhong, Zhang Rong-Rong. Attribute reduction based on information quantity under incomplete information systems. Systems Engineering - Theory and Practice, 2005, 4(4): 55-60 (in Chinese).
[43] Meng Zu-Qiang, Shi Zhong-Zhi. A fast approach to attribute reduction in incomplete decision systems with tolerance relation-based rough sets. Information Sciences, 2009, 179: 2774-2793.
[44] Liang Ji-Ye, Qian Yu-Hua, Chu Cheng-Yuan, Li De-Yu, Wang Jun-Hong. Rough set approximation based on dynamic granulation. Lecture Notes in Artificial Intelligence 3641, 2005: 701-708.
[45] Qian Yu-Hua, Liang Ji-Ye, Dang Chuang-Yin. Converse approximation and rule extraction from decision tables in rough set theory. Computers and Mathematics with Applications, 2008, 55: 1754-1765.
[46] Qian Yu-Hua, Liang Ji-Ye. Positive approximation and rule extracting in incomplete information systems. International Journal of Computer Science and Knowledge Engineering, 2008, 2(1): 51-63.
[47] Yang S M, Wu X B, Deng Z H, et al. Modification of feature selection methods using relative term frequency.
[48] Qiu Yun-fei, Wang Jian-kun, Li Xue, Shao Liang-shan. Feature selection method for text based on linear combination (in Chinese).
[49] Gao Mao-ting, Wang Zheng-ou. New model for text feature selection based on twin words relationship. Computer Engineering and Applications, 2007, 43(10): 183-185.
[50] Jiang Sheng-yi, Wang Lian-xi. Feature selection based on feature similarity measure. Computer Engineering and Applications, 2010, 46(20): 153-156.
[51] Liu Hai-yan, Wang Chao, Niu Jun-yu. Improved feature selection algorithm based on conditional mutual information. School of Computer Science, Fudan University, Shanghai 201203, China.
[52] Church K W, Hanks P. Word association norms, mutual information, and lexicography.
[53] Guyon I, Weston J, Barnhill S, Vapnik V. Gene selection for cancer classification using support vector machines. Machine Learning, 2002, 46: 389-422.
[54] Yao Xu, Wang Xiao-dan, Zhang Yu-xi, Quan Wen. Feature selection algorithm based on approximate Markov blanket and dynamic mutual information.
[55] Cui Zi-Feng, Xu Bao-Wen, Zhang Wei-Feng, Xu Jun-Ling. An approximate Markov blanket feature selection algorithm.
[56] Yao Xu, Wang Xiao-dan, Zhang Yu-xi, Quan Wen (Missile Institute, Air Force Engineering University, Sanyuan 713800, China).
[57] Yang Yiming, Pedersen J O. A comparative study on feature selection in text categorization. In: Proceedings of the 14th International Conference on Machine Learning. Nashville: Morgan Kaufmann, 1997: 412-420.
[58] Zhou Qian, Zhao Ming-sheng, Hu Min. Study on feature selection in Chinese text categorization.
[59] Xu Y, Li J T, Wang B, Sun C M. A category resolve power-based feature selection method. Journal of Software, 2008, 19(1): 82-89.
[60] Zhang Yu-fang, Wang Yong, Xiong Zhong-yang, Liu Ming (College of Computer Science, Chongqing University, Chongqing 400044, China).
[61] Xu Yan, Li Jin-tao, Wang Bin, Sun Chun-ming, Zhang Sen (Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080; North China Electric Power University, Beijing 102206).
[62] Zheng Yan-hong, Zhang Dong-zhan. A text feature selection method based on the TongYiCi CiLin synonym thesaurus (in Chinese).
[63] Forman G. An extensive empirical study of feature selection metrics for text classification. Journal of Machine Learning Research, 2003, 3: 1289-1305.
[64] Ji Jun-zhong, Wu Jin-yuan, Wu Chen-sheng, Du Fang-hua (Beijing Municipal Key Laboratory of Multimedia and Intelligent Software Technology, College of Computer Science and Technology, Beijing University of Technology, Beijing 100124, China; Beijing Institute of Science and Technology Information, Beijing 100048, China).
[65] Xu Hong-guo, Wang Su-ge (School of Mathematical Sciences, Shanxi University, Taiyuan 030006, China).
[66] Wu Di, Zhang Ya-ping, Yin Fu-liang, Li Ming (Department of Computer Science and Engineering, Dalian University of Technology, Dalian 116024, China).
[67] Jing Hong-fang, Wang Bin, Yang Ya-hui (Institute of Computing Technology, Chinese Academy of Sciences, Beijing).
[68] Hao-dong, Wu Huai-guang. An unsupervised text feature selection method based on universe partition. School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450002 (in Chinese).
[69] Gheyas I A, Smith L S. Feature subset selection in large dimensionality domains. Pattern Recognition, 2010, 43(1): 5-13.
[70] Zhu Hao-dong, Li Hong-chan, Zhong Yong. A novel unsupervised feature selection method. Journal of University of Electronic Science and Technology of China, 2010, 39(3): 412-415 (in Chinese).
[71] Xu. An unsupervised feature selection approach based on mutual information.
[72] Leonardis A, Bischof H. Robust recognition using eigenimages. Computer Vision and Image Understanding, 2000, 78(1): 99-118.