Read and write data
Two functions: xlsread() and xlswrite()
xlsread('url',sheet,'A1:B5')
- url: the absolute path of the Excel file on the computer, or its path relative to the MATLAB code
- sheet: the worksheet to read; for example, 2 means the data is in Sheet2
- 'A1:B5': the range of data to read, i.e. from column A, row 1 to column B, row 5
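A minimal round trip may make the three arguments concrete. This is only a sketch: the file name demo_data.xlsx and the sample matrix are made up for illustration, and it assumes an environment where xlswrite can actually produce an Excel file (for example MATLAB on Windows with Excel installed).

```matlab
% Hypothetical file name; the file is created in the current folder
filename = 'demo_data.xlsx';
A = magic(5);                           % 5x5 sample matrix (made-up data)

xlswrite(filename, A, 1, 'A1');         % write A to the first sheet, starting at cell A1
data = xlsread(filename, 1, 'A1:B5');   % read rows 1-5 of columns A-B from the first sheet

disp(data);                             % the first two columns of A
```

Newer MATLAB releases mark both functions as not recommended and point to readmatrix and writematrix instead, so it is worth checking the documentation for the release in use.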
Official documentation: ww2.mathworks.cn/help/matlab…
Outliers
Correlation and regression analysis, as well as analysis of variance and t-tests, are all disturbed by outliers. Any outlier affects the conclusions to some degree: in serious cases it distorts the relationship being studied, and in milder cases it still shifts the computed indicators. Outliers therefore need to be handled carefully.
Handling outliers usually takes three steps: first detection, then determination, and finally handling.
Outlier detection
- Box plot: often used in experimental studies to display anomalous data visually
- Scatter plot: when studying the relationship between X and Y, it shows visually whether any data points are abnormal
- Descriptive analysis: indicators such as the maximum and minimum give a rough indication of whether the data are abnormal
- Other: for example, combining a normal distribution plot with frequency analysis to judge whether outliers exist
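As a rough illustration of the descriptive-analysis route, the sketch below prints a few summary indicators for a made-up sample and points at the value that lies furthest from the median. The data and variable names are assumptions for illustration only.

```matlab
% Made-up sample; the value 25.0 is planted as an obvious anomaly
x = [9.8 10.1 10.0 9.9 10.2 10.0 25.0 9.7];

lo  = min(x);
hi  = max(x);
mu  = mean(x);
med = median(x);

fprintf('min = %.2f, max = %.2f\n', lo, hi);
fprintf('mean = %.2f, median = %.2f\n', mu, med);

% The value furthest from the median is a natural first candidate to inspect
[~, suspect] = max(abs(x - med));
fprintf('largest deviation from the median: x(%d) = %.1f\n', suspect, x(suspect));

% scatter(1:numel(x), x) would show the same point visually
```

A maximum far above the median, or a mean pulled well away from the median, is only a hint; whether the point really is an outlier is decided in the next step.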
Outlier determination
- Missing values
- Values below a set threshold
- Values above a set threshold
- Values more than 3 standard deviations from the mean
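The four criteria above can be written as logical masks. The sketch below is one possible way to do it, on a made-up sample; the thresholds 0 and 20 are hypothetical, and std is called in its default form, which divides by n-1.

```matlab
% Made-up sample: 19 readings near 10, one planted spike, one missing value
x = [9.8 10.1 10.0 9.9 10.2 10.0 9.7 10.3 10.1 9.9 ...
     10.0 10.2 9.8 10.1 9.9 10.0 10.1 9.9 10.0 25.0 NaN];
lower = 0;     % hypothetical lower threshold
upper = 20;    % hypothetical upper threshold

isMissing = isnan(x);            % criterion 1: missing values
valid = x(~isMissing);           % statistics are computed on non-missing values only
mu    = mean(valid);
sigma = std(valid);              % sample standard deviation (divides by n-1)

tooSmall     = x < lower;                  % criterion 2: below the threshold
tooLarge     = x > upper;                  % criterion 3: above the threshold
beyond3Sigma = abs(x - mu) > 3 * sigma;    % criterion 4: NaN compares as false here

isOutlier = isMissing | tooSmall | tooLarge | beyond3Sigma;
disp(find(isOutlier));           % indices of the flagged entries
```

The three-sigma test needs a reasonably large sample: with ten or fewer values, no point can lie more than three sample standard deviations from the mean, so nothing would ever be flagged.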
Outlier handling
Outlier handling methods fall into two main categories: removal and filling.
- Remove: suitable when there are few outliers
- Fill: fill with the mean, median, mode, or a random number
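Both categories can be sketched with logical indexing. The data and the outlier mask below are made up, and the random fill draws from the remaining normal values, which is just one possible reading of "random number fill".

```matlab
x = [3 5 4 100 5 6];                  % made-up data
isOutlier = logical([0 0 0 1 0 0]);   % assume 100 has already been flagged
good = x(~isOutlier);                 % the normal values

removed = good;                       % removal: simply drop the flagged entries

meanFill   = x;  meanFill(isOutlier)   = mean(good);     % mean fill
medianFill = x;  medianFill(isOutlier) = median(good);   % median fill
modeFill   = x;  modeFill(isOutlier)   = mode(good);     % mode fill

% Random fill: replace each outlier with a randomly chosen normal value
randomFill = x;
n = nnz(isOutlier);
randomFill(isOutlier) = good(randi(numel(good), 1, n));

disp([meanFill; medianFill; modeFill]);
```

Computing the fill value from the normal values only keeps the outlier itself from contaminating the replacement.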
The following is my own example of handling outliers. It uses the three-sigma criterion to identify outliers and then separates them from the normal data.
%{
Outlier-handling function. Read a row or column of data from Excel and pass it
in as column; normal and abnormal return the normal and the abnormal data.
%}
% data = xlsread('C:\Users\liuyi\Desktop\start_file1.xlsx'); % read the raw data
% column = data(:,1); % e.g. take the first column
function [normal,abnormal]=exceptionHandle(column)
column=column(:);% force a column vector so the product below is a scalar
l=length(column);
average=mean(column);% mean
standard=sqrt((column'-average)*(column-average)/l);% standard deviation (divides by l)
variance=standard^2;% variance (not used below)
extreme=max(column)-min(column);% range (not used below)
top=average+3*standard;% upper bound
bottom=average-3*standard;% lower bound
normal=[];
abnormal=[];
i=1;
j=1;
for k=1:l
% This handles one-dimensional data. For a point in the plane or in space,
% test every coordinate, i.e. rewrite the normal-data condition as
% column_1(k)>=bottom_1 && column_1(k)<=top_1 && column_2(k)>=bottom_2 && column_2(k)<=top_2
if column(k)<bottom || column(k)>top
abnormal(i,1)=k;% original index of the outlier
abnormal(i,2)=column(k);% value of the outlier
i=i+1;
end
if column(k)>=bottom && column(k)<=top
normal(j,1)=k;% original index of the normal value
normal(j,2)=column(k);% normal value
j=j+1;
end
end
end
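The comment inside the loop mentions points in the plane. A vectorized sketch of that two-dimensional case might look like the following; the points are made up, the names mirror the function's outputs, and a row counts as normal only if every coordinate stays within three standard deviations of its own column mean.

```matlab
% Made-up points: small deviations around (10, 5), with two planted outliers
d = [-0.2 0.1 0 -0.1 0.2 0 -0.3 0.3 0.1 -0.1 0 0.2 -0.2 0.1 -0.1 0 0.1 -0.1 0 0]';
P = [10 + d, 5 + d];
P(20, 1) = 25;    % planted outlier in the first coordinate
P(3, 2)  = 40;    % planted outlier in the second coordinate

mu    = mean(P, 1);       % column means
sigma = std(P, 1, 1);     % column standard deviations, dividing by n as in the function

dev    = abs(bsxfun(@minus, P, mu));              % distance of each coordinate from its mean
inside = all(bsxfun(@le, dev, 3 * sigma), 2);     % true if every coordinate is within 3 sigma

normal   = P(inside, :);                          % rows that pass in both coordinates
abnormal = [find(~inside), P(~inside, :)];        % original row index, then the coordinates
disp(abnormal);
```

Testing each coordinate separately matches the condition in the comment; it does not account for correlation between the coordinates.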
National standard of the People's Republic of China: www.360doc.com/document/18…
Data transformation
Standardization
Data standardization scales the data so that it falls into a small, specific interval. It is often used when comparing and evaluating indicators: it removes the units from the data and converts them into dimensionless values, so that indicators with different units or magnitudes can be compared and weighted.
Several common standardization methods are described below:
- Z-score standardization: (x-mean)/std
- Min-max normalization: (x-min)/(max-min)
- Interval scaling: compress the data into the range between a and b (defaults 1 and 2, respectively): a+(b-a)*(x-min)/(max-min)
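The three formulas above translate almost directly into MATLAB. The sketch below applies them to a made-up vector; the Z-score uses std in its default form, which divides by n-1.

```matlab
x = [2 4 6 8 10];                             % made-up data

z  = (x - mean(x)) / std(x);                  % Z-score: mean 0, standard deviation 1
mm = (x - min(x)) / (max(x) - min(x));        % min-max normalization to [0, 1]

a = 1;  b = 2;                                % interval bounds (the defaults named above)
iv = a + (b - a) * (x - min(x)) / (max(x) - min(x));   % interval scaling to [a, b]

disp([z; mm; iv]);
```

If all values in x are equal, max(x)-min(x) and std(x) are both zero and every formula divides by zero, so constant columns need to be handled separately.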
Indicator reversal
Indicators are generally divided into positive indicators (the bigger the better), reverse indicators (the smaller the better), and moderate indicators (neither too small nor too big). Before the indicators can be combined into an overall score they must all point in the same direction, which usually means converting the reverse indicators into positive ones.
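The text above does not give formulas, so the sketch below uses two common choices as assumptions: (max-x)/(max-min) for a reverse indicator, and the negative distance from a target value for a moderate indicator. The data and the target value 7 are made up.

```matlab
% Reverse indicator (smaller is better), e.g. a made-up cost
cost = [10 20 30 40 50];
costPos = (max(cost) - cost) / (max(cost) - min(cost));   % now bigger is better, in [0, 1]

% Moderate indicator (best near a target), e.g. a made-up pH reading
ph = [5 6 7 8 9];
target = 7;                         % hypothetical ideal value
phPos = -abs(ph - target);          % closer to the target gives a larger value

disp(costPos);
disp(phPos);
```

After this step the smallest cost scores 1 and the largest scores 0, so the indicator can be combined with positive indicators.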
More methods: spssau.com/helps/datap…
Data visualization
For now, here is a link; examples will be added later.
ww2.mathworks.cn/help/matlab…
Data dimensionality reduction
Mathematical modeling problems often involve many variables, and in most cases those variables are correlated with one another. When the variables are numerous and their relationships are complex, the analysis becomes significantly harder. Dimensionality reduction addresses this by combining the variables into a few representative ones, so that these few variables carry most of the information in the original variables while being uncorrelated with one another.
Principal component analysis (PCA)
Principal component analysis (PCA) is a mathematical dimensionality reduction method. It recombines many correlated variables into a new set of uncorrelated composite variables that replace the original ones.
A few pictures for now; I will add the mathematical formulas and the code when I have time.
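Until then, here is a minimal sketch of the idea using only cov and eig: the eigenvectors of the covariance matrix define the new uncorrelated variables, and the eigenvalues say how much variance each one carries. The data matrix and the 95% threshold are assumptions for illustration, not part of the original notes.

```matlab
% Made-up data: 6 observations (rows) of 3 correlated variables (columns)
X = [2.5 2.4 1.2;
     0.5 0.7 0.3;
     2.2 2.9 1.1;
     1.9 2.2 0.9;
     3.1 3.0 1.6;
     2.3 2.7 1.0];

Xc = bsxfun(@minus, X, mean(X, 1));     % centre each column
C  = cov(Xc);                           % covariance matrix of the variables

[V, D] = eig(C);                        % eigenvectors and eigenvalues
[lambda, order] = sort(diag(D), 'descend');
V = V(:, order);                        % principal directions, largest variance first

explained = lambda / sum(lambda);       % share of total variance per component
scores = Xc * V;                        % the new, uncorrelated variables

k = find(cumsum(explained) >= 0.95, 1); % hypothetical rule: keep 95% of the variance
reduced = scores(:, 1:k);               % reduced data set

disp(explained');
```

The covariance matrix of scores is diagonal, which is exactly the "uncorrelated" property described above. When variables are measured in very different units, standardizing them first is a common choice.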