"This is day 10 of my participation in the November Gengwen (更文) Challenge. For details, see: The Last Gengwen Challenge of 2021."
Contents
1. Analyze the test data sets that ship with WEKA;
2. Use WEKA to mine data stored in a database;
3. Preprocess data with WEKA's preprocessing filters, including adding attributes, deleting attributes and instances, and discretizing data.
Steps and Results
Analyze WEKA's own test data sets
First, install WEKA.
After installation, unpack weka.jar.
Open the data folder, which contains the data sets that ship with WEKA.
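The bundled data sets are ARFF files, so a first look at one usually means reading its header. Below is a minimal sketch of that step, assuming a plain header with unquoted attribute names. The `summarize_arff` helper and the sample text are made up for illustration and are not one of WEKA's files.

```python
def summarize_arff(text):
    """Return (relation_name, [(attribute_name, attribute_type), ...])."""
    relation = None
    attributes = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and ARFF comments
            continue
        lower = line.lower()
        if lower.startswith("@relation"):
            relation = line.split(None, 1)[1].strip()
        elif lower.startswith("@attribute"):
            # Simplification: assumes the attribute name contains no spaces.
            _, name, kind = line.split(None, 2)
            attributes.append((name, kind.strip()))
        elif lower.startswith("@data"):
            break  # the header ends where the data section starts
    return relation, attributes


# Hand-written sample, not taken from WEKA's data folder.
SAMPLE = """\
% tiny illustrative file
@relation toy
@attribute temperature numeric
@attribute outlook {sunny, overcast, rainy}
@data
85,sunny
"""

relation, attributes = summarize_arff(SAMPLE)
print(relation, attributes)
```

Numeric attributes show up with the type `numeric`, and nominal ones list their allowed values in braces.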
Use WEKA to mine data in a database
To mine data stored in a database, we first need to connect WEKA to MySQL.
First, configure the environment variables by adding the following entry:
%WEKA_HOME%\lib\mysql-connector-java-5.1.47.jar;
Second, start the database, create a database named WEKA, and create the following table:
Third, edit the following configuration file.
Modify the following two lines:
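The two lines themselves are not reproduced here. As a rough sketch of what such an edit looks like, the helper below rewrites two keys in a properties-style text. Everything in it is an assumption on my part: the key names `jdbcDriver` and `jdbcURL` are the ones WEKA's DatabaseUtils.props typically uses, `com.mysql.jdbc.Driver` is the driver class of Connector/J 5.1, and the host and port are placeholders. The `set_properties` helper is hypothetical.

```python
def set_properties(text, updates):
    """Return `text` with `key=value` lines replaced for every key in `updates`."""
    remaining = dict(updates)
    out = []
    for line in text.splitlines():
        key = line.split("=", 1)[0].strip()
        is_setting = "=" in line and not line.lstrip().startswith("#")
        if is_setting and key in remaining:
            out.append(f"{key}={remaining.pop(key)}")
        else:
            out.append(line)
    # Append any keys that were not present in the original text.
    out.extend(f"{k}={v}" for k, v in remaining.items())
    return "\n".join(out) + "\n"


# Made-up fragment standing in for the real configuration file.
original = (
    "# sample fragment\n"
    "jdbcDriver=some.other.Driver\n"
    "jdbcURL=jdbc:other://host/db\n"
)

updated = set_properties(original, {
    "jdbcDriver": "com.mysql.jdbc.Driver",         # assumed: Connector/J 5.1 driver class
    "jdbcURL": "jdbc:mysql://localhost:3306/weka",  # assumed host and port
})
print(updated)
```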
After this setup, open WEKA and go to the Explorer page.
Click the following button:
A message confirms that the database connection succeeded.
Query the data in the weka1 table.
The results are as follows:
Use WEKA's preprocessing filters to preprocess data, including adding attributes, deleting attributes and instances, and discretizing data
First, load the data.
The following page is displayed after the data is loaded:
Second, delete an attribute.
Click Choose.
The appropriate filter for removing attributes is Remove, which is found under filters > unsupervised > attribute.
Then click Apply.
The attribute is deleted successfully.
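Conceptually, removing an attribute just drops one column from the header and from every instance. Here is a small sketch of that idea with 1-based indices, which is how the filter dialog numbers attributes. The `remove_attributes` helper and the toy rows are made up, and this is not WEKA's own code.

```python
def remove_attributes(header, rows, indices_to_remove):
    """Drop the given 1-based attribute indices from the header and every row."""
    drop = {i - 1 for i in indices_to_remove}
    keep = [i for i in range(len(header)) if i not in drop]
    new_header = [header[i] for i in keep]
    new_rows = [[row[i] for i in keep] for row in rows]
    return new_header, new_rows


# Toy data, invented for this example.
header = ["id", "name", "score"]
rows = [[1, "a", 90], [2, "b", 75]]

new_header, new_rows = remove_attributes(header, rows, [2])
print(new_header, new_rows)
```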
Third, add attributes.
Click the Choose button again and navigate to weka > filters > unsupervised > attribute > AddUserFields.
Set its properties.
New attributes are generated after clicking Apply.
Next, apply the AddValues filter.
Click Edit to view the result.
Fourth, delete instances.
<1> Select Choose > weka > filters > unsupervised > instance > RemoveFolds. This filter splits the data set into a given number of cross-validation folds and outputs the specified fold. Click the text box next to Choose to open the following dialog box.
After clicking Apply, only two instances remain.
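To make the fold idea concrete, here is a minimal sketch that cuts a list of instances into contiguous folds and keeps one of them. It assumes no shuffling and 1-based fold numbers. The `keep_fold` helper is hypothetical and only illustrates the principle, not the filter's exact behavior.

```python
def keep_fold(instances, num_folds, fold):
    """Return the instances in the 1-based `fold` of `num_folds` contiguous folds."""
    n = len(instances)
    base, extra = divmod(n, num_folds)
    k = fold - 1
    # The first `extra` folds each receive one additional instance.
    size = base + (1 if k < extra else 0)
    start = k * base + min(k, extra)
    return instances[start:start + size]


instances = list(range(1, 15))  # 14 made-up instances, numbered 1..14

first_of_seven = keep_fold(instances, 7, 1)
second_of_four = keep_fold(instances, 4, 2)
print(first_of_seven, second_of_four)
```

With 14 instances and 7 folds, each fold holds exactly two instances, so keeping a single fold leaves two.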
<2> Select Choose > weka > filters > unsupervised > instance > RemovePercentage. This filter removes a given percentage of the instances in the data set. Click the text box next to Choose.
The following dialog box is displayed:
Only one instance is left after clicking Apply.
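The arithmetic behind a percentage-based removal is short enough to sketch. The version below assumes the instances are removed from the start of the data set and that the cut-off is rounded to the nearest whole instance. Both are my assumptions, and `remove_percentage` is a made-up helper.

```python
def remove_percentage(instances, percentage):
    """Drop the first `percentage` percent of the instances and return the rest."""
    # Round half up to a whole number of instances (assumed rounding rule).
    cut = int(len(instances) * percentage / 100 + 0.5)
    return instances[cut:]


instances = list(range(1, 11))  # 10 made-up instances

left_after_90 = remove_percentage(instances, 90)
left_after_50 = remove_percentage(instances, 50)
print(left_after_90, left_after_50)
```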
<3> Select Choose > weka > filters > unsupervised > instance > RemoveRange. This filter removes a given range of instances from the data set. Click the text box next to Choose.
The following dialog box pops up:
Then click Apply.
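A range here is a list of 1-based positions such as `first-3,5,7-last`. The sketch below expands a range string of that shape and drops the matching instances. The parser is deliberately simplified, the helpers `parse_range` and `remove_range` are hypothetical, and the data is invented.

```python
def parse_range(spec, n):
    """Expand a 1-based range string such as 'first-3,5,7-last' into a set of positions."""
    def index(token):
        token = token.strip()
        if token == "first":
            return 1
        if token == "last":
            return n
        return int(token)

    selected = set()
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-", 1)
            selected.update(range(index(lo), index(hi) + 1))
        else:
            selected.add(index(part))
    return selected


def remove_range(instances, spec):
    """Return the instances whose 1-based position is not covered by `spec`."""
    drop = parse_range(spec, len(instances))
    return [x for pos, x in enumerate(instances, start=1) if pos not in drop]


instances = list("abcdefgh")  # 8 made-up instances
remaining = remove_range(instances, "first-3,5,7-last")
print(remaining)
```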
Fifth, use WEKA to discretize the data.
Locate the glass data set, the glass.arff file, in the data directory.
Histogram of the RI attribute:
Equal-width discretization: click Choose > weka > filters > unsupervised > attribute > Discretize. Leave the default parameters unchanged.
Click Apply and the following image appears:
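Equal-width discretization divides the range between the minimum and maximum value into bins of the same width. Here is a minimal sketch of that calculation. The default of 10 bins mirrors what I believe is the filter's default, the values are made up rather than taken from glass.arff, and `equal_width_bins` is a hypothetical helper.

```python
def equal_width_bins(values, num_bins=10):
    """Return (cut_points, bin_index_per_value) for equal-width binning."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    # Interior cut points; a value equal to a cut point falls in the lower bin.
    cuts = [lo + width * i for i in range(1, num_bins)]
    labels = [sum(v > c for c in cuts) for v in values]
    return cuts, labels


values = [0.0, 1.0, 2.0, 3.0, 4.0, 10.0]  # invented numbers
cuts, labels = equal_width_bins(values, num_bins=5)
print(cuts, labels)
```

Note how the outlier 10.0 stretches the range, so most values crowd into the first two bins while the middle bins stay empty.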
Equal-frequency discretization: set the useEqualFrequency option of Discretize to true. The RI attribute after equal-frequency discretization is shown in the figure below:
Also check the Ba and Fe attributes.
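Equal-frequency discretization instead chooses the bins so that each one holds roughly the same number of instances. The sketch below does this by rank. It ignores ties, so equal values may end up in different bins, which a real implementation would avoid. The values are made up and `equal_frequency_bins` is a hypothetical helper.

```python
def equal_frequency_bins(values, num_bins):
    """Assign each value a bin index so that bins hold roughly equal counts."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        # Ranks 0..n-1 are spread evenly over bins 0..num_bins-1.
        labels[i] = rank * num_bins // len(values)
    return labels


values = [5, 1, 9, 3, 7, 2]  # invented numbers
labels = equal_frequency_bins(values, 3)
print(labels)
```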
Sixth, supervised discretization.
First, locate the iris data set, the iris.arff file, in the data folder. Its attributes are as follows:
Open the iris data set in WEKA, as shown in the figure below:
Then click Choose > weka > filters > supervised > attribute > Discretize and click Apply. Open the visualization window to find the value range of each attribute, as follows:
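Supervised discretization uses the class labels to decide where to cut. As far as I know, WEKA's supervised Discretize filter is based on Fayyad and Irani's MDL method. The sketch below is much simpler: it finds a single cut point that minimizes the weighted class entropy, with no recursion and no MDL stopping rule. The data is made up rather than taken from iris.arff, and `best_split` is a hypothetical helper.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def best_split(values, labels):
    """Return (cut_point, weighted_entropy) for the best single binary split."""
    pairs = sorted(zip(values, labels))
    best_cut, best_score = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary between two equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if score < best_score:
            # Place the cut halfway between the two neighboring values.
            best_cut = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_score = score
    return best_cut, best_score


values = [1.0, 1.5, 2.0, 4.0, 4.5, 5.0]   # invented measurements
labels = ["a", "a", "a", "b", "b", "b"]   # invented class labels
cut, score = best_split(values, labels)
print(cut, score)
```

Because the two classes are perfectly separated in this toy data, the best cut lies between 2.0 and 4.0 and both sides are pure.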