2020 Graduate Mathematical Modeling Contest, Problem B (gasoline octane number modeling): solution ideas and experimental results for each question

---------- 2021.4.11 update ----------

Thesis Download Address

Many students have asked me for my code and paper. I really cannot share the code, because it involves the research directions of several research groups. Now that the competition has been over for a while, though, sharing the paper is fine. If you need it, follow me and like this post.

The problem

II. Objective

Based on 325 data samples collected from a catalytic cracking (FCC) gasoline refining unit (each sample has 354 operating variables), use data mining techniques to build a prediction model for the loss of gasoline octane number (RON), and give optimized operating conditions for each sample. While guaranteeing the desulfurization of the gasoline product (the Euro VI and China VI standards both require no more than 10 μg/g, but to leave room for operating the plant equipment, this modeling task requires the product sulfur content to be no more than 5 μg/g), the octane number loss of the gasoline should be reduced by more than 30%.

III. Questions

1. Data processing: Please refer to the preprocessing results for the industrial data of the last 4 years (see Annex I "325 data samples data.xlsx"). Preprocess data samples 285 and 313 according to the "Sample Determination Method" (Annex II) (see Annex III "Original data of Samples 285 and 313.xlsx" for the original data), and add the processed data under the corresponding sample numbers in Annex I for use in the study below.

2. Finding the main variables for modeling: The FCC gasoline refining process is continuous. The operating variables are sampled every 3 minutes, but the octane number (the dependent variable) is troublesome to measure and is measured only twice a week, so the two cannot be matched one to one. In practice, however, the measured octane number can be regarded as the combined effect of the operating variables over the two hours before the measurement time. The preprocessing therefore pairs the mean of each operating variable over those two hours with the measured octane number. This produced the 325 samples (see Annex I).
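The two-hour averaging described above can be sketched in a few lines of Python. This is only an illustration on synthetic data, not the organizers' preprocessing code: the function name `window_mean`, the variable name `reactor_temp`, and the choice to exclude the reading exactly two hours before the measurement are all assumptions.

```python
from datetime import datetime, timedelta
from statistics import mean

def window_mean(readings, measured_at, window=timedelta(hours=2)):
    """Average each operating variable over the window before `measured_at`.

    `readings` is a list of (timestamp, {variable_name: value}) tuples.
    """
    start = measured_at - window
    # Keep readings in the half-open interval (start, measured_at].
    in_window = [values for ts, values in readings if start < ts <= measured_at]
    if not in_window:
        raise ValueError("no readings fall inside the averaging window")
    names = in_window[0].keys()
    return {name: mean(v[name] for v in in_window) for name in names}

# Synthetic readings every 3 minutes for 3 hours (hypothetical variable name).
t0 = datetime(2020, 1, 1, 8, 0)
readings = [(t0 + timedelta(minutes=3 * i), {"reactor_temp": 400.0 + i})
            for i in range(61)]
ron_time = t0 + timedelta(hours=3)  # time of the octane number measurement
features = window_mean(readings, ron_time)
```

With a 3-minute sampling interval, the window holds 40 readings, and the resulting dictionary is one row of features to pair with the measured octane number.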

Building a model to reduce octane number loss involves 7 feedstock properties, 2 spent-adsorbent properties, 2 regenerated-adsorbent properties, 2 product properties, and another 354 operating variables (367 variables in total). Engineering applications often reduce the dimension before modeling, which helps to ignore secondary factors and to discover and analyze the main variables and factors that affect the model. Therefore, based on the 325 data samples (see Annex I), please use dimension reduction to select the main modeling variables from the 367 variables, making them as representative and independent as possible (for convenience in engineering application, fewer than 30 main variables are recommended after reduction), and explain in detail the screening process for the main modeling variables and why it is reasonable. (Tip: consider the octane number of the feedstock as one of the modeling variables.)

3. Establishment of the octane number (RON) loss prediction model: Using the samples and main modeling variables above, build the octane number (RON) loss prediction model with data mining techniques, and validate the model.

4. Optimization of the operating scheme for the main variables: Under the premise that the product sulfur content is no greater than 5 μg/g, use your model to obtain, for those of the 325 data samples (see Annex IV "325 data samples data.xlsx") whose octane number (RON) loss can be reduced by more than 30%, the optimized operating conditions of the main variables (during optimization, the properties of the feedstock, spent adsorbent, and regenerated adsorbent remain unchanged and are taken from the sample data).

5. Visual display of the model: To keep production stable, an industrial unit can usually move the optimized main operating variables (i.e., the main variables from question 2) into place only step by step. For sample no. 133 (the property data of the feedstock, spent adsorbent, and regenerated adsorbent remain unchanged and are taken from the sample), please show graphically the trajectories of the gasoline octane number and sulfur content as the main operating variables are adjusted toward their optimized values. (See Annex IV "information of 354 operational variables.xlsx" for the allowable adjustment amplitude δ of each main operating variable.)

Ideas:

For question 1, build two decision trees according to the rules given in Annex II. Why two? Rules 1, 2, and 3 apply to features, while rules 4 and 5 apply to samples. Then take the average.

For question 2, finding the main variables: traditional feature extraction methods such as PCA build low-dimensional features by fusing the original high-dimensional ones, so the resulting features are hard to give a specific physical interpretation and could not be used for questions 4 and 5. I therefore reduced the dimension with feature selection instead, using a hybrid feature selection method. The result for this question is 23 features.
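To make the difference from PCA concrete, here is a minimal sketch of feature selection that keeps original variables. It is not the hybrid method used in the paper (that code is not shared); it is a simple greedy filter, assumed here for illustration, that ranks columns by absolute Pearson correlation with the target and skips any column that is nearly a duplicate of one already kept. The function names, the `redundancy` threshold, and the synthetic column names are all hypothetical.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def select_features(columns, target, max_features=30, redundancy=0.9):
    """Greedy filter: rank by |corr with target|, skip near-duplicates of kept columns."""
    ranked = sorted(columns,
                    key=lambda name: abs(pearson(columns[name], target)),
                    reverse=True)
    kept = []
    for name in ranked:
        if all(abs(pearson(columns[name], columns[k])) < redundancy for k in kept):
            kept.append(name)
        if len(kept) == max_features:
            break
    return kept

# Synthetic data with hypothetical column names.
rng = random.Random(0)
a = [rng.gauss(0, 1) for _ in range(200)]
b = [rng.gauss(0, 1) for _ in range(200)]
columns = {
    "a": a,
    "a_copy": [2 * v + 1 for v in a],  # perfectly redundant with "a"
    "b": b,
    "noise": [rng.gauss(0, 1) for _ in range(200)],
}
target = [3 * x + 2 * y for x, y in zip(a, b)]
selected = select_features(columns, target, max_features=2)
```

Because every selected name is an original column, the result can be passed straight to the optimization in questions 4 and 5, which is the property PCA components lack.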
For question 3, a simple linear model is unlikely to work, so I borrowed the idea of the extreme learning machine (ELM) and used it to capture the nonlinear relationship.
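For readers unfamiliar with the extreme learning machine, a basic version can be sketched as follows. This is the textbook single-hidden-layer form, assuming NumPy is available, and not the model from the paper: the class name `ELMRegressor`, the tanh activation, the hidden size, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

class ELMRegressor:
    """Single-hidden-layer ELM: random fixed hidden weights, least-squares output weights."""

    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Nonlinear random projection of the inputs.
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        n_features = X.shape[1]
        # Hidden-layer parameters are drawn once and never trained.
        self.W = self.rng.normal(size=(n_features, self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # Output weights minimize ||H @ beta - y|| via the pseudoinverse.
        self.beta = np.linalg.pinv(H) @ y
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta

# Synthetic nonlinear target standing in for RON loss.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(300, 3))
y = np.sin(2 * X[:, 0]) + X[:, 1] * X[:, 2]
model = ELMRegressor(n_hidden=60).fit(X[:250], y[:250])
predictions = model.predict(X[250:])
test_rmse = float(np.sqrt(np.mean((predictions - y[250:]) ** 2)))
baseline_rmse = float(np.std(y[250:]))  # error of always predicting the mean
```

Only the output weights are fitted, in one linear solve, which is why training is fast even though the model is nonlinear.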
For question 4, the multi-objective problem is converted into finding the Pareto-front solutions, which is solved with NSGA-II and its elitist strategy.
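The core idea behind the Pareto front can be shown with a small non-dominated sort. This sketch covers only the sorting step; a full NSGA-II also needs crowding distance, elitist selection, crossover, and mutation, none of which appear here. The two objectives (predicted RON loss and predicted sulfur content, both minimized) and the candidate values are assumptions for illustration.

```python
def dominates(p, q):
    """True if p is no worse than q in every objective and better in at least one."""
    return (all(a <= b for a, b in zip(p, q))
            and any(a < b for a, b in zip(p, q)))

def non_dominated_sort(points):
    """Split points into successive Pareto fronts; returns lists of indices."""
    remaining = set(range(len(points)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(points[j], points[i])
                            for j in remaining if j != i)]
        fronts.append(sorted(front))
        remaining -= set(front)
    return fronts

# Hypothetical candidates: (predicted RON loss, predicted sulfur content in μg/g).
candidates = [(0.9, 4.8), (1.1, 3.0), (0.8, 6.0), (1.2, 3.5), (1.0, 4.0)]
# Apply the sulfur limit from the problem statement before sorting.
feasible = [c for c in candidates if c[1] <= 5.0]
fronts = non_dominated_sort(feasible)
```

The first front holds the trade-off solutions where neither objective can be improved without worsening the other; one operating scheme per sample would then be picked from that front.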
For question 5, after obtaining the optimized operating variables for each sample in question 4, I adjusted the operating variables step by step and predicted the corresponding octane number and sulfur content at each step, then plotted how the octane number and sulfur content change during the adjustment.
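The step-by-step adjustment can be sketched as below. This assumes that δ is a per-step limit on how far each variable may move, and that all variables are adjusted in the same round; the function name `adjustment_path`, the variable names, and the numbers are hypothetical.

```python
def adjustment_path(current, target, max_step):
    """Return successive operating points, moving each variable at most max_step per round."""
    point = dict(current)
    path = [dict(point)]
    while any(point[k] != target[k] for k in target):
        for k in target:
            gap = target[k] - point[k]
            if abs(gap) <= max_step[k]:
                point[k] = target[k]  # close enough: snap to the optimized value
            else:
                point[k] += max_step[k] if gap > 0 else -max_step[k]
        path.append(dict(point))
    return path

# Hypothetical operating variables and allowable step sizes.
current = {"temp": 420.0, "pressure": 2.5}
target = {"temp": 423.0, "pressure": 2.0}
max_step = {"temp": 1.0, "pressure": 0.25}
path = adjustment_path(current, target, max_step)
```

Feeding each point in `path` to the fitted octane number and sulfur models gives the two curves to plot against the step index.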