Preface

  • Recently, our company needed to build user profiles from user data.
  • After analyzing online user data, the company's big data team used Python machine learning to train a model and exported it as a PKL (pickle) file.
  • My department is responsible for analyzing and scoring user data. All of our projects are written in Java, so we needed to call the trained PKL model from Java.
  • After some investigation, we found that Java cannot call a Python PKL model directly; a cross-platform call requires a file in PMML format. We therefore asked the big data team to export a PMML file from the trained PKL model.

PMML

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<PMML xmlns="http://www.dmg.org/PMML-4_3" xmlns:data="http://jpmml.org/jpmml-model/InlineTable" version="4.3">
    <Header>
        <Application name="JPMML-SkLearn" version="1.6.27"/>
        <Timestamp>2021-08-30T06:48:45Z</Timestamp>
    </Header>
    <DataDictionary>
        <DataField name="y" optype="categorical" dataType="integer">
            <Value value="0"/>
            <Value value="1"/>
        </DataField>
        <DataField name="x1" optype="continuous" dataType="double"/>
        <DataField name="x2" optype="continuous" dataType="double"/>
        <DataField name="x3" optype="continuous" dataType="double"/>
    </DataDictionary>
    <RegressionModel functionName="classification" algorithmName="sklearn.linear_model._logistic.LogisticRegression" normalizationMethod="logit">
        <MiningSchema>
            <MiningField name="y" usageType="target"/>
            <MiningField name="x1"/>
            <MiningField name="x2"/>
            <MiningField name="x3"/>
        </MiningSchema>
        <RegressionTable intercept="0.5920457931585216" targetCategory="1">
            <NumericPredictor name="x1" coefficient="0.758677834214665"/>
            <NumericPredictor name="x2" coefficient="..."/>
            <NumericPredictor name="x3" coefficient="..."/>
        </RegressionTable>
        <RegressionTable intercept="0.0" targetCategory="0"/>
    </RegressionModel>
</PMML>
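To make the PMML above less of a black box, here is a minimal sketch of what a RegressionModel with normalizationMethod="logit" computes for a two-class target: a linear combination of the inputs, passed through the logistic function. The class and method names are hypothetical. Only the intercept and the x1 coefficient are taken from the listing; the x2 and x3 coefficients are placeholders set to 0.0, so the printed number is not the real model's output.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LogitRegressionSketch {

    /** Linear predictor: intercept + sum of (coefficient * feature value). */
    public static double linearPredictor(double intercept,
                                         Map<String, Double> coefficients,
                                         Map<String, Double> features) {
        double sum = intercept;
        for (Map.Entry<String, Double> entry : coefficients.entrySet()) {
            Double x = features.get(entry.getKey());
            if (x == null) {
                throw new IllegalArgumentException("Missing feature: " + entry.getKey());
            }
            sum += entry.getValue() * x;
        }
        return sum;
    }

    /** Logit normalization: probability of target category "1". */
    public static double probabilityOfOne(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    public static void main(String[] args) {
        double intercept = 0.5920457931585216;      // from the PMML listing
        Map<String, Double> coefficients = new LinkedHashMap<>();
        coefficients.put("x1", 0.758677834214665);  // from the PMML listing
        coefficients.put("x2", 0.0);                // placeholder, NOT the real coefficient
        coefficients.put("x3", 0.0);                // placeholder, NOT the real coefficient

        Map<String, Double> features = new LinkedHashMap<>();
        features.put("x1", -0.216918810277242);
        features.put("x2", 0.0583184157700168);
        features.put("x3", -0.653728631926331);

        double p1 = probabilityOfOne(linearPredictor(intercept, coefficients, features));
        // For a binary target, the probability of category "0" is the complement.
        System.out.println("P(y=1) = " + p1 + ", P(y=0) = " + (1.0 - p1));
    }
}
```

In the real project this arithmetic is done by the JPMML evaluator; the sketch only illustrates what the RegressionTable entries mean.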

Calling PMML files from Java

  • First, add the Maven packages that parse PMML to the project
<dependency>
    <groupId>org.jpmml</groupId>
    <artifactId>pmml-evaluator</artifactId>
    <version>1.4.1</version>
</dependency>
<dependency>
    <groupId>org.jpmml</groupId>
    <artifactId>pmml-evaluator-extension</artifactId>
    <version>1.4.1</version>
</dependency>
  • The Java calling code
  • Once you have the test.pmml file, put it in the resources directory of the Spring Boot project and use the ClassPathResource class to obtain its input stream
import com.google.common.collect.Lists;
import lombok.extern.slf4j.Slf4j;
import org.dmg.pmml.FieldName;
import org.dmg.pmml.PMML;
import org.jpmml.evaluator.Evaluator;
import org.jpmml.evaluator.InputField;
import org.jpmml.evaluator.ModelEvaluatorFactory;
import org.jpmml.evaluator.ProbabilityDistribution;
import org.jpmml.evaluator.ValueMap;
import org.jpmml.model.PMMLUtil;
import org.springframework.core.io.ClassPathResource;
import org.springframework.core.io.Resource;

import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * @Author: ZRH
 * @Date: 2021/8/30 9:17
 */
@Slf4j
public final class ClassificationModelOld {

    private static Evaluator modelEvaluator;

    // Load and verify the model once, when the class is initialized.
    static {
        PMML pmml;
        try {
            Resource resource = new ClassPathResource("test.pmml");
            InputStream is = resource.getInputStream();
            pmml = PMMLUtil.unmarshal(is);
            try {
                is.close();
            } catch (IOException e) {
                log.info("InputStream close error!");
            }
            ModelEvaluatorFactory modelEvaluatorFactory = ModelEvaluatorFactory.newInstance();
            modelEvaluator = modelEvaluatorFactory.newModelEvaluator(pmml);
            modelEvaluator.verify();
            log.info("Model loaded successfully!");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    /**
     * Private constructor: the class only exposes static methods.
     */
    private ClassificationModelOld() {
    }

    /**
     * Get the names of the model's input features.
     */
    public static List<String> getFeatureNames() {
        List<String> featureNames = new ArrayList<>();
        List<InputField> inputFields = modelEvaluator.getInputFields();
        for (InputField inputField : inputFields) {
            featureNames.add(inputField.getName().toString());
        }
        return featureNames;
    }

    /**
     * Get the name of the target field.
     */
    public static String getTargetName() {
        return modelEvaluator.getTargetFields().get(0).getName().toString();
    }

    /**
     * Use the model to generate the probability distribution.
     *
     * @param arguments the input features
     * @return the probability distribution of the target field
     */
    private static ProbabilityDistribution getProbabilityDistribution(Map<FieldName, ?> arguments) {
        Map<FieldName, ?> evaluateResult = modelEvaluator.evaluate(arguments);
        FieldName fieldName = FieldName.create(getTargetName());
        return (ProbabilityDistribution) evaluateResult.get(fieldName);
    }

    /**
     * Predict the probability of each target category.
     */
    public static ValueMap<String, Number> predictProba(Map<FieldName, Number> arguments) {
        ProbabilityDistribution probabilityDistribution = getProbabilityDistribution(arguments);
        return probabilityDistribution.getValues();
    }

    /**
     * Predict the target category.
     */
    public static Object predict(Map<FieldName, ?> arguments) {
        ProbabilityDistribution probabilityDistribution = getProbabilityDistribution(arguments);
        return probabilityDistribution.getPrediction();
    }

    private static Integer setScore(float probability) {
        int score = 0;
        try {
            // TODO calculate the score from the probability
            score = 520;
        } catch (Exception e) {
        }
        return score;
    }

    public static void main(String[] args) {
        // Sample values for the features x1, x2 and x3
        final ArrayList<Double> doubles = Lists.newArrayList(-0.216918810277242, 0.0583184157700168, -0.653728631926331);
        Map<FieldName, Number> waitPreSample = new HashMap<>(8);
        waitPreSample.put(FieldName.create("x1"), doubles.get(0));
        waitPreSample.put(FieldName.create("x2"), doubles.get(1));
        waitPreSample.put(FieldName.create("x3"), doubles.get(2));
        final ValueMap<String, Number> values = ClassificationModelOld.predictProba(waitPreSample);
        System.out.println("The machine algorithm calculates the score value result: " + setScore(values.get("1").floatValue()));
    }
}

---------------------
Result:
Model loaded successfully!
The machine algorithm calculates the score value result: 520

Version issues

  • The example above uses an older version of the JPMML packages, and the PMML file is version 4.3
  • If you are using a version 4.4 PMML file, you need to update the packages pulled in by Maven
<dependency>
    <groupId>org.jpmml</groupId>
    <artifactId>pmml-evaluator</artifactId>
    <version>1.5.11</version>
</dependency>
<dependency>
    <groupId>org.jpmml</groupId>
    <artifactId>pmml-evaluator-extension</artifactId>
    <version>1.5.11</version>
</dependency>
  • The model-loading code also needs to change: the evaluator is now built with ModelEvaluatorBuilder
static {
    PMML pmml;
    try {
        Resource resource = new ClassPathResource("test.pmml");
        InputStream is = resource.getInputStream();
        pmml = PMMLUtil.unmarshal(is);
        try {
            is.close();
        } catch (IOException e) {
            log.info("InputStream close error!");
        }
        // Newer versions build the evaluator through ModelEvaluatorBuilder.
        ModelEvaluatorBuilder modelEvaluatorBuilder = new ModelEvaluatorBuilder(pmml);
        ModelEvaluatorFactory modelEvaluatorFactory = ModelEvaluatorFactory.newInstance();
        modelEvaluatorBuilder.setModelEvaluatorFactory(modelEvaluatorFactory);
        modelEvaluator = modelEvaluatorBuilder.build();
        modelEvaluator.verify();
        log.info("Model loaded successfully!");
    } catch (Exception e) {
        e.printStackTrace();
    }
}
  • With these changes, a version 4.4 PMML model file can also be loaded and evaluated to obtain results
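One way to tell which setup a given file needs is to read the version attribute of its root PMML element before loading it. The following is a minimal sketch using only the JDK's built-in XML parser; the class name PmmlVersionSketch and the method readPmmlVersion are hypothetical names and not part of JPMML.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

public class PmmlVersionSketch {

    /** Returns the "version" attribute of the root PMML element, e.g. "4.3" or "4.4". */
    public static String readPmmlVersion(InputStream in) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Element root = factory.newDocumentBuilder().parse(in).getDocumentElement();
            if (!"PMML".equals(root.getLocalName())) {
                throw new IllegalArgumentException("Not a PMML document: " + root.getNodeName());
            }
            return root.getAttribute("version");
        } catch (IllegalArgumentException e) {
            throw e;
        } catch (Exception e) {
            // Wrap parser and I/O failures so callers need no checked-exception handling.
            throw new IllegalStateException("Could not read PMML version", e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Inline document for illustration; in the project the stream would come from test.pmml.
        String xml = "<PMML xmlns=\"http://www.dmg.org/PMML-4_4\" version=\"4.4\"/>";
        try (InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))) {
            System.out.println("PMML version: " + readPmmlVersion(in)); // prints "PMML version: 4.4"
        }
    }
}
```

Logging this value at startup makes a mismatch between the PMML file and the evaluator packages easier to spot.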

Finally

  • Learn with an open mind and make progress together