Today we continue with Python network programming and look at YAML, a language that is more concise and more powerful than JSON. In this article, Hu briefly introduces YAML's syntax and usage, along with an example of using YAML in a machine learning project. Everyone is welcome to study along, and likes and shares are appreciated!


YAML

YAML is a recursive acronym for "YAML Ain't Markup Language". Its syntax resembles that of other high-level languages, and it can easily express data such as lists, hash tables, and scalars. Because it relies on whitespace indentation and a number of readability-oriented features, it is particularly well suited to expressing or editing data structures, configuration files, debugging output, and document outlines. YAML configuration files use the .yaml extension.

The basic syntax rules of YAML are as follows:

  • Keys and values are case sensitive.
  • Indentation indicates hierarchy.
  • Tabs are not allowed for indentation; only spaces are allowed.
  • The number of spaces used for indentation does not matter, as long as elements at the same level are left-aligned.
  • The # sign starts a comment.
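The rules above can be checked directly with PyYAML. The following is a minimal sketch, assuming PyYAML is installed (the install command appears later in this article); the keys server, host, Host, and port are made up for illustration.

```python
import yaml

# Hypothetical document: keys and values are for illustration only.
text = """
# a comment line is ignored by the parser
server:
    host: localhost    # children are indented with spaces
    Host: example.org  # keys are case sensitive, so this is a different key
    port: 8080
"""

data = yaml.safe_load(text)
print(data["server"]["host"])  # localhost
print(data["server"]["Host"])  # example.org

# The indentation width is free, as long as siblings line up.
two_spaces = yaml.safe_load("a:\n  b: 1\n  c: 2\n")
four_spaces = yaml.safe_load("a:\n    b: 1\n    c: 2\n")
same_structure = two_spaces == four_spaces
print(same_structure)  # True

# A tab used for indentation is rejected by the parser.
try:
    yaml.safe_load("server:\n\thost: localhost\n")
    tab_accepted = True
except yaml.YAMLError:
    tab_accepted = False
print(tab_accepted)  # False
```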

YAML supports three types of data structures:

  • Object: a collection of key-value pairs, also called a mapping/hash/dictionary, written as key: value with a space after the colon.
  • Array: an ordered set of values, also called a sequence/list, with each item introduced by "- ".
  • Scalar: a single, indivisible value.
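To see how these three structures map onto Python types, here is a small sketch, assuming PyYAML is installed. The document content (person, name, and so on) is invented for illustration.

```python
import yaml

# Hypothetical document: every key and value here is for illustration only.
text = """
person:               # object (mapping) -> dict
  name: Alice         # scalar -> str
  age: 30             # scalar -> int
  height: 1.68        # scalar -> float
  student: false      # scalar -> bool
  nickname: null      # scalar -> None
  languages:          # array (sequence) -> list
    - Python
    - Go
"""

data = yaml.safe_load(text)
person = data["person"]

print(type(person).__name__)               # dict
print(type(person["languages"]).__name__)  # list
print(person["languages"])                 # ['Python', 'Go']
for key in ("name", "age", "height", "student", "nickname"):
    print(key, "->", type(person[key]).__name__)
```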

YAML usage

Installation

pip install pyyaml

A YAML file has a simple format, for example:

# categories.yaml file

sports:  # array
  - soccer
  - football
  - basketball
  - cricket
  - hockey
  - table tennis
countries:
  - Pakistan
  - USA
  - India
  - China
  - Germany
  - France
  - Spain

Reading YAML files in Python

# read_categories.py file

import yaml

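with open(r'categories.yaml') as file:
    # safe_load only builds plain Python objects, which is all this file needs
    documents = yaml.safe_load(file)

# The file is closed here; the parsed data is an ordinary dict
for item, doc in documents.items():
    print(item, ":", doc)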

Output:

sports : ['soccer', 'football', 'basketball', 'cricket', 'hockey', 'table tennis']
countries : ['Pakistan', 'USA', 'India', 'China', 'Germany', 'France', 'Spain']
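Going the other way, a Python dictionary can be written to a YAML file with yaml.safe_dump. This is a minimal sketch, assuming PyYAML 5.1 or later (for the sort_keys argument). It writes to a temporary directory so that no existing file is overwritten, then reads the file back to confirm the round trip.

```python
import os
import tempfile

import yaml

categories = {
    "sports": ["soccer", "football", "basketball",
               "cricket", "hockey", "table tennis"],
    "countries": ["Pakistan", "USA", "India", "China",
                  "Germany", "France", "Spain"],
}

# Write into a temporary directory (an assumption made for this sketch).
path = os.path.join(tempfile.mkdtemp(), "categories.yaml")
with open(path, "w") as file:
    # sort_keys=False keeps the dictionary's insertion order in the file
    yaml.safe_dump(categories, file, sort_keys=False)

# Read the file back and compare with the original dictionary.
with open(path) as file:
    loaded = yaml.safe_load(file)

print(loaded == categories)  # True
```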

This is the most basic application of YAML. If you are still confused, let’s take a closer look at how to write a YAML configuration file in a machine learning project.

YAML & Machine Learning

We will directly rewrite the code from "100 Days of Machine Learning | Day 62: Random Forests in Practice".

Write the configuration file rf_config.yaml

#INITIAL SETTINGS
data_directory: ./data/
data_name: creditcard.csv
target_name: Class
test_size: 0.3
model_directory: ./models/
model_name: RF_classifier.pkl


#RF parameters
n_estimators: 50
max_depth: 6
min_samples_split: 5
oob_score: True
random_state: 666
n_jobs: 2
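Since these values are passed straight to scikit-learn, it helps to know which Python types PyYAML produces for them. The sketch below, assuming PyYAML is installed, parses a few lines of the config from a string rather than from the file and prints each value's type.

```python
import yaml

# A few lines copied from rf_config.yaml, parsed from a string for this sketch.
text = """
data_directory: ./data/
test_size: 0.3
n_estimators: 50
oob_score: True
"""

config = yaml.safe_load(text)
for key, value in config.items():
    print(key, type(value).__name__, value)
# data_directory str ./data/
# test_size float 0.3
# n_estimators int 50
# oob_score bool True
```

No manual conversion is needed: numbers and booleans arrive as int, float, and bool, and paths arrive as str.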

Here is the complete code; compare it with the original source to see the difference:

# rf_with_yaml_file.py
import os

import joblib
import numpy as np
import pandas as pd
import yaml
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

CONFIG_PATH = "./config/"


def load_config(config_name):
    """Load a YAML config file from CONFIG_PATH and return it as a dict."""
    with open(os.path.join(CONFIG_PATH, config_name)) as file:
        config = yaml.safe_load(file)
    return config


config = load_config("rf_config.yaml")
target = config["target_name"]

# Load the data and keep columns 1-30 (the features plus the target column)
df = pd.read_csv(os.path.join(config["data_directory"], config["data_name"]))
data = df.iloc[:, 1:31]

X = data.loc[:, data.columns != target]
y = data.loc[:, data.columns == target]

# Under-sample the majority class so that both classes have the same size
number_records_fraud = len(data[data[target] == 1])
fraud_indices = np.array(data[data[target] == 1].index)
normal_indices = data[data[target] == 0].index
random_normal_indices = np.random.choice(
    normal_indices, number_records_fraud, replace=False)
random_normal_indices = np.array(random_normal_indices)
under_sample_indices = np.concatenate(
    [fraud_indices, random_normal_indices])
under_sample_data = data.iloc[under_sample_indices, :]
X_undersample = under_sample_data.loc[:, under_sample_data.columns != target]
y_undersample = under_sample_data.loc[:, under_sample_data.columns == target]

# Note: the split below uses the full data set, not the under-sampled one
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=config["test_size"], random_state=42
)

# All random forest hyperparameters come from rf_config.yaml
rf1 = RandomForestClassifier(
    n_estimators=config["n_estimators"],
    max_depth=config["max_depth"],
    min_samples_split=config["min_samples_split"],
    oob_score=config["oob_score"],
    random_state=config["random_state"],
    n_jobs=config["n_jobs"]
)
# ravel() turns the single-column target into the 1-D array that fit() expects
rf1.fit(X_train, y_train.values.ravel())
print(rf1.oob_score_)

# Evaluate on the held-out test set
y_predprob1 = rf1.predict_proba(X_test)[:, 1]
print("AUC Score (Test): %f" % roc_auc_score(y_test, y_predprob1))

# Save the trained model to the path given in the config
joblib.dump(rf1, os.path.join(config["model_directory"], config["model_name"]))
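One possible variation, not used in the code above: group the random forest parameters under a single key so they can be passed with ** unpacking instead of being listed one by one. The key name rf_params is hypothetical and would require changing rf_config.yaml accordingly. The sketch parses the config from a string and assumes only PyYAML.

```python
import yaml

# Hypothetical layout: "rf_params" is an illustrative key, not part of the
# original rf_config.yaml.
text = """
test_size: 0.3
rf_params:
  n_estimators: 50
  max_depth: 6
  min_samples_split: 5
  oob_score: True
  random_state: 666
  n_jobs: 2
"""

config = yaml.safe_load(text)
rf_kwargs = config["rf_params"]
print(rf_kwargs)

# With scikit-learn installed, the model could then be built as:
#   rf1 = RandomForestClassifier(**rf_kwargs)
```

With this layout, adding or removing a hyperparameter only touches the YAML file, at the cost of a typo in a key name surfacing as a TypeError when the classifier is constructed.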
