PyCon2018: two new ML data visualization libraries: Altair and Yellowbrick

Author: David 9

This article was originally published on the author's personal blog and is reprinted here with the author's permission. Thanks again to the author.

PyCon 2018 featured two new ML data visualization libraries: Altair, a declarative visualization library, and Yellowbrick, a visualization library that extends scikit-learn.

New data science visualization libraries, like deep learning frameworks, keep appearing, but they fall roughly into two types:

The first is the general-purpose visualization library, which can plot any static, JSON-like structured data: pandas, Seaborn, ggplot, Bokeh, Pygal, Plotly.

The second is the visualization library that is tightly coupled to a framework, such as TensorBoard for TensorFlow and Yellowbrick, the visualization library that extends scikit-learn.

For the first, general-purpose type, the trend toward convenience, simplicity, and ease of use continues. In the PyCon 2018 talk "Exploratory Data Visualization with Vega, Vega-Lite, and Altair", a single declarative statement is enough to produce a chart:

    import altair as alt

    # to use with Jupyter notebook (not JupyterLab) run the following
    # alt.renderers.enable('notebook')

    # load a simple dataset as a pandas DataFrame
    from vega_datasets import data
    cars = data.cars()

    alt.Chart(cars).mark_point().encode(
        x='Horsepower',
        y='Miles_per_Gallon',
        color='Origin',
    )

Altair example

To switch from points to lines, simply change mark_point() to mark_line():

    alt.Chart(cars).mark_line().encode(
        x='Horsepower',
        y='Miles_per_Gallon',
        color='Origin',
    )

Note that no matter how many features the cars dataset has, the features you want to visualize are all declared in a single encode() call. The Altair API has many other advantages, and there are plenty of example Jupyter notebooks to try first.
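The declarative style works because an Altair chart is a description of the plot rather than a series of drawing commands: Altair compiles it to a Vega-Lite JSON specification. The sketch below builds by hand, with only the standard library, roughly the kind of specification that corresponds to the two charts above. The helper name sketch_spec is hypothetical, the data section is left out, and the JSON that Altair actually emits contains more fields than shown here.

```python
import json


def sketch_spec(mark, x, y, color):
    """Build a simplified Vega-Lite-style spec (hypothetical helper, not Altair's API)."""
    return {
        "mark": mark,  # "point" or "line", mirroring mark_point() / mark_line()
        "encoding": {
            # each visual channel is bound to one column of the dataset
            "x": {"field": x, "type": "quantitative"},
            "y": {"field": y, "type": "quantitative"},
            "color": {"field": color, "type": "nominal"},
        },
    }


points = sketch_spec("point", "Horsepower", "Miles_per_Gallon", "Origin")
lines = sketch_spec("line", "Horsepower", "Miles_per_Gallon", "Origin")

# Collect the top-level keys whose values differ between the two specs.
changed = {key for key in points if points[key] != lines[key]}
print(changed)
print(json.dumps(points, indent=2))
```

Only the mark entry differs between the two specifications, which is why swapping mark_point() for mark_line() leaves the encode() call untouched.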

Yellowbrick, a visualization library tightly coupled to scikit-learn, goes further and folds the training process into the visualization process:

    from sklearn.linear_model import LogisticRegression
    from yellowbrick.classifier import ROCAUC

    # X_train, y_train, X_test, y_test are assumed to come from an existing train/test split
    logistic = LogisticRegression()
    visualizer = ROCAUC(logistic)

    visualizer.fit(X_train, y_train)  # the visualizer object is an extension of the estimator class
    visualizer.score(X_test, y_test)  # score the model on the test data
    g = visualizer.poof()             # draw the ROCAUC analysis plot (poof() was renamed show() in later Yellowbrick releases)

With the code above, the analysis plot is produced as soon as the logistic regression model has been trained and scored:

From: http://www.scikit-yb.org/en/latest/api/classifier/rocauc.html
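For readers who want to see what a ROC/AUC plot summarizes, here is a minimal pure-Python sketch that computes the ROC points and the area under the curve for a binary classifier. The function names and the toy labels and scores are made up for illustration. This is not Yellowbrick's or scikit-learn's implementation, and it assumes binary 0/1 labels with no tied scores.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) pairs obtained by sweeping the threshold over the scores.

    Assumes binary labels (0/1), both classes present, and no tied scores.
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    # Visit samples from the highest score to the lowest.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    for i in order:
        if y_true[i] == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy data (made up): 1 = positive class, scores = predicted probabilities.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
curve = roc_points(y_true, scores)
area = auc(curve)
print(curve)
print(area)
```

An area of 1.0 means every positive sample is scored above every negative one, while 0.5 corresponds to random ranking.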

The same holds for PCA: the visualization code and the fitting code are coupled:

    from yellowbrick.features.pca import PCADecomposition

    visualizer = PCADecomposition(scale=True, center=False, color=y)
    visualizer.fit_transform(X, y)
    visualizer.poof()

The code above directly produces a two-dimensional PCA visualization.
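The coupling in both Yellowbrick examples follows one pattern: the visualizer exposes the same fit and score methods as an estimator and draws from whatever it recorded along the way. The sketch below imitates that pattern with the standard library only. MajorityClassifier, SketchVisualizer, and the text "plot" are hypothetical stand-ins for illustration, not Yellowbrick classes or its actual implementation.

```python
from collections import Counter


class MajorityClassifier:
    """Toy estimator: always predicts the most common training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def score(self, X, y):
        # Accuracy of the constant prediction.
        return sum(1 for label in y if label == self.label_) / len(y)


class SketchVisualizer:
    """Hypothetical wrapper: behaves like the estimator and remembers what to draw."""

    def __init__(self, model):
        self.model = model
        self.score_ = None

    def fit(self, X, y):
        self.model.fit(X, y)  # training happens inside the visualizer
        return self

    def score(self, X, y):
        self.score_ = self.model.score(X, y)  # keep the result for drawing
        return self.score_

    def poof(self):
        # A real library would render a figure; here a text bar stands in.
        bar = "#" * round(self.score_ * 20)
        return f"accuracy {self.score_:.2f} |{bar:<20}|"


visualizer = SketchVisualizer(MajorityClassifier())
visualizer.fit([[0], [1], [2], [3]], ["a", "a", "a", "b"])
accuracy = visualizer.score([[4], [5]], ["a", "b"])
print(visualizer.poof())
```

Because the wrapper keeps the estimator's interface, existing training code only needs the model swapped for the visualizer, which is the convenience the examples above rely on.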

References:

www.scikit-yb.org/en/latest/
Github.com/altair-viz/…

This article is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 China Mainland license. Copyright belongs to "David 9's blog"; to reprint, please contact WeChat: David9ML, or email: [email protected]
