Public account: You and the cabin by: Peter Editor: Peter
Hi, I’m Peter
The Sankey diagram tells you those stories about workers
This article introduces a relatively rare visualization made with Plotly: the Sankey diagram, which is a great tool for showing how data flows.
Although the Sankey diagram is not used as often as the bar chart or the pie chart, I still like it very much.
My first Sankey diagrams were made with Pyecharts (to be covered later). This article shows how to build them with Plotly.
1. A brief introduction to the Sankey diagram
1.1 What is a Sankey diagram
The Sankey diagram, also called the Sankey energy distribution diagram or the Sankey energy balance diagram, is a specific type of flow diagram that describes the flow from one set of values to another. It is named after Matthew Henry Phineas Riall Sankey, an Irish-born engineer and captain in the Royal Engineers.
In 1898 he used this kind of diagram to represent the energy efficiency of a steam engine, in an article published in the Proceedings of the Institution of Civil Engineers. That article introduced the first energy flow diagram, and the chart type was later named after him.
Charles Minard's map of Napoleon's Russian campaign of 1812, drawn in 1869, is a flow map that overlays a Sankey diagram on a geographic map. It shows the strength of Napoleon's army as it advances and retreats.
1.2 Characteristics of Sankey diagram
The main characteristics of a Sankey diagram:
- The total flow at the start equals the total flow at the end: the width of a main branch equals the sum of the widths of its sub-branches, which reflects conservation of energy
- Within the diagram, different lines represent different flow distributions, and the width of a node represents the size of the flow in a specific state
A Sankey diagram consists of three elements: nodes, flows and edges
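The conservation property can be checked directly on the flow data. The sketch below is only an illustration: the node names and amounts are made up, and it simply compares what flows into each intermediate node with what flows out of it.

```python
from collections import defaultdict

# Hypothetical flows as (source, target, amount); names and numbers are invented.
flows = [
    ("Fuel", "Engine", 100),
    ("Engine", "Useful work", 30),
    ("Engine", "Heat loss", 70),
]

inflow = defaultdict(int)
outflow = defaultdict(int)
for src, dst, amount in flows:
    outflow[src] += amount
    inflow[dst] += amount

# A node that both receives and sends flow should pass on exactly what it receives.
intermediate = [name for name in inflow if name in outflow]
balanced = {name: inflow[name] == outflow[name] for name in intermediate}
print(balanced)  # {'Engine': True}
```

If a node turns out unbalanced, the widths on its two sides will not match in the drawing, which is usually a sign that a flow is missing from the data.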
Sankey diagrams are often used to visualize data in energy, material composition, finance and other fields. At the end of this article, a real-life example illustrates their use.
Consider another example of a Sankey diagram: the economic situation of a country or region
2. Basic Sankey diagram
The following example implements a basic Sankey diagram with plotly.graph_objects:
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
# Construct the data
label = ["Node 0", "Node 1", "Node 2", "Node 3", "Node 4", "Node 5"]
# source and target hold indices into label; Python list indices start at 0
source = [0, 0, 0, 1, 1, 0]  # parent nodes
target = [2, 3, 5, 4, 5, 4]  # child nodes
value = [9, 3, 6, 2, 7, 8]  # flow between each source and target
# Build the dictionaries used for drawing
link = dict(source=source, target=target, value=value)
node = dict(label=label, pad=200, thickness=20)  # node labels, spacing and thickness
# Add the drawing data
data = go.Sankey(link=link, node=node)
# Draw and display
fig = go.Figure(data)
fig.show()
To explain the drawing code above, we need to prepare the following data:
- label: the name of each node
- source: the parent node of each link, given in Plotly as the node's index in label (Python indices start at 0)
- target: the child node of each link, also given as an index
- value: the size of the flow from the parent node to the child node
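To make the index convention concrete, the sketch below takes the six nodes and six links of the basic example and prints each link with the node names looked up from label. The two checks at the top are an addition for illustration, not part of the original code.

```python
label = ["Node 0", "Node 1", "Node 2", "Node 3", "Node 4", "Node 5"]
source = [0, 0, 0, 1, 1, 0]
target = [2, 3, 5, 4, 5, 4]
value = [9, 3, 6, 2, 7, 8]

# The three link lists must line up element by element.
assert len(source) == len(target) == len(value)
# Every index must point at an existing label.
assert all(0 <= i < len(label) for i in source + target)

# Resolve each (source, target, value) triple into a readable line.
links = [f"{label[s]} -> {label[t]}: {v}" for s, t, v in zip(source, target, value)]
for line in links:
    print(line)
```

The first line printed is "Node 0 -> Node 2: 9": the first entries of source, target and value together describe one link.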
Another way to write it is:
fig = go.Figure(data=[go.Sankey(
    node = dict(
        pad = 200,
        thickness = 20,
        line = dict(color = "black", width = 0.1),
        label = ["Node 0", "Node 1", "Node 2", "Node 3", "Node 4", "Node 5"],
        color = "blue"
    ),
    link = dict(
        source = [0, 0, 0, 1, 1, 0],  # indices of the corresponding labels
        target = [2, 3, 5, 4, 5, 4],
        value = [9, 3, 6, 2, 7, 8]
    ))])
fig.update_layout(title_text="Sankey diagram drawn with Plotly", font_size=10)
fig.show()
3. Sankey diagram based on JSON data
Plotly provides an example that draws a Sankey diagram from a JSON file downloaded from a given URL:
1. Read the JSON file and convert it to a Python dictionary
import urllib.request, json  # import several modules in one statement
url = 'https://raw.githubusercontent.com/plotly/plotly.js/master/test/image/mocks/sankey_energy.json'
response = urllib.request.urlopen(url)  # fetch the JSON file
data = json.loads(response.read())  # convert the JSON into a Python dictionary
How do you write the dictionary out to a JSON file with readable formatting?
with open("sankey.json", "w", encoding="utf-8") as f:  # "w" overwrites; appending with "a" would corrupt the file on a second run
    json.dump(data,  # data to write
              f,  # file object
              indent=2,  # indent with two spaces and write across multiple lines
              sort_keys=True,  # sort the keys
              ensure_ascii=False)  # write non-ASCII characters such as Chinese as-is
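To see what indent, sort_keys and ensure_ascii do without downloading anything, here is a small sketch on a stand-in dictionary. The dictionary is hypothetical: only its nesting (a "data" list whose first item has a "node" entry with labels) imitates the file used above, and the values are invented.

```python
import json

# Hypothetical stand-in for the downloaded data; the values are invented.
sample = {"data": [{"type": "sankey", "node": {"label": ["煤炭", "Electricity"]}}]}

# Same options as in the json.dump call: indented, sorted keys, non-ASCII kept as-is.
pretty = json.dumps(sample, indent=2, sort_keys=True, ensure_ascii=False)
# Default behaviour for comparison: one line, non-ASCII written as \uXXXX escapes.
escaped = json.dumps(sample, ensure_ascii=True)

print(pretty)
print(escaped)
```

json.dumps returns a string while json.dump writes to a file object; both accept the same formatting options.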
The general format of the beautified file (partial screenshot):
opacity = 0.6  # transparency setting (defined here but not used in the code below)
fig = go.Figure(data=[go.Sankey(
    valueformat = ".0f",
    valuesuffix = "TWh",
    # Node definition
    node = dict(
        pad = 15,  # spacing
        thickness = 15,  # node thickness
        line = dict(color = "black", width = 0.5),
        label = data['data'][0]['node']['label'],  # labels and colors come from the data
        color = data['data'][0]['node']['color']),
    # Link data: parent node, child node, flow value, link name, color
    link = dict(
        source = data['data'][0]['link']['source'],
        target = data['data'][0]['link']['target'],
        value = data['data'][0]['link']['value'],
        label = data['data'][0]['link']['label'],
        color = data['data'][0]['link']['color']))])
# Important: HTML tags can be used in the title
fig.update_layout(title_text="Sankey diagram drawn with Plotly from a JSON file, via <a href='https://bost.ocks.org/mike/sankey/'>Mike Bostock</a>",
                  font_size=10)
fig.show()
You can also set the background color of the graph:
import plotly.graph_objects as go
import urllib.request, json
# Read the data online and convert it to a dictionary
url = 'https://raw.githubusercontent.com/plotly/plotly.js/master/test/image/mocks/sankey_energy.json'
response = urllib.request.urlopen(url)
data = json.loads(response.read())
# Set the figure parameters
fig = go.Figure(data=[go.Sankey(
    valueformat = ".0f",
    valuesuffix = "TWh",
    node = dict(
        pad = 15,
        thickness = 15,
        line = dict(color = "black", width = 0.5),
        label = data['data'][0]['node']['label'],
        color = data['data'][0]['node']['color']
    ),
    link = dict(
        source = data['data'][0]['link']['source'],
        target = data['data'][0]['link']['target'],
        value = data['data'][0]['link']['value'],
        label = data['data'][0]['link']['label']))])
# Set the background color
fig.update_layout(
    hovermode = 'x',
    title="Sankey diagram: changing the background color",
    font=dict(size = 10, color = 'white'),
    plot_bgcolor='green',
    paper_bgcolor='black'  # background of the whole figure (the black part)
)
fig.show()
4. Customized Sankey diagrams
4.1 Sankey diagram with user-defined node positions
In the Sankey diagram drawn here, the position of each node is set by hand through its x and y coordinates:
import plotly.graph_objects as go
fig = go.Figure(go.Sankey(
    arrangement = "snap",
    node = {
        "label": ["Node 0", "Node 1", "Node 2", "Node 3", "Node 4", "Node 5"],  # node names
        "x": [0.2, 0.1, 0.5, 0.7, 0.3, 0.5],  # x and y set each node's position
        "y": [0.6, 0.5, 0.2, 0.4, 0.2, 0.5],
        "pad": 1},  # spacing
    link = {
        "source": [0, 0, 1, 2, 3, 4, 3, 5],  # parent nodes
        "target": [5, 3, 4, 3, 0, 2, 2, 3],  # child nodes
        "value": [8, 12, 12, 11, 11, 10, 11, 12]}))  # flow values
fig.show()
Judging from the figure, the canvas coordinate system appears to have its origin at the top-left corner, with x increasing to the right and y increasing downward.
4.2 Customize node and edge colors
The color_node and color_link lists below, passed as the color of node and link respectively, customize the node and edge colors of the Sankey diagram:
import plotly.graph_objects as go
# Construct the node data
label = ["Node 0", "Node 1", "Node 2", "Node 3", "Node 4", "Node 5"]
source = [0, 0, 0, 1, 1, 0]
target = [2, 3, 5, 4, 5, 4]
value = [9, 3, 6, 2, 7, 8]
# Custom colors
color_node = ['#E8C9B0', '#48C9B0', '#A8C9B0', '#AF7AC5', '#AF7AC5', '#AF7AC5']
color_link = ['#A6E3D7', '#D6E3D7', '#A6E3D7', '#CBB4D5', '#CBB4D5', '#CBB4D5']
# Build the dictionaries used for drawing
link = dict(source=source, target=target, value=value, color=color_link)
node = dict(label=label, pad=200, thickness=20, color=color_node)  # node labels, spacing, thickness and colors
# Add the drawing data
data = go.Sankey(link=link, node=node)
# Draw and display
fig = go.Figure(data)
fig.show()
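Instead of choosing every link color by hand, one option is to derive each link's color from its source node and make it semi-transparent. The sketch below is one possible way to do that, not something the original code does: hex_to_rgba is a hypothetical helper written for this example, and the 0.6 opacity is an arbitrary choice.

```python
def hex_to_rgba(hex_color, alpha):
    """Convert a '#RRGGBB' string into an 'rgba(r, g, b, a)' string."""
    digits = hex_color.lstrip("#")
    r, g, b = (int(digits[i:i + 2], 16) for i in (0, 2, 4))
    return f"rgba({r}, {g}, {b}, {alpha})"

color_node = ['#E8C9B0', '#48C9B0', '#A8C9B0', '#AF7AC5', '#AF7AC5', '#AF7AC5']
source = [0, 0, 0, 1, 1, 0]

# Each link takes the color of its source node, made semi-transparent.
color_link = [hex_to_rgba(color_node[s], 0.6) for s in source]
print(color_link[0])  # rgba(232, 201, 176, 0.6)
```

The resulting list has one entry per link and can be used as the color of link in the same way as the hand-written list.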
5. Sankey diagram: monthly expenses
The following uses Xiao Ming's total spending in one month to show how to draw a Sankey diagram from real-world data.
1. First, look at the consumption data (made-up data) that Xiao Ming compiled.
Xiao Ming's expenses fall into five main blocks: accommodation, catering, transportation, clothing and red envelopes. Each block is divided into sub-blocks with their corresponding amounts.
2. Collate the data to show the consumption from the parent level to the child level
Since drawing a Sankey diagram requires data between parent and child nodes, we first need to summarize the data as follows:
The figure below shows the collated data for the five main blocks:
The figure below shows the collated parent and child data for each sub-block:
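Once the data is in parent/child form, turning it into the label, source, target and value lists is mechanical. The sketch below assumes a simple list of (parent, child, amount) rows. The five main blocks come from the text, but the "Total expenses" root, the sub-blocks and every amount are invented for illustration.

```python
# Hypothetical rows: (parent, child, amount). Amounts and sub-blocks are invented.
rows = [
    ("Total expenses", "Accommodation", 2000),
    ("Total expenses", "Catering", 1500),
    ("Total expenses", "Transportation", 500),
    ("Total expenses", "Clothing", 400),
    ("Total expenses", "Red envelopes", 600),
    ("Accommodation", "Rent", 1800),
    ("Accommodation", "Utilities", 200),
    ("Catering", "Meals", 1200),
    ("Catering", "Snacks", 300),
]

# Give every distinct name an index, in order of first appearance.
label = []
for parent, child, _ in rows:
    for name in (parent, child):
        if name not in label:
            label.append(name)
index = {name: i for i, name in enumerate(label)}

# Translate names into the index lists that the Sankey trace expects.
source = [index[parent] for parent, _, _ in rows]
target = [index[child] for _, child, _ in rows]
value = [amount for _, _, amount in rows]

print(label)
print(source, target, value)
```

These four lists can then be passed to go.Sankey through node=dict(label=label) and link=dict(source=source, target=target, value=value), as in the basic example.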