Small knowledge, big challenge! This article is participating in the creation activity of “Essential Tips for Programmers”.
preface
Hello! Friend!!!
Thank you very much for reading haihong’s article, if there are any mistakes in the article, please point out ~
Self-introduction ଘ(੭, ᵕ)੭
Nickname: Haihong
Tag: programmer monkey | C++ contestant | student
Introduction: because of C language to get acquainted with programming, then transferred to the computer major, had the honor to get some national awards, provincial awards… Has been confirmed. Currently learning C++/Linux/Python
Learning experience: solid foundation + more notes + more code + more thinking + learn English well!
Little White stage of learning Python
The article is only used as my own study notes for the establishment of knowledge system and review
The problem is not to learn one more question to understand one more question
Know what is, know why!
Graph theory model – Dijkstra
Dijkstra can find the shortest path from one vertex to another.
Basic concept
- Undirected graph: a graph is undirected if every edge in it is directionless.
- Directed graph: a graph is directed if every edge is directed.
- Mixed graph: a graph in which some edges are oriented and some edges are unoriented is called mixed graph.
Sample 1
As shown in the figure below, we need to go from ① to ⑨. The red characters on each edge represent the length of the edge. How do we find the shortest path from ① to ⑨?
Steps:
- Mark ① as P and others as T to find the point from ① to which the shortest edge currently reaches and change the point T to P
- Find the shortest edge of all P points on the reachable T mark points, and change the reached point T to P
- Repeat the steps to change T to P at the guiding end point
Process demonstration: the circle plus the number represents each vertex, the number inside () represents the current walking distance
1. (1) (0) 2. (1) (2) (1) 3. (1) (2) (5) and (3) 4. (1) (2) (5) and (3) (1) (4) and (3) 5. (1) (2) (5) 7 (5) (1) (4) and (3) 6. (1) (2) (5) 7) 6 (6) (1) (4) and (3) 7. (1) (2) (5) 7) 6 (6) (1) (2) (5) (3) (6) (1) (4) and (3) 8. (1) (2) (5) 7) 6 (6) (1) (2) (5) (3) (4) (7) (1) (4) and (3) 9. (1) (2) (5) 7) 6 (8) (1) (2) (5) (3) (4) (7) (1) (4) and (3) 10. (1) (2) (5) 7) 6 (8) (1) (2) (5) all landowners pet-name ruby (8) (1) (2) (5) (3) (4) (7) (1) (4) and (3)
So the shortest path is ① ② ⑤ ⑦ ⑨ with length 8
Weighted adjacency matrix: A weighted adjacency matrix is a matrix that represents the adjacency relationship between vertices. For example, the weighted adjacency matrix of the graph above is shown as follows
Note: there is an error in the original PPT, point 9 -> point 7 should be INF.
Each point is zero away from itself (the main diagonal is zero)
If two adjacent points on the graph are connected, the distance is the value of the matrix, undirected symmetric about the main diagonal; The only directed paths have numbers
The unconnectable distance is infinity, inf
The Demo code
Runtime environment: Vs Code
# There are some mistakes in the code of PPT and it has been modified ~
# The following code has been tested to work
from collections import defaultdict
from heapq import *
inf = 99999 # disconnected value
mtx_graph = [[0.1, inf, 3, inf, inf, inf, inf, inf],
[1.0.5, inf, 2, inf, inf, inf, inf],
[inf, inf, 0.1, inf, 6, inf, inf, inf],
[inf, inf, inf, 0, inf, 7, inf, 9, inf],
[inf, 2.3, inf, 0.4.2, inf, 8],
[inf, inf, 6.7, inf, 0, inf, 2, inf],
[inf, inf, inf, inf, inf, 1.0, inf, 3],
[inf, inf, inf, inf, inf, inf, 1.0.2],
[inf, inf, inf, inf, 8, inf, inf, 2.0]]
m_n = len(mtx_graph)# Order of weighted connection matrix
edges = [] # save the distance between two connected points (point A, point B, distance)
for i in range(m_n):
for j in range(m_n):
ifi! =jandmtx_graph[i][j]! =inf: edges.append((i,j,mtx_graph[i][j]))def dijkstra(edges, from_node, to_node) :
go_path = []
to_node = to_node-1
g = defaultdict(list)
for l,r,c in edges:
# L: point A, r: point B, c: distance
Point A -> (distance, point B)
g[l].append((c,r))
q, seen = [(0, from_node-1, ()),set(a)while q:
(cost, v1, path) = heappop(q)Minimum cost of heap eject current path
if v1 not in seen:
seen.add(v1)
path = (v1, path)
if v1 == to_node:
break
for c, v2 in g.get(v1, ()):
if v2 not in seen:
heappush(q, (cost+c, v2, path))
ifv1! =to_node:# unreachable
return float('inf'), []
if len(path)>0:
left=path[0]
go_path.append(left)
right=path[1]
while len(right)>0:
left=right[0]
go_path.append(left)
right=right[1]
go_path.reverse() # Reverse transformation
for i in range(len(go_path)): # label plus 1
go_path[i]=go_path[i]+1
return cost, go_path
leght, path = dijkstra(edges, 1.9)
print('The shortest distance is:'+str(leght))
print('The path forward is:'+str(path))
Copy the code
The results
The shortest distance is:8The forward path is: [1.2.5.7.9]
Copy the code
Graph theory model -Floyd
Floyd algorithm can solve the shortest path problem (multi-source shortest path) between any two points through dynamic programming, and can correctly deal with the shortest path problem with negative weight
Key principles:
d[i][j]=min(d[i][k]+d[k][j])(1<=k<=n)d[i][j] = min(d[i][k] + d[k][j]) (1 <= k <= n)d[i][j]=min(d[i][k]+d[k][j])(1<=k<=n) Enumeration intermediate point k, find the smallest d + d [I] [k] [k] [I] [k] [j] d + d [k] [I] [k] [j] d + d [k] [j], [I] [j] as a d d [I] [j] d minimum [I] [j]
Key conclusions:
Let’s say that in the shortest path between I and j, the node with the largest number is x. Then d[I][j]d[I][j] [j]d[I][j] must get the minimum value when k = x in the outer loop.
Strong induction:
Let the largest intermediate number from I to x be x1, and the largest intermediate number from x to j be x2. Since x is the largest intermediate number from I to j, it is obvious that x1
According to the conclusion that when k = x1k = x_1k = x1 d [I] [I] [x] [x] d d [I] [x] has made the minimum, when k = x2k = x_2k = x2 d [x] [x] [j] [j] d d [x] [j] has made the most of little value
So k = xk = xk = x, d [I] [I] [x] [x] d d [I] [x], and [x] [j] d d d [x] [j] [x] [j] have to obtain the minimum value.
So when k=xk= xk=x, Execute d [I] [j] = min (d [I] [j], [I] [k] d + d [k] [j]) have d [I] [I] [j] [j] d = min (d [I] [j], [I] [k] d + d [k] [j]) To d [I] [j] [I] [j] d = min (d [I] [j], [I] [k] d + d [k] [j]) have d [I] [j].
The sample 2
Similar to example 1, except this time we find the shortest distance and path from each point to the other, and the red number on each side represents the length of that side.
The Demo code
from collections import defaultdict
from heapq import *
import numpy as np
inf = 99999 # disconnected value
mtx_graph = [[0.1, inf, 3, inf, inf, inf, inf, inf],
[1.0.5, inf, 2, inf, inf, inf, inf],
[inf, inf, 0.1, inf, 6, inf, inf, inf],
[inf, inf, inf, 0, inf, 7, inf, 9, inf],
[inf, 2.3, inf, 0.4.2, inf, 8],
[inf, inf, 6.7, inf, 0, inf, 2, inf],
[inf, inf, inf, inf, inf, 1.0, inf, 3],
[inf, inf, inf, inf, inf, inf, 1.0.2],
[inf, inf, inf, inf, 8, inf, inf, 2.0]]
def Floyd(graph) :
N=len(graph)
A=np.array(graph)
path=np.zeros((N,N))
for i in range(0,N):
for j in range(0,N):
ifA[i][j]! =inf: path[i][j]=jfor k in range(0,N):
for i in range(0,N):
for j in range(0,N):
A[I][j]+A[k][j]
A[I][k]+A[k][j]
if A[i][k]+A[k][j]<A[i][j]:
A[i][j]=A[i][k]+A[k][j]
path[i][j]=path[i][k]
for i in range(0,N):
for j in range(0,N):
path[i][j]=path[i][j]+1
print('Distance =')
print(A)
print('path =')
print(path)
Floyd(mtx_graph)
Copy the code
The results
5. 8. 8. 8. 8. 8. 9. Because the original mtx_graph setup was wrong: point 9 to point 7 should be INF instead of 3
Conclusion:
- The distance from point 1 to point 9 is 8 (the first row, ninth column in the distance matrix in the result)
- Path: [1-2-5-7-9]
How do we look at paths?
- View the path matrix from the final result
- Point 1 to point 9 first of all, row 1, column 9 has a value of 2
- This is 2 in the middle, and then row 2, column 9, is 5
- Now row 5, column 9, has a value of 7
- And then row 7, column 9, has a value of 9
- Get the final path [1-2-5-7-9]
3 airport route design
The data set comes from the aviation industry and has some basic information about the routes. There is a starting point and destination for a journey. There are also columns representing arrival and departure times for each leg of the journey. This data set is well suited for analysis as a graph. Imagine several cities (nodes) connected by air routes (edges).
If you are an airline, you can ask the following questions:
- What is the shortest way to get from A to B? Think of it in terms of distance and time.
- Is there a way to get from C to D?
- Which airports have the busiest traffic?
- Which airport is “in between” most other airports? So it can be turned into a local transit point.
0. Airlines. CSV data
1. Data import and observe variables
import numpy as np
import pandas as pd
data = pd.read_csv('.. /Profile/Airlines.csv')
data.shape # Data size
# (100, 16) 100 rows, 16 columns
Copy the code
The results Look at the types of individual variables
data.dtypes The type of each variable
Copy the code
The results
2. Data cleaning
# convert sched_dep_time to 'STD' -- scheduled departure time
data['std'] = data.sched_dep_time.astype(str).str.replace('(\d{2}$)'.' ') + ':' + data.sched_dep_time.astype(str).str.extract('(\d{2}$)', expand=False) + '00'
# convert sched_arr_time to "STA" -- scheduled arrival time
data['sta'] = data.sched_arr_time.astype(str).str.replace('(\d{2}$)'.' ') + ':' + data.sched_arr_time.astype(str).str.extract('(\d{2}$)', expand=False) + '00'
# convert dep_time to 'ATD' - actual departure time
data['atd'] = data.dep_time.fillna(0).astype(np.int64).astype(str).str.replace('(\d{2}$)'.' ') + ':' + data.dep_time.fillna(0).astype(np.int64).astype(str).str.extract('(\d{2}$)', expand=False) + '00'
# Convert ARR_time to 'ATA' - actual time of arrival
data['ata'] = data.arr_time.fillna(0).astype(np.int64).astype(str).str.replace('(\d{2}$)'.' ') + ':' + data.arr_time.fillna(0).astype(np.int64).astype(str).str.extract('(\d{2}$)', expand=False) + '00'
Copy the code
3. Merge time information
data['date'] = pd.to_datetime(data[['year'.'month'.'day']])
data = data.drop(columns = ['year'.'month'.'day'])
Copy the code
4. Create the diagram
import networkx as nx
FG = nx.from_pandas_edgelist(data, source='origin', target='dest', edge_attr=True.)# View all nodes
FG.nodes()
Copy the code
The results
#NodeView(('EWR', 'MEM', 'LGA', 'FLL', 'SEA', 'JFK', 'DEN', 'ORD', 'MIA', 'PBI', 'MCO', 'CMH', 'MSP', 'IAD', 'CLT', 'TPA', 'DCA', 'SJU', 'ATL', 'BHM', 'SRQ', 'MSY', 'DTW', 'LAX', 'JAX', 'RDU', 'MDW', 'DFW', 'IAH', 'SFO', 'STL', 'CVG', 'IND', 'RSW', 'BOS', 'CLE'))
Copy the code
Query all edges in the graph
# View all edges
FG.edges()
Copy the code
The results
#EdgeView([('EWR', 'MEM'), ('EWR', 'SEA'), ('EWR', 'MIA'), ('EWR', 'ORD'), ('EWR', 'MSP'), ('EWR', 'TPA'), ('EWR', 'MSY'), ('EWR', 'DFW'), ('EWR', 'IAH'), ('EWR', 'SFO'), ('EWR', 'CVG'), ('EWR', 'IND'), ('EWR', 'RDU'), ('EWR', 'IAD'), ('EWR', 'RSW'), ('EWR', 'BOS'), ('EWR', 'PBI'), ('EWR', 'LAX'), ('EWR', 'MCO'), ('EWR', 'SJU'), ('LGA', 'FLL'), ('LGA', 'ORD'), ('LGA', 'PBI'), ('LGA', 'CMH'), ('LGA', 'IAD'), ('LGA', 'CLT'), ('LGA', 'MIA'), ('LGA', 'DCA'), ('LGA', 'BHM'), ('LGA', 'RDU'), ('LGA', 'ATL'), ('LGA', 'TPA'), ('LGA', 'MDW'), ('LGA', 'DEN'), ('LGA', 'MSP'), ('LGA', 'DTW'), ('LGA', 'STL'), ('LGA', 'MCO'), ('LGA', 'CVG'), ('LGA', 'IAH'), ('FLL', 'JFK'), ('SEA', 'JFK'), ('JFK', 'DEN'), ('JFK', 'MCO'), ('JFK', 'TPA'), ('JFK', 'SJU'), ('JFK', 'ATL'), ('JFK', 'SRQ'), ('JFK', 'DCA'), ('JFK', 'DTW'), ('JFK', 'LAX'), ('JFK', 'JAX'), ('JFK', 'CLT'), ('JFK', 'PBI'), ('JFK', 'CLE'), ('JFK', 'IAD'), ('JFK', 'BOS')])
Copy the code
Calculate the average edge density of the graph
# Note that our 100 rows are all from 3 airports
nx.draw_networkx(FG, with_labels=True)
nx.algorithms.degree_centrality(FG)
# Average edge density of graph
nx.density(FG)
# 0.09047619047619047
Copy the code
Run results (visualize graphs) Find the average shortest path length of all paths in the figure
The average shortest path length of all paths in the graph
nx.average_shortest_path_length(FG)
# 2.36984126984127
Copy the code
The results
For a node of degree K – find the average neighbor degree
# For a node of degree K - what is the average of its neighbor degrees
nx.average_degree_connectivity(FG) # For a node of degree k - What is the average of its neighbours' degree?
#{20: 1.95, 1: 19.307692307692307, 2: 19.0625, 17: 2.0588235294117645, 3: 19.0}
Copy the code
The results
5. Calculate the route
# Find all routes from JAX to DFW
for path in nx.all_simple_paths(FG, source='JAX', target='DFW') :print(path)
Find the Dijkstra path from JAX to DFW.
dijpath = nx.dijkstra_path(FG, source='JAX', target='DFW')
dijpath # ['JAX', 'JFK', 'SEA', 'EWR', 'DFW'] Last line in the figure below
Copy the code
The results
Find the shortest flight time of all routes
Find the shortest flight time of all routes
shortpath = nx.dijkstra_path(FG, source='JAX', target='DFW', weight='air_time')
shortpath
# ['JAX', 'JFK', 'BOS', 'EWR', 'DFW']
Copy the code
Running results:
Matters needing attention
- The start point and destination can be nodes, and other information should be node or edge attributes; A single edge can be thought of as a journey. Such trips will have different times, flight numbers, tail numbers and other relevant information.
- Note that year, month, day, and time information is scattered in many columns; To create a date and time column containing all of this information, separate the scheduled and actual arrival and departure times; You should end up with four date-time columns (eta, ETA, ETA, and ETA)
- Time format problem
- Data type problem
- Trouble with NaN values
conclusion
Source of learning: STATION B and its classroom PPT, the code is reproduced
PPT mistakes a little bit more with a long time to learn the principle of finding mistakes ah ~
The above code has been successfully run! Environment: Vs Code
The essay is just a study note, recording a process from 0 to 1
Hope to help you, if there is a mistake welcome small partners to correct ~