Use graph machine learning to explore A-share correlation changes

Earlier in this series [1, 2], we described how to use NetworkX [3], a Python graph analysis library, together with Nebula Graph [4] to analyze the character relationships in Game of Thrones.

In this article, we introduce how to use JGraphT [5], a graph analysis library for Java, together with the diagramming library mxGraph [6] to visually explore how the correlation between individual stocks across A-share sectors changes over time.

Processing of data sets

The main analysis method in this article follows [7,8]. There are two data sets:

Stock data (vertex set)

160 stocks (excluding delisted and ST stocks) are selected from the A-share market in order of stock code. Each stock is modeled as a vertex whose properties are the stock code, the stock name, and the sector that the China Securities Regulatory Commission (CSRC) assigns to the listed company.

Table 1: Sample vertex set

Vertex ID	Stock code	Stock name	Sector
1	SZ0001	Ping An Bank	Finance
2	600000	Shanghai Pudong Development Bank	Finance
3	600004	Baiyun Airport	Transportation
4	600006	Dongfeng Motor	Automobile manufacturing
5	600007	China World Trade Center	Development zones
6	600008	Beijing Capital	Environmental protection
7	600009	Shanghai Airport	Transportation
8	600010	Baotou Steel	Iron and steel

Stock relation (edge set)

Each edge has a single property, its weight. The weight reflects the business similarity of the two listed companies whose stocks are the source and target vertices of the edge. Similarity is calculated as in [7,8]: over a period of time (January 1, 2014 to January 1, 2020), compute the correlation $p_{ij}$ between the return series of stocks $i$ and $j$, then define the distance between the two stocks (that is, the edge weight between the two vertices) as:

$L_{ij} = \sqrt{2(1 - p_{ij})}$

Because $p_{ij}$ lies between -1 and 1, the distance ranges from 0 to 2. The larger the distance between two stocks, the lower the correlation between their returns.
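
As a rough illustration of the distance formula above, the following plain-Java sketch computes a correlation between two return series and converts it into a distance. The class and method names are made up for this example, and the use of the Pearson correlation and the clamping step are assumptions; the preprocessing actually used for this article may differ.

```java
final class CorrelationDistance {

    /** Pearson correlation of two equally long return series (NaN if either series is constant). */
    static double pearson(double[] a, double[] b) {
        if (a.length != b.length || a.length < 2) {
            throw new IllegalArgumentException("series must have the same length, at least 2");
        }
        double meanA = 0, meanB = 0;
        for (int t = 0; t < a.length; t++) {
            meanA += a[t];
            meanB += b[t];
        }
        meanA /= a.length;
        meanB /= b.length;

        double cov = 0, varA = 0, varB = 0;
        for (int t = 0; t < a.length; t++) {
            double da = a[t] - meanA;
            double db = b[t] - meanB;
            cov += da * db;
            varA += da * da;
            varB += db * db;
        }
        return cov / Math.sqrt(varA * varB);
    }

    /** Distance sqrt(2 * (1 - p)); p is clamped to [-1, 1] to absorb floating-point rounding. */
    static double distance(double p) {
        double clamped = Math.max(-1.0, Math.min(1.0, p));
        return Math.sqrt(2.0 * (1.0 - clamped));
    }

    public static void main(String[] args) {
        // Hypothetical return series for two stocks
        double[] x = {0.010, -0.020, 0.030, 0.000};
        double[] y = {0.015, -0.010, 0.020, -0.005};
        System.out.println("distance = " + distance(pearson(x, y)));
    }
}
```

Two series that always move together give a distance of 0, two that always move in opposite directions give 2, and uncorrelated series give about 1.414.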

Table 2: Example edge set

Source vertex ID	Target vertex ID	Edge weight
11	12	0.493257968
22	83	0.517027513
23	78	0.606206233
2	12	0.653692415
1	11	0.677631482
1	27	0.695705171
1	12	0.71124344
2	11	0.73581915
8	18	0.771556458
12	27	0.785046446
9	20	0.789606527
11	27	0.796009627
25	63	0.797218349
25	72	0.799230001
63	115	0.803534952

Such a set of vertices and edges forms a graph network that can be stored in the graph database Nebula Graph.
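
To show how an edge set like Table 2 could be derived, here is a minimal plain-Java sketch that turns a symmetric correlation matrix into a list of weighted edges sorted by weight. It assumes that every pair of stocks gets an edge, which the article does not state explicitly, and the class and field names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class EdgeSetSketch {

    /** One undirected edge between two vertex ids with a distance weight. */
    static final class WeightedEdge {
        final int srcId;
        final int dstId;
        final double weight;

        WeightedEdge(int srcId, int dstId, double weight) {
            this.srcId = srcId;
            this.dstId = dstId;
            this.weight = weight;
        }
    }

    /**
     * Builds one edge per pair i < j from a symmetric correlation matrix.
     * Vertex ids are 1-based, like the ids in Table 1.
     */
    static List<WeightedEdge> fromCorrelations(double[][] corr) {
        List<WeightedEdge> edges = new ArrayList<>();
        for (int i = 0; i < corr.length; i++) {
            for (int j = i + 1; j < corr.length; j++) {
                // Clamp to [-1, 1] so rounding error cannot produce a negative square root
                double p = Math.max(-1.0, Math.min(1.0, corr[i][j]));
                edges.add(new WeightedEdge(i + 1, j + 1, Math.sqrt(2.0 * (1.0 - p))));
            }
        }
        // Closest (most correlated) pairs first
        edges.sort(Comparator.comparingDouble((WeightedEdge e) -> e.weight));
        return edges;
    }
}
```

Under this assumption, 160 stocks produce 160 × 159 / 2 = 12,720 edges.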

JGraphT

JGraphT is an open-source Java class library that provides a variety of efficient, generic graph data structures as well as many useful algorithms for the most common graph problems:

It supports directed and undirected edges, weighted and unweighted edges, and more.
It supports simple graphs, multigraphs, and pseudographs.
It provides dedicated iterators (DFS, BFS) for graph traversal.
It provides a large number of common graph algorithms, such as path finding, isomorphism detection, coloring, common ancestors, walks, connectivity, matching, cycle detection, partitioning, cuts, flows, and centrality.
Graphs can easily be imported from and exported to GraphViz [9] format. An exported GraphViz file can be loaded into the visualization tool Gephi [10] for analysis and display.
Other components such as JGraphX, mxGraph, and Guava Graphs Generators can easily be used to draw the graph network.

Let’s try this out by creating a directed graph in JGraphT:

import org.jgrapht.*;
import org.jgrapht.graph.*;
import org.jgrapht.nio.*;
import org.jgrapht.nio.dot.*;
import org.jgrapht.traverse.*;

import java.io.*;
import java.net.*;
import java.util.*;

Graph<URI, DefaultEdge> g = new DefaultDirectedGraph<>(DefaultEdge.class);

Add vertices:

URI google = new URI("http://www.google.com");
URI wikipedia = new URI("http://www.wikipedia.org");
URI jgrapht = new URI("http://www.jgrapht.org");

// add the vertices
g.addVertex(google);
g.addVertex(wikipedia);
g.addVertex(jgrapht);

Add edges:

// add edges to create linking structure
g.addEdge(jgrapht, wikipedia);
g.addEdge(google, jgrapht);
g.addEdge(google, wikipedia);
g.addEdge(wikipedia, google);

Nebula Graph Database

JGraphT usually uses local files as its data source. That is fine for studying a static network, but it becomes cumbersome when the graph changes often, as it does here, where the stock data changes every day: generating a new static file and reloading it for each analysis is tedious. It is better to persist every change to a database and to load subgraphs or the full graph directly from the database in real time for analysis. This article uses Nebula Graph as the graph database that stores the graph data.

Nebula Graph's Java client, nebula-java [11], provides two ways to access Nebula Graph. One is to interact with the query engine layer [13] through the graph query language nGQL [12], which usually suits subgraph access with complex semantics. The other is to interact directly with the underlying storage layer (storaged) [14] through an API to fetch the full set of vertices and edges. Besides accessing Nebula Graph itself, nebula-java provides examples of interacting with Neo4j [15], JanusGraph [16], Spark [17], and more.

In this article, we access the storage layer (storaged) directly to get all the vertices and edges. The following two interfaces read all vertex and edge data:

// space is the name of the graph space to scan; returnCols names the tags/edge types and the property columns to read.
// returnCols parameter format: {tag1Name: prop1, prop2, tag2Name: prop3, prop4, prop5}
Iterator<ScanVertexResponse> scanVertex(String space, Map<String, List<String>> returnCols);
Iterator<ScanEdgeResponse> scanEdge(String space, Map<String, List<String>> returnCols);
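
As a small illustration of the returnCols format described in the comment above, the sketch below builds such a map in plain Java. It assumes the parameter is a map from tag or edge-type name to a list of property names, as the format comment suggests; the tag name "stock", the edge type "relation", and all property names are hypothetical and should be replaced with whatever is defined in your own graph space.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ReturnColsSketch {

    /** Columns to read for vertices: one entry per tag, listing its properties. */
    static Map<String, List<String>> vertexReturnCols() {
        Map<String, List<String>> returnCols = new HashMap<>();
        // Hypothetical tag and property names
        returnCols.put("stock", Arrays.asList("code", "name", "industry"));
        return returnCols;
    }

    /** Columns to read for edges: one entry per edge type, listing its properties. */
    static Map<String, List<String>> edgeReturnCols() {
        Map<String, List<String>> returnCols = new HashMap<>();
        // Hypothetical edge type; the article's edges carry a single weight property
        returnCols.put("relation", Collections.singletonList("weight"));
        return returnCols;
    }

    public static void main(String[] args) {
        System.out.println(vertexReturnCols());
        System.out.println(edgeReturnCols());
    }
}
```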

Step 1: Initialize a client and a ScanVertexProcessor. ScanVertexProcessor is used to decode the read vertex data:

MetaClientImpl metaClientImpl = new MetaClientImpl(metaHost, metaPort);
metaClientImpl.connect();
StorageClient storageClient = new StorageClientImpl(metaClientImpl);
Processor processor = new ScanVertexProcessor(metaClientImpl);

Step 2: Call the scanVertex interface, which returns an iterator over ScanVertexResponse objects:

Iterator<ScanVertexResponse> iterator =
                storageClient.scanVertex(spaceName, returnCols);

Step 3: Keep reading the ScanVertexResponse objects that the iterator yields until all the data has been read. The vertex data is collected and later added to the JGraphT graph structure:

while (iterator.hasNext()) {
  ScanVertexResponse response = iterator.next();
  if (response == null) {
    log.error("Error occurs while scan vertex");
    break;
  }
  
  Result result =  processor.process(spaceName, response);
  results.addAll(result.getRows(TAGNAME));
}

Reading edge data is similar to the process above.

Graph analysis in JGraphT

Step 1: Create an undirected weighted graph in JGraphT:

Graph<String, MyEdge> graph = GraphTypeBuilder
    .undirected()
    .weighted(true)
    .allowingMultipleEdges(true)
    .allowingSelfLoops(false)
    .vertexSupplier(SupplierUtil.createStringSupplier())
    .edgeSupplier(SupplierUtil.createSupplier(MyEdge.class))
    .buildGraph();

Step 2: Add the vertex and edge data read from the Nebula Graph space in the previous section to the graph:

for (VertexDomain vertex : vertexDomainList) {
    graph.addVertex(vertex.getVid().toString());
    // keep the vertex record so that names and industries can be looked up when drawing
    stockIdToName.put(vertex.getVid().toString(), vertex);
}

for (EdgeDomain edgeDomain : edgeDomainList) {
    // addEdge returns the edge it just created, so no separate getEdge lookup is needed
    // (with multiple edges allowed, getEdge could return a different parallel edge)
    MyEdge newEdge = graph.addEdge(edgeDomain.getSrcid().toString(), edgeDomain.getDstid().toString());
    graph.setEdgeWeight(newEdge, edgeDomain.getWeight());
}

Step 3: Following the analysis method in [7,8], compute the minimum spanning tree of the graph with Prim's algorithm and call the encapsulated drawGraph interface to draw it:

Prim's algorithm is a graph-theory algorithm that finds a minimum spanning tree in a weighted connected graph. The tree formed by the edge subset it finds includes every vertex of the connected graph and has the minimum total edge weight.

SpanningTreeAlgorithm.SpanningTree<MyEdge> pMST = new PrimMinimumSpanningTree<>(graph).getSpanningTree();

Legend.drawGraph(pMST.getEdges(), filename, stockIdToName);
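
To make Prim's algorithm itself more concrete, here is a minimal plain-Java sketch that does not depend on JGraphT. It is not the library's implementation; the class and method names are invented for this example, and it only covers the connected component that contains the first edge's source vertex.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;

final class PrimSketch {

    /** Undirected weighted edge between two vertex ids. */
    static final class Edge {
        final int src;
        final int dst;
        final double weight;

        Edge(int src, int dst, double weight) {
            this.src = src;
            this.dst = dst;
            this.weight = weight;
        }
    }

    /** Grows a minimum spanning tree outward from the source vertex of the first edge. */
    static List<Edge> minimumSpanningTree(List<Edge> edges) {
        List<Edge> tree = new ArrayList<>();
        if (edges.isEmpty()) {
            return tree;
        }
        // Index every edge under both of its endpoints
        Map<Integer, List<Edge>> adjacency = new HashMap<>();
        for (Edge e : edges) {
            adjacency.computeIfAbsent(e.src, k -> new ArrayList<>()).add(e);
            adjacency.computeIfAbsent(e.dst, k -> new ArrayList<>()).add(e);
        }

        Set<Integer> visited = new HashSet<>();
        // Candidate edges leaving the tree, cheapest first
        PriorityQueue<Edge> frontier =
                new PriorityQueue<>(Comparator.comparingDouble((Edge e) -> e.weight));

        int start = edges.get(0).src;
        visited.add(start);
        frontier.addAll(adjacency.get(start));

        while (!frontier.isEmpty()) {
            Edge e = frontier.poll();
            boolean srcSeen = visited.contains(e.src);
            boolean dstSeen = visited.contains(e.dst);
            if (srcSeen && dstSeen) {
                continue; // both ends already in the tree: adding it would create a cycle
            }
            int next = srcSeen ? e.dst : e.src;
            visited.add(next);
            tree.add(e);
            for (Edge out : adjacency.get(next)) {
                if (!visited.contains(out.src) || !visited.contains(out.dst)) {
                    frontier.add(out);
                }
            }
        }
        return tree;
    }
}
```

For the eight Table 2 edges that connect vertices 1, 2, 11, 12, and 27, this sketch keeps the four edges 11-12, 2-12, 1-11, and 1-27 and discards the other four.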

Step 4: The drawGraph method encapsulates the layout and other drawing parameters. It renders stocks in the same sector in the same color and places closely related stocks near each other.

public class Legend {
    ...
    public static void drawGraph(Set<MyEdge> edges, String filename, Map<String, VertexDomain> idVertexMap) throws IOException {
        // Creates graph with model
        mxGraph graph = new mxGraph();
        Object parent = graph.getDefaultParent();

        // set style
        graph.getModel().beginUpdate();
        mxStylesheet myStylesheet = graph.getStylesheet();
        graph.setStylesheet(setMsStylesheet(myStylesheet));

        // stock id -> mxGraph cell, and industry -> fill color
        Map<String, Object> idMap = new HashMap<>();
        Map<String, String> industryColor = new HashMap<>();

        int colorIndex = 0;

        for (MyEdge edge : edges) {
            Object src, dst;
            if (!idMap.containsKey(edge.getSrc())) {
                VertexDomain srcNode = idVertexMap.get(edge.getSrc());
                String nodeColor;
                if (industryColor.containsKey(srcNode.getIndustry())) {
                    nodeColor = industryColor.get(srcNode.getIndustry());
                } else {
                    nodeColor = COLOR_LIST[colorIndex++];
                    industryColor.put(srcNode.getIndustry(), nodeColor);
                }
                src = graph.insertVertex(parent, null, srcNode.getName(), 0, 0, 105, 50, "fillColor=" + nodeColor);
                idMap.put(edge.getSrc(), src);
            } else {
                src = idMap.get(edge.getSrc());
            }

            if (!idMap.containsKey(edge.getDst())) {
                VertexDomain dstNode = idVertexMap.get(edge.getDst());
                String nodeColor;
                if (industryColor.containsKey(dstNode.getIndustry())) {
                    nodeColor = industryColor.get(dstNode.getIndustry());
                } else {
                    nodeColor = COLOR_LIST[colorIndex++];
                    industryColor.put(dstNode.getIndustry(), nodeColor);
                }

                dst = graph.insertVertex(parent, null, dstNode.getName(), 0, 0, 105, 50, "fillColor=" + nodeColor);
                idMap.put(edge.getDst(), dst);
            } else {
                dst = idMap.get(edge.getDst());
            }
            graph.insertEdge(parent, null, "", src, dst);
        }

        log.info("vertice " + idMap.size());
        log.info("colorsize " + industryColor.size());

        // Force-directed layout so that connected stocks end up near each other
        mxFastOrganicLayout layout = new mxFastOrganicLayout(graph);
        layout.setMaxIterations(2000);
        //layout.setMinDistanceLimit(10D);
        layout.execute(parent);

        graph.getModel().endUpdate();

        // Creates an image that can be saved using ImageIO
        BufferedImage image = mxCellRenderer.createBufferedImage(graph, null, 1, Color.WHITE,
                                                                 true, null);

        // Save as JPEG
        File file = new File(filename);
        ImageIO.write(image, "JPEG", file);
    }
    ...
}
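
The industry-to-color bookkeeping in drawGraph can be isolated into a small helper. The sketch below is one possible way to do that in plain Java; the class name and the wrap-around behavior are assumptions of this example, not part of the article's code. Wrapping around means the helper reuses colors instead of running past the end of the palette when there are more industries than colors.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class IndustryPalette {
    private final String[] palette;
    private final Map<String, String> colorByIndustry = new LinkedHashMap<>();

    IndustryPalette(String... palette) {
        if (palette.length == 0) {
            throw new IllegalArgumentException("palette must not be empty");
        }
        this.palette = palette.clone();
    }

    /** Returns the color already assigned to the industry, or assigns the next palette entry. */
    String colorFor(String industry) {
        String color = colorByIndustry.get(industry);
        if (color == null) {
            // Wrap around so that a short palette never causes an out-of-bounds access
            color = palette[colorByIndustry.size() % palette.length];
            colorByIndustry.put(industry, color);
        }
        return color;
    }

    public static void main(String[] args) {
        // Hypothetical palette and industry names
        IndustryPalette colors = new IndustryPalette("#FF0000", "#00FF00", "#0000FF");
        System.out.println(colors.colorFor("Finance"));
        System.out.println(colors.colorFor("Transportation"));
        System.out.println(colors.colorFor("Finance"));
    }
}
```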

Step 5: Generate visualization:

The color of each vertex in Figure 1 represents the CSRC sector of the listed company that the stock belongs to.

Stocks whose actual businesses are closely related have gathered into clusters (such as the expressway, banking, and airport and airline sectors), but some stocks with no obvious connection are also clustered together; the specific reasons need to be studied separately.

Figure 1: Aggregation calculated based on stock data from 2015-01-01 to 2020-01-01

Step 6: Further dynamic exploration based on different time windows

The conclusions in the previous step are based on how individual stocks cluster from 2015-01-01 to 2020-01-01. In this step we also try something else: using a two-year sliding time window and the same analysis method, we qualitatively explore whether the clusters change over time.

Figure 2: Aggregation calculated based on stock data from 2014-01-01 to 2016-01-01

Figure 3: Aggregation calculated based on stock data from 2015-01-01 to 2017-01-01

Figure 4: Aggregation calculated based on stock data from 2016-01-01 to 2018-01-01

Figure 5: Aggregation calculated based on stock data from 2017-01-01 to 2019-01-01

Figure 6: Aggregation calculated based on stock data from 2018-01-01 to 2020-01-01
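
The window boundaries used in Figures 2 to 6 can be generated programmatically. The following plain-Java sketch is one way to do it; the class and method names are hypothetical, and the one-year step between windows is inferred from the figure captions rather than stated in the text.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

final class SlidingWindows {

    /**
     * Returns {start, end} pairs. Each window spans windowYears and starts
     * stepYears after the previous one; windows that would end after 'last' are dropped.
     */
    static List<LocalDate[]> windows(LocalDate first, LocalDate last, int windowYears, int stepYears) {
        if (windowYears <= 0 || stepYears <= 0) {
            throw new IllegalArgumentException("window and step must be positive");
        }
        List<LocalDate[]> result = new ArrayList<>();
        for (LocalDate start = first;
             !start.plusYears(windowYears).isAfter(last);
             start = start.plusYears(stepYears)) {
            result.add(new LocalDate[] {start, start.plusYears(windowYears)});
        }
        return result;
    }

    public static void main(String[] args) {
        // Two-year windows, sliding by one year, over 2014-01-01 .. 2020-01-01
        for (LocalDate[] w : windows(LocalDate.of(2014, 1, 1), LocalDate.of(2020, 1, 1), 2, 1)) {
            System.out.println(w[0] + " to " + w[1]);
        }
    }
}
```

Each window would then be fed through the same pipeline: compute the distances for that period, build the graph, take the minimum spanning tree, and draw it.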

A rough analysis shows that, as the time window changes, some sectors (expressways, banking, airports and airlines, real estate, energy) keep their stocks well clustered within the sector, which means that the stocks in these sectors have maintained a relatively high correlation over time. The clustering of other sectors (manufacturing) keeps changing, which means that their correlations change all the time.

Disclaimer

This article does not constitute any investment advice, and the author does not own any stocks listed in this article.

Because of trading suspensions, circuit breakers, daily price limits, transfers, mergers and acquisitions, changes of main business, and other circumstances, the data processing may contain errors; these have not been checked one by one.

Limited by time, this article only uses data for a sample of 160 stocks over the past 6 years and only uses the minimum spanning tree for clustering. In the future, more graph machine learning methods can be tried on larger data sets (for example US stocks, derivatives, and digital currencies).

The code for this article is available at [18].

References

[1] A Review of Game of Thrones with Nebula Graph by NetworkX and Gephi. nebula-graph.com.cn/posts/game-…

[2] A Review of Game of Thrones with Nebula Graph by NetworkX and Gephi. nebula-graph.com.cn/posts/game-…

[3] NetworkX: a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks. networkx.github.io/

[4] Nebula Graph: A powerfully distributed, scalable, lightning-fast graph database written in C++. nebula-graph.io/

[5] JGraphT: a Java library of graph theory data structures and algorithms. jgrapht.org/

[6] mxGraph: JavaScript diagramming library that enables interactive graph and charting applications. jgraph.github.io/mxgraph/

[7] Bonanno, Giovanni & Lillo, Fabrizio & Mantegna, Rosario. (2000). High-frequency Cross-correlation in a Set of Stocks. arXiv.org, Quantitative Finance Papers. 1. 10.1080/713665554.

[8] Mantegna, R.N. Hierarchical Structure in Financial Markets. Eur. Phys. J. B 11, 193-197 (1999).

[9] graphviz.org/

[10] gephi.org/

[11] github.com/vesoft-inc/…

[12] Nebula Graph Query Language (nGQL). docs.nebula-graph.io/manual/1…

[13] Nebula Graph Query Engine. github.com/vesoft-inc/…

[14] Nebula-Storage: a distributed, consistent graph storage. github.com/vesoft-inc/…

[15] Neo4j. www.neo4j.com

[16] JanusGraph. janusgraph.org

[17] Apache Spark. spark.apache.org.

[18] github.com/Judy1992/ne…
