preface

Visualization is a theory, method and technology that uses computer graphics and image processing technology to convert data into graphics or images and display them on the screen, and then conduct interactive processing. After the data is displayed in a visual way, it can assist users to analyze complex relational data, so as to discover the value contained in the data. The layout of the diagram is a very important cornerstone in the visualization of the diagram. A reasonable layout of the visualization diagram can help us to quickly analyze and accurately locate the problem. If the diagram is not properly laid out, it will be difficult for users to extract the information they want from the diagram, and sometimes it will mislead users to make the wrong judgment. In the following figure, for example, the left is not for a specific layout desultorily, after have to hierarchical layout is on the right side of figure, we can very quickly from the right to identify the root node in the graph nodes, secondary, tertiary, and the relationship between the nodes, and other important information, but we are hard to identify the information from the left side of the picture. Therefore, it is critical to use proper diagram layout in our everyday business scenarios.

Common layout schemes

Various layout plans

Common layouts in the visualization field include tree layout, Dagre layout, force guided layout, concentric circle layout, grid layout and so on.

Various layout schemes and corresponding scenarios

The layout scheme Applicable scenario
Stowe layout It is suitable for describing the relationship between things, such as social network relationship, computer network relationship, capital transaction relationship and other relationship network scenarios
Grid layout Nodes are arranged in a certain sort network, suitable for hierarchical network, easy to see the whole hierarchy
Concentric circle layout When nodes are sorted according to degree, a group of nodes with large degree will be arranged in the most center, while nodes with small degree will be distributed in the most outer layer, so that they are arranged in concentric circles, which is suitable for fast search of nodes with large degree
Circular layout It is suitable for finding key nodes with many association relationships. For example, it can clearly identify which nodes have more association relationships in a circular layout diagram
Tree/Dagre layout It is suitable for scenarios that have clear hierarchies or require strict topological properties, such as hierarchical network diagrams, flow charts, etc

The tree layout

In many business scenario, we usually compare between the node hierarchy, and also need to judge whether there is a loop in the figure, and so on and so forth, in this case the tree layout is very appropriate, tree all the conventional layout must be understood for the layout of the chart on the left side of this traditional form, starting from the root node, In the same direction from top to bottom or from left to right in a hierarchical relationship. This presentation is the most intuitive and consistent with our conventional perception of trees, but it has some problems, such as a low use of screen space. Hierarchy nodes that are closer to the root node are sparser, resulting in a lot of white space that wastes screen space.Therefore, for some scenarios with limited layout space, we can adopt the layout shown above (right), which displays the whole tree in a radial shape according to node depth, with root nodes at the center and leaf nodes at the periphery. Compared with the traditional tree graph layout, this layout can make better use of space, but it may not be as good as the traditional tree graph layout in accordance with people’s perception of tree structure, and with the increase of the number of nodes, the path of the tree is less clear, but we can also make up for the shortage through certain interactive ways. For example, clicking on a child node shows the backtracking path of that node by default. The tree layout algorithm is not very difficult. I found an article on Compact tree layout on GitHub, and you canClick on the direct, this article also has a detailed explanation of the implementation principle, this article will not tell.

Dagre layout

Dagre (directed acyclic graph) layouts are somewhat similar to tree layouts, with some differences; Any directed tree graph must be directed acyclic graph, but directed acyclic graph is not necessarily directed tree graph. Because directed acyclic graph has its strict topological properties, it has strong ability of process expression. Based on this feature, it is widely used in many scenarios where fragmented tasks need to be organized and controlled. For example, the following screenshot from PAI, Ariyun’s machine learning platform, is a typical directed acyclic graph.GitHub has a well-established algorithm for DAG layoutsportalMany visual libraries also use this scheme for DAG layout. From the runLayout function below, we can roughly see the process of dagre layout.

function runLayout(g, time) {
  time(" makeSpaceForEdgeLabels".function() { makeSpaceForEdgeLabels(g); });
  time(" removeSelfEdges".function() { removeSelfEdges(g); });
  time(" acyclic".function() { acyclic.run(g); });
  time(" nestingGraph.run".function() { nestingGraph.run(g); });
  time(" rank".function() { rank(util.asNonCompoundGraph(g)); });
  time(" injectEdgeLabelProxies".function() { injectEdgeLabelProxies(g); });
  time(" removeEmptyRanks".function() { removeEmptyRanks(g); });
  time(" nestingGraph.cleanup".function() { nestingGraph.cleanup(g); });
  time(" normalizeRanks".function() { normalizeRanks(g); });
  time(" assignRankMinMax".function() { assignRankMinMax(g); });
  time(" removeEdgeLabelProxies".function() { removeEdgeLabelProxies(g); });
  time(" normalize.run".function() { normalize.run(g); });
  time(" parentDummyChains".function() { parentDummyChains(g); });
  time(" addBorderSegments".function() { addBorderSegments(g); });
  time(" order".function() { order(g); });
  time(" insertSelfEdges".function() { insertSelfEdges(g); });
  time(" adjustCoordinateSystem".function() { coordinateSystem.adjust(g); });
  time(" position".function() { position(g); });
  time(" positionSelfEdges".function() { positionSelfEdges(g); });
  time(" removeBorderNodes".function() { removeBorderNodes(g); });
  time(" normalize.undo".function() { normalize.undo(g); });
  time(" fixupEdgeLabelCoords".function() { fixupEdgeLabelCoords(g); });
  time(" undoCoordinateSystem".function() { coordinateSystem.undo(g); });
  time(" translateGraph".function() { translateGraph(g); });
  time(" assignNodeIntersects".function() { assignNodeIntersects(g); });
  time(" reversePoints".function() { reversePointsForReversedEdges(g); });
  time(" acyclic.undo".function() { acyclic.undo(g); });
}
Copy the code

If we calculate each time equivalent to each step of the calculation process, emM… That’s 26 steps, isn’t it complicated? But a closer look at the various functions that are called reveals that most of them actually deal with data. The core steps of a DAG layout can be roughly divided into four steps:

  1. RemoveSelfEdges and makeSpaceForEdgeLabels (if there are removeSelfEdges)
  2. The partition hierarchy is called nestinggraph.run
  3. Sorting nodes in the same hierarchy is addBorderSegments
  4. Calculate position layout

Stowe layout

Force-guided layout was first proposed by Eades, and then redefined and improved by many researchers. The figure below shows three types of force-guided layout on D3.Force to guide the layout of the advantage is that because of the existence of mutually exclusive force can reduce the overlap between nodes, but actually when nodes are too much, will still have overlapping nodes, in limited space to avoid all the nodes do not overlap, this has proven to be a NP problem (unanswered) world difficult problem, at present, and in some scenarios of big data will appear hairball phenomenon, The diagram below.

Circular layout

Circular layout is to all the nodes to a certain sort order can customize (here), then the sequential are arranged in a circle, the benefits of this layout is through the custom sorted circular layout, we can quickly analyze the desired results, such as the image below is sorted by correlation degree after the circular layout, We can clearly see that the nodes in the red box have more degree of association than other nodes, which is difficult to achieve by other layout methods. For example, using the force-guiding layout above, it is difficult to identify which nodes have more degree of association. But it also has obvious disadvantages, because as the number of nodes increases, the radius of the circle will become larger and larger, easily beyond the visible area of our normal screen.There are actually only two core steps to the circular layout:

  1. Custom sort
  2. Circular layout

As for the second step, the algorithm is relatively fixed. It mainly calculates the position of each node in the ring according to the number of nodes. The following is the code for the ring layout in ANTvis/Layout:

public execute() {
    const self = this
    const nodes = self.nodes
    const edges = self.edges
    const n = nodes.length
    if (n === 0) {
      if (self.onLayoutEnd) self.onLayoutEnd()
      return
    }

    if(! self.width &&typeof window! = ='undefined') {
      self.width = window.innerWidth
    }
    if(! self.height &&typeof window! = ='undefined') {
      self.height = window.innerHeight
    }
    if(! self.center) { self.center = [self.width /2, self.height / 2]}const center = self.center

    if (n === 1) {
      nodes[0].x = center[0]
      nodes[0].y = center[1]
      if (self.onLayoutEnd) self.onLayoutEnd()
      return
    }

    let radius = self.radius
    let startRadius = self.startRadius
    let endRadius = self.endRadius
    const divisions = self.divisions
    const startAngle = self.startAngle
    const endAngle = self.endAngle
    const angleStep = (endAngle - startAngle) / n
    // layout
    const nodeMap: IndexMap = {}
    nodes.forEach((node, i) = > {
      nodeMap[node.id] = i
    })
    self.nodeMap = nodeMap
    const degrees = getDegree(nodes.length, nodeMap, edges)
    self.degrees = degrees
    if(! radius && ! startRadius && ! endRadius) { radius = self.height > self.width ? self.width /2 : self.height / 2
    } else if(! startRadius && endRadius) { startRadius = endRadius }else if(startRadius && ! endRadius) { endRadius = startRadius }const angleRatio = self.angleRatio
    const astep = angleStep * angleRatio

    const ordering = self.ordering
    let layoutNodes = []
    if (ordering === 'topology') {
      // layout according to the topology
      layoutNodes = self.topologyOrdering()
    } else if (ordering === 'topology-directed') {
      // layout according to the topology
      layoutNodes = self.topologyOrdering(true)}else if (ordering === 'degree') {
      // layout according to the descent order of degrees
      layoutNodes = self.degreeOrdering()
    } else {
      // layout according to the original order in the data.nodes
      layoutNodes = nodes
    }

    const clockwise = self.clockwise
    const divN = Math.ceil(n / divisions) // node number in each division
    for (let i = 0; i < n; ++i) {
      let r = radius
      if(! r && startRadius ! = =null&& endRadius ! = =null) {
        r = startRadius + (i * (endRadius - startRadius)) / (n - 1)}if(! r) { r =10 + (i * 100) / (n - 1)}let angle =
        startAngle +
        (i % divN) * astep +
        ((2 * Math.PI) / divisions) * Math.floor(i / divN)
      if(! clockwise) { angle = endAngle - (i % divN) * astep - ((2 * Math.PI) / divisions) * Math.floor(i / divN)
      }
      layoutNodes[i].x = center[0] + Math.cos(angle) * r
      layoutNodes[i].y = center[1] + Math.sin(angle) * r
      layoutNodes[i].weight = degrees[i]
    }

    if (self.onLayoutEnd) self.onLayoutEnd()

    return {
      nodes: layoutNodes,
      edges: this.edges,
    }
  }
Copy the code

However, the first step of custom sorting, due to the uncertainty of the sorting algorithm, will lead to a lot of interesting layout of our circular layout. There are also researchers who have studied this type of layout, and you can click on the paper to find out more

Concentric circle layout

Concentric circle layout In the scene of graph analysis, nodes are usually sorted by degree first. Nodes with the largest degree are arranged in the most central position, and the farther out, the smaller the degree is. The overall arrangement is in the form of concentric circles, as shown in the figure below. In the above circular layout, in fact, it is still difficult to accurately locate the nodes with the most degrees, but the concentric circle layout can do it, because the nodes closest to the center of the circle are the nodes with the most degrees, and the concentric circle layout can accommodate more nodes than the circular layout under the same area. Therefore, the concentric circle layout is more advantageous than the circular layout in the scene where the nodes with the most degrees are found.

Incremental layout

Incremental layout is to solve the problem that the original layout can remain relatively stable when the data changes incrementally. In the scenario of relationship diffusion and relationship discovery in graph analysis, the analyzed data is often constantly changing, and it is difficult to complete the analysis at one time. The incremental layout algorithm is needed to keep the overall layout relatively stable when the data changes, as shown in the figure below.

Incremental layout is generally divided into two cases, one is the incremental stable layout, the other is the incremental force guided layout. • Incremental layouts for stable layouts: For example, circle, Grid, the new nodes can be predicted by the original layout function, and the nodes can be completed by the tween animation. If the current canvas is a force guided layout, the new node can find a suitable initial position for the new node through tweak algorithm, and then complete the overall layout through force guided energy optimization. The detailed picture is as follows.

Graph layout

The concept of a subgraph is derived from graph theory. It refers to a graph whose node set and edge set are subsets of the node set and edge set of a graph, respectively. If the node subset or edge subset is a true subset, the subgraph is said to be a true subgraph; If every node of a graph G is also a node of its subgraph H, then H is said to be a supporting subgraph of G. Let S be a subset of V(G), take S as the nodal set, form the edge set with all those edges of G with both endpoints in S, and the resulting subgraph of G is called the derived subgraph of S in G. Let B be a subset of E(G), and the node set is composed of all the nodes of G associated with at least one edge in B, and take B as the edge set. The resulting subgraph of G is called the edge derived subgraph of B in G. For some property P, a graph is said to be a maximal subgraph with P if its subgraph with P is not a true subgraph of any subgraph with P. Among all maximal subgraphs, the one with the most edges is called the maximal subgraph. I’ve talked a long time, but I don’t think you have any sense of the subgraph. But if we look at the diagram below, we might quickly understand the subgraph layout. Because in the real business scenarios, our data is often complicated relationship, so it is difficult to use a layout can clearly express, above, we also talked about the advantages and disadvantages of all kinds of layout way, so if we can be the advantage of the various layout schemes fuses in together, present a more reasonable layout, Subgraph layouts are designed to solve these problems.

The layout method of the figure above is to separate the subgraph layout data, apply the layout algorithm respectively, and get an accurate position, and then render it uniformly on the canvas. However, there are two technical difficulties:

  1. How to determine the initial position of data points A and B?
  2. If there is A correlation point between the two groups of data A and B, what impact will they have on the two sub-networks?

This can actually provide some strategies, such as the initial position of two sets of data, which can be determined by the average center point of group A and group B data. While group A and group B can be associated with A common point through some mathematical calculation, A suitable group edge position

Intelligent layout

Aiming at the existing technical difficulties of subgraph layout, some research direction is also trying to through the way of machine learning model, using a large number of labeled data (classification) tag layout model training, in the real business scenarios through the model to forecast the real figure data layout, which recommend out the most suitable for the data layout, specific process is as follows.

The following is a study of intelligent layout by visualization researchers at Peking University. They proposed a diagram layout preservation algorithm based on Laplace transform. This work can automatically generate the initial layout of the diagram, allowing the user to select any subdiagram in the diagram and make personalized editing. Collect the subgraphs edited by multiple users, scale these subgraphs reasonably, save the distance between any two points in the subgraphs, and use the Laplace subgraph preservation algorithm combined with the initial layout of the graphs to get a more satisfying and easy to understand the layout of the graphs combining with the input of multiple users. As shown in the figure below, the user defines many subgraphs with different shapes. This algorithm can effectively solve the subgraph editing conflict, and integrate the edited subgraph layout into the initial layout. The connection is smooth and natural, and the subgraph layout maintains a very good effect. The number in the figure indicates the similarity between the subgraph layout after algorithm fusion and the subgraph layout input by the user, which is 100%. If you are interested, you can click on it

Layout in graph analysis

With so many layout schemes mentioned above, let’s take a look at the real picture analysis of the case. Below is in our business, we often can see, this is the normal force layout, guide has any overlap between nodes, so the overall look of the picture relationship network is clear, and we can also by clicking on a node highlighting the relationship of the once all interact to view the relationship between each node. The purpose of this analysis is to find outThe degree of correlation is the mostBecause in many weak social relationship scenarios, a certain type of node relationship is complex, which often means risk.The force-guided layout above obviously doesn’t help us find the nodes with the most degrees unless we click node by node and visually count them, which is obviously not realistic. Stowe layout pass, then let me try the tree layout, first of all to switch to the regular tree layout (also called to the hierarchical layout) become the appearance of the chart on the left side, the layout is too long, I was shrunk to a small can see the whole picture of the picture, we’ll switch to the radiant like tree layout as shown in the right, At this point, we can see the whole picture without having to zoom out, and the level of nodes is very clear, even though the nodes with the highest degree are still difficult to discernPass tree layout, let’s switch to circular layout. As mentioned above, circular layout is usually arranged by custom sorting first before layout. This time, our goal is to find the nodes with the most degree, so I decided to choose the method of sorting by degree. emm… There seems to be something wrong, I still can’t see the nodes with the most degree of correlation at a glance, I try to click several nodes on the right side of the layout, because nodes in the circular layout are sorted by degree. It can be seen from the figure that the degree of nodes in the upper right part is sparse, and there is a trend of increasing and decreasing anticlockwise. So as long as in the figure on the right side to find a node degree is larger than the adjacent nodes, then the node is the biggest degree of nodes, as the chart on the right side is the biggest I find out the degree of nodes, but in fact largest node may also exist multiple degrees, so to find through the layout of the circular degree of the largest node is not very perfect.So let’s try the concentric circle layout, because concentric circle layout can place the node with the largest degree closest to the center of the circle, which is theoretically perfect for finding the node with the largest degree. Below is the final result of the concentric circle layout. As expected, this layout is really perfect for locating the node degree, from which we quickly locate the node with the largest degree, and the degree of each node is clear.

conclusion

What kind of layout is appropriate for different business scenarios? This paper mainly discusses a variety of different scenarios layout solutions, hope to give you in the layout of the map to provide some ideas and reference.

The article can be reproduced at will, but please keep the original link.

We welcome you to join ES2049 Studio. Please send your resume to [email protected].