G2Plot: A configurable, elegant, data-analyzing-oriented statistical chart library that helps developers plot high-quality statistical charts at minimum cost. It was born out of the business appeal of real scenarios of Ali Economy BI products.

Introduction to the

In addition to trees, there is a special kind of graph called rectangular tree when displaying hierarchical data. Rectangle tree graph transforms tree structure into plane rectangle. This structure can not only display hierarchical relationship, but also be suitable to display the weight relationship of data.

For example, the image below (left) shows a company’s sales data for the year. Click on the rectangle (office supplies) and drill down to the sub-branch data (right). In the nodes of the same level, the coordinate system is divided into several rectangular blocks according to their weight (sales volume), and the contrast of color enhanced classification is distinguished.

This paper mainly introduces the algorithm used to realize the layout function of “divide coordinate system into several rectangular blocks according to their respective weights in nodes of the same level”, and how to apply it in G2 Plot.

Before going into the details of each algorithm, let’s first consider what characteristics an ideal rectangular tree layout should have:

  • We wanted the layout to be as close to square as possible, because the shape of the slender bars was not conducive to comparing weights with each other, and it was not conducive to click-driller interactions. Define the aspect ratio as long side/short side, that is, the higher the aspect ratio, the worse the effect
  • We want the tree graph to be orderly and easy for users to read continuously
  • We want the tree graph to be as stable as possible: when the input data changes, the tree graph should change as little as possible.

Next, we understand the layout of G2 Plot rectangular tree from aspect ratio, order and stability.

The layout algorithm of rectangular tree graph is based on D3-Hierarchy. The following content is also based on the D3-Hierarchy library. Tip2: The user can change the default algorithm in G2Plot using the hierarchyConfig property

1. TreemapBinary

The Treemap Binary algorithm recursively divides the specified nodes into an approximately balanced Binary tree, with the wide rectangle split horizontally and the tall rectangle split vertically.

The basic idea is to split the Node array into two parts each time, with sum/2 as the target, and repeat the process until the subarray can no longer be split. Take data [50, 10, 6, 20] as an example:

The graph generated for each segmentation is:

The pseudocode is as follows:

TreemapBinary set the initial width and height of the visual space set the initial split start and end positions get the weight of the parent node (in most cases, the sum of the weight of the child nodes) Length) IF(the node is already a leaf node) according to the location, width and length information, assign the total target value to the node = node weight / 2; N(current traversal node subscript) = 0; Left subtree = empty array; WHILE (left subtree < total target value) Add the NTH child node N + 1 to the left subtree; Left subtree width = (left subtree weight/total weight) * width right subtree width = (right subtree weight/total weight) * width Partition (left subtree, position, left subtree width, Length = (left subtree weight/total weight) * Length (right subtree weight/total weight) * length (left subtree weight/total weight) * length (left subtree weight/total weight) Partition (right subtree, position, width, right subtree length)Copy the code

Let’s judge TreemapBinary

  • Aspect ratio: TreemapBinary does not intentionally guarantee aspect ratio, but since the current aspect ratio is judged to be horizontal or vertical during subtree segmentation, the aspect ratio is mediocre
  • Order: The order in which a TreemapBinary is cut depends on the order of the binary tree
  • Stability: Stable. For an approximately balanced binary tree, less is changed when the weight of a child node is changed

See binary for more details

2. TreemapDice & TreemapSlice & TreemapSliceDice

The TreemapDice algorithm is relatively simple, and the rectangular area specified by x0, y0, x1 and y1 is divided horizontally according to the weight of each child node of the specified node. The child nodes are arranged in order.

The schematic diagram is as follows:

Evaluate TreemapDice algorithm in turn

  • Aspect ratio: very high, all data are one-way cutting, the effect is very poor
  • Orderliness: Orderliness. Children are arranged in order
  • Stability: Stable. When the weight of a node changes, the overall layout is vertical, so there are few changes

TreemapSlice algorithm is similar to TreemapDice with the difference of horizontal segmentation

Both TreemapSlice and TreemapDice have poor aspect ratios, especially in nested trees. TreemapSliceDice improves this practice by using the TreemapSlice algorithm if the specified node depth is odd; Otherwise, use the TreemapDice algorithm. The illustration is as follows.

TreemapSliceDice The behavior of TreemapSliceDice is consistent with the behavior of TreemapSlice in a tree of level 1. While in the nested tree diagram, its aspect ratio is improved.

The above specific code can refer to the Slice algorithm, DICE algorithm, sliceDice algorithm

3. TreemapSquarify & TreemapResquarify

TreemapSquarify is a prioritized aspect ratio algorithm that makes the segmented rectangle as close to the square as possible and optimizes efficiency using greedy algorithm.

TreemapSquarify is a recursive process that starts with the shortest edge of the rectangle at a time, and as nodes are added one by one, either add to the current row or open a new row in the remaining rectangle, depending on what operation minimizes its aspect ratio.

A classic example: a 4*6 rectangle with data of [6, 6, 4, 3, 2, 2, 1], its segmentation process and rendering in G2Plot are shown below

As shown in the figure, it is filled from the shortest edge to [6, 6, 4], and there are two choices. Step3 is filled when moving forward, and Step4 opens a new line. The worst aspect is better than Step4, so Step3 is abandoned and the current line should only include [6, 6] for layout. For the next recursion, start with a new line, [4], and the remaining rectangle.

Next, we need to determine how to calculate the aspect ratio of a node.

We set: the weight of the new node is R, the sum of the current node list is S, the shortest side of the distributable rectangle is W, the longest side is H, and the sum is S. The value we want to solve is the aspect ratio of the new node when the current node is in the same row. (Take Step 3 as an example, r = 4, S = 16, W = 4, H = 6, s = 24)

  1. The total area of distributable rectangles is zerow*h, the distributable area of the node column isw*h*s/S, the allocated area of the current node isw*h*r/S, the width of the current node ish*s/S(In the above case, the area of the distributable rectangle is 24. The area occupied by [6, 6, 4] is 16, and the area allocated by node 4 is 4 and the width is 4)
  2. According to the prerequisite: Fill the shortest edge. The length of the current node can bew*r/s(In the above case, the length of node 4 is 1)
  3. Aspect ratio = Max(length/width, width/length). The Max ((w*S*r)/(s*s*h) , (s*s*h/w*S*r)). (Max (0.25, 4))

Because s over w times h is a constant in every recursion. Therefore, when extracting the coefficient and comparing aspect ratio, only comparison can be made. Max((w*w*r)/(s*s), (s*s)/(w*w*r)).

In order to improve efficiency, sort before running TreemapSquarify. Therefore, in a row of nodes, just calculate the aspect ratio of the node with the maximum weight and the node with the minimum weight, and set the maximum weight value as Max. Min is minimum, calculate the Max ((w * w * Max)/(s * s), (s) * s/min) * w * (w)

In the D3. treemapSquarify algorithm, the expected aspect ratio can also be set. In the example above, we expect the aspect ratio to be as close to 1 as possible, that is, as square as possible. The default aspect ratio for treemapSquarify is the golden ratio, which is 0.618 as close as possible. The specific code can refer to squarify algorithm

Now let’s evaluate TreemapSquarify algorithm:

  • Aspect ratio: best effect
  • Orderliness: disorder. The algorithm starts with sorting
  • Stability: Generally, when changing the weight of a node, a large change may occur

TreemapResquarify is modified on the basis of TreemapSquarify, that is, each time record the layout generated using TreemapSquarify last time, if there is a subsequent data update, continue to slice or dice on the basis of the last generated layout. This improved stability by changing only the size without changing the relative position of the nodes, but subsequent updates to the layout were less than ideal and only the first layout applied TreemapSquarify. The specific code can refer to resquarify algorithm

conclusion

After analyzing the differences of each algorithm in detail, we re-compare each algorithm according to the evaluation criteria, and we can get

Slice&Dice > Binary > Squarify stability: Slice&Dice = Binary > Squarify

reference

  • d3-hierarchy
  • treemap-history
  • Squarified Treemaps