Introduction

What is data visualization?

Visualization is the theory, methods and technology of using computer graphics and image processing to turn data into graphics or images on screen, and then to interact with them.

Data visualization is not simply turning data into charts; it is looking at the world from the perspective of data. It is the process of giving abstract concepts a concrete form and making abstract language tangible.

Why data visualization

  1. First of all, we take in far more information through vision than through any other sense.
  2. It helps the analyst get a fuller picture of the data. Here is an example.

Let’s take a look at the following data sets.

Each data set has two variables, X and Y. A simple statistical analysis with common summary statistics gives:

  • Mean: X = 9, Y = 7.5
  • Variance: X = 11, Y = 4.122
  • Regression line: Y = 3.0 + 0.5X
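The figures above can be reproduced with a few lines of code. The sketch below is only an illustration: it assumes the data is the first set of the well-known Anscombe’s quartet (whose summary statistics match the ones listed), and the helper names `mean`, `sampleVariance` and `linearRegression` are made up for this example. Note that the value 11 for X corresponds to the sample variance, which divides by n − 1.

```javascript
// Data set I of Anscombe's quartet (an assumption; the article's own table is not shown).
const xs = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5];
const ys = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68];

const mean = (v) => v.reduce((s, a) => s + a, 0) / v.length;

// Sample variance: sum of squared deviations divided by n - 1.
function sampleVariance(v) {
  const m = mean(v);
  return v.reduce((s, a) => s + (a - m) ** 2, 0) / (v.length - 1);
}

// Least-squares line y = intercept + slope * x.
function linearRegression(x, y) {
  const mx = mean(x);
  const my = mean(y);
  let sxy = 0;
  let sxx = 0;
  for (let i = 0; i < x.length; i++) {
    sxy += (x[i] - mx) * (y[i] - my);
    sxx += (x[i] - mx) ** 2;
  }
  const slope = sxy / sxx;
  return { slope, intercept: my - slope * mx };
}

const stats = {
  meanX: mean(xs),
  meanY: mean(ys),
  varX: sampleVariance(xs),
  varY: sampleVariance(ys),
  fit: linearRegression(xs, ys),
};
console.log(stats);
```

The printed values land close to the ones in the list, which is exactly why the numbers alone cannot tell the data sets apart.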

Judged by these numbers alone, the data sets look the same. Once you visualize them, they look very different.

  3. The human brain has a limited capacity to remember. When we observe things, the brain works somewhat like a computer, with long-term memory (the hard disk) and short-term memory (the cache). Only when words, objects and so on in short-term memory are consolidated over and over again can they enter long-term memory. Many studies have shown that combining text with pictures helps us remember more effectively, and makes material more interesting and easier to understand while we are learning.

What visualization tools are common in front-end development

Anyone who works in a data department, or does data-related work, will be familiar with visualization. Common scenarios include large-screen dashboards, 3D displays and so on. Likewise, a variety of visualization solutions have emerged on the front end. Here are a few of them:

  • ECharts, which runs smoothly on PC and mobile devices and is compatible with most browsers (including IE 8/9/10). It uses ZRender as the underlying rendering engine to provide intuitive, interactive and highly customizable charts.
  • AntV, a new generation of data visualization solutions from Ant Financial, which aims to provide simple, professional and reliable data visualization best practices. It includes G (visualization engine), G2 (charts), G6 (graph visualization engine), F2 (mobile visualization) and L7 (geospatial data visualization).
  • D3, a JavaScript library for manipulating documents based on data. It follows existing Web standards and runs in modern browsers without any other framework.

How are the front-end visualizations drawn

Here we will briefly introduce the 2D drawing options.

  1. Canvas. It is bitmap-based. Drawing is done programmatically with JavaScript (generated dynamically). It provides more primitive functions and is suitable for image processing, dynamic rendering and drawing large volumes of data. Its advantages are as follows:
    1. High performance, and you control the drawing process.
    2. High controllability (down to the pixel level).
    3. Constant memory usage (it depends only on the number of pixels).
  2. SVG. It is vector-based. Suitable for dynamic generation and easy to edit.
    1. No distortion: it stays sharp when zoomed in or out.
    2. Low learning cost, and it is also a DOM structure.
    3. Easy to use, and it can be exported from design software (icons are implemented this way).

After this introduction we have a basic understanding of visualization. But how is a chart actually drawn, and how is interaction handled?

How to Implement Drawing (Canvas version)

Before we get to how to draw, let’s go over a few technical terms.

  • Bounding box. A bounding-box algorithm finds an optimal enclosing volume for a discrete set of points. The basic idea is to approximate a complex geometric object with a slightly larger but simple shape (the bounding box). Common variants include the axis-aligned bounding box (AABB), the bounding sphere and the fixed-direction hull (FDH). Bounding boxes are an important tool for collision and interference detection.
  • Bezier curve. A mathematical curve used in two-dimensional graphics applications. It is made up of line segments and nodes: the nodes are draggable control points, and the segments behave like stretchable rubber bands. For control points P_0, …, P_n its parametric formula is

$$B(t) = \sum_{i=0}^{n} \binom{n}{i} (1 - t)^{n-i} t^{i} P_i, \quad 0 \le t \le 1$$

  • Interpolation function. Put simply, it fills in a continuous function on top of discrete data, so that the continuous curve passes through all the given data points.
  • B-spline basis function. Let U = {u_0, u_1, …, u_m} be a non-decreasing sequence of real numbers, i.e. u_i ≤ u_{i+1} for i = 0, 1, …, m − 1. The u_i are called knots and U is called the knot vector. N_{i,p}(u) denotes the i-th B-spline basis function of degree p, which is defined as:

$$N_{i,0}(u) = \begin{cases} 1, & u_i \le u < u_{i+1} \\ 0, & \text{otherwise} \end{cases}$$

$$N_{i,p}(u) = \frac{u - u_i}{u_{i+p} - u_i} N_{i,p-1}(u) + \frac{u_{i+p+1} - u}{u_{i+p+1} - u_{i+1}} N_{i+1,p-1}(u)$$

The B-spline basis has the following properties:

    • Recursion
    • Local support
    • Normalization (the basis functions sum to 1)
    • Differentiability
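To make the recursion and normalization properties concrete, here is a small sketch of the standard Cox-de Boor recursion for B-spline basis functions. The function name `bsplineBasis`, the example knot vector and the convention of treating a 0/0 term as 0 are choices made for this illustration, not something taken from the article.

```javascript
// Cox-de Boor recursion for the i-th B-spline basis function of degree p.
// `knots` is the non-decreasing knot vector U; a 0/0 term is treated as 0.
function bsplineBasis(i, p, u, knots) {
  if (p === 0) {
    return knots[i] <= u && u < knots[i + 1] ? 1 : 0;
  }
  const leftDen = knots[i + p] - knots[i];
  const rightDen = knots[i + p + 1] - knots[i + 1];
  const left =
    leftDen === 0 ? 0 : ((u - knots[i]) / leftDen) * bsplineBasis(i, p - 1, u, knots);
  const right =
    rightDen === 0
      ? 0
      : ((knots[i + p + 1] - u) / rightDen) * bsplineBasis(i + 1, p - 1, u, knots);
  return left + right;
}

// Example knot vector for degree-2 basis functions (an arbitrary choice).
const knots = [0, 0, 0, 1, 2, 3, 3, 3];
const degree = 2;
const count = knots.length - degree - 1; // number of basis functions: 5
const values = [];
for (let i = 0; i < count; i++) {
  values.push(bsplineBasis(i, degree, 1.5, knots));
}
const total = values.reduce((s, v) => s + v, 0);
console.log(values, total);
```

At u = 1.5 only three of the five functions are non-zero (local support), and together they sum to 1 (normalization).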

Before you get dizzy, let’s see how to draw a line with canvas

Draw a line

Lines are the most common graphical element in visualization, and the most common use of them is the line chart.

A line is defined by a number of points. Depending on how the points are connected, lines can be divided into polylines and curves; depending on how they are rendered, they can be divided into dashed lines and solid lines.

Looking at it another way, we can use lines to draw closed paths that enclose an area, which gives us area charts and radar charts, like this.

Now let’s see how to draw a line graph.

What is a line?

As we all know, a line is made up of points. Two adjacent points are joined to form a segment, and multiple segments are assembled to form a line, like this.

In programming terms:

    • A point has coordinates (x, y).
    • A segment has a start point, an end point, a length and an order.
    • A line has segments and points.

Implementing a line

Getting the segments

Splitting a polyline into segments is very simple. Given the point data passed in, every two adjacent points form one segment. Here is a quick demonstration (with some logic sketched in):

    getSegment(points, defined) {
        segCache ← []
        totalLength ← 0
        for p, i in points
            pnext ← points[i + 1]
            if pnext
                // Two points determine a segment; call the corresponding function
                segment = createSegment
                // Cache data
                segCache ← segment
                // Calculate the length of the segment
                segment.length ← distance
                // Calculate totalLen
                ····
                // Determine whether there is an empty segment
                if ···
                    // some logic
        // Return segment and total length
    }

The implementation is very simple: iterate over the point data and initialize a segment object for each pair of adjacent points. There is also logic to calculate the length of each segment. We will come back to why the length matters later, but we won’t go into how it is calculated here. The logic that checks whether a segment is empty exists because, in practice, some business scenarios need to hide certain segments. See the figure below.
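One possible runnable version of this splitting step is sketched below. It is not the article’s actual implementation: the segment shape, the `createSegment` and `distance` helpers, and the rule that a segment is empty when either endpoint fails `defined` are all assumptions made for illustration.

```javascript
// Euclidean distance between two points of the form { x, y }.
function distance(a, b) {
  return Math.hypot(b.x - a.x, b.y - a.y);
}

// Hypothetical segment shape: start point, end point, length, and an
// `empty` flag for segments that should be hidden.
function createSegment(start, end, empty) {
  return { start, end, empty, length: distance(start, end) };
}

// Split a polyline into segments. `defined(point)` is assumed to return
// false for points that should not be drawn.
function getSegment(points, defined = () => true) {
  const segCache = [];
  let totalLength = 0;
  for (let i = 0; i < points.length - 1; i++) {
    const p = points[i];
    const pnext = points[i + 1];
    // A segment is treated as empty when either endpoint is not defined.
    const empty = !(defined(p) && defined(pnext));
    const segment = createSegment(p, pnext, empty);
    segCache.push(segment);
    totalLength += segment.length;
  }
  return { segCache, totalLength };
}

const result = getSegment(
  [{ x: 0, y: 0 }, { x: 3, y: 4 }, { x: 3, y: 10 }],
  (p) => p.y < 10
);
console.log(result.totalLength, result.segCache.map((s) => s.empty));
```

Whether empty segments should count toward the total length depends on how the animation is meant to behave; here they are included for simplicity.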

Draw line segments using canvas

Canvas provides two APIs for this: moveTo and lineTo. In practice, we call moveTo to position the pen at the start of the line segment, and then draw to the end of the segment with lineTo. If several segments are joined end to end, we can skip moveTo (the canvas keeps track of the current position) and call lineTo directly.

To handle empty segments, we need a start-marker variable. When we are in the start state, we moveTo the new point first instead of calling lineTo.

    drawLine(ctx) {
        defined ← false
        // Set the start flag (moveTo)
        lineStart
        for i ← 0 to len
            seg ← segCache[i]
            ...
            if i = len
                lineEnd
                strokeLine
            else
                drawSeg(seg, ctx)
    }

    drawSeg(seg, ctx) {
        if lineStart
            moveTo
        ····
        drawLine
    }

    drawLine(x, y, ctx) {
        drawLine
    }
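The start-marker idea can be sketched in runnable form as follows. The segment shape (`start`, `end`, `empty`) and the recording stand-in for the 2D context are assumptions made so the call order can be inspected outside a browser; in a real page `ctx` would come from `canvas.getContext("2d")`.

```javascript
// Draw cached segments on a 2D context. `segments` is assumed to be an
// array of { start, end, empty } objects with { x, y } points.
function drawLine(ctx, segments) {
  let lineStart = true; // true means the next segment must begin with moveTo
  ctx.beginPath();
  for (const seg of segments) {
    if (seg.empty) {
      lineStart = true; // leave a gap and restart after it
      continue;
    }
    if (lineStart) {
      ctx.moveTo(seg.start.x, seg.start.y);
      lineStart = false;
    }
    ctx.lineTo(seg.end.x, seg.end.y);
  }
  ctx.stroke();
}

// Minimal recording stand-in for CanvasRenderingContext2D, used only to
// show the order of calls outside a browser.
function createRecorder() {
  const calls = [];
  const record = (name) => (...args) => calls.push([name, ...args]);
  return {
    calls,
    beginPath: record("beginPath"),
    moveTo: record("moveTo"),
    lineTo: record("lineTo"),
    stroke: record("stroke"),
  };
}

const recorder = createRecorder();
const pt = (x, y) => ({ x, y });
drawLine(recorder, [
  { start: pt(0, 0), end: pt(1, 1), empty: false },
  { start: pt(1, 1), end: pt(2, 0), empty: true },
  { start: pt(2, 0), end: pt(3, 1), empty: false },
]);
console.log(recorder.calls.map((c) => c[0]).join(" "));
```

The hidden middle segment causes a second moveTo, so the stroke shows a gap there.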

You may wonder whether splitting the line into segments before drawing is more trouble than it is worth, since it adds a disassembly step. Why not simply connect the points? The split separates the line into distinct structures, so that each element can be customized on its own and, for example, be displayed in a different style. This flexible assembly improves extensibility, and it has other advantages that are described in detail below. (In practice, for instance, a red dashed segment is used to indicate predicted values.)

Implementing curves

Bezier curves

Earlier we briefly introduced Bezier curves. Canvas supports both quadratic and cubic Bezier curves, and the cubic form is the one normally used for drawing. Let’s look at them in detail.

Bezier curves are mathematical curves used in two-dimensional graphics applications. The number of points determines the order of the curve: N points define a Bezier curve of order N − 1, so three points give a second-order (quadratic) curve. In general we want at least three points, because a Bezier curve through two points is just a straight line. In order, the first point is the start point, the last point is the end point, and the remaining points are control points.

Let’s take a quadratic Bezier curve as an example

Quadratic Bezier curve

Given three points P0, P1 and P2, where P0 and P2 are the start and end points and P1 is the control point, the arc from P0 to P2 is a quadratic Bezier curve.


Here we describe the drawing of the whole curve as a process that runs from 0 to 1, using t as the current progress, with t in the interval [0, 1]. On each line segment we generate a point as a function of t. As shown below, as t goes from 0 to 1 a point moves along P0P1 from P0 to P1.


Let us reconstruct how a quadratic Bezier curve is generated.

    1. First we connect P0P1 and P1P2, which gives two line segments. Then we pick a progress value t, say 0.3, and take a point Q0 on P0P1 so that the length of P0Q0 is 0.3 times the length of P0P1.

    2. Next we take a point Q1 on P1P2 so that P0Q0 : P0P1 = P1Q1 : P1P2. Then we take a point B on Q0Q1 so that P0Q0 : P0P1 = P1Q1 : P1P2 = Q0B : Q0Q1.

The point B we now have is one point on the quadratic Bezier curve. If we start at t = 0 and increase t in small steps, we get a series of points B, and connecting them forms the complete curve.

Finally, working through the algebra gives the formula of the quadratic Bezier curve (we won’t go through the derivation here; look it up if you are interested):

$$B(t) = (1 - t)^2 P_0 + 2t(1 - t) P_1 + t^2 P_2, \quad 0 \le t \le 1$$
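The construction steps can be checked against the closed form B(t) = (1 − t)² P0 + 2t(1 − t) P1 + t² P2 with a short sketch. The function names and the sample points below are chosen for illustration only.

```javascript
// Linear interpolation between points a and b at progress t.
function lerp(a, b, t) {
  return { x: a.x + (b.x - a.x) * t, y: a.y + (b.y - a.y) * t };
}

// Geometric construction: Q0 on P0P1, Q1 on P1P2, then B on Q0Q1.
function quadraticByConstruction(p0, p1, p2, t) {
  const q0 = lerp(p0, p1, t);
  const q1 = lerp(p1, p2, t);
  return lerp(q0, q1, t);
}

// Closed form: B(t) = (1 - t)^2 P0 + 2 t (1 - t) P1 + t^2 P2.
function quadraticByFormula(p0, p1, p2, t) {
  const a = (1 - t) ** 2;
  const b = 2 * t * (1 - t);
  const c = t ** 2;
  return {
    x: a * p0.x + b * p1.x + c * p2.x,
    y: a * p0.y + b * p1.y + c * p2.y,
  };
}

const p0 = { x: 0, y: 0 };
const p1 = { x: 50, y: 100 };
const p2 = { x: 100, y: 0 };

// Sample the curve at t = 0, 0.1, ..., 1; connecting these points
// approximates the curve.
const curve = [];
for (let step = 0; step <= 10; step++) {
  curve.push(quadraticByConstruction(p0, p1, p2, step / 10));
}
console.log(curve[3], quadraticByFormula(p0, p1, p2, 0.3));
```

Both functions return the same point for any t, up to floating-point rounding, which is a quick way to convince yourself that the formula and the construction agree.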

Cubic Bezier curves

The cubic Bezier curve is defined by four points, and a point on the curve is determined by one more round of the same iterative steps.

Draw bezier curves using canvas

Cubic Bezier curves are drawn on canvas with the bezierCurveTo() method. Its parameters are documented on MDN and are not listed here.
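As a minimal sketch of how the call is typically used: `bezierCurveTo(cp1x, cp1y, cp2x, cp2y, x, y)` takes the two control points followed by the end point, and starts from the current pen position. The wrapper name `strokeCubic` and the canvas id in the comment are made up for this example.

```javascript
// Stroke one cubic Bezier curve. p0 is the start point, p1 and p2 are the
// control points, and p3 is the end point.
function strokeCubic(ctx, p0, p1, p2, p3) {
  ctx.beginPath();
  ctx.moveTo(p0.x, p0.y);
  // bezierCurveTo(cp1x, cp1y, cp2x, cp2y, x, y) starts from the current point.
  ctx.bezierCurveTo(p1.x, p1.y, p2.x, p2.y, p3.x, p3.y);
  ctx.stroke();
}

// In a browser, with a hypothetical <canvas id="chart"> element:
//   const ctx = document.getElementById("chart").getContext("2d");
//   strokeCubic(ctx, { x: 20, y: 120 }, { x: 60, y: 20 },
//               { x: 140, y: 20 }, { x: 180, y: 120 });
```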

Spline curves and getting segments

Having seen how to draw a cubic Bezier curve, let’s go back to the real scenario, where a line chart is generated by connecting many points. The functions Canvas provides are not enough for this on their own. When we drew polylines earlier, we introduced the concept of segments. If we split a complete curve into segments, each of which is a cubic Bezier curve, the problem seems to be solved. The question then becomes how to generate multiple Bezier curves and join them smoothly.

When we introduced the terms above, we mentioned spline curves, which may have felt rather abstract. Put simply, a set of points is divided into several curves, and the curves join smoothly at each junction. In mathematical terms, the first and second derivatives are continuous at the junction points, that is, the two curves meeting there have the same first and second derivatives. Let’s look at an example.

The diagram above is made up of several cubic Bezier curves. Before we can split it this way, we need to determine several parameters:

    • The start and end point of each cubic Bezier curve
    • The two control points of each cubic Bezier curve

Only when we choose suitable start points, end points and control points can two adjacent curves join smoothly. There are many splitting algorithms, which I won’t go into here; our implementation provides an interface that can use the curves from d3-shape directly. The following example uses the basis algorithm, just to get a rough idea.

    getSegment(points, defined) {
        segCache ← []
        totalLen ← 0
        if points.len < 3
            getSegment
        start, end, controll1, controll2
        for i ← 0 to points.len - 2
            first ← points[i]
            second ← points[i + 1]
            third ← points[i + 2]
            if i = 0
                start ← first
            else
                start ← end
            // Calculate the start point, end point and control points
            // Calculate the length
        // Complement the last point
    }

This logic is also fairly simple: loop over the points, take three points starting at the current index, and use those three points together with the start point of the current segment to calculate the end point and the control points. The start of each new segment is the end of the previous one. However, this loop never reaches the last point, so one segment would be missing; a separate piece of logic at the end adds it.

Calculating the points

We use a simple formula to calculate each point. The formula comes from matching the first and second derivatives of the B-spline curve and the cubic Bezier curve at the end points; the derivation is not covered here.

if (i === 0) {
	start = first
} else {
	start = end
}
end = Point((first.x + 4 * second.x + third.x) / 6, (first.y + 4 * second.y + third.y) / 6)
controll1 = Point((2 * first.x + second.x) / 3, (2 * first.y + second.y) / 3)
controll2 = Point((first.x + 2 * second.x) / 3, (first.y + 2 * second.y) / 3)
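Putting the loop and the formulas together, a runnable sketch might look like the following. The function name `getCurveSegments`, the segment shape and the treatment of the last segment (the final point stands in for both the second and the third point, and the segment ends exactly on it) are one reading of the description, not the article’s actual code.

```javascript
const point = (x, y) => ({ x, y });

// Weighted control point between a and b: (wa * a + wb * b) / 3.
const control = (a, b, wa, wb) =>
  point((wa * a.x + wb * b.x) / 3, (wa * a.y + wb * b.y) / 3);

// Split points into cubic Bezier segments using the formulas above.
// Each segment is { start, control1, control2, end }.
function getCurveSegments(points) {
  const segments = [];
  const n = points.length;
  // Assumption: fewer than 3 points are handled elsewhere as a polyline.
  if (n < 3) return segments;
  let end = null;
  for (let i = 0; i < n - 2; i++) {
    const first = points[i];
    const second = points[i + 1];
    const third = points[i + 2];
    // Each segment starts where the previous one ended.
    const start = i === 0 ? first : end;
    end = point(
      (first.x + 4 * second.x + third.x) / 6,
      (first.y + 4 * second.y + third.y) / 6
    );
    segments.push({
      start,
      control1: control(first, second, 2, 1),
      control2: control(first, second, 1, 2),
      end,
    });
  }
  // Last segment: the final point stands in for both `second` and `third`.
  const first = points[n - 2];
  const last = points[n - 1];
  segments.push({
    start: end,
    control1: control(first, last, 2, 1),
    control2: control(first, last, 1, 2),
    end: last,
  });
  return segments;
}

const segments = getCurveSegments([
  point(0, 0), point(6, 6), point(12, 0), point(18, 6),
]);
console.log(segments);
```

Each segment can then be passed to `bezierCurveTo` in order; because every segment starts at the previous segment’s end point, the pieces connect without gaps.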

Curve segmentation and length calculation

That doesn’t sound like an easy thing to do. Since a Bezier curve is defined parametrically, its length can only be approximated: cut the curve into pieces, approximate the length of each piece once it is small enough, and add the results up. It takes a fair amount of calculation, but experts have already shared an approach:

  1. Find the joint point. Suppose we want to split the current curve into two curves at t = 0.25. First we need the position of point B, which we get by plugging t into the formula.
  2. Get the control points. Point B is the end of the first curve and the start of the second, so what remains is to calculate the control points. From the geometry we can draw the following conclusions:
    • The first control point of the first curve moves along line segment P0P1, linearly in t
    • The second control point of the first curve moves along line segment Q0Q1, linearly in t
    • The first control point of the second curve moves along line segment Q1Q2, linearly in t
    • The second control point of the second curve moves along line segment P2P3, linearly in t

According to the above conclusion, splitting is simple. (This code is a little long, so I won’t write it.)

  3. Length calculation. Since we can split a cubic Bezier curve at any position, combining bisection, a limit on the number of iterations and an approximate length function gives us the length we want. (No code for this either.)
  4. Completing the last segment. Now we need to handle the special logic for the last point, where the last point stands in for both the second and the third point.

    first ← points[i - 2]
    second ← points[i - 1]
    third ← points[i - 1]
    start ← end
    end ← third
    ··· ··

  5. Curve drawing. Now all that is left is to call the Canvas API to draw the lines.
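The splitting and length steps above can be sketched as follows. This is an illustration under assumptions, not the omitted code: the names `splitCubic` and `cubicLength` are made up, points Q0, Q1, Q2 follow the naming used in the conclusions above, and the length of a small piece is approximated by the average of its chord and its control polygon, which is one common choice among several.

```javascript
const lerp = (a, b, t) => ({ x: a.x + (b.x - a.x) * t, y: a.y + (b.y - a.y) * t });
const dist = (a, b) => Math.hypot(b.x - a.x, b.y - a.y);

// Split the cubic Bezier [p0, p1, p2, p3] at t into two cubic curves.
function splitCubic([p0, p1, p2, p3], t) {
  const q0 = lerp(p0, p1, t); // on P0P1: first control point of curve 1
  const q1 = lerp(p1, p2, t);
  const q2 = lerp(p2, p3, t); // on P2P3: second control point of curve 2
  const r0 = lerp(q0, q1, t); // on Q0Q1: second control point of curve 1
  const r1 = lerp(q1, q2, t); // on Q1Q2: first control point of curve 2
  const b = lerp(r0, r1, t); // the joint point on the curve
  return [
    [p0, q0, r0, b],
    [b, r1, q2, p3],
  ];
}

// Approximate arc length by repeated halving. `depth` bounds the number of
// bisection levels; at the deepest level a piece is approximated by the
// average of its chord and its control polygon.
function cubicLength(curve, depth = 6) {
  const [p0, p1, p2, p3] = curve;
  if (depth === 0) {
    const chord = dist(p0, p3);
    const polygon = dist(p0, p1) + dist(p1, p2) + dist(p2, p3);
    return (chord + polygon) / 2;
  }
  const [left, right] = splitCubic(curve, 0.5);
  return cubicLength(left, depth - 1) + cubicLength(right, depth - 1);
}

const arch = [{ x: 0, y: 0 }, { x: 0, y: 3 }, { x: 3, y: 3 }, { x: 3, y: 0 }];
console.log(splitCubic(arch, 0.5)[0][3], cubicLength(arch));
```

A larger `depth` gives a more accurate length at the cost of more work (the number of pieces doubles with each level), so the iteration limit is the knob that trades precision for speed.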

How to animate

We left one question open: why do we need to compute the length?

Now that we have drawn the line, how do we animate it as it changes? Whether we have a polyline or a curve, it consists of many segments, and how much of each segment is drawn depends on t.

Approach

The essence of the animation is partial drawing. Over a given duration we map the whole line onto the interval [0, 1] and start a loop that updates the value of t on every frame. In the segment-drawing loop above, the t of the whole chart is converted into a local t for each segment, and each segment draws only the part it should according to its own t.

Since we have calculated the length of each segment and the total length, we can work out the proportion of each segment and convert between it and the t value of the whole line chart. (This idea is really just partial drawing.)
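The conversion from the overall t to a per-segment t can be sketched like this. The function name `localProgress` and the assumption that each segment carries a numeric `length` are illustrative choices.

```javascript
// Convert the overall progress t (0 to 1) into a progress value for each
// segment, proportional to segment length. `segments` is assumed to be an
// array of objects with a numeric `length`.
function localProgress(segments, t) {
  const total = segments.reduce((sum, seg) => sum + seg.length, 0);
  const drawn = Math.min(Math.max(t, 0), 1) * total; // length drawn so far
  let before = 0; // combined length of the segments before the current one
  return segments.map((seg) => {
    const local = seg.length === 0 ? 1 : (drawn - before) / seg.length;
    before += seg.length;
    return Math.min(Math.max(local, 0), 1); // clamp to [0, 1]
  });
}

// Halfway through a line whose segments have lengths 1 and 3: the first
// segment is complete and the second is one third drawn.
console.log(localProgress([{ length: 1 }, { length: 3 }], 0.5));
```

A segment with local progress 0 is skipped, one with progress 1 is drawn in full, and anything in between is drawn partially, for example by splitting the curve at that local t.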

An area chart, however, is actually drawn as two groups of segments. When drawing, we find that for the same t the progress in the x direction is not synchronized between the two groups. For example, we calculate the part that should be drawn for the first segment of each group, and finally fill the closed region between the upper and lower segments. The problem is that plugging the same t into the functions of different groups of segments produces different x values, so the result is drawn incorrectly and the leading edge comes out slanted.

The way to solve this problem is to work backwards from x (or y) to the t value and plug that t into the target function. Inverting a cubic Bezier curve is another big problem in itself, which I won’t cover here because of space constraints and the complexity of the implementation.
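For readers who want a starting point, one simple way to invert x to t is numerical bisection. The sketch below is not the article’s approach (which is not shown); it assumes x(t) is non-decreasing on [0, 1], which usually holds for a chart line that runs left to right, and the names `cubicAt` and `solveTForX` are made up.

```javascript
// One coordinate of a cubic Bezier curve at parameter t.
function cubicAt(a, b, c, d, t) {
  const s = 1 - t;
  return s * s * s * a + 3 * s * s * t * b + 3 * s * t * t * c + t * t * t * d;
}

// Find t such that x(t) = targetX by bisection. `xs` holds the x
// coordinates of the four Bezier points. Assumes x(t) is non-decreasing
// on [0, 1].
function solveTForX(xs, targetX, iterations = 40) {
  const [x0, x1, x2, x3] = xs;
  let lo = 0;
  let hi = 1;
  for (let i = 0; i < iterations; i++) {
    const mid = (lo + hi) / 2;
    if (cubicAt(x0, x1, x2, x3, mid) < targetX) lo = mid;
    else hi = mid;
  }
  return (lo + hi) / 2;
}

// Usage: find the t at which an upper curve reaches x = 1.5, then
// evaluate its y at that same t.
const upperXs = [0, 0, 3, 3];
const upperYs = [0, 2, 2, 0];
const tAtX = solveTForX(upperXs, 1.5);
console.log(tAtX, cubicAt(...upperYs, tAtX));
```

Solving each group of segments for the same x, rather than reusing the same t, keeps the upper and lower edges of the area aligned vertically.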

Interaction

Interaction comes down to clicks and touches. But as we saw above, a line contains a great many points, so how do we know which point the mouse hit?

Canvas pickup scheme

Canvas does not keep any information about what has been drawn. Once drawing is finished, what the user sees in the browser is just an image made up of countless pixels, and when the user clicks, the browser API cannot tell us which shape was clicked. Common pickup schemes are as follows:

  • Use a cache Canvas and pick shapes by color
  • Use Canvas’s built-in API to pick shapes
  • Use geometric bounding boxes
  • Combine the approaches above

Each of these pickup schemes has its own advantages and disadvantages. The sections below describe how each one is implemented and what problems it has, and finally compare their performance.

Use the cache Canvas scheme

The steps for picking shapes with a cached Canvas are as follows:

  • Draw the shapes on the displayed Canvas
  • Redraw all the shapes on the cached (hidden) Canvas, using each shape’s index value as its color
  • When the displayed Canvas is clicked, read the corresponding pixel on the cache Canvas and convert its color back into a number, which is the index of the shape
Advantages and disadvantages:
  • Advantages
    • Simple to implement: the shapes only need to be drawn twice
    • Good pick performance: the core pick algorithm is O(1)
  • Disadvantages
    • Double the rendering overhead
    • When the canvas is large, fetching the cached data with getImageData() is expensive, which reduces the benefit of fast picking
Suitable and unsuitable scenarios:
  • Suitable scenarios
    • Scenes with a large number of shapes that are redrawn infrequently
    • Scenes that support partial refresh work even better
  • Unsuitable scenarios
    • Scenes with frequent animation, where the doubled rendering overhead and the cost of fetching cached data degrade performance
    • When the number of shapes is small, the advantage is not obvious
Performance testing:
  • Drawing and displaying 10,000 shapes takes 6 ms
  • Drawing the shapes on the cache takes 14 ms, which includes the overhead of converting numbers into colors
  • getImageData() takes 14 ms to fetch the cached image data
  • Picking a shape costs 0.1 ms
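The index-to-color conversion at the heart of this scheme can be sketched as below. The function names and the choice to reserve the value 0 for the background are assumptions for this illustration. In practice anti-aliasing on the hidden canvas can blend colors along shape edges, which a real implementation would need to account for.

```javascript
// Encode a shape index as a unique fill color for the hidden canvas.
// The value 0 is reserved for "no shape" (the transparent background), so
// shape i is stored as i + 1. This supports up to 2^24 - 1 shapes.
function indexToColor(index) {
  const id = index + 1;
  const r = (id >> 16) & 0xff;
  const g = (id >> 8) & 0xff;
  const b = id & 0xff;
  return `rgb(${r},${g},${b})`;
}

// Decode the [r, g, b, a] bytes of one pixel back to a shape index, or -1
// when the pixel is not fully opaque (background or a blended edge).
function pixelToIndex([r, g, b, a]) {
  if (a !== 255) return -1;
  return ((r << 16) | (g << 8) | b) - 1;
}

// In a browser the pixel would come from the hidden canvas, for example:
//   const pixel = hiddenCtx.getImageData(x, y, 1, 1).data;
//   const hit = pixelToIndex(pixel);
console.log(indexToColor(0), pixelToIndex([0, 0, 1, 255]));
```

Reading a single pixel with `getImageData(x, y, 1, 1)` instead of the whole canvas is one way to limit the cost mentioned under the disadvantages.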

Use the built-in API

Canvas provides the isPointInPath() method to test whether a given point is inside a drawn path. The steps are as follows:

  • Draw all the shapes
  • When picking, call isPointInPath() to determine whether the point is inside a shape
Advantages and disadvantages:
  • Advantages
    • Simple to implement: it uses only the native Canvas interface
    • Does not slow down the first render
  • Disadvantages
    • Poor performance: every test has to go through the shape’s path again
    • It only tests whether the point is enclosed by the path, not whether the point lies on a line (Canvas has a separate isPointInStroke() method for stroke tests)
Suitable scenarios:
  • Very few shapes (fewer than 100)
  • It can be combined with bounding-box detection and quadtree detection
Performance testing:
  • Picking among 10,000 shapes takes 2000 ms
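A minimal sketch of this scheme, assuming each shape keeps its outline as a `Path2D` object in a `path` property (the property name and the function name `pickByPath` are made up), could look like this:

```javascript
// Return the index of the topmost shape containing (x, y), or -1.
// Each shape is assumed to carry a Path2D in `shape.path`; shapes later in
// the array are assumed to be drawn on top.
function pickByPath(ctx, shapes, x, y) {
  for (let i = shapes.length - 1; i >= 0; i--) {
    // isPointInPath(path, x, y) tests the filled region of the given path.
    if (ctx.isPointInPath(shapes[i].path, x, y)) return i;
  }
  return -1;
}

// In a browser:
//   const rect = new Path2D();
//   rect.rect(10, 10, 50, 30);
//   pickByPath(ctx, [{ path: rect }], 20, 20);
```

Because every call walks the shapes one by one, the cost grows linearly with the number of shapes, which is consistent with the poor result in the performance test above.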

Geometric bounding box detection scheme

We introduced the bounding box at the start, and now we finally have a use for it.

The shapes drawn on Canvas are standard geometric shapes, and hit testing for points, lines and areas is well established in computational geometry. Each shape generates a bounding box when it is drawn and saves it, so picking can be done directly with numeric calculations.

The detection process is as follows:

  • Test all shapes in reverse drawing order
  • Check whether the point is inside the shape’s bounding box; if not, return false
  • If the shape is stroked as a line, determine whether the point is on the line
  • If the shape is filled, determine whether the point is enclosed by it
Advantages and disadvantages:
  • Advantages
    • The detection algorithms are mature
    • The idea is clear and there is a lot of room for optimization; detection performance can be improved with various caching mechanisms
    • Rendering performance is not affected
  • Disadvantages
    • The implementation is complicated; in particular, detection performance is poor for some Bezier curves and non-closed curves
    • In scenes with many layers, each layer has its own transform, and the matrix operations greatly reduce performance
Suitable scenarios:
  • Widely applicable
Performance testing:
  • Detecting 10,000 points takes 5-20 ms
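The detection process for stroked polylines can be sketched as follows. Everything here is illustrative: the function names, the `points` and `box` properties, and the pixel tolerance are assumptions, and filled shapes and curves would need their own tests.

```javascript
// Axis-aligned bounding box of a list of { x, y } points, grown by `pad`
// (for example half the line width plus a hit tolerance).
function boundingBox(points, pad = 0) {
  const xs = points.map((p) => p.x);
  const ys = points.map((p) => p.y);
  return {
    minX: Math.min(...xs) - pad,
    minY: Math.min(...ys) - pad,
    maxX: Math.max(...xs) + pad,
    maxY: Math.max(...ys) + pad,
  };
}

const inBox = (box, x, y) =>
  x >= box.minX && x <= box.maxX && y >= box.minY && y <= box.maxY;

// Distance from (x, y) to the line segment a-b.
function distanceToSegment(a, b, x, y) {
  const dx = b.x - a.x;
  const dy = b.y - a.y;
  const lenSq = dx * dx + dy * dy;
  const t =
    lenSq === 0
      ? 0
      : Math.min(1, Math.max(0, ((x - a.x) * dx + (y - a.y) * dy) / lenSq));
  return Math.hypot(x - (a.x + t * dx), y - (a.y + t * dy));
}

// Pick the topmost polyline within `tolerance` of (x, y); -1 if none.
// The box is cached on first use, which assumes a constant tolerance.
function pickLine(lines, x, y, tolerance = 3) {
  for (let i = lines.length - 1; i >= 0; i--) {
    const pts = lines[i].points;
    const box = lines[i].box || (lines[i].box = boundingBox(pts, tolerance));
    if (!inBox(box, x, y)) continue; // cheap rejection first
    for (let j = 0; j < pts.length - 1; j++) {
      if (distanceToSegment(pts[j], pts[j + 1], x, y) <= tolerance) return i;
    }
  }
  return -1;
}

const lines = [
  { points: [{ x: 0, y: 0 }, { x: 10, y: 0 }] },
  { points: [{ x: 0, y: 10 }, { x: 10, y: 10 }] },
];
console.log(pickLine(lines, 5, 1), pickLine(lines, 5, 5));
```

The bounding-box check rejects most shapes with four comparisons, so the more expensive distance test only runs for shapes near the pointer.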

Mixed pick up

In practice, rather than relying on a single pickup scheme, several are usually combined. The combinations fall roughly into the following:

  • Bounding box + cache Canvas: normally the cached Canvas has to be the same size as the original Canvas. However, it is possible to create a cached Canvas of only 1×1 pixel. First calculate which shapes have a bounding box containing the pick point, then draw all of those shapes on this one-pixel canvas (a translate is needed so that the canvas is centered on the pick point), and finally read the color of that pixel.

Note: for picking “circles” and “rectangles”, this mixed mode is not faster than the plain geometric algorithm.

  • Bounding box + isPointInPath: simple shapes use geometric algorithms, while complex and filled shapes can be detected with a bounding-box check followed by Canvas’s built-in isPointInPath.

Summary

The choice of pickup scheme on Canvas depends heavily on the user’s scenario, and different scenarios call for different schemes:

  • The isPointInPath method can be used directly in scenarios (for example on mobile) where there are few shapes and precise picking is not required
  • When the Canvas is refreshed infrequently and there are many shapes, the cached Canvas method is a good fit
  • Picking with geometric algorithms suits almost all scenarios, but it needs to be paired with caching mechanisms, and the overhead of matrix multiplication must be kept in mind
  • The methods above can be combined. Pickup can always be optimized further, but meeting the requirements is enough.

Conclusion

This article introduced what visualization is, and then walked through how a line chart is drawn and how interaction with it is implemented. Visualization is all around us and can seem mysterious, but on closer study it turns out not to be that hard to implement. If anything above is wrong, corrections are welcome.

References

  1. G Render engine documentation
  2. Bezier curve
  3. ByteCharts implementation documentation
  4. BizCharts
  5. D3