Optimizing the data range of coordinate axes in Cartesian charts

The coordinate axes are a very important part of any chart drawn in a Cartesian coordinate system. Together, the horizontal and vertical axes and their scales define a data space. Generally speaking, the data space is determined by the distribution of the data: the data should be spread reasonably across the space so that little of it is left blank. A good data space should satisfy at least the following two conditions:

  • It should fit the distribution of the data as closely as possible, so that display area is not wasted and no data falls outside the axes.
  • It should convey the information in the data according to the characteristics of the data or of the chart type.

Many different design preferences can be derived from these two basic principles. In a bar chart, for example, the height of each rectangle represents the size of a value. The ratio between the heights of different rectangles also reflects the difference between groups of data, which is equally important information.

For example, in the following three year-sales charts, the gap between the values can be read directly from Figure 1. If the axis of the new chart instead starts near 160 rather than 0, it intuitively gives the impression that the market was depressed from 1959 to 1962 and that sales soared in 1962, doubling the previous year. That impression is inaccurate.

Therefore, if all the data lies on the same side of the X-axis, the Y-axis should generally start at 0 (unless the baseline of the comparison is not 0). If the data lies on both sides of the X-axis, the rectangles should likewise start from 0.
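To see why a truncated baseline distorts a bar chart, the sketch below compares the true ratio of two values with the ratio of their drawn bar heights. The function name `drawnHeightRatio` and the sample values are hypothetical and chosen only for illustration.

```typescript
// Ratio of the drawn bar heights for values a and b when the Y-axis starts at axisStart.
// With axisStart = 0 this equals the true ratio a / b.
function drawnHeightRatio(a: number, b: number, axisStart: number): number {
  return (a - axisStart) / (b - axisStart);
}

// Hypothetical sales figures for two consecutive years.
const previousYear = 170;
const currentYear = 180;

// Axis starting at 0: the bars differ by about 6%, matching the data.
const honestRatio = drawnHeightRatio(currentYear, previousYear, 0);

// Axis starting at 160: the second bar is drawn twice as tall as the first.
const truncatedRatio = drawnHeightRatio(currentYear, previousYear, 160);
```

A 6% increase drawn as a doubling is exactly the kind of inaccurate impression described above, which is why a bar axis normally keeps 0 as its baseline.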

A line chart, besides showing the basic values, emphasizes the trend of the data, which is also information contained in the data. An improperly chosen Y-axis can obscure this information, or even destroy it and mislead viewers.

For example, in the figure above the data is distributed over a relatively narrow range. Intuitively, the sales data fluctuates around 170 without any obvious trend. However, once the fluctuations are amplified, the data as a whole turns out to rise first and then fall.

Therefore, 0 is not an appropriate starting point for the Y-axis when the data is distributed very far from 0. But this is not always the case: depending on the data itself, you might still want the Y-axis to start at 0.
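One way to make this trade-off concrete is to measure how much of the axis height the data actually occupies. The sketch below is only illustrative: `axisOccupancy` and the sample series are assumptions, not part of any charting library.

```typescript
// Fraction of the axis height covered by the spread of the data (0 to 1).
function axisOccupancy(values: number[], axisMin: number, axisMax: number): number {
  let lo = values[0];
  let hi = values[0];
  for (let i = 1; i < values.length; i++) {
    if (values[i] < lo) lo = values[i];
    if (values[i] > hi) hi = values[i];
  }
  return (hi - lo) / (axisMax - axisMin);
}

// Hypothetical sales series fluctuating around 170.
const salesSeries = [166, 169, 173, 175, 172, 168];

// Axis from 0 to 180: the fluctuation uses only 5% of the height, so the line looks flat.
const occupancyFromZero = axisOccupancy(salesSeries, 0, 180);

// Axis from 160 to 180: the same fluctuation uses 45% of the height, so the rise and fall is visible.
const occupancyFitted = axisOccupancy(salesSeries, 160, 180);
```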

The coordinate axis has three elements

Three elements determine an axis:

  • Starting and ending points (minimum and maximum)
  • Tick count (number of intervals)
  • Step size (spacing between ticks)

When the maximum and minimum are known, the tick count and the step size can each be calculated from the other.

Step = (max - min) ÷ TickCount

In general, the desired tick count is easy to choose. Too many ticks make the axis look crowded. AntV G2 defaults to 5, so let's temporarily set TickCount to 5. The maximum and minimum of the data are easily obtained by traversing the data. If we stopped here, we could already determine the axes dynamically from the characteristics of each data set, but the results are not as good as we might hope.

For example, suppose the minimum and maximum of a data set in a line chart are 24 and 102 respectively. The step size calculated with the formula above is step = 15.6, and the tick values are:

24, 39.6, 55.2, 70.8, 86.4, 102
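The arithmetic above can be reproduced with a few lines of code. This is a minimal sketch; the helper name `rawTicks` is an assumption introduced here for illustration.

```typescript
// Tick values obtained by dividing [min, max] into tickCount equal steps.
function rawTicks(min: number, max: number, tickCount: number): number[] {
  const step = (max - min) / tickCount;
  const ticks: number[] = [];
  for (let i = 0; i <= tickCount; i++) {
    // Round to 6 decimals only to hide floating-point noise in the result.
    ticks.push(Number((min + i * step).toFixed(6)));
  }
  return ticks;
}

const exampleTicks = rawTicks(24, 102, 5);
// exampleTicks is [24, 39.6, 55.2, 70.8, 86.4, 102]
```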

Obviously, ticks calculated directly in this way are not very readable, so standardizing the step size is the key step here.

Step size normalization

After calculating the raw step size, we want an approximate but standardized step size, such as the round values 10, 20, or 25. It can be calculated as follows.

// Round a normalized step (expected in (0.1, 1]) up to the next standard step.
const getStandarInterval = (t: number): number => {
  if (t <= 0.1) {
    t = 0.1;
  } else if (t <= 0.2) {
    t = 0.2;
  } else if (t <= 0.25) {
    t = 0.25;
  } else if (t <= 0.5) {
    t = 0.5;
  } else if (t < 1) {
    t = 1;
  } else {
    // Values of 1 or more are scaled down by 10, standardized, and scaled back.
    t = getStandarInterval(t / 10) * 10;
  }
  return t;
};
// min, max, and tickCount are assumed to be defined already.
const rawTickInterval: number = (max - min) / tickCount;
// Calculate the order of magnitude
let mag = Math.pow(10, Math.floor(Math.log10(rawTickInterval)));
if (mag == rawTickInterval) {
  mag = rawTickInterval;
} else {
  mag = mag * 10;
}
let tickInterval = Number((rawTickInterval / mag).toFixed(6));
// Select the standard step size
const stepLen = getStandarInterval(tickInterval);
tickInterval = stepLen * mag;

First, we define an array of standard step sizes for normalized values: [0.1, 0.2, 0.25, 0.5, 1].

Then we calculate the order of magnitude of the raw step size and normalize the raw step size by it. For example, 15.6 has an order of magnitude of 100 and is normalized to 0.156.

Then we standardize the normalized step size: 0.156 => 0.2.

Finally, we restore the standardized step size to the original order of magnitude: tickInterval = 0.2 × 100 = 20.

When the step size is standardized, the rule is not to round to the nearest standard value but to round up. Shrinking the step size could leave part of the data outside the axis for a given tick count.
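The four steps can be traced for the running example with a small sketch. It uses a table lookup rather than an if/else chain, and the names `STANDARD_STEPS`, `log10`, and `traceStep` are assumptions introduced for illustration; the base-10 logarithm is computed by hand so the sketch does not depend on `Math.log10` being available.

```typescript
// Standard step sizes from the text, in ascending order.
const STANDARD_STEPS = [0.1, 0.2, 0.25, 0.5, 1];

// Base-10 logarithm without relying on Math.log10.
function log10(n: number): number {
  return Math.log(n) / Math.LN10;
}

// Returns every intermediate value of the standardization for a positive raw step.
function traceStep(rawStep: number) {
  // Order of magnitude: the power of ten at or just above rawStep.
  let mag = Math.pow(10, Math.floor(log10(rawStep)));
  if (mag !== rawStep) {
    mag = mag * 10;
  }
  // Normalized step, expected to fall in (0.1, 1].
  const normalized = Number((rawStep / mag).toFixed(6));
  // Round up to the first standard step that is not smaller.
  let stepLen = STANDARD_STEPS[STANDARD_STEPS.length - 1];
  for (let i = 0; i < STANDARD_STEPS.length; i++) {
    if (normalized <= STANDARD_STEPS[i]) {
      stepLen = STANDARD_STEPS[i];
      break;
    }
  }
  return { mag, normalized, stepLen, tickInterval: stepLen * mag };
}

const trace = traceStep(15.6);
// trace.mag is 100, trace.normalized is 0.156,
// trace.stepLen is 0.2, and trace.tickInterval is 20
```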

Once the standardized step size is obtained, only one of the three elements has been determined.

Deriving the maximum, minimum, and tick count from the step size

A standardized step size does not solve everything. If we draw the axis directly from the maximum and minimum of the raw data, problems remain. For example, if the raw minimum is 3.123, every tick will carry the same awkward decimals.

In general, we want 0 to fall on a tick. In other words, the axis maximum and minimum should be integer multiples of the step size.

There are several factors to consider when determining the maximum and minimum of the axis:

  1. Whether all the data lies on the same side of the X-axis, that is, whether the data is all greater than 0 or all less than 0.
  2. If all the data lies on the same side, whether the axis needs to start from 0 (forcing the minimum, or the maximum for negative data, to 0). A bar chart generally needs to start from 0.

In any case, we can determine the minimum first: it should be an integer multiple of the step size lying just below the raw minimum (or equal to 0). Once the minimum is fixed, the step size is added repeatedly, starting from the minimum, until the value reaches or exceeds the raw maximum. The resulting maximum is therefore also an integer multiple of the step size.

let tempmin = 0;
if (min < 0) {
  while (tempmin > min) {
    tempmin -= tickInterval;
  }
} else {
  while (tempmin + tickInterval < min) {
    tempmin += tickInterval;
  }
}
min = tempmin;
let tickCount = 1;
while (tickCount * tickInterval + min < max) {
  tickCount++;
}
max = tickCount * tickInterval + min;

The actual tickCount is determined during this process. Because the step size was rounded up when it was standardized, the final tick count may be smaller than the requested one. To avoid excessive blank space on the chart, the final tick count is determined by the actual data space. This completes a simple process for calculating standardized ticks from the data.
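Continuing the running example (data range 24 to 102, standardized step 20), the procedure should yield a minimum of 20, a maximum of 120, and a tickCount of 5. The sketch below repeats the same loops in a hypothetical helper `snapRange` and then lists the resulting tick values; both helper names are assumptions made for illustration.

```typescript
// Snap [min, max] outward to multiples of tickInterval, mirroring the loops in the text.
function snapRange(min: number, max: number, tickInterval: number): [number, number, number] {
  let snappedMin = 0;
  if (min < 0) {
    while (snappedMin > min) snappedMin -= tickInterval;
  } else {
    while (snappedMin + tickInterval < min) snappedMin += tickInterval;
  }
  let tickCount = 1;
  while (tickCount * tickInterval + snappedMin < max) tickCount++;
  return [snappedMin, tickCount * tickInterval + snappedMin, tickCount];
}

// Tick values for an axis with tickCount intervals starting at min.
function tickValues(min: number, tickInterval: number, tickCount: number): number[] {
  const ticks: number[] = [];
  for (let i = 0; i <= tickCount; i++) ticks.push(min + i * tickInterval);
  return ticks;
}

const [axisMin, axisMax, axisTickCount] = snapRange(24, 102, 20);
const axisTicks = tickValues(axisMin, 20, axisTickCount);
// axisMin = 20, axisMax = 120, axisTickCount = 5
// axisTicks = [20, 40, 60, 80, 100, 120]
```

Compared with the raw ticks 24, 39.6, 55.2, and so on, every tick is now a multiple of the step size, at the cost of a small margin below 24 and above 102.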

Many parts of this calculation can be adjusted. For example, if you do not mind the final tickCount being larger than the requested tickCount and prefer slightly denser ticks, you can take the nearest standard step instead of always rounding up when standardizing the step size. You can also make the array of standard step sizes denser to approximate the raw step size more closely. Whether to start at zero is another such choice.
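Two of these adjustments, the rounding rule and the density of the standard step table, can be made explicit as parameters. The following is a sketch under stated assumptions: `pickStandardStep`, `RoundingMode`, and the denser table `denseSteps` are hypothetical and not taken from any library.

```typescript
type RoundingMode = "up" | "nearest";

// Pick a standard step for a normalized step in (0, 1].
// steps must be sorted ascending; "up" never shrinks the step, "nearest" may.
function pickStandardStep(normalized: number, steps: number[], mode: RoundingMode): number {
  let best = steps[steps.length - 1];
  if (mode === "up") {
    for (let i = 0; i < steps.length; i++) {
      if (normalized <= steps[i]) {
        best = steps[i];
        break;
      }
    }
    return best;
  }
  let bestDistance = Infinity;
  for (let i = 0; i < steps.length; i++) {
    const distance = Math.abs(steps[i] - normalized);
    if (distance < bestDistance) {
      bestDistance = distance;
      best = steps[i];
    }
  }
  return best;
}

const coarseSteps = [0.1, 0.2, 0.25, 0.5, 1];
// Hypothetical denser table; the extra entries are an illustrative choice.
const denseSteps = [0.1, 0.2, 0.25, 0.4, 0.5, 0.6, 0.8, 1];

const roundedUp = pickStandardStep(0.12, coarseSteps, "up"); // 0.2
const nearestStep = pickStandardStep(0.12, coarseSteps, "nearest"); // 0.1, denser ticks
const denseUp = pickStandardStep(0.3, denseSteps, "up"); // 0.4 instead of 0.5
```

With "nearest", the step can be smaller than the raw step, so the final tick count may exceed the requested one, which is the trade-off described above.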

When different groups of data have very different distributions

The method above optimizes the Y-axis for a single set of data. If the chart shows two sets of data whose distributions differ greatly, one axis cannot cover all the data well, and a secondary Y-axis can be added. The same method can be used to calculate the scale of each axis separately. The only problem is that the tick count is not controlled: the two calculations may yield different tick counts, so the tick lines of the two axes do not line up.

You can extend the axis that has fewer ticks so that both axes have the same tick count, which gives a tidier, more harmonious display.
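A minimal sketch of this alignment follows. It assumes each axis is described by its minimum, maximum, step size, and tick count, with max = min + tickCount × tickInterval; the `AxisScale` shape and the `alignTickCounts` helper are hypothetical names introduced here.

```typescript
interface AxisScale {
  min: number;
  max: number;
  tickInterval: number;
  tickCount: number;
}

// Extend the axis with fewer ticks upward until both axes have the same tickCount.
// Each axis keeps its own min and tickInterval; only max and tickCount change.
function alignTickCounts(left: AxisScale, right: AxisScale): [AxisScale, AxisScale] {
  const target = Math.max(left.tickCount, right.tickCount);
  const extend = (axis: AxisScale): AxisScale => ({
    min: axis.min,
    max: axis.min + target * axis.tickInterval,
    tickInterval: axis.tickInterval,
    tickCount: target,
  });
  return [extend(left), extend(right)];
}

// Hypothetical axes: the primary has 5 intervals, the secondary only 3.
const primaryAxis: AxisScale = { min: 0, max: 100, tickInterval: 20, tickCount: 5 };
const secondaryAxis: AxisScale = { min: 0, max: 1500, tickInterval: 500, tickCount: 3 };

const [alignedPrimary, alignedSecondary] = alignTickCounts(primaryAxis, secondaryAxis);
// alignedSecondary.max is 2500 and alignedSecondary.tickCount is 5
```

This sketch only extends the maximum. For data that is mostly negative, extending the minimum downward instead may waste less space.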

The complete code

const getStandarInterval = (t: number) => {
  if (t <= 0.1) {
    t = 0.1;
  } else if (t <= 0.2) {
    t = 0.2;
  } else if (t <= 0.25) {
    t = 0.25;
  } else if (t <= 0.4) {
    t = 0.4;
  } else if (t <= 0.5) {
    t = 0.5;
  } else if (t <= 0.6) {
    t = 0.6;
  } else {
    t = 1;
  }
  return t;
};
/**
 * @param min Indicates the minimum value of data
 * @param max Indicates the maximum value of data
 * @param tickInterval Indicates the scale interval
 */
const calcR = (min: number, max: number, tickInterval: number): [number, number, number] => {
  let tempmin = 0;
  if (min < 0) {
    while (tempmin > min) {
      tempmin -= tickInterval;
    }
  } else {
    while (tempmin + tickInterval < min) {
      tempmin += tickInterval;
    }
  }
  min = tempmin;
  let tickCount = 1;
  while (tickCount * tickInterval + min < max) {
    tickCount++;
  }
  max = tickCount * tickInterval + min;
  return [min, max, tickCount];
};
/**
 * Get the max/min distribution that conforms to bizCharts' built-in logic
 * @param list Data list
 * @param tickCount Maximum number of ticks
 * @param startWith0 Indicates whether the maximum or minimum value starts from zero
 */
export function standardRange(
  list: number[],
  tickCount = 5,
  startWith0 = true
): [number, number, number] {
  list = list.map(i => (isNaN(Number(i)) ? 0 : Number(i)));

  const log10 = (n: number) => Math.log(n) / Math.log(10);
  let max = Math.max(...list);
  let min = Math.min(...list);
  // All data are distributed on the same side of the X-axis and need to be calculated from zero
  if (startWith0 && min * max >= 0) {
    min = min > 0 ? 0 : min;
    max = max < 0 ? 0 : max;
  }
  if (max === min) {
    const t = Math.abs(min);
    const mag = t == 0 ? 1 : Math.pow(10, Math.floor(log10(t)));
    return calcR(min, max, mag);
  }

  // Scale interval length, and length order of magnitude
  let tickInterval: number, mag: number;
  const rawTickInterval: number = (max - min) / tickCount;
  // Calculate the order of magnitude
  mag = Math.pow(10, Math.floor(log10(rawTickInterval)));
  if (mag == rawTickInterval) {
    mag = rawTickInterval;
  } else {
    mag = mag * 10;
  }
  tickInterval = rawTickInterval / mag;

  // Select the standard step size
  const stepLen = getStandarInterval(tickInterval);
  tickInterval = stepLen * mag;

  let res = calcR(min, max, tickInterval);

  if (res[2] > tickCount) {
    // If the resulting tick count exceeds the requested tickCount, move up to the next standard step
    tickInterval = getStandarInterval(stepLen + 0.1) * mag;
    res = calcR(min, max, tickInterval);
  }
  return res;
}