LOD expressions are commonly used in the field Of data analysis. Their full name is Level Of Detail.

Intensive reading

What is the level of detail and why is LOD needed? You are bound to have this question, and we will answer it step by step.

What is the level of detail

Try asking: How detailed are your numbers?

The answer might be:

  1. The e figures are aggregated, sorry I can’t see the details, but if you happen to want to see the total sales, they’re all summed up here.
  2. In more detail? So this is the raw table, 3 billion tables. Is that detailed enough? If that’s not enough detail, the business process has to be broken down and buried again.

The higher the level of detail, the greater the amount of data, the lower the level of detail, the less data, the more aggregated data.

It is difficult to see valuable information in 3 billion records, so the process of data analysis can also be regarded as a process of summarizing and calculating data, which is gradually reduced in detail.

Detail level of BI tools

Without LOD expressions, the level of detail of a BI query is completely fixed:

  • If the table drags in measures and has no dimensions, that is the highest level of detail, because only one record will end up being summarized.
  • If the line chart drags into a dimension, the result is aggregated measures within that dimension, with more detailed data, granularity for the current dimension, such as date.

If we want more detailed data, we need to drag in more fields on the dimension until we reach the most detailed item-level granularity. However, it is impossible for the same query to contain different detail granularity, because the detail granularity is determined by the combination of dimensions and cannot be changed, as in the following table example:

Row: country, province and city column: GDPCopy the code

In this example, the detailed level is limited to the summary at the level of city, so fine-grained data under the city cannot be seen. Every piece of data is city-grained, so it is impossible for us to see GDP summarized by country or monthly GDP information with more detailed granularity in the query result. Moreover, it is impossible to calculate the urban GDP together with the national GDP to calculate the proportion of urban GDP in the country.

However, requirements like the above example are numerous and common, and BI tools must come up with a solution, hence LOD: LOD is an expression that allows us to describe different levels of granularity in a single query.

Level of detail in terms of expression evaluation

Expression evaluations must be limited to the same granularity of detail. This is the iron law. Why?

Consider the following two tables with different granularity:

Total sales:

10000
Copy the code

Sales by city:

Beijing 3000 Shanghai 7000Copy the code

If we want to calculate the contribution ratio in the sales of each city, then we have to write the formula of [sales by city] / [total sales], but obviously this is impossible, because the former has two data, and the latter only has one data, which is impossible to calculate.

All we can do is have the same number of rows, so we can do IF ELSE, CASE WHEN, and addition, subtraction, multiplication, and division in line granularity.

LOD gives us the ability to compute across granularity, essentially unifying the granularity of data, but we can have a column of data come from a completely different level of granularity:

City sales total sales Beijing 3000 10000 Shanghai 7000 10000Copy the code

Like a chart, LOD can process the data so that total sales are added to the end of each row, even though the granularity is different from the city details, so that it can be calculated.

So loDS can be calculated at any level of detail, “fitting” the final output into the level of detail of the current query.

LOD expressions are divided into three capabilities, namely, FIXED, INCLUDE and EXCLUDE.

FIXED

{fixed [province] : sum([GDP])}Copy the code

The DGP of each province is calculated according to the fixed detail granularity of city, and finally merged into the current detail granularity.

If the query granularity is now province, city, then the LOD field add logic as shown below:

It can be seen that the essence is the result of join after two different SQL queries. The internal sum represents the aggregation method in FIXED expression, and the external sum represents the aggregation method if FIXED detail level is lower than the current view detail level. In this example, FIXED has a higher detail level, so sum has no effect, and avG has the same effect, because the detail level of the merger is one-to-many relationship, and the many-to-one relationship needs aggregation only in the case of the merger.

Outermost aggregation typically comes into play in INCLUDE expressions.

EXCLUDE

{exclude [exclude] : sum([GDP])}Copy the code

In the current query granularity, GDP is calculated after excluding the city granularity, and finally merged into the current detailed granularity.

If the query granularity is now province, city, season, then the logic for adding LOD fields is as follows:

As shown in the figure, EXCLUDE excludes some dimensions from the current view’s level of detail, and the resulting level of detail is bound to be higher.

INCLUDE

Include: avg([GDP])}Copy the code

In the current query granularity, the additional granularity of urban and rural areas is added to calculate GDP, which is finally merged into the current detailed granularity.

Such examples are difficult to understand, and generally have no practical significance in the case of SUM, because there will be no difference in the calculation results, which can only be meaningful in a scenario similar to AVG. Let’s take a look at the following figure:

This is the problem of inaccurate AVG calculation, that is, the average value of calculation at different detail levels is different, but sum and count will not affect the calculation result with the change of detail level. Therefore, when avG calculation is involved, you can specify the calculation detail level through INCLUDE expression to ensure the accuracy of data caliber.

How to use LOD field

In addition to the above example, the LOD field is more commonly used as an intermediate calculation process, such as calculating the proportion of GDP of a province in China. Since LOD has merged the results of different detail granularity into the current detail granularity, the calculation expression is as follows:

Sum ([GDP])/sum({fixed [country] : sum([GDP])})Copy the code

The LOD expressions that follow are only logically evaluated at the specified granularity, but ultimately remain consistent with the current view’s granularity, so they can participate in the calculation.

We will continue to read tableau’s Top 15 LOD expression business scenarios to understand LOD expressions more deeply.

conclusion

LOD expressions let you easily create computed fields that are “detached” from the current view’s level of detail.

You might wonder, why not actively change the current view detail level to achieve the same effect? Like adding or subtracting a dimension.

The reason is that LOD is often used for calculation across the level of detail, such as calculating the proportion of the part to the total, calculating whether the current record is the first order of the user, etc. More scenarios will be interpreted in the next intensive reading.

What is LOD expression · Issue #365 · dt-fe/weekly

If you’d like to participate in the discussion, pleaseClick here to, with a new theme every week, released on weekends or Mondays. Front end Intensive Reading – Helps you filter the right content.

Copyright Notice: Freely reproduced – Non-commercial – Non-derivative – Remain signed (Creative Commons 3.0 License)