Introduction

Business Intelligence, or BI, is the use of data mining and analysis to find insights for business success.

A complete BI pipeline includes data collection, data cleaning, data mining, and data presentation, and its essence is multidimensional analysis of data. The front end's main work is in the data presentation step. Because of the variety of display methods, the complexity of analysis models, and the large data volumes, front-end complexity is very high.

Working on the front end of BI is very challenging. Developers need a solid understanding of data concepts, while highly complex visual page building is only BI's baseline capability. Building BI's higher-level capabilities, such as exploratory analysis and data insight, requires introducing more complex computing models on both the front end and the back end.

This article serves as an introduction, briefly summarizing the author's BI experience. If the opportunity arises, a later series of articles will elaborate on the details.

Intensive reading

At present, BI in China is at the 1.0 stage, that is, the reporting stage, so the author will elaborate on the core development concepts of this stage.

BI 2.0, the exploratory analysis stage, is the most advanced area of data analysis in China. That part will be shared after its development is completed.

The core concepts of the BI 1.0 stage include the data set, the rendering engine, the data model, and visualization.

The data set

A data set is a collection of data, which in the BI field refers to a standardized data structure.

Any data can be packaged into a data set: TXT text, Excel files, MySQL databases, and so on.

The basic form of a data set is a two-dimensional table, where column headers represent fields and each row is a record. Data presentation is based on multidimensional analysis of these fields.
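
As a rough sketch, such a data set could be described by a structure like the following; the type names and shape are illustrative, not a fixed specification:

interface DataSetField {
  name: string; // column header, e.g. "city" or "uv"
  type: 'string' | 'number' | 'date';
}

interface DataSet {
  fields: DataSetField[]; // the columns
  rows: Array<Record<string, string | number | null>>; // each row is one record
}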

Data set import

Generally speaking, there are two ways to import a data set: local file upload and database connection. Local file upload involves handling various file types, such as Excel parsing, and possibly data cleaning; database connection splits into visual import and SQL input.

Visual import requires analyzing the database structure in advance and rendering the table and field structure, so that users who do not understand SQL can still operate visually.

For SQL input, a web code editor such as Monaco Editor can be used as the input box, preferably combined with intelligent hints to improve SQL writing efficiency. For details about SQL intelligent hints, see the earlier intensive reading "Hand-written SQL Compiler - Intelligent Hints".
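
As a minimal sketch of this idea (not the article's actual setup), Monaco Editor can be mounted as the SQL input box and fed table names from the connected database for completion; the element id and the table list below are illustrative assumptions:

import * as monaco from 'monaco-editor';

const editor = monaco.editor.create(document.getElementById('sql-input')!, {
  value: 'SELECT * FROM orders',
  language: 'sql',
  minimap: { enabled: false },
});

// Hypothetical schema metadata obtained from the connected database.
const tables = ['orders', 'users'];

monaco.languages.registerCompletionItemProvider('sql', {
  provideCompletionItems(model, position) {
    const word = model.getWordUntilPosition(position);
    const range = {
      startLineNumber: position.lineNumber,
      endLineNumber: position.lineNumber,
      startColumn: word.startColumn,
      endColumn: word.endColumn,
    };
    return {
      // Suggest table names as the user types.
      suggestions: tables.map((name) => ({
        label: name,
        kind: monaco.languages.CompletionItemKind.Struct,
        insertText: name,
        range,
      })),
    };
  },
});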

Data set modeling

Data set modeling generally includes dimension/measure modeling, field configuration, and hierarchy modeling.

Dimension/measure modeling requires intelligently determining whether a field is a dimension or a measure. Generally, this is inferred from the field's actual values or its name; if the field type is already stored in the database metadata, the classification can be 100% accurate.
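
A rough sketch of this inference might look like the following; the name patterns and sampling rule are illustrative assumptions, not the actual algorithm:

type FieldRole = 'dimension' | 'measure';

function inferFieldRole(name: string, sampleValues: unknown[]): FieldRole {
  // Field names that usually denote metrics lean toward measure.
  if (/(uv|pv|count|amount|total|rate)$/i.test(name)) {
    return 'measure';
  }
  // If every sampled value is numeric, treat the field as a measure; otherwise a dimension.
  const allNumeric = sampleValues.every(
    (v) =>
      typeof v === 'number' ||
      (typeof v === 'string' && v.trim() !== '' && !Number.isNaN(Number(v)))
  );
  return allNumeric ? 'measure' : 'dimension';
}

// inferFieldRole('city', ['Beijing', 'Hangzhou']) -> 'dimension'
// inferFieldRole('uv', [102, 88])                 -> 'measure'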

Field configuration means that you can add, delete, or modify a field. You can also add an aggregate field or a comparison field.

An aggregate field encapsulates a field expression as a new field. This also uses a simple SQL expression editor that supports the four arithmetic operations, field hints, and composition of some basic functions.

A comparison field is a new field derived by comparing an existing field across a certain time period. For example, the year-on-year comparison of the UV field can be encapsulated as a comparison field. Comparison fields are not technically difficult on the front end; the key is understanding the concept.
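
As a sketch, the two kinds of derived fields could be described by configuration like this; the shape and names are assumptions for illustration:

interface AggregateField {
  type: 'aggregate';
  name: string;       // new field name, e.g. "profit_rate"
  expression: string; // e.g. "SUM([profit]) / SUM([revenue])"
}

interface ComparisonField {
  type: 'comparison';
  name: string;        // e.g. "uv_yoy"
  sourceField: string; // existing field to compare, e.g. "uv"
  period: 'year' | 'month' | 'week'; // compare against the same span one period earlier
}

type DerivedField = AggregateField | ComparisonField;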

Rendering engine

The rendering engine covers both editing and rendering reports, which can theoretically be combined into one engine.

The important modules of the rendering engine include: drag-and-drop, component editing, and event center.

Drag and drop actually covers a series of technical points, from the custom component development process to CDN publishing, CDN loading, component dragging, canvas layout, and so on. Each point expands into endless detail, but fortunately these are basic capabilities of general page building, so this article will not repeat them.

In component editing, basic attribute editing falls under the form model of general page building, with the form usually described by a UISchema. The other part of component editing is data editing, which is covered in more detail in the data model section below.

The event center belongs to the rendering part of the engine and needs to be disabled in the editing state. It enables data capabilities such as chart linkage, roll-up, and drill-down. A general event center includes two parts: event triggers and event responses. The basic structure is as follows:

interface Event {
  trigger:
    | {
        type: 'callback';
        callbackName: string;
      }
    | {
        type: 'listener';
        eventName: string;
      }
    | {
        type: 'system';
        name: string;
      };

  action:
    | {
        type: 'dispatch';
        eventName: string;
      }
    | {
        type: 'jumpUrl';
        url: string;
      };
}

Trigger refers to the event trigger: basic system events, such as a timer or initialization; component callbacks, such as a button click; and listener events raised when another event fires, which may come from an action.

Action refers to the event response. The basic one is dispatch, which triggers other events and forms an event chain. The other actions are data related and can implement condition linkage, field linkage, data set linkage, and so on; implementations differ, so they are not covered here.

The event mechanism also needs to support value passing, meaning that values from the trigger source can be passed to the event responder. Value passing happens at the trigger source: for example, when the trigger is a callback function, the function's arguments are passed as the values and received in ...args form.
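
A minimal runnable sketch of an event center matching the Event structure above might look like this; the registration and dispatch helpers are assumptions for illustration, showing how a callback trigger's arguments flow to the responder via ...args:

type Action =
  | { type: 'dispatch'; eventName: string }
  | { type: 'jumpUrl'; url: string };

const listeners = new Map<string, Array<(...args: unknown[]) => void>>();

function on(eventName: string, handler: (...args: unknown[]) => void) {
  listeners.set(eventName, [...(listeners.get(eventName) ?? []), handler]);
}

function runAction(action: Action, ...args: unknown[]) {
  if (action.type === 'dispatch') {
    // Forward the trigger's values to every listener of the target event.
    (listeners.get(action.eventName) ?? []).forEach((fn) => fn(...args));
  } else {
    console.log(`navigate to ${action.url}`, args);
  }
}

// Usage: a button's onClick callback acts as the trigger and passes its value along.
on('filterChanged', (...args) => console.log('linked chart receives', ...args));
const onButtonClick = (value: string) =>
  runAction({ type: 'dispatch', eventName: 'filterChanged' }, value);
onButtonClick('year = 2019');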

Drilling data

Any field configured with a hierarchy can be drilled. Hierarchies can be configured in the data set or on the report editing page and can be understood as ordered folders. When a folder is used as a field, its first child takes effect by default, and you can then drill down in order.

For example, the "region" hierarchy includes country, province, city, and district, so data can be rolled up and drilled down along this hierarchy.

If a field is hierarchical, the chart needs a corresponding control area for rolling up and drilling down, and the data editing area can offer the same. The drill calculation is not handled inside the chart: after the action is triggered, the rendering engine changes the hierarchical field instance's state to "drilled down to level N"; each drill fetches one more column of data, which the chart component then displays.

Generally speaking, the data after drilling is still the full set. Sometimes, to avoid too much data, for example when clicking one bar of a bar chart to drill, you only want to see that bar's drilled data. Say there is data for 2017, 2018, and 2019: drilling all of it yields 3 x 12 = 36 rows, but if you drill only on 2019 and want just its 12 rows, this can be converted into drill + filter: after the global drill-down expands to 36 rows, clicking drill-down on 2019 also adds a filter condition (year = 2019) to achieve the effect. The chart component is unaware of the whole process.
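
A sketch of that drill + filter conversion, with an assumed query shape, could look like this:

interface Query {
  dimensions: string[]; // e.g. ['year'] before drilling
  filters: Array<{ field: string; value: string | number }>;
}

function drillDown(
  query: Query,
  nextLevel: string,
  clickedField?: string,
  clickedValue?: string | number
): Query {
  return {
    // Each drill adds one more dimension column, e.g. 'month'.
    dimensions: [...query.dimensions, nextLevel],
    // Drilling a single member additionally narrows the data with a filter, e.g. year = 2019.
    filters:
      clickedField !== undefined && clickedValue !== undefined
        ? [...query.filters, { field: clickedField, value: clickedValue }]
        : query.filters,
  };
}

// drillDown({ dimensions: ['year'], filters: [] }, 'month', 'year', 2019)
// -> { dimensions: ['year', 'month'], filters: [{ field: 'year', value: 2019 }] }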

The data model

Corresponding to the general form model UISchema, the author calls the data model CubeSchema, because the multidimensional data processing model in the BI field is a Cube, and the data configuration describes how to query that Cube, so the configuration is called CubeSchema.

The basic concepts of the data model are shared between exploratory analysis and the BI 1.0 reporting stage (exploratory analysis fixes rows and columns and adds marks): fields are placed into different areas, which can be divided by function (horizontal and vertical), by concept (dimension, measure), or, following the exploratory-analysis approach, solidified into rows, columns, and so on.
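
As a sketch, a CubeSchema-style configuration might be shaped roughly like this; the area names and structure are illustrative, not the exact schema:

interface FieldRef {
  fieldId: string;
  role: 'dimension' | 'measure';
  aggregation?: 'sum' | 'avg' | 'count'; // only meaningful for measures
}

interface CubeSchema {
  datasetId: string;
  rows: FieldRef[];    // exploratory-analysis style: fields fixed on rows
  columns: FieldRef[]; // ...and on columns
  marks: FieldRef[];   // extra encodings such as color or label text
  filters: Array<{ fieldId: string; values: Array<string | number> }>;
}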

This involves technical points such as drag and drop, batch selection plus drag, automatically adding a double-clicked field to an area based on whether it is a dimension or a measure, automatic migration of area fields after a chart switch, and a series of constraints on dragged fields: limits on count, on type, on data set, whether duplicates are allowed, and so on.

Libraries such as react-beautiful-dnd can be used for drag and drop, which is basically the same scheme as the rendering engine's drag and drop. For hierarchical data sets, nested hierarchical drag and drop should also be supported.

For field migration after a chart switch, each drag region can be configured with the field types it accepts:

{
  "dataType": ["dimension"]}Copy the code

In this way, dimension-type fields can be automatically migrated to a dimension-type area after the switch. If the corresponding area reaches its field limit, the remaining fields continue to fill the next area until the fields are used up or all areas are full.
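
A sketch of that migration logic, with assumed region and field shapes, might be:

interface Region {
  name: string;
  dataType: Array<'dimension' | 'measure'>; // which field roles the region accepts
  maxCount: number;
  fields: Array<{ id: string; role: 'dimension' | 'measure' }>;
}

function migrateFields(
  fields: Array<{ id: string; role: 'dimension' | 'measure' }>,
  targetRegions: Region[]
): Region[] {
  const regions = targetRegions.map((r) => ({ ...r, fields: [...r.fields] }));
  for (const field of fields) {
    // Fill the first compatible region; overflow continues to the next one.
    const target = regions.find(
      (r) => r.dataType.includes(field.role) && r.fields.length < r.maxCount
    );
    if (target) target.fields.push(field);
  }
  return regions;
}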

In an exploratory analysis scenario, where fields have been modeled as dimensions or measures in advance, the switch can be handled per chart type. Take line charts and tables: a line chart is naturally a one-dimension (main axis) + N-measure scene, while a table is naturally a two-dimension (rows, columns) + one-measure scene (more measures can be supported by splitting cells further). Switching from a line chart to a table, the measures fall into the mark text area; switching from a table with both rows and columns to a bar chart (not to a line chart, because a table's measures are generally discrete while a line chart's measures are generally continuous), the table's row and column fields fall onto the bar chart's dimension axis, acting like a drill-down on that axis.

Read more about exploratory analysis in Tableau Exploratory Models.

The data model also includes configuration related to data analysis, such as setting comparison fields, or analysis capabilities such as mean lines. The data calculation work is done on the back end; the front end needs to assemble the configuration items into the data-fetching request and present the results in a data-driven way.

For "extended field" analysis features such as comparison fields, the generic fetch interface can simply be extended; the chart component has no awareness of this, as it is equivalent to adding a few hidden fields. Features that operate on standard data, such as removing outliers, likewise require no awareness from the chart component.

Clustering, mean lines, and other results that the chart component must display are abstracted into a fixed set of data formats passed through to the chart component, which handles the rendering itself.
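
One possible shape for that fixed, pass-through format (assumed here for illustration) is an annotation list attached to the standard chart data:

interface MeanLineAnnotation {
  type: 'meanLine';
  measure: string; // which measure the line is computed on, e.g. "uv"
  value: number;   // pre-computed on the back end
  label?: string;
}

type ChartAnnotation = MeanLineAnnotation; // clustering etc. would be further variants

interface ChartData {
  fields: string[];
  rows: Array<Record<string, string | number | null>>;
  annotations?: ChartAnnotation[]; // the chart component renders these itself
}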

It can be seen that although both boil down to fetching data and displaying it, common front-end business and BI business development differ:

Common front-end business takes business logic as its core, and the interface format is determined by business needs; BI business is data-centered, and a fixed interface format is determined around the data calculation model. Data fetching does not depend on components, and every component has a corresponding way of displaying the standard data.

Visualization

Unlike ordinary visualization components, BI visualization components need to connect to the CubeSchema model and also support big data performance optimization, boundary data display optimization, and interactive response.

Connecting to CubeSchema means uniformly consuming two-dimensional table data. Most components display structures of two dimensions or more, so this is not difficult. Components with one-dimensional data structures, such as single-indicator cards, need to drop one dimension and define a set of rules for doing so.

Although the calculation model is an N-dimensional Cube, components can expand across multiple dimensions through their standard axes, or achieve a similar effect via drill-down. For line charts, where the axes carry limited meaning, multidimensional data can be displayed with faceting. Of course, some components are only suited to displaying a certain number of dimensions.

Big data performance optimization

Visualization components need to pay particular attention to performance optimization, as the data volume produced by BI queries can be very large, especially with multi-level drilling or geographic data.

Technical means include GPU rendering, canvas caching, multi-threaded computation, and so on. Business means include data sampling, on-demand rendering of the visible area, limiting the number of data items, and so on.
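
As one example of a business-level means, a simple sampling step (the strategy here is illustrative) keeps an evenly spaced subset when the row count exceeds a limit:

function sampleRows<T>(rows: T[], limit: number): T[] {
  if (rows.length <= limit) return rows;
  const step = rows.length / limit;
  const sampled: T[] = [];
  for (let i = 0; i < limit; i++) {
    // Pick evenly spaced rows so the overall shape of the data is preserved.
    sampled.push(rows[Math.floor(i * step)]);
  }
  return sampled;
}

// sampleRows(rawRows, 2000) keeps at most 2000 points for rendering.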

Boundary data display optimization

You never know what data a data set will hand you, so BI boundary cases are numerous: points can be extremely dense, or data can be missing and cause rendering anomalies. The chart component needs collision-avoidance algorithms that spread out or recolor dense data for readability, as well as a protective completion mechanism for missing or anomalous data.
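
A protective completion step might look like the following sketch; replacing missing measures with 0 is an assumed fallback, not a universal rule:

function fillMissing(
  rows: Array<Record<string, number | string | null>>,
  measures: string[],
  fallback = 0
) {
  return rows.map((row) => {
    const next = { ...row };
    for (const m of measures) {
      const value = next[m];
      // Replace null/undefined/NaN measure values so the chart does not break.
      if (value === null || value === undefined || (typeof value === 'number' && Number.isNaN(value))) {
        next[m] = fallback;
      }
    }
    return next;
  });
}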

Interactive response

Interactive operations such as roll-up and drill-down, click, lasso selection, and highlight are fed back to the rendering engine, which updates the data and feeds the new data into the chart component.

The business logic of these interactions is not complex; the difficulty lies in whether the visualization library has the capability and how to unify interaction behavior across components.

Conclusion

Each of the four directions of the BI field (data sets, the rendering engine, the data model, and visualization) contains many technical points worth going deep on. Doing any of them well takes several years of in-depth technical experience, and it also takes many talented people working together.

At present, we are building an excellent, future-oriented BI tool in Alibaba's data middle platform. If the BI field sounds challenging to you, you are welcome to join us at any time; please contact [email protected].

The discussion address is: Intensive Reading "Front End and BI" · Issue #208 · dt-fe/weekly

If you'd like to participate in the discussion, please click here. There is a new topic every week, released on weekends or Mondays. Front End Intensive Reading: helping you filter the right content.

Follow the Front End Intensive Reading WeChat official account.

Copyright notice: free to reproduce - non-commercial - no derivatives - retain attribution (Creative Commons 3.0 License)