How to build automatic reporting tools in Node.js

The front end early chat conference, and nuggets jointly held. Add Codingdreamer into the conference technology group, win at the new starting line.

27th session | special front-end Flutter, understand the Web rendering engine | | UI framework performance optimization, 6-5 live in the afternoon, six lecturer (taobao, jingdong/idle fish, etc.), point I get on the bus 👉 (registration address) :

All the previous period has the full recording, the beginning of the annual ticket to unlock all at one time

The text is as follows

This article is the fifteenth session of the front-end early chat report special, but also the 107th early chat, from the song xiaocai – zhi ge share.

Is SQL programming?

How was SQL born

Before we talk about SQL as programming, let’s be clear about what SQL is, or how SQL was born.

In ancient times, the earliest database has two, one is hierarchical database, for today’s view can also be called tree database, since called tree database, and we are now in contact with the tree model.

So the downside of this data is obvious, that is, in actual relationships, many entities are not one-to-many, but many-to-many.

The other is the mesh model. Compared with the tree model, it can describe complex relationships in the real world, but the problem is relatively obvious: the more complex the network becomes, the more expensive it will be to maintain and use.

Later, an old doctor named Codd published a paper describing the relationship between data through relational algebra.

So let’s look at how this relationship model describes it.

So look is not a face meng force, so this threshold is very high.

But if we translate it like this, it’s easy to understand, and it’s basically close to our natural language.

These are the two key principles of Codd’s SQL model, relational algebra and relational calculus. One is set based operations, which in today’s SQL are called JOIN operations, and the other is tuple and field calculus, which are logical operations like > = <.

Relational database management system

Then System R was born, but since it was a lab product, DB2 was done by IMB and that’s a story from now on, but we can get a glimpse of where the first relational databases came from and how they were implemented.

My guess is a file system at the bottom, a relational model in the middle, parsers at the top, and SQL at the top, so we can conclude that SQL is a computer language that helps us manage database file systems.

Ok, so now that we know what SQL is, let’s see if SQL is out of date. After all, as a standard that’s been around for 30 or 40 years, it’s hard to know if it’s out of date, like the birth of NoSQL, because a lot of data is actually unstructured or semi-structured.

But if you look at the latest list of programming languages, you can see that SQL is still in the top 10, and I think it’s mostly simple, and then it’s declarative, which means that the person who writes SQL doesn’t have to worry about the underlying implementation logic, they just have to describe what they need, Consider the number of non-programmers around you who know SQL.

3. Architecture evolution and SQL

Next I want to talk a little bit about architecture evolution and SQL.

We know that all of the applications are growing up, such as a web site, the earliest time sites are also small, then the general architecture is services and database are put together, then slowly put the service and data separation, again in the future more users, server performance began to slowly to keep up with, the first is to solve by means of vertical scaling, To put it bluntly, upgrade the server configuration.

The disadvantages is also very obvious, because the server configuration the farther up, price will be more and more low, secondly it are limited, then gradually had micro service solution, but you service to multiple servers, service ceiling is raised, can use these services are all the same database, fail to improve the performance of the database is also no use, Then came distributed databases, and THEN OLAP.

Distributed databases currently have three architectures:

Share-disk: Each CPU uses its own private memory area and accesses all disk systems through internal communication.
Share-nothing: Each CPU has private memory area and private disk space, and two cpus cannot access the same disk space. The communication between cpus is over the network.
Share-memory: Multiple cpus that share the same memory communicate with each other on the interconnection network.

Both 1 and 3 represent a machine in the form of many machines, and the latter is a machine in the form of many machines in the form of one machine.

Then let’s talk about the relationship between distributed databases and SQL:

In 2003, Google first published a Paper called Google File System (GFS), which was about how to implement a distributed File System inside Google. Then in 2004 and 2006, It issued MapReduce and GigTable papers respectively. MapReduce is a distributed computing system, and BigTable is a distributed data management system. After Google released these three carts, people in the community were very excited, because they finally had a solution, and then they implemented it according to Paper, which gradually formed the current Hadoop ecosystem. The most important thing in this ecosystem is HDFS, which is the community version of GFS. A lot of other related things are implemented on this basis, like Kafka, Druid, etc.

At the beginning of this set of data search is through MapReduce to search data, but MapReduce is very complicated, you need to write a lot of code to achieve, many people feel troublesome, this time the topic of today’s protagonist SQL involved again, MapReduce encapsulates a set of SQL implementation on top of MapReduce, so users can write MapReduce code in SQL. This is commonly known as Hive.

Later, because MapReduce is written to disk, some people feel that it is slow, so they create Spark, which uses memory to improve data read and write speed, and other optimizations, such as dynamically generating execution plans. Spark also implements SQL, which can be invoked using Spark.sql.

Since the original Spark could only do offline data processing, there are many more that can do real-time data processing, but Flink is still the most active one, as you will find, Flink also supports direct writing of SQL.

In addition, there are some new database and Hive tools, such as PrestoSQL/PrestoDB, which also support SQL, similar to Spark. Through lexical analysis, semantic analysis, execution plan generation, optimization of execution plan, execution plan segmentation and other steps, users can get the data they want. link

Then look at the TiDB overview. TiDB consists of three main components: TiDB, TiKV and PD. TiDB can be seen as similar to MapReduce, which can run SQL. PD is used for scheduling, and it is also the dictionary of your database to manage raw data. TiKV can be seen as HDFS.

So far we can see that in the OLAP scenario, most of the query tasks can be implemented through SQL, so in fact for us, as long as we give the user an SQL input path, in addition to a middle layer of processing, then basically can meet most of the current data needs.

Iv. Project practice

After talking about SQL’s current application, let’s talk about project practice.

The data type

First of all, it is necessary to have a basic understanding of data before the project, including the most basic data type. Right now there are four main types of data, named data, sequential data, equal ratio data and isometric data, and isometric data is something we rarely use, so WE won’t talk about it here.

The name data or categorization data is a bunch of discrete data without any logical relationship, like city names, people’s names, and so on, and the sequential data is the name data with logical relationship, like large/medium/small or “2020-10-10″/”2020-10-11” time series, And the ratio data is height, weight and so on commonly used for measurement.

Once you have these basic concepts in mind, when you draw a chart you can be more specific about which fields of my data are used as dimension data and which fields are used as indicator data. For example, in general, the name/order data can be used as the X-axis, while the Y-axis value is actually the result of the geometric data field aggregated from the previous dimension.

Data is stored

After understanding the basic concepts above, let’s take a look at how we do data storage. In general, we divide tables into entity tables, dimension tables, fact tables, and aggregate tables.

Entity table as the name implies, is our real entity data mapping, such as user table is stored in your user basic information, some basic attributes. A dimension table is a table that stores dimensional information, such as a city table, as dimensional data. A fact table is a table that records some factual information, such as your user order record order table. The aggregation table is the result table of the previous tables by specifying dimension aggregation and indicator aggregation.

Ok, now to the actual service design part, generally speaking, to make a visual SQL editor, there are two ways, one is to make a Web VERSION of SQL Editer in the form of rich text input box, directly let the user write SQL in the rich text SQL editor.

The other option is to convert SQL programming into OOP for visual ORM editing, because we can convert each object, such as select fields, directly into visual checkboxes for the user to choose directly.

Timing task

After you have SQL input and save services, you may also need some scheduled task services, so that you can get some secondary data so that the front end can perform better display and calculation etc. Such as need a regular task to sync your data sources, to obtain the data source in all of the tables, and all the table and table field type, and so on, convenient for you to do your own data dictionary, the service in this way, when users write a SQL data source of the returned data at the same time, combined with your data dictionary, You know which fields are sequential data, which are name data, which are proportion data, which can be used for dimensions, which can be used for indicators.

SUM(a.v. 1)/SUM(a.v. 2)/CASE (a.v. 2) WHEN … You don’t want to send the “END” box directly to the input box, because it’s easy to get non-standard data and so on. This is where you’ll need to be able to write DSL forms, essentially implementing your own rich text editor via CST.

There is also the more and more tables of the project, the data source data will be more and more large, this time the performance of the front end is very critical, here it is recommended to use the formalized form to manage the data on your page, you will find that the performance of the page can be dozens of times.

And then let’s say that over the course of a few months, the number of reports or kanban or components in the project has grown from the first few to thousands. Sometimes, a user can have hundreds of tables, so how to make the user a good management of his statement or kanban is something you need to consider, here I recommend the beginning of the block is designed to be an infinite levels of service menu, similar to a network location, users can create your own folder and can move sorting renaming, etc., Like a web disk to manage his data.

Although your users can write SQL, the level of each person’s SQL writing is not uniform. In this case, you need a SQL auditing service to analyze the user’s SQL. AST parses the SQL (ali’s Druid Parser is recommended here) and gets the information of the SQL written by the user. For example, how many tables are used, how many tables are connected, whether the logic of the table is not in accordance with the specification, and then combined with the execution plan for this SQL audit and scoring, to avoid what a SQL put your entire database hung up the situation.

If you have some implementation problems when writing a project and need to find some project implementations to refer to, I recommend you to look at these two, one is Metabase and the other is Superset, because they are both open source and Kibana is also a good practice reference.

Don’t forget 6-5 PM live oh, click me on the bus 👉 (registration address) :

All the previous period has the full recording, the beginning of the annual ticket to unlock all at one time

Look forward to more posts. Give it a thumbs up