I want to talk to you today about the database level

Note: The databases we are talking about today refer specifically to relational databases

01. Database selection

A few days ago I posted a diagram of what stacks I might introduce in the Austin project, and a lot of people asked me why THE Distributed Configuration Center chose Apollo instead of Nacos. No one asked me why I chose MySQL as my database.

Speaking of MySQL, you can see all kinds of Java tutorials on the Internet, almost all use MySQL as a database. I talk about all kinds of database problems in the group, almost all of them are MySQL, only some of them may use PostgreSQL and Oracle or other

Even in my interviews, I have never been asked, “Why did you choose MySQL for your database? What is the technology selection?”

Look at this, do you think I have the answer? Actually, I didn’t either. Halfway through I realized I couldn’t come up with a good reason… Since I didn’t know, I went to see what they said.

The reasons may be summarized as follows:

  • MySQL is free, open source and easy to use in the early stage. With ecology, it’s hard to kill. (Most important)
  • MySQL is used on the Internet to store data as a “low-cost and fast data storage insertion solution” and is relatively undemanding (I’ll talk more about this later).
  • Many times the selection of a technology is not for technical reasons

Me, I only know MySQL. I’ve only used MySQL in a production environment. I used Oracle when I was a kid, but I’ve forgotten most of it now.

Many times the choice of technology is not for technical reasons (I’m a lazy dog, I admit it). PostgreSQL is very popular in recent years. I heard that many places are better than MySQL. If you are interested, you can replace MySQL with PostgreSQL in Austin project

Interested in database selection, you can also find some information to continue to look up information, but also welcome to output their own experience in the comments section, this topic discussion I think is quite interesting.

Those of you who worked with me on the Austin project are probably familiar with relational databases, but I won’t go into the basics here. For those of you interested in MySQL or preparing for an interview, check out the MySQL chapter in my Online Interviewer series (which has received good feedback on various blogging platforms).

02. ORM framework selection

I remember a few years ago when I just got into contact with database and Java, at that time TO use JDBC to connect database to operate data, I was very puzzled: clearly I can through a variety of database clients can operate on the data, why I want to use JDBC, good trouble ah!

As for why I had this question, I don’t understand what I was thinking (lol). Later, I learned a lot of postures for simplifying JDBC templates (DBUtils/Hibernate/Spring JDBC/Mybatis/SpringData JPA)

Mybatis is what I’ve been exposed to in production environments, but this time I decided to use SpringData JPA as the ORM framework for the asutin project.

03. Experience with databases

In the past two years, I have been working in an Internet company. I would like to talk about what I have come into contact with personally and share my views.

Generally speaking, each business team maintains its own database (a business team may have several libraries). When we need the relevant data of a certain team, the team will provide the corresponding RPC interface for internal business use.

This means that the data logic is transparent to the calling business (the calling business does not need to pay attention to any information about the other team’s databases, whether it be database table design or specific fields).

The benefit is obvious: to request a team’s business data, simply find the interface they provide. As the demand side, just need to adjust the interface can get the data they want.

Going back to the internal database storage itself, we try to keep the table structure as simple as possible: in many cases, we design tables without the three database paradigms.

Here’s a very simple and possibly inappropriate scenario: one author might write multiple articles (meaning that multiple articles belong to the same author)

When you’re starting out, you might have a tutorial design like author table, Content table, autor_content_mapping table

However, in a real production environment, it is likely that we will not design such an associated table, but will simply redundancy the related fields in a table. In this way, the corresponding information can be directly queried through a table, without multi-layer association

If the above structure is used for query: for example, I want to find the basic information of the author of a certain article, then my action is:

  1. associatedAuthor_content tableI found the articleauthorId
  2. throughauthorIdGo to theThe author tableGot basic information on the author

If I store authorId directly in the content table, that means less searching in the AUTHOR_content table.

Note: I don’t mean to put all of your information in one table. There are hundreds of fields in one table. Don’t get me wrong!

Speaking of association, there is another topic that we can talk about: whether or not to join. In mapper. XML, I could not find join in all the projects I came into contact with in my previous company. I only used join when I wrote statistical scripts in Hive.

[Mandatory] Disallow join when more than three tables are used. The data types of the fields to be joined must be absolutely consistent. When multiple tables are used for associated query, ensure that the associated fields have indexes.

Note: Pay attention to table index, SQL performance even if double table JOIN.

I summarized a: mysql performance optimization and high availability architecture practice, click here for free download

Those of you who like to use joins will tell you that I write joins to make my code simpler. It is too troublesome to check the data, the data to check will be stored in multiple tables, directly using join development efficiency is the fastest!

I, on the other hand, am a proponent of writing business logic in code. All are single-table queries that associate data in program code (programmatically you can do anything a database JOIN can do). The benefits are: SQL is simple, SQL is easy to reuse, SQL is easy to optimize

In most cases, our interface bottlenecks come from the “database,” not the application server. Multi-join and complex SQL is not optimized, while simple SQL is optimized, and I think the program logic is often easier to maintain than SQL.

In my two years in the Internet company, relational database is in my mind as a storage to support transactions. If we store data that has no transaction requirements, we probably don’t need to store it in a relational database at all.

There are so many data sources to choose from, you can store data in Redis(in-memory database), Elasticsearch (search engine), HBase (distributed, scalable big Data storage), HDFS (distributed file system), ClickHouse (OLAP storage system), etc

Based on the above background, my query SQL is not complicated, so Spring Data JPA is suitable for me.

04. Database beyond development

Any company of a certain size will have a database-related infrastructure. Here are some common ones

DDL and DML both need work order

The database theory in the production environment cannot be modified in the program by writing interfaces (high-risk actions), and all data repairing or table building needs to be reviewed by the work order system (generally, the database principal +DBA).

For example, if you submit a request to create a table, the DBA will check whether your table is properly designed (indexed, etc.).

DQL query online data requires permissions

When we want to query online data, we generally have to apply for the permission of the database, and then conduct data query on the specific page of the Intranet of the company (we generally only need to search the data within the team, so in fact, it is also good that the permission of the database of other teams is not open, and the data can only be obtained through the interface).

3, generally not directly connected to the database program (there will be a proxy layer)

Generally, only offline databases can be directly connected through IP, and online databases will go through the proxy layer (the proxy layer can do a lot of things, including monitoring, authentication, database, table and so on).

4. Complete monitoring of alarms

The database as one of the most important storage (if a failure is really significant), there will be complete monitoring and alarm. For example, the system monitors various indicators of the database in real time

05, Austin build table DDL

For those of you who have previewed it, you will know that UNDER austin. SQL I put DDL for two tables. To give you an idea of what I’m doing, let me explain exactly what the DDLS for these two tables mean (why I created them).

Let’s start with the message_template table. I’ve annotated all the fields so it should be easier to read.

Note: if the program due to the expansion of the database annotation lag, or it is necessary to update (benefit future generations)

We need to make all messages have a “carrier”, which is literally a template. Templates are the cornerstone of the Austin system (with templates, business processing can be done, source tracing can be done, data statistics can be extended to a whole set of construction…).

Let’s talk about a few fields that you may have questions about:

  • audit_statusandflow_id: Templates need to go through before they can be sentaudit(This is very important in sending messages, which can greatly prevent the error of the message (I believe you can also see that various companies have reported the error of the message)
  • msg_typeMessage type: Separates different message types and can be used during deliveryDifferent types take different pathsImplement message isolation (even if a marketing message is blocked, it does not affect a notification message)
  • send_accountSending account: A channel may have multiple accounts to send messages (for example, the email channel can select different email groups to send messages, and the SMS channel can select different SMS accounts to send messages).
  • deduplication_timeandis_ngiht_shieldPlatform rules: As components (platforms) of the send type, need to havegeneralThe rules. De-weighting and night shielding are ideal for platforms
  • msg_conteng: This field is sent as message contentThe core, different channels have different formats for delivery. I will directly store JSON in later. supportA placeholderTo replace
  • .

It is possible that fields will be extended later (after all, when you first think about designing tables, you won’t be able to do everything). This table, which is used as a template or understood as a configuration, is meant to be used without large amounts of data.

Let’s take a look at the SMs_record table, in fact, this table can not say much (is to send a record of SMS and SMS receipt stored in). It helps to track why a text message to a user failed, and to correlate those records for reconciliation.

06,

What this article really wants me to talk about is: database is a very important role, if it fails, it will have a big impact. But at the same time, we use it “lightweight” a lot of the time (through simple SQL), and it exists a lot of the time because it supports transactions very well (data consistency).

The data we can trust most is stored in the database, other storage we may worry about loss, will be more, will not be real-time, etc. (this is the biggest advantage of database over other storage)

Don’t take my word for it. We can talk in the comments section