1. An overview of the
Good table structure design is the cornerstone of high performance and should be designed in terms of the business queries that the system will perform, which often requires trade-offs. Poor table structure design, will waste a lot of development time, seriously delay the project development cycle, let a person very painful, and directly affect the performance of the database, and need to spend a lot of unnecessary optimization time, the effect is often not good.
There is an important design principle in database table design, called paradigm design.
2. Three paradigms
Normal Form comes from English Normal Form, NF for short. In order to design a good relationship, it is necessary to make the relationship satisfy certain constraints, which have been formed into a standard, divided into several levels, each of which is stricter than the other. A database that meets these specifications is clean, structured, and free of INSERT, DELETE, and update exceptions. The opposite is clutter, which not only creates problems for database programmers, but is also obnoxious and can store a lot of unnecessary information.
2.1 First normal Form
1NF is the atomicity constraint on attributes, which requires attributes to have atomicity and cannot be decomposed again.
The first paradigm emphasizes the atomicity of data tables and is the foundation of the other paradigms. For example, the following table:
The name-age column has two attributes, one name and one age, which does not conform to the first normal form. Split it into two columns:
The table above fits the first normal form relationship. However, in daily life, it is far from enough to standardize tables only with the first paradigm. There are still problems such as excessive data redundancy, abnormal deletion, abnormal insertion and abnormal modification. At this time, it is necessary to introduce the concept of normalization to transform it into more standardized tables and reduce data dependence.
In fact, 1NF is the most basic requirement for any relational database. When you create a table in a relational database management system (RDBMS) such as SQL Server, Oracle, or MySQL, if the design of the table does not meet this basic requirement, the operation will not be successful. That is, any data table that already exists in an RDBMS must be 1NF compliant.
2.2 Second normal Form
2NF, which is a uniqueness constraint on records, requires that records have a unique identity, that is, the uniqueness of the entity.
The second normal form (2NF) is established on the basis of the first normal form (1NF), that is to satisfy the second normal form (2NF) must first satisfy the first normal form (1NF).
Second normal Form (2NF) requires that every instance or row in a database table must be uniquely locatable. Typically, implementations add a column to a table to store the unique identity of each instance. For example, an employee information table has an employee number (EMP_id) column, because each employee’s employee number is unique, so each employee can be uniquely distinguished. This unique attribute column is called the primary key or primary key, primary code. This means that there is only one business primary key in the table, and the second normal form (2NF) requires that the attributes of the entity depend entirely on the primary key.
Complete dependency means that there cannot be attributes that depend on only part of the primary key.
There are two tables: the order table and the product table.
If an order contains more than one product, it is expressed as follows:
Obviously, there are two identical order ID data in the order table, which cannot ensure its uniqueness and does not meet the second normal form. Moreover, there is no strong correlation between the product ID and the order ID, so the normal situation is to split the order table into the order table and the intermediate table of the order and the goods.
2.3 Third normal Form
3NF is a constraint on field redundancy, that is, no field can be derived from other fields. It requires that fields have no redundancy.
Each non-non-primary attribute is neither partially dependent nor transitionally dependent on the business primary key, which removes transitive dependencies of non-primary keys to primary keys on a second normal form basis. For example, there is a department information table, where each department has the department id (dept_id), department name, and department introduction. After listing the department number in the employee information table, the department name, department introduction and other department-related information cannot be added to the employee information table. If the department information table does not exist, it should also be built according to the third normal form, otherwise there will be a lot of data redundancy.
As follows:
If the product ID is changed, the product name is also changed and the product name is changed, which is not in the third normal form and the product name column should be removed from the order table.
3. Anti-three paradigm
Fully conforming to the formal design is not perfect, in the actual business query will be a large number of table associated query, and a large number of table associated a lot of time will affect the performance of the query.
De-canonization is the appropriate violation of the requirements of the database design paradigm for performance and read efficiency. A small amount of redundancy is allowed, in other words antiparadigm is the use of space in exchange for time.
Some relatively low probability of change, such as commodity information and classification information are often queried together, so the classification information is also put in the commodity table, redundant storage.
4. Contrast
For formal design:
- Advantages: Data redundancy can be reduced as far as possible, making the update fast and small;
- Disadvantages: Multiple tables are required for query association, which reduces write efficiency and increases read efficiency, and makes index optimization more difficult.
For anti-formal design:
- Advantages: can reduce table association, can better index optimization;
- Disadvantages: Data redundancy and data anomalies, data modification needs more cost.
Both formalization and antiformalization have their advantages and disadvantages. How to choose the best design?
Generally speaking: I want all of them. Generally speaking, it is reasonable and two cases, and it is the same in practical practice. There will not be complete formalization and complete anti-formalization design.