directory

  1. The relationship between the original document and the entity
  2. Primary and foreign keys
  3. Properties of base tables
  4. Paradigm standard
  5. Three paradigms are popularly understood
  6. Be good at identifying and correctly handling many-to-many relationships
  7. Primary key PK value method
  8. Be aware of data redundancy
  9. There is no standard answer to the E–R diagram
  10. View techniques are useful in database design
  11. Intermediate tables, reports, and temporary tables
  12. Integrity constraints are manifested in three aspects
  13. The way to prevent database design patching is the “three less rule”
  14. Methods to improve the efficiency of database operation

1. Relationship between original documents and entities

It can be one-to-one, one-to-many, many-to-many. In general, they are one-to-one: that is, one original document corresponds to one entity and only one entity. In special cases, they may be one-to-many or many-to-one relationships, i.e. one original document for multiple entities, or multiple original documents for one entity. The entities here can be thought of as basic tables. After making clear this correspondence relation, it is beneficial to us to design the input interface.

[example 1] : In the human resource information system, an employee’s curriculum vitae corresponds to three basic tables: the employee’s basic information table, the social relation table and the resume table. This is a classic example of “one original document corresponds to multiple entities”.

2. Primary and foreign keys

In general, an entity cannot have both primary and foreign keys. In the e-R diagram, an entity in the leaf position may or may not have a primary key (because it has no descendants), but must have a foreign key (because it has a father).

The design of primary key and foreign key plays an important role in the design of global database. When the global database design is completed, there is an American database design expert said: “keys, keys everywhere, but in addition to keys, there is nothing”, this is his database design experience, but also reflects his high abstraction of the core of information system (data model). A primary key is a high abstraction of an entity, and the pairing of a primary key with a foreign key represents the connection between entities.

3. Properties of basic tables

Basic tables are different from intermediate and temporary tables because they have the following four features:

  • Atomicity. Fields in the base table are not decomposable.

  • Primitiveness. Records in a base table are records of raw data (underlying data).

  • Apriority. All output data can be derived from data in the base table and code table.

  • Stability. The structure of the basic table is relatively stable, and the records in the table are to be preserved for a long time.

Once you understand the nature of basic tables, you can distinguish them from intermediate and temporary tables when designing your database.

4. Paradigm standards

The relationship between the base table and its fields should meet the third normal form as much as possible. However, database design that meets the third normal form is often not the best design. In order to improve the efficiency of database operation, it is often necessary to reduce the standard of paradigm: to increase the redundancy appropriately, to achieve the purpose of space for time.

[2] There is a basic table for goods, as shown in Table 1. The existence of the field “amount” indicates that the design of the table does not meet the third normal form, because “amount” can be obtained by multiplying “unit price” by “quantity”, indicating that “amount” is a redundant field. However, adding the “amount” of the redundant field can improve the speed of the query statistics, which is the practice of space for time.

In Rose 2002, there were two types of specified columns: data columns and computed columns. Columns such as “amount” are called “calculation columns”, while columns such as “unit price” and “quantity” are called “data columns”.

Table 1 Table structure of commodity table

Name of commodity marque The unit price The number of The amount of
The TV 29 inches 2500 40 100000

5. Understand the three paradigms informally

A simple understanding of the three paradigms is of great benefit to database design. In database design, in order to better apply the three paradigms, it is necessary to understand the three paradigms popularly (popular understanding is sufficient understanding, not the most scientific and accurate understanding) :

  • ** First normal form: **1NF is the atomicity constraint on attributes, which requires attributes to be atomic and cannot be decomposed;

  • ** Second normal form: **2NF is a unique constraint on records, requiring records to have a unique identity, that is, entity uniqueness;

  • ** Third normal Form: **3NF is a constraint on field redundancy, that is, no field can be derived from any other field, which requires that fields have no redundancy.

No redundant database design can do that. However, a database without redundancy is not necessarily the best database, and sometimes in order to improve the efficiency of operation, it is necessary to lower the paradigm standards and retain redundant data appropriately.

The specific approach is to follow the third normal form in the conceptual data model design, and reduce the standard of the normal form to the physical data model design. To reduce the normal form is to add fields, allowing redundancy.

6. Be good at identifying and correctly handling many-to-many relationships

If a many-to-many relationship exists between two entities, it should be eliminated. This was eliminated by adding a third entity in between. In this way, what was once a many-to-many relationship is now two one-to-many relationships. The attributes of the original two entities should be properly allocated among the three entities. The third entity here is essentially a more complex relationship that corresponds to a base table. In general, database design tools cannot recognize many-to-many relationships, but can handle many-to-many relationships.

In the library information system, the book is an entity, and so is the reader. The relationship between these two entities is a classic many-to-many relationship: a book can be borrowed by multiple readers at different times, and a reader can borrow multiple books. Therefore, to increase the third entity between them, the entity named “borrowed books”, its properties are: borrowed time, borrowed marks (0 borrow books, 1 book), in addition, it also should be two foreign keys (” book “of primary keys,” the reader “primary key), make it can connect with” book “and” the reader “.

7. Primary key PK value method

PK is a tool for programmers to use between tables, can be no physical meaning of the number string, by the program to automatically add 1 to achieve. It can also be a field name or a combination of field names that have physical meaning. But the former is better than the latter. When PK is the combination of field names, it is recommended that the number of field names be not too large. If the number of field names is too large, index space is occupied and the index speed is slow.

8. Recognize data redundancy

The repeated occurrence of primary and foreign keys in multiple tables is not data redundancy, and it must be clear that many people are not. The repetition of non-key fields is data redundancy! And it’s a low-level redundancy, which is repetitive redundancy. Advanced redundancy is not the repetition of fields, but the derivation of fields.

[example 4] : The three fields of “unit price, quantity and amount” in a commodity, “amount” is derived from “unit price” multiplied by “quantity”. It is redundancy, and it is a kind of advanced redundancy. The purpose of redundancy is to increase processing speed. Only low-level redundancy increases data inconsistencies, because the same data may be entered multiple times, from different locations, and roles. Therefore, we advocate high level redundancy (generative redundancy) and oppose low level redundancy (repetitive redundancy).

9. There is no standard answer for E–R

The E–R diagram of information system has no standard answer, because its design and drawing method is not unique, as long as it covers the business scope and functional content of the system requirements, it is feasible. Instead, modify the E–R diagram. Just because there is no single answer doesn’t mean it can be designed arbitrarily. The standard of a good E-R graph is: clear structure, concise association, moderate number of entities, reasonable attribute allocation, and no low-level redundancy.

View technology is useful in database design

Unlike basic tables, code tables, and intermediate tables, a view is a virtual table that depends on the real table of the data source. View is a window for programmers to use database, is a form of base table data synthesis, is a method of data processing, is a means of user data confidentiality. In order to perform complex processing, speed up computation, and save storage space, the depth of views is generally limited to three layers. If the three-tier view is not enough, you should define a temporary table on the view, and then define the view on the temporary table.

By iterating definitions over and over again, the depth of the view is unlimited. The role of views is even more important for certain information systems related to national political, economic, technological, military and security interests. After the physical design of the basic table of these systems is completed, the first layer of views is established on the basic table, and the number and structure of this layer of views are exactly the same as the number and structure of the basic table. And it says that all programmers are only allowed to work on views. Only the database administrator, armed with a “security key” shared by multiple people, can operate directly on the base table. Ask the reader: Why?

11. Intermediate tables, reports and temporary tables

An intermediate table is a table that holds statistics and is designed for use in a data warehouse, output reports, or query results, sometimes without primary and foreign keys (except for data warehouses). Temporary tables are designed by individual programmers to store temporary records for personal use. Base and intermediate tables are maintained by the DBA, and temporary tables are automatically maintained by the programmer himself.

12. Integrity constraints are manifested in three aspects

Domain integrity: Use Check to implement constraints. In the database design tool, when defining the value range of a field, there is a Check button that defines the value city of the field. Referential integrity: PK, FK, table level triggers. User-defined integrity: It is a set of business rules implemented with stored procedures and triggers.

13. The way to prevent database design patching is the “three less principle”

  • ** The fewer tables in a database, the better. ** Only when the number of tables is less, it can be explained that the system’s E-R graph is less and more precise, which removes redundant redundant entities, forms a high abstraction of the objective world, carries on the data integration of the system, and prevents the patch-type design;

  • ** The fewer fields in a table that combine primary keys, the better. ** Because of the primary key’s function, one is to build the primary key index, and the other is to serve as the foreign key of the child table, so the number of fields combined with the primary key is less, which not only saves the running time, but also saves the index storage space;

  • ** The fewer fields in a table, the better. ** Only the number of fields is less, in order to show that there is no data duplication in the system, and there is little data redundancy, more important is to urge readers to learn “column change row”, so as to prevent the field in the child table to pull into the main table, leaving many empty fields in the main table. The so-called “column change row” is to pull out part of the contents of the main table and create a separate child table. This method is very simple, some people are just not used to, do not adopt, do not implement.

The practical principles of database design are: ** Find the right balance between data redundancy and processing speed. **** “three less” is a whole concept, comprehensive point of view, can not isolate a certain principle. The principle is relative, not absolute. The “three more” principle is definitely wrong. ** Consider this: if you cover the same function of the system, an E-R diagram with 100 entities (1000 attributes) is much better than an E-R diagram with 200 entities (2000 attributes).

Advocate “three little” principle, is to ask the reader to learn to use the database design technology to carry on the system data integration. The steps of data integration are to integrate file system into application database, application database into topic database, and topic database into global comprehensive database.

The higher the degree of integration, the stronger the data sharing, the less information island phenomenon, the number of entities, the number of primary keys, the number of attributes in the global E-R graph of the entire enterprise information system will be less.

Advocated the principle of “three little” purpose, is to prevent the reader using patching technology, constantly to increase bowdlerize database, make the enterprise database becomes a random “dump” design a database table, or database table “painting”, finally caused the base table in the database, code table, the intermediate table, temporary table desultorily, countless, As a result, the information system of enterprises and institutions cannot be maintained and paralyzed.

Anyone can do the “three more” principle, which is the “patching method” of database design. “Three less” principle is the principle of less and better, it requires a higher database design skills and art, not anyone can do, because the principle is to put an end to the use of “patch method” design database theoretical basis.

14. Methods to improve the efficiency of database operation

Under the given system hardware and software conditions, the methods to improve the operation efficiency of the database system are as follows:

  • In database physical design, reduce paradigm, increase redundancy, use fewer triggers, and use more stored procedures.

  • When the calculation is very complex and the number of entries is very large (for example, 10 million), the complex calculation is performed outside the database in C++ language as a file system before being added to the table. This is the experience of telecom billing system design.

  • If a table has too many records (for example, more than 10 million records), split the table horizontally. In horizontal split, records of the table are split horizontally into two tables based on a value of the primary key PK. If a table is found to have too many fields (for example, more than 80 fields), the table is split vertically. The original table is split into two tables.

  • The system optimization of database management system DBMS is to optimize various system parameters, such as the number of buffers. SQL database tips, this recommended look.

  • In the use of data – oriented SQL language programming, try to adopt optimization algorithm.

In a word, to improve the operation efficiency of the database, we must from the database system level optimization, database design level optimization, program implementation level optimization, these three levels at the same time.

The fourteen skills mentioned above are gradually summarized by many people in a large number of database analysis and design practices. For the use of these experiences, readers should not help hard set, rote memorization, but to digest understanding, practical and realistic, flexible grasp. And gradually do: in the application of development, in the development of application.