First of all, the point of all this work is to decompose redundant tables before we actually create them with SQL statements. Keep this thread in mind as you read through the full article:

Decomposition

We split large, poorly designed tables, essentially to avoid redundant data and the resulting insertion, deletion, and update anomalies. Decomposition splits a relation whose attributes are coupled into two or more uncoupled relations. However, decomposition may lose information; a decomposition that does is called lossy, and one that does not is called lossless.

"Information loss" is a vague notion, so let us state it more precisely: if the original relation can be recovered from the decomposed relations by a natural join, the decomposition is lossless. That is, suppose a relation r(R) is decomposed vertically into r(R1) and r(R2), where R = R1 ∪ R2. The decomposition is lossless when Π_R1(r) ⋈ Π_R2(r) = r, and a lossy decomposition is characterized by r ⊂ Π_R1(r) ⋈ Π_R2(r).

Suppose we have a relation like this:


employee(ID, name, street, city, salary)

The street, city, and salary attributes seem to depend only on the employee's name or ID. Here is a bad decomposition:


r1(ID, name)
r2(name, street, city, salary)

Originally, the employee relation contained tuples like these:

ID name street city salary
10001 Kim Main Perryridge 75000
10002 Kim North Hampton 60000

They are cut into r(R1):

ID name
10001 Kim
10002 Kim

And r(R2) (here taking R2 = {name, street, city, salary}):

name street city salary
Kim Main Perryridge 75000
Kim North Hampton 60000

If we rejoin the two tables with a natural join:

ID name street city salary
10001 Kim Main Perryridge 75000
10001 Kim North Hampton 60000
10002 Kim Main Perryridge 75000
10002 Kim North Hampton 60000

As you can see, the restored relation is not the original r(R). We get more tuples but less information: the decomposed version no longer records which name goes with which address or salary, and that association is lost in the decomposition.

Functional dependencies and closures

Note: the closures discussed in this section are closures of functional dependency sets. There is also a closure of an attribute set; do not confuse the two.

To achieve lossless decomposition, we need to know what can and cannot be split apart. For that we need notation describing how attributes relate to each other, so first understand the concept of a functional dependency. By convention, this article uses Greek letters (α, β, γ, etc.) to denote sets of attributes and Latin letters A, B, etc. to denote single attributes.

α → β holds on r(R) if, for any tuples t1 and t2 in r(R), t1[α] = t2[α] implies t1[β] = t2[β]. We then say that β is functionally dependent on α. An individual dependency may be abbreviated as f.

There are special cases when β ⊆ α, in which α → β always holds. Two sub-cases:

  1. In particular, if β = α, then α → α always holds; we only use this property when initializing the algorithms later on.
  2. More generally, if β ⊆ α, then α → β holds; we call this a trivial functional dependency. See Armstrong's reflexivity rule below.

We are mostly concerned with nontrivial functional dependencies.

A database schema carries various dependencies fi specified by its requirements, and these explicitly declared dependencies make up the functional dependency set F. Beyond the explicitly declared set F, further dependencies may be implied by the fi in it; these are said to be logically implied by F. The set F together with all the dependencies it implies constitutes the closure of F, denoted F+.

Armstrong's axioms

To find implied dependencies from an existing F efficiently, we need a sound and complete set of axioms rather than relying on intuition. That is why we introduce Armstrong's axioms. There are three primary rules and three additional rules (the latter can be derived from the first three):

  1. Reflexivity rule: if β ⊆ α, then α → β (a trivial functional dependency).
  2. Augmentation rule: if α → β, then γα → γβ.
  3. Transitivity rule: if α → β and β → γ, then α → γ.
  4. Union rule: if α → β and α → γ, then α → βγ.
  5. Decomposition rule: if α → βγ, then α → β and α → γ.
  6. Pseudotransitivity rule: if α → β and γβ → θ, then γα → θ.

We rely on these axioms to derive additional functional dependencies. They are also used when computing the canonical cover.
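
As a quick illustration of how the additional rules follow from the first three (a standard derivation, not spelled out in the original post): for the union rule, α → β gives α → αβ by augmenting with α; α → γ gives αβ → βγ by augmenting with β; transitivity then yields α → βγ.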

Attribute closures and finding candidate keys

Note: the closures discussed in this section are closures of attribute sets. There is also the closure F+ of the dependency set F; do not confuse the two.

If α → B, then the attribute B is functionally determined by α. Using the existing functional dependency set F, we can also dig out every other attribute that can be derived from α; these attributes form a set called the closure of α, written α+.

For example, suppose relation R(A, B, C, G, H, I) has the functional dependency set {A → B, A → C, CG → H, CG → I, B → H}, and we want (AG)+. Call the result set result; initially we set result = {A, G} (since AG → AG always holds). Then:

  1. A → B makes result := result ∪ {B} = {ABG}.
  2. A → C makes result = {ABCG}.
  3. CG → H makes result = {ABCGH}.
  4. CG → I makes result = {ABCGHI}.
  5. (AG)+ = R

Clearly, if all attributes of a relation R are determined by α, that is, if α+ contains all attributes of R, then α is a superkey. We exploit this property in the candidate key search below.
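
To make the procedure above concrete, here is a minimal Python sketch (not from the original post) of the attribute closure computation and the superkey test built on it. The representation of each dependency as a (left, right) pair of frozensets is an assumption made purely for illustration.

```python
# Minimal sketch: attribute closure alpha+ under a dependency set F, and the
# superkey test built on it. The (lhs, rhs) frozenset-pair representation of
# dependencies is an illustrative assumption, not a fixed API.

def closure(alpha, F):
    """Return the closure of the attribute set `alpha` under F."""
    result = set(alpha)
    changed = True
    while changed:                       # repeat until no dependency adds anything
        changed = False
        for lhs, rhs in F:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_superkey(alpha, R, F):
    """alpha is a superkey of R iff alpha+ contains every attribute of R."""
    return closure(alpha, F) >= set(R)

# The worked example above: R(A, B, C, G, H, I)
F = [(frozenset("A"), frozenset("B")),
     (frozenset("A"), frozenset("C")),
     (frozenset("CG"), frozenset("H")),
     (frozenset("CG"), frozenset("I")),
     (frozenset("B"), frozenset("H"))]

print(sorted(closure(set("AG"), F)))   # ['A', 'B', 'C', 'G', 'H', 'I']
print(is_superkey("AG", "ABCGHI", F))  # True: (AG)+ covers the whole schema
```

This is the straightforward fixed-point version of the computation; faster variants exist but are not needed at this scale.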

Canonical cover

A tangle of dependencies is not conducive to lossless decomposition; we want a minimal, compact set of dependencies. In other words, we need to know which attributes are superfluous. For a dependency α → β, an extraneous attribute A may appear on the left, in which case we can simplify the dependency to (α − A) → β; it may also appear on the right, in which case we simplify to α → (β − A). Attributes that can be proven removable are called extraneous.

Removing an attribute from the left makes a dependency stronger; removing one from the right makes it weaker. Therefore, the check that nothing is lost after deleting an attribute differs depending on which side the attribute is removed from.

Suppose the functional dependency set of a relation is F. If all extraneous attributes are deleted, we obtain a canonical cover, denoted Fc. Some sources also call the canonical cover the minimal dependency set. In Fc, the left side αi of each dependency fi: αi → βi is unique; that is, there are no two dependencies α1 → β1 and α2 → β2 with α1 = α2. So the first step in deriving the canonical cover is to use the union rule of Armstrong's axioms to merge dependencies whose left sides are identical.

Removing an attribute from the right

If the attribute being tested is on the right side of a dependency, as in {A → BC}, suppose B (or C) is extraneous. Delete the attribute and compute the closure α+ of that dependency's left side under the remaining functional dependency set F′. If the tested attribute is contained in α+, it is extraneous.

First example: F = {A → BC, B → AC, C → AB}. Focus on the attribute B on the right side of A → BC. Delete it; the remaining set is F′ = {A → C, B → AC, C → AB}. Compute the closure (A)+ of the left side under F′. Since A → C and C → AB, (A)+ contains the deleted attribute B, so B is an extraneous attribute.

Removing an attribute from the left

If the attribute being tested is on the left side of a dependency, as in {AB → C}, suppose A (or B) is extraneous. Delete the attribute and compute the closure of the remaining left side under the original dependency set F. If that closure contains all the attributes on the right side of the dependency, the attribute is extraneous. For example, in the functional dependency set F = {Z → X, X → P, XY → WP, XYP → ZW}, we suspect the attribute P in XYP → ZW may be redundant. Delete it and compute the closure of the remaining left side (XY) under the original F = {Z → X, X → P, XY → WP, XYP → ZW}:

(XY)+ starts as {X, Y}; X → P and XY → WP extend it to {X, Y, P, W}, and XYP → ZW finally gives {X, Y, P, W, Z}. This contains the ZW on the right side of the original dependency, so P is an extraneous attribute.

Another example: F = {A → BC, B → C, AB → C}. Focus on the attribute A in AB → C. Following the procedure, we ask whether (B)+ contains C. Obviously it does: after deleting A, the remaining dependency B → C already exists in F, so A must be an extraneous attribute.
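
These two tests, plus the canonical cover loop built on top of them, translate directly into small routines. The sketch below reuses closure() from the earlier example; the function names are illustrative. The loop simply merges identical left sides with the union rule and strips one extraneous attribute at a time until nothing changes; it reproduces the two examples above as well as the canonical cover used in the 3NF example later on.

```python
# Sketches of the two extraneous-attribute tests and a canonical-cover loop
# built on them, reusing closure(). Names and representation are illustrative.

def extraneous_in_rhs(attr, lhs, rhs, F):
    """attr is extraneous on the right of lhs -> rhs if, after removing it
    from this dependency only, the modified set still lets lhs derive attr."""
    F_mod = [(l, r - {attr}) if (l, r) == (lhs, rhs) else (l, r) for l, r in F]
    return attr in closure(lhs, F_mod)

def extraneous_in_lhs(attr, lhs, rhs, F):
    """attr is extraneous on the left of lhs -> rhs if (lhs - attr)+ under
    the ORIGINAL F still covers the right side."""
    return rhs <= closure(lhs - {attr}, F)

def canonical_cover(F):
    """Merge equal left sides (union rule), then repeatedly strip one
    extraneous attribute until nothing changes."""
    Fc = [(frozenset(l), frozenset(r)) for l, r in F]
    changed = True
    while changed:
        changed = False
        merged = {}
        for l, r in Fc:                              # union rule
            merged[l] = merged.get(l, frozenset()) | r
        if len(merged) != len(Fc):
            changed = True
        Fc = list(merged.items())
        for i, (l, r) in enumerate(Fc):              # drop one extraneous attribute
            removed = False
            for a in sorted(l):
                if len(l) > 1 and extraneous_in_lhs(a, l, r, Fc):
                    Fc[i] = (l - {a}, r)
                    removed = True
                    break
            if not removed:
                for a in sorted(r):
                    if extraneous_in_rhs(a, l, r, Fc):
                        Fc[i] = (l, r - {a})
                        removed = True
                        break
            if removed:
                changed = True
                break
        Fc = [(l, r) for l, r in Fc if r]            # discard emptied right sides
    return Fc

# Right-side example above: F = {A -> BC, B -> AC, C -> AB}
F1 = [(frozenset("A"), frozenset("BC")),
      (frozenset("B"), frozenset("AC")),
      (frozenset("C"), frozenset("AB"))]
print(extraneous_in_rhs("B", frozenset("A"), frozenset("BC"), F1))    # True

# Left-side example above: F = {Z -> X, X -> P, XY -> WP, XYP -> ZW}
F2 = [(frozenset("Z"), frozenset("X")),
      (frozenset("X"), frozenset("P")),
      (frozenset("XY"), frozenset("WP")),
      (frozenset("XYP"), frozenset("ZW"))]
print(extraneous_in_lhs("P", frozenset("XYP"), frozenset("ZW"), F2))  # True

# Canonical cover of the dependency set used in the 3NF example later on:
F3 = [(frozenset("A"), frozenset("F")), (frozenset("B"), frozenset("A")),
      (frozenset("D"), frozenset("B")), (frozenset("DE"), frozenset("B")),
      (frozenset("DF"), frozenset("B"))]
print([(''.join(sorted(l)), ''.join(sorted(r))) for l, r in canonical_cover(F3)])
# [('A', 'F'), ('B', 'A'), ('D', 'B')]
```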

3NF and BCNF

Boyce-Codd Normal Form, abbreviated BCNF, is for our purposes a satisfactory normal form, so this chapter will not discuss higher normal forms.

1NF requires all attributes to be atomic, which is now a baseline assumption when discussing database design. Going from 2NF up to BCNF, we progressively eliminate partial functional dependencies of non-prime attributes on keys (2NF), transitive functional dependencies of non-prime attributes on keys (3NF), and finally partial and transitive dependencies among the prime attributes themselves (BCNF). A table whose candidate keys are all single attributes automatically satisfies 2NF.

There are many similar explanations online of how to judge 3NF and BCNF, but the author prefers the explicit, procedural, symbolic formulation in Database System Concepts.

BCNF criterion

Let F be the set of functional dependencies of relation R, and F+ the closure of F. R is in BCNF if, for every functional dependency α → β in F+, at least one of the following holds:

  1. α → β is a trivial functional dependency, i.e. β ⊆ α.
  2. α is a superkey of R (every superkey contains some candidate key).

3NF criterion

Let F be the set of functional dependencies of relation R, and F+ the closure of F. R is in 3NF if, for every functional dependency α → β in F+, at least one of the following holds:

  1. α → β is a trivial functional dependency, i.e. β ⊆ α.
  2. α is a superkey of R (every superkey contains some candidate key).
  3. Every attribute Ai in β − α appears in some candidate key.

Note that the attributes Ai may appear in different candidate keys; they need not all lie in the same one. Compared with BCNF, 3NF relaxes the restriction on some nontrivial functional dependencies. The criteria also show that a relation satisfying a higher normal form necessarily satisfies the lower ones; conversely, a relation that violates a lower normal form cannot satisfy a higher one.
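
Side by side, the two criteria differ only in the third condition. The rough sketch below expresses them as predicates over a single dependency, reusing closure() and is_superkey() from the earlier sketch and assuming the candidate keys are already known (see the search method in the next section). Strictly the criteria quantify over F+; checking only the explicitly declared dependencies is a simplification (it is known to suffice for BCNF).

```python
# Rough sketch: BCNF vs 3NF violation tests for a single dependency,
# reusing closure() and is_superkey(). candidate_keys_of_R must be supplied.

def violates_bcnf(lhs, rhs, R, F):
    """alpha -> beta violates BCNF unless it is trivial or alpha is a superkey."""
    return not (rhs <= lhs or is_superkey(lhs, R, F))

def violates_3nf(lhs, rhs, R, F, candidate_keys_of_R):
    """3NF additionally tolerates the case where every attribute of
    beta - alpha lies in some candidate key."""
    if rhs <= lhs or is_superkey(lhs, R, F):
        return False
    return not all(any(a in key for key in candidate_keys_of_R)
                   for a in rhs - lhs)

# Test case borrowed from the BCNF example later in the article:
# R1 = ACDE, F(R1) = {A -> CDE, D -> E}, candidate key {A}.
F_R1 = [(frozenset("A"), frozenset("CDE")), (frozenset("D"), frozenset("E"))]
print(violates_bcnf(frozenset("D"), frozenset("E"), "ACDE", F_R1))                   # True
print(violates_3nf(frozenset("D"), frozenset("E"), "ACDE", F_R1, [frozenset("A")]))  # True
```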

Candidate key search

Reasoning about candidate keys by intuition alone is inefficient. Here is a handy method:

First, the attributes of relation R(r) are divided into the following sets: L, LR, R, and N, by examining every dependency fi: αi → βi in the dependency set F. Specifically:

  1. L contains the attributes that appear only on the left side of dependencies.
  2. N contains the attributes that appear on neither side (note that this does not mean they are extraneous).
  3. LR contains the attributes that appear on both the left and right sides.
  4. R contains the attributes that appear only on the right side.

Clearly, these four sets are pairwise disjoint, and together they form a partition of R's attributes. The important consequences are:

  1. If an element x ∈ L, then it must appear in every candidate key.
  2. If an element x ∈ N, then it must also appear in every candidate key (there is no other way to derive it).
  3. If an element x ∈ R, then it must not appear in any candidate key; otherwise that candidate key would not be a minimal superkey.

Therefore, we can quickly locate the candidate keys in the following situations:

  1. If there are no N-class attributes, then L itself is the unique candidate key when the closure L+ contains all attributes of R (written L+ = R).
  2. If there are N-class attributes, then L ∪ N (abbreviated LN) itself is the unique candidate key when its closure (LN)+ contains all attributes of R.

Otherwise, things get messier. We must repeatedly pick a subset Yi of one or more attributes from LR and test whether (L ∪ N ∪ Yi)+ = R. Here is one "convenient" and one "not so convenient" example.

Example 1: Let the relation schema be R(A, B, C, D, E, P) with functional dependency set F = {A → D, E → D, D → B, BC → D, DC → A}; find all candidate keys of R. Here L = {C, E}, R = ∅, N = {P}, LR = {A, B, D}. Clearly (CEP)+ = {C, E, P, D, B, A}, so CEP is the only candidate key of R.

Example 2: Let the relation schema be R(S, D, I, B, O, Q) with functional dependency set F = {S → D, I → B, B → O, O → Q, Q → I}; find all candidate keys of R. Here L = {S}, R = {D}, N = ∅, LR = {I, B, O, Q}. S+ = {S, D}, (SI)+ = {S, I, D, B, O, Q}, (SB)+ = {S, B, D, O, Q, I}, (SO)+ = {S, O, D, Q, I, B}, (SQ)+ = {S, Q, D, I, B, O}.

Therefore the candidate key set is {SI, SB, SO, SQ}.
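
The partition-based search can be mechanized. The sketch below reuses closure() from the earlier example: it starts from L ∪ N, and only if that set is not already a key does it enumerate subsets of LR of increasing size, discarding supersets of keys already found. The helper names are my own.

```python
from itertools import combinations

# Sketch of the candidate-key search based on the L / LR / R / N partition,
# reusing closure(). Function and variable names are illustrative.

def partition(R, F):
    lefts  = set().union(*(l for l, _ in F))
    rights = set().union(*(r for _, r in F))
    L  = lefts - rights                 # only on the left
    Rr = rights - lefts                 # only on the right
    LR = lefts & rights                 # on both sides
    N  = set(R) - lefts - rights        # on neither side
    return L, LR, Rr, N

def candidate_keys(R, F):
    L, LR, Rr, N = partition(R, F)
    core = L | N                        # must be part of every candidate key
    if closure(core, F) >= set(R):
        return [frozenset(core)]        # the core alone is the unique key
    keys = []
    for k in range(1, len(LR) + 1):
        for extra in combinations(sorted(LR), k):
            cand = core | set(extra)
            # skip supersets of keys already found (keys must be minimal)
            if any(key <= cand for key in keys):
                continue
            if closure(cand, F) >= set(R):
                keys.append(frozenset(cand))
    return keys

# Example 2 above: R(S, D, I, B, O, Q)
F = [(frozenset("S"), frozenset("D")),
     (frozenset("I"), frozenset("B")),
     (frozenset("B"), frozenset("O")),
     (frozenset("O"), frozenset("Q")),
     (frozenset("Q"), frozenset("I"))]
print([''.join(sorted(k)) for k in candidate_keys("SDIBOQ", F)])
# ['BS', 'IS', 'OS', 'QS'], i.e. SB, SI, SO, SQ
```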

Lossless decomposition into BCNF

The general idea of decomposition into BCNF is as follows. The input is a relation R(r) and its functional dependency set F; the output is a lossless decomposition ρ = {R1, R2, …, Rk} of R in which every Ri satisfies BCNF. Then:

  1. Let ρ = {R}.
  2. If every schema in ρ satisfies BCNF, go to step 4.
  3. If some schema S in ρ does not satisfy BCNF, then there must be a functional dependency f: X → A that holds on S, where X is not a superkey of S and A ∉ X. Let S1 = XA and S2 = S − A, replace S in ρ with {S1, S2}, and go to step 2.
  4. Output ρ.

This decomposition procedure naturally builds a decomposition tree. [Figure: the decomposition tree, together with the characteristics of this decomposition approach.]
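
An unoptimized sketch of this loop is shown below; it reuses closure() from earlier. The projection of the dependency set onto a sub-schema is computed by brute force over attribute subsets, which is exponential and meant only to mirror the definition. Running it on the aliased dependency set of the example that follows reproduces the decomposition derived by hand.

```python
from itertools import combinations

# Unoptimized sketch of the BCNF decomposition loop, reusing closure().
# project() restricts F+ to a sub-schema by brute force over attribute
# subsets, which mirrors the definition rather than an efficient algorithm.

def project(F, Ri):
    """All nontrivial dependencies alpha -> beta with alpha, beta inside Ri."""
    Fi = []
    attrs = sorted(Ri)
    for k in range(1, len(attrs)):
        for alpha in combinations(attrs, k):
            alpha = frozenset(alpha)
            beta = (closure(alpha, F) & set(Ri)) - alpha
            if beta:
                Fi.append((alpha, frozenset(beta)))
    return Fi

def bcnf_decompose(R, F):
    result = [frozenset(R)]
    done = False
    while not done:
        done = True
        for i, S in enumerate(result):
            Fs = project(F, S)
            for lhs, rhs in Fs:
                # BCNF violation inside S: nontrivial, and lhs is not a superkey of S
                if not rhs <= lhs and not closure(lhs, Fs) >= S:
                    S1 = lhs | rhs              # X together with its right side
                    S2 = S - (rhs - lhs)        # S minus the violating right side
                    result[i:i + 1] = [frozenset(S1), frozenset(S2)]
                    done = False
                    break
            if not done:
                break
    return result

# The aliased example below: R = ABCDEF
F = [(frozenset("AB"), frozenset("CDE")),
     (frozenset("A"),  frozenset("CDE")),
     (frozenset("AB"), frozenset("F")),
     (frozenset("D"),  frozenset("E"))]
print([''.join(sorted(s)) for s in bcnf_decompose("ABCDEF", F)])
# ['DE', 'ACD', 'ABF'], the R11, R12, R2 derived by hand below
```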

Here is a more complex example: Student(S#, Sname, Sdept, Mname, Cname, Grade), with the following functional dependencies:


(S#, Cname) → Sname, Sdept, Mname
S# → Sname, Sdept, Mname
(S#, Cname) → Grade
Sdept → Mname

Here {S#, Cname} is the primary key. To make the derivation easier to follow, we first alias these attributes (A = S#, B = Cname, C = Sname, D = Sdept, E = Mname, F = Grade):


AB → CDE
A → CDE
AB → F
D → E

The candidate key is {AB}. From the first two dependencies we can already tell that the current relation R is not in 2NF. By the BCNF criterion, A → CDE is a functional dependency that violates BCNF, so we decompose R into R1 and R2, where:

R1 = {A, C, D, E} (the left side A together with the right side CDE), while R2 = R − {C, D, E} = {A, B, F}.

In addition, we need to project F onto R1 to obtain F(R1): for each functional dependency f in F, remove the attributes that are not in R1 and add the projected dependency f′ to F(R1). If the left or right side becomes empty after the deletion, the dependency is discarded; for example, the right side of AB → F disappears entirely when projected onto R1, so it is dropped from F(R1). (Strictly speaking, F(R1) is the restriction of F+ to the attributes of R1; the simple per-dependency projection happens to be enough in this example.)

Therefore, F(R1) = {A → CDE, D → E}, whose primary key is A, and F(R2) = {AB → F}, whose primary key is AB.

R2 obviously needs no further decomposition. D → E in F(R1) violates the BCNF condition: it is nontrivial, and D is not a superkey of R1. Moreover, attribute E does not appear in any candidate key, so R1 does not satisfy 3NF either (put differently, the non-prime attribute E is transitively dependent on the key A). So, following the algorithm, we continue to decompose R1 into R11 and R12, where:

R11 = {D, E} (the left side D together with the right side E), F(R11) = {D → E}, and the primary key is D.

R12 = R1 − {E} = {A, C, D}, F(R12) = {A → CD}, and the primary key is A.

ρ = {R11, R12, R2} = {(Sdept, Mname), (S#, Sname, Sdept), (S#, Cname, Grade)}.

Of course, if we computed the canonical cover before decomposing, we would easily see that since CDE can already be derived from A alone, all the attributes on the right of AB → CDE are extraneous and that dependency is redundant. The final result, however, does not change.

Lossless decomposition into 3NF

With the prior knowledge above, 3NF decomposition (sometimes called 3NF synthesis) is relatively simple. First, compute the canonical cover Fc of the given dependency set F. Then, for each functional dependency f: α → β in Fc, create a sub-relation with attribute set Ri = αβ. Finally, if none of the resulting αβ sets is a superkey (i.e. none of them contains a candidate key), add one more relation whose attribute set is some candidate key.
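
A compact sketch of this synthesis procedure, reusing closure() and candidate_keys() from the earlier sketches, is shown below; it takes the canonical cover Fc as input and reproduces the example that follows. One simplification: a new sub-schema is skipped only when it is contained in one created earlier, whereas a full implementation would also drop earlier sub-schemas contained in later ones.

```python
# Sketch of 3NF synthesis, reusing closure() and candidate_keys() from the
# earlier sketches and taking the canonical cover Fc as input.

def third_nf_synthesis(R, Fc):
    keys = candidate_keys(R, Fc)
    schemas = []
    for lhs, rhs in Fc:
        Ri = frozenset(lhs | rhs)            # one sub-schema per dependency in Fc
        if not any(Ri <= s for s in schemas):
            schemas.append(Ri)
    # if no sub-schema contains a candidate key, add one key as its own schema
    if not any(k <= s for k in keys for s in schemas):
        schemas.append(keys[0])
    return schemas

# The worked example below: R = ABCDEF, Fc = {A -> F, B -> A, D -> B}
Fc = [(frozenset("A"), frozenset("F")),
      (frozenset("B"), frozenset("A")),
      (frozenset("D"), frozenset("B"))]
print([''.join(sorted(s)) for s in third_nf_synthesis("ABCDEF", Fc)])
# ['AF', 'AB', 'BD', 'CDE']
```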

For example, let the relation be R = {A, B, C, D, E, F} with F = {A → F, B → A, D → B, DE → B, DF → B}; find a 3NF decomposition of R.

Step 1: Find the canonical cover Fc. E in DE → B and F in DF → B are extraneous attributes, because D → B is already given. After simplification, Fc = {A → F, B → A, D → B}.

Step 2: Partition the attributes into the L, R, LR, and N sets to find the candidate keys. L = {D}, R = {F}, LR = {A, B}, N = {C, E}, so any candidate key must contain {C, D, E} (see the candidate key search method above). Verifying that (CDE)+ contains all attributes, {CDE} is the only candidate key.

Step 3: Following the algorithm, Fc immediately gives ρ = {R1(AF), R2(AB), R3(BD)}. Since none of these sub-tables contains the candidate key (CDE), an additional R4(CDE) is added. The final decomposition is ρ = {R1(AF), R2(AB), R3(BD), R4(CDE)}.
