High-Level Database Design & Normalization
Features of Relational Database Design
When a database designer is planning to design a database, they consider the process, key points, and
methodologies involved in the construction of a relational database.
- Relational Database design is based on four key parameters that serve as guidelines for the
construction of a database.
1. Semantics of Attributes:
- This aspect revolves around the meaning of each individual attribute within the database.
- Guideline: Design a relation schema to facilitate easy explanation of its
meaning; names should align with the data stored in them.
- Do not combine attributes from multiple entity types to avoid confusion.
2. Reducing Redundant Information from Tuples:
- Duplicate values should not exist in multiple tuples.
- Guideline: Design the database to eliminate insertion, deletion, and updation
anomalies.
3. Reducing Null Values in Attributes:
- The value of an attribute should not be frequently null.
- Guidelines: Avoid including attributes with frequent null values in the
relation.
4. Disallowing the Possibility of Generating Spurious Tuples
- This refers to avoiding the generation of spurious tuples, meaning undesired results when
joining tables.
- Guidelines: Design the relational schema to enable joining with equality
conditions on common attributes.
Anomalies in Relational Database:-
- Refers to undesirable problem occured while designing the database.
- Basically we have 3 anomolies (problems):
- Insertion anamoly
- Deletion anamoly
- Updation anamoly
Insertion Anomaly
- An insertion anomaly occurs when there is a desire to insert information, but it cannot be added
due to certain constraints or restrictions.
Example:
+-------------------------------------------+
| EID | Name | Salary | DID | Department |
+-------------------------------------------+
| 1 | Paul | 10,000 | 1 | A/C |
| 2 | John | 6,000 | 1 | A/C |
| 3 | Sharav | 7,000 | 3 | Sales |
| 4 | Nike | 6,000 | 2 | Marketing |
| 5 | Hudson | 10,000 | 2 | Marketing |
| 6 | Smith | 20,000 | 1 | A/C |
| 7 | Paula | 50,000 | 2 | Marketing |
| 8 | Maithli | 15,000 | 1 | A/C |
| | | | 5 | Finance |
+-------------------------------------------+
- We can't add Finance and DID 5 due to EID constraints, as EID (primary key) cannot be left blank
or null.
Deletion Anomaly
- A deletion anomaly occurs when we remove some existing information from a database, and this
action unintentionally leads to the deletion of other related but necessary data.
- Example:
+-------------------------------------------+
| EID | Name | Salary | DID | Department |
+-------------------------------------------+
| 1 | Paul | 10,000 | 1 | A/C |
| 2 | John | 6,000 | 1 | A/C |
| 3 | Sharav | 7,000 | 2 | Sales |
+-------------------------------------------+
- If we delete the tuple with EID 3, which is the only occurrence of the "Sales" department,
we lose the definition of the "Sales" department entirely.
Updation Anomaly
- An updation anomaly occurs when we attempt to update data in a way that introduces inconsistency
into the dataset.
- Example: Suppose we want to change the Department ID (DID) from 1 to 2 for the A/C department:
+-------------------------------------------+
| EID | Name | Salary | DID | Department |
+-------------------------------------------+
| 1 | Paul | 10,000 | 2 | A/C |
| 2 | John | 6,000 | 1 | A/C |
| 3 | Sharav | 7,000 | 3 | Sales |
+-------------------------------------------+
- This update would create inconsistency because now the Department A/C is associated with
both DID 1 and DID 2, making it unreliable and confusing.
- For accurate updates, one would need to change the Department ID for all instances of A/C,
leading to a more complex and error-prone process.
Normalization: Introduction
Normalization is a systematic process aimed at decomposing or dividing a relational database into
multiple relations to eliminate anomalies and improve its overall structure.
- It is a step-by-step process, and each step is referred to as a normal form. Normalization is a
reversible process.
Benefits of Normalization
- Smaller tables with smaller rows:
- Normalization results in breaking down large tables into smaller, more manageable ones,
reducing redundancy and improving efficiency.
- Searching, sorting & creating indexes are faster:
- Normalized tables simplify data retrieval operations, leading to faster search, sort, and
index creation processes.
- Small number of null values & less redundant data:
- Normalization aims to minimize null values and redundant data, ensuring a cleaner and more
efficient database design.
- Data modification anomalies are reduced:
- Normalization helps mitigate issues such as insertion, update, and deletion anomalies,
promoting data integrity.
- Easier to maintain & change as the needs change:
- A normalized database is more adaptable to changes in requirements, making it easier to
maintain and modify over time.
- Reduces data dependency:
- Normalization minimizes data redundancy, leading to reduced dependency and ensuring that
changes in one part of the database do not adversely affect other parts.
Pros and Cons of Normalization in Databases
Advantages
- Reduces data redundancy, minimizing storage space.
- Enhances overall organization of the database.
- Promotes data consistency within the database.
- Facilitates a more flexible database design.
- Reduces the number of indexes per table, speeding up maintenance tasks.
- Allows for selective table joins, improving query efficiency.
- Enhances database security management.
- Increases storage efficiency.
- Speeds up data access.
Disadvantages
- Database design must align with user requirements before development.
- Normalization can consume significant CPU, memory, and I/O resources, affecting performance.
- Requires more joins for desired results; poorly written queries can impact database
performance.
- Normalizing relations of a higher degree is time-consuming and challenging.
- Careless decomposition may lead to poorly designed databases and serious issues.
Types of Normal Forms:
- First Normal Form (1NF):
- Focuses on eliminating duplicate data by ensuring that each attribute in a table contains
only atomic (indivisible) values.
- Second Normal Form (2NF):
- Builds on 1NF by addressing partial dependencies. All non-prime attributes must be fully
functionally dependent on the primary key.
- Third Normal Form (3NF):
- Further refines data integrity by removing transitive dependencies. No non-prime attribute
should be transitively dependent on the primary key.
- Boyce-Codd Normal Form (BCNF):
- Aims to eliminate all non-trivial functional dependencies. Every non-trivial functional
dependency is a superkey.
- Fourth Normal Form (4NF) based on Multivalued Dependency:
- Addresses situations where a table has multiple independent multi-valued attributes. It
ensures data is organized efficiently.
- Fifth Normal Form (5NF) based on Join Dependency:
- Deals with scenarios where a table contains multiple overlapping candidate keys, and it aims
to eliminate redundancy through the use of join dependencies.
The first four depend on the concept of functional dependency.
What are Functional Dependencies?
Functional dependencies result from the interrelationship between attributes or tuples in any relation.
- They play a crucial role in normalization and reduce redundancy.
- In relation R, where X and Y are two subsets of sets of attributes, Y is said to be functionally
dependent on X if a given set of values for all attributes in X uniquely determines the values of
all attributes in Y.
X → Y (Y depends on X or X determines Y)
Example:
+-----------+
| X | Y |
+-----------+
| a | 1 |
| b | 22 |
| c | 65 |
| a | 1 |
+-----------+
If X → Y, then for any two tuples (t1 and t2) where t1[X] = t2[X], it follows that t1[Y] = t2[Y].
For every unique value of X, we get a unique value from Y.
a → 1
X is the determinant, and Y is dependent.
Question: Is X → Y?
+-----------+
| X | Y |
+-----------+
| a | 2 |
| b | 2 |
| a | 2 |
| b | 2 |
| c | 2 |
+-----------+
For every 'a' in X, we get Y = 2, and the same holds for 'b'; however, 'c' won't affect anything.
a → 2, b → 2, c → 2
Another example:
+---------------------------+
| EID | Name | Salary |
+---------------------------+
| 1 | Aditya | 15000 |
| 2 | Manoj | 16000 |
| 3 | Sandeep | 9000 |
| 4 | Vikas | 10000 |
| 5 | Manoj | 9000 |
+---------------------------+
Consider the functional dependency:
- X(EID) → Y(Name, Salary)
- Which means X(EID) → Y(Name)
- X(EID) → Y(Salary)
- Vice versa is not true. For instance, when finding the name of a person with a salary of 9000, we
might get two people, but when searching for the name of the person with ID 2, we get only one
person.
First Normal Form - 1NF
- A relation is in 1NF if the domain of each attribute contains only atomic values.
Examples:
+-------------------------------------+
| ID | Name | Age | Phone |
+-------------------------------------+
| 1 | x | 32 | 666 |
| 2 | y | 65 | 777 |
| 3 | z | 42 | 888, 222 |
+-------------------------------------+
ID 3 has two phone numbers, and we shouldn't have multiple values in a single attribute. To convert
this to 1NF form, we can repeat the row like this ↓
+-------------------------------------+
| ID | Name | Age | Phone |
+-------------------------------------+
| 1 | x | 32 | 666 |
| 2 | y | 65 | 777 |
| 3 | z | 42 | 888 |
| 3 | z | 42 | 222 |
+-------------------------------------+
Now, each attribute contains only atomic values, adhering to the 1NF requirements.
We can't even have composite attributes as it violates the atomic property for 1NF. For example
↓
+----------------------------------------+
| ID | Name | Address |
+----------------------------------------+
| 1 | x | 74A, Block 7, Delhi |
| 2 | y | 664b, Block 8, Delhi |
+----------------------------------------+
Converting it to 1NF form by distributing the address attribute into sub-attributes.
+--------------------------------------------+
| ID | Name | Street | Block | State |
+--------------------------------------------+
| 1 | x | 74A | Block 7 | Delhi |
| 2 | y | 664b | Block 8 | Delhi |
+--------------------------------------------+
Working with derived attributes:
As derived attribute values can be easily calculated at runtime, they are removed in 1NF form.
+-----------------------------------------+
| ID | Name | DOJ | WExp |
+-----------------------------------------+
| 1 | x | 16 May 2018 | 3 |
| 2 | y | 14 Aug 2019 | 2 |
+-----------------------------------------+
Converting to 1NF form ↓
+-------------------------------+
| ID | Name | DOJ |
+-------------------------------+
| 1 | x | 16 May 2018 |
| 2 | y | 14 Aug 2019 |
+-------------------------------+
Prime and Non-Prime Attributes
Prime or Key Attribute
-
For a given relation R = {A1, A2, A3, A4, ..., An}, an attribute a is a prime attribute if A
is part
of any candidate key of R.
-
Alternatively, an attribute is considered prime if it is part of a key.
Non-Prime or Non-Key Attribute:
-
If an attribute is not part of any key, it is classified as a non-prime attribute.
Example:
Relation Schema:
- Relation Schema: Student(Rollno, S-Name, S-Address, S-DOB, S-Fees, S-Course)
-
Now, let's determine which attributes can serve as candidate keys.
- Since Rollno uniquely identifies each student, it qualifies as a candidate key: Rollno →
Ck.
- The combination [S-Name + S-Address] can also uniquely identify a student: [S-Name +
S-Address] → Ck
(This combination can also identify a unique value).
- Therefore, Rollno, S-Name, and S-Address are considered prime attributes as they are
either individually or
collectively part of a candidate key.
- On the other hand, S-DOB, S-Fees, and S-Course are classified as non-prime key
attributes since they do not
contribute to the identification of unique tuples.
Fully Functional Dependency:
- In the context of database normalization, a fully functional dependency refers to the condition
where a non-prime key attribute (denoted as A) is entirely dependent on all prime key attributes. In
other words, the value of A is uniquely determined by the combination of values in all prime key
attributes.
Example:
Consider a relation named "Student" with prime key attributes (rollno & games) and non-prime key
attributes (grade, name, and fees).
For illustration purposes:
- Name: This attribute can be uniquely identified using only the "rollno."
- Fees: Similarly, the "fees" attribute can be uniquely identified using only the
"games."
-
Grade: However, the "grade" attribute requires knowledge of both "rollno" and
"games" to uniquely identify it. In this scenario, the grade is considered fully dependent on both
prime key attributes (rollno and games).
This exemplifies a fully functional dependency, where certain non-prime key attributes rely on the
entirety of the prime key for their uniqueness.
Partial Dependency
- Partial Dependency occurs when you have more than one prime key attribute, and a non-prime key
attribute is not dependent on all prime key attributes.
Example:
Consider a relation named "Student" with prime key attributes (rollno & games) and non-prime key
attributes (grade, name, and fees).
-
Name: The "name" attribute, in this case, can be retrieved solely using the
"rollno." Since it is not dependent on both prime key attributes (rollno and games), it is
considered a partial dependency.
-
Fees: Similarly, the "fees" attribute can be uniquely identified using only the
"games" attribute. This situation also represents a partial dependency since it does not depend on
both prime key attributes.
-
Grade: Unlike the previous examples, the "grade" attribute requires knowledge of
both "rollno" and "games" for unique identification. Therefore, it does not exhibit partial
dependency and is fully dependent on the entire combination of prime key attributes.
This provides a more comprehensive view of partial dependency within the context of a relational
database, emphasizing that certain non-prime key attributes depend on only a subset of the prime key
attributes for their uniqueness.
Trivial and Non-trivial Dependency
Trivial Functional Dependency
-
If A → B and B ⊆ A, then this dependency is a Trivial Functional
Dependency. In simpler terms, if the right-hand side (B) is a subset of the left-hand side (A),
the dependency is considered trivial.
-
A → A and B → B are also categorized as Trivial Dependencies. These
cases express that an attribute is functionally dependent on itself, which is inherently
evident.
-
Example: Let's consider a relation where Rollno → Rollno and
Rollno, S-name → S-Name. In the first case, Rollno → Rollno is
trivial because it merely states that you can find the value of Rollno from itself,
which is self-evident and doesn't provide new information. The second case, Rollno, S-name
→ S-Name, is also trivial as it essentially says that knowing both Rollno
and S-name, you can find the value of S-Name. These examples highlight
situations where the dependency is obvious and doesn't contribute additional insights.
-
Trivial dependencies are characterized by their predictability and lack of novelty, making them
less informative in the context of database relationships.
Non-trivial Functional Dependency
-
If A → B and B is not a subset of A, then this dependency is a
Non-trivial Functional Dependency. In other words, the right-hand side (B) provides new
information beyond what is already known from the left-hand side (A).
-
Example: Consider a relation where Rollno → S-Name. This is a
non-trivial dependency because knowing the Rollno uniquely determines the value of
S-Name, and the relationship is not self-evident or redundant. The dependency
introduces new information about the dataset.
-
Non-trivial dependencies are essential in database design as they contribute to the
normalization process, helping to eliminate redundancy and ensure data integrity.
-
Unlike trivial dependencies, non-trivial dependencies play a crucial role in providing valuable
insights and maintaining the accuracy of data relationships within a relational database.
Transitive and Non-transitive Dependency
Transitive Dependency
-
A transitive dependency occurs when there is an indirect relationship between two attributes
through a third attribute. In other words, if A → B and B → C, then
A → C is a transitive dependency.
-
Example: Consider a relation where EmployeeID → DepartmentID and
DepartmentID → DepartmentName. Here, EmployeeID → DepartmentName is
a transitive dependency, as the department name is indirectly determined by the employee ID
through the relationship with the department ID.
-
Transitive dependencies can lead to data redundancy and are typically addressed through
normalization techniques.
Non-transitive Dependency
-
A non-transitive dependency, on the other hand, exists when there is a direct relationship
between two attributes without involving a third attribute.
-
Example: In a relation where StudentID → StudentName, this is a
non-transitive dependency. The student name is directly determined by the student ID without the
need for an intermediary attribute.
-
Non-transitive dependencies are desirable in database design as they help maintain simplicity
and reduce the risk of data anomalies.
-
Identifying and eliminating transitive dependencies is a key step in the normalization process
to ensure a well-structured and efficient relational database.
Second Normal Form (2NF)
-
A relation is in the second normal form (2NF) if it is in 1NF and all non-prime key attributes are
fully functionally dependent upon the primary key attributes.
Example:
Consider a relation named "Student" with prime key attributes (rollno & games) and non-prime key
attributes (grade, name, and fees).
-
Using the diagram, we can observe that the "name" is dependent on "rollno" only, "fees" is dependent
on "games" only, and "grade" is dependent on both "rollno" and "games."
-
So, according to the definition of 2NF, we can create three tables:
- (Rollno, Games, Grade)
- (Rollno, Name)
- (Games, Fees)
In the new structure, each table represents a distinct functional dependency, ensuring that non-prime
key attributes are fully dependent on the primary key attributes. This adheres to the principles of the
second normal form, promoting data integrity and reducing redundancy in the relational database.
Dependencies & Logical Implication
- Dependencies refer to the relationship where Y is dependent on X, prompting us to explore the
logical consequences and other derived relationships.
We can utilize Armstrong's Axioms to further understand these dependencies:
-
Reflexivity: If Y is a subset of X, then X → Y, signifying the unique
identification of Y using X. This axiom also implies that X → X.
-
Augmentation: If X → Y holds and Z is a set of attributes, then ZX → ZY.
-
Transitivity: If X → Y holds and Y → Z, then X → Z holds.
Additional Axioms:
-
Additivity or Union: If X → Y and X → Z, then X → YZ holds.
-
Projectivity or Decomposition: If X → YX holds, then X → Y and X → Z also hold.
-
Pseudo-Transitivity: If X → Y and ZY → W hold, then XZ → W holds.
These axioms play a crucial role in determining closures, which, in turn, assist us in identifying keys,
particularly in the context of achieving Third Normal Form (3NF).
Closure of Functional Dependencies
- If we have a relation with some attributes and given dependencies, the closure of dependencies
signifies what we can derive from the provided dependencies. This process is facilitated by applying
the Armstrong Axioms, as discussed earlier.
- Closure is represented as (AB)+ = {} where all the derived dependencies are enclosed
within the curly brackets.
The closure, (AB)+, essentially represents the set of all attributes that can be functionally
determined by the attributes A and B, considering the given set of dependencies.
Q→ R = {X, Y, Z, H, D}
F.D = X → YZ, DX → W, Y → H then find closure F+ of FD's.
Solution:
Given Functional Dependencies (FD):
We need to find the closure F+ of these FDs.
- Applying X → YZ, we get X → Y and X → Z.
- Now, considering X → Y and Y → H, applying transitivity, we find X → H.
- No other dependencies can be derived from the given FDs.
Therefore, the closure F+ is: {X, Y, H}
Q→ R = {A, B, C, D, E}
F.D = A → BC, CD → E, B → D, E → A
Find F+
Solution:
Given Functional Dependencies (FD):
- A → BC
- CD → E
- B → D
- E → A
We need to find the closure F+ of these FDs.
- Starting with A → BC, we get A → B and A → C.
- Considering CD → E, no new attributes are added.
- Considering B → D, we add B → D.
- Considering E → A, we add E → A.
- No more attributes can be added. The closure F+ is: {A, B, C, D, E}
Q→ R = {A, B, C, D, E, F, G}
F.D = A → B, BC → DE, AEF → G, E → C, A → E
Solution:
Given Functional Dependencies (FD):
- A → B
- BC → DE
- AEF → G
- E → C
- A → E
We need to find the closure F+ of these FDs.
- Starting with A → B, we get A → B.
- Considering BC → DE, we add BC → D and BC → E.
- Considering AEF → G, we add AEF → G.
- Considering E → C, we add E → C.
- Considering A → E, we add A → E.
- No more attributes can be added. The closure F+ is: {A, B, C, D, E, F, G}
Finding the Candidate Key of a Relation by Closure
- Candidate key: It is the minimal set of attributes whose attribute closure includes all
attributes of the relation; this set is called the candidate key of the relation.
Example: R{A, B, C, D, E}
F.D = AB → C, CD → E, DE → B
Find the candidate key.
Solution:
According to the rules, we don't consider attributes on the right-hand side of the functional
dependencies. In this case, B, C, and E are ignored, leaving us with A and D.
Now, using the AD combination, we will create more combinations such as ADC, ABC, ADE. For each
combination, we will find the attribute closure.
- For {ADC}+ = {A, D, C}
= {A, C, D, E} as CD → E
= {A, B, C, D, E} as DE → B
So, Ck = {ADC}, and it is a candidate key.
- For {ADB}+ = {A, D, B}
= {A, B, C, D}
= {A, B, C, D, E}
So, Ck = {ADB} is also a candidate key.
- For {ADE}+ = {A, D, E}
= {A, B, D, E}
= {A, B, C, D, E}
So, Ck = {ADE} is yet another candidate key.
Convert Relation to 2NF
R(A, B, C, D)
A, B prime key attributes (denoted as Ck)
C, D non-prime key attributes
F.D = AB → D, B → C
In 2NF, all non-prime attributes should be fully functionally dependent on the prime key attributes.
Let's analyze the given relation:
- On the right side, we have AB, so we will find the closure (AB)+ = {A, B, C, D}.
As we are obtaining all the attributes, it qualifies as a candidate key.
- Now, the goal is to convert the relation into 2NF form.
- Observing that D is completely dependent on AB, but C is not completely dependent on AB; it is
only dependent on B. This implies that the relation is not in 2NF, and decomposition is
required.
- We will create two relations:
R1 (ABD), which is in 2NF as all non-prime attributes are fully functionally dependent on
the prime key.
R2 (B, C), where all non-prime attributes are dependent on the prime key according to the
2NF rule.
Third Normal Form (3NF)
-
A relation is in the third normal form (3NF) if it is in 2NF and non-primary key attributes are
non-transitively dependent upon primary key attributes.
-
Alternatively, a relation is in 3NF if it is in 2NF and has no transitive dependencies.
Example:
-
In this example, we have a relation named "Student" with "Rollno" as the primary key attribute and
"name," "semester," and "hostel" as non-prime attributes.
-
Notably, we can determine "name," "semester," and "hostel" using "rollno," and we can also find
"hostel" using "semester" directly.
-
However, the main issue arises when different hostels are allotted to different semesters, creating
a potential transitive dependency.
-
This becomes evident when attempting to update or delete information, leading to problems in
maintaining data integrity.
-
To overcome this problem, we can divide the original relation into two sub-relations:
- R1(Rollno, Name, Semester)
- R2(Semester, Hostel)
By doing this, we eliminate the transitive dependency and represent the data in 3NF. Each
sub-relation now independently reflects meaningful relationships and adheres to the principles of
the third normal form.
Properties of Normalization
Normalization is a crucial process in database design, aiming to minimize redundancy and dependency
issues within relation schemas. While the first three normal forms (1NF, 2NF, 3NF) focus on decomposing
relations based on functional dependencies, achieving a good relational schema requires considering
additional properties.
- Functional Dependency: In 1NF, 2NF, and 3NF, relations are decomposed to minimize
redundancy by organizing data based on functional dependencies. However, relying solely on these
normal forms is insufficient for ensuring an optimal relation schema.
- Additional Properties: To enhance the design, two supplementary properties must
also be satisfied during the decomposition of relations:
- Lossless Join Property: This property ensures that combining decomposed
relations through a natural join results in the original relation without losing any
information. It prevents the creation of spurious tuples and maintains data integrity.
- Dependency Preservation Property: This property ensures that functional
dependencies observed in the original relation are retained after decomposition. It is vital
for preserving the semantics of the data and preventing anomalies that could arise from lost
dependencies.
Lossless Join Property
Picture the Lossless Join Property as taking a big table, splitting it into smaller ones, and then
being able to merge them back together without losing any data. In the database world, this property
ensures that when we decompose a table into smaller tables, we can recreate the original table
without any missing information.
- No Missing Rows: Suppose we have a 'Employees' table split into 'Employee
Names' and 'Employee Departments.' The Lossless Join Property guarantees that combining
'Employee Names' and 'Employee Departments' brings us back to the full 'Employees' table, making
sure no employee details are left out.
- Essential in Normalization: It's a crucial rule in the process of
normalization. When we organize our database, we want to be certain that no data is lost as we
break it down and piece it back together.
Dependency Preservation Property
Now, think of the Dependency Preservation Property as ensuring that the relationships between
different parts of our data remain clear, even when we split them. It's about preserving the
dependencies.
- Preserve Relationships: Consider a 'Orders' table with 'Order ID' and 'Customer
Name.' If we decompose this into 'Order ID' and 'Customer Details,' Dependency Preservation
ensures that the connection between 'Order ID' and 'Customer Name' is maintained.
- Not Always Guaranteed: Unlike Lossless Join, Dependency Preservation isn't
always guaranteed. Sometimes, when we break down our tables, we might lose some connections
between pieces. It's like making sure the dependencies between tables stay intact - it doesn't
always happen perfectly.
Boyce Codd Normal Form (BCNF)
BCNF is a more refined version of the Third Normal Form (3NF) in database normalization. A relation is
said to be in BCNF if and only if all of its determinants are candidate keys.
- BCNF is essentially a stricter form of 3NF, emphasizing the elimination of certain types of
redundancy in a relational database.
- Consider a relation R(A, B, C) with the functional dependencies A → B and B → C. In BCNF,
both A and B must be candidate keys, ensuring that all determinants are keys.
- It's important to note that every relation in BCNF is also in 3NF, but not every relation in 3NF is
necessarily in BCNF.
Example:
Let's take a relation EmployeeSkills(EmployeeID, Skill, Proficiency) with the following functional
dependencies:
- EmployeeID → Skill (An employee is associated with a specific skill)
- Skill → Proficiency (The proficiency level of a skill is determined by the skill itself)
In BCNF, both EmployeeID and Skill must be candidate keys. This ensures that the determinants (EmployeeID
and Skill) are capable of uniquely identifying each tuple in the relation.
Multivalued Dependency (MVD)
Multivalued Dependency (MVD) is a concept in database normalization that deals with relationships
involving multiple values. It occurs when one attribute in a table uniquely determines another
attribute, independent of the other attributes in the table.
- In a relation R(A, B, C), if A determines both B and C independently, it forms a multivalued
dependency.
- MVDs help in identifying and addressing certain types of redundancies that may arise in a relational
database.
- Normalization through MVD is essential for achieving a more structured and efficient database
design.
Example:
Consider a relation StudentCourses(StudentID, Course, Instructor) with the following multivalued
dependency:
- StudentID → Course (A student can be enrolled in multiple courses)
- StudentID → Instructor (A student can have multiple instructors)
In this example, StudentID determines both the Course and the Instructor independently, representing a
multivalued dependency.
Fourth Normal Form (4NF)
Fourth Normal Form (4NF) is an advanced level of normalization that addresses certain complex scenarios
involving multivalued dependencies and composite keys.
- A relation is in 4NF if it is in BCNF and has no non-trivial multivalued dependencies.
- 4NF helps in eliminating redundant data and ensures a high degree of data integrity in a relational
database.
- It is particularly useful when dealing with relations where a combination of attributes determines
multiple independent sets of values.
Example:
Consider a relation EmployeeProjects(EmployeeID, Project, Responsibilities) with the multivalued
dependency:
- EmployeeID → Project (An employee can work on multiple projects)
- EmployeeID → Responsibilities (An employee can have different responsibilities on different
projects)
In 4NF, the relation should be in BCNF and free from non-trivial multivalued dependencies.
5NF
Fifth Normal Form (5NF)
Fifth Normal Form (5NF) is a further extension of normalization that addresses cases where information
can be derived from the combination of multiple multivalued dependencies.
- A relation is in 5NF if it is in 4NF and there are no non-trivial join dependencies.
- 5NF helps in ensuring data consistency and reducing the risk of anomalies that may arise due to
complex relationships in the database.
- It is applied in scenarios where a combination of attributes may determine the existence of related
data in other tables.
Example:
Consider a relation SupplierParts(SupplierID, Part, Location) with the multivalued dependencies:
- SupplierID → Part (A supplier can supply multiple parts)
- Part → Location (A part can be available at different locations)
In 5NF, the relation should be in 4NF, and there should be no non-trivial join dependencies, ensuring a
high level of data integrity and structure.