High Level Data Base Design & Normalization

Features of Relational Database Design

When a database designer is planning to design a database, they consider the process, key points, and methodologies involved in the construction of a relational database.

Relational Database design is based on four key parameters that serve as guidelines for the construction of a database.

1. Semantics of Attributes:

This aspect revolves around the meaning of each individual attribute within the database.
Guideline: Design a relation schema to facilitate easy explanation of its meaning; names should align with the data stored in them.
Do not combine attributes from multiple entity types to avoid confusion.

2. Reducing Redundant Information from Tuples:

Duplicate values should not exist in multiple tuples.
Guideline: Design the database to eliminate insertion, deletion, and updation anomalies.

3. Reducing Null Values in Attributes:

The value of an attribute should not be frequently null.
Guidelines: Avoid including attributes with frequent null values in the relation.

4. Disallowing the Possibility of Generating Spurious Tuples

This refers to avoiding the generation of spurious tuples, meaning undesired results when joining tables.
Guidelines: Design the relational schema to enable joining with equality conditions on common attributes.

Anomalies in Relational Database:-

Refers to undesirable problem occured while designing the database.
Basically we have 3 anomolies (problems):
1. Insertion anamoly
2. Deletion anamoly
3. Updation anamoly

Insertion Anomaly

An insertion anomaly occurs when there is a desire to insert information, but it cannot be added due to certain constraints or restrictions.

Example:

                    
+-------------------------------------------+
| EID |   Name  | Salary | DID | Department |
+-------------------------------------------+
|  1  | Paul    | 10,000 |  1  | A/C        |  
|  2  | John    | 6,000  |  1  | A/C        |  
|  3  | Sharav  | 7,000  |  3  | Sales      |
|  4  | Nike    | 6,000  |  2  | Marketing  |
|  5  | Hudson  | 10,000 |  2  | Marketing  |
|  6  | Smith   | 20,000 |  1  | A/C        |  
|  7  | Paula   | 50,000 |  2  | Marketing  |
|  8  | Maithli | 15,000 |  1  | A/C        |
|     |         |        |  5  | Finance    | 
+-------------------------------------------+

We can't add Finance and DID 5 due to EID constraints, as EID (primary key) cannot be left blank or null.

Deletion Anomaly

A deletion anomaly occurs when we remove some existing information from a database, and this action unintentionally leads to the deletion of other related but necessary data.
Example:

        
+-------------------------------------------+
| EID |   Name  | Salary | DID | Department |
+-------------------------------------------+
|  1  | Paul    | 10,000 |  1  | A/C        |  
|  2  | John    | 6,000  |  1  | A/C        |  
|  3  | Sharav  | 7,000  |  2  | Sales      |
+-------------------------------------------+

If we delete the tuple with EID 3, which is the only occurrence of the "Sales" department, we lose the definition of the "Sales" department entirely.

Updation Anomaly

An updation anomaly occurs when we attempt to update data in a way that introduces inconsistency into the dataset.
Example: Suppose we want to change the Department ID (DID) from 1 to 2 for the A/C department:

                        
+-------------------------------------------+
| EID |   Name  | Salary | DID | Department |
+-------------------------------------------+
|  1  | Paul    | 10,000 |  2  | A/C        |  
|  2  | John    | 6,000  |  1  | A/C        |  
|  3  | Sharav  | 7,000  |  3  | Sales      |
+-------------------------------------------+

This update would create inconsistency because now the Department A/C is associated with both DID 1 and DID 2, making it unreliable and confusing.
For accurate updates, one would need to change the Department ID for all instances of A/C, leading to a more complex and error-prone process.

Normalization: Introduction

Normalization is a systematic process aimed at decomposing or dividing a relational database into multiple relations to eliminate anomalies and improve its overall structure.

It is a step-by-step process, and each step is referred to as a normal form. Normalization is a reversible process.

Benefits of Normalization

Smaller tables with smaller rows:

Normalization results in breaking down large tables into smaller, more manageable ones, reducing redundancy and improving efficiency.

Searching, sorting & creating indexes are faster:

Normalized tables simplify data retrieval operations, leading to faster search, sort, and index creation processes.

Small number of null values & less redundant data:

Normalization aims to minimize null values and redundant data, ensuring a cleaner and more efficient database design.

Data modification anomalies are reduced:

Normalization helps mitigate issues such as insertion, update, and deletion anomalies, promoting data integrity.

Easier to maintain & change as the needs change:

A normalized database is more adaptable to changes in requirements, making it easier to maintain and modify over time.

Reduces data dependency:

Normalization minimizes data redundancy, leading to reduced dependency and ensuring that changes in one part of the database do not adversely affect other parts.

Pros and Cons of Normalization in Databases

Advantages

Reduces data redundancy, minimizing storage space.
Enhances overall organization of the database.
Promotes data consistency within the database.
Facilitates a more flexible database design.
Reduces the number of indexes per table, speeding up maintenance tasks.
Allows for selective table joins, improving query efficiency.
Enhances database security management.
Increases storage efficiency.
Speeds up data access.

Disadvantages

Database design must align with user requirements before development.
Normalization can consume significant CPU, memory, and I/O resources, affecting performance.
Requires more joins for desired results; poorly written queries can impact database performance.
Normalizing relations of a higher degree is time-consuming and challenging.
Careless decomposition may lead to poorly designed databases and serious issues.

Types of Normal Forms:

First Normal Form (1NF):

Focuses on eliminating duplicate data by ensuring that each attribute in a table contains only atomic (indivisible) values.

Second Normal Form (2NF):

Builds on 1NF by addressing partial dependencies. All non-prime attributes must be fully functionally dependent on the primary key.

Third Normal Form (3NF):

Further refines data integrity by removing transitive dependencies. No non-prime attribute should be transitively dependent on the primary key.

Boyce-Codd Normal Form (BCNF):

Aims to eliminate all non-trivial functional dependencies. Every non-trivial functional dependency is a superkey.

Fourth Normal Form (4NF) based on Multivalued Dependency:

Addresses situations where a table has multiple independent multi-valued attributes. It ensures data is organized efficiently.

Fifth Normal Form (5NF) based on Join Dependency:

Deals with scenarios where a table contains multiple overlapping candidate keys, and it aims to eliminate redundancy through the use of join dependencies.

The first four depend on the concept of functional dependency.

What are Functional Dependencies?

Functional dependencies result from the interrelationship between attributes or tuples in any relation.

They play a crucial role in normalization and reduce redundancy.
In relation R, where X and Y are two subsets of sets of attributes, Y is said to be functionally dependent on X if a given set of values for all attributes in X uniquely determines the values of all attributes in Y.
X → Y (Y depends on X or X determines Y)

Example:

                
+-----------+
|  X  |  Y  |
+-----------+
|  a  |  1  |
|  b  |  22 |
|  c  |  65 |
|  a  |  1  |
+-----------+

If X → Y, then for any two tuples (t1 and t2) where t1[X] = t2[X], it follows that t1[Y] = t2[Y]. For every unique value of X, we get a unique value from Y.
a → 1
X is the determinant, and Y is dependent.

Question: Is X → Y?

                
+-----------+
|  X  |  Y  |
+-----------+
|  a  |  2  |
|  b  |  2  |
|  a  |  2  |
|  b  |  2  |
|  c  |  2  |
+-----------+

For every 'a' in X, we get Y = 2, and the same holds for 'b'; however, 'c' won't affect anything.
a → 2, b → 2, c → 2

Another example:

                    
+---------------------------+
|  EID  |   Name  |  Salary |
+---------------------------+
|   1   | Aditya  |  15000  |
|   2   |  Manoj  |  16000  |
|   3   | Sandeep |  9000   |
|   4   |  Vikas  |  10000  |
|   5   |  Manoj  |  9000   |
+---------------------------+

Consider the functional dependency:

X(EID) → Y(Name, Salary)

Which means X(EID) → Y(Name)
X(EID) → Y(Salary)

Vice versa is not true. For instance, when finding the name of a person with a salary of 9000, we might get two people, but when searching for the name of the person with ID 2, we get only one person.

First Normal Form - 1NF

A relation is in 1NF if the domain of each attribute contains only atomic values.

Examples:

        
+-------------------------------------+
|  ID  |  Name  |  Age  |     Phone   |
+-------------------------------------+
|   1  |    x   |  32   |     666     |
|   2  |    y   |  65   |     777     |
|   3  |    z   |  42   |  888, 222   |
+-------------------------------------+

ID 3 has two phone numbers, and we shouldn't have multiple values in a single attribute. To convert this to 1NF form, we can repeat the row like this ↓

                    
+-------------------------------------+
|  ID  |  Name  |  Age  |     Phone   |
+-------------------------------------+
|   1  |    x   |  32   |     666     |
|   2  |    y   |  65   |     777     |
|   3  |    z   |  42   |     888     |
|   3  |    z   |  42   |     222     |
+-------------------------------------+

Now, each attribute contains only atomic values, adhering to the 1NF requirements.

We can't even have composite attributes as it violates the atomic property for 1NF. For example ↓

                    
+----------------------------------------+
|  ID  |  Name  |  Address               |
+----------------------------------------+
|   1  |    x   |  74A, Block 7, Delhi   |
|   2  |    y   |  664b, Block 8, Delhi  |
+----------------------------------------+

Converting it to 1NF form by distributing the address attribute into sub-attributes.

                    
+--------------------------------------------+
|  ID  |  Name  |  Street  |  Block | State  |
+--------------------------------------------+
|   1  |    x   |   74A    | Block 7 | Delhi |
|   2  |    y   |   664b   | Block 8 | Delhi |
+--------------------------------------------+

Working with derived attributes:

As derived attribute values can be easily calculated at runtime, they are removed in 1NF form.

                    
+-----------------------------------------+
|  ID  |  Name  |      DOJ       |  WExp  |
+-----------------------------------------+
|   1  |    x   |   16 May 2018  |   3    |
|   2  |    y   |   14 Aug 2019  |   2    |
+-----------------------------------------+

Converting to 1NF form ↓

                    
+-------------------------------+
|  ID  |  Name  |      DOJ      |
+-------------------------------+
|   1  |    x   |   16 May 2018 |
|   2  |    y   |   14 Aug 2019 |
+-------------------------------+

Prime and Non-Prime Attributes

Prime or Key Attribute

For a given relation R = {A1, A2, A3, A4, ..., An}, an attribute a is a prime attribute if A is part of any candidate key of R.
Alternatively, an attribute is considered prime if it is part of a key.

Non-Prime or Non-Key Attribute:

If an attribute is not part of any key, it is classified as a non-prime attribute.

Example:

Relation Schema:

Relation Schema: Student(Rollno, S-Name, S-Address, S-DOB, S-Fees, S-Course)
Now, let's determine which attributes can serve as candidate keys.
- Since Rollno uniquely identifies each student, it qualifies as a candidate key: Rollno → C_k.
- The combination [S-Name + S-Address] can also uniquely identify a student: [S-Name + S-Address] → C_k (This combination can also identify a unique value).
- Therefore, Rollno, S-Name, and S-Address are considered prime attributes as they are either individually or collectively part of a candidate key.
- On the other hand, S-DOB, S-Fees, and S-Course are classified as non-prime key attributes since they do not contribute to the identification of unique tuples.

Fully Functional Dependency:

In the context of database normalization, a fully functional dependency refers to the condition where a non-prime key attribute (denoted as A) is entirely dependent on all prime key attributes. In other words, the value of A is uniquely determined by the combination of values in all prime key attributes.

Example:

Consider a relation named "Student" with prime key attributes (rollno & games) and non-prime key attributes (grade, name, and fees).

For illustration purposes:

Name: This attribute can be uniquely identified using only the "rollno."
Fees: Similarly, the "fees" attribute can be uniquely identified using only the "games."
Grade: However, the "grade" attribute requires knowledge of both "rollno" and "games" to uniquely identify it. In this scenario, the grade is considered fully dependent on both prime key attributes (rollno and games).

This exemplifies a fully functional dependency, where certain non-prime key attributes rely on the entirety of the prime key for their uniqueness.

Partial Dependency

Partial Dependency occurs when you have more than one prime key attribute, and a non-prime key attribute is not dependent on all prime key attributes.

Example:

Consider a relation named "Student" with prime key attributes (rollno & games) and non-prime key attributes (grade, name, and fees).

Name: The "name" attribute, in this case, can be retrieved solely using the "rollno." Since it is not dependent on both prime key attributes (rollno and games), it is considered a partial dependency.
Fees: Similarly, the "fees" attribute can be uniquely identified using only the "games" attribute. This situation also represents a partial dependency since it does not depend on both prime key attributes.
Grade: Unlike the previous examples, the "grade" attribute requires knowledge of both "rollno" and "games" for unique identification. Therefore, it does not exhibit partial dependency and is fully dependent on the entire combination of prime key attributes.

This provides a more comprehensive view of partial dependency within the context of a relational database, emphasizing that certain non-prime key attributes depend on only a subset of the prime key attributes for their uniqueness.

Trivial and Non-trivial Dependency

Trivial Functional Dependency

If A → B and B ⊆ A, then this dependency is a Trivial Functional Dependency. In simpler terms, if the right-hand side (B) is a subset of the left-hand side (A), the dependency is considered trivial.
A → A and B → B are also categorized as Trivial Dependencies. These cases express that an attribute is functionally dependent on itself, which is inherently evident.
Example: Let's consider a relation where Rollno → Rollno and Rollno, S-name → S-Name. In the first case, Rollno → Rollno is trivial because it merely states that you can find the value of Rollno from itself, which is self-evident and doesn't provide new information. The second case, Rollno, S-name → S-Name, is also trivial as it essentially says that knowing both Rollno and S-name, you can find the value of S-Name. These examples highlight situations where the dependency is obvious and doesn't contribute additional insights.
Trivial dependencies are characterized by their predictability and lack of novelty, making them less informative in the context of database relationships.

Non-trivial Functional Dependency

If A → B and B is not a subset of A, then this dependency is a Non-trivial Functional Dependency. In other words, the right-hand side (B) provides new information beyond what is already known from the left-hand side (A).
Example: Consider a relation where Rollno → S-Name. This is a non-trivial dependency because knowing the Rollno uniquely determines the value of S-Name, and the relationship is not self-evident or redundant. The dependency introduces new information about the dataset.
Non-trivial dependencies are essential in database design as they contribute to the normalization process, helping to eliminate redundancy and ensure data integrity.
Unlike trivial dependencies, non-trivial dependencies play a crucial role in providing valuable insights and maintaining the accuracy of data relationships within a relational database.

Transitive and Non-transitive Dependency

Transitive Dependency

A transitive dependency occurs when there is an indirect relationship between two attributes through a third attribute. In other words, if A → B and B → C, then A → C is a transitive dependency.
Example: Consider a relation where EmployeeID → DepartmentID and DepartmentID → DepartmentName. Here, EmployeeID → DepartmentName is a transitive dependency, as the department name is indirectly determined by the employee ID through the relationship with the department ID.
Transitive dependencies can lead to data redundancy and are typically addressed through normalization techniques.

Non-transitive Dependency

A non-transitive dependency, on the other hand, exists when there is a direct relationship between two attributes without involving a third attribute.
Example: In a relation where StudentID → StudentName, this is a non-transitive dependency. The student name is directly determined by the student ID without the need for an intermediary attribute.
Non-transitive dependencies are desirable in database design as they help maintain simplicity and reduce the risk of data anomalies.
Identifying and eliminating transitive dependencies is a key step in the normalization process to ensure a well-structured and efficient relational database.

Second Normal Form (2NF)

A relation is in the second normal form (2NF) if it is in 1NF and all non-prime key attributes are fully functionally dependent upon the primary key attributes.

Example:

Consider a relation named "Student" with prime key attributes (rollno & games) and non-prime key attributes (grade, name, and fees).

Using the diagram, we can observe that the "name" is dependent on "rollno" only, "fees" is dependent on "games" only, and "grade" is dependent on both "rollno" and "games."
So, according to the definition of 2NF, we can create three tables:
1. (Rollno, Games, Grade)
2. (Rollno, Name)
3. (Games, Fees)

In the new structure, each table represents a distinct functional dependency, ensuring that non-prime key attributes are fully dependent on the primary key attributes. This adheres to the principles of the second normal form, promoting data integrity and reducing redundancy in the relational database.

Dependencies & Logical Implication

Dependencies refer to the relationship where Y is dependent on X, prompting us to explore the logical consequences and other derived relationships.

We can utilize Armstrong's Axioms to further understand these dependencies:

Reflexivity: If Y is a subset of X, then X → Y, signifying the unique identification of Y using X. This axiom also implies that X → X.
Augmentation: If X → Y holds and Z is a set of attributes, then ZX → ZY.
Transitivity: If X → Y holds and Y → Z, then X → Z holds.

Additional Axioms:

Additivity or Union: If X → Y and X → Z, then X → YZ holds.
Projectivity or Decomposition: If X → YX holds, then X → Y and X → Z also hold.
Pseudo-Transitivity: If X → Y and ZY → W hold, then XZ → W holds.

These axioms play a crucial role in determining closures, which, in turn, assist us in identifying keys, particularly in the context of achieving Third Normal Form (3NF).

Closure of Functional Dependencies

If we have a relation with some attributes and given dependencies, the closure of dependencies signifies what we can derive from the provided dependencies. This process is facilitated by applying the Armstrong Axioms, as discussed earlier.
Closure is represented as (AB)⁺ = {} where all the derived dependencies are enclosed within the curly brackets.

The closure, (AB)⁺, essentially represents the set of all attributes that can be functionally determined by the attributes A and B, considering the given set of dependencies.

Q→ R = {X, Y, Z, H, D}
F.D = X → YZ, DX → W, Y → H then find closure F⁺ of FD's.

Solution:
Given Functional Dependencies (FD):

X → YZ
DX → W
Y → H

We need to find the closure F⁺ of these FDs.

Applying X → YZ, we get X → Y and X → Z.
Now, considering X → Y and Y → H, applying transitivity, we find X → H.
No other dependencies can be derived from the given FDs.

Therefore, the closure F⁺ is: {X, Y, H}

Q→ R = {A, B, C, D, E}
F.D = A → BC, CD → E, B → D, E → A
Find F⁺

Solution:
Given Functional Dependencies (FD):

A → BC
CD → E
B → D
E → A

We need to find the closure F⁺ of these FDs.

Starting with A → BC, we get A → B and A → C.
Considering CD → E, no new attributes are added.
Considering B → D, we add B → D.
Considering E → A, we add E → A.
No more attributes can be added. The closure F⁺ is: {A, B, C, D, E}

Q→ R = {A, B, C, D, E, F, G}
F.D = A → B, BC → DE, AEF → G, E → C, A → E

Solution:
Given Functional Dependencies (FD):

A → B
BC → DE
AEF → G
E → C
A → E

We need to find the closure F⁺ of these FDs.

Starting with A → B, we get A → B.
Considering BC → DE, we add BC → D and BC → E.
Considering AEF → G, we add AEF → G.
Considering E → C, we add E → C.
Considering A → E, we add A → E.
No more attributes can be added. The closure F⁺ is: {A, B, C, D, E, F, G}

Finding the Candidate Key of a Relation by Closure

Candidate key: It is the minimal set of attributes whose attribute closure includes all attributes of the relation; this set is called the candidate key of the relation.

Example: R{A, B, C, D, E}
F.D = AB → C, CD → E, DE → B
Find the candidate key.

Solution:
According to the rules, we don't consider attributes on the right-hand side of the functional dependencies. In this case, B, C, and E are ignored, leaving us with A and D.
Now, using the AD combination, we will create more combinations such as ADC, ABC, ADE. For each combination, we will find the attribute closure.

For {ADC}₊ = {A, D, C}
= {A, C, D, E} as CD → E
= {A, B, C, D, E} as DE → B
So, C_k = {ADC}, and it is a candidate key.
For {ADB}₊ = {A, D, B}
= {A, B, C, D}
= {A, B, C, D, E}
So, C_k = {ADB} is also a candidate key.
For {ADE}₊ = {A, D, E}
= {A, B, D, E}
= {A, B, C, D, E}
So, C_k = {ADE} is yet another candidate key.

Convert Relation to 2NF

R(A, B, C, D)
A, B prime key attributes (denoted as C_k)
C, D non-prime key attributes
F.D = AB → D, B → C

In 2NF, all non-prime attributes should be fully functionally dependent on the prime key attributes. Let's analyze the given relation:

On the right side, we have AB, so we will find the closure (AB)⁺ = {A, B, C, D}.
As we are obtaining all the attributes, it qualifies as a candidate key.
Now, the goal is to convert the relation into 2NF form.
Observing that D is completely dependent on AB, but C is not completely dependent on AB; it is only dependent on B. This implies that the relation is not in 2NF, and decomposition is required.
We will create two relations:
R1 (ABD), which is in 2NF as all non-prime attributes are fully functionally dependent on the prime key.
R2 (B, C), where all non-prime attributes are dependent on the prime key according to the 2NF rule.

Third Normal Form (3NF)

A relation is in the third normal form (3NF) if it is in 2NF and non-primary key attributes are non-transitively dependent upon primary key attributes.
Alternatively, a relation is in 3NF if it is in 2NF and has no transitive dependencies.

Example:

In this example, we have a relation named "Student" with "Rollno" as the primary key attribute and "name," "semester," and "hostel" as non-prime attributes.
Notably, we can determine "name," "semester," and "hostel" using "rollno," and we can also find "hostel" using "semester" directly.
However, the main issue arises when different hostels are allotted to different semesters, creating a potential transitive dependency.
This becomes evident when attempting to update or delete information, leading to problems in maintaining data integrity.
To overcome this problem, we can divide the original relation into two sub-relations:
1. R1(Rollno, Name, Semester)
2. R2(Semester, Hostel)
By doing this, we eliminate the transitive dependency and represent the data in 3NF. Each sub-relation now independently reflects meaningful relationships and adheres to the principles of the third normal form.

Properties of Normalization

Normalization is a crucial process in database design, aiming to minimize redundancy and dependency issues within relation schemas. While the first three normal forms (1NF, 2NF, 3NF) focus on decomposing relations based on functional dependencies, achieving a good relational schema requires considering additional properties.

Functional Dependency: In 1NF, 2NF, and 3NF, relations are decomposed to minimize redundancy by organizing data based on functional dependencies. However, relying solely on these normal forms is insufficient for ensuring an optimal relation schema.
Additional Properties: To enhance the design, two supplementary properties must also be satisfied during the decomposition of relations:
1. Lossless Join Property: This property ensures that combining decomposed relations through a natural join results in the original relation without losing any information. It prevents the creation of spurious tuples and maintains data integrity.
2. Dependency Preservation Property: This property ensures that functional dependencies observed in the original relation are retained after decomposition. It is vital for preserving the semantics of the data and preventing anomalies that could arise from lost dependencies.

Lossless Join Property

Picture the Lossless Join Property as taking a big table, splitting it into smaller ones, and then being able to merge them back together without losing any data. In the database world, this property ensures that when we decompose a table into smaller tables, we can recreate the original table without any missing information.

No Missing Rows: Suppose we have a 'Employees' table split into 'Employee Names' and 'Employee Departments.' The Lossless Join Property guarantees that combining 'Employee Names' and 'Employee Departments' brings us back to the full 'Employees' table, making sure no employee details are left out.
Essential in Normalization: It's a crucial rule in the process of normalization. When we organize our database, we want to be certain that no data is lost as we break it down and piece it back together.

Dependency Preservation Property

Now, think of the Dependency Preservation Property as ensuring that the relationships between different parts of our data remain clear, even when we split them. It's about preserving the dependencies.

Preserve Relationships: Consider a 'Orders' table with 'Order ID' and 'Customer Name.' If we decompose this into 'Order ID' and 'Customer Details,' Dependency Preservation ensures that the connection between 'Order ID' and 'Customer Name' is maintained.
Not Always Guaranteed: Unlike Lossless Join, Dependency Preservation isn't always guaranteed. Sometimes, when we break down our tables, we might lose some connections between pieces. It's like making sure the dependencies between tables stay intact - it doesn't always happen perfectly.

Boyce Codd Normal Form (BCNF)

BCNF is a more refined version of the Third Normal Form (3NF) in database normalization. A relation is said to be in BCNF if and only if all of its determinants are candidate keys.

BCNF is essentially a stricter form of 3NF, emphasizing the elimination of certain types of redundancy in a relational database.
Consider a relation R(A, B, C) with the functional dependencies A → B and B → C. In BCNF, both A and B must be candidate keys, ensuring that all determinants are keys.
It's important to note that every relation in BCNF is also in 3NF, but not every relation in 3NF is necessarily in BCNF.

Example:

Let's take a relation EmployeeSkills(EmployeeID, Skill, Proficiency) with the following functional dependencies:

EmployeeID → Skill (An employee is associated with a specific skill)
Skill → Proficiency (The proficiency level of a skill is determined by the skill itself)

In BCNF, both EmployeeID and Skill must be candidate keys. This ensures that the determinants (EmployeeID and Skill) are capable of uniquely identifying each tuple in the relation.

Multivalued Dependency (MVD)

Multivalued Dependency (MVD) is a concept in database normalization that deals with relationships involving multiple values. It occurs when one attribute in a table uniquely determines another attribute, independent of the other attributes in the table.

In a relation R(A, B, C), if A determines both B and C independently, it forms a multivalued dependency.
MVDs help in identifying and addressing certain types of redundancies that may arise in a relational database.
Normalization through MVD is essential for achieving a more structured and efficient database design.

Example:

Consider a relation StudentCourses(StudentID, Course, Instructor) with the following multivalued dependency:

StudentID → Course (A student can be enrolled in multiple courses)
StudentID → Instructor (A student can have multiple instructors)

In this example, StudentID determines both the Course and the Instructor independently, representing a multivalued dependency.

Fourth Normal Form (4NF)

Fourth Normal Form (4NF) is an advanced level of normalization that addresses certain complex scenarios involving multivalued dependencies and composite keys.

A relation is in 4NF if it is in BCNF and has no non-trivial multivalued dependencies.
4NF helps in eliminating redundant data and ensures a high degree of data integrity in a relational database.
It is particularly useful when dealing with relations where a combination of attributes determines multiple independent sets of values.

Example:

Consider a relation EmployeeProjects(EmployeeID, Project, Responsibilities) with the multivalued dependency:

EmployeeID → Project (An employee can work on multiple projects)
EmployeeID → Responsibilities (An employee can have different responsibilities on different projects)

In 4NF, the relation should be in BCNF and free from non-trivial multivalued dependencies.

5NF

Fifth Normal Form (5NF)

Fifth Normal Form (5NF) is a further extension of normalization that addresses cases where information can be derived from the combination of multiple multivalued dependencies.

A relation is in 5NF if it is in 4NF and there are no non-trivial join dependencies.
5NF helps in ensuring data consistency and reducing the risk of anomalies that may arise due to complex relationships in the database.
It is applied in scenarios where a combination of attributes may determine the existence of related data in other tables.

Example:

Consider a relation SupplierParts(SupplierID, Part, Location) with the multivalued dependencies:

SupplierID → Part (A supplier can supply multiple parts)
Part → Location (A part can be available at different locations)

In 5NF, the relation should be in 4NF, and there should be no non-trivial join dependencies, ensuring a high level of data integrity and structure.