× back
Next Topic →

Introduction

Basic terms in DBMS

Data

  • Data is a collection of raw, unorganized facts and details like text, obervations, figures, symbols, and descriptions of things etc.
  • In other words, data does not carry specific purpose and has no significance by itself.
  • Moreover, data is measured in terms of bits and bytes - which are basic units of information in the context of computer storage and processing.
  • Data can be recovered and doesn't have any meaning unless processed.

Types of data

  1. Quantitative
    1. Numerical form
    2. Weight, volume, cost of an item
  2. Qualitative
    1. Descriptive, but not numerical
    2. Name, gender, hair color of a person

Information

  • Information is processed, organized and structured data.
  • It provides context of the data and enables decision making.
  • Processed data that make sense to us.
  • Information is extracted from the data by analyzing and interpreting pieces of data.
  • E.g., you have data of all the people living in your locality, its data, when you analyze and interpret the data and come to some conclusion that:
    1. There are 100 senior citizens
    2. The sex ratio is 1:1
    3. Newborn babies are 100
    These are information.

Data vs Information

  • Data is a collection of facts, while information puts those facts into context.
  • While data is raw and unorganized, information is organized.
  • Data points are individual and sometimes unrelated. Information maps out that data to provide a big-picture view of how it all fits together.
  • Data, on its own, is meaningless. When it’s analyzed and interpreted, it becomes meaningful information.
  • Data does not depend on information; however, information depends on data.
  • Data typically comes in the form of graphs, numbers, figures, or statistics. Information is typically presented through words, language, thoughts, and ideas.
  • Data isn’t sufficient for decision-making, but you can make decisions based on information.

What is Database?

What is DBMS?

DBMS is the combination of two words : DATABASE & MANAGEMENT SYSTEM

  • A database is a collection of related information stored, so that it is available to many users for different purpose.
  • Database management system is a collection of programs that enables users to create and maintain the database.
  • DBMS can also be defined as an interface between the Application program and the operating system to access and manipulate that database.
  • Database management system is a software which is used to manage the database. For example: MYSQL, Oracle, etc are a very popular commercial database which is used in different applications.

Characteristics and Applications of DBMS

Applications of DBMS

Advantages and Disadvantages of DBMS

Advantages

  • Data Independence: Data independence is the concept that separates the way data is stored from the way it is accessed and used. It ensures that changes made to the structure of the database (schema) do not affect the applications using the data. There are two types of data independence: logical and physical.
    • Logical data independence allows modifications to the logical structure of the database without impacting the applications. For instance, if a new attribute needs to be added to a table or a relationship between tables is altered, applications accessing the data won't need to be rewritten. This flexibility is vital for adapting to changing business requirements without disrupting the existing software.
    • Physical data independence allows changes in the physical storage details of the database without affecting how data is accessed. For instance, the DBMS can decide to store data on different storage devices, reorganize data files for efficiency, or even change the file format used for storage. These changes can be made transparently to applications, ensuring that their operations remain unaffected.
  • Efficient Data Access: Efficient data access is a cornerstone of a well-designed DBMS, ensuring that data retrieval and manipulation are performed optimally.
    • Indexing: Indexes are data structures that speed up data retrieval by providing a quick way to locate specific rows in a table based on certain columns. For example, a database can use a B-tree index to efficiently find records matching a certain value without scanning the entire table. It uses indexing, query optimization, and caching techniques to speed up data access. This is especially important when dealing with large datasets, as it reduces the time it takes to retrieve information.
    • Query Optimization: Query optimization is the process of determining the most efficient way to execute a query. The DBMS's query optimizer analyzes different execution plans and selects the one that minimizes the time and resources needed to fetch the required data.
    • Caching: Caching involves storing frequently accessed data in memory. This reduces the need to fetch data from slower storage devices, such as hard drives, every time it's requested. Caching mechanisms enhance performance by providing faster data access for common queries.
  • Data Integrity: Data integrity refers to the accuracy, consistency, and reliability of the data stored in the database. A DBMS enforces data integrity constraints, such as uniqueness, referential integrity, and data validation rules. This ensures that the data stored in the database remains reliable and accurate over time.
  • Concurrent Access to Data: In a multi-user environment, multiple users or applications might need to access the same data simultaneously. A DBMS manages concurrent access by providing mechanisms like locking and transaction management. This ensures that data remains consistent even when accessed by multiple users concurrently.
  • Ensures Data Recovery: A DBMS includes features for data backup and recovery. Regular backups and transaction logs enable recovery in case of hardware failures, system crashes, or human errors. This helps prevent data loss and ensures business continuity.
  • Data Security: Data security is crucial to protect sensitive and confidential information. A DBMS provides security mechanisms to control who can access the data and what operations they can perform on it. This includes user authentication, authorization, and encryption of data stored in the database.
  • Control database redundancy → It can control data redundancy because it stores all the data in one single database file and that recorded data is placed in the database.
    • Database redundancy refers to the situation where the same data is stored multiple times in a database. Redundancy can lead to various issues, such as increased storage requirements, data inconsistency (due to multiple copies of the same data being updated differently), and difficulties in maintaining data accuracy.
  • Data sharing → In DBMS the authorized users of an organization can share the data among multiple users.
  • Easily maintenance → It can be easily maintained due to the centralized nature of the database system.
  • Reduce time → It reduces development time and maintenance need.
  • Backup → It provides backup and recovery subsystems which create automatic backup of data from hardware and software failure and restores the data if required.
  • Multiple user interface → It provides different types of user interface like graphical user interfaces, application program interfaces.

Disadvantages of DBMS

  • Cost of hardware and software → It requires a high speed of data processor and large memory size to run DBMS software.
  • Size → It occupies a large space of disks and large memory to run them efficiently.
  • Complexity → Database system creates additional complexity and requirements.
  • Higher impact of failure → Failure highly impacts the database because in most of the organization all the data stored in a single database and if the database is damaged due to electric failure or database corruption then the data may be lost forever.

Database system vs File system

Disadvantages of file system

  • Data redundancy and inconsistency : In file systems, the same data may be stored in multiple files, leading to redundancy. This redundancy can result in inconsistencies when updates are made to the same data in different files, causing discrepancies and errors.
  • Difficulty in Accessing Data : Retrieving specific data from file systems can be challenging. Users need to know the exact location and structure of files, leading to complex code and slower data access times.
  • Data isolation : In file systems, data is scattered across various files and formats, making it hard to maintain consistency and integrity. Changes to data structures can lead to issues in multiple places.
  • Integrity problems : Maintaining data integrity (accuracy and consistency) is difficult in file systems since there is no centralized control over data. Unauthorized changes and errors can occur easily.
  • Atomicity problems : File systems lack mechanisms to ensure atomicity, which is the property that guarantees a series of operations are treated as a single, indivisible unit. If an operation fails midway, there's no way to ensure the database is left in a consistent state.
  • Concurrent access anomalies : In file systems, concurrent access to data by multiple users can lead to inconsistent outcomes. Race conditions and conflicts can arise, causing incorrect or unexpected results.
  • Security problems : File systems have limited security features. Unauthorized access to files and data can occur, leading to potential data breaches and privacy violations.

The above mentioned disadvantages of file system are advantages of Database system

Advantages of Database system

  • Data Redundancy and Inconsistency Resolution: DBMS minimizes redundancy through normalization, which eliminates data duplication. Centralized control ensures consistent updates, reducing inconsistencies.
  • Efficient Data Access: DBMS provides query optimization and indexing techniques, making data retrieval more efficient. Users can use a common query language to access data, simplifying the process.
  • Data Integration: DBMS integrates data in a centralized repository, enabling data isolation and easier management. Changes to data structures can be made systematically.
  • Data Integrity Enforcement: DBMS enforces data integrity through constraints, triggers, and validation rules. It ensures that data remains accurate and consistent.
  • Atomicity and Transaction Management: DBMS ensures atomicity by using transactions. If a transaction fails, the database can be rolled back to a consistent state before the transaction started.
  • Concurrency Control: DBMS manages concurrent access through locking mechanisms and isolation levels, avoiding conflicts and maintaining data integrity.
  • Enhanced Security: DBMS offers robust security features, including user authentication, access control, and encryption, safeguarding data from unauthorized access.

Types of Databases

Centralized Database

  • It is the type of database that stores data at a centralized database system.
  • It comforts the users to access the stored data from different location through several applications.
  • These applications contain the authentication process to let users access data securely.
  • An example of a centralized database can be central library that carries a central database of each library in a college/university.

Distributed Database

  • In distributed database, data is distributed among different database system of an organization.
  • These database system are connected via communication links such links help the end-uers to access the data easily.
  • It is divided into two subpart:
    • Distributed Database:
      1. Homogeneous Database
      2. Heterogeneous Database

Relational Database

  • It stores data in the form of rows (tuple) and columns (attributes) and together forms a table (relation).
  • A relational database uses SQL for storing manipulating, as well as maintianing the data.
  • Each table in the databse carries a key that makes the data unique from others.
  • Examples of relational databases are MYSQL, Microsoft SQL Server, Oracle, etc.

Non-SQL Database

  • Non SQL is a type of database that is used for storing a wide range of data sets.
  • It is not relational database as it stores data not only in tabular form but in several different ways.
  • It is also divided into four sub part:
    1. Key-value storage
    2. Document-oriented Database
    3. Graph Database
    4. Wide-column Store

Cloud Database

  • It is a type of database where data is stored in a virtual environment and is stored in a virtual environment and executes over the cloud computing platform.
  • It provides users with various cloud computing services (Saas, Paas, Laas, etc) for accessing the database.
  • Some example of cloud database such as amazon web services (AWS), Microsoft Azure, Google Cloud SQL etc.

Object Oriented Database

  • The type of database that uses the object-based data model approah for storing data in the database system.
  • The data is represented and stored as objects which are similar to the objects used in the object-oriented programming langugae.
                    
                                               Root
                                                |
                     ___________________________|________________________
                    |                                                   |
                 Level 1                                              Level 2
                    |                                                   |
    |-------------------------|                             |-----------------------| 
Level 1.1                 Level 1.2                     Level 2.1                Level 2.2
                    
                
  • Data get stored in the form of records that are connected via links.
  • Each child record in the tree will contain only one parent one the other hand, each parent record can have multiple child records.

Network Database

  • It is the database that typically follows the network data model.
  • This is the representation of data is in the form of nodes connected via links between them.
  • It allows each record to have multiple children and parent nodes to form a generalized graph structure.

What is RDBMS

Table/relation

  • Eveything in a relational database is stored in the form of relations.
  • The RDBMS database uses tables to store data.
  • A table is a collection of related data entries and contains rows and columns to store data.

Properties of Relational tables

  • Value are atomic.
  • Column value are of the same kind.
  • Each row is unique.
  • Each column has a unique name.
  • The sequence of rows is insignificance:
    • "The sequence of rows is insignificant" refers to the idea that the order in which rows are stored or retrieved from a table does not inherently carry any meaning or significance. In other words, the arrangement of rows in a table doesn't affect the correctness or interpretation of the data. This property is a fundamental characteristic of relational databases and is often referred to as "unordered" or "set-oriented" data storage.
  • The sequence of columns is insignificance:
    • "The sequence of columns is insignificant" means that the order in which columns are defined within a table does not impact the meaning or interpretation of the data stored in that table.

Row or record or tuple

  • A row of a table is also called a record or tuple.
  • Row contains the specific information of each entry in the table.
  • It is a horizontal entity in the table.

Properties of a row

  • No two tubles are identical to each other in all their entries.
  • All tuples of the relation have the same format and the same number of entries.
  • The order of the tuple is irrelevant.
  • They are identified by their content, not by their position.

Column or attributes or fields

  • A column is a vertial entity in the table which contains all information associated with a specific field in a table.

Properties of an attributes

  • Every attribute of a relation must have a name.
  • Null values are permitted for the attributes.
  • Default values can be specified for an attribute automatically inserted if no other value is specified for an attribute.
  • Attribute that uniquely identify each tuple of a relation are the primary key.

Data item/cells

  • The smallest unit of data in the table is the individual data item.
  • It is stored at the intersection of tuples and attributes.

Degree

  • The total number of attributes that comprise a relation is known as the degree of the table.

Cardinality

  • The total number of tuples at any one time in a relation is known as the table's cardinality.
  • The relation whose cardinality is zero is called an empty table.

Domain

  • The domain refers to the possible values each attribute can contain.
  • It can be specified using standard data types such as integers, floaing number etc.

Dr. E. F. Codd's Rules for RDBMS (not in syllabus)

12 rules ↓

  1. Information Rule:
    • All information in the database is to be represented in a tabular form.
    • Data is stored in tables consisting of rows and columns, ensuring a clear and structured representation.
  2. Guaranteed Access Rule:
    • Each data value in the database is guaranteed to be accessible using a combination of the table name, primary key value, and column name.
    • Provides a systematic and reliable way to retrieve specific data without ambiguity.
  3. Systematic Treatment of Null Values:
    • The DBMS must allow representation of missing or unknown data using a special value called "null."
    • Ensures consistent handling of missing information and prevents confusion between actual data and missing data.
  4. Dynamic Online Catalog Based on the Relational Model:
    • The structure description of the entire database must be stored in an online catalog, known as data dictionary, which can be accessed by authorized users.
    • Users can use the same query language to access the catalog which they use to access the database itself.
  5. Comprehensive Data Sublanguage Rule:
    • The DBMS must support a comprehensive language for defining, querying, and manipulating data.
    • SQL (Structured Query Language) is an example of such a language that provides a standardized way to interact with relational databases.
  6. View Updating Rule:
    • All views that are theoretically updatable must also be updatable by the system.
    • Guarantees that views created by users can be modified using the same commands as regular tables.
  7. High-Level Insert, Update and Delete:
    • The DBMS must support operations to insert, update, and delete data using high-level, declarative commands.
    • Users should be able to modify data without needing to specify the physical implementation details.
  8. Physical Data Independence:
    • The DBMS must maintain a clear separation between the logical and physical aspects of data storage.
    • Users and applications are shielded from changes in the physical storage structure, allowing for flexibility in adapting to hardware changes.
  9. Logical Data Independence:
    • Modifications to the logical structure (schema) of the database should not affect the existing applications.
    • Users and applications can evolve independently of changes in the database schema.
  10. Integrity Independence:
    • Constraints ensuring data integrity should be defined separately from application programs.
    • Data integrity rules are maintained by the DBMS, preventing accidental violations of constraints.
  11. Distribution Independence:
    • The distribution of data across multiple locations should be invisible to users.
    • Users interact with the database as if it were a centralized system, even when data is distributed across different nodes.
  12. Nonsubversion Rule:
    • The DBMS must not allow direct modification of data bypassing the defined integrity rules and constraints.
    • Ensures that the integrity of the data is maintained by enforcing constraints during all operations.

View of Data

Data Abstraction

  • As data in database are stored with very complex data structure. So when user comes and want to access any data, he won't be able to access data if he has to go through this data structure. So to simplify the interaction of user and database. DBMS hides some informations of user and database.
  • DBMS hides some informations which is not of user interest, this is called Data abstraction.
  • So developer hides complexity from user and show abstract view of data.

Data abstraction has three level of abstraction

  1. Physical level / Internal level
  2. Logical level / Conceptual level
  3. View level / External level

  • Physical level → This is the lowest level of data abstraction which describe how data is actually stored in the database. This level basically describe the data structure and access path / indexing used for accessing file.
  • Logical Level → The next level of abstraction describes what data are stored in the database and what relationship exist among those data.
  • View Level → In this level user only interact with database. The complexity remain hidden. User see data and there may be many views of one data like in the form of chart, graph.

Lets suppose we have customer information so at physical level these records [customer information] can be described as block of storage.
At the logical level these records can be described as fields and attributes along with their datatype and relationship among each other.
At view level user just interact with system with the help of GUI (Graphical User Interface) and enters the detail at the screen. (user is not aware of what and how data is stored)

Schema in DBMS

What is Schema?

  • Design of database is called the schema.
  • It is a basically skeleton structure that represents the logical view of the entire database.
  • It defines how data is organized and how the relationship among them are associated.
  • It formulates all the constraints that are to be applied on the data.
  • Database system has various schema.

Physical Database Schema

  • It describes how data will be stored in hard disk / secondary storage.
  • It describes the database design at physical level.
  • This schema related to the actual storage of data and its form of storage like files, indices.

Logical Database Schema

  • This schema defines all the logical constraints that need to be applied on the data stored.
  • It defines tables, views, and integrity constraint.
  • Defines relation between tables and keys applied.

View Schema

  • It descibe different views of database and some time also called sub schema.

Instance

  • The collection of information stored in the database at a particular moment is called an instance.

Architecture in DBMS

There are two types of architecture in DBMS.

  1. Application architecture
  2. Schema architecture

Application Architecture

  • The application architecture describes how software components are organized and interact to support the functionality of an application.
  • DBMS design depends on its architecture, which dictates how data is stored, accessed, and managed within a database management system.
  • The basic client/server architecture is used to handle a large number of PCs, web servers, database servers, and other components that are interconnected through networks.
    • Client-Server Model: In simple terms, the client-server model is a computing architecture where one system (the client) requests services or resources from another system (the server). The client makes requests, and the server provides the requested services, such as data retrieval or processing.
  • DBMS architecture depends on how users are connected to the database to fulfill their requests and manage data effectively.
  • Types of Application Architecture
    1. 1-tier architecture: In this architecture, the entire application runs on a single machine, with no distinction between the client and server.
    2. 2-tier architecture: This architecture separates the client interface from the database server, allowing for better scalability and manageability.
    3. 3-tier architecture: In a 3-tier architecture, there are three distinct layers: the presentation layer (client), the application logic layer (middle-tier), and the data storage layer (database server). This approach offers enhanced flexibility and scalability.
  • Database architecture can be categorized as single-tier or multi-tier. However, from a logical standpoint, it typically falls into two main categories: 2-tier and 3-tier architectures, as described above.

1-Tier Architecture

  • In 1-tier architecture, the database is directly available to the user. This means that users can interact directly with the Database Management System (DBMS) without any intermediary layers.
  • Any changes made in this architecture are directly reflected in the database itself. It does not typically provide user-friendly tools or interfaces for end-users to interact with the data.
  • 1-tier architecture is commonly employed for local application development, where programmers have direct access to the database for quick response and efficient development. It is suitable for applications with limited complexity and user interaction requirements.
  • Advantages of 1-Tier Architecture:
    • Minimal latency: Direct access to the database results in minimal communication overhead, ensuring fast data retrieval and processing.
    • Complete control: Developers have full control over the database and can implement custom solutions and optimizations.
    • Simplicity: Well-suited for small-scale applications or prototypes where simplicity and speed are prioritized.
  • Disadvantages of 1-Tier Architecture:
    • Lack of scalability: It is not suitable for applications that require scalability to accommodate a large number of users or complex data processing.
    • Security concerns: Direct database access can pose security risks if not properly secured and controlled.
    • Limited user-friendliness: End-users may find it challenging to interact directly with the database without user-friendly interfaces.

2-Tier Architecture

  • 2-tier architecture is akin to the basic client-server model. In this architecture, client-side applications can directly communicate with the database located on the server side. To facilitate this interaction, APIs such as ODBC (Open Database Connectivity) and JDBC (Java Database Connectivity) are commonly used.
  • User interfaces and application programs run on the client side, making it the user's primary interaction point with the system.
  • The server side of the architecture is responsible for providing essential functionalities such as query processing and transaction management. It handles the execution of database operations requested by the client, ensuring data integrity and security.
  • Advantages of 2-Tier Architecture:
    • Efficient data access: Direct communication between the client and the database server minimizes data access latency, resulting in faster response times.
    • Simple architecture: It offers a straightforward and easy-to-understand design, making it suitable for applications with moderate complexity.
    • Controlled client environment: Developers have control over the client-side environment, allowing for custom user interfaces and application logic.
  • Disadvantages of 2-Tier Architecture:
    • Limited scalability: 2-tier architecture may struggle to handle a large number of concurrent users and complex data processing requirements, making it less suitable for enterprise-scale applications.
    • Security challenges: Direct client-server communication can introduce security vulnerabilities if not properly secured and monitored.
    • Difficulties in maintenance: Changes or updates to the application often require client-side modifications, which can be challenging to manage, especially in a distributed environment.

3-Tier Architecture

  • 3-tier architecture introduces an additional layer between the client and server, creating a more structured and scalable system.
  • In this architecture, the client cannot directly communicate with the database server. Instead, the application on the client-end interacts with an application server, which, in turn, communicates with the database systems.
  • 3-tier architecture is commonly employed for building large-scale web applications where scalability, maintainability, and security are paramount.
  • Key Components of 3-Tier Architecture:
    • Presentation Layer (Client): This layer is responsible for the user interface and interactions. It displays information to users and collects their input.
    • Application Layer (Middle Tier): The middle tier, often referred to as the application server, contains the application's business logic. It processes user requests, performs necessary computations, and manages application functionality.
    • Data Layer (Server): The server-side layer houses the database systems responsible for data storage, retrieval, and management. It ensures data integrity and security.
  • Advantages of 3-Tier Architecture:
    • Scalability: 3-tier architecture can easily scale to accommodate a large number of users and increased data processing demands by adding more application servers or database servers.
    • Maintainability: Separation of concerns between layers simplifies maintenance and updates. Changes to the user interface or business logic can be made without affecting the entire system.
    • Security: By restricting direct access to the database and centralizing data management, 3-tier architecture enhances security and access control.
  • Disadvantages of 3-Tier Architecture:
    • Complexity: The additional layer and intercommunication can introduce complexity, increasing development and maintenance efforts.
    • Latency: The extra layer can introduce some communication overhead, potentially leading to slightly higher response times compared to 2-tier architecture.
    • Cost: Implementing and maintaining a 3-tier architecture can involve higher initial costs due to the infrastructure required for multiple layers.

Schema Architecture

  • The overall design of the database is referred to as the database schema.

Three Schema Architecture of DBMS

  • The three-schema architecture, also known as ANSI/SPARC (American National Standards Institute, Standards Planning and Requirement Committee) architecture or three-level architecture, provides a structured approach to managing databases.
  • It is used to separate user applications and the physical database, facilitating efficient data management and access.

Internal Level/Internal View

  • The internal level includes the internal schema, which describes the physical storage structure of the database.
  • The internal schema is also known as the physical schema.
  • It defines how data will be stored within database blocks and provides a detailed description of the low-level data structures used.
  • The physical level is crucial for understanding the database's storage mechanisms and optimization.

Conceptual Level/Logical Level

  • The conceptual schema describes the database's design at the conceptual level, which is also referred to as the logical level.
  • This schema defines the overall structure of the entire database, focusing on data relationships, entities, and attributes.
  • At the conceptual level, implementation details, such as the physical data structure, are abstracted and hidden.
  • Database administrators and programmers typically work at this level to design and manage the database's logical structure.

External Level/View Level

  • At the external level, a database contains multiple schemas, sometimes referred to as subschemas. These subschemas describe different views of the database.
  • Each view schema defines a specific data subset that a particular user group is interested in, hiding the remaining database elements from that user group.
  • The view schema is responsible for describing how end-users interact with the database system, presenting a tailored perspective of the data.

Data Independence

Physical Data Independence

Logical Data Independence

Different Types of Database Users in DBMS

1. Database Administrator (DBA)

  • The Database Administrator, known as the DBA, holds complete control over the database. DBAs are the guardians of database performance and reliability. They make critical decisions regarding the database and provide essential technical support.
  • Responsibilities:
    • Database installation, configuration, and maintenance
    • Security management and access control
    • Backup and recovery planning
    • Performance tuning and optimization
    • Schema design and modification

2. System Analysts and Application Programmers

  • System Analysts: These individuals act as data detectives. They access the database by crafting specific queries, often with specific conditions, to analyze and retrieve valuable insights from the data.
  • Application Programmers: These are the software sorcerers who breathe life into applications. Armed with knowledge of programming languages like C, C++, Java, and SQL, they create user interfaces and ensure seamless interaction with the database.
  • Responsibilities:
    • System analysis and requirements gathering
    • Query optimization for efficient data retrieval
    • Development and maintenance of database-driven applications
    • Integration of data with software systems

3. Database Designer

  • Database Designers are the architects of the data world. They define how data is structured, stored, and presented within the database. Their designs determine the flow and organization of information.
  • Responsibilities:
    • Database schema design and normalization
    • Table and index creation
    • Data modeling and documentation
    • Performance optimization through proper design
    • Collaboration with developers and analysts

4. End Users

  • End users are the individuals who interact directly with the database to perform their tasks. They come in various flavors:
    1. Native Users: These users may have limited knowledge about databases. They access the database through user-friendly menus and interfaces without delving into technical details.
    2. Sophisticated Users: These users have a basic understanding of databases. They use user queries to retrieve specific data and work with it.
    3. Specialized Users: These are the database ninjas. They write specialized queries, often in scripting languages like shell or batch, to interact with the database in unique and powerful ways.
  • End User Roles:
    • Data entry and retrieval
    • Report generation and analysis
    • Decision-making based on database insights
    • Interfacing with applications
    • Collaborating with other users for data sharing

Data Model

E-R Model Concepts

For example ⇒ Suppose we design a school database. In this database, the student will be an entity with attribute like address, name, id, age etc. ↓

Component of ER Diagram

Entity

  • An entity is a fundamental concept in the Entity-Relationship (E-R) model and represents a specific thing or object in the real world. Each entity is distinguishable from all other objects, meaning it has a unique identity.
  • Entities are used to model and describe real-world objects, such as people, places, things, or concepts, that are relevant to the domain being represented in a database.
  • Anything about which we store information, track data, or perform operations can be considered an entity in the context of database design and management.
  • Entities are characterized by their attributes, which are properties or characteristics that define and differentiate one entity from another.
  • In E-R diagrams, entities are typically represented by rectangles, and their names are displayed inside the rectangles.

Entity Set

  • An entity set is a collection or grouping of entities that share the same characteristics or attributes. These entities belong to the same category or type within a particular domain.
  • Entity sets are used to organize and categorize entities in a database. For example, in a database for a university, you might have an entity set for "Students" to group all individual student entities.
  • Each entity set is defined by a common set of attributes that are applicable to all entities within that set. These attributes collectively describe the properties of the entities in the set.
  • Entity sets help simplify database design and management by allowing you to work with groups of entities that have similar characteristics and behaviors.
  • In E-R diagrams, entity sets are typically represented by rectangles similar to individual entities but often labeled with the plural form of the entity name to indicate that it represents a collection of entities.
  • Entity sets are essential for establishing relationships between entities, as they define the types of entities that can participate in various relationships.

Types of Entity Sets

There are two primary types of entity sets in the Entity-Relationship (E-R) model, each with its own characteristics and roles in database design:

  1. Weak Entity Set → A weak entity set is an entity set that depends on another entity, known as the "owning" or "parent" entity, for its existence and uniqueness. Key features of weak entity sets include:
    • Dependency: A weak entity set does not have a unique key attribute of its own, meaning it cannot be uniquely identified by its attributes alone.
    • Representation: In E-R diagrams, a weak entity set is typically represented by a double rectangle, distinguishing it from strong entity sets.
    • Example: Consider a database for tracking library books. The "Copy" entity may be a weak entity set because it depends on the "Book" entity for its identity, and copies of different books might have the same attributes (e.g., copy number).
  2. Strong Entity Set → A strong entity set, in contrast, is an entity set that contains sufficient attributes to uniquely identify each of its entities. Key features of strong entity sets include:
    • Uniqueness: Each entity within a strong entity set can be uniquely identified by its own attributes, typically with the help of a primary key attribute or a combination of attributes.
    • Representation: In E-R diagrams, a strong entity set is usually represented by a single rectangle, indicating its self-sufficiency in terms of identity.
    • Example: In a database for a university, the "Student" entity is typically a strong entity set because each student can be uniquely identified by a student ID or a combination of attributes such as name and date of birth.

Attributes in ER Model

  • An attribute is used to describe a property of an entity.
  • An entity set may contain many number of attributes.
  • Attributes are represented in an elliptical shape.

For example → ID, age, phone number, name, etc., can be attributes of a student.

Types of Attributes

  1. Simple Attribute → An attribute that cannot be further subdivided into components is a simple attribute.
    • It is represented by an ellipse.
  2. Composite Attribute → An attribute that can be split into components is a composite attribute.
    • The composite attribute is represented by an ellipse, and those ellipses are connected with an ellipse.
  3. Key Attribute → The key attribute is used to represent the main characteristics of an entity.
    • It represents a primary key.
    • The key attribute is represented by an ellipse with an underline.
  4. Multivalued Attribute → An attribute that can have more than one value is known as a multivalued attribute.
    • Double ellipses are used to represent multivalued attributes.
    • For example → a student can have more than one phone number.
  5. Derived Attribute → An attribute that can be derived from other attributes is known as a derived attribute.
    • It can be represented by a dashed ellipse.
    • For example → a person's age changes over time and can be derived from another attribute like date of birth.

Relationship / Mapping Constraints

  • A relationship is used to describe the connection between entities.
  • A diamond or rhombus shape is used to represent the relationship in E-R diagrams.
  • Mapping cardinalities, or cardinality ratios, express the number of entities to which another entity can be associated via a relationship set.

There are 4 types of mapping constraints or relationships:

  1. One-to-One Relationship → When only one instance of an entity is associated with the relationship, it is known as a one-to-one relationship.
  2. One-to-Many Relationship → When one instance of the entity on the left is associated with more than one instance of an entity on the right through the relationship, it is known as a one-to-many relationship.
    • For example → a scientist can invent many inventions, but each invention is attributed to a specific scientist.
  3. Many-to-One Relationship → When more than one instance of the entity on the left is associated with only one instance of an entity on the right through the relationship, it is known as a many-to-one relationship.
    • For example → students enroll in only one course, but a course can have many students.
  4. Many-to-Many Relationship → When more than one instance of the entity on the left and more than one instance of an entity on the right are associated with the relationship, it is known as a many-to-many relationship.
    • For example → employees can be assigned to many projects, and each project can have many employees working on it.

Notation of E-R Diagram

  • Database can be represented using the relations. In ER diagram many notations are used to express the cardinality.
  • These notations are as follows ↓

Construct an E-R Diagram for a hospital with a set of patients and a set of medical doctors.

Converting ER Diagram to Relational Database Schema

Converting an Entity-Relationship (ER) Diagram into a Relational Database Schema involves several rules and algorithmic steps:

1. Strong Entity Set With Only Simple Attributes

  • If you have a strong entity set with only simple attributes, it will require only one table in the relational model.
  • The attributes of the table will be derived from the attributes of the entity set.
  • The primary key of the table will be the key attribute of the entity set.

2. Strong Entity Set With Composite Attributes

  • If you have a strong entity set with composite attributes, it will still require only one table in the relational model.
  • During conversion, consider the simple attributes within the composite attributes, not the composite attributes themselves.

3. Strong Entity Set With Multi-Valued Attributes

  • If you have a strong entity set with multi-valued attributes, it will require two tables in the relational model:
    • One table will contain all the simple attributes along with the primary key.
    • The other table will contain the primary key and all the multi-valued attributes.

4. Translating Relationship Set into a Table

  • A relationship set will necessitate one table in the relational model.
  • The attributes of the table will include:
    • The primary key attributes of the participating entity sets.
    • Any descriptive attributes specific to the relationship set.
  • A combination of non-descriptive attributes will form the primary key.
  • For a given ER diagram, you may need three tables in the relational model, such as:
    1. One table for the entity set "Employee."
    2. One table for the entity set "Department."
    3. One table for the relationship set "works in."

Extended ER Diagram

EER diagram is introduced because there are some limitations of ER diagram to overcome them we use EER diagram.

Limitations of Entity-Relationship (ER) Diagrams

Entity-Relationship (ER) diagrams are valuable tools for visualizing the structure and relationships within a database. However, like any modeling technique, they have certain limitations that can affect their ability to represent complex scenarios accurately. Below are some key limitations of ER diagrams:

  • ER diagrams face challenges when it comes to representing class and subclass entities:
    • Inheritance Hierarchy: ER diagrams may struggle to depict the hierarchical relationship between a parent class and its subclasses effectively. While the parent class contains general attributes and relationships, subclasses inherit properties from the parent but also have their unique attributes. Capturing this inheritance hierarchy can be complex.
    • Subclass to Subclass Relationship: When there is a relationship between two subclasses, ER diagrams may not provide a straightforward way to express this relationship. This limitation can make it challenging to model scenarios where subclasses interact with each other.
  • Additionally, ER diagrams cannot directly represent relationships between relationships:
    • Inter-Relationship Relationships: ER diagrams primarily focus on entities and their relationships. They are less equipped to illustrate how relationships between entities relate to each other. For example, expressing relationships like "a customer's interaction history with an order" can be challenging without additional notation or documentation.

So, in summary, the main limitations of ER diagrams can be categorized into two key areas:

  1. Difficulty in Representing Class and Subclass Hierarchies: ER diagrams may struggle to clearly depict inheritance hierarchies and relationships between subclasses.
  2. Limited Support for Relationships Between Relationships: ER diagrams are primarily designed for entity-level relationships and may require additional techniques to represent complex interactions between these relationships.

Despite these limitations, ER diagrams remain a valuable tool for database design, offering a visual and intuitive way to represent a wide range of database structures.

Overcoming the Relationship between relation problem ↓

One way to overcome the challenge of representing complex relationships between relations (tables) in entity-relationship diagrams (ER diagrams) is through the use of the "Aggregation" concept.

Aggregation involves grouping multiple entities that have a single relationship, and we assign a new name to this grouping of entities that share this relationship.

Overcoming the Challenge of Class and Subclass Using Extended ER Diagrams

  • We have 2 terms here:
    1. Generalization
    2. specialization
  • Generalization: Generalization is the process of extracting common properties from a set of entities to create a more generalized entity.
    • Generalization follows a "bottom-up approach" where two or more entities with shared attributes are combined to form a higher-level entity, often referred to as the superclass.
    • The primary purpose of generalization is to emphasize the similarities among lower-level entity sets while abstracting away differences in the schema.
  • For example, consider three sub-entities: Car, Bus, and Motorcycle. These three entities can be generalized into a higher-level entity (superclass) named "Vehicle," which captures common attributes and relationships shared by all vehicle types.

"IS A" Relationship Symbol in EER Diagrams

The symbol of a triangle with "IS A" written on it is used in Extended Entity-Relationship (EER) diagrams to represent the concept of "IS-A" relationships, also known as "Generalization-Specialization" relationships.

Symbol Details:

  • Triangle Symbol: The triangle is typically drawn as an arrowhead pointing from a subclass (the specific entity or subtype) towards a superclass (the more general entity or supertype). This arrowhead signifies the "IS-A" relationship, indicating that the subclass "IS A" type of the superclass.
  • "IS A": The "IS A" label written inside the triangle further emphasizes the nature of the relationship. It signifies that instances of the subclass are instances of the superclass. In other words, the subclass inherits properties (attributes and relationships) from the superclass and can be treated as an instance of both the subclass and superclass.

Example:

Let's consider an example to illustrate the use of the triangle symbol with "IS A":

  • Suppose you have a superclass called "Vehicle."
  • You also have three subclasses: "Car," "Motorcycle," and "Bus."
  • You would draw a triangle arrowhead pointing from each of these subclasses (Car, Motorcycle, Bus) towards the "Vehicle" superclass.

This notation indicates that each of these subclasses "IS A" type of "Vehicle." It means that a Car "IS A" Vehicle, a Motorcycle "IS A" Vehicle, and a Bus "IS A" Vehicle. The subclasses inherit common attributes and relationships from the superclass while having their own unique properties.

In EER diagrams, "IS-A" relationships are essential for modeling class and subclass hierarchies, providing a clear way to represent inheritance and specialization in a database schema.

  • Specialization: Specialization is the opposite of Generalization.
  • In Specialization, an entity is divided into sub-entities based on their distinguishing characteristics.
  • Specialization follows a "Top-down approach" where a higher-level entity is specialized into two or more lower-level entities.
  • It is used to identify subsets of an entity set that share specific distinguishing characteristics.
  • Specialization can be applied repeatedly to refine the schema's design.
  • Specialization is often depicted by a triangle component labeled "IS A."

Example of Specialization:

  • The "Vehicle" entity can be specialized into sub-entities like "Car," "Truck," or "Motorcycle," each with its unique attributes and characteristics.
  • Typically, the superclass is defined first, followed by the subclass and their related attributes. Then, relationship sets are added later in the schema design process.

Relational Model

Hierarchical Model

Symbols used in diagram ↓

Some Restrictions ↓

  • If a parent has one child → one-to-one relation
  • If a parent has many children → one-to-many relation
  • However, we can't define a one-to-many relationship from child to parent.
    • Many-to-many relationships are not possible in a hierarchical model.

How to Overcome These Restrictions?

  • We can overcome these restrictions by introducing repetitions of entities.

Network Model

Symbol used ↓

  • The network model uses a tree structure, making many-to-many representation not possible. To overcome this limitation, we employ the "common set type" method.
  • CODASYL Database Model: CODASYL, also known as the Network Model, is a database model used to organize and structure data in a network-like format. In this model, data is organized into records and sets, and relationships between data elements are represented graphically. It was a popular database model in the early days of computing.
  • CODASYL Conference: CODASYL is the name of a conference that focuses on data system languages and database management. This conference has played a significant role in shaping the field of data management and has been a platform for discussing and developing data-related standards and technologies.

DBMS Languages

  • A Database Management System (DBMS) includes appropriate languages and interfaces to express database queries and updates.
  • These languages can be used to read, store, and update data in the database.
  • Types of Database Language:
    1. DDL (Data Definition Language): Used to define the database structure or schema.
    2. DML (Data Manipulation Language): Used for querying and manipulating data in the database.
    3. DCL (Data Control Language): Manages access control and permissions on the database.
    4. TCL (Transaction Control Language): Used to manage transactions within the database.

DDL (Data Definition Language)

  • DDL is used to define the database structure or schema.
  • Usage: It consists of a set of SQL commands used to create, modify, and delete the database structure (e.g., tables, indexes, constraints) but not the data.
  • Users: These commands are typically not used by general users accessing the database via an application. They are used by database administrators (DBAs), database designers, or immediate application developers.
  • Data Dictionary: DDL updates a special set of tables known as the data dictionary or data directory, which stores metadata about the database structure.

List of DDL Tasks:

  • CREATE: Used to create objects in the database (e.g., tables, indexes).
  • ALTER: Used to modify the structure of existing database objects.
  • DROP: Used to delete objects from the database.
  • TRUNCATE: Used to remove all records from a table, including all allocated space for the records.
  • COMMENT: Used to add comments to the data dictionary or metadata about objects.
  • RENAME: Used to rename an object in the database.

DML (Data Manipulation Language)

  • DML is a set of SQL commands used to select, modify, and delete data in a database, as opposed to the database structure itself.
  • Usage: DML is primarily used for accessing and manipulating data within a database. It handles user requests to interact with the data.
  • Scope: DML statements are specifically used to manage data within schema objects.

List of DML Tasks:

  • SELECT: Retrieves data from a database, allowing users to query and retrieve specific information.
  • INSERT: Adds new data into a table, creating new records within the database.
  • UPDATE: Modifies existing data within a table, allowing users to change the values of specific records.
  • DELETE: Removes records from a table, effectively deleting data from the database.
  • MERGE: Performs an "UPSERT" operation, which means it either inserts new data or updates existing data, based on specified conditions.
  • CALL: Used to invoke a structured query language (SQL) or a Java subprogram, enabling the execution of predefined procedures or functions.
  • LOCK TABLE: Manages data concurrency by controlling access to specific database tables, helping prevent conflicts during simultaneous data access.

DCL (Data Control Language)

  • Data Control Language (DCL) is used to control privileges in databases, regulating who can access and manipulate data and the database itself.
  • Usage: DCL comprises SQL statements that control access to data and the database as a whole.
  • Privileges: In order to perform various operations in the database, such as creating tables, sequences, or views, users need appropriate privileges.

Types of Privilege:

  1. SYSTEM Privilege: This type of privilege covers actions like creating a session, tables, and other system-level operations.
  2. OBJECT Privilege: Object privileges pertain to commands or queries that involve working with specific database tables or objects.

List of DCL Tasks:

  • GRANT: Grants users access privileges to a database or specific database objects, allowing them to perform authorized operations.
  • REVOKE: Revokes previously granted permissions from users, restricting their access to certain database resources.

Note: In addition to the mentioned query languages, there is also DQL (Data Query Language), which primarily consists of one task:

  • SELECT: DQL's main function is to retrieve and query data from a database, allowing users to specify what data they want to retrieve and how it should be presented.

Previous Year Questions

Q- Fill in the blanks / True-False:
1- Weak entity type is represented by __________ box in ER diagram.
2- A relational database consists of a collection of _________
3- Database __________ which is the logical design of the database, and the database, ___________ which is a snapshot of the data in the database at a given instant in time.
4- Multi valued attribute is ___________.
5- Create command comes under DDL category.          (True/False)
6- Database can reduce the redundancy up to 100%. (True/False)
7- Create command comes under DML category. (True/False)
8- One table can have manimum two primary key. (True/False)
9- SQL is a programming language. (True/False)
10- Database is _______________.
11- Weak entity type is represented by _______ box in an ER diagram.
12- A relational database consists of a collection of __________.
13- Database ___________ which is the logical design of the database, and the database, _____________ which is snapshot of the data in the database at a given instant in time.

Q- Define Database management system. Give an discuss few applications of DBMS.

Q- Attempt any five parts:
1- Key constraints
2- DML vs. DDL
3- Physical and logical data independence
4- Generalization
5- Schema and Instances
6- Main function of DBA?
7- Data Independence
8- Define foreign key. What is the concept used for this?
9- Applications of database system.

Q- Define Database and Database Management System. Describe the advantages and disadvantages of using of DBMS.

Q- What are the different types of database end users? Describe the responsiblities of the DBA and the database designer.

Q- Write difference between the following:
(i) Logical vs. Physical Data Independence
(ii) File System vs. DBMS Approach
(iii) Composite key vs. Composite Attribute

Q- Define the following with respect to an E-R diagram. Explaoin the manner in which each is mapped to a table. Illustrate with an example.
(i) Relationship set
(ii) Aggregation
(iii) Multivalued attribute

Q- Draw an E-R diagram for banking system.

Q- What is a relation? Differentiate between a relational schema and a relational instance. Define cardinality and degree of relation.

Q- Define database management system. Given and discuss few applications of DBMS.

Q- Draw a neat, clean and labeled structure of database.

Q- What is data model? Explain various types of data models studied in brief.

Q- Define Data modelling? What is its role in database design? Explain the types of modelling.

Write the importance of ER diagram in database and explain all ERD symbols. Draw ER diagram for the following scenario:
university needs to maintain the information about its affilicated colleges and courses offered in each college as detail given below. College: college_code, college_name, address (consisting of street, area and city) Courses offered : course_code, course_name, course_credit. Student: student's number, birth_date, gender, address, phone number, courses_taken.
Draw ER diagram for the above situation and identify the possible relationship between entities.

Explain the 3-layer architecture of DBMS along with the main components. Also differentiate between Entity Integrity and Referential Integrity?

Q- Discuss the term "Mapping Cardinalities" in detail. Use appropriate diagrams in support of your answer.

Q- Draw an ER schema diagram for a COMPANY enterprise which is having Employee, Department, and Project as entities. Make necessary assumptions wherever required to show different attributes in your diagram.

Q- Explain the following Extended ER model features:
(i) Specialization
(ii) Aggregation

Q- Differentiate between database system vs. file system.

Q- Write appropriate steps of an algorithm used to convert ER schema to Relational database schema.

Q- Define ER model. Also describe different types of attributes that can be used in an ER diagram.

Q- What do you understand by the term "Data models". Explain.

Q- Draw a neat and labeled diagram of Database system structure that shows the various components of a database system.

Q- Explain the term "View of data". Also explain the various data abstraction levels.

Q- Discuss about all the types of notation used in ER diagram. What are the limitations of ER diagram?

Q- Draw an ER diagram for a University enterprise which is having the various courses, departments, students and teachers. Make necessary assumptions whereever required.

Q- Explain the concept of super key, candidate key and primary key.

Q- What do you mean by constaints? Explain all the types of constraints with suitable example.

Q- Explain the term specialization, generalization and aggregation.

Q- Explain schema architecture in detail and explain how data independence is achieved through it?

Q- What is sub class and super class in EER model? Explain the attribute inheritance with an example.

Q- A university registrar's office maintains data about the following entities:
(i) courses, including number, title, credits, syllabus and prequisites.
(ii) course offerings, including course number, year, semester, section number, instructor(s), timings and classroom.
students, including student-id, name and program.
instructors, including identification number, name, department and title.
Further, the enrollment of students in courses and grades awarded to students in each course they are enrolled for must be appropriately modelled. Construct and E-R diagram for the registrar's office. Document all assumptions that you make about the mapping constraints.

Q- Explain DDL and DML and enlist DDL and DML commands.

Reference ↓