× back
← Previous Topic

Transaction and Concurrency Control

What is a Transaction?

Basic Database Access Operations

  1. Read_item(x)
  2. Write_item(x)

Example: Transaction to transfer 700 Rs from A/C A to B A/C.

  • read(A);
    A = A - 700;
    Write(A);
    read(B);
    B = B + 700;
    write(B);
  • Remember, a unit or part of a program in DBMS, is known as a transaction, and a similar concept is referred to as processes in an operating system.
  • For insuring integrity of data, database system maintain the 'ACID' properties.

ACID Properties of Transaction

To ensure the integrity of data, a database system is required to maintain the following properties:

  1. Atomicity: It means that either all operations within a transaction are completed, or none are. The transaction is treated as an indivisible unit of work, ensuring that either all changes are applied or none at all.
  2. Consistency: It means that all operations performed by the user are logically correct and executed completely. The database is transferred from one consistent state to another, maintaining data integrity throughout the transaction.
  3. Isolation: If transactions are executing concurrently (e.g., Ti and Tj), the operations of one transaction (Ti) should not interfere with the operations of another transaction (Tj). Each transaction should be isolated from the effects of other concurrently executing transactions.
  4. Durability: The committed transaction should persist in the database, and its changes must not be lost even in the face of system failures. Once a transaction is committed, its effects should be durable and survive system crashes or other failures.

Examples:

Atomicity: Consider a funds transfer transaction where money is deducted from one account and added to another. Atomicity ensures that either both these operations happen successfully or none at all, preventing scenarios where money is deducted but not added or vice versa.

Consistency: In a database managing student grades, consistency ensures that if a transaction involves updating grades, the database moves from one consistent state (e.g., all grades are within a valid range) to another consistent state after the transaction is executed.

Isolation: Imagine two transactions updating different sets of records simultaneously. Isolation ensures that the changes made by one transaction do not affect the ongoing operations or the final outcome of the other transaction.

Durability: After a user confirms a purchase in an e-commerce system, durability ensures that the record of that transaction persists in the database even if there's a system failure immediately after the confirmation, safeguarding against data loss.

Transaction States

Concurrency

Concurrency refers to the simultaneous execution of multiple transactions.

EXAMPLE:


Time  | T1  |  T2  | T3
------------------------- 
t1    |     |   R  |
t2    |  R  |      |
t3    |     |      | W
t4    |     |   R  |
t5    |     |   R  |
t6    |     |   W  |
t7    |  W  |      |
t8    |  R  |      |
t9    |     |      | R
    

Advantages of Concurrency

  1. Resource utilization: If one transaction is busy using the CPU, others can utilize I/O devices concurrently, leading to better resource utilization.
  2. Improved throughput: Concurrency enables more transactions to be completed or executed simultaneously, resulting in improved overall throughput.
  3. Reduced response time: With multiple transactions executing concurrently, the time needed for completing transactions is reduced, leading to quicker system responses.
  4. Reduced waiting time: Transactions have to wait less for acquiring resources, as the system allows multiple transactions to proceed simultaneously, reducing waiting times.

Why Concurrency Control is Needed?

Concurrency leads to several problems

  • Lost update problem
  • Dirty read problem
  • Uncommitted dependency problem
  • Unrepeatable read problem
  • Inconsistent analysis problem
  • Phantom read problem

Lost Update Problem

  • If any transaction Tj updates any variable 'A' at time 't' without knowing the value of 'A' at time 't', then this may lead to the lost update problem.

Example:

Consider two transactions, T1 and T2, where T1 reads the value of variable 'A' and T2 updates 'A' without considering the updated value from T1. This can result in a lost update problem as the update from T1 is overwritten by the update from T2.

Dirty Read Problem

  • Reading the data value of a variable before committing may lead to the 'dirty read problem'.

Example:

Consider two transactions, T1 and T2, where T1 updates a variable 'X', but before committing the update, T2 reads the intermediate and uncommitted value of 'X'. This premature read by T2 may result in inaccurate or inconsistent data, illustrating the dirty read problem.

Uncommitted Dependency Problem

  • This problem arises if any transaction Tj updates any variable 'V' and allows read or write operations on 'V' by any other transaction, but Tj is rolled back due to failure.

Example:

Imagine two transactions, T1 and T2, where T1 updates a variable 'V', and T2 reads or writes 'V' based on the update from T1. If T1 is rolled back due to a failure before committing, it creates an uncommitted dependency problem for T2. This is because T2 might have performed operations based on the temporary changes made by T1, which are now lost due to the rollback.

Unrepeatable Read Problem

  • It occurs when a transaction tries to read the value of a data item twice, and another transaction updates the same data item in between the two read operations of the first transaction.

Example:

Consider two transactions, T1 and T2, where T1 reads a data item twice. Meanwhile, T2 updates the same data item between the two read operations of T1. This creates an unrepeatable read problem for T1 because the second read yields a different value than the first read due to the update performed by T2.

Inconsistent Analysis Problem

  • This problem occurs when one transaction reads a variable before it is committed by another transaction.

Example:

Suppose there are two transactions, T1 and T2. T1 reads a variable, and before T1 commits its changes, T2 updates the same variable. This leads to the inconsistent analysis problem for T1 because the value it read initially is now different from the updated value made by T2.

Phantom Read Problem

  • This problem occurs when a transaction reads the same variable from the buffer, and when it reads it again later, it finds that it doesn't exist or has been modified.

Example:

Consider two transactions, T1 and T2. T1 reads a set of variables, and in between its two read operations, T2 inserts or modifies a record that affects the range of data read by T1. When T1 reads the same set of variables again, it encounters a phantom read problem as the data it initially read has changed due to the insertion or modification by T2.

Schedule

Example:


T1         |       T2
_____________________
R(A)       |
R(B)       |
A = A + B  |
W(A)       | 
           |  R(A)
           |  A = A - 500
           |  W(A)
    

In the above example:

  1. This schedule involves two transactions, T1 and T2, executing concurrently. T1 begins by reading the values of variables A and B, followed by an addition operation (A = A + B) and a write operation (W(A)). Meanwhile, T2 reads the value of A, subtracts 500 from it, and writes back the updated value to A.
  2. The schedule illustrates the interleaved execution of operations from both transactions, showcasing the concept of concurrency. It is crucial to note that the order of operations within each transaction is maintained, ensuring consistency in their individual sequences.
  3. Schedules play a key role in understanding the dynamic execution of transactions in a concurrent environment, helping maintain the integrity and consistency of the database system.

Schedule are of three types:

  1. Serial schdeule
  2. Non serial schedule
  3. Serializable schedule

Serial Schedule

  • In a serial schedule, one transaction is executed entirely before starting another transaction.
  • When the first transaction completes its cycles, only then does the next transaction start.

Example:

Consider two transactions, T1 and T2. In a serial schedule, if T2 is being executed first, it means that T2 will be completely executed before T1 begins its execution. The operations of T2, including reads, writes, and any computations, will finish before T1's operations commence.

Serial schedules ensure a strict sequential execution of transactions, eliminating any interleaving of operations between transactions. This guarantees that the effects of one transaction are visible to the database before the next transaction begins, maintaining a clear order of execution.

While serial schedules provide simplicity and ease of analysis, they may lead to reduced concurrency and potentially longer execution times when compared to more concurrent scheduling methods.

  • Note: For set of 'n' transaction 'n!' different valid schedules can be possible.
    • For 3 transaction T1 T2 T3 - 3! - 1 * 2 * 3 = 6
      T1 T2 T3, T2 T1 T3, T3 T1 T2, T1 T3 T2, T2 T3 T1, T3 T2 T1

Non-Serial Schedule

  • In a non-serial schedule, instructions of transactions execute concurrently, allowing for the interleaving of instructions between transactions.
  • Interleaving refers to the practice of transitioning from the execution of one transaction to another transaction within the schedule.
  • The possible number of non-serial schedules for multiple transactions can be significantly larger compared to serial schedules.
    • For example, consider three transactions:
      • T1 has 4 instructions
      • T2 has 2 instructions
      • T3 has 3 instructions
      Then the total number of non-serial schedules is calculated using permutations, considering the interleaving of instructions:
      Total Non-Serial Schedules = (2 + 3 + 4)! / (2! * 3! * 4!) = 1260
      Total Serial Schedules = 3! = 6
      Total Number of Non-Serial Schedules = 1260 - 6 = 1254

Serializable Schedule

  • A non-serial schedule is considered serializable if its result is equivalent to the result obtained from executing a serial schedule of the same transactions.

Example:

Let's consider a non-serial schedule involving two transactions, T1 and T2. The non-serial schedule interleaves instructions between T1 and T2 to execute concurrently. To determine if it is serializable, we compare its result with that of a corresponding serial schedule.


Non-Serial Schedule:
T1: R(A)
T2: R(B)
T1: A = A + B
T2: B = B * 2
T1: W(A)
T2: W(B)
                        

To check serializability, we compare this schedule with the serial schedule:


Serial Schedule:
T1: R(A)
T1: A = A + B
T1: W(A)
T2: R(B)
T2: B = B * 2
T2: W(B)
                        

    If the results of both schedules are the same, the non-serial schedule is serializable. If not, it indicates potential conflicts or dependencies that may need resolution for serializability.

    Serializable schedules are crucial for maintaining consistency in a concurrent database environment, ensuring that the final state is equivalent to the state achieved through a serialized execution of transactions.

Serializability & Testing of Serializability

Test for Serializability of a Schedule

A precedence graph is employed to test the serializability of a non-serial schedule.

  • Graph is represented by 'G'. G(V, E)
    • V: set of vertices (all transactions)
    • E: set of edges Ti → Tj
  • When creating edges from one transaction to another in the precedence graph, three conditions are considered:
    
    T1    ->     T2
    W(Q)         R(Q)
    R(Q)         W(Q)
    W(Q)         W(Q)
    R(Q)         R(Q) (In this condition, there will be no edge)
                    
  • When both transactions are reading the same variable, there will be no edge between them.
  • If the precedence graph contains no cycles, it indicates that the schedule S is a serializable schedule.

Example:


   T1    |     T2     |     T3
--------------------------------
   R(A)  |            |       
   R(C)  |            |       
   W(A)  |            |       
         |    R(B)    |       
   W(C)  |            |       
         |    R(A)    |       
         |            |   R(C)    
         |    W(B)    |       
         |            |   R(B)    
         |            |   W(C)    
         |    W(A)    |       
         |            |   W(B)     
                    

As there is no cycle formed in this graph that means it is serializable schedule.

Another example:


   T1    |     T2     |     T3
--------------------------------
   R(A)  |            |       
         |    R(B)    |       
         |            |   R(C)    
         |    W(B)    |       
         |            |   W(C)    
   W(A)  |            |       
         |    R(A)    |       
         |            |       
   R(C)  |            |       
         |    W(A)    |     
   W(C)  |            |       
         |            |   W(B)    
                    

If we have a non-serializable schedule and we aim to convert it into a serializable schedule, two methods can be employed:

  1. Conflict Serializability
  2. View Serializability

Conflict Serializability

  • A schedule is considered conflict serializable if, after swapping non-conflicting operations, it can be transformed into a serial schedule (conflict equivalent to a serial schedule).
  • Conflicting operations are those that may create interference when executed concurrently. For instance:
    
    T1   |   T2
    -----------
    W(A) | W(A)
    W(A) | R(A)
    R(A) | W(A)
                                
  • In the above example, when both T1 and T2 are writing to the same variable (A), it constitutes a conflicting operation. The conditions previously used to create edges in the precedence graph are now identified as conflicting operations.
  • Other operations not falling into these conflicting conditions are considered non-conflicting.

Example:


T1    |     T2     
---------------
R(A)  |              
W(A)  |          
      |    R(A)          
      |    W(A)      
R(B)  |               
W(B)  |              
      |    R(B)      
      |    W(B)                    
                    

In order to convert the given schedule into a conflict serializable form, we will identify conflicting operations and rearrange non-conflicting operations to achieve serializability. In the given example:


T1    |     T2     
----------------
R(A)  |              
W(A)  |          
      |    R(A)          
      |    W(A)      
R(B)  |               
W(B)  |              
      |    R(B)      
      |    W(B)                    
                    

The conflicting operations are W(A) in T1 and W(A) in T2, as well as W(B) in T1 and W(B) in T2. To make the schedule conflict serializable, we will rearrange non-conflicting operations:


T1    |     T2     
----------------
R(A)  |              
R(B)  |          
W(A)  |    R(A)          
W(B)  |    W(A)      
      |    R(B)      
      |    W(B)                    
                    

The modified schedule is now conflict serializable as it can be transformed into a serial schedule by rearranging conflicting operations. This ensures that the final result remains consistent with a serial execution of transactions.

View Serializability

  • A schedule is considered view serializable if it is view equivalent to a serial schedule.
  • If a schedule is conflict serializable, then it is also view serializable.
  • However, view serializable schedules that are not conflict serializable may contain blind writes.

Two schedules, s1 and s2, are view equivalent if they satisfy the following conditions:

  1. Initial Read: The initial read of both schedules is the same.
    
    S1       T1   |   T2          S2   T1   |    T2 
            ------+-------            ------+------- 
             R(A) |                         |  W(A)
                  |  W(A)              R(A) |
                
    In this example, R(A) is the initial read in Schedule 1, and in Schedule 2, R(A) is still the initial read.
  2. Update Read: In Schedule S1, if Ti is reading A, which is updated by Tj, then in S2 also.
    
    S1       T1  |  T2  | T3           S2   T1  |  T2  |  T3
           ------+------+-----             -----+------+------ 
            W(A) |      |                       | W(A) |
                 | W(A) |                  W(A) |      |
                 |      | R(A)                  |      | R(A)
                
  3. Final Write: The final write must be the same between both schedules.
    
    S1       T1  |  T2  | T3           S2   T1  |  T2  |  T3
           ------+------+-----             -----+------+------ 
            W(A) |      |                       | R(A) |
                 | R(A) |                  W(A) |      |
                 |      | W(A)                  |      | W(A)
                

Recoverability of Schedule

Example:

Consider the following schedule involving two transactions, T1 and T2:


    T1        |       T2
----------------------------------
   R(A)       |             
   W(A)       |             
              |      R(A)          
              |      W(A) (Updated by T1)
   W(B)       |             
    

In this example, T1 reads and writes to variable A, and T2 reads and writes to the same variable after T1. If a failure occurs after T2 reads A but before T2 writes A, the system needs to recover to a consistent state.

If the system can roll back the changes made by T1 (undo the write operation), it can ensure recoverability. In this case, the schedule is recoverable because the system can return to a consistent state despite the error or failure.

Cascading Rollback & Cascadeless Schedule

Cascading Rollback

  • Cascading rollback occurs when, due to a failure in one transaction, multiple transactions have to be rolled back.

Cascadeless Schedule

  • A cascadeless schedule is designed to avoid cascading rollbacks by ensuring that a transaction must commit before the data item it updates is read by another transaction.

Example:

Consider the following schedule involving two transactions, T1 and T2, where T2 reads a data item before T1 commits:


    T1        |       T2
----------------------------------
   W(A)       |             
   R(A)       |             
   R(B)       |             
              |      W(B)          
              |      R(A) (Reads uncommitted value due to T1 not committed)
    

In this example, T2 reads data item A before T1 commits its write operation on A. If T1 encounters a failure and rolls back, it will result in T2 having read an uncommitted value of A. This situation represents cascading rollback.

To achieve a cascadeless schedule, transactions should be arranged such that a transaction commits before the data it updates is read by another transaction, preventing cascading rollbacks.

Concurrency Control Techniques

Lock-Based Protocols

Granularity of Locks

  • The level or type of information that a lock protects is referred to as locking granularity. In other words, locking granularity defines the size of data items that the data manager locks.

Locking can occur at the following levels:

  1. Coarse Granularity: In this level, the database management system (DBMS) places locks on entities such as tables, data files, or the entire database.
    • It reduces the number of transactions that can be executed concurrently, leading to lower throughput and increased response time.
    • This approach results in lower concurrency due to the larger entities being locked.
  2. Fine Granularity: Locks are placed on smaller entities, such as records or fields within a table. It can also involve locking entire tables.
    • Although fine granularity introduces more lock overhead, it enhances concurrency by allowing multiple transactions to access different parts of the data simultaneously.
    • Higher concurrency leads to increased throughput and reduced response time.
  3. Intermediate Granularity: Locks are applied to pages, where a page is a fixed-size unit like 4KB, 8KB, or 16KB.
    • Pages: In the context of databases, a table may span several pages, and a page may contain several rows of a table or multiple tables.
    • This approach provides moderate concurrency and incurs moderate lock overhead.

Timestamp-Based Protocol

  • The timestamp-based protocol is employed to order conflicting transactions, where conflicting transactions are those that require access to the same data items for execution.
  • Each transaction Ti is assigned a timestamp or a unique number denoted by TS before it starts execution.

Deadlock

This topic falls under the umbrella of concurrency control.

Necessary Conditions for Deadlock:

These conditions indicate that our system is in a deadlock.

  1. Hold and Wait: Transactions hold resources while waiting for additional resources.
    • Example:
      Transaction T1 holds resource R1 and is waiting for resource R2, while Transaction T2 holds resource R3 and is waiting for resource R1. Both transactions are holding resources and waiting for additional resources, satisfying the hold and wait condition.
  2. Mutual Exclusion: Resources cannot be shared; only one transaction can use a resource at a time.
    • Example:
      Two transactions, T1 and T2, are competing for access to a printer resource. Due to the mutual exclusion condition, only one transaction can use the printer at a time, leading to potential conflicts and deadlock situations.
  3. No Preemption: Resources cannot be preempted or forcefully taken away from a transaction; they must be explicitly released.
    • Example:
      Transaction T1 holds resource R1, and Transaction T2 requests R1. If preemption were allowed, the system could forcefully take R1 from T1 and give it to T2. However, the no preemption condition states that this is not allowed, contributing to deadlock risk.
  4. Circular Wait: There is a circular chain of transactions, each holding a resource needed by the next transaction in the chain.
    • Example:
      Consider two transactions, T1 and T2, where T1 holds resource R1 and requests resource R2, while T2 holds resource R2 and requests resource R1. This scenario forms a circular wait and contributes to deadlock conditions.

Methods for Handling Deadlocks

Deadlocks, once identified, can be managed using different methods to prevent or address them. There are two primary methods for handling deadlocks:

  1. Deadlock Prevention:

    Preventing deadlocks involves employing strategies and protocols to eliminate one or more necessary conditions for deadlock. The focus is on structuring the system in a way that potential deadlocks cannot occur. Key strategies include:

    • Ensuring the Hold and Wait condition is not satisfied by requiring transactions to acquire all needed resources before initiating execution.
    • Addressing Mutual Exclusion by allowing resources to be shared among transactions.
    • Introducing Preemption, where if a transaction cannot acquire a resource, it releases all previously acquired resources and restarts the process.
    • Breaking Circular Wait by assigning a priority to resources and requiring transactions to request resources in increasing order of priority.
  2. Detect & Recovery:

    Deadlock detection involves periodically examining the system's state to identify if a deadlock has occurred. If a deadlock is detected, recovery strategies are implemented to resolve it. Common approaches include:

    • Rollback: Roll back one or more transactions to a previous consistent state, releasing the resources held by those transactions.
    • Resource Preemption: Forcefully reclaim resources from transactions to break the circular wait and resolve the deadlock.
    • Transaction Termination: Terminate one or more transactions involved in the deadlock to release their resources.

These methods provide different approaches for managing deadlocks, offering a balance between prevention measures and post-detection recovery strategies.

References