Transaction and Concurrency Control

What is a Transaction?

Basic Database Access Operations

  1. Read_item(x)
  2. Write_item(x)

Example: A transaction that transfers Rs 700 from account A to account B.

  • read(A);
    A = A - 700;
    write(A);
    read(B);
    B = B + 700;
    write(B);
  • Remember that a transaction is a unit of program execution in a DBMS; the analogous concept in an operating system is a process.
  • To ensure the integrity of data, the database system maintains the 'ACID' properties. A concrete sketch of the transfer above follows.
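
As a minimal illustration, here is a Python sketch of the transfer executed as a single atomic transaction using the standard sqlite3 module; the accounts table and its contents are assumptions made only for this example.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 1000), ("B", 500)])
    conn.commit()

    try:
        # read(A); A = A - 700; write(A)
        conn.execute("UPDATE accounts SET balance = balance - 700 WHERE name = 'A'")
        # read(B); B = B + 700; write(B)
        conn.execute("UPDATE accounts SET balance = balance + 700 WHERE name = 'B'")
        conn.commit()        # both updates become permanent together
    except Exception:
        conn.rollback()      # on any failure, neither update is applied

    print(dict(conn.execute("SELECT name, balance FROM accounts")))   # {'A': 300, 'B': 1200}

If anything fails between the two updates, the rollback leaves both balances unchanged, which is exactly the all-or-nothing behaviour the ACID properties below describe.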

ACID Properties of Transaction

To ensure the integrity of data, a database system is required to maintain the following properties:

  1. Atomicity: It means that either all operations within a transaction are completed, or none are. The transaction is treated as an indivisible unit of work, ensuring that either all changes are applied or none at all.
  2. Consistency: It means that all operations performed by the user are logically correct and executed completely. The database is transferred from one consistent state to another, maintaining data integrity throughout the transaction.
  3. Isolation: If transactions are executing concurrently (e.g., Ti and Tj), the operations of one transaction (Ti) should not interfere with the operations of another transaction (Tj). Each transaction should be isolated from the effects of other concurrently executing transactions.
  4. Durability: The committed transaction should persist in the database, and its changes must not be lost even in the face of system failures. Once a transaction is committed, its effects should be durable and survive system crashes or other failures.

Examples:

Atomicity: Consider a funds transfer transaction where money is deducted from one account and added to another. Atomicity ensures that either both these operations happen successfully or none at all, preventing scenarios where money is deducted but not added or vice versa.

Consistency: In a database managing student grades, consistency ensures that if a transaction involves updating grades, the database moves from one consistent state (e.g., all grades are within a valid range) to another consistent state after the transaction is executed.

Isolation: Imagine two transactions updating different sets of records simultaneously. Isolation ensures that the changes made by one transaction do not affect the ongoing operations or the final outcome of the other transaction.

Durability: After a user confirms a purchase in an e-commerce system, durability ensures that the record of that transaction persists in the database even if there's a system failure immediately after the confirmation, safeguarding against data loss.

Transaction States

During its lifetime, a transaction passes through the states active, partially committed, committed, failed, and aborted.

Concurrency

Concurrency refers to the simultaneous execution of multiple transactions.

Example:

    
Time  | T1  |  T2  | T3
------------------------- 
t1    |     |   R  |
t2    |  R  |      |
t3    |     |      | W
t4    |     |   R  |
t5    |     |   R  |
t6    |     |   W  |
t7    |  W  |      |
t8    |  R  |      |
t9    |     |      | R
    

Advantages of Concurrency

  1. Resource utilization: If one transaction is busy using the CPU, others can utilize I/O devices concurrently, leading to better resource utilization.
  2. Improved throughput: Concurrency enables more transactions to be completed or executed simultaneously, resulting in improved overall throughput.
  3. Reduced response time: With multiple transactions executing concurrently, the time needed for completing transactions is reduced, leading to quicker system responses.
  4. Reduced waiting time: Transactions have to wait less for acquiring resources, as the system allows multiple transactions to proceed simultaneously, reducing waiting times.

Why Concurrency Control is Needed?

Uncontrolled concurrency can lead to several problems:

  • Lost update problem
  • Dirty read problem
  • Uncommitted dependency problem
  • Unrepeatable read problem
  • Inconsistent analysis problem
  • Phantom read problem

Lost Update Problem

  • If two transactions read the same data item 'A' and then both update it based on the value they read, the later write overwrites the earlier one, and that update is lost. This is the lost update problem.

Example:

Consider two transactions, T1 and T2, that both read the value of 'A' and then write their own updated values back. T2 writes 'A' without taking T1's update into account, so T1's update is overwritten and lost.
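
The following small Python sketch simulates this interleaving with plain variables (the balance values are made up for illustration):

    # Shared data item 'A'
    balance = 1000

    t1_local = balance      # T1: read(A)
    t2_local = balance      # T2: read(A) -- reads the same old value
    t1_local -= 100         # T1: A = A - 100
    t2_local += 50          # T2: A = A + 50
    balance = t1_local      # T1: write(A) -> 900
    balance = t2_local      # T2: write(A) -> 1050, overwriting T1's result

    print(balance)          # 1050, but a serial execution would give 950: T1's update is lost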

Dirty Read Problem

  • Reading a value written by a transaction that has not yet committed may lead to the 'dirty read' problem.

Example:

Consider two transactions, T1 and T2, where T1 updates a variable 'X', but before committing the update, T2 reads the intermediate and uncommitted value of 'X'. This premature read by T2 may result in inaccurate or inconsistent data, illustrating the dirty read problem.
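
A tiny Python simulation of the same situation (the values are illustrative only):

    committed_x = 100                       # last committed value of X
    t1_uncommitted_x = committed_x + 50     # T1: write(X), not yet committed
    t2_value = t1_uncommitted_x             # T2: read(X) -> 150, a dirty read
    t1_uncommitted_x = None                 # T1 fails and rolls back its update

    print(committed_x, t2_value)            # 100 150 -> T2 acted on a value that was never committed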

Uncommitted Dependency Problem

  • This problem arises when a transaction Tj updates a data item 'V' and another transaction reads or writes 'V' before Tj commits; if Tj is then rolled back due to a failure, the other transaction has depended on a value that no longer exists.

Example:

Imagine two transactions, T1 and T2, where T1 updates a variable 'V', and T2 reads or writes 'V' based on the update from T1. If T1 is rolled back due to a failure before committing, it creates an uncommitted dependency problem for T2. This is because T2 might have performed operations based on the temporary changes made by T1, which are now lost due to the rollback.

Unrepeatable Read Problem

  • It occurs when a transaction tries to read the value of a data item twice, and another transaction updates the same data item in between the two read operations of the first transaction.

Example:

Consider two transactions, T1 and T2, where T1 reads a data item twice. Meanwhile, T2 updates the same data item between the two read operations of T1. This creates an unrepeatable read problem for T1 because the second read yields a different value than the first read due to the update performed by T2.
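
The same interleaving, sketched with plain Python variables (values are illustrative):

    x = 100
    t1_first_read = x        # T1: first read(X)
    x = 200                  # T2: write(X) and commit in between
    t1_second_read = x       # T1: second read(X)

    print(t1_first_read, t1_second_read)   # 100 200 -> T1's two reads do not match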

Inconsistent Analysis Problem

  • This problem occurs when a transaction reads several data items in order to compute an aggregate (such as a sum or an average) while another transaction updates some of those items in between, so the analysis mixes old and new values.

Example:

Suppose T1 is summing the balances of accounts A and B while T2 transfers money from A to B and commits in between T1's two reads. T1's total then combines the old value of A with the new value of B, a result that corresponds to no consistent state of the database.
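
A small Python sketch of this classic situation, where T1 sums two balances while T2 performs a transfer (the amounts are illustrative):

    accounts = {"A": 500, "B": 500}          # true total is always 1000

    total = accounts["A"]                    # T1: read(A) for its sum -> 500
    accounts["A"] -= 100                     # T2: transfer 100 from A to B
    accounts["B"] += 100                     #     and commit
    total += accounts["B"]                   # T1: read(B) -> 600 and finish the sum

    print(total)                             # 1100 -- a total that never existed in any consistent state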

Phantom Read Problem

  • This problem occurs when a transaction reads a set of rows satisfying some condition and, on reading the same set again, finds that rows have appeared or disappeared because another transaction inserted or deleted matching rows in between.

Example:

Consider two transactions, T1 and T2. T1 reads a set of variables, and in between its two read operations, T2 inserts or modifies a record that affects the range of data read by T1. When T1 reads the same set of variables again, it encounters a phantom read problem as the data it initially read has changed due to the insertion or modification by T2.
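
A minimal sketch of a phantom row appearing between two range reads (the data is illustrative):

    rows = [("Alice", 500)]                          # committed table contents

    first = [r for r in rows if r[1] > 100]          # T1: first read of the range "balance > 100"
    rows.append(("Bob", 300))                        # T2: inserts a matching row and commits
    second = [r for r in rows if r[1] > 100]         # T1: second read of the same range

    print(len(first), len(second))                   # 1 2 -> a phantom row has appeared for T1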

Schedule

A schedule is an ordering of the operations of a set of transactions in which the order of operations within each individual transaction is preserved.

Example:

    
T1         |       T2
_____________________
R(A)       |
R(B)       |
A = A + B  |
W(A)       | 
           |  R(A)
           |  A = A - 500
           |  W(A)
    

In the above example:

  1. This schedule involves two transactions, T1 and T2, executing concurrently. T1 begins by reading the values of variables A and B, followed by an addition operation (A = A + B) and a write operation (W(A)). Meanwhile, T2 reads the value of A, subtracts 500 from it, and writes back the updated value to A.
  2. The schedule illustrates the interleaved execution of operations from both transactions, showcasing the concept of concurrency. It is crucial to note that the order of operations within each transaction is maintained, ensuring consistency in their individual sequences.
  3. Schedules play a key role in understanding the dynamic execution of transactions in a concurrent environment, helping maintain the integrity and consistency of the database system.

Schedules are of three types:

  1. Serial schedule
  2. Non-serial schedule
  3. Serializable schedule

Serial Schedule

  • In a serial schedule, one transaction is executed entirely before starting another transaction.
  • When the first transaction completes its entire execution, only then does the next transaction start.

Example:

Consider two transactions, T1 and T2. In a serial schedule, if T2 is being executed first, it means that T2 will be completely executed before T1 begins its execution. The operations of T2, including reads, writes, and any computations, will finish before T1's operations commence.

Serial schedules ensure a strict sequential execution of transactions, eliminating any interleaving of operations between transactions. This guarantees that the effects of one transaction are visible to the database before the next transaction begins, maintaining a clear order of execution.

While serial schedules provide simplicity and ease of analysis, they may lead to reduced concurrency and potentially longer execution times when compared to more concurrent scheduling methods.

  • Note: For a set of 'n' transactions, 'n!' different serial schedules are possible.
    • For 3 transactions T1, T2, T3: 3! = 1 * 2 * 3 = 6
      T1 T2 T3, T2 T1 T3, T3 T1 T2, T1 T3 T2, T2 T3 T1, T3 T2 T1

Non-Serial Schedule

  • In a non-serial schedule, instructions of transactions execute concurrently, allowing for the interleaving of instructions between transactions.
  • Interleaving refers to the practice of transitioning from the execution of one transaction to another transaction within the schedule.
  • The possible number of non-serial schedules for multiple transactions can be significantly larger compared to serial schedules.
    • For example, consider three transactions:
      • T1 has 4 instructions
      • T2 has 2 instructions
      • T3 has 3 instructions
      Then the total number of possible schedules (all interleavings) is given by the multinomial coefficient, as verified in the sketch after this list:
      Total Schedules = (2 + 3 + 4)! / (2! * 3! * 4!) = 1260
      Total Serial Schedules = 3! = 6
      Total Non-Serial Schedules = 1260 - 6 = 1254
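
These counts can be checked with a few lines of Python:

    from math import factorial

    total = factorial(2 + 3 + 4) // (factorial(2) * factorial(3) * factorial(4))
    serial = factorial(3)

    print(total, serial, total - serial)   # 1260 6 1254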

Serializable Schedule

  • A non-serial schedule is considered serializable if its result is equivalent to the result obtained from executing a serial schedule of the same transactions.

Example:

Let's consider a non-serial schedule involving two transactions, T1 and T2. The non-serial schedule interleaves instructions between T1 and T2 to execute concurrently. To determine if it is serializable, we compare its result with that of a corresponding serial schedule.

                        
Non-Serial Schedule:
T1: R(A)
T2: R(B)
T1: A = A + B
T2: B = B * 2
T1: W(A)
T2: W(B)
                        
                    

To check serializability, we compare this schedule with the serial schedule:

                        
Serial Schedule:
T1: R(A)
T1: A = A + B
T1: W(A)
T2: R(B)
T2: B = B * 2
T2: W(B)
                        
                    

If the results of both schedules are the same, the non-serial schedule is serializable. If not, there are conflicts or dependencies that must be resolved before the schedule can be considered serializable.

Serializable schedules are crucial for maintaining consistency in a concurrent database environment, ensuring that the final state is equivalent to the state achieved through a serialized execution of transactions.
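
The comparison can be sketched directly in Python by replaying both schedules on the same starting values (chosen arbitrarily here) and checking that the final results agree:

    def non_serial(a, b):
        t1_a = a              # T1: R(A)
        t2_b = b              # T2: R(B)
        t1_a = t1_a + b       # T1: A = A + B
        t2_b = t2_b * 2       # T2: B = B * 2
        a = t1_a              # T1: W(A)
        b = t2_b              # T2: W(B)
        return a, b

    def serial(a, b):
        a = a + b             # T1: R(A); A = A + B; W(A)
        b = b * 2             # T2: R(B); B = B * 2; W(B)
        return a, b

    print(non_serial(100, 50))   # (150, 100)
    print(serial(100, 50))       # (150, 100) -> same result, so the schedule is serializable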

Serializability & Testing of Serializability

Test for Serializability of a Schedule

A precedence graph is employed to test the serializability of a non-serial schedule.

  • The precedence graph is denoted G(V, E), where:
    • V: set of vertices (all transactions)
    • E: set of edges Ti → Tj
  • When creating edges from one transaction to another in the precedence graph, the following cases are considered:
                    
    T1    ->     T2
    W(Q)         R(Q)
    R(Q)         W(Q)
    W(Q)         W(Q)
    R(Q)         R(Q) (In this condition, there will be no edge)
                    
                
  • When both transactions are reading the same variable, there will be no edge between them.
  • If the precedence graph contains no cycle, the schedule is serializable.

Example:

                    
   T1    |     T2     |     T3
--------------------------------
   R(A)  |            |       
   R(C)  |            |       
   W(A)  |            |       
         |    R(B)    |       
   W(C)  |            |       
         |    R(A)    |       
         |            |   R(C)    
         |    W(B)    |       
         |            |   R(B)    
         |            |   W(C)    
         |    W(A)    |       
         |            |   W(B)     
                    
                

As no cycle is formed in the precedence graph (its edges are T1 → T2, T1 → T3, and T2 → T3), this schedule is serializable.
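
A minimal Python sketch (the helper names are made up for this illustration) that builds the precedence graph for the schedule above and tests it for a cycle:

    from collections import defaultdict

    # Each operation of the schedule above: (transaction, action, data item)
    schedule = [
        ("T1", "R", "A"), ("T1", "R", "C"), ("T1", "W", "A"),
        ("T2", "R", "B"), ("T1", "W", "C"), ("T2", "R", "A"),
        ("T3", "R", "C"), ("T2", "W", "B"), ("T3", "R", "B"),
        ("T3", "W", "C"), ("T2", "W", "A"), ("T3", "W", "B"),
    ]

    # Edge Ti -> Tj for every conflicting pair: same data item, different
    # transactions, and at least one of the two operations is a write.
    edges = defaultdict(set)
    for i, (ti, ai, xi) in enumerate(schedule):
        for tj, aj, xj in schedule[i + 1:]:
            if ti != tj and xi == xj and "W" in (ai, aj):
                edges[ti].add(tj)

    def has_cycle(graph):
        visiting, done = set(), set()
        def dfs(node):
            visiting.add(node)
            for nxt in graph.get(node, set()):
                if nxt in visiting or (nxt not in done and dfs(nxt)):
                    return True
            visiting.discard(node)
            done.add(node)
            return False
        return any(node not in done and dfs(node) for node in list(graph))

    print({t: sorted(s) for t, s in edges.items()})   # {'T1': ['T2', 'T3'], 'T2': ['T3']}
    print(has_cycle(edges))                           # False -> the schedule is serializable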

Another example:

                    
   T1    |     T2     |     T3
--------------------------------
   R(A)  |            |       
         |    R(B)    |       
         |            |   R(C)    
         |    W(B)    |       
         |            |   W(C)    
   W(A)  |            |       
         |    R(A)    |       
         |            |       
   R(C)  |            |       
         |    W(A)    |     
   W(C)  |            |       
         |            |   W(B)    
                    
                

Here the precedence graph contains the cycle T1 → T2 → T3 → T1 (T1 → T2 on A, T2 → T3 on B, and T3 → T1 on C), so the schedule is not serializable.

When a schedule is not serial, two notions are used to decide whether it is still serializable:

  1. Conflict Serializability
  2. View Serializability

Conflict Serializability

  • A schedule is considered conflict serializable if, after swapping non-conflicting operations, it can be transformed into a serial schedule (conflict equivalent to a serial schedule).
  • Conflicting operations are those that may create interference when executed concurrently. For instance:
                                
    T1   |   T2
    -----------
    W(A) | W(A)
    W(A) | R(A)
    R(A) | W(A)
                                
                            
  • In the above table, every pair of operations from different transactions that access the same variable (A) with at least one write is a conflicting pair. These are the same conditions that were used to create edges in the precedence graph.
  • Other operations not falling into these conflicting conditions are considered non-conflicting.

Example:

                    
T1    |     T2     
---------------
R(A)  |              
W(A)  |          
      |    R(A)          
      |    W(A)      
R(B)  |               
W(B)  |              
      |    R(B)      
      |    W(B)                    
                    
                

To decide whether the given schedule is conflict serializable, we identify the conflicting pairs of operations and then try to swap only the non-conflicting operations until a serial order is obtained.

The conflicting pairs are the operations of T1 and T2 on A (each pair involving at least one write) and, likewise, their operations on B. T1's operations on B do not conflict with T2's operations on A, so they can be swapped past them:

                    
T1    |     T2     
----------------
R(A)  |              
W(A)  |          
R(B)  |               
W(B)  |              
      |    R(A)          
      |    W(A)      
      |    R(B)      
      |    W(B)                    
                    

After swapping only non-conflicting operations, the schedule becomes the serial schedule T1 followed by T2. The original schedule is therefore conflict serializable, and its result is guaranteed to be the same as that of this serial execution.

View Serializability

  • A schedule is considered view serializable if it is view equivalent to a serial schedule.
  • If a schedule is conflict serializable, then it is also view serializable.
  • However, a view serializable schedule that is not conflict serializable contains blind writes (a transaction writing a data item without first reading it).

Two schedules, s1 and s2, are view equivalent if they satisfy the following conditions:

  1. Initial Read: The transaction that performs the initial (first) read of a data item must be the same in both schedules, and it must read the same value.
                
    S1       T1   |   T2          S2   T1   |    T2 
            ------+-------            ------+------- 
             R(A) |                         |  W(A)
                  |  W(A)              R(A) |
                
    In S1, T1's R(A) reads the initial value of A, whereas in S2 it reads the value written by T2, so this pair of schedules violates the initial-read condition and is not view equivalent.
  2. Update Read: If, in schedule S1, transaction Ti reads the value of A written by Tj, then in S2 Ti must also read the value of A written by Tj.
                
    S1       T1  |  T2  | T3           S2   T1  |  T2  |  T3
           ------+------+-----             -----+------+------ 
            W(A) |      |                       | W(A) |
                 | W(A) |                  W(A) |      |
                 |      | R(A)                  |      | R(A)
                
    Here T3 reads the value of A written by T2 in S1 but the value written by T1 in S2, so this pair violates the update-read condition.
            
  3. Final Write: The transaction that performs the final write on each data item must be the same in both schedules.
                
    S1       T1  |  T2  | T3           S2   T1  |  T2  |  T3
           ------+------+-----             -----+------+------ 
            W(A) |      |                       | R(A) |
                 | R(A) |                  W(A) |      |
                 |      | W(A)                  |      | W(A)
                
    In both S1 and S2 the final write on A is performed by T3, so the final-write condition is satisfied.
            

Recoverability of Schedule

A schedule is recoverable if every transaction commits only after all the transactions whose written values it has read have committed.

Example:

Consider the following schedule involving two transactions, T1 and T2:

    
    T1        |       T2
----------------------------------
   R(A)       |             
   W(A)       |             
               |      R(A) (reads the value of A written by T1)
               |      W(A)
   W(B)       |             
    

In this example, T2 reads and writes A after T1 has written it but before T1 has committed. If T1 fails and is rolled back, T2 has used a value that never became permanent, so T2 must also be rolled back.

The schedule is recoverable only if T2 commits after T1 commits. If T2 were allowed to commit first and T1 then failed, T2 could not be undone, and the database would be left in an inconsistent state (a non-recoverable schedule).

Cascading Rollback & Cascadeless Schedule

Cascading Rollback

  • Cascading rollback occurs when, due to a failure in one transaction, multiple transactions have to be rolled back.

Cascadeless Schedule

  • A cascadeless schedule is designed to avoid cascading rollbacks by ensuring that a transaction must commit before the data item it updates is read by another transaction.

Example:

Consider the following schedule involving two transactions, T1 and T2, where T2 reads a data item before T1 commits:

    
    T1        |       T2
----------------------------------
   W(A)       |             
   R(A)       |             
   R(B)       |             
              |      W(B)          
              |      R(A) (Reads uncommitted value due to T1 not committed)
    

In this example, T2 reads data item A before T1 commits its write operation on A. If T1 encounters a failure and rolls back, T2 has read an uncommitted value of A and must be rolled back as well; this chain of forced rollbacks is a cascading rollback.

To achieve a cascadeless schedule, transactions should be arranged such that a transaction commits before the data it updates is read by another transaction, preventing cascading rollbacks.

Concurrency Control Techniques

Lock-Based Protocols

In lock-based protocols, a transaction must acquire a lock on a data item before reading or writing it and releases the lock afterwards, so that conflicting accesses are serialized.

Granularity of Locks

  • The level or type of information that a lock protects is referred to as locking granularity. In other words, locking granularity defines the size of data items that the data manager locks.

Locking can occur at the following levels:

  1. Coarse Granularity: In this level, the database management system (DBMS) places locks on entities such as tables, data files, or the entire database.
    • It reduces the number of transactions that can be executed concurrently, leading to lower throughput and increased response time.
    • This approach results in lower concurrency due to the larger entities being locked.
  2. Fine Granularity: Locks are placed on smaller entities, such as individual records (rows) or even fields within a table, as illustrated in the sketch after this list.
    • Although fine granularity introduces more lock overhead, it enhances concurrency by allowing multiple transactions to access different parts of the data simultaneously.
    • Higher concurrency leads to increased throughput and reduced response time.
  3. Intermediate Granularity: Locks are applied to pages, where a page is a fixed-size unit like 4KB, 8KB, or 16KB.
    • Pages: In the context of databases, a table may span several pages, and a page may contain several rows of a table or multiple tables.
    • This approach provides moderate concurrency and incurs moderate lock overhead.
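
The effect of granularity on concurrency can be sketched with a toy lock table keyed by the unit being locked; this ignores lock modes and intention locks and is purely illustrative (all names here are invented for the example):

    import threading

    locks = {}   # resource key -> threading.Lock

    def lock_key(table, page=None, row=None):
        if row is not None:
            return (table, page, row)   # fine granularity: a single row
        if page is not None:
            return (table, page)        # intermediate granularity: a page
        return (table,)                 # coarse granularity: the whole table

    def acquire(table, page=None, row=None):
        key = lock_key(table, page, row)
        locks.setdefault(key, threading.Lock()).acquire()
        return key

    def release(key):
        locks[key].release()

    # With row-level (fine) locks, two transactions touching different rows
    # of the same table proceed concurrently; with a table-level (coarse)
    # lock on ("accounts",) the second acquire would block until release.
    k1 = acquire("accounts", page=3, row=42)
    k2 = acquire("accounts", page=3, row=43)
    release(k1)
    release(k2)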

Timestamp-Based Protocol

  • The timestamp-based protocol is employed to order conflicting transactions, where conflicting transactions are those that require access to the same data items for execution.
  • Each transaction Ti is assigned a unique timestamp, denoted TS(Ti), before it starts execution, and conflicting operations are then required to execute in timestamp order (a minimal sketch of these rules follows).
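
A minimal sketch of the standard basic timestamp-ordering rules (the function and variable names are invented for this illustration):

    read_ts = {}    # data item -> largest timestamp of a transaction that read it
    write_ts = {}   # data item -> largest timestamp of a transaction that wrote it

    def read_item(ts, item):
        if ts < write_ts.get(item, 0):
            return "rollback"      # the item was already written by a younger transaction
        read_ts[item] = max(read_ts.get(item, 0), ts)
        return "ok"

    def write_item(ts, item):
        if ts < read_ts.get(item, 0) or ts < write_ts.get(item, 0):
            return "rollback"      # a younger transaction has already read or written the item
        write_ts[item] = ts
        return "ok"

    # T2 (TS = 2) writes A first; the older T1 (TS = 1) then tries to read A
    # and is rolled back, keeping the conflicting operations in timestamp order.
    print(write_item(2, "A"))   # ok
    print(read_item(1, "A"))    # rollback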

Deadlock

A deadlock is a situation in which two or more transactions wait indefinitely for one another to release resources. This topic falls under the umbrella of concurrency control.

Necessary Conditions for Deadlock:

All four of the following conditions must hold simultaneously for a deadlock to occur:

  1. Hold and Wait: Transactions hold resources while waiting for additional resources.
    • Example:
      Transaction T1 holds resource R1 and is waiting for resource R2, while Transaction T2 holds resource R3 and is waiting for resource R1. Both transactions are holding resources and waiting for additional resources, satisfying the hold and wait condition.
  2. Mutual Exclusion: Resources cannot be shared; only one transaction can use a resource at a time.
    • Example:
      Two transactions, T1 and T2, are competing for access to a printer resource. Due to the mutual exclusion condition, only one transaction can use the printer at a time, leading to potential conflicts and deadlock situations.
  3. No Preemption: Resources cannot be preempted or forcefully taken away from a transaction; they must be explicitly released.
    • Example:
      Transaction T1 holds resource R1, and Transaction T2 requests R1. If preemption were allowed, the system could forcefully take R1 from T1 and give it to T2. However, the no preemption condition states that this is not allowed, contributing to deadlock risk.
  4. Circular Wait: There is a circular chain of transactions, each holding a resource needed by the next transaction in the chain.
    • Example:
      Consider two transactions, T1 and T2, where T1 holds resource R1 and requests resource R2, while T2 holds resource R2 and requests resource R1. This scenario forms a circular wait and contributes to deadlock conditions.

Methods for Handling Deadlocks

Deadlocks, once identified, can be managed using different methods to prevent or address them. There are two primary methods for handling deadlocks:

  1. Deadlock Prevention:

    Preventing deadlocks involves employing strategies and protocols to eliminate one or more necessary conditions for deadlock. The focus is on structuring the system in a way that potential deadlocks cannot occur. Key strategies include:

    • Ensuring the Hold and Wait condition is not satisfied by requiring transactions to acquire all needed resources before initiating execution.
    • Addressing Mutual Exclusion by allowing resources to be shared among transactions.
    • Introducing Preemption, where if a transaction cannot acquire a resource, it releases all previously acquired resources and restarts the process.
    • Breaking Circular Wait by assigning a priority to resources and requiring transactions to request resources in increasing order of priority.
  2. Detect & Recovery:

    Deadlock detection involves periodically examining the system's state to identify if a deadlock has occurred. If a deadlock is detected, recovery strategies are implemented to resolve it (a small sketch of this approach follows the list). Common approaches include:

    • Rollback: Roll back one or more transactions to a previous consistent state, releasing the resources held by those transactions.
    • Resource Preemption: Forcefully reclaim resources from transactions to break the circular wait and resolve the deadlock.
    • Transaction Termination: Terminate one or more transactions involved in the deadlock to release their resources.
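
To make the detect-and-recover approach concrete, here is a small Python sketch of a wait-for graph check (the transactions and lock requests are hypothetical):

    # An edge Ti -> Tj means Ti is waiting for a resource held by Tj.
    wait_for = {
        "T1": ["T2"],   # T1 waits for a lock held by T2
        "T2": ["T1"],   # T2 waits for a lock held by T1 -> circular wait
    }

    def deadlocked(graph):
        def reaches(start, target, seen=()):
            return any(nxt == target or
                       (nxt not in seen and reaches(nxt, target, seen + (nxt,)))
                       for nxt in graph.get(start, []))
        return [t for t in graph if reaches(t, t)]

    victims = deadlocked(wait_for)
    print(victims)   # ['T1', 'T2'] -> roll back (terminate) one of them to break the cycle

Rolling back one victim releases its locks, the corresponding edges disappear from the wait-for graph, and the remaining transaction can proceed.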

These methods provide different approaches for managing deadlocks, offering a balance between prevention measures and post-detection recovery strategies.
