What Is Database Concurrency?
Ever wondered how online platforms such as ticketing and shopping websites handle millions of simultaneous transactions without crashing or mixing up vital details? The answer lies in database concurrency.
Simply put, it is the ability of a database to handle multiple simultaneous read/write operations from users or applications without compromising the integrity or consistency of the stored data.
Why Is Database Concurrency Control Essential?
Without proper controls, simultaneous interactions with a database often lead to the following problems:
Dirty Read
A dirty read occurs when one transaction reads an uncommitted or temporary value written into the database by another transaction that may be later aborted or rolled back.
For example, assume you share a joint bank account with your spouse, and the following events occur.
Time | Action | Account Balance |
---|---|---|
T1 | Initial account balance | 1000 |
T2 | Partner deposits 1000 (transaction A, not yet committed) | 2000 |
T3 | You purchase something worth 1500 (transaction B reads the uncommitted 2000) | 500 |
T4 | Transaction A rolls back the deposit (-1000) | -500 |
As you can see, the account balance becomes negative after transaction B commits because it reads an uncommitted value from transaction A, which was later rolled back.
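The same sequence can be replayed in a few lines of plain Python. This is only a simulation of the table above, not a real database: a shared dictionary stands in for the account record, and its uncommitted writes are visible to every reader, which is what the READ UNCOMMITTED isolation level permits.

```python
# Pure-Python mock-up of the table above: a dict stands in for the database,
# and every write is immediately visible, as under READ UNCOMMITTED.
db = {"balance": 1000}

# T2: transaction A deposits 1000 but has not committed yet
db["balance"] += 1000

# T3: transaction B reads the uncommitted 2000 (a dirty read) and spends 1500
seen = db["balance"]
db["balance"] = seen - 1500   # balance is now 500

# T4: transaction A aborts and rolls back its deposit
db["balance"] -= 1000

print(db["balance"])          # -500: transaction B acted on a value that was never committed
```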
Lost Data Update
A lost data update occurs when two transactions update the same data item, and one of them overwrites the changes made by the other without knowing it.
For example, assume two employees are trying to update the price of a specific product in your online store, and the following occurs.
Time | Employee A | Employee B | Product Price (in database) |
---|---|---|---|
T1 | Reads the product price (100) | | 100 |
T2 | | Reads the product price (100) | 100 |
T3 | Increases the price by 10% and commits | | 110 |
T4 | | Decreases the price it read by 5% and commits | 95 |
Here, Employee B's commit silently overwrites Employee A's change: the 10% increase is lost, and the final price of 95 is not what either employee intended (applying both updates would have given 104.5).
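Below is a hedged sketch of how the same lost update can arise in application code: two threads perform an unsynchronized read-modify-write on a shared price, and the later write silently overwrites the earlier one. The numbers mirror the table; the `time.sleep` calls merely widen the race window so the interleaving is reproducible.

```python
import threading
import time

product = {"price": 100.0}

def update(factor, delay):
    old = product["price"]           # both threads read 100 before either writes
    time.sleep(delay)                # widen the race window for the demo
    product["price"] = old * factor  # write based on the stale value

a = threading.Thread(target=update, args=(1.10, 0.1))  # Employee A: +10%
b = threading.Thread(target=update, args=(0.95, 0.2))  # Employee B: -5%
a.start(); b.start()
a.join(); b.join()

print(product["price"])  # 95.0 -- Employee A's increase is lost (104.5 was the intent)
```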
Phantom Read
A phantom read occurs when a transaction runs the same conditional query twice and gets a different set of rows because another transaction has inserted or deleted rows satisfying that condition in between.
For example, say you have been tasked with determining the total number of products that customers have rated 4 or higher. Consider the following sequence of events.
Time | Action | Number of Products with Rating >= 4 |
---|---|---|
T1 | Run the first search query | 10 |
T2 | Forward the information | 10 |
T3 | Another customer rates a new product 4 or higher | 11 |
T4 | Run the same search query again | 11 |
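The second query sees a "phantom" row that the first one never returned. The toy snippet below replays the sequence against an in-memory list standing in for the products table; it illustrates the anomaly rather than querying a real database.

```python
# Ten products already rated 4.5; the list stands in for the products table.
products = [{"name": f"item{i}", "rating": 4.5} for i in range(10)]

def count_highly_rated():
    return sum(1 for p in products if p["rating"] >= 4)

first = count_highly_rated()                     # T1: 10
products.append({"name": "new", "rating": 5.0})  # T3: another transaction inserts a qualifying row
second = count_highly_rated()                    # T4: 11 -- a phantom row appeared

print(first, second)  # 10 11
```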
Unrepeatable Read
An unrepeatable read occurs when one transaction reads the same data item twice and gets a different value each time because another transaction has updated the item in between. For example, consider the following sequence, in which transaction A reads a customer's name twice while transaction B updates it.
Time | Transaction A | Transaction B | Name on Record |
---|---|---|---|
T1 | Reads name (John) | | John |
T2 | | Updates name to Jack | Jack |
T3 | | Commits | Jack |
T4 | Reads name (Jack) | | Jack |
Here, transaction A has read two different values for the same data item, which can cause confusion or errors when used for subsequent data operations.
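In miniature, the anomaly looks like this (a plain-Python illustration of the table, with a dictionary standing in for the customer record):

```python
record = {"name": "John"}

first_read = record["name"]   # T1: transaction A reads "John"
record["name"] = "Jack"       # T2-T3: transaction B updates the name and commits
second_read = record["name"]  # T4: transaction A reads the same item again and gets "Jack"

print(first_read, second_read)  # John Jack -- two reads, two different values
```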
So, how do you overcome these problems? The answer is Database Concurrency Control techniques. Let’s look at a few of these techniques in detail.
Database Concurrency Control Techniques
The following are the most effective database concurrency control techniques that help overcome the problems mentioned above.
Locking
Locking is a technique that prevents concurrent transactions from accessing or altering the same data item at the same time by requiring them to acquire locks first. These locks can be shared (typically for reads) or exclusive (for writes) and can be applied at various levels of granularity (records, pages, tables, or the entire database), helping the database preserve serializability.
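As a rough illustration using Python's standard threading primitives, the lost-update race shown earlier disappears once every read-modify-write runs under an exclusive lock. Real databases use a lock manager with shared and exclusive modes at several granularities; this sketch makes do with a single exclusive lock.

```python
import threading
import time

product = {"price": 100.0}
price_lock = threading.Lock()    # one exclusive lock guarding the price

def update(factor, delay):
    with price_lock:             # acquire before reading, release only after writing
        old = product["price"]
        time.sleep(delay)        # same race window as before, now harmless
        product["price"] = old * factor

a = threading.Thread(target=update, args=(1.10, 0.1))  # +10%
b = threading.Thread(target=update, args=(0.95, 0.2))  # -5%
a.start(); b.start()
a.join(); b.join()

print(round(product["price"], 2))  # 104.5 -- both updates are applied, one after the other
```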
Time Stamping
Time stamping is a database concurrency control technique that assigns a timestamp to each transaction when it starts. These timestamps order concurrent transactions and determine which transaction "wins" in case of a conflict: older transactions typically take precedence, and a transaction whose operation would violate the timestamp order is aborted and restarted, ensuring a predictable execution order.
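A minimal sketch of the idea (basic timestamp ordering, not any particular database's implementation): every transaction receives a timestamp when it starts, each data item remembers the youngest reader and writer it has seen, and an operation that arrives "too late" forces its transaction to abort and restart.

```python
import itertools

clock = itertools.count(1)     # monotonically increasing timestamps

class Item:
    def __init__(self, value):
        self.value = value
        self.read_ts = 0       # timestamp of the youngest transaction that read it
        self.write_ts = 0      # timestamp of the youngest transaction that wrote it

def read(item, ts):
    if ts < item.write_ts:     # a younger transaction already overwrote the item
        raise RuntimeError(f"abort T{ts}: too old to read")
    item.read_ts = max(item.read_ts, ts)
    return item.value

def write(item, ts, value):
    if ts < item.read_ts or ts < item.write_ts:
        raise RuntimeError(f"abort T{ts}: too old to write")
    item.write_ts, item.value = ts, value

price = Item(100)
t1, t2 = next(clock), next(clock)   # t1 started first, so it is the older transaction

write(price, t2, 110)               # the younger transaction writes first
try:
    write(price, t1, 95)            # the older transaction now arrives "too late"
except RuntimeError as err:
    print(err)                      # abort T1: too old to write -- it would be restarted
```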
Multi-versioning
Multi-versioning (multiversion concurrency control, or MVCC) is a database technique that keeps multiple versions of each data item, each tagged with a timestamp or version number. Concurrent transactions can read different versions of the same item, so readers do not block writers, but the extra versions require more storage space.
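The sketch below shows the core bookkeeping under simplified assumptions: every committed write appends a new version tagged with its commit timestamp, and a reader sees the newest version that is not newer than its own snapshot timestamp.

```python
import bisect

versions = [(0, 1000)]    # (commit_timestamp, balance), kept in timestamp order

def write(commit_ts, value):
    versions.append((commit_ts, value))   # writers add versions, never overwrite

def read(snapshot_ts):
    # newest version whose commit timestamp is <= the reader's snapshot timestamp
    idx = bisect.bisect_right(versions, (snapshot_ts, float("inf"))) - 1
    return versions[idx][1]

snapshot = 5          # a long-running report takes its snapshot at timestamp 5
write(7, 2000)        # a later deposit commits at timestamp 7
print(read(snapshot)) # 1000 -- the report keeps seeing its own consistent snapshot
print(read(8))        # 2000 -- a transaction that starts later sees the new version
```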
Validation
Validation, also known as optimistic concurrency control, is a technique that divides each transaction into three phases:
- Read phase: The transaction reads the data items from the database and stores them in a local buffer.
- Validation phase: The transaction checks whether it has any conflicts with other transactions that have already been committed.
- Write phase: The transaction writes the updated data items back to the database if it passes the validation.
Because conflicts are checked only at commit time, validation lets concurrent transactions read the same data items without blocking one another; a transaction that fails validation is rolled back and retried.
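The sketch below shows one simplified way these three phases can fit together (an optimistic scheme with per-item version numbers; the names are illustrative, not a standard API): each transaction works on a private copy, and only the first one to pass validation gets to write.

```python
db = {"price": {"value": 100.0, "version": 1}}

class Transaction:
    def __init__(self):
        self.reads = {}    # key -> version observed during the read phase
        self.writes = {}   # key -> new value computed locally

    def read(self, key):   # read phase: work from a local copy
        item = db[key]
        self.reads[key] = item["version"]
        return item["value"]

    def commit(self):
        # validation phase: abort if anything we read has changed since
        for key, seen_version in self.reads.items():
            if db[key]["version"] != seen_version:
                return False
        # write phase: publish new values and bump their versions
        for key, value in self.writes.items():
            db[key] = {"value": value, "version": db[key]["version"] + 1}
        return True

t1, t2 = Transaction(), Transaction()
t1.writes["price"] = t1.read("price") * 1.10   # Employee A: +10%
t2.writes["price"] = t2.read("price") * 0.95   # Employee B: -5%
print(t1.commit())  # True  -- t1 validates and writes
print(t2.commit())  # False -- t2 fails validation and must be retried
```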