Question
[Q.7] Answer the following questions (a) Explain 2PL in the context of database concurrency control (b)...
Answers
Answer for 3 (a)
First of all, let me explain a bit about concurrency and why it matters in database systems (I'll use DBMS, for database management system, as the acronym).
Consider a simple system where there is a single queue of operations that execute one after another. This is similar to everyday tasks such as queuing up for a metro ticket. Such operations need only a sequential algorithm to manage the process.
A sequential algorithm doesn't have to worry about other simultaneously occurring processes, as it deals with just one main thread, the primary component of the system.
But in a world of smartphones and IoT, simple sequential DBMSs aren't enough to cope with the increasing complexity of systems. One example is a smartphone, with all the concurrent processes it handles. It provides an environment in which many things happen at the same time, and this is only possible because the underlying databases allow many users and processes to work with the data concurrently.
Grid computing and cloud computing are two areas where concurrent DBMSs are used. They allow two or more sub-components to access the same data in operations whose execution overlaps in time, while preserving the integrity of the data and without violating any rule prescribed by the specific DBMS.
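So, concurrent DBMSs control the concurrency of multiple overlapping transactions by defining a strict set of rules known as concurrency control protocols. These rules safeguard the following properties of database transactions (atomicity and isolation are two of the four ACID properties; serializability is the correctness criterion for concurrent schedules)-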
- Atomicity
- Isolation
- Serializability
Now, these protocols can broadly be divided into two categories-
- Lock-based protocols- No transaction can read/write data until it acquires a lock on it
- Timestamp-based protocols- Each transaction is given a timestamp taken from the system time or a logical counter, and conflicting operations are ordered by it
The locks are further of two types-
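Binary lock is where the data item is either locked or unlocked.
Shared/exclusive lock is where the operation on the data determines what type of lock is taken. A read operation takes a shared lock, which several transactions can hold on the same item at once. A write operation takes an exclusive lock, which only one transaction can hold and which is incompatible with every other lock on the item. A transaction that already holds a shared lock can upgrade it to an exclusive lock once no other transaction holds a shared lock on the same item.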
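The compatibility rule between shared and exclusive locks can be sketched in a few lines of Python. This is only a toy illustration: the names LockTable, can_grant, acquire and release are made up for this answer and do not come from any real DBMS, and a real lock manager would also queue waiting transactions instead of simply refusing.

```python
from collections import defaultdict

SHARED, EXCLUSIVE = "S", "X"


class LockTable:
    """Toy lock table (hypothetical): records who holds which lock on each item."""

    def __init__(self):
        # item -> {transaction id: lock mode}
        self.holders = defaultdict(dict)

    def can_grant(self, txn, item, mode):
        # Only locks held by OTHER transactions can conflict with the request.
        others = {t: m for t, m in self.holders[item].items() if t != txn}
        if mode == SHARED:
            # A shared lock is compatible only with other shared locks.
            return all(m == SHARED for m in others.values())
        # An exclusive lock is compatible with nothing held by others.
        return not others

    def acquire(self, txn, item, mode):
        """Grant the lock (or upgrade S to X) if compatible; return True on success."""
        if not self.can_grant(txn, item, mode):
            return False
        # Never downgrade an exclusive lock the transaction already holds.
        if self.holders[item].get(txn) != EXCLUSIVE:
            self.holders[item][txn] = mode
        return True

    def release(self, txn, item):
        self.holders[item].pop(txn, None)


table = LockTable()
print(table.acquire("T1", "row1", SHARED))     # True: nobody holds a lock yet
print(table.acquire("T2", "row1", SHARED))     # True: shared locks coexist
print(table.acquire("T1", "row1", EXCLUSIVE))  # False: T2 still holds a shared lock
table.release("T2", "row1")
print(table.acquire("T1", "row1", EXCLUSIVE))  # True: T1 upgrades its own lock
```

Two readers can share the row, but the writer has to wait until it is the only holder.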
Two Phase Locking Protocol (2PL)
This protocol divides the execution of a transaction into three parts. The first part begins when the transaction starts executing: here the transaction requests permission for the locks it requires.
The second part is where the transaction acquires the locks. This is a somewhat longer process, and it continues until the transaction has acquired all the locks it needs.
The third part begins when the transaction releases its first lock. From this point on, the transaction cannot request any new locks; it can only release the locks it has acquired.
In terms of locking, therefore, the protocol has two phases- GROWING and SHRINKING (or FALLING). The growing phase (parts one and two) is where the transaction acquires all its locks, and the shrinking phase (part three) is where it releases them.
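The two-phase rule itself is easy to show in code. The sketch below assumes a hypothetical Transaction class that tracks only its own locks; it ignores conflicts with other transactions and just enforces that no lock is requested after the first release.

```python
class TwoPhaseLockingError(Exception):
    """Raised when a transaction asks for a lock after it has released one."""


class Transaction:
    """Hypothetical transaction that enforces only the 2PL phase rule."""

    def __init__(self, name):
        self.name = name
        self.held = set()
        self.shrinking = False  # flips to True on the first release

    @property
    def phase(self):
        return "SHRINKING" if self.shrinking else "GROWING"

    def lock(self, item):
        if self.shrinking:
            raise TwoPhaseLockingError(
                f"{self.name}: cannot lock {item!r} in the shrinking phase"
            )
        self.held.add(item)

    def unlock(self, item):
        self.held.remove(item)
        self.shrinking = True  # the growing phase is over for good


t1 = Transaction("T1")
t1.lock("A")
t1.lock("B")
print(t1.phase)  # GROWING
t1.unlock("A")
print(t1.phase)  # SHRINKING
try:
    t1.lock("C")
except TwoPhaseLockingError as err:
    print(err)   # T1: cannot lock 'C' in the shrinking phase
```

Once the first unlock happens, the transaction can never go back to the growing phase, which is exactly what the protocol requires.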
Answer for 3 (b)
Isolation is one of the four ACID pillars of every DBMS, sequential or concurrent. Isolation determines how visible a transaction's work is to other transactions and users. A lower isolation level lets more users access the same data at the same time, but increases the risk of concurrency anomalies such as dirty reads and lost updates. A higher isolation level reduces the types of anomalies that can occur, but requires more resources and is slower than lower isolation levels.
Isolation protocols help safeguard the data from the effects of other transactions. They maintain the integrity of the data by defining how and when the changes made by one operation are visible to the others.
Answer for 3 (c)
Ideally, a transaction should execute as if it were the only transaction accessing resources in the database system. To understand the 4 levels of isolation, it is important to understand what kinds of phenomena may occur when transactions run concurrently.
Type 1 (dirty read)- Let's say transaction T1 updates a row and leaves it uncommitted, and transaction T2 then sees the change (reads the updated row). If T1 rolls back the change, T2 will have seen data that never existed. This situation is called a dirty read.
Type 2 (non-repeatable read)- It occurs when a transaction reads the same row twice and gets a different value each time, because another transaction modified the row and committed in between.
Type 3 (phantom read)- It occurs when a transaction executes the same query twice but the set of rows returned differs, because another transaction inserted or deleted matching rows in between.
Now, based on these three phenomena, we have four levels of isolation-
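To make the first two phenomena concrete, here is a small in-memory simulation. ToyStore and its allow_dirty flag are invented for this illustration and do not mirror any real database API; the store simply keeps committed values apart from each transaction's pending writes.

```python
class ToyStore:
    """Hypothetical store that keeps committed data apart from pending writes."""

    def __init__(self, data):
        self.committed = dict(data)
        self.pending = {}  # transaction id -> {key: uncommitted value}

    def write(self, txn, key, value):
        self.pending.setdefault(txn, {})[key] = value

    def read(self, txn, key, allow_dirty=False):
        # A transaction always sees its own uncommitted writes.
        if key in self.pending.get(txn, {}):
            return self.pending[txn][key]
        # With allow_dirty, it also sees other transactions' uncommitted writes.
        if allow_dirty:
            for other_writes in self.pending.values():
                if key in other_writes:
                    return other_writes[key]
        return self.committed[key]

    def commit(self, txn):
        self.committed.update(self.pending.pop(txn, {}))

    def rollback(self, txn):
        self.pending.pop(txn, None)


store = ToyStore({"balance": 100})

# Dirty read: T2 sees T1's uncommitted value, then T1 rolls back.
store.write("T1", "balance", 500)
dirty = store.read("T2", "balance", allow_dirty=True)
store.rollback("T1")
print(dirty, store.read("T2", "balance"))  # 500 100

# Non-repeatable read: T3 commits between two reads made by T2.
first = store.read("T2", "balance")
store.write("T3", "balance", 250)
store.commit("T3")
second = store.read("T2", "balance")
print(first, second)  # 100 250
```

In the first case T2 acted on the value 500, which never existed in the committed data. In the second case T2 only ever read committed data, yet its two reads disagree.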
- Read uncommitted- It is the lowest level of isolation. At this level, dirty reads are allowed, which means one transaction can read the uncommitted changes made by another.
- Read committed- It allows no dirty reads: it guarantees that any data read was committed at the moment it is read. Non-repeatable reads and phantom reads can still occur.
- Repeatable read- This is a more restrictive level than read committed. The transaction holds read locks on all the rows it references, and write locks on all the rows it updates/inserts/deletes. So there's no chance of non-repeatable reads, although phantom reads are still possible.
- Serializable- The highest level of isolation. It demands that the outcome of all concurrent transactions be equivalent to executing them serially, one after another.
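The relationship between the four levels and the three phenomena can be summed up as a lookup table. The sketch below follows the SQL standard's definitions; real database engines may behave more strictly than this table at a given level, and the helper name weakest_level_preventing is made up for this answer.

```python
DIRTY = "dirty read"
NON_REPEATABLE = "non-repeatable read"
PHANTOM = "phantom read"

# Phenomena each level still permits, listed from weakest to strongest level.
ALLOWED = {
    "READ UNCOMMITTED": {DIRTY, NON_REPEATABLE, PHANTOM},
    "READ COMMITTED": {NON_REPEATABLE, PHANTOM},
    "REPEATABLE READ": {PHANTOM},
    "SERIALIZABLE": set(),
}


def weakest_level_preventing(*phenomena):
    """Return the first (weakest) level that permits none of the given phenomena."""
    unwanted = set(phenomena)
    for level, allowed in ALLOWED.items():  # dicts keep insertion order
        if not (allowed & unwanted):
            return level


print(weakest_level_preventing(DIRTY))           # READ COMMITTED
print(weakest_level_preventing(NON_REPEATABLE))  # REPEATABLE READ
print(weakest_level_preventing(PHANTOM))         # SERIALIZABLE
```

Each step up the list rules out one more phenomenon, which is why the stronger levels cost more in locking and waiting.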
[Figure: 2PL lock timeline showing phase 1, phase 2 (GROWING, locks acquired) and phase 3 (SHRINKING, locks released)]