
Wednesday, September 17, 2014

What are some challenges with respect to database concurrency control?

During sequential execution of transactions, the execution time periods do not overlap, so transaction concurrency does not exist. But if we allow interleaving of transactions in a manner that is not properly controlled, we are bound to get undesirable results. There are many situations in which uncontrolled concurrency can harm a database.
In this article we discuss such challenges. Transaction models based on the ACID rules have proved durable over time, and they serve as the foundation of present-day database and transaction systems. Many kinds of environments, parallel as well as distributed, have used this basic model to implement their own complex database systems, though such environments require additional techniques such as the two-phase commit protocol. While the model has been a great success, it suffers from a few limitations.
The model lacks flexibility and so cannot represent certain kinds of interactions between organizations and complex systems. In collaborative environments, a piece of data cannot be strictly isolated even when isolation is desirable. And since the ACID transaction model suits systems with short, simple transactions, it is less appropriate for workflow management systems.
Such systems require richer transaction models with a notion of multiple levels. The model also falls short in other environments, e.g., mobile wireless networks, where long disconnection periods are expected. Then there is the internet, a loosely coupled WAN (wide area network), which cannot fully adopt the ACID model because availability is low. We require techniques that help the ACID model adjust to such widely varying situations.
Research is ongoing into new techniques for maintaining concurrency control and recovering in dissemination-oriented environments, heterogeneous systems, and so on. Another problem is that the model cannot exploit data and application semantics through a general mechanism; knowledge of such semantics can improve system performance by a significant margin. Separate research has been going on into recovery and concurrency control. The DBMS's recovery component is responsible for both the durability and the atomicity of ACID transactions, and here distinguishing between volatile storage and non-volatile storage becomes absolutely necessary. The following three types of failure pose a challenge to the proper working of a DBMS:
- Transaction failure: In some cases a transaction reaches a state during execution from which it cannot commit successfully. All the updates made by that transaction must then be erased from the database to preserve atomicity. This is called transaction rollback.
- System failure: A system failure typically causes a loss of volatile memory contents. It must be ensured that the updates made by all transactions committed before the crash persist in the database and that the updates made by unsuccessful transactions are removed.
- Media failure: In these failures the non-volatile storage gets corrupted, making it impossible to recover the online version of the data. The option here is to restore the database from an archive and reapply all subsequent updates using the operation logs.
Recovery from this last kind of failure requires additional mechanisms. For any of this to work, the recovery component itself must be reliable: flaws in a database's recovery system put it at high risk of losing data.
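
To make the rollback idea concrete, here is a minimal Python sketch of an undo-log based recovery component; the names (SimpleDB, undo_log) are illustrative and not taken from any particular DBMS:

class SimpleDB:
    def __init__(self):
        self.data = {}        # the online version of the data (simplified)
        self.undo_log = []    # (txn_id, key, before_image) records

    def write(self, txn_id, key, value):
        # Record the before-image first, then apply the update in place.
        self.undo_log.append((txn_id, key, self.data.get(key)))
        self.data[key] = value

    def rollback(self, txn_id):
        # Erase this transaction's updates in reverse order (atomicity).
        for tid, key, before in reversed(self.undo_log):
            if tid == txn_id:
                if before is None:
                    self.data.pop(key, None)
                else:
                    self.data[key] = before
        self.undo_log = [r for r in self.undo_log if r[0] != txn_id]

db = SimpleDB()
db.write("T1", "x", 42)
db.rollback("T1")        # T1 fails before commit: its update is erased
assert "x" not in db.data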


Tuesday, September 16, 2014

What is 2 phase locking with respect to database concurrency control?

2PL, or two-phase locking, is the most widely used concurrency control method and a standard way of implementing serializability theory. The protocol makes use of two types of locks that transactions apply to data, which blocks other transactions from accessing the same data item during the execution of the locking transaction.
The 2PL protocol works in the following two phases (a minimal sketch follows the list):
- The expanding phase: in this phase locks are only acquired, never released.
- The shrinking phase: in this phase locks are only released, never acquired.
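
Here is a minimal Python sketch of the expand-then-shrink discipline; the class and method names are hypothetical, chosen only for illustration:

import threading

# Minimal 2PL sketch: a transaction may acquire locks only while it is
# expanding; once it releases its first lock it enters the shrinking
# phase and any further acquire is rejected.
class TwoPhaseTxn:
    def __init__(self):
        self.held = {}          # data item -> threading.Lock
        self.shrinking = False  # False = expanding phase

    def acquire(self, item, lock):
        if self.shrinking:
            raise RuntimeError("2PL violation: acquire after first release")
        lock.acquire()
        self.held[item] = lock

    def release(self, item):
        self.shrinking = True   # the first release ends the expanding phase
        self.held.pop(item).release()

    def release_all(self):      # SS2PL style: release everything at the end
        for item in list(self.held):
            self.release(item)

lock_x, lock_y = threading.Lock(), threading.Lock()
t = TwoPhaseTxn()
t.acquire("x", lock_x)
t.acquire("y", lock_y)
t.release_all()  # all locks released together at commit

Releasing everything only at the end, as release_all does here, is exactly what turns plain 2PL into the SS2PL variant discussed below.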

Shared locks (S) and exclusive locks (X) are the two types of lock used by this protocol, though many refinements of it have been produced that use additional lock types. Since 2PL blocks processes by means of locks, the blocked transactions can end up in deadlocks. The protocol has a number of variants. The most common is strict 2PL (S2PL), a combination of 2PL and strictness. SS2PL (strong strict two-phase locking) further combines 2PL with releasing all locks only at transaction end; the resulting schedule property is also known as rigorousness, and SS2PL is the variant most often used for concurrency control in database systems. Any schedule that obeys the protocol is serializable.
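
For instance, the standard S/X compatibility rule, where shared locks are mutually compatible while an exclusive lock conflicts with everything, can be sketched as follows (illustrative Python, not any particular DBMS's API):

# A requested lock mode is granted only if it is compatible with every
# mode already held on the data item by other transactions.
COMPAT = {("S", "S"): True, ("S", "X"): False,
          ("X", "S"): False, ("X", "X"): False}

def can_grant(requested, held_modes):
    return all(COMPAT[(held, requested)] for held in held_modes)

print(can_grant("S", ["S"]))  # True: many readers may share an item
print(can_grant("X", ["S"]))  # False: a writer must wait for readers
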
In a typical transaction, when phase 1 ends and no explicit information is available, the transaction is in a ready state, i.e., it can commit without requiring any more locks. In such cases phase 2 can end immediately, or may not even be required. In other cases, where more than one process is involved, determining the end of phase 1 and the beginning of lock release requires a synchronization point; without one, the serializability and strict 2PL rules are violated. But determining such a point is very costly, so in practice the end of the transaction is merged with the end of phase 1, eliminating the need for phase 2.
Thus 2PL is turned into SS2PL. In S2PL, transactions release their write (X) locks only after they have completed, either by aborting or committing; the read (S) locks, on the other hand, may be released earlier, during phase 2. Implementing general S2PL therefore requires explicit support for the end of phase 1.
Strong strict 2PL is also known as rigorous two-phase locking, rigorousness, or rigorous scheduling. Under this protocol both read and write locks are released only after the transaction completes. A transaction complying with SS2PL thus has only phase 1 during its entire lifetime, and no phase 2. The class of schedules exhibiting the SS2PL property is called rigorousness, and the S2PL class is a superset of the SS2PL class. SS2PL has been the concurrency control mechanism of choice for most database designers, its main advantage being that it provides strictness in addition to serializability.
These two properties are necessary for efficient recovery of the database as well as for commitment ordering, which global serializability and distributed serializability solutions use in distributed environments. The downside of the 2PL protocol is deadlocks: data access operations blocked by the locks can form a deadlock, in which none of the blocked transactions can reach completion. Resolving deadlocks effectively is therefore a major issue. A deadlock can be resolved by aborting one of the blocked transactions, thereby eliminating the cycle in the precedence graph. Wait-for graphs are used to detect deadlocks.
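
As an illustration, here is a small Python sketch of deadlock detection as a cycle search in the wait-for graph; the code is illustrative and not taken from any particular system:

# An edge T1 -> T2 means transaction T1 is blocked waiting for a lock
# held by T2. A cycle means deadlock; it is broken by aborting one of
# the transactions on the cycle.
def find_cycle(wait_for):
    visited, on_stack = set(), []

    def dfs(node):
        if node in on_stack:
            return on_stack[on_stack.index(node):]  # the cycle itself
        if node in visited:
            return None
        visited.add(node)
        on_stack.append(node)
        for nxt in wait_for.get(node, ()):
            cycle = dfs(nxt)
            if cycle:
                return cycle
        on_stack.pop()
        return None

    for txn in list(wait_for):
        cycle = dfs(txn)
        if cycle:
            return cycle
    return None

# T1 waits for T2, T2 waits for T3, T3 waits for T1: a deadlock.
print(find_cycle({"T1": ["T2"], "T2": ["T3"], "T3": ["T1"]}))  # ['T1', 'T2', 'T3']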


Friday, September 12, 2014

What are some major goals of database concurrency control?

Serial or sequential execution of transactions involves no overlapping of time periods, and therefore no concurrency is achieved in database systems that follow such a simple execution mechanism. But if concurrency is introduced by allowing the operations of transactions to interleave without proper control, we risk getting undesirable results. Below are some examples:
- The dirty read problem: A transaction, say A, reads a data item that has been written by another transaction, say B, which later aborts. The value is the result of an aborted operation, is therefore invalid, and should never have been read by other transactions. This is called a dirty read, and as a result of it transaction A will produce incorrect results.
- The lost update problem: A transaction B writes a data item that has already been written by another concurrent, still-in-progress transaction A. A's value is lost because B overwrites it, even though, by the rules of precedence, other transactions should have read the first value before the second. As a consequence the transactions yield wrong results (see the sketch after this list).
- The incorrect summary problem: One transaction A computes a summary over all the values of multiple instances of a single data item while another transaction B changes the values of some of those instances. The end result then does not reflect a correct summary: depending on the instants at which the updates were made, some of B's changes are included in the summary while others are not.
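
The lost update anomaly is easy to reproduce when interleaving is left uncontrolled, as in this small, illustrative Python sketch:

import threading, time

balance = {"x": 100}

def add(amount):
    read = balance["x"]           # read phase
    time.sleep(0.01)              # window in which the other txn also reads
    balance["x"] = read + amount  # write based on the now-stale read

threads = [threading.Thread(target=add, args=(a,)) for a in (10, 20)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Serial execution would give 130; here one write overwrites the other,
# so the result is 110 or 120 -- a lost update.
print(balance["x"])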

Database systems that require high transactional performance need their concurrent transactions to execute in a way that meets certain goals; indeed, for modern businesses, a database that does not meet these goals cannot even be considered. What are these goals? Let us see below:
- Correctness: To attain this goal, the system must allow the execution of only serializable schedules. Without serializability we may face the problems listed above. Serializability is the equivalence of a schedule to some serial schedule over the same transactions, i.e., one in which the transactions run sequentially, free of time overlaps, and isolated from one another. The highest isolation level can be obtained only through serializability. In some cases serializability is relaxed to give the system better performance, and it may also be relaxed in distributed environments to improve availability, but only on the condition that correctness is not compromised. A practical example is transactions involving money: if we relax serializability there, money can be transferred to the wrong account. Concurrency control mechanisms achieve serializability by means of conflict serializability, a special case of serializability (a small conflict-serializability check is sketched after this list).
- Recoverability: This goal insists that after a crash the database must be able to recover efficiently without losing the effects of successfully committed transactions. Recoverability ensures that the database system can tolerate faults and does not become corrupted by media failures. Responsibility for protecting the data the system stores in its database is given to the recovery manager, which works together with the concurrency control component to provide reliable, always-available access to data. Together these two components keep the database in a consistent state.
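
Conflict serializability can be tested mechanically: build the precedence graph over conflicting operations and check it for cycles. A minimal, illustrative Python sketch:

# Nodes are transactions; there is an edge Ti -> Tj when an operation of
# Ti conflicts with a later operation of Tj on the same item (at least
# one of the two operations is a write). An acyclic graph means the
# schedule is conflict serializable.
def precedence_edges(schedule):
    # schedule: list of (txn, op, item) with op in {'r', 'w'}
    edges = set()
    for i, (ti, op1, x1) in enumerate(schedule):
        for tj, op2, x2 in schedule[i + 1:]:
            if x1 == x2 and ti != tj and "w" in (op1, op2):
                edges.add((ti, tj))
    return edges

def has_cycle(edges):
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
    visited, stack = set(), set()

    def dfs(n):
        if n in stack:
            return True
        if n in visited:
            return False
        visited.add(n)
        stack.add(n)
        if any(dfs(m) for m in graph.get(n, ())):
            return True
        stack.discard(n)
        return False

    return any(dfs(n) for n in list(graph))

# r1(x) w2(x) w1(x): T1 -> T2 and T2 -> T1 form a cycle, so this
# schedule is not conflict serializable.
s = [("T1", "r", "x"), ("T2", "w", "x"), ("T1", "w", "x")]
print(has_cycle(precedence_edges(s)))  # True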

Database systems must ensure that the integrity of transactions is not affected by any kind of fault. In the modern world, with transactions occurring at high frequency and large monetary values at stake, any violation of the above goals can be very expensive.

