
Tuesday, September 23, 2014

What is concurrency control with respect to databases?

Concurrency control is a technique applied in operating systems, databases, computer programming, and multiprocessing to ensure that concurrent operations produce correct results as quickly as possible. Both the hardware and the software parts of a computer are made up of smaller modules and components, each designed and programmed to work correctly according to certain consistency rules. These components interact concurrently through messages and shared data, which can lead to results that violate those rules.
The basic idea of concurrency control is to provide methodologies, theories, and rules for enforcing consistency across the whole system. Implementing concurrency control reduces performance, because the constraints it places on components slow them down. The goal, therefore, is to achieve consistency as efficiently as possible, without letting performance drop below acceptable levels.
The drawbacks of concurrency control include additional complexity and overhead: concurrent algorithms carry more overhead than their sequential counterparts. Moreover, if the concurrency control mechanism fails, it can lead to torn read and write operations and corrupt the data.
In this article we talk about concurrency with respect to databases. Concurrency control is implemented in DBMSs, distributed applications (such as cloud computing and grid computing), and other transactional objects to ensure that concurrent transactions complete without damaging data integrity.
It is an essential part of any system where two or more transactions can overlap in time and operate on the same data, which happens in almost any general-purpose database management system.
Research on this concept has been ongoing since the advent of database systems. Serializability theory is the best-established theory for defining concurrency control; it also lets us design and analyze concurrency control methods and mechanisms as effectively as possible. Another theory emphasizes concurrency control over atomic transactions rather than over abstract data types; although it has a wider scope and is more refined, it adds more complexity to the system. Both theories have their advantages and disadvantages, and merging them might help, since they are complementary to some extent.
To ensure proper concurrency control and correct execution, the system generates and executes only serializable transactions and schedules. In some cases serializability is relaxed intentionally to increase performance, but only where doing so cannot produce incorrect output.
Transactions can fail in many ways, so the system needs a recoverability property to repair the damage. A good database system also ensures that the results of committed transactions are not lost if the system is switched off accidentally or crashes, and that the incomplete results of aborted transactions are erased and their actions rolled back. The ACID rules below characterize transactions (a small sketch follows the list):
- Atomicity: Each transaction executes completely or not at all; partial effects of a failed transaction must not survive.
- Consistency: Every transaction must take the database from one consistent state to another; this characteristic depends largely on user-defined integrity rules.
- Isolation: Every transaction should execute as if in isolation, i.e., without interfering with others.
- Durability: The results of committed transactions should persist.
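To make atomicity and durability concrete, here is a minimal sketch using Python's standard sqlite3 module; the table, account names, and amounts are illustrative, not part of any particular system:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()

try:
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute("UPDATE accounts SET balance = balance - 30 WHERE name = 'alice'")
        raise RuntimeError("simulated crash between the debit and the credit")
        # the matching credit to 'bob' is never reached
except RuntimeError:
    pass

# Atomicity: the partial debit was rolled back, so balances are unchanged.
print(dict(conn.execute("SELECT name, balance FROM accounts")))
# {'alice': 100, 'bob': 50}
```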
Nowadays, as database systems become more distributed, the focus is increasingly on distributing the concurrency control mechanism itself.


Saturday, September 20, 2014

What are some concurrency control mechanisms (with respect to databases)?

In this article we discuss the different types of concurrency control mechanisms implemented in databases.
- Optimistic: This type of concurrency control delays transaction checking: it does not check integrity and isolation rules such as recoverability and serializability until the transaction has completed all of its operations, and it never blocks any of the transaction's operations. If, at commit time, the transaction's operations violate these rules, the transaction is aborted and immediately re-executed, which generates a restart and re-execution overhead. This mechanism works well when there are not too many aborts.
- Pessimistic: Methods in this category block a transaction from carrying out any operation that might violate the rules. The transaction is not allowed to proceed until the possibility of a rule violation disappears. However, such prolonged blocking can reduce performance drastically.
- Semi-optimistic: These mechanisms block transactions only in situations where it is important to do so, and otherwise check the rules without blocking, as optimistic mechanisms do.

The performance of these categories differs: they have different throughputs (rates of transaction completion), depending on factors such as the level of parallelism and the transaction-type mix. The trade-offs between the categories should be weighed, and the one providing the highest performance in the particular situation should be chosen. Two transactions that mutually lock each other create a deadlock: the involved transactions wait forever and cannot complete. Non-optimistic mechanisms are observed to produce more deadlocks, and transactions must be aborted to resolve them. All this blocking, deadlocking, and resolution introduces delays, and these are the major trade-off factors between the categories. Below we mention some major concurrency control methods, with many variants, falling under the categories above:
- Locking: Its variants include two-phase locking (2PL). This mechanism controls access to data through locks acquired by transactions. If a transaction tries to acquire a lock on a piece of data already locked by another transaction, it is blocked until the latter releases its lock, depending on the type of access operation and the type of lock.
- Serialization graph checking (precedence graph checking): This mechanism checks for cycles in the schedule's graph and, if any are found, breaks them by aborting the involved transactions (see the cycle-detection sketch after this list).
- Timestamp ordering: Timestamps are assigned to transactions, and access to data is controlled by constantly checking and comparing those timestamps.
- Commitment ordering: Transactions are checked in the order of their commitment so as to keep the commit order compatible with their precedence order.
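As a rough illustration of graph checking, here is a minimal Python sketch of cycle detection in a wait-for graph; the graph representation and function name are hypothetical, and a real scheduler would additionally pick a victim transaction to abort:

```python
def find_cycle(wait_for):
    """Depth-first search for a cycle in a wait-for graph.

    wait_for maps each transaction id to the set of transaction ids
    it is waiting on; a cycle means deadlock."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {t: WHITE for t in wait_for}

    def dfs(t, path):
        color[t] = GRAY
        path.append(t)
        for u in wait_for.get(t, ()):
            if color.get(u, WHITE) == GRAY:       # back edge: cycle found
                return path[path.index(u):]
            if color.get(u, WHITE) == WHITE:
                cycle = dfs(u, path)
                if cycle:
                    return cycle
        color[t] = BLACK
        path.pop()
        return None

    for t in list(wait_for):
        if color[t] == WHITE:
            cycle = dfs(t, [])
            if cycle:
                return cycle
    return None

# T1 waits for T2 and T2 waits for T1: a deadlock the scheduler
# would break by aborting one of the two.
print(find_cycle({"T1": {"T2"}, "T2": {"T1"}}))  # ['T1', 'T2']
```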

Some other concurrency control methods are used alongside the types mentioned above:
- Index concurrency control: Access operations are synchronized on indexes rather than on user data. Specialized methods can provide substantial performance gains.
- Multi-version concurrency control (MVCC): Each write generates a new copy of the object, so other transactions can still read the older versions. This increases concurrency without sacrificing performance (sketched below).
- Private workspace model: Each transaction maintains a private workspace for accessing data. Changes become visible to outside transactions only after the transaction commits.
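A minimal sketch of the MVCC idea, under the simplifying assumption of a single-threaded toy store where every write commits immediately; the class and method names are illustrative:

```python
import itertools

class MVCCStore:
    """Toy multi-version store: every write appends a new version,
    and a reader sees the newest version no later than its snapshot."""

    def __init__(self):
        self._clock = itertools.count(1)   # shared logical clock
        self._versions = {}                # key -> list of (commit_ts, value)

    def begin(self):
        return next(self._clock)           # snapshot timestamp for a reader

    def write(self, key, value):
        self._versions.setdefault(key, []).append((next(self._clock), value))

    def read(self, key, snapshot_ts):
        visible = [v for ts, v in self._versions.get(key, []) if ts <= snapshot_ts]
        return visible[-1] if visible else None  # versions are in timestamp order

store = MVCCStore()
store.write("x", "v1")        # version stamped 1
snap = store.begin()          # reader's snapshot timestamp is 2
store.write("x", "v2")        # a later writer appends version 3
print(store.read("x", snap))  # 'v1': the reader is unaffected by the new write
```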


Friday, September 19, 2014

What are some best practices for concurrency control with respect to databases?

Today almost all service-oriented businesses have grown highly dependent on reliable and speedy access to their data. Most global enterprises need 24x7, uninterrupted access to their databases. These reliability, availability, and performance needs are met by database management systems (DBMS). A DBMS is thus responsible for two things: protecting the data it stores, and providing correct, reliable, always-available access to that data. The concurrency control and recovery mechanisms of the DBMS carry out these functions.
The concurrency control mechanism ensures that you see the execution of only your own transaction, even when hundreds of users are accessing the database at the same time. The recovery mechanism ensures that the database can recover from faults. It is because of these functionalities that programmers feel free to add new parts to the system without much to worry about. A transaction is simply a unit of work consisting of several operations and updates, and every transaction is expected to obey the ACID rules. In this article we discuss some best practices for concurrency control.
- Two-phase locking: Locking is perhaps the most widely used technique for controlling concurrency. This mechanism provides two types of locks: the shared lock (S) and the exclusive lock (X). A compatibility matrix defines how these locks interact: S locks on the same data item can be held by two different transactions at the same time, but X locks cannot. With this policy, multiple read operations can proceed concurrently; read access to an item is protected by S locks, while write access is protected by X locks. In other words, no transaction can obtain a lock on a data item already locked by another transaction if the two locks conflict. A transaction requesting a lock that cannot be granted at that instant is blocked until the other transaction releases its lock (a compatibility check is sketched after this list).
- Hierarchical locking: In practice, the notion of conflicting locks works at different levels of granularity. Choosing the right granularity for locking an item trades locking overhead against concurrency. Locking at the granularity of a tuple (one row) keeps concurrency at its maximum, but a transaction that accesses many tuples must lock each of them, issuing as many calls to the lock manager and generating substantial overhead. This can be avoided with a coarser granularity, at the expense of more false conflicts.
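A minimal sketch of the S/X compatibility matrix described above, assuming a simple lock manager that consults it before granting a lock; the names are illustrative:

```python
# Compatibility matrix for shared (S) and exclusive (X) locks:
# two S locks on the same item are compatible; any pair involving
# an X lock conflicts.
COMPATIBLE = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}

def can_grant(requested, held_modes):
    """Grant `requested` only if it is compatible with every lock
    already held by other transactions on the same item."""
    return all(COMPATIBLE[(held, requested)] for held in held_modes)

print(can_grant("S", ["S", "S"]))  # True: concurrent readers are fine
print(can_grant("X", ["S"]))       # False: a writer blocks behind a reader
```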

Two-phase locking is categorized as a pessimistic technique: it assumes there will be interference among transactions and takes measures against it. Optimistic concurrency control provides an alternative. With it, transactions carry out their operations without acquiring locks; to ensure that serializability is not violated, a validation phase is performed before a transaction can commit. A number of optimistic protocols have been proposed. The validation process makes sure that the read and write operations of two concurrently running transactions do not conflict; if such a conflict is detected during validation, the transaction is immediately aborted and forced to restart. Thus, to ensure isolation, optimistic mechanisms rely on restarting transactions, whereas locking policies rely on blocking them.
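A hedged sketch of optimistic validation using per-item version counters, one of several possible validation schemes (real OCC protocols differ in detail); all names here are illustrative:

```python
class OptimisticTxn:
    """Toy optimistic transaction: reads record item versions, the
    validation phase re-checks them, and a conflict forces a restart."""

    def __init__(self, store, versions):
        self.store, self.versions = store, versions
        self.read_set = {}    # key -> version seen at read time
        self.write_set = {}   # key -> new value (buffered, not yet visible)

    def read(self, key):
        self.read_set[key] = self.versions.get(key, 0)
        return self.write_set.get(key, self.store.get(key))

    def write(self, key, value):
        self.write_set[key] = value

    def commit(self):
        # Validation phase: abort if anything we read has since changed.
        for key, seen in self.read_set.items():
            if self.versions.get(key, 0) != seen:
                return False  # caller aborts and restarts the transaction
        # Write phase: install buffered updates and bump versions.
        for key, value in self.write_set.items():
            self.store[key] = value
            self.versions[key] = self.versions.get(key, 0) + 1
        return True

store, versions = {"x": 10}, {}
t = OptimisticTxn(store, versions)
t.write("x", t.read("x") + 1)
versions["x"] = versions.get("x", 0) + 1  # a concurrent writer sneaks in
print(t.commit())  # False: validation detects the conflict, restart needed
```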


Wednesday, September 17, 2014

What are some challenges with respect to database concurrency control?

During sequential execution of transactions, the execution periods do not overlap, so transaction concurrency does not exist. But if we allow interleaving of transactions in a manner that is not properly controlled, we are bound to get undesirable results. There are many situations in which the concurrency of a database might be harmed.
In this article we discuss such challenges. Transaction models based on the ACID rules have proved quite durable over time; they serve as a foundation for present database and transaction systems. Many environments, parallel as well as distributed, have used this basic model to implement their own complex database systems, though such environments require additional techniques such as the two-phase commit protocol. Despite its great success, the model suffers from a few limitations.
The model lacks flexibility and thus cannot capture some particular kinds of interactions between organizations and complex systems. In collaborative environments, a piece of data cannot be strictly isolated even when that is desirable. And since the ACID transaction model suits systems with short, simple transactions, it is less appropriate for workflow management systems.
Such systems require richer transaction models with a multi-level notion of transactions. The model also falls short in other environments, for example mobile wireless networks, where long disconnection periods are expected. Then there is the internet, a loosely coupled WAN (wide area network), which also cannot fully adopt the ACID model because availability is low. We need techniques that help the ACID model adjust to such widely varying situations.
Research is ongoing into techniques for maintaining concurrency control and recovery in dissemination-oriented environments, heterogeneous systems, and so on. Another problem is that the model cannot exploit data and application semantics through a general mechanism, knowledge that could improve system performance by a significant margin. Separate research continues on recovery and concurrency control. The DBMS's recovery component is responsible for the durability and atomicity of ACID transactions, and distinguishing between volatile and non-volatile storage becomes absolutely necessary. The three types of failure mentioned below pose a challenge to the proper working of a DBMS:
- Transaction failure: Sometimes a transaction reaches a state during execution from which it cannot commit successfully. In such cases all the updates made by that transaction must be erased from the database to preserve atomicity. This is called transaction rollback (sketched at the end of this post).
- System failure: A system failure typically causes the loss of volatile memory contents. It must be ensured that updates made by all transactions committed before the crash persist in the database, and that updates made by unsuccessful transactions are removed.
- Media failure: Here the non-volatile storage gets corrupted, making it impossible to recover the online version of the data. The option is to restore the database from an archive and re-apply all updates using operation logs.
Recovery from the last kind of failure requires additional mechanisms. For any of this to work, the recovery component itself has to be reliable; flaws in a database's recovery system put it at high risk of losing data.
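For the transaction-failure case above, here is a minimal sketch of rollback via an undo log, assuming a dictionary stands in for the database and that stored values are never None; the names are illustrative:

```python
class UndoLogTxn:
    """Transaction rollback via an undo log: record each item's old
    value before updating it; on failure, restore in reverse order."""

    def __init__(self, db):
        self.db = db
        self.undo = []  # (key, old_value) entries, oldest first

    def write(self, key, value):
        self.undo.append((key, self.db.get(key)))  # None marks a new key
        self.db[key] = value

    def rollback(self):
        for key, old in reversed(self.undo):
            if old is None:
                self.db.pop(key, None)   # the key did not exist before
            else:
                self.db[key] = old
        self.undo.clear()

db = {"x": 1}
txn = UndoLogTxn(db)
txn.write("x", 2)
txn.write("y", 9)
txn.rollback()   # the failed transaction's updates are erased
print(db)        # {'x': 1}
```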


Tuesday, September 16, 2014

What is 2 phase locking with respect to database concurrency control?

2PL, or two-phase locking, is the most widely used concurrency control method; it is used to implement serializability theory. The protocol uses two types of locks that transactions apply to data, blocking other transactions from accessing the same data item during the transaction's execution.
The 2PL protocol works in the following two phases (see the sketch after the list):
- The expanding phase: Locks are only acquired, and none are released.
- The shrinking phase: Locks are only released, and no more are acquired.
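A minimal sketch of the two-phase discipline itself, ignoring lock modes and blocking; it only enforces that no lock is acquired once any lock has been released (class and method names are illustrative):

```python
class TwoPhaseTxn:
    """Enforces the 2PL rule: all acquires (expanding phase) must
    precede all releases (shrinking phase)."""

    def __init__(self):
        self.held = set()
        self.shrinking = False

    def acquire(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: acquire after first release")
        self.held.add(item)

    def release(self, item):
        self.shrinking = True  # the first release ends the expanding phase
        self.held.discard(item)

txn = TwoPhaseTxn()
txn.acquire("A")
txn.acquire("B")
txn.release("A")
try:
    txn.acquire("C")  # illegal: the shrinking phase has already begun
except RuntimeError as e:
    print(e)
```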

Shared locks (S) and exclusive locks (X) are the two types of lock used by this protocol, although many refinements have been produced that use more than one type of lock. Since 2PL blocks processes using locks, the blocked transactions can end up in deadlocks. SS2PL (strong strict two-phase locking), also known as rigorousness, is the variant mostly used for maintaining concurrency control in database systems. Another common variant is strict 2PL (S2PL), a combination of 2PL and strictness. A schedule that obeys the protocol is serializable.
In a typical transaction, when phase 1 ends and no explicit information is available, the transaction is in a ready state, i.e., it can commit without requiring more locks. In such cases phase 2 can end immediately, or may not even be required. In other cases, where more than one process is involved, the end of phase 1 and the start of lock release are determined with the help of a synchronization point; failing to do so violates serializability and the strict 2PL rules. But determining such a point is very costly, so the transaction end is usually merged with the end of phase 1, eliminating the need for phase 2.
Thus 2PL is turned into SS2PL. In S2PL, transactions must release their write (X) locks only after they have completed, either by aborting or committing, while read (S) locks are released regularly during phase 2. Implementing general S2PL requires explicit support for the end of phase 1.
Strong strict 2PL is also known as rigorous two-phase locking, rigorousness, or rigorous scheduling. The protocol releases both read and write locks only after the transaction has completed. A transaction complying with SS2PL has only a phase 1 during its entire lifetime, and no phase 2. The class of schedules exhibiting this property is called rigorousness, and S2PL is a superset of the SS2PL class. SS2PL has been the concurrency control mechanism of choice for most database designers; its main advantage is that it provides strictness in addition to serializability.
These two properties are necessary for efficient database recovery as well as for commitment ordering; global serializability and distributed serializability solutions build on them in distributed environments. The downside of the 2PL protocol is deadlocks: data access operations blocked by locks can end in a deadlock, in which none of the blocked transactions can reach completion. Resolving deadlocks effectively is therefore a major issue. A deadlock is resolved by aborting one of the locked transactions, eliminating the cycle in the precedence graph; wait-for graphs are used to detect deadlocks.


Sunday, September 14, 2014

What is Timestamp ordering method for database concurrency control?

Concurrency control methods in database systems can be divided into two categories:
- Locking concurrency control methods: for example, 2PL and SS2PL.
- Non-lock concurrency control methods: for example, the timestamp ordering method.
In this article we focus on the latter. This method handles transactions safely with the help of timestamps. Let us see how it operates.

The timestamp ordering mechanism makes three assumptions prior to operating:
- The value of every timestamp is unique, and the time instant it represents is accurate.
- There are no duplicate timestamps.
- A lower-valued timestamp occurs before a higher-valued timestamp.

There are three methods for generating timestamps (the counter-based one is sketched below):
- Take the value of the system clock as the timestamp when the transaction starts.
- Use a thread-safe shared counter as the timestamp, incremented when the transaction starts.
- Combine the above two methods.
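A minimal sketch of the second method, a thread-safe shared counter; the class name is illustrative:

```python
import threading

class TimestampGenerator:
    """Thread-safe shared counter: each transaction gets a unique,
    strictly increasing timestamp when it starts."""

    def __init__(self):
        self._lock = threading.Lock()
        self._last = 0

    def next(self):
        with self._lock:       # serializes concurrent callers
            self._last += 1
            return self._last

gen = TimestampGenerator()
t1, t2 = gen.next(), gen.next()
print(t1 < t2)  # True: the earlier transaction has the lower timestamp
```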

Formal: A transaction is considered an ordered list of operations. Before its first operation begins, the transaction is marked with the current timestamp. Every transaction is also assigned an initially empty set of transactions it depends on and an initially empty set of objects it updates. Each object is given two timestamp fields, used only for concurrency control.

Informal: A timestamp is assigned to each transaction at the very beginning, which makes it possible to tell in which order the transactions have to be applied. So when two transactions operate on the same object, the one with the lower timestamp is executed first. However, if such a transaction turns out to be incorrect, it must be aborted immediately and restarted. Each object has one read timestamp and one write timestamp, updated when the corresponding read and write operations are carried out.

Two cases are considered when an object has to be read by a transaction:
- The transaction starts prior to the write timestamp: The object was changed by something after the transaction started, so the transaction is aborted and restarted.
- The transaction starts after the write timestamp: The transaction can safely read the object, and the read timestamp is updated to the transaction's timestamp.

The following cases are considered when an object has to be written or updated by a transaction (both rule sets are implemented in the sketch after this list):
- The transaction starts prior to the read timestamp: The object has already been read by something. Assuming the reading transaction holds a copy of the data, we must not write to it, to keep that copy consistent; the transaction is aborted and restarted.
- The transaction starts prior to the write timestamp: The object was changed by something after our transaction started. Here we apply the Thomas write rule: skip the current write operation and continue as normal, with no abort or restart required.
- Otherwise: The transaction changes the object, and the object's write timestamp is set to the transaction's timestamp.
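A minimal sketch implementing both rule sets above for a single object, including the Thomas write rule; the exception and class names are illustrative:

```python
class Aborted(Exception):
    pass

class TOObject:
    """Per-object timestamp ordering rules, including the Thomas
    write rule for obsolete writes."""

    def __init__(self, value):
        self.value = value
        self.read_ts = 0   # timestamp of the youngest reader so far
        self.write_ts = 0  # timestamp of the youngest writer so far

    def read(self, txn_ts):
        if txn_ts < self.write_ts:
            raise Aborted("object written after this transaction began")
        self.read_ts = max(self.read_ts, txn_ts)
        return self.value

    def write(self, txn_ts, value):
        if txn_ts < self.read_ts:
            raise Aborted("a younger transaction already read this object")
        if txn_ts < self.write_ts:
            return  # Thomas write rule: obsolete write, skip silently
        self.value = value
        self.write_ts = txn_ts

obj = TOObject("initial")
obj.write(2, "from txn 2")   # write_ts becomes 2
obj.write(1, "stale write")  # skipped by the Thomas write rule
print(obj.read(3))           # 'from txn 2'; read_ts becomes 3
try:
    obj.read(1)              # txn 1 started before write_ts 2: abort
except Aborted as e:
    print(e)
```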

Recoverability: This concurrency control method does not by itself produce recoverable histories. To make recovery possible, a scheduler must keep, for each transaction, a list of the transactions it has read from; a transaction should not be allowed to commit until that list contains only committed transactions. Data produced by uncommitted transactions can be tagged as dirty, and read operations banned from using such data, as a measure against cascading aborts. To maintain a strict history, the scheduler should not permit transactions to carry out any operations on dirty data.


Friday, September 12, 2014

What are some major goals of database concurrency control?

Serial or sequential execution of transactions accessing the database involves no overlapping time periods, so no concurrency can be achieved in database systems following such a simple execution mechanism. On the other hand, once concurrency does exist, with interleaving of the transactions' operations allowed, improper control over that concurrency risks undesirable results. Below we give some examples:
- The dirty read problem: A transaction A reads a data item that has been written by another transaction B that later aborts. The value is the product of an aborted operation, is therefore invalid, and should not be read by other transactions; this is called a dirty read. As a result, transaction A will produce incorrect results.
- The lost update problem: A transaction B writes a data item that has already been written by another concurrent, still-in-progress transaction A. B's write overwrites A's, so the value written by A is lost. According to the rules of precedence, transactions should see the first value before the second; as a consequence, the transactions yield wrong results (illustrated after this list).
- The incorrect summary problem: One transaction A computes a summary over all the values of the multiple instances of a single data item, while another transaction B changes the values of some of those instances. The end result then does not reflect a correct summary, and depending on the instants at which the updates were made, certain results might not be included in the summary at all.
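A deterministic illustration of the lost update problem, simulating the interleaving of two read-modify-write transactions with no concurrency control; the variable names and amounts are illustrative:

```python
balance = 100

a_read = balance        # transaction A reads 100
b_read = balance        # transaction B reads 100 before A writes
balance = a_read + 20   # A writes 120
balance = b_read - 50   # B writes 50, silently overwriting A's update

print(balance)  # 50: A's +20 is lost; either serial order would give 70
```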

Database systems that require high transactional performance need their concurrent transactions to execute properly so as to meet certain goals; in fact, for a modern business, a database that does not meet them cannot even be considered. What are these goals? Let us see below:
- Correctness: To attain this goal, the system must allow the execution of only serializable schedules; without serializability we may face the problems listed above. Serializability is the equivalence of a schedule to some serial schedule of the same transactions, that is, a schedule in which the transactions are sequential, with no time overlaps, and isolated. The highest isolation level is obtained only through serializability. In some cases serializability is cut down to allow the system to give better performance, or relaxed in distributed environments to satisfy availability requirements, but only on the condition that correctness is not compromised. A practical example is transactions involving money: relax serializability there and money can be transferred to the wrong account. Concurrency control mechanisms achieve serializability by means of conflict serializability, a special case of serializability.
- Recoverability: This goal insists that after a crash the database must recover efficiently without losing the effects of successfully committed transactions. Recoverability ensures that the database system tolerates faults and is not corrupted by media failures. Responsibility for protecting the data the system stores lies with the recovery manager, which works together with the concurrency control component to provide always-available, reliable access to data. Together these two components keep the database in a consistent state.

Database systems must ensure that the integrity of a transaction is not affected by any kind of fault. In the modern world, with the high frequency of transactions and large monetary values at stake, any violation of the above goals can be very expensive.


Monday, August 25, 2014

What is index concurrency control?


In this article we discuss the index concurrency control method. An index, as we know, is a data structure used for easy navigation through the user data in a database. Index data should not be confused with user data: the former consists primarily of pointers.
Indexes have to be updated whenever changes (deletes, inserts, or modifications) are made to the database files, so that user data can be accessed accurately. Index integrity is maintained through a technique called index locking. While a database transaction is taking place, a lock is placed on the portion of the index the transaction traverses to reach its user data. In addition, special database system transactions are invoked to modify and maintain an index; the system does this as part of its self-maintenance routine. When a transaction locks a part of the index, other transactions are blocked from that portion and cannot read or modify it; if the lock is shared, only read operations are allowed.
Indexes can be accessed using specialized concurrency control techniques, which operate based on the structure and type of the index and are more effective than the same techniques applied to user data. For B-trees there are specialized techniques that are effective for B-tree concurrency control. Index locks coordinate the threads that want to access the same indexes; their duration is shorter than that of the usual transaction locks, and they are sometimes known as latches. Real-time database systems are the most dependent on indexes for speeding up access to data.
Index concurrency control also helps these systems complete as many transactions as possible before their deadlines. To prevent index contention from becoming a problem, there are special high-performance ICC (index concurrency control) protocols. By means of a detailed simulation model, real-time variants of ICC protocols for B-trees can be created and their performance compared. GUARD-link is one such ICC protocol for real-time systems: it augments the classical B-link protocol with a feedback-based admission control mechanism. ICC protocols are evaluated using performance metrics such as the percentage of missed transactions, or sometimes fairness with respect to transaction type and size.
According to some experiments, the performance characteristics of real-time ICC protocols differ from those of the same protocols in general-purpose database systems. Notably, B-link protocols perform best in conventional database systems, whereas in real-time systems their performance is very poor under heavy load. The GUARD-link protocol provides a new approach even though it was developed on the grounds of the B-link approach, and experiments have found it to give the best performance under all conditions, light or heavy real-time workload, thanks to its admission control mechanism.


Friday, March 11, 2011

What are different types of architectural pattern domains?

Architectural patterns define a specific approach for handling some behavioral characteristics of the system. A software architecture may have a number of architectural patterns that address issues such as concurrency, persistence and distribution.

CONCURRENCY
To promote parallelism, applications must handle multiple tasks. There are different ways for an application to handle concurrency, and each can be represented by a different architectural pattern.
- One approach is to use an operating system process management pattern that provides built-in operating system features allowing components to execute concurrently.
- Another approach is to define a task scheduler at the application level.

PERSISTENCE
Data persists if it survives past the execution of the process that created it. Persistent data can be read or modified by other processes at a later time, because it is stored in a database or file. Persistence can be achieved through two architectural patterns:
- database management system pattern that applies the storage and retrieval capability of a database management system to the application architecture.
- application level persistence pattern that builds persistence features into application architecture.

DISTRIBUTION
The distribution problem addresses the manner in which systems or components communicate with one another in a distributed environment. The most common architectural pattern established to address the distribution problem is the broker pattern, in which a broker acts as a middleman between client and server components (a minimal sketch follows).
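A minimal sketch of the broker pattern, with in-process function calls standing in for real network communication; the class and service names are illustrative:

```python
class Broker:
    """Minimal broker: servers register services by name, and clients
    route requests through the broker instead of calling servers directly."""

    def __init__(self):
        self.services = {}

    def register(self, name, handler):
        self.services[name] = handler   # a server offers a service

    def request(self, name, *args):
        if name not in self.services:
            raise LookupError(f"no server offers {name!r}")
        return self.services[name](*args)  # forward the client's request

broker = Broker()
broker.register("echo", lambda msg: msg.upper())  # server side
print(broker.request("echo", "hello"))            # client side: 'HELLO'
```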


Friday, September 11, 2009

Introduction to Database Concurrency

DATABASE CONCURRENCY: Database concurrency is the situation in which a single database is accessed by multiple programs. Databases are by design shared resources in most cases, but here they are shared across multiple applications.
Database concurrency controls ensure that transactions occur in an ordered fashion. Their main job is to protect transactions issued by different users or applications from each other's effects, while preserving the four characteristics of database transactions: atomicity, consistency, isolation, and durability. Concurrency control is one of the main issues in the study of real-time database systems: in addition to satisfying the consistency requirements of traditional database systems, a real-time transaction processing system must also satisfy timing constraints.

Conflicts between transactions can be detected in two ways.
The pessimistic method detects conflicts before access to the data object is made. When a transaction requests access to some data item, the concurrency control manager examines the request and determines whether to grant it.
Optimistic schemes are designed to get rid of the locking overhead. They are optimistic in the sense that they assume conflicts among transactions are rare events. The task of concurrency control is deferred until the end of the transaction, when checking for potential conflicts takes place and conflicts are resolved accordingly, taking into consideration the amount of progress that has been made and the nature of the conflict with other transactions.
When concurrency control detects a conflict among concurrent transactions accessing the same object, a conflict resolution mechanism must be invoked. The concurrency control manager decides which transaction (the victim) to penalize, the lock holder or the requester, and chooses an appropriate action and suitable timing. Two actions are most used: blocking (wait) and abort (restart). In pessimistic concurrency control either blocking or aborting can be used to resolve the conflict; in optimistic concurrency control only aborting is appropriate, since the conflict is detected only after the transaction has accessed the data object and performed some computation.

OPTIMISTIC CONCURRENCY CONTROL: The basic idea of an optimistic concurrency control mechanism is that the execution of a transaction consists of three phases: read, validation, and write. In all optimistic concurrency control (OCC) schemes, a conflict is detected only after the data object has been accessed: when a transaction completes its execution, it requests the concurrency control manager to validate all its accessed data objects at certification time. If it has not been marked for abort, it enters the write (commit) phase, where it writes all its updates to the database.

