

Tuesday, September 16, 2014

What is 2 phase locking with respect to database concurrency control?

2PL, or two-phase locking, is the most widely used concurrency control method and is used to enforce serializability. The protocol uses two types of locks that transactions apply to data items; a lock blocks other transactions from accessing the same data item for as long as it is held.
The 2PL protocol works in the following 2 phases:
- The expanding (growing) phase: locks are only acquired; none are released.
- The shrinking phase: locks are only released; none are acquired.

Shared (S) locks and exclusive (X) locks are the two types of lock used by the basic protocol, although many refinements exist that use more than one kind of lock. Because 2PL blocks transactions using locks, it can lead to deadlocks among the blocked transactions. The protocol has a number of variants. The most common is strict 2PL (S2PL), which combines 2PL with strictness; SS2PL (strong strict two-phase locking), also known as rigorousness, combines 2PL with strong strictness and is the variant most widely used for concurrency control in database systems. Any schedule that obeys the protocol is serializable.
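To make the two phases concrete, here is a minimal, illustrative sketch (not any particular database engine's implementation) of a per-transaction lock tracker that enforces the 2PL rule: once a lock has been released, no further lock may be acquired. The class and method names are invented for the example.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative only: tracks which phase a transaction is in and
// rejects lock acquisitions once the shrinking phase has begun.
class TwoPhaseLockingTransaction {
    enum LockMode { SHARED, EXCLUSIVE }

    private final Set<String> heldLocks = new HashSet<>();
    private boolean shrinking = false;   // becomes true after the first release

    // Expanding phase: locks may be acquired only while no lock has been released.
    void acquire(String dataItem, LockMode mode) {
        if (shrinking) {
            throw new IllegalStateException(
                "2PL violation: cannot acquire a lock after releasing one");
        }
        // A real lock manager would block here on S/X conflicts;
        // this sketch only records the fact that the lock is held.
        heldLocks.add(dataItem + ":" + mode);
    }

    // Shrinking phase: the first release ends the expanding phase.
    void release(String dataItem, LockMode mode) {
        shrinking = true;
        heldLocks.remove(dataItem + ":" + mode);
    }
}
```

Under SS2PL the release method would never be called explicitly; all locks would simply be dropped when the transaction commits or aborts.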
In a typical transaction, when phase 1 ends and no explicit information says otherwise, the transaction is in a ready state, i.e., it can commit without requiring any more locks. In such cases phase 2 can end immediately, or may not be needed at all. In other cases, where more than one process is involved, the end of phase 1 and the start of lock releasing are determined with the help of a synchronization point; without it, serializability and the strict 2PL rules can be violated. But determining such a point is costly, so in practice the end of the transaction is merged with the end of phase 1, eliminating the need for phase 2.
Thus 2PL is turned into SS2PL. In S2PL, a transaction releases its write (X) locks only after it has ended, by committing or aborting, while its read (S) locks are released regularly during phase 2. Implementing general S2PL requires explicit support for ending phase 1.
Strong strict 2PL is also known as rigorous two-phase locking, rigorousness or rigorous scheduling. Under this protocol, both read and write locks are released only after the transaction has ended. A transaction that complies with SS2PL has only phase 1 during its entire lifetime and no phase 2. The class of schedules exhibiting the SS2PL property is called rigorousness; the S2PL class is a superset of the SS2PL class. SS2PL has been the concurrency control mechanism of choice for most database designers, its main advantage being that it provides strictness in addition to serializability.
These two properties are essential for efficient database recovery as well as for commitment ordering, which is used in global serializability and distributed serializability solutions for distributed environments. The downside of the 2PL protocol is deadlocks: the locks block data access operations, and mutual blocking can form a deadlock in which none of the blocked transactions can reach completion. Resolving deadlocks effectively is therefore a major issue. A deadlock is resolved by aborting one of the involved transactions, thus eliminating the cycle in the precedence graph; wait-for graphs are used to detect deadlocks.
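As an illustration of deadlock detection with a wait-for graph, the sketch below (class and method names invented for the example) records which transaction waits for which and looks for a cycle with a depth-first search; a real lock manager would then abort one transaction on the cycle.

```java
import java.util.*;

// Illustrative wait-for graph: an edge A -> B means transaction A waits for B.
class WaitForGraph {
    private final Map<String, Set<String>> waitsFor = new HashMap<>();

    void addWait(String waiter, String holder) {
        waitsFor.computeIfAbsent(waiter, k -> new HashSet<>()).add(holder);
    }

    // A deadlock exists iff the graph contains a cycle.
    boolean hasDeadlock() {
        Set<String> visited = new HashSet<>();
        Set<String> onStack = new HashSet<>();
        for (String txn : waitsFor.keySet()) {
            if (hasCycle(txn, visited, onStack)) return true;
        }
        return false;
    }

    private boolean hasCycle(String txn, Set<String> visited, Set<String> onStack) {
        if (onStack.contains(txn)) return true;   // back edge: cycle found
        if (!visited.add(txn)) return false;      // already fully explored
        onStack.add(txn);
        for (String next : waitsFor.getOrDefault(txn, Set.of())) {
            if (hasCycle(next, visited, onStack)) return true;
        }
        onStack.remove(txn);
        return false;
    }
}
```

For example, addWait("T1", "T2") followed by addWait("T2", "T1") makes hasDeadlock() return true, and the cycle would be broken by aborting either T1 or T2.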


Sunday, September 14, 2014

What is Timestamp ordering method for database concurrency control?

Concurrency control methods for database systems can be divided into two categories:
- Lock-based concurrency control methods, for example 2PL and SS2PL.
- Non-lock concurrency control methods, for example the timestamp ordering method.
In this article we focus on the latter. Timestamp ordering is widely used for handling transactions safely with the help of timestamps. Let us see how it operates.

The timestamp ordering mechanism makes three assumptions prior to operating:


- The value of every timestamp is unique and the time instant it represents is accurate.
- There are no duplicate timestamps.
- A lower-valued timestamp occurs before a higher-valued timestamp.

There are three common methods for generating timestamps; a minimal sketch of the counter-based approach follows the list:


- Use the value of the system clock when the transaction starts as its timestamp.
- Use a thread-safe shared counter as the timestamp, incremented when the transaction starts.
- Combine the two approaches above.
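Here is a minimal sketch of the counter-based approach, assuming nothing beyond the Java standard library: a thread-safe shared counter hands out a unique, monotonically increasing timestamp when each transaction starts, which satisfies the three assumptions above.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative timestamp source: values are unique and monotonically increasing.
class TimestampGenerator {
    private final AtomicLong counter = new AtomicLong(0);

    // Called once when a transaction starts.
    long nextTimestamp() {
        return counter.incrementAndGet();
    }
}
```

The combined approach would seed the counter from the system clock and still increment it atomically to guarantee uniqueness.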

Formal: A transaction is modelled as an ordered list of operations. Before its first operation begins, the transaction is marked with the current timestamp. Every transaction is also given an initially empty set of transactions it depends on and an initially empty set of objects it updates. Each object is given two timestamp fields that are used only for concurrency control.

Informal: A timestamp is assigned to the transaction at the very beginning, which makes it possible to tell in which order transactions have to be applied. When two transactions operate on the same object, the one with the lower timestamp should be executed first; if that ordering cannot be honoured, the offending transaction is aborted immediately and restarted. Each object has one read timestamp and one write timestamp, which are updated when the corresponding read and write operations are carried out.

Two cases are considered when an object has to be read by a transaction:
- The transaction started before the object's write timestamp: the object was changed by a later transaction, so the reading transaction is aborted and restarted.
- The transaction started after the object's write timestamp: the transaction can safely read the object, and the object's read timestamp is updated to the transaction's timestamp if that is larger.

The following cases are considered when an object has to be written or updated by a transaction (a sketch combining the read and write rules follows this list):
- The transaction started before the object's read timestamp: the object has already been read by a later transaction, which may hold a copy of the data. To avoid invalidating that copy we must not overwrite it, so the writing transaction is aborted and restarted.
- The transaction started before the object's write timestamp: the object was already written by a later transaction. Here the Thomas write rule applies: the current write is simply skipped and the transaction continues as normal; no abort or restart is needed.
- Otherwise, the transaction writes the object and the object's write timestamp is set to the transaction's timestamp.
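The read and write rules above can be summarised in code. The sketch below is illustrative only (the class and field names are invented); it keeps a read timestamp and a write timestamp per object and applies the abort/restart and Thomas-write-rule decisions described above.

```java
// Illustrative basic timestamp ordering rules for a single object.
class TimestampOrderedObject {
    private long readTs = 0;    // timestamp of the youngest reader so far
    private long writeTs = 0;   // timestamp of the youngest writer so far
    private Object value;

    // Returns the value, or throws to signal "abort and restart the transaction".
    synchronized Object read(long txnTs) {
        if (txnTs < writeTs) {
            throw new IllegalStateException("abort: object written by a later transaction");
        }
        readTs = Math.max(readTs, txnTs);   // remember the youngest reader
        return value;
    }

    synchronized void write(long txnTs, Object newValue) {
        if (txnTs < readTs) {
            throw new IllegalStateException("abort: object already read by a later transaction");
        }
        if (txnTs < writeTs) {
            return;   // Thomas write rule: a newer value is already in place, skip the write
        }
        value = newValue;
        writeTs = txnTs;
    }
}
```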

Recoverability: This concurrency control method does not by itself produce recoverable histories. To make recovery possible, the scheduler must keep, for each transaction, a list of the transactions it has read from, and a transaction should not be allowed to commit until every transaction in that list has committed. In addition, data produced by uncommitted transactions can be tagged as dirty, and read operations can be barred from using such data as a measure against cascading aborts. To maintain a strict history, the scheduler should not permit any operations on dirty data.
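As an illustration of this commit rule (a sketch only; the names are invented), a scheduler could keep, per transaction, the set of transactions it has read from and allow it to commit only once all of them have committed:

```java
import java.util.*;

// Illustrative commit gate for recoverability: a transaction may commit
// only after every transaction it has read from has committed.
class RecoverabilityTracker {
    private final Map<String, Set<String>> readFrom = new HashMap<>();
    private final Set<String> committed = new HashSet<>();

    void recordReadFrom(String reader, String writer) {
        readFrom.computeIfAbsent(reader, k -> new HashSet<>()).add(writer);
    }

    boolean canCommit(String txn) {
        return committed.containsAll(readFrom.getOrDefault(txn, Set.of()));
    }

    void markCommitted(String txn) {
        committed.add(txn);
    }
}
```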


Friday, September 12, 2014

What are some major goals of database concurrency control?

Serial (sequential) execution of the transactions that access a database involves no overlapping time periods, so no concurrency is achieved in a system that follows such a simple execution mechanism. If, on the other hand, concurrency is allowed by interleaving the operations of different transactions, improper control over that interleaving risks producing undesirable results. Some examples are given below; a small code illustration of the lost update problem follows the list:
- The dirty read problem: A transaction A reads a data item that has been written by another transaction B, which later aborts. The value read is the result of an aborted operation, is therefore not valid, and should not have been read by other transactions; this is called a dirty read. As a result, transaction A will produce incorrect results.
- The lost update problem: A transaction B writes a second value of a data item on top of the first value written by a concurrent transaction A that is still in progress. The first value is then lost to other concurrent transactions that, by precedence, should have read the first value before the second; those transactions consequently yield wrong results.
- The incorrect summary problem: A transaction A computes a summary over all the values of multiple instances of a single data item while another transaction B updates some of those instances. The end result does not reflect a correct summary: depending on when the individual updates are made, the summary may mix old and new values, and some updates may not be included at all.
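As a concrete illustration of the lost update problem mentioned above, the hypothetical sketch below lets two threads perform a read-modify-write on a shared balance without any concurrency control; one of the two deposits is routinely lost because the second write overwrites the first.

```java
// Illustrative lost update: two unsynchronized read-modify-write "transactions".
class LostUpdateDemo {
    private static int balance = 100;

    public static void main(String[] args) throws InterruptedException {
        Runnable deposit50 = () -> {
            int read = balance;           // both threads may read 100 here
            try { Thread.sleep(10); } catch (InterruptedException ignored) {}
            balance = read + 50;          // the later write overwrites the earlier one
        };
        Thread a = new Thread(deposit50);
        Thread b = new Thread(deposit50);
        a.start(); b.start();
        a.join();  b.join();
        // Expected 200 if the deposits were serialized; often prints 150 instead.
        System.out.println("Final balance: " + balance);
    }
}
```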

Database systems that require high transactional performance need their concurrent transactions to execute correctly so that these goals are met. In fact, a modern business cannot even consider a database that does not meet such goals. So what are these goals? Let us see below:
- Correctness: To attain this goal the system must allow only serializable schedules to execute; without serializability we may face the problems listed above. Serializability is defined as equivalence of a schedule to some serial schedule of the same transactions, i.e., one in which the transactions run sequentially, without time overlaps and in isolation. The highest isolation level is obtained only through serializability. In some cases serializability is relaxed to give the system better performance, or relaxed in distributed environments to satisfy availability requirements, but only on the condition that correctness is not compromised. A practical example is a transaction involving money: if we relax serializability there, money can be transferred to the wrong account. Concurrency control mechanisms achieve serializability by means of conflict serializability, which is a special case of serializability.
- Recoverability: This goal requires that after a crash the database can recover efficiently without losing the effects of successfully committed transactions. Recoverability ensures that the database system can tolerate faults and does not become corrupted, for example because of a media failure. Responsibility for protecting the data the system stores is given to the recovery manager, which works together with the concurrency control component to provide reliable, always-available access to data. Together these two components keep the database in a consistent state.

Database systems must ensure that the integrity of transactions is not affected by any kind of fault. In the modern world, with high transaction rates and large monetary values at stake, any violation of the above goals can be very expensive.


Thursday, September 11, 2014

Database access: What is Multi-version concurrency control?

A common concurrency control method used by database management systems is multiversion concurrency control, often abbreviated MVCC or MCC. The method helps manage the problems caused by concurrent access to the database, and it is also used to implement transactional memory in programming languages. When a read operation and a write operation are carried out concurrently on the same data item, the database is likely to appear inconsistent to the reader; using such a half-written piece of data is dangerous and can lead to failures.
Simple concurrency control methods solve this by making sure that no read operation is carried out until the write operation is complete; the data item is said to be locked by the write operation. But this traditional approach is slow, because processes waiting for read access must block. MVCC therefore takes a different approach: each connected user is shown a snapshot of the database as of a particular instant in time.
Changes made by a write operation are not visible to other database users until the write is complete. In this approach, when an item has to be updated the old data is not overwritten with the new data; instead, the old data is marked as obsolete and the new data is added to the database. The database therefore holds multiple versions of the same data, of which only one is the latest. With this method users can keep reading the data they were reading when they began their operation, regardless of whether it has since been modified or deleted by another thread. Although this method avoids the overhead of filling in memory holes, it does require the system to periodically sweep through the database and remove obsolete data.
In document-oriented databases, this method also allows the documents to be stored in a contiguous section of the disk: if a document is updated, it is simply rewritten in full, so maintaining multiple linked pieces of a document is not required. MVCC provides consistent point-in-time views. Read operations under MVCC use either a transaction ID or a timestamp to determine which state of the database to read; by isolating writes from reads in this way, readers need no lock management.
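A minimal sketch of the versioning idea follows (illustrative only, not any particular engine; the class and method names are invented): each write appends a new version stamped with the writer's transaction ID, and a reader sees the newest version whose stamp is not later than its own snapshot.

```java
import java.util.*;

// Illustrative multi-version store for a single data item.
class VersionedItem {
    // One version of the value, stamped with the ID of the writing transaction.
    record Version(long writerTxnId, String value) {}

    private final List<Version> versions = new ArrayList<>();

    // Writers never overwrite: they append a new version.
    synchronized void write(long txnId, String value) {
        versions.add(new Version(txnId, value));
    }

    // Readers see the latest version visible to their snapshot and never block.
    synchronized Optional<String> readAsOf(long snapshotTxnId) {
        return versions.stream()
                .filter(v -> v.writerTxnId() <= snapshotTxnId)
                .reduce((first, second) -> second)   // keep the last matching version
                .map(Version::value);
    }
}
```

Old versions accumulate over time and would be removed by a periodic cleanup pass, which is the obsolete-data sweep mentioned above.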
A write operation affects only a future version of the data, while the version seen by a reader remains consistent, because the writer works under a later transaction ID. MVCC achieves this transactional consistency by means of increasing IDs or timestamps. Another advantage of this concurrency control method is that no transaction has to wait for an object, since many versions of the same data object are maintained, each labelled with a timestamp or an ID. But MVCC has weak points too: the snapshot isolation it typically provides is not fully serializable, and in some cases write skew and certain read-only anomalies surface. There are two solutions to these anomalies, namely:

- Serializable snapshot isolation and
- Precisely serializable snapshot isolation

But these solutions come at the cost of aborting some transactions. Some databases that use MVCC are:
- ArangoDB
- Bigdata
- CouchDB
- Cloudant
- HBase
- Altibase
- IBM DB2
- Ingres
- Netezza


Sunday, September 7, 2014

Some tools that can be used in Test Driven development

Here we present a list of tools that can be used for carrying out Test Driven Development (TDD). TDD takes the place of big up-front design: both efficiency and development speed improve, and finding mistakes is fast and cheap. The iterations are short and therefore provide frequent feedback. The test suites stay up to date and executable, and serve as complete documentation in every sense. The tests are written before the code, and the code is then refactored to keep its quality high. A minimal JUnit-style illustration of this cycle is sketched below.
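The example below is a small, hedged sketch of the red-green-refactor cycle using the JUnit 4 API; the Calculator class is hypothetical and exists only for the illustration.

```java
import static org.junit.Assert.assertEquals;
import org.junit.Test;

// Red: this test is written before Calculator exists, so at first it fails.
public class CalculatorTest {
    @Test
    public void addsTwoNumbers() {
        Calculator calc = new Calculator();
        assertEquals(5, calc.add(2, 3));
    }
}

// Green: just enough production code is added to make the test pass.
// Refactor: the code is then cleaned up while the test stays green.
class Calculator {
    int add(int a, int b) {
        return a + b;
    }
}
```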

Tools for TDD are:
• JUnit: Java's unit-testing framework.
• Refactoring tools: commonly used ones are RefactorIT, IntelliJ IDEA, Eclipse and Emacs.
• HTTPUnit: A black-box web testing tool used for automated web site testing.
• Latka: A functional testing tool implemented in Java; it can be used with JUnit or Tomcat.
• Cactus: A server-side unit testing tool that provides a simple framework for unit testing server-side code. It extends JUnit and supports three types of unit tests: code logic unit tests, functional unit tests and integration unit tests.
• Abbot and Jemmy: Tools for GUI testing. Abbot drives actions through scripts based on high-level semantics; Jemmy is more powerful and provides full support for Swing.
• XMLUnit: Lets you compare two XML files.
• Ant: A build tool based on Java and XML; it is completely platform independent.
• Anteater: An Ant-based testing framework used for writing tests that check the functionality of web services and web applications.
• Subversion: A replacement for CVS that handles directories better.
• jMock: Uses mock objects, dummy implementations of real collaborators, to check the behaviour of code, since non-trivial code often cannot be tested in isolation. Decide what needs to be verified before writing the test, test one feature at a time, and only then add mock objects to represent the collaborating concepts. A hedged example of this style follows the list.
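The sketch below is written in the style of the jMock 2 API; the Publisher and Subscriber types are invented for the example, and exact method names may differ between jMock versions. The mock Subscriber stands in for a real collaborator so that the Publisher's behaviour can be verified in isolation.

```java
import org.jmock.Expectations;
import org.jmock.Mockery;
import org.junit.Test;

// Hypothetical collaborator interface and class under test.
interface Subscriber { void receive(String message); }

class Publisher {
    private final Subscriber subscriber;
    Publisher(Subscriber subscriber) { this.subscriber = subscriber; }
    void publish(String message) { subscriber.receive(message); }
}

public class PublisherTest {
    private final Mockery context = new Mockery();

    @Test
    public void notifiesSubscriberOfAMessage() {
        final Subscriber subscriber = context.mock(Subscriber.class);
        Publisher publisher = new Publisher(subscriber);

        // Declare the expected interaction before exercising the code.
        context.checking(new Expectations() {{
            oneOf(subscriber).receive("hello");
        }});

        publisher.publish("hello");
        context.assertIsSatisfied();   // fails the test if the expectation was not met
    }
}
```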


Unit-testing frameworks also exist for other languages and platforms, for example:
- csUnit
- CUnit
- Googletest
- DUnit
- HTMLUnit
- NUnit
- OUnit
- PHPUnit
- pyUnit

The tools mentioned above are used for writing unit tests in Test Driven Development. However, these unit tests can themselves contain errors, and unit tests are not well suited to finding errors that arise from interactions between different units. The path to success lies in keeping things as simple as possible.
The end result of TDD is highly rewarding: the program design grows organically and consists of loosely coupled components. Everything you think can go wrong should be tested, proceeding in baby steps, and the unit tests should always pass at 100%. Refactoring is not about changing the behaviour of the code; it is about improving the internal code structure, and it should be carried out to improve the quality, maintainability and reliability of the code.


What is the difference between Test Driven development and Acceptance Test Driven development?

Test Driven Development (TDD) and Acceptance Test Driven Development (ATDD) are related to each other. TDD is a developer's tool for creating properly written units of code, i.e., modules, classes and functions, that carry out the desired function correctly. ATDD, on the other hand, is a communication tool between developers, testers and customers for defining requirements. Another difference is that TDD requires test automation while ATDD does not, although ATDD may require automated regression testing. The tests written for ATDD can be used to derive tests for TDD, because a part of the requirement they express is implemented in the code. ATDD tests must be readable by the customers; this is not required of TDD tests.
TDD involves writing the test cases and watching them fail before the code is written. Next, just enough code is written to make the tests pass, and the code is then refactored as required, with the tests still passing afterwards. The primary focus of TDD is on methods and single classes, i.e., low-level functionality, which leads to much more flexible code. So why is ATDD needed? Because TDD only tells you that your code is correct; it does not tell you why that piece of code is required in the first place. In ATDD the acceptance criteria are stated in the early stages of the software development process, and in the succeeding stages these criteria are used to guide development. ATDD is a collaborative activity engaging everyone from developers and testers to business analysts and product owners, and it ensures that the implementation is understood by everyone involved in the development process.
There is a third approach called BDD, or behaviour driven development. It is quite similar to TDD except that tests are called specs, and the focus of BDD is on the system's behaviour rather than its implementation details, as well as on the interactions taking place during software development. Developers following this approach use two languages: the domain language and their native language. TDD consists of unit tests while ATDD uses acceptance tests and focuses on a higher level. BDD keeps the product focused on the customer and makes collaboration between developers and other stakeholders easier. Using TDD techniques and tools requires technical skill, because you have to know the details of the objects under test; non-technical stakeholders would be lost trying to follow TDD's unit tests. BDD provides a clearer view of the purpose of the system, whereas TDD provides that view only to the developers.
The success of ATDD depends entirely on the communication that takes place between the developers, testers and customers. A number of practices are involved in ATDD, such as behaviour driven testing, specification by example, story test-driven development and example-driven development. With these practices the developers can understand the needs of the customers before the actual implementation. The difference between TDD and ATDD thus arises from the latter's emphasis on collaboration between developers, testers and customers. ATDD encompasses acceptance testing, but it resembles TDD in one way: it insists on writing the tests before coding starts. These tests provide an external view of the system from the point of view of a user.


Monday, September 1, 2014

What is Commitment ordering method for database concurrency control?

Commitment ordering (CO) is a class of techniques for achieving interoperable serializability in the concurrency control mechanisms of transaction processing systems, database systems and other database-related applications. Commitment ordering methods allow non-blocking (optimistic) implementations. With the advent of multi-processor CPUs, CO has also seen increasing use in transactional memory (software transactional memory in particular) and concurrent programming, where it serves as a means of achieving non-blocking serializability.
In a CO-compliant schedule, the chronological order of the commitment events is compatible with the precedence order of the respective transactions. Viewed broadly, CO is a special case of conflict serializability. It is effective, offers high performance, and is reliable, distributable and scalable, which makes it a good way of achieving modular serializability across a heterogeneous collection of database systems, i.e., one containing database systems that employ different concurrency control methods. A database system that is not CO compliant is linked to a CO component, such as a commitment order coordinator (COCO), whose purpose is to order the commitment events so that the system becomes CO compliant, without accessing the data or interfering with the operation of the transactions. This keeps the overhead low and yields an appropriate solution for distributed and global serializability. A fundamental part of the solution is the atomic commitment protocol (ACP), which is used to break cycles in the conflict graph (the serializability or precedence graph). If no concurrency control information is shared among the involved database systems beyond the ACP messages, and they have no knowledge of each other's transactions, then CO becomes a necessary condition for achieving global serializability.
Another advantage of CO is that it requires no costly distribution of local concurrency control information such as timestamps, tickets, locks and local precedence relations. CO generalizes the SS2PL property, and SS2PL used with 2PC (the two-phase commit protocol) is the de facto standard for achieving global serializability; this also gives other CO-compliant systems a transparent way to join such global solutions. When a multi-database environment is based on commitment ordering, global deadlocks can be resolved automatically without human intervention, which is an important benefit of CO-compliant systems. The intersection of CO and strictness is called strict commitment ordering (SCO). It results in better overall throughput, shorter execution times for transactions and thus better performance than traditional SS2PL; the benefit of SCO is most noticeable under lock contention. Thanks to the strictness property, SCO and SS2PL can use the same database recovery mechanism. Today there are two major variants of CO, namely:
- CO-MVCO and
- CO-ECO
CO-MVCO is the multi-version variant and CO-ECO is the extended variant. Any relevant concurrency control method can be combined with these two to obtain non-blocking implementations; both use additional information to relax constraints and achieve better performance. CO and its variants rely on a technique called vote ordering (VO). When no concurrency control information is shared, global serializability can be guaranteed only if local VO holds. The interoperation of CO and its variants is transparent, which also makes automatic deadlock resolution possible in heterogeneous environments.
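As a rough, illustrative sketch of the core CO idea (names invented; this is not the COCO algorithm itself), a coordinator could delay each commit until every transaction that precedes it in the local precedence (conflict) order has ended, so that the commit order agrees with the precedence order. Precedence cycles would have to be resolved by aborting a transaction, which is not shown here.

```java
import java.util.*;

// Illustrative commit gate for commitment ordering: a transaction is allowed
// to commit only after every transaction that precedes it in the local
// precedence (conflict) order has ended, so commit order follows precedence.
class CommitOrderGate {
    private final Map<String, Set<String>> precededBy = new HashMap<>();
    private final Set<String> ended = new HashSet<>();   // committed or aborted

    // Record that 'earlier' precedes 'later' (a conflict from earlier to later).
    void addPrecedence(String earlier, String later) {
        precededBy.computeIfAbsent(later, k -> new HashSet<>()).add(earlier);
    }

    boolean mayCommit(String txn) {
        return ended.containsAll(precededBy.getOrDefault(txn, Set.of()));
    }

    void markEnded(String txn) {
        ended.add(txn);
    }
}
```

A real coordinator would integrate this check with the atomic commitment protocol's votes and abort transactions to break precedence cycles; that machinery is omitted here.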

