Monday, September 1, 2014

What is Commitment ordering method for database concurrency control?

Commitment ordering (CO) is a class of interoperable serializability techniques used for concurrency control in database systems, transaction processing systems, and related applications. CO methods allow non-blocking (optimistic) implementations. With the spread of multi-core processors, CO has also been increasingly used in concurrent programming and in transactional memory (software transactional memory in particular) as a way to achieve serializability without blocking.
In a CO-compliant schedule, the chronological order of commit events is compatible with the precedence order of the respective transactions. CO is a broad special case of conflict serializability. It is effective, reliable, distributable, and scalable, and it performs well, which makes it a good way to achieve modular serializability across a heterogeneous collection of database systems, that is, one whose members use different concurrency control methods. A database system that is not CO compliant can be augmented with a CO component such as a commitment order coordinator (COCO). This component only orders the commitment events so that the system becomes CO compliant; it neither accesses data nor interferes with the operation of transactions. As a result, CO adds little overhead and gives a general solution for distributed serializability and global serializability. A fundamental part of this solution is the atomic commitment protocol (ACP), which is used to break the cycles in the conflict graph (also called the precedence graph or serializability graph). If the participating database systems share no concurrency control information beyond ACP messages and have no knowledge of whether transactions are global or local, CO is a necessary condition for achieving global serializability.
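The CO property can be checked on a small recorded schedule. The sketch below is only an illustration: the schedule format (a list of `(transaction, operation, item)` tuples with operations `"r"`, `"w"` and `"c"` for commit) and the names `precedence_edges` and `is_co_compliant` are assumptions made for this example, not part of any database system.

```python
def conflicts(op1, op2):
    """Two data operations conflict if at least one of them is a write."""
    return op1 in ("r", "w") and op2 in ("r", "w") and "w" in (op1, op2)


def precedence_edges(schedule):
    """Return edges (t1, t2): t1 has an operation that conflicts with a later one of t2."""
    edges = set()
    for i, (t1, op1, item1) in enumerate(schedule):
        for t2, op2, item2 in schedule[i + 1:]:
            if t1 != t2 and item1 == item2 and conflicts(op1, op2):
                edges.add((t1, t2))
    return edges


def is_co_compliant(schedule):
    """True if committed transactions commit in the order given by their conflicts."""
    commit_pos = {t: i for i, (t, op, _) in enumerate(schedule) if op == "c"}
    for t1, t2 in precedence_edges(schedule):
        if t1 in commit_pos and t2 in commit_pos and commit_pos[t1] > commit_pos[t2]:
            return False
    return True


# T1 writes x before T2 reads it, and T1 also commits first.
in_order = [("T1", "w", "x"), ("T2", "r", "x"), ("T1", "c", None), ("T2", "c", None)]
# Same conflict order, but T2 commits before T1.
out_of_order = [("T1", "w", "x"), ("T2", "w", "x"), ("T2", "c", None), ("T1", "c", None)]
```

With these two schedules, `is_co_compliant(in_order)` holds and `is_co_compliant(out_of_order)` does not, because in the second one the commit order contradicts the precedence order.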
Another advantage of CO is that it avoids the cost of distributing local concurrency control information such as timestamps, tickets, locks, and local precedence relations. CO generalizes the strong strict two-phase locking (SS2PL) property, which together with the two-phase commit protocol (2PC) is the de facto standard for achieving global serializability. This also gives other CO-compliant systems a transparent way to join such global solutions. In a multi-database environment based on commitment ordering, global deadlocks are resolved automatically, without human intervention, which is an important benefit of CO-compliant systems. The intersection of CO and strictness is called strict commitment ordering (SCO). Compared with traditional SS2PL, SCO gives better overall throughput and shorter transaction execution times, and the improvement shows most under lock contention. Because both have the strictness property, SCO and SS2PL can use the same database recovery mechanism. Today there are two major variants of CO:
- MVCO, and
- ECO
The first is multi-version CO (MVCO) and the second is extended CO (ECO). Either can be combined with any relevant concurrency control method to give a non-blocking implementation. Both use additional information to relax the constraints of CO and achieve better performance. CO and all its variants are contained in a broader schedule class called vote ordering (VO). When concurrency control information is not shared, local VO is a necessary condition for guaranteeing global serializability. CO and its variants interoperate transparently, which makes automatic deadlock resolution possible in heterogeneous environments as well.

Wednesday, August 27, 2014

What is Strong strict Two-Phase locking?

Strong strict two-phase locking (SS2PL) is one of the most widely used concurrency control protocols in database systems. It is also called rigorous scheduling, rigorous two-phase locking, or rigorousness. To comply with this protocol, a transaction must follow the S2PL rules and must release both its read (S) locks and its write (X) locks only after it has ended, that is, after it has either committed or aborted. A transaction that obeys SS2PL can be viewed as staying in phase 1 for its whole execution, with no phase 2 (or only a degenerate one). So there is really just one phase, but the name keeps "two-phase" because the concept derives from 2PL, its superclass.
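The rule that locks are released only when the transaction ends can be sketched with a toy lock table. This is a single-threaded illustration under stated assumptions: the class name `RigorousLockTable` and its methods are hypothetical, and a real lock manager would queue waiting transactions rather than just return `False`.

```python
class RigorousLockTable:
    """Toy lock table: locks are only ever released when a transaction ends."""

    def __init__(self):
        self.shared = {}     # item -> set of transactions holding an S lock
        self.exclusive = {}  # item -> transaction holding the X lock

    def acquire(self, txn, item, mode):
        """Try to lock item in mode 'S' or 'X'; return False if txn would have to wait."""
        holder = self.exclusive.get(item)
        if holder is not None and holder != txn:
            return False  # another transaction holds the write lock
        others = self.shared.get(item, set()) - {txn}
        if mode == "X":
            if others:
                return False  # other readers block a write lock
            self.exclusive[item] = txn
        else:
            self.shared.setdefault(item, set()).add(txn)
        return True

    def end(self, txn):
        """Commit or abort: the only place where locks are released."""
        for holders in self.shared.values():
            holders.discard(txn)
        for item in [i for i, t in self.exclusive.items() if t == txn]:
            del self.exclusive[item]
```

There is deliberately no `release` method for a single lock: a transaction that wants to give up a lock has to call `end`, which is what makes the schedule rigorous.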
The SS2PL property of a schedule is also called rigorousness, and the same name is used for the class of schedules that have this property, so an SS2PL schedule is often called a rigorous schedule. Many people prefer this term because it drops the unnecessary legacy of "two-phase" and is independent of any locking mechanism. The corresponding locking mechanism is sometimes called rigorous 2PL. SS2PL is a special case of S2PL, that is, the SS2PL class is a proper subclass of S2PL. Most database systems use SS2PL as their concurrency control protocol, and it has been in wide use since the early days of databases in the 1970s. It is a popular choice because, apart from providing serializability, it also imposes strictness, which is a special case of cascadeless recoverability.
Strictness is important for efficient recovery of the database in the event of a failure. SS2PL also provides commitment ordering (CO), which a database system needs in order to participate in distributed environments that use CO-based solutions for distributed serializability and global serializability. Because SS2PL is a subset of CO, distributed SS2PL can be implemented efficiently without a distributed lock manager (DLM), and distributed deadlocks are resolved automatically.
Global serializability can be ensured by using SS2PL in multi-database systems. This was known long before the CO concept arrived, but it is only with CO that we can understand the role the atomic commitment protocol plays in maintaining this serializability and in resolving deadlocks. The fact that SS2PL inherits the properties of CO and of recoverability is more significant than the fact that it is a subset of 2PL. General 2PL provides only a primitive serializability mechanism and lacks the other qualities of SS2PL. S2PL, that is, strictness combined with 2PL, is not of much practical use. Unlike S2PL, SS2PL also provides the commitment ordering property.
Today there are a number of variants of SS2PL, each with different semantics and used under different conditions. Multiple granularity locking is one popular variant. Any two schedule classes that have schedules in common either contain one another or are incomparable. Locks block transactions, and mutual blocking between transactions leads to deadlock, a condition in which none of the transactions involved can proceed. To free the trapped resources, the deadlock has to be resolved, usually by aborting one of the transactions. A deadlock is a cycle in the wait-for graph, and it reflects a potential cycle in the precedence graph that would have occurred without the blocking.
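Deadlock detection is commonly done by searching for a cycle in a wait-for graph, where an edge from T1 to T2 means that T1 is waiting for a lock held by T2. The following is a minimal sketch using depth-first search; the graph representation (a dict mapping each transaction to the transactions it waits for) and the name `find_cycle` are assumptions made for this example.

```python
def find_cycle(wait_for):
    """Return a list of transactions that form a cycle, or None if there is none."""
    visiting, done = set(), set()

    def visit(txn, path):
        visiting.add(txn)
        path.append(txn)
        for blocker in wait_for.get(txn, ()):
            if blocker in visiting:
                # We came back to a transaction on the current path: a cycle.
                return path[path.index(blocker):]
            if blocker not in done:
                cycle = visit(blocker, path)
                if cycle:
                    return cycle
        visiting.discard(txn)
        done.add(txn)
        path.pop()
        return None

    for txn in list(wait_for):
        if txn not in done:
            cycle = visit(txn, [])
            if cycle:
                return cycle
    return None
```

For example, `find_cycle({"T1": ["T2"], "T2": ["T1"]})` reports both transactions, and aborting either of them breaks the deadlock.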

Monday, August 25, 2014

What is index concurrency control?

This article discusses index concurrency control as a method for controlling database concurrency. An index is a data structure that gives fast access to the user data in a database. Index data should not be confused with user data: an index consists mainly of pointers to the user data.
Indexes have to be updated whenever user data is inserted, deleted, or modified, so that the data can still be located accurately. Index integrity is maintained by a technique called index locking. While a database transaction runs, a lock is placed on the portion of the index that the transaction uses to reach its user data. In addition, the database system runs special system transactions to modify and maintain the index as part of its self-maintenance routine. When a transaction holds an exclusive lock on a portion of the index, other transactions can neither read nor modify that portion. When the lock is a shared one, other transactions can still read the portion but cannot modify it.
Indexes can be accessed with specialized concurrency control techniques that depend on the type and structure of the index. Applied to indexes, these techniques are more effective than the general techniques used for user data. For B-trees, for example, there are specialized B-tree concurrency control techniques. Index locks coordinate the threads that want to access the same index, and they are held for a shorter time than ordinary transaction locks. These short-term index locks are often called latches. Real-time database systems depend heavily on indexes to speed up access to data.
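The idea of a short-term latch can be illustrated with a toy in-memory index. This is only a sketch: a sorted Python list stands in for a real B-tree, a single `threading.Lock` stands in for per-node latches, and the class name `LatchedIndex` is hypothetical.

```python
import bisect
import threading


class LatchedIndex:
    """Toy sorted index whose latch is held only while the structure is touched."""

    def __init__(self):
        self._keys = []
        self._latch = threading.Lock()

    def insert(self, key):
        with self._latch:  # released as soon as the insert is done
            bisect.insort(self._keys, key)

    def contains(self, key):
        with self._latch:
            pos = bisect.bisect_left(self._keys, key)
            return pos < len(self._keys) and self._keys[pos] == key

    def keys(self):
        with self._latch:
            return list(self._keys)
```

The latch protects the physical structure for the duration of one operation. It is not held until the calling transaction commits, which is the difference from an ordinary transaction lock.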
Index concurrency control (ICC) also helps these systems complete as many transactions as possible before their deadlines. To keep index contention from becoming a bottleneck, high-performance ICC protocols are used. Real-time variants of B-tree ICC protocols can be built, and their performance compared, using a detailed simulation model. GUARD-link is one such ICC protocol for real-time systems. It augments the classical B-link protocol with a feedback-based admission control mechanism. The main performance metric used to evaluate the ICC protocols is the percentage of transactions that miss their deadlines. A second metric is fairness with respect to the type and size of the transaction.
According to some experiments, the performance characteristics of ICC protocols in real-time systems differ from the characteristics of the same protocols in general-purpose database systems. In particular, B-link protocols perform best in conventional database systems but perform poorly in real-time systems under heavy load. GUARD-link takes a new approach even though it builds on the B-link approach. In the experiments it gave the best performance under both light and heavy real-time workloads, thanks to its admission control mechanism.

Friday, August 22, 2014

What are some of the problems of Test Driven development?

Test-driven development (TDD) has a lot of benefits, but it has some problems too. The unit-level testing that TDD produces is good enough for most software, but some situations need full functional tests to determine success or failure, and there TDD's heavy reliance on unit tests falls short. Examples include user interfaces, programs that work with databases, and programs that depend on specific network configurations. Here a feature that is usually an advantage of TDD becomes a limitation: TDD encourages programmers to put the minimum amount of code into such modules and to move as much logic as possible into code that can be tested with mock-ups and fakes, which stand in for the world outside the module. Another shortcoming is that TDD cannot do without management support. If the whole organization does not believe that TDD will improve the product, management may feel that the time spent writing tests is wasted. Also, in TDD the programmer who writes the code usually writes the unit tests as well, so the tests may share the blind spots of the code. Things the developer did not think to check go untested, leaving the code only partially tested. Having unit tests written or reviewed by someone else helps.
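A small example shows how a fake stands in for the outside world. The names here (`FakeUserStore`, `email_for`, `greeting`) are hypothetical and chosen only for illustration; the fake replaces what would be a database-backed store in production.

```python
class FakeUserStore:
    """Hypothetical in-memory stand-in for a real database-backed store."""

    def __init__(self, users):
        self._users = dict(users)

    def email_for(self, user_id):
        return self._users.get(user_id)


def greeting(store, user_id):
    """Logic under test: it depends only on the interface of the store."""
    email = store.email_for(user_id)
    if email is None:
        return "Hello, guest"
    return "Hello, " + email.split("@")[0]


def test_known_user_is_greeted_by_name():
    store = FakeUserStore({1: "ann@example.com"})
    assert greeting(store, 1) == "Hello, ann"


def test_unknown_user_is_greeted_as_guest():
    assert greeting(FakeUserStore({}), 2) == "Hello, guest"
```

Both tests pass without a database, which is convenient, but it is also the limitation described above: they say nothing about whether the real store, its queries, or its connection settings work. That still needs a functional or integration test.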
Take another example, where the developer misinterprets the requirements of a module. Both the code and the unit tests will then be wrong in the same way, so the tests will pass and the developer will assume the code is correct. A large number of passing unit tests can also give a false sense of security, which may lead to less of the other kinds of testing, such as integration testing and compliance testing. Writing and running tests also adds to the maintenance overhead of the software. Badly written tests, for example ones that hard-code error strings or are themselves prone to failure, are expensive to maintain.
This is especially true of fragile tests. There is a risk that tests which regularly generate false failures will be ignored, so that when a real failure occurs it goes undetected. Though a bit more difficult, it is possible to write tests that need little maintenance, for example by reusing error strings wherever possible, and this should be one goal of code refactoring. Writing and running a lot of tests also takes a lot of time. Flexible modules with only a few tests may accept new requirements without any change to the tests, so it is easier to adapt with a small number of tests and little data than with big, complex tests. Developers can be warned about excessive testing, but doing so requires advanced skills in sampling or factor analysis.
The level of testing and coverage achieved during repeated TDD cycles cannot easily be recreated later. These early tests therefore become more valuable over time, and problems in them should be fixed early. If a poor testing strategy leads to a late change that makes many existing tests fail, the failing tests must be fixed individually. This should be done carefully, because just deleting or disabling them can leave holes in the coverage. Finally, TDD insists on a test for every part of your code, and forcing code into a testable shape can sometimes contort the design instead of improving it.

Tuesday, August 19, 2014

How does Test Driven development benefit developers?

Test-driven development (TDD) has proven useful to developers time and again. Businesses change quickly, and so do their requirements for the software they use. Software developed with traditional methodologies tends to become harder to maintain as requirements change, and changing it can break things in unpredictable ways. Because of this, organizations often avoid modifying existing software for fear of hurting their productivity and effectiveness. This is much less of a problem for companies that developed their products using TDD, because the tests are added along with the code, much as in a continuous integration model. It becomes easier for these organizations to modify their products without fearing a breakdown.
The first benefit of TDD to developers is that the software is easier to maintain because it is extensible and flexible. Testing and development go hand in hand at the lowest level, so every logical piece is tested and can be changed with confidence. Once development is done, the application can be tested again with the whole accumulated suite, which may contain thousands of tests. After a change to the application, the associated tests are run to see whether other parts of the application are affected. With this safety net, modifying an existing application is much less risky. Apart from benefiting developers, this helps organizations seeking growth by making it easier for them to update their systems.
TDD also streamlines the codebase and gives very high test coverage. In TDD, a test must be written before the code it exercises, which is where the coverage comes from. Regression testing and refactoring then keep the code as small and economical as possible, which plays a big role in streamlining the codebase. If there is no use case for a piece of functionality, no test is written and no code is added, so the codebase does not grow needlessly. This is another reason why maintenance is easier.
TDD encourages clean interfaces throughout the development process. Because the tests are written first, the APIs are designed from the perspective of an API user. Such APIs tend to be easier to use than APIs designed from the implementer's perspective.
Refactoring is central to the success of TDD. It keeps the codebase healthy and prevents the software from becoming outdated and monolithic.

TDD aims at improving the code and is particularly useful in the following:
> Addition of a new feature or functionality: With TDD the programmer can change part of a large application with confidence, and the passing tests give other stakeholders a reason to share that confidence. Without this flexibility, functionality would be added to the application without proper integration, which would cause many problems.
> Changes in the technical infrastructure: Developers often want to make technical changes to the code to make it more elegant and more extensible.
The use cases in TDD are expressed as tests, which other developers can use as examples of how the code works and how it can be used. In this way TDD provides executable documentation. TDD makes it possible to update the software without painstaking effort.
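Here is a small sketch of what executable documentation can look like. The function `slugify` and its behavior are invented for this example; the point is that each test name and assertion reads as a statement about how the code is meant to be used.

```python
import unittest


def slugify(title):
    """Hypothetical function: turn a post title into a URL-friendly slug."""
    words = "".join(ch.lower() if ch.isalnum() else " " for ch in title).split()
    return "-".join(words)


class SlugifyExamples(unittest.TestCase):
    def test_spaces_become_hyphens(self):
        self.assertEqual(slugify("Test Driven Development"), "test-driven-development")

    def test_punctuation_is_dropped(self):
        self.assertEqual(slugify("What is TDD?"), "what-is-tdd")

    def test_empty_title_gives_empty_slug(self):
        self.assertEqual(slugify(""), "")
```

A developer who has never seen `slugify` can read the three test names to learn what it does, and unlike a comment, these examples fail loudly if the behavior changes.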
An organization can be successful only if it embraces change and keeps improving. Test-driven development supports this through its extensibility, maintainability, and flexibility.

Saturday, August 16, 2014

Test Driven Development - Some benefits

According to one study, using TDD meant writing more tests, and programmers who wrote more tests tended to be more productive. The hypotheses about code quality and about a more direct relation between TDD and productivity were inconclusive. Programmers who used pure TDD on projects said that they rarely felt the need to invoke a debugger. When TDD is used together with a version control system and tests fail unexpectedly, it is easy to revert the code to the last version that passed all the tests, and reverting is often more productive than debugging.
TDD has more to offer than just validation of correctness. It can also drive the design of a program. Because the focus is on the test cases first, the programmer has to imagine how clients will use the functionality, so the programmer is concerned with the interface before the implementation. This benefit is complementary to design by contract, because it approaches the code through test cases rather than through mathematical assertions or preconceptions.
With TDD you can program in steps as small as you are comfortable with. You can concentrate on the task at hand, because the first goal is simply to make the test pass. Error and exception handling are not considered at first; tests that create these extraneous situations are written and implemented separately. In this way every part of the code is covered by at least one test, which boosts the confidence of both programmers and users in the software.
Programming with TDD requires more code than programming without it, because of the unit test code, but a model by Müller and Padberg suggests that total implementation time can still be shorter. Frequent testing catches defects early in the development cycle, before they turn into expensive, endemic bugs, and it shortens the debugging phase. Code produced with test-driven development tends to be more modular, flexible, and extensible, because the methodology forces programmers to think in terms of small units of code. The result is code that is cleaner, more focused, and more loosely coupled.
The mock object design pattern also contributes to the modularization of the code, because it requires modules to be written so that they can be switched easily between mock versions for testing and real versions for deployment. Because no more code is written than is needed to pass a failing test, the automated tests tend to cover every code path. For example, to add an else branch to an existing if statement, the programmer must first write a failing test that motivates the branch. That is why the tests produced in TDD tend to be thorough and detect even unexpected changes in the behavior of the code.
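The else-branch case can be made concrete with a tiny example. The function `shipping_cost` and its pricing rule are hypothetical, invented only to show the order in which things are written.

```python
def shipping_cost(order_total):
    """Hypothetical rule: orders of 50 or more ship free, others pay 5."""
    if order_total >= 50:
        return 0
    else:
        # This branch was added only after test_small_order_pays_shipping
        # had been written and had failed.
        return 5


def test_large_order_ships_free():
    assert shipping_cost(80) == 0


def test_small_order_pays_shipping():
    # Written first. Without the else branch the function returned None
    # for small orders, so this assertion failed.
    assert shipping_cost(20) == 5
```

Because the branch exists only to satisfy a test, there is no path through `shipping_cost` that some test does not exercise.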
It has been shown experimentally that the TDD approach is superior to the traditional test-last approach with respect to lower coupling between objects (CBO). For CBO, the mean effect size from a meta-analysis of the experiments was medium. The experiments also suggested that TDD leads to better modularization and to easier testing and reuse of the developed software. The effect of TDD on the unit tests themselves was measured using branch coverage (BC) and the mutation score indicator (MSI), which indicate the thoroughness and effectiveness of the unit tests. For branch coverage the effect size was medium and is therefore considered substantive.

Tuesday, August 12, 2014

Test Driven Development - The Process

Continuing from the previous post on the basics of TDD, this post looks at the process of test-driven development.

What is the process of Test Driven development?
For test-driven development to work on a software artifact, its units should be kept small. A unit here means a class or a group of related functions, and is sometimes called a module. Small units have a couple of benefits:
> The debugging effort is reduced: when a test fails, it is easier to track down the fault in a small unit.
> Tests are often self-documenting: small tests are easier to read and understand.

TDD can be extended to acceptance test-driven development (ATDD) by combining it with more advanced practices. The criteria specified by the customer are turned into acceptance tests, which then drive the traditional unit TDD (UTDD) process. This gives customers an automated mechanism for deciding whether the software meets their requirements. ATDD gives the development team a fixed target, the acceptance tests, which keeps them focused on what the customer needs. Now let us examine the TDD cycle, which consists of the following phases:

1. Adding a test: Each new feature begins with a test, which must fail because it is written before the feature is implemented. If the test passes, either the test is defective or the feature already exists in the software. Before writing the test, the developer must fully understand the requirements and specifications, for example through use cases and user stories. Focusing on the requirements before writing the code is a subtle yet important difference from writing tests after the code.
2. Running the tests and checking that the new one fails: This validates that the test harness is working correctly. It also tests the test itself, ruling out the possibility that the new test always passes. This increases confidence that the test checks the right thing.
3. Writing some code to make the test pass: The code written at this step need not be perfect. It is written only to pass the test and is improved in later steps.
4. Running the tests: If all the tests pass, the programmer can be confident that the code meets the tested requirements.
5. Refactoring the code: The code is now cleaned up as required. This includes moving code to where it logically belongs and removing any redundant code. Function and variable names should properly reflect their current purpose and use, and any confusing constructs should be clarified. After this the tests are run again to make sure that refactoring has not changed any existing functionality.
6. Repeat: The cycle starts again with a new test for the next piece of functionality. The steps should be kept small, with as few as 1 to 10 edits between test runs. If the new code does not quickly satisfy the new test, or other tests fail unexpectedly, the changes should be undone rather than debugged at length. Continuous integration helps by providing revertible checkpoints. When external libraries are used, the increments should not be so small that they merely test the library itself, unless there is reason to believe the library is buggy or insufficient. The cycle goes on until all the functionality has been implemented and tested.
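The cycle above can be followed on a very small example. The sketch below shows the state of the code after two passes through the cycle, with comments recording what happened at each step. The function `word_count` and the helper `run_tests` are made up for this illustration; in practice a test runner such as unittest or pytest would take the place of `run_tests`.

```python
# Step 1 (add a test): written before word_count existed, so the first run
# failed with a NameError, which confirmed in step 2 that the test can fail.
def test_counts_words_separated_by_spaces():
    assert word_count("red green refactor") == 3


# Step 6 (repeat): the next small test, added after the first one passed.
def test_empty_string_has_no_words():
    assert word_count("") == 0


# Steps 3 to 5: the first version was `return len(text.split(" "))`. It passed
# the first test but returned 1 for "". Switching to split() with no argument
# made both tests pass without adding a special case.
def word_count(text):
    return len(text.split())


def run_tests():
    """Tiny stand-in for a test runner: returns the names of failing tests."""
    failures = []
    for test in (test_counts_words_separated_by_spaces, test_empty_string_has_no_words):
        try:
            test()
        except AssertionError:
            failures.append(test.__name__)
    return failures
```

Calling `run_tests()` after every small edit plays the role of steps 2 and 4. An empty list means the code is in a state worth keeping as a checkpoint.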
