Subscribe by Email

Thursday, September 25, 2014

Which languages support logic programming?

In this article we discuss about the languages supporting the logic programming. There are many such languages and we divide them in to the following two categories:
- Prolog programming language family
- Functional logic programming languages

First we shall discuss some languages falling under the first category:
1. Algebraic logic functional programming language or ALF: Combines the techniques of both the logic programming techniques and the functional programming techniques. This language is based on the horn clause logic. The resolution rules for solving the literals and evaluating the functions form the foundation for the operational semantics of this language. It follows a left – most inner – most narrowing strategy for reducing the number of steps in solving a problem. Thus, these operational semantics are much more efficient and powerful than those produced by the resolution strategy of prolog.
2. Alice ML: This programming language was designed at the Saarland University and is actually a standard ML dialect. But it has the support for concurrency (this includes distributed computing as well as multithreading), lazy evaluation, constraint programming etc. this language uses a relation concept called “promise” according to which a future value provided by one thread will be computed to another thread. Thus data flow synchronization is possible in Alice ML by use of future typed variables and promise concept.
3. Ciao
4. Curry
5. Leda: It stands for ‘library of efficient data types and algorithms’ and is a multi – paradigm – programming language. The language is used for mixing the features of logic based, object based, functional and imperative programming in to one.
6. Mercury
7. Metal
8. Mozart
9. Oz: This one is another multi – paradigm programming language for the purpose of programming language education. It was developed in the year of 1991 at the Swedish institute of computer science. A primary example of Oz implementation is the Mozart Programming system. The system comes with an open source language and supports many platforms including Microsoft windows, mac os x, Linux, Unix etc.
10. Visual prolog

Now we shall see the second category of languages i.e., the functional logic programming languages:
1. B – prolog: This is the high – level implementation of the prolog and has some additional features such as the event handling rules, matching clauses, arrays, declarative loops, tabling, constraint solving etc. it was developed in 1994 and now is a popular CLP system. Though B – prolog is a commercial product, it comes free for research purposes. A clause in which the input/ output unifications and the determinacy are denoted in an explicit way is called a matching clause.
The matching clauses are translated in to the respective trees by the compiler and indexes are generated. This compilation is easier when compared to the compilation of the normal clauses in prolog. B – prolog overcomes the absence of active sub goals programming facility in prolog by introducing the action rules or AR which is a powerful but simple language for serving this purpose. The sub –goals are called agents. Activation of an agent is followed by the execution of an action. The CHIP system heavily affected the finite domain solver of B – prolog.  For creating arrays, a built – in is provided by B – prolog called new_array(X, Dims) where x stands for uninitialized variable and Dims for positive integers for specifying the array dimensions.
2. Eclipse
3. GNU prolog
4. Jprolog
5. KL0 and KL1
6. Logtalk
7. Objlog: This one is a frame based language that combines two things: Prolog II and object from CNRS.
8. Prolog
9. Prolog++
10. Strawberry prolog
11. Tuprolog
12. Visual prolog
13. YAP

What is a programming paradigm?

The fundamental style in which you design and write computer programs is called a programming paradigm. It’s the way you build the elements and structures of your programs. Presently we have the following 6 kinds of programming paradigms:
- Imperative
- Declarative
- Functional
- Object – oriented
- Logic
- Symbolic

Just as the different methodologies define the software engineering, different paradigms define the programming languages. Different languages are designed for supporting different paradigms. For example,
- Object oriented paradigm is adopted by smalltalk
- Functional programming is adopted by Haskell

Though there are certain programming languages that support only one kind of paradigm, there are many others that can work with multiple paradigms. Examples are:
- Object pascal
- C++
- Java
- C#
- Scala
- Visual basic
- Common lisp
- Scheme
- Perl
- Python
- Ruby
- Oz
- F#

The pascal and C++ programs can either be purely object oriented or purely procedural or may contain features of both. It is up to the programmers, how they want to use the paradigms. In object oriented programming, the program is considered as a collection of objects that can interact with each other. Whereas, in functional programming the program is considered as a sequence of the evaluations of stateless functions.
The process – oriented programming helps in designing the programs as a set of processes running concurrently in systems with multiple processors. The shared data structures are used by these processes. Some techniques are allowed by programming paradigms while others are forbidden. For example, the use of side effects is forbidden while using pure functional programming. The use of the dangerous goto statement is not allowed in structured programming. Because of this the modern programming paradigms are considered to be overly strict when compared to their older counterparts.
Proving the theorems can be easy if you avoid the use of certain techniques or if you understand the programs’ behavior. Sometimes programming models are compared with the paradigms. The abstractions of the systems are called systems. An example is the von Neumann model which is used in the sequential computers. There are a number of models for computers using parallel processing.
 Most of the models are based upon message passing, shared memory, hybrid etc. machine code and the assembly language instructions are the programming paradigms of the lowest level. The assembly language makes use of the mnemonics for operations. They are often referred to as the first generation languages. The assembly language is still in use for programming the embedded systems for obtaining direct control over the machine. The next generation of the languages is represented by the procedural languages. Examples are cobol, fortran, algol, PL/I, basic, c etc. procedural paradigm is adopted by all these languages.
The experience and ability of the programmer affects the efficiency and the efficacy of the problem’s solution. Later, the object oriented languages came in to the scenario such as the smalltalk, simula, java, Eiffel, C++ etc. These languages use objects (data and functions or methods for handling it) for modeling the real world problems. The object’s methods are the only way through which the user can access the data. Object oriented paradigm has led to the possibilities of creation of the object – oriented assembler language. For example, the HLA (high level assembly) is such a language that provides support for the advanced data types. In declarative programming paradigms, the problem is told to the computer but not the method. The program thus consists of properties that can be used for finding the result that is expected. The program is not a procedure. 

Wednesday, September 24, 2014

What is a reasoning system?

In the field of information technology, a software application which is used for the purpose of drawing conclusions from the knowledge that is available is called a reasoning system. The reasoning systems work on the principles of logical induction, deduction and some other reasoning techniques. These systems fall under the category of more sophisticated systems called the intelligent systems. Reasoning systems have a very important role to play in the fields of artificial intelligence and knowledge engineering.
In these systems, the knowledge that has been acquired already is manipulated for generating new knowledge. By knowledge here we mean symbolical representation of the propositional statements and facts based on assumptions, beliefs and assertions. Sometimes knowledge representations that are being used might be connectionist or sub – symbolic. An example of this is a trained neural net. The process of inferring and deriving new knowledge via logic is automated by means of the reasoning systems. The reasoning systems provide support for the procedural attachments for application of knowledge in a situation or a domain. Reasoning systems are used in a wide range of fields:
- Scheduling
- Business rule processing
- Problem solving
- Complex event processing
- Intrusion detection
- Predictive analysis
- Robotics
- Computer vision
- Natural language processing
As we mentioned above, logic is used by reasoning systems for generating knowledge. However there is a lot of difference and variation in usage of different systems of logic. This is also affected by formality. Symbolic logic and propositional logic is used by majority of the reasoning systems. The variations or the differences demonstrated are usually the FOL (formal logic systems) representations, their hybrid versions, extensions etc.  that are mathematically very precise.
There are other additional logic types such as the temporal, modal, deontic logics etc. that might be implemented explicitly by the reasoning systems. But, we also have some reasoning systems for the implementation of the semi – formal and imprecise approximations to the logic systems that are already recognized. A number of semi – declarative and procedural techniques are supported by these systems for modeling various reasoning strategies.
The emphasis of the reasoning systems is over pragmatism rather than formality. It also depends on attachments and other custom extensions for solving the real world problems. Other reasoning systems make use of the deductive reasoning for drawing inferences. Both the backward reasoning and forward reasoning are supported by the inference engines in order to draw conclusions through modus ponens. These reasoning methods are recursive and are called as backward chaining and forward chaining respectively.
Though majority systems use deductive inference, a small portion also uses inductive, abductive and defeasible reasoning methods. For finding acceptable solutions for intractable problems, heuristics might also be used. OWA (open world assumption) and CWA (closed world assumption) are used by reasoning systems. The first one is related with the semantic web and the ontological knowledge representation. Different systems have different approaches towards negation.
Apart from the bitwise and logical complement, other existing forms of negation both strong and weak (such as inflationary negation, negation – as - failure) are also supported by the reasoning systems. There are two types of reasoning that are used by reasoning systems namely monotonic reasoning and non – monotonic reasoning. There are many reasoning systems that are capable of reasoning under uncertainty. This is very useful particularly in situated reasoning agents that are used for dealing with the world’s uncertain representations. Some common approaches include:
- Probabilistic methods: Demster – Shafer theory, Bayesian inference, fuzzy logic etc.
- Certainty factors
- Connectionist approaches
Types of reasoning system are:
- Constraint solvers
- Theorem provers
- Logic programs
- Expert systems
- Rule engines

Tuesday, September 23, 2014

What is concurrency control with respect to databases?

Concurrency control is a technique applied in the fields of operating systems, databases, computer programming, and multi processing etc. for ensuring that concurrent operations produce the correct results in as less time as possible. Both the hardware and the software parts of the computer are made up of smaller modules and components where each of them is designed and programmed respectively to work correctly according to some consistency rules. There is concurrent interaction between these components through messages, shared data, which can lead to a result which is in violation of those rules.
The basic idea of concurrency control is to provide methodologies, theories and rules for enforcing consistency in the whole system. Implementation of concurrency control reduces the performance because we apply some constraints on the components which does have the effect of reducing the overall speed. However one thing that should be taken care of is to achieve consistency with as much efficiency as possible and without reducing the performance below minimum levels.
The drawbacks of concurrency control include additional complexity and generation of more overhead in using a concurrent algorithm. The concurrent algorithms generate more overhead when compared to their sequential algorithm counterparts. If the concurrency control mechanism fails, it can lead to torn read and write operations and can corrupt the data.
In this article we talk about concurrency with respect to the databases. Concurrency control is implemented in DBMS, distributed applications etc. for ensuring that the concurrent data transactions are accomplished without causing damage to the data integrity. Distributed applications include cloud computing and grid computing. Concurrency control is also used in some other transactional objects.
It is an essential part of the systems where two or more transactions overlap over the same time instant and can operate on the same data. This happens in almost any general purpose database management system.
Since the advent of database systems, research has been going on related to this concept. Serializability theory is the best established theory that helps define the concept of concurrency control.  This theory also lets us in designing as well as analyzing the concurrency control methods as well as mechanism as effectively as possible.  There is another theory that does not emphasize upon concurrency control over the abstract data types but rather over atomic transactions. However this theory though having a wider scope and more refined, it adds more complexity to the system. Both the theories have their advantages and disadvantages. Merging these two theories might help because they are complementary to some extent.
For ensuring proper concurrency control and correct execution of the transactions, only the serializable transactions and schedules are generated by the system and executed. In some cases, the serializability might be relaxed intentionally by the system for increasing the performance. But, this is done only in those cases where it won’t generate incorrect output.
There are many cases when the transactions fail. Here the system needs to have a recoverability property for recovering from the damage. A good database system also ensures that the result of the transactions that have committed is not lost if the system is switched off accidentally or crashes. On the other hand it also ensures that the incomplete results of the aborted transactions are erased and the actions are rolled back. The ACID rules (mentioned below) characterize the transactions:
- Atomicity: Each thread consists of a single transaction.
- Consistency: This characteristic depends largely on user.
- Isolation: Every transaction should be executed in isolation i.e., should not interfere with others.
 - Durability: The results of the committed actions should persist.
 Nowadays as, database systems are becoming more distributed, the focus is more upon the distribution of the concurrency control mechanism.  

Saturday, September 20, 2014

What are some concurrency control mechanisms (with respect to databases)?

In this article we discuss about different types of concurrency control mechanisms that are implemented in databases.
- Optimistic: This type of concurrency control method delays transaction checking. It does not check the integrity and isolation rules such as the recoverability, serializability etc. of the transaction until it has completed executing all of its associated operations. Also, any of the operations of the transaction are not blocked. If, upon commitment the transaction’s operations are violating these rules, the transaction is aborted. Immediately after being aborted, this transaction is executed again though this generates a restart and re – execution overhead. This mechanism is good to follow if there not too many abortions.
- Pessimistic: The concurrency control methods falling under this category block a transaction from carrying out any transaction if it is suspected to violate the rules. The transaction is not allowed to execute until the probability of rule violation becomes zero. However such prolonged blocking of the transactions reduces the performance drastically.
- Semi – optimistic: These concurrency control mechanisms consider blocking the transaction only in some situations where it is important to do so. Transactions are not blocked when rules are being checked as in optimistic concurrency control mechanisms.

The performance of these different types of concurrency control mechanisms is different. By this we mean they all have different throughputs (rate of transaction completion). This depends on various factors such as the level of parallelism, transaction types mix etc. The trade – offs between the various categories should be considered and the one providing highest performance in the particular situation should be chosen. Two transactions mutually locking each other result in a deadlock. In such a situation the involved transactions go on waiting forever and are not able to complete. The concurrency control mechanisms that are non – optimistic are observed to have more deadlocks. The transactions have to be aborted for resolving the deadlocks. All this deadlocks, blocking, resolving introduces delays in performance and these are the major trade – off factors between the types. Below we mention some major concurrency control methods which many variants falling under the above mentioned categories:
- Locking: Its variants include the two – phase locking (2PL). This mechanism facilitates the access to the locks on data acquired by the transactions. If a transaction tries to acquire a lock over a piece of data already locked on by another transaction, it is blocked till the latter transaction releases its lock. This however depends on the type of access operation and type of lock.
- Serialization graph checking or precedence graph checking: This mechanism checks out for any cycles in the graph of the schedule and if found breaks them by aborting the involved transactions.
- Timestamp ordering: The timestamps are assigned to the transactions and access to data is kept under control by constant checking and time stamping.
- Commitment ordering: The transactions are checked in the order of their commitment so as to maintain their compatibility with their precedence order.

There are some other concurrency control methods are used along with the above mentioned types:
- Index concurrency control: The access operations are synchronized with the indexes instead of synchronizing with the user data. Performance can be gained using specialized methods.
- Multi – version concurrency control or MVCC: Each time an object is written, it generates a new copy of that object so that the other transactions can still read the object. This increases concurrency without compromising with the performance.
- Private workspace model: A private workspace is maintained by each transaction for accessing the data. Any changes made to the data become visible to the outside transactions after the transaction commits. 

Friday, September 19, 2014

Best practices for concurrency control with respect to databases?

Today almost all the service – oriented businesses have grown highly dependent on reliable and speedy access to their data. Most of the global enterprises need this access to databases on a 24x7 level without interruptions. These reliability, availability and performance needs of the organizations are met by database management systems (DBMS). Thus, a DBMS is responsible for two things. Firstly, for protecting data that it stores and for providing correct, reliable and all – time access to this data. The concurrency control and recovery mechanisms of DBMS are responsible for carrying out these functions properly.
Concurrency control mechanism ensures that you get to see the execution of only your transaction even though 100s of users are accessing the database at the same time. The recovery mechanism ensures that the database is able to recover from all the faults. It is because of the existence of these functionalities that the programmers feel free to add new parts to the system without having much to worry about. A transaction is nothing but a unit of work which consists of several operations and updates. Every transaction is expected to obey the ACID rules. In this article we discuss about some best practices for concurrency control.
- Two phase locking: Locking is perhaps the most widely used technique for maintaining control over concurrency matters. This mechanism provides two types of locks namely, the shared lock (S), and the exclusive lock (X). The compatibility matrix defines the compatibility of these two locks. According to the compatibility matrix, S locks can be held by two different transactions at the same time but this is not possible in the case of X locks for the same data item. With this policy multiple read operations can be carried out concurrently. In other words read access to an item is protected by the S locks. On the other side, the write access is protected by the exclusive locks. In simple words, no other transaction can obtain a lock on a data item which has already been locked by another transaction if the two locks are conflicting. A transaction requesting for a lock at an instant when it cannot be granted, it is blocked by the mechanism until the other transaction releases its lock.
- Hierarchical locking: Practically, the notion of the conflicting locks works at different levels of granularity. Deciding for proper granularity for locking an item generates a locking overhead and might interfere with concurrency. Locking at the granularity of a tuple (one row) allows the system to keep concurrency at the maximum level. The disadvantage of this locking mechanism is that for a transaction to access multiple tuples, it needs to lock all those tuples. This will require issuing same number of calls to the lock manager generating a substantial overhead. This can be avoided by considering a coarser granularity but at the expense of more false conflicts.

The two phase locking is categorized under the pessimistic technique as it assumes that there will be interference among the transactions and takes measures against the same. Optimistic concurrency control provides an alternative to this. With this, the transactions can carry out the operations without having to acquire locks. For ensuring that there is no violation of serializability, a validation phase is performed before the transactions can commit. A number of optimistic protocols have been proposed. The validation process makes sure that the read and write operations of two transactions running concurrently do not conflict. If such a conflict is determined during validation, the transaction is immediately aborted and forced to restart. Thus, for ensuring isolation, optimistic mechanisms rely on restarting the transaction whereas the locking policies use blocking strategy.

Wednesday, September 17, 2014

What are some challenges with respect to database concurrency control?

During the sequential execution of the transactions, the execution time periods do not overlap each other and therefore the transaction concurrency does not exist. But if we allow interleaving of the transactions in a manner that is not controlled properly, we are bound to get undesirable results. There are many situations in which concurrency of a database might be harmed.
In this article we discuss such challenges. The transaction models based upon the ACID rules have proved to be quite durable in their course of time. They serve as a foundation for the present database and transaction systems. Many types of environments, parallel as well as distributed have used this basic model for implementing their own complex database systems. These environments however require some additional techniques such as the two - phase commit protocol. Though this model has been a great success, it suffers from few limitations.
The model lacks flexibility and thus is not able to model some particular kinds of interactions between the organization and the complex systems. Also in collaborative environments, a piece of data cannot be strictly isolated even if it is desirable. Also since the ACID transaction model suits well for the systems with short and simple transactions, it is not so appropriate for the workflow management systems.
Such systems require rich transaction models with multi – level notion. For other environments for e.g., the mobile wireless networks also this model does not suffice. In such environments expectations of having large disconnection period are higher. Then we have the internet which is a loosely coupled WAN (wide area network) which too cannot fully adopt ACID model because the availability is low. We require techniques which can help the ACID model in adjusting to such extremely varying situations.
Research is going on new techniques that help in maintaining concurrency control as well as recovering in dissemination – oriented environments, heterogeneous systems and so on. Another problem is that this model is not capable of exploiting the data and application semantics through a general mechanism. Knowledge about this can help a great deal in improving the performance of the system by a significant margin. A separate research has been going on over the subject of recovery and concurrency control. The DBMS’ recovery component is responsible for durability as well as atomicity of the ACID transactions. Distinguishing between the volatile storage and the non – volatile storage becomes absolutely necessary. The below mentioned three types of failure pose a challenge for the proper working of a DBMS:
- Transaction failure: In some cases it happens that the transaction during execution reaches a state from where it cannot commit successfully. In such cases all the updates made by that transaction have to be erased from the database as a measure of atomicity preservation. This is called transaction rollback.
- System failure: A system failure often causes a loss of volatile memory contents. It has to be ensured that the updates made by all the transactions before the occurrence of the crash persist in the database and the updates made by the unsuccessful transactions have been removed.
- Media failure: In these failures, the non – volatile storage gets corrupted which makes it impossible to recover the online version of the data. Here the option is to restore the database from an archive and using operation logs all the updates must be made.
Recovery from the last kind of failure requires using other additional mechanisms. For all this to take effect, the recovery component has to be reliable. Flaws in the recovery system of a database put it to a high risk of losing data.

Tuesday, September 16, 2014

What is 2 phase locking with respect to database concurrency control?

2PL or Two – phase locking is the most widely used concurrency control method. This method is used for implementing the serializability theory. The protocol makes use of two types of locks that the transactions apply to data which leads to blocking of the other transactions from accessing the same data item during the execution period of that transaction.
The 2PL protocol works in the following 2 phases:
- The expanding phase: In this phase the locks are only acquired and there is no releasing.
- The shrinking phase: In this phase the locks are only released and not acquired.

Shared locks (S) and exclusive locks (X) are the two types of lock used by this protocol. However many refinements have been produced of this protocol which utilize more than one type of lock. Since 2PL blocks processes using locks, it can lead to deadlocks as a result of the blocked transactions. SS2PL (strong strict two – phase locking) is combined to form the 2PL and is also known as rigorousness. It is mostly used for maintaining concurrency control in the database systems. The protocol has a number of variants, the most common being the strict 2PL, which is a combination of 2PL and strictness. A schedule that obeys the protocol is said to be serializable.
In a typical transaction, when the phase – 1 of transaction ends and there is no explicit information available, it is said to be in a ready state i.e., it can commit now without requiring any more locks. In such cases, we can end the phase-2 immediately or sometimes it might not even be required. In other cases where more than one processes are involved, we determining the end of phase – 1 and begin releasing of the locks with the help of a synchronization point. If this is not done, we violate the serializability and strict 2PL rules. But, determining such a transaction point is very costly and therefore the transaction end is merged with the end of phase – 1 eliminating the need of phase – 2.
Thus 2PL is turned in to SS2PL. In S2PL, the transactions must release their locks (X locks) after they have completed their write operation either by aborting or committing. The read locks (S) on the other hand are released on regular basis in phase 2.  Explicit phase – 1 end support is required for implementation of the general S2PL.
Strong strict 2Pl is also known as rigorous two – phase locking, rigorousness or rigorous scheduling and so on. Both the read and write locks are released after the completion of the transaction by the protocol. A transaction that complies with SS2PL is the one having only phase – 1 during its entire life time and no phase – 2. The class of schedules exhibiting the SS2PL property is called rigorousness. S2PL is a superset of SS2PL classes. This one has been the concurrency control mechanism choice for most of the database designers. The main advantage is that it provides strictness apart from serializability.
These two properties are very much necessary for an efficient recovery of the database as well as in commitment ordering. Global serializability and distributed serializability solutions are used for distributed environments. The down side of 2PL protocol is deadlocks. The data access operations are blocked by the locks resulting in a deadlock.  In this situation none of the blocked transactions can reach completion. Thus resolving the deadlocks effectively is a major issue. It can be resolved by aborting one of the locked transactions, thus eliminating the cycle in the precedence graph. The wait -  for -  graphs are used for detecting deadlocks. 

Sunday, September 14, 2014

What is Timestamp ordering method for database concurrency control?

Concurrency control methods with respect to the database systems can be divided in to two categories namely:
- Lock concurrency control methods: Example, 2PL, SS2PL etc.
- Non – lock concurrency control methods: Example, timestamp ordering method.
In this article we focus upon the latter one (Non – lock concurrency control methods). This method is widely used for handling transactions safely with the help of time stamps. Let us see how it operates.

The timestamp ordering mechanism makes three assumptions prior to operating:

- The value of every timestamp is unique and the time instant it represents is accurate.
- There are no duplicate timestamps.
- The lower – valued time stamp occurs before a higher – valued timestamp.

There are three methods that for generating the timestamps:

- It takes the values from the system clock as timestamp when the transaction starts.
- It uses a thread – safe shared counter as timestamp and it is incremented when the transaction starts.
- It combines the above two methods.

Formal: A transaction is considered to have an ordered list of operations. Before the first operation begins, the current timestamp is marked on the transaction. An initial empty set of transactions and empty set of objects is assigned to every transaction on which it depends and updates respectively. Two timestamp fields are given to each object which are meant to be used only for concurrency control.

Informal: A timestamp is assigned to the transaction at the very beginning. This makes it possible to tell in which order the transactions have to be applied. So when we have two transactions meant for operating on the same object, the one with the lower timestamp is executed first. However, if this transaction is incorrect, then it must be aborted immediately and restarted. The object has one read timestamp and one write timestamp, both of which are updated when the corresponding read and write operations are carried out.

Two cases are considered when an object has to be read by a transaction:
- The transaction starts prior to the write timestamp: This indicates that the data of the object was changed by something. The transaction is aborted and then restarted.
- The transactions starts after the write timestamp: The transaction can safely read the object. The read timestamp is changed to transaction timestamp.

The following cases are considered when an object has to be written or updated by a transaction:
- Transaction starts prior to read timestamp: This indicates that the object has been read by something. Assuming that the reading transaction has a copy of the data, we don’t write to it for preventing changes from being made to the copy. So we abort and restart the transaction.
- Transaction starts prior to the write timestamp: This means that the object was changed by something at the starting time of our transaction. Here we apply the Thomas write rule, skipping the current operation and then continuing as normal. Aborting and restarting is not required.
- The transaction changes the object and its timestamp is written over by the write timestamp.

Recoverability: In this concurrency control method, recoverable histories are not produced. To make recovery possible, we have to employ a scheduler for keeping a list of transactions having read from. Unless there are only committed transactions in the list, a transaction should not be allowed to commit. Also the data produced by uncommitted transactions can be tagged as dirty and read operations should be banned from using such data as a measure against cascading aborts. The scheduler should not permit transactions to carry out any operations on dirty data so as to maintain a strict history. 

Friday, September 12, 2014

What are some major goals of database concurrency control?

Serial or sequential execution of transactions that access the database have no overlapping of time periods and therefore, no concurrency can be achieved in database systems following such a simple execution mechanism. But at the same time, if you start working out the combinations, if concurrency does exist by allowing interleaving of the operations of the transactions, we risk getting some undesirable results because of the improper control over concurrency. Below we give some examples:
- The dirty read problem: One transaction say A reads a data item which has been already written by another transaction say B which later aborted. This value is a result of an aborted operation and therefore is not valid and should not be read by other transactions. This is called dirty read. As a result of this, the transaction B will produce incorrect results.
- The lost update problem: A transaction B writes a data item for the first time which has already been written by another concurrent transaction A while still in progress. In this case we lose the value written by transaction B to transaction A because of overwriting.  According to the rules of precedence the first value should be read by the transactions first before the second value. As a consequence of this, the transactions yield wrong results.
- The incorrect summary problem: We have one transaction A in execution that considers all the values of the multiple instances of a single data item and we have another transaction B whose operations change the value of some of the instances. Therefore the end result does not reflect the correct summary. This is necessary for correctness.  Also it is possible that certain results might have not been included in the summary depending on the time instances at which the updates were made.

The database systems that require high transactional performance need that there concurrent transaction are executed properly so as to meet the goals. In fact, for modern businesses, a database cannot even be considered which does not meet such goals. Now what are these goals? Let us see below:
- Correctness: For attaining this goal, it is important that the system allows the execution of only the serializable schedules. If there is no serializability, we might face the above listed problems. Serializability can be defined as the equivalency of a schedule to some serial schedule having the same transactions. The transactions must be sequential in nature void of any time overlaps and isolated. The highest isolation level can be obtained only through serializability. In some cases, serializability might be cut down for allowing system to give better performance. It might also be relaxed in distributed environments for satisfying their availability. But this is done only on the condition that there is no compromise with correctness. A practical example is of the transactions involving money. If we relax serializability here, money can be transferred to the wrong account. Serializability is achieved by concurrency control mechanisms by means of the conflict serializability. This is nothing but a special case of serializability.
- Recoverability: This is another major goal which insists that after a crash the database must be able to recover efficiently without losing the effects of the successfully committed transactions. Recoverability ensures that the database system is able to tolerate the faults and might not get corrupted because of the media failure. The responsibility of the protection of the data that system stores in its database is given to the recovery manager. This component works together with the concurrency control component for providing an all time available and reliable access to data. Together these two components keep the database system in a consistent state.

The database systems must ensure that the integrity of the transaction is not affected by any kind of fault. In the modern world, with high frequency of transactions, and large monetary values at stake, any problem or violation of the above goals can be very expensive.

Thursday, September 11, 2014

Database access: What is Multi - version concurrency control?

A common concurrency control method used by the database management systems is the multiversion concurrency control often abbreviated as MVCC or MCC. The method helps in regulating the probelm causing situation of concurrent access to the database. Plus this concurrency control method is used for implementing transactional memory in the programming languages. When a read operation and another write operation is being carried out on the same piece of data item in a database, the database is likely to appear as inconsistent to the user. Using such half – written piece of data further is dangerous and can lead to system failure.
Concurrency control methods provide the simplest solution to this problem. These make sure that no read operation is carried out until the write operation is complete. In this case, the data item is said to have been locked by the write operation. But this traditional method is quite slow, with processes waiting for read access having to wait. So we use a different approach to concurrency control as provided by MVCC. At a given instant of time, a snapshot of the database is shown to each connected user.
Changes made by a write operation won’t be visible to the other database users until it is done completely. In this approach when an item has to be updated, the old data is not overwritten with the new data. Instead the old data is marked as obsolete and the new data is added to the database. Thus, the database has multiple versions of the same data but out of them only one is up to date. With this method the users are able to access the data that they were reading when they began the operation. Now it doesn’t matter whether the data was modified or deleted by some other thread. Though this method avoids the overhead generated by system in filling memory holes, it does require the system to periodically inspect the database and remove the obsolete data.
In case of the document – oriented databases, this method allows optimization of these documents by storing them on to a contiguous memory section of the disk. If any update is made to the document, it is simply rewritten. Thus, having multiple pieces of the documented maintained through links is not required. One can have point in time views in MVCC at a certain consistency. Any read operations carried out under MVCC make use of either transaction ID or a timestamp for determining what state the database is in. this avoids the need for lock management or  reading the transactions by isolating the write operations.
Only the future version of the data is affected by the write operation whereas the transaction ID on which the read operation is being carried out remains consistent because it’s a later transaction ID on which the transaction ID is working. The transactional consistency is achieved by MVCC by means of increasing IDs or timestamps. Another advantage of this concurrency control method is that no transaction has to wait for an object since it maintains many versions of the same data object. Each version is labeled with a timestamp or an ID. But, MVCC fails too at certain points. First of all true snapshot isolation cannot be achieved by MVCC. In some cases read – read anomalies and skew write anomalies also surface. There are two solutions to these anomalies namely:

- Serializable snapshot isolation and
- Precisely serializable snapshot isolation

But these solutions work at the cost of transaction abortion. Some databases which use MVCC are:
- ArangoDB
- Bigdata
- CouchDB
- Cloudant
- HBase
- Altibase
- Ingres
- Netezza

Sunday, September 7, 2014

Some tools that can be used in Test Driven development

Here we present a list of tools that can be used for carrying out Test Driven Development (TDD). The test driven development is equivalent to the Big upfront designing. The efficiency and the development speed both are improved in this process. Finding mistakes is quite fast and cheap in TDD. The iterations are short and therefore provide a means for frequent feedback. The test suites are up to date and executable and serve as a complete documentation in every sense. The tests are written before the code. Then the code is refactored to produce high quality code.

Tools for TDD are:
• JUnit: this is the java’s unit – testing framework.
• Some commonly used refactoring tools are RefactorIT, IntelliJ IDEA, Eclipse, Emacs and so on.
• HTTPUnit: This is a black box web testing tool and is used for automated web site testing.
• Latka: A java implemented functional testing tool. This tool can be used with JUnit or Tomcat.
• Cactus: A server – side unit testing tool provides a simple framework for unit testing on server side. It also extends on JUnit. The tool provides three types of unit tests i.e., the code logic unit testing, functional unit testing and the integration unit testing.
• Abbot and Jemmy: Tools for GUI testing. The first one keeps a scripted control on actions based on high level semantics. Jemmy is more powerful and provides full support for swing.
• Xmlunit: This tool lets you draw comparisons between 2 xml files.
• Ant: The tool is based up on java and xml. It is completely platform independent.
• Anteater: An Ant based testing framework. It is used for writing tests that check the functionality of web services and web applications.
• Subversion: This one is a replacement for CVS and controls directories in a better way.
• Jmock: The tool uses mock objects that are dummy implementations of the real functionality and are used for checking the behaviour of the code. The non – trivial code cannot be tested in isolation. Unit tests can be used for everything right from simplification of test structure and cleaning up domain code. You should start from testing one feature at a time and rooting out the problems one by one. Carrying out the testing normally without using any mock objects can be hard. Before carrying out the test, you should decide what needs to be verified. Then you must show that it passes the test. Only after this, the mock objects can be added for representing these concepts.


The above mentioned tools are used for writing unit tests in Test Driven development. However, these unit tests are bound to have errors. Also the unit tests are not apt for finding errors resulting because of interactions between different units. The path to success involves keeping things as simple as possible.
The end result of TDD is highly rewarding. The program design is organic and consists of loosely coupled components. All that you think can go wrong should be tested by proceeding in baby steps. One thing to be taken care of is that the unit tests should run at their 100%. Refactoring is not about changing the behaviour of the code. It is about making improvements to the internal code structure. It should be carried out to improve the quality, maintainability and reliability of the code.

What is the difference between Test Driven development and Acceptance Test Driven development?

Test driven development (TDD) and Acceptance Test Driven Development (ATDD) are related to each other. TDD is a developer’s best tool for helping in creating properly written code units i.e., modules, classes and functions, that carry out the desired function properly. On the other side, ATDD acts as a tool for communication between the developers, testers and the customers to define requirements. Another difference between TDD and ATDD is that the former requires test automation while ATDD does not. But ATDD might require automated regression testing. The tests written for ATDD can be used for deriving tests for TDD. This is so because a part of the requirement is implemented in the code. It is important that the tests must be readable by the customers, but it is not required for TDD tests.
TDD involves writing the test cases and making them fail before the code is written. Next enough code is written to make the tests pass and then refactor it as per the requirements. The tests must pass after it. The primary focus of TDD is on methods and single classes i.e., the low level functionality. This leads to much flexible code. Now what is the need for ATDD? The reason is that the TDD just tells you that your code is fine but it doesn’t tell you why that piece of code is even required. However, in ATDD, in the early stages of the software development process itself the acceptance criteria are stated. In the succeeding stages of development process this criteria is used for guiding the development. ATDD is more like a collaborative activity engaging everyone from developers and testers to business analysts and product owners. It ensures the implementation process is understood by everyone involved in the development process.
There is a third thing called the BDD or the behaviour driven development. This one is quite similar to TDD except that we call tests as specs. The focus of the BDD is on system’s behaviour rather than its implementation details. It also focuses on the interactions taking place in software development. Developers following this approach make use of two languages i.e., the domain language and their native language. TDD consists of unit tests while ATDD uses acceptance tests. Plus, the focus is on high level. BDD is required for making the product more focussed on customer. BDD allows the collaboration of developers with other stakeholders easy. Using TDD techniques and tools requires technical skills because you have to know the details of the object. Non – technical stakeholders would be lost while following the unit tests of TDD. BDD provides a clearer view of the purpose of the system. TDD provides this view to the developers.
The result of the methodology of ATDD is entirely based up on the communication that takes place between the developers, testers and their customers. A number of practices are involved in ATDD such as Behavior driven testing, specification by example, story test – driven development, example – driven development and so on. With these processes, the developers are able to understand the needs of the customers before the actual implementation. The difference between TDD and ATDD arises because of the latter’s emphasis on the collaboration between the three i.e., developers, testers and customers. The acceptance testing is encompassed in ATDD but in one way it is similar to TDD i.e., it insists on writing the tests before coding can be started. These tests provide an external view of the system from the point of view of a user.

Monday, September 1, 2014

What is Commitment ordering method for database concurrency control?

Commitment ordering or CO is a class that consists of techniques for implementing interoperable serializability in concurrency control mechanism of the transaction processing system, database systems and other applications related to database management. With the use of commitment ordering methods we can have non – blocking or optimistic implementations. With the advent of multi – processor CPUs, there has been a tremendous increase in the employment of CO in transactional memory (software transactional memory to be particular) and concurrent programming. In these fields CO is used as a means for having non – blocking serializability.
In a schedule that is CO compliant, there is compatibility between the chronological order of the events and precedence order of respective transactions. Conflict serializability when viewed with a broad meaning is nothing but CO. it is highly effective, offers high performance, reliability, distributable, scalable etc.; with these qualities it is a great way of achieving modular serializability across a heterogeneous database systems collection i.e., the one which contains database systems employing different concurrency control methods. A database system that is not CO compliant is linked to a CO component such as COCO – commitment order coordinator. The purpose of this component is put the commitment events in order so as to make the system CO compliant. This also removes access to data and any interference in the operation of transactions. All this leads to reduction of overhead and we get an appropriate solution for distributed serializability and global serializability. A fundamental part of this solution is the atomic commitment protocol or ACP which is used in breaking the cycles present in the conflict graph. This graph can either be a serializability graph or a precedence graph. If the concurrency control information is not shared among the involved database systems beyond ACP messages or if they don’t have any knowledge about the transactions, then for achieving global serializabiolity, CO becomes the absolutely necessary condition.
Another advantage of CO is that its local concurrency information distribution is not costly. This information includes Timestamps, tickets, relations, locks and the local precedence relations etc. it makes use of SS2PL property. SS2PL used with 2PC (two – phase commit protocol) becomes the de facto standard through which global serializability can be achieved. This also creates a transparent process through which the other CO compliant systems can join such global solutions. When a multi – database environment is based upon commitment ordering, the global deadlocks can be resolved automatically without requiring human intervention. This is an important benefit of having CO compliant systems. There is another concept where we intersect CO and strictness called as the strict commitment ordering or SCO. This results in a better overall throughput, shorter execution times for transactions and thus better performance when compared to the traditional SS2PL. The positive impact of having SCO can be felt during lock contention. The same database recovery mechanism can be used by both SCO and SS2PL by virtue of the strictness property. Today we have two major variants of CO namely:
- CO – MVCO and
- CO – ECO
The first one is the multi - version and the second one is called the extended version. Any concurrency control method that is relevant can be combined with these two for employing non – blocking implementations. Both make use of additional information for making relaxations to the constraints and for better performance. A technique is used by CO and variants called the Vote Ordering or VO – a container schedule set. In case of absence of concurrency control information sharing, global serializability can be guaranteed only if there is local VO. The inter – operation of CO and variants is quite transparent which makes automatic deadlock resolution possible also in the heterogeneous environments. 

Facebook activity