Sunday, September 6, 2009

Deductive Object-Oriented Databases

Deductive object-oriented databases (DOODs) came about through the integration of the OO paradigm and logic programming. The following broad approaches have been adopted in the design of DOOD systems:
- Language Extension: An existing deductive language model is extended with object-oriented features. For example, Datalog is extended to support object identity, inheritance, and other OO features.
- Language Integration: A deductive language is integrated with an imperative programming language in the context of an object model or type system. The resulting system supports a range of standard programs, while allowing different and complementary programming paradigms to be used for different tasks, or for different parts of the same task.
- Language Reconstruction: An object model is reconstructed, creating a new logic language that includes object-oriented features. In this strategy, the goal is to develop an object logic that captures the essentials of the object-oriented paradigm and that can also be used as a deductive programming language in DOODs.

VALIDITY combines deductive capabilities with the ability to manipulate complex objects. The ability to declaratively specify knowledge as deduction and integrity rules brings knowledge independence. Moreover, the logic-based language of deductive databases enables advanced tools, such as those for checking the consistency of a set of rules, to be developed. VALIDITY provides the following:
- A DOOD data model and language, called DEL (Datalog Extended Language).
- An engine operating in a client-server model.
- A set of tools for schema and rule editing, validation, and querying.

The DEL data model provides object-oriented capabilities, similar to those offered by the ODMG data model, and includes both declarative and imperative features. The declarative features include deductive and integrity rules, with full recursion, stratified negation, disjunction, grouping, and quantification. The imperative features allow functions and methods to be written.
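To make the declarative side concrete, here is a minimal sketch (in Python rather than DEL, whose syntax is not shown here) of how a deductive engine can evaluate a recursive rule bottom-up until a fixpoint is reached; the parent/ancestor relation and the facts are purely illustrative, not part of VALIDITY.

# Hypothetical illustration of bottom-up (fixpoint) evaluation of the recursive rules
#   ancestor(X, Y) :- parent(X, Y).
#   ancestor(X, Z) :- parent(X, Y), ancestor(Y, Z).
# written in Python rather than DEL.

parent_facts = {("alice", "bob"), ("bob", "carol"), ("carol", "dave")}

def ancestors(parent):
    """Compute the ancestor relation as the least fixpoint of the two rules."""
    ancestor = set(parent)                    # base rule
    while True:
        derived = {(x, z)
                   for (x, y) in parent
                   for (y2, z) in ancestor
                   if y == y2}                # recursive rule
        if derived <= ancestor:               # fixpoint reached, nothing new derived
            return ancestor
        ancestor |= derived

print(sorted(ancestors(parent_facts)))        # includes ('alice', 'dave'), etc.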


What is the ARIES Recovery Algorithm?

ARIES ('Algorithms for Recovery and Isolation Exploiting Semantics') is a recovery algorithm designed to work with a no-force, steal buffer-management approach; it is used by IBM DB2, Microsoft SQL Server, and many other database systems.

Three main principles lie behind ARIES:
- Write-ahead logging: Any change to an object is first recorded in the log, and the log must be written to stable storage before changes to the object are written to disk (a small sketch of this rule follows the list).
- Repeating history during Redo: On restart after a crash, ARIES retraces the actions of the database before the crash and brings the system back to the exact state it was in at the time of the crash. It then undoes the transactions that were still active at crash time.
- Logging changes during Undo: Changes made to the database while undoing transactions are logged to ensure such an action isn't repeated in the event of repeated restarts.
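A minimal sketch of the write-ahead logging rule, with plain Python lists and dicts standing in for the stable log, the buffer pool, and the disk; this illustrates the ordering constraint only, not ARIES itself.

log_on_stable_storage = []   # stands in for the log on stable storage
pages_on_disk = {}           # stands in for the database on disk

log_buffer = []              # log tail still in memory
buffer_pool = {}             # pages cached in memory

def update(page_id, new_value):
    """Apply an update to a buffered page, logging it first."""
    log_buffer.append(("update", page_id, buffer_pool.get(page_id), new_value))
    buffer_pool[page_id] = new_value

def flush_page(page_id):
    """WAL rule: force the log to stable storage before the changed page reaches disk."""
    log_on_stable_storage.extend(log_buffer)
    log_buffer.clear()
    pages_on_disk[page_id] = buffer_pool[page_id]

update("P1", "new contents")
flush_page("P1")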

The ARIES recovery procedure consists of three main steps:
- Analysis: Identifies the dirty (updated) pages in the buffer and the set of transactions active at the time of the crash. The appropriate point in the log where the REDO operation should start is also determined.
- REDO phase: Reapplies updates from the log to the database. Generally, the REDO operation is applied only to committed transactions; in ARIES, however, this is not the case. Certain information in the ARIES log provides the start point for REDO, from which REDO operations are applied until the end of the log is reached. Thus only the necessary REDO operations are applied during recovery.
- UNDO phase: The log is scanned backwards, and the operations of transactions that were active at the time of the crash are undone in reverse order.

The information needed for ARIES to accomplish its recovery procedure includes the log, the transaction table, and the dirty page table. In addition, checkpointing is used.
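A structural sketch of the three passes, assuming a simplified log of dicts and omitting details such as compensation log records; it shows the shape of the procedure, not the real algorithm.

def analysis(log, chkpt_trans_table, chkpt_dirty_pages):
    """Rebuild the transaction table and dirty page table from the checkpoint onward."""
    trans_table = dict(chkpt_trans_table)
    dirty_pages = dict(chkpt_dirty_pages)
    for rec in log:                                   # forward scan from the checkpoint
        trans_table[rec["trans_id"]] = rec["lsn"]
        if rec["type"] == "update" and rec["page_id"] not in dirty_pages:
            dirty_pages[rec["page_id"]] = rec["lsn"]  # RecLSN of the page
        if rec["type"] in ("commit", "abort"):
            del trans_table[rec["trans_id"]]
    redo_start = min(dirty_pages.values(), default=None)
    return trans_table, dirty_pages, redo_start

def redo(log, redo_start, pages):
    """Repeat history: reapply updates whose effects are not yet reflected in the page."""
    if redo_start is None:
        return
    for rec in log:
        if rec["type"] == "update" and rec["lsn"] >= redo_start:
            page = pages.setdefault(rec["page_id"], {"page_lsn": -1, "data": None})
            if page["page_lsn"] < rec["lsn"]:         # compare with the PageLSN
                page["data"], page["page_lsn"] = rec["after"], rec["lsn"]

def undo(log, loser_trans_ids, pages):
    """Roll back transactions that were active at crash time, in reverse LSN order."""
    for rec in reversed(log):
        if rec["type"] == "update" and rec["trans_id"] in loser_trans_ids:
            pages[rec["page_id"]]["data"] = rec["before"]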

DATA STRUCTURES USED IN THE ARIES RECOVERY ALGORITHM:
Log records contain the following fields:
- LSN
- Type (CLR, update, special)
- TransID
- PrevLSN (LSN of prev record of this txn)
- PageID (for update/CLRs)
- UndoNxtLSN (for CLRs)
* Indicates which log record is being compensated.
* On later undos, log records up to UndoNxtLSN can be skipped.
- Data (redo/undo data); can be physical or logical.
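As a rough illustration, the fields above might be represented as follows (the Python types are assumptions for the sketch, not part of ARIES):

from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class LogRecord:
    lsn: int                             # log sequence number
    rec_type: str                        # "update", "CLR", or a special record type
    trans_id: int
    prev_lsn: Optional[int]              # LSN of the previous record written by this txn
    page_id: Optional[int] = None        # only for update records and CLRs
    undo_nxt_lsn: Optional[int] = None   # only for CLRs: next record to be undone
    data: Any = None                     # redo/undo data; physical or logical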

Transaction Table:
- Stores for each transaction:
* TransID, State.
* LastLSN (LSN of last record written by txn).
* UndoNxtLSN (next record to be processed in rollback).
- During recovery:
* Initialized during analysis pass from most recent checkpoint.
* Modified during analysis as log records are encountered, and during undo.
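A matching sketch of a transaction-table entry; a dictionary keyed by TransID stands in for the table itself, and the field types are assumptions for illustration.

from dataclasses import dataclass
from typing import Optional

@dataclass
class TransTableEntry:
    trans_id: int
    state: str                       # e.g. "in progress", "committed", "aborted"
    last_lsn: Optional[int]          # LSN of the last record written by the txn
    undo_nxt_lsn: Optional[int]      # next record to be processed in rollback

transaction_table = {}               # TransID -> TransTableEntry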

Dirty Page Table:
- During normal processing:
* When a page is fixed with the intention to update:
  Let L = the current end-of-log LSN (the LSN of the next log record to be generated).
  If the page is not dirty, store L as the RecLSN of the page in the dirty page table.
* When a page is flushed to disk, it is deleted from the dirty page table.
* The dirty page table is written out during checkpoints.
* (Thus RecLSN is the LSN of the earliest log record whose effect may not yet be reflected in the page on disk.)
- During recovery:
* Load the dirty page table from the checkpoint.
* Update it during the analysis pass as update log records are encountered.
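A small sketch of that bookkeeping during normal processing; current_end_of_log_lsn is passed in as an assumption rather than read from a real log manager.

dirty_page_table = {}   # PageID -> RecLSN

def on_fix_for_update(page_id, current_end_of_log_lsn):
    """First update since the last flush: remember where redo for this page must start."""
    if page_id not in dirty_page_table:
        dirty_page_table[page_id] = current_end_of_log_lsn   # RecLSN

def on_page_flushed(page_id):
    """The page on disk is now current, so it is no longer dirty."""
    dirty_page_table.pop(page_id, None)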

Checkpoints:
- Begin_chkpt record is written first.
- The transaction table, the dirty page table, and some other file management information are written out.
- End_chkpt record is then written out.
* For simplicity, all of the above are treated as part of the end_chkpt record.
- The LSN of the begin_chkpt record is then written to the master record in a well-known place on stable storage.
- Incomplete checkpoint:
* If the system crashes before the end_chkpt record is written, the checkpoint is incomplete.
- Pages need not be flushed during checkpoint
* They are flushed on a continuous basis.
- Transactions may write log records during checkpoint.
- The dirty page table can be copied fuzzily (hold the latch, copy some entries out, release the latch, repeat).
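A sketch of writing such a fuzzy checkpoint, with a Python list standing in for the stable log and a dict for the master record; the record layout is an assumption for illustration.

stable_log = []        # stands in for the log on stable storage
master_record = {}     # stands in for the well-known location on stable storage

def take_checkpoint(transaction_table, dirty_page_table, next_lsn):
    """Write begin_chkpt, the copied tables (inside end_chkpt), then update the master record."""
    begin_lsn = next_lsn
    stable_log.append({"lsn": begin_lsn, "type": "begin_chkpt"})
    # The tables are copied while normal processing continues, so the copy
    # may be slightly out of date ("fuzzy"); that is acceptable.
    stable_log.append({"lsn": begin_lsn + 1,
                       "type": "end_chkpt",
                       "trans_table": dict(transaction_table),
                       "dirty_pages": dict(dirty_page_table)})
    master_record["last_begin_chkpt"] = begin_lsn
    return begin_lsn + 2   # next LSN to hand out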


Database Recovery Techniques - Overview

The main goal of recovery is to ensure the atomicity property of a transaction. If a transaction fails before completing its execution, the recovery mechanism has to make sure that the transaction has no lasting effects on the database. The different approaches to recovery are:
- Deferred Update: These techniques postpone any actual updating of the database on disk until a transaction reaches its commit point. The transaction force-writes the log to disk before recording the updates in the database. This approach, when used with certain concurrency control methods, is designed never to require transaction rollback, and recovery simply consists of redoing, from the log, the operations of transactions committed after the last checkpoint (a small sketch follows this list).
Disadvantage: Too much buffer space may be needed, since updates are kept in the buffers and are not applied to disk until the transaction commits.
Deferred update leads to a recovery algorithm known as NO-UNDO/REDO.
- Immediate Update: These techniques may apply changes to the database on disk before the transaction reaches a successful conclusion. Any changes applied to the database must first be recorded in the log and force-written to disk so that these operations can be undone if necessary.
- Another algorithm, known as UNDO/NO-REDO, can also be developed for immediate update if all transaction actions are recorded in the database before commit.
- There is another technique, called shadow paging, which keeps track of old database pages by using a shadow directory. This technique, which is classified as NO-UNDO/NO-REDO, does not require a log in single-user systems but still needs the log for multiuser systems.
- ARIES is a specific recovery scheme used in some of IBM's relational database products.
- The two-phase commit protocol is used for recovery from failures involving multidatabase transactions.
- Recovery from catastrophic failures is typically done by backing up the database and the log to tape. The log can then be backed up more frequently than the database, and the backup log can be used to redo operations starting from the last database backup.
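A minimal sketch of the deferred-update (NO-UNDO/REDO) idea, with a list standing in for the stable log and a dict for the database; the names and record format are illustrative only.

stable_log = []   # stands in for the log on stable storage
database = {}     # stands in for the database on disk

def run_transaction(txn_id, writes):
    """Buffer all updates, force the log at commit, then apply the updates."""
    pending = [(txn_id, item, value) for item, value in writes.items()]
    # Commit point: force-write the update records and the commit record first ...
    stable_log.extend(pending)
    stable_log.append((txn_id, "commit"))
    # ... then apply the deferred updates; they can be redone from the log if needed.
    for _, item, value in pending:
        database[item] = value

run_transaction("T1", {"x": 5, "y": 7})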


How is shadow paging performed?

Shadow paging is an alternative to log-based recovery techniques, with both advantages and disadvantages: it may require fewer disk accesses, but it is hard to extend to allow multiple concurrent transactions. The page table used in shadow paging is very similar to the page tables used by operating systems for memory management.

Shadow paging is performed as follows:
- To commit a transaction:
* Flush all modified pages in main memory to disk.
* Output current page table to disk.
* Make the current page table the new shadow page table, as follows:
1. Keep a pointer to the shadow page table at a fixed (known) location on disk.
2. To make the current page table the new shadow page table, simply update the pointer to point to the current page table on disk.
- Once the pointer to the shadow page table has been written, the transaction is committed.
- No recovery is needed after a crash: New transactions can start right away, using the shadow page table.
- Pages not pointed to from current/shadow page table should be freed (garbage collected).
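A sketch of the commit steps above, with Python dicts standing in for the on-disk structures; the page numbering and the pointer location are simplified assumptions.

disk = {
    "pages": {1: "old A", 2: "old B"},          # physical database pages on disk
    "page_tables": {"v0": {"A": 1, "B": 2}},    # page tables stored on disk
    "pointer": "v0",                            # fixed, well-known location: the shadow table
}

def commit(current_page_table, modified_pages):
    """Flush modified pages, write the current page table, then switch the pointer."""
    # 1. Flush all modified pages from main memory to their (unused) disk pages.
    for logical_page, contents in modified_pages.items():
        disk["pages"][current_page_table[logical_page]] = contents
    # 2. Output the current page table to disk.
    disk["page_tables"]["v1"] = current_page_table
    # 3. Atomic pointer update: the current table becomes the new shadow table.
    disk["pointer"] = "v1"    # the transaction is committed at this instant

# The transaction rewrote logical page "A"; its copy was placed on unused physical page 3.
commit(current_page_table={"A": 3, "B": 2}, modified_pages={"A": "new A"})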

Advantages of shadow-paging technique:
- No overhead of writing log records.
- Recovery is trivial.

Disadvantages of the shadow-paging technique:
- Commit overhead: The commit of a single transaction using shadow paging requires multiple blocks to be output -- the current page table, the actual data and the disk address of the current page table. Log-based schemes need to output only the log records.
- Data fragmentation: Shadow paging causes database pages to change locations (and therefore no longer be contiguous).
- Garbage collection: Each time a transaction commits, the database pages containing the old versions of data changed by the transaction become inaccessible. Such pages are considered garbage, since they are not part of the free space and do not contain any usable information. Periodically, it is necessary to find all of the garbage pages and add them to the list of free pages. This process, called garbage collection, imposes additional overhead and complexity on the system.


Overview of Shadow Paging

A computer system, like any other mechanical or electrical system, is subject to failure. There are a variety of causes, including disk crash, power failure, software errors, a fire in the machine room, or even sabotage. Whatever the cause, information may be lost. The database system must take actions in advance to ensure that the atomicity and durability properties of transactions are preserved. An integral part of a database system is a recovery scheme that is responsible for restoring the database to the consistent state that existed prior to the failure.

Shadow paging is a technique used to achieve atomic and durable transactions, and provides the ability to manipulate pages in a database. During a transaction, the pages affected by the transaction are copied from the database file into a workspace, such as volatile memory, and modified in that workspace. When a transaction is committed, all of the pages that were modified by the transaction are written from the workspace to unused pages in the database file. During execution of the transaction, the state of the database exposed to the user is the state in which the database existed prior to the transaction, since the database file still contains the original versions of the modified pages as they existed before being copied into the workspace. If a user accesses the database before the transaction is complete, or upon recovery from a failure, it appears as though the transaction has not occurred.

- Shadow paging is an alternative to log-based recovery; this scheme is useful if transactions execute serially.
- Basic Idea: Maintain two page tables during the lifetime of a transaction – the current page table, and the shadow page table.
- Store the shadow page table in nonvolatile storage, so that the state of the database prior to transaction execution can be recovered.
* The shadow page table is never modified during execution.
- Initially, both page tables are identical. Only the current page table is used for data item accesses during execution of the transaction.
- Whenever any page is about to be written for the first time:
* A copy of this page is made onto an unused page.
* The current page table is then made to point to the copy.
* The update is performed on the copy.
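A sketch of that copy-on-write step, with dicts standing in for the page tables and disk pages; allocate_unused_page is a hypothetical stand-in for the free-space manager.

shadow_page_table = {"A": 1, "B": 2}
current_page_table = dict(shadow_page_table)   # starts identical to the shadow table
disk_pages = {1: "old A", 2: "old B"}
next_free_page = [3]

def allocate_unused_page():
    page = next_free_page[0]
    next_free_page[0] += 1
    return page

def write(logical_page, contents):
    """Copy the page on its first write, repoint the current page table, then update the copy."""
    if current_page_table[logical_page] == shadow_page_table[logical_page]:
        new_page = allocate_unused_page()
        disk_pages[new_page] = disk_pages[current_page_table[logical_page]]
        current_page_table[logical_page] = new_page
    disk_pages[current_page_table[logical_page]] = contents

write("A", "new A")   # current table now maps A -> 3; the shadow table still maps A -> 1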


Wednesday, August 26, 2009

Shortest-Job-First (SJF) Scheduling

Shortest-Job-First (SJF) is a non-preemptive discipline in which the waiting job (or process) with the smallest estimated run-time-to-completion is run next. In other words, when the CPU is available, it is assigned to the process that has the smallest next CPU burst. SJF scheduling is especially appropriate for batch jobs whose run times are known in advance. Since the SJF scheduling algorithm gives the minimum average waiting time for a given set of processes, it is provably optimal. The SJF algorithm favors short jobs (or processes) at the expense of longer ones.

Example:
Process   Burst time   Arrival time
P1        6            0
P2        8            0
P3        7            0
P4        3            0

Gantt chart (SJF runs the shortest burst first):
| P4 | P1 | P3 | P2 |
0    3    9    16   24

Average waiting time: (3 + 16 + 9 + 0)/4 = 7  (P1 waits 3, P2 waits 16, P3 waits 9, P4 waits 0)
With FCFS in order P1, P2, P3, P4: (0 + 6 + (6+8) + (6+8+7))/4 = 10.25
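The example can be reproduced with a short sketch (all processes arrive at time 0, so ordering by burst time is enough):

processes = [("P1", 6), ("P2", 8), ("P3", 7), ("P4", 3)]   # (name, burst time)

def average_waiting_time(order):
    """Each process waits for the total burst time of everything scheduled before it."""
    waiting, elapsed = {}, 0
    for name, burst in order:
        waiting[name] = elapsed
        elapsed += burst
    return waiting, sum(waiting.values()) / len(waiting)

sjf_order = sorted(processes, key=lambda p: p[1])   # shortest burst first
print(average_waiting_time(sjf_order))   # ({'P4': 0, 'P1': 3, 'P3': 9, 'P2': 16}, 7.0)
print(average_waiting_time(processes))   # FCFS order: average 10.25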

Problem: SJF minimizes the average waiting time because it services small processes before it services large ones. While it minimizes average waiting time, it may penalize processes with large service-time requests. If the ready list is saturated, then processes with large service times tend to be left in the ready list while small processes receive service. In the extreme case, where the system has little idle time, processes with large service times will never be served. This total starvation of large processes may be a serious liability of this algorithm.


CPU / Process Scheduling

CPU scheduling is the basis of multiprogrammed operating systems. By switching the CPU among processes, the operating system can make the computer more productive. A multiprogramming operating system allows more than one process to be loaded into executable memory at a time and allows the loaded processes to share the CPU using time multiplexing.

Goals for Scheduling:
* Utilization/Efficiency: keep the CPU busy 100% of the time with useful work.
* Throughput: maximize the number of jobs processed per hour.
* Turnaround time: from the time of submission to the time of completion, minimize the time batch users must wait for output.
* Waiting time: minimize the waiting time, i.e., the sum of the times spent in the ready queue.
* Response time: the time from submission until the first response is produced; minimize response time for interactive users.
* Fairness: make sure each process gets a fair share of the CPU.

Some goals can be met by incorporating a notion of priority into a “base” scheduling discipline. Each job in the ready pool has an associated priority value; the scheduler favors jobs with higher priority values.
External priority values:
• imposed on the system from outside.
• reflect external preferences for particular users or tasks.
“All jobs are equal, but some jobs are more equal than others.”
• Example: Unix nice system call to lower priority of a task.
• Example: Urgent tasks in a real-time process control system.

Internal priority: the system adjusts priority values internally as an implementation technique within the scheduler. This improves fairness and resource utilization and helps avoid starvation.
• drop priority of jobs consuming more than their share
• boost jobs that already hold resources that are in demand
e.g., internal sleep primitive in Unix kernels
• boost jobs that have starved in the recent past
• typically a continuous, dynamic readjustment in response to observed conditions and events; may be visible and controllable to other parts of the system.

Scheduling policies may be preemptive or non-preemptive.
* Non-Preemptive: Non-preemptive algorithms are designed so that once a process enters the running state (is allocated the processor), it is not removed from the processor until it has completed its service time (or it explicitly yields the processor).
* Preemptive: Preemptive algorithms are driven by the notion of prioritized computation. The process with the highest priority should always be the one currently using the processor. If a process is currently using the processor and a new process with a higher priority enters the ready list, the process on the processor should be removed and returned to the ready list until it is once again the highest-priority process in the system, as sketched below.
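A small sketch of preemptive priority scheduling, using a heap as the ready list and one time unit per scheduling step; the timing model and the process data are illustrative assumptions.

import heapq

def preemptive_priority(arrivals):
    """arrivals: list of (arrival_time, priority, name, burst); a lower number means higher priority."""
    ready, time, finished = [], 0, []
    pending = sorted(arrivals)
    while pending or ready:
        while pending and pending[0][0] <= time:
            _, prio, name, burst = pending.pop(0)
            heapq.heappush(ready, (prio, name, burst))   # a new arrival may preempt the running process
        if not ready:
            time = pending[0][0]                         # idle until the next arrival
            continue
        prio, name, burst = heapq.heappop(ready)
        burst -= 1                                       # run the highest-priority process for one unit
        time += 1
        if burst == 0:
            finished.append((name, time))
        else:
            heapq.heappush(ready, (prio, name, burst))   # back to the ready list if not finished
    return finished

print(preemptive_priority([(0, 2, "A", 3), (1, 1, "B", 2)]))   # [('B', 3), ('A', 5)]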

