
Sunday, September 27, 2009

Risk Identification

The objectives of risk identification are to :
(1) identify and categorize risks that could affect the project and
(2) document these risks.
The outcome of risk identification is a list of risks. For non-complex, low-cost projects, the risks may be kept simply as a list of red flag items. The items can then be assigned to individual team members to watch throughout the project development process and used for risk allocation purposes, as described later in this document. For complex, high-cost projects, the risks can feed the rigorous process of assessment, analysis, mitigation and planning, allocation, and monitoring and updating described in this document.

The risk identification process begins with the team compiling the project's risk events. The identification process will vary, depending on the nature of the project and the risk management skills of the team members, but most identification processes begin with an examination of issues and concerns created by the project development team. These issues and concerns can be derived from an examination of the project description, work breakdown structure, cost estimate, design and construction schedule, procurement plan, or general risk checklists. Checklists and databases can be created for recurring risks, but project team experience and subjective analysis will almost always be required to identify project-specific risks. The team should examine and identify project events by reducing them to a level of detail that permits an evaluator to understand the significance of any risk and identify its causes (i.e., risk drivers).

One method for identifying risks is to create a risk item checklist. The checklist can be used for risk identification and focuses on some subset of known and predictable risks in the following generic subcategories :
- Product size : risks associated with the overall size of the software to be built or modified.
- Business Impact : risks associated with constraints imposed by management or the marketplace.
- Customer characteristics : risks associated with the sophistication of the customer and the developer's ability to communicate with the customer in a timely manner.
- Process definition : risks associated with the degree to which the software process has been defined and is followed by the development organization.
- Development environment : risks associated with the availability and quality of the tools to be used to build the product.
- Technology to be built : risks associated with the complexity of the system to be built and the "newness" of the technology that is packaged by the system.
- Staff size and experience : risks associated with the overall technical and project experience of the software engineers who will do the work.
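
A checklist like this can be kept as a very simple data structure. The sketch below is only an illustration: the `RiskItem` record and the `add_risk` and `by_category` helpers are hypothetical names, not part of any standard tool, and the category names are copied from the list above.

```python
from dataclasses import dataclass

# Generic subcategories taken from the checklist above.
CATEGORIES = (
    "Product size",
    "Business Impact",
    "Customer characteristics",
    "Process definition",
    "Development environment",
    "Technology to be built",
    "Staff size and experience",
)


@dataclass
class RiskItem:
    """One identified risk (hypothetical record layout)."""
    category: str
    description: str
    owner: str = "unassigned"  # team member who watches this risk


def add_risk(risks, category, description, owner="unassigned"):
    """Append a risk to the list, rejecting categories not on the checklist."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    item = RiskItem(category, description, owner)
    risks.append(item)
    return item


def by_category(risks):
    """Group the risk list by checklist category."""
    grouped = {category: [] for category in CATEGORIES}
    for item in risks:
        grouped[item.category].append(item)
    return grouped
```

For a small project, the list returned by repeated `add_risk` calls is already the "list of red flag items" described earlier; `by_category` just makes it easier to see which subcategories have not been examined yet.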

Wednesday, September 23, 2009

Quick Overview of SQA Plan

The SQA plan provides a road map for instituting software quality assurance.

Basic components of an SQA plan:
- The purpose of the plan and its scope
- management
* organization structure, SQA tasks, their placement in the process.
* roles and responsibilities related to product quality.
- documentation
* project documents, models, technical documents, user documents.
- standards, practices, and conventions.
- reviews and audits.
- test: test plan and procedure.
- problem reporting and corrective action.
- tools.
- code control.
- media control.
- supplier control.
- records collection, maintenance, and retention.
- training.
- risk management.

Statistical Quality Assurance - Overview

Statistical quality assurance reflects a growing trend throughout industry to become more quantitative about quality. What does statistical quality assurance mean? It implies the following series of steps that form the process:

- Information about software defects is collected and categorized.
- An attempt is made to trace each defect to its underlying cause.
- Using the Pareto principle (80 percent of the defects can be traced to 20 percent of all possible causes), isolate the 20 percent (the "vital few").
- Once the vital few causes have been identified, the defects are corrected.

Causes of errors:
- incomplete or erroneous specification (IES).
- misinterpretation of customer communication (MCC).
- intentional deviation from specification (IDS).
- violation of programming standards (VPS).
- error in data representation (EDR).
- inconsistent module interface (IMI).
- error in design logic (EDL).
- incomplete or erroneous testing (IET).
- inaccurate or incomplete documentation (IID).
- error in programming language translation of design (PLT).
- ambiguous or inconsistent human-computer interface (HCI).
- miscellaneous (MIS).
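
The Pareto step can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the `vital_few` function name, the 0.8 threshold parameter, and the sample defect log are all made up for the example; only the cause codes come from the list above.

```python
from collections import Counter


def vital_few(defect_causes, threshold=0.8):
    """Return the most frequent causes that together cover `threshold` of all defects."""
    counts = Counter(defect_causes)
    total = sum(counts.values())
    selected, covered = [], 0
    # Walk the causes from most to least frequent until enough defects are covered.
    for cause, n in counts.most_common():
        if total and covered / total >= threshold:
            break
        selected.append(cause)
        covered += n
    return selected


# Made-up defect log: one cause code per recorded defect.
sample_log = ["IES"] * 5 + ["MCC"] * 3 + ["EDR"] + ["MIS"]
print(vital_few(sample_log))  # ['IES', 'MCC']
```

In this invented log, two causes account for 8 of the 10 defects, so those two are where corrective effort would go first.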

In conjunction with the collection of defect information, software developers can calculate an error index (EI) for each major step in the software engineering process. After analysis, design, coding, testing, and release, the following data are collected:
$E_i$ = the total no. of errors uncovered during the ith step in the process.
$S_i$ = the no. of serious errors.
$M_i$ = the no. of moderate errors.
$T_i$ = the no. of minor errors.
$PS$ = the size of the product at the ith step.
$w_s$, $w_m$, $w_t$ = weighting factors for serious, moderate, and minor errors.
At each step in the software engineering process, a phase index ($PI_i$) is computed:
$$PI_i = w_s (S_i / E_i) + w_m (M_i / E_i) + w_t (T_i / E_i)$$
The error index (EI) weights each phase index by its step number, so errors found later in the process count more heavily:
$$EI = \frac{\sum_i (i \times PI_i)}{PS} = \frac{PI_1 + 2\,PI_2 + 3\,PI_3 + \dots + i\,PI_i}{PS}$$
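
The calculation can be sketched in Python as follows. Treat it as an illustration under stated assumptions: the function names are hypothetical, the default weights (10, 3, 1) are placeholders that this post does not specify, and the total error count for a step is assumed to be the sum of its serious, moderate, and minor errors.

```python
def phase_index(serious, moderate, minor, ws=10, wm=3, wt=1):
    """Phase index for one step: weighted fractions of serious, moderate, minor errors.

    Assumes the step's total error count is serious + moderate + minor.
    The default weights are placeholder values, not taken from the post.
    """
    total = serious + moderate + minor
    if total == 0:
        return 0.0  # no errors uncovered in this step
    return ws * serious / total + wm * moderate / total + wt * minor / total


def error_index(phases, product_size, **weights):
    """Error index: sum of (step number x phase index), divided by product size.

    `phases` is a list of (serious, moderate, minor) tuples in process order.
    """
    weighted = sum(
        step * phase_index(*counts, **weights)
        for step, counts in enumerate(phases, start=1)
    )
    return weighted / product_size
```

For example, `error_index([(2, 0, 0), (0, 0, 4)], 100)` gives a phase index of 10 for the first step and 1 for the second, so the result is (1 x 10 + 2 x 1) / 100 = 0.12.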

Introduction To Software Reviews

What are software reviews?
- A software review serves as a “filter” for the software engineering process.

Purpose: Software reviews serve to uncover errors in analysis, design, coding, and testing and are hence invaluable.

Why do software reviews? The general principles behind doing reviews are:
- To err is human.
- Errors in an engineer's work are not always easy to catch, particularly those that do not surface as bugs.
- Finding errors early in the process is always more efficient than detecting errors much later in the cycle.

A review is a way to :
- identify needed improvements of the various parts and modules of a product.
- confirm the improvement areas of a product (if the improvements had been suggested by others).
- achieve technical work that is more uniform, predictable, and manageable, making for a better end result and a smoother cycle.

What are the different types of reviews?
- Informal reviews : informal meeting and informal desk checking.
- Formal reviews : Walkthrough, inspection, and round-robin reviews.

Formal Technical Reviews are used to improve work products. As a part of this process, any deliverable that is produced during the development life-cycle is eligible to be reviewed and should be picked up for review starting at an early stage, beginning in the specification stage and recurring at each identifiable milestone.

- A review meeting is an important part of the FTR (Formal Technical Review). There are some essential parameters for the meeting, such as:
1. A reasonable number of people should take part in the meeting; it should not be a free-for-all.
2. Each of them should have done his/her homework and other preparation.
3. The meeting should not run very long and should have a well-defined agenda to avoid wasting time. The duration should be just enough to produce some constructive results.
4. The Formal Technical Review is effective when a small and specific part of the overall software is under scrutiny.
5. A review is more productive when the review artefacts have been broken down into small modules that can be reviewed independently. The target of the FTR (Formal Technical Review) is on a component of the project, a single module.

- Record keeping and reporting are additional quality assurance activities. They are applied to different stages of software development. During the FTR, it is the responsibility of the reviewer (or recorder) to record all issues that have been raised.
At the end of the review meeting, a review issue list that summarizes all issues is produced. A simple review summary report is also compiled. A review summary report answers the following questions:
1. What was reviewed?
2. Who reviewed it?
3. What were the findings and conclusions?

- Review Guidelines : A minimum set of guidelines for FTR:
* Review the product, not the producer.
* Set an agenda and maintain it.
* Limit debate and rebuttal.
* Enunciate problem areas, but don’t attempt to solve every problem noted.
* Take written notes.
* Limit the number of participants and insist upon advance preparation.
* Develop a checklist for each work product that is likely to be reviewed.
* Allocate resources and time schedule for FTRs.
* Conduct meaningful training for all reviewers.
* Review your early reviews.

Tuesday, September 22, 2009

Software Quality Assurance - SQA

Software Quality Assurance (SQA) is defined as a planned and systematic approach to the evaluation of the quality of and adherence to software product standards, processes, and procedures. SQA includes the process of assuring that standards and procedures are established and are followed throughout the software acquisition life cycle. Compliance with agreed-upon standards and procedures is evaluated through process monitoring, product evaluation, and audits. Software development and control processes should include quality assurance approval points, where an SQA evaluation of the product may be done in relation to the applicable standards.

Establishing standards and procedures for software development is critical, since these provide the framework from which the software evolves. Standards are the established criteria to which the software products are compared. Procedures are the established criteria to which the development and control processes are compared.
Standards and procedures establish the prescribed methods for developing software; the SQA role is to ensure their existence and adequacy. Proper documentation of standards and procedures is necessary since the SQA activities of process monitoring, product evaluation, and auditing rely upon unequivocal definitions to measure project compliance.

Software quality assurance is composed of a variety of tasks associated with two different constituencies - the software engineers who do technical work and an SQA group that has responsibility for quality assurance planning, oversight, record keeping, analysis, and reporting. Product evaluation and process monitoring are the SQA activities that assure the software development and control processes.
Product evaluation is an SQA activity that assures standards are being followed. Product evaluation assures that the software product reflects the requirements of the applicable standard(s) as identified in the Management Plan.
Process monitoring is an SQA activity that ensures that appropriate steps to carry out the process are being followed.
A fundamental SQA technique is the audit, which looks at a process and/or a product in depth, comparing them to established procedures and standards. Audits are used to review management, technical, and assurance processes to provide an indication of the quality and status of the software product.

The SQA group performs the following activities:
- Prepares an SQA plan for a project.
- Participates in the development of the project's software process description.
- Reviews software engineering activities to verify compliance with defined software process.
- Audits designated software work products to verify compliance with those defined as part of software process.
- Ensures that deviations in software work and work products are documented and handled according to a documented procedure.
- Records any noncompliance and reports to senior management.

Quality Management - Overview

Software quality management is an umbrella activity - incorporating both quality control and quality assurance - that is applied at each step in the software process. SQA encompasses procedures for the effective application of methods and tools, formal technical reviews, testing strategies and techniques, procedures for change control, procedures for assuring compliance to standards, and measurement and reporting mechanisms.
SQA is complicated by the complex nature of software quality - an attribute of computer programs that is defined as "conformance to explicitly and implicitly specified requirements." But when considered more generally, software quality encompasses many different product and process factors and related metrics.

Software reviews are one of the most important quality control activities. Reviews serve as filters throughout all software engineering activities, removing errors while they are relatively inexpensive to find and correct. The formal technical review is a stylized meeting that has been shown to be extremely effective in uncovering errors.
To properly conduct software quality assurance, data about the software engineering process should be collected, evaluated, and disseminated. Statistical SQA helps to improve the quality of the product and the software process itself. Software reliability models extend measurements, enabling collected defect data to be extrapolated into projected failure rates and reliability predictions.

Software quality assurance is the mapping of the managerial precepts and design disciplines of quality assurance onto the applicable managerial and technological space of software engineering. The ability to ensure quality is the measure of a mature engineering discipline. When the mapping is successfully accomplished, mature software engineering is the result.

Sunday, September 20, 2009

Software Configuration Management Process

The SCM repository is the set of mechanisms and data structures that allow a software team to manage change in an effective manner. It provides the obvious functions of a DBMS but in addition, the repository performs or precipitates functions such as :
- data integrity.
- information sharing.
- tool integration.
- data integration.
- methodology.
- document standardization.
To achieve these functions, the repository is defined in terms of a meta-model.

Software Configuration Management Process :
A process defines the steps by which you perform a specific task or set of tasks. An SCM process is the way SCM is performed on your project—specifically, how an SCM tool is applied to accomplish a set of tasks.
- Identification of Artefacts
Early identification and change control of artefacts and work products is integral to the project. The configuration manager needs to fully identify and control changes to all the elements that are required to recreate and maintain the software product.
- Version Control
The primary goal of version control is to identify and manage project elements as they change over time. The Configuration Manager should establish a version control library to maintain all lifecycle entities. This library will ensure that changes (deltas) are controlled at their lowest atomic level, e.g., documents, source files, scripts, and kits.
- Change Request Management
Change request management can be described as the management of change/enhancement requests. Typically the Configuration Manager should set up a repository to manage these requests and support activities such as status tracking and assignment.
- Configuration audit
Identification, version control, and change control help the software developer to maintain order in what would otherwise be a chaotic and fluid situation. But how can we ensure that the change has been properly implemented?
A software configuration audit complements the formal technical review by addressing the following questions :
- Has the change specified in the ECO been made? Have any additional modifications been incorporated?
- Has a formal technical review been conducted to assess technical correctness?
- Has the software process been followed?
- Has the change been highlighted in the SCI?
- Have SCM procedures for noting the change, recording it, and reporting it been followed?
- Have all related SCIs been properly updated?

Software Configuration Management - Baselines

Change is a fact of life in software development. A baseline is a software configuration management concept that helps us to control change without seriously impeding justifiable change. Before a software configuration item becomes a baseline, change may be made quickly and informally. However, once a baseline is established, we figuratively pass through a one-way swinging door. Changes can still be made, but a specific formal procedure must be applied to evaluate and verify each change.

In terms of software engineering, a baseline is a milestone in the development of software. A baseline is marked by the delivery of one or more software configuration items that have been approved as a consequence of a formal technical review. Software engineering tasks produce one or more software configuration items (SCIs). After SCIs are reviewed and approved, they are placed in a project database. When a member of a software team wants to make a modification to a baselined SCI, it is copied from the project database into the engineer's private workspace. However, this extracted SCI can be modified only if SCM controls are followed.

The Software Configuration Management process applied to this project identifies many types of baselines; each of them has a specific identification rule and minimum verification characteristics.
All the development artifacts (source code, makefile, documentation) are identified and managed under configuration.

- Working baseline:
* Description: this is a development baseline in the private working area of a developer. This baseline is not identified since it is a private one.
* Identification: none
* Minimum characteristics: none
- Unstable baseline:
* Description: this is a development baseline in the public area of CVS.
* Identification: this baseline is not identified. It is only accessible as the 'latest' one.
* Minimum characteristics: none
- Stable baseline:
* Description: this is a development baseline in the public area of CVS. This baseline identifies a consistent set of modifications or an important step during the implementation of a modification.
* Identification: a stable baseline is identified according to the involved GHOSTS component and the current date.
* Minimum characteristics: a stable baseline shall be at least compilable / linkable.
- Official baseline:
* Description: this is a user baseline in the public area of CVS. This baseline identifies a consistent set of modifications; those modifications are well implemented, verified and tested.
* Identification: an official baseline is identified according to the part of the involved GHOSTS component and the release identification.
* Minimum characteristics: an official baseline shall be fully implemented, verified and tested.

Software Configuration Management (SCM)

Configuration management (CM) is the discipline of controlling the evolution of complex systems; software configuration management (SCM) is its specialization for computer programs and associated documents. SCM differs from general CM in the following two ways. First, software is easier to change than hardware, and it therefore changes faster. Even relatively small software systems, developed by a single team, can experience a significant rate of change, and in large systems, such as telecommunications systems, the update activities can totally overwhelm manual configuration management procedures. Second, SCM is potentially more automatable because all components of a software system are easily stored on-line.

The goals of SCM are :
* Configuration identification - Identifying configurations, configuration items and baselines.
* Configuration control - Implementing a controlled change process. This is usually achieved by setting up a change control board whose primary function is to approve or reject all change requests that are sent against any baseline.
* Configuration status accounting - Recording and reporting all the necessary information on the status of the development process.
* Configuration auditing - Ensuring that configurations contain all their intended parts and are sound with respect to their specifying documents, including requirements, architectural specifications and user manuals.
* Build management - Managing the process and tools used for builds.
* Process management - Ensuring adherence to the organization's development process.
* Environment management - Managing the software and hardware that host our system.
* Teamwork - Facilitating team interactions related to the process.
* Defect tracking - Making sure every defect has traceability back to the source.

Basic SCM Concepts :
This section defines the basic elements of a database for software configuration management. The database stores all software objects produced during the life-cycle of a project.
- Creation of Software Objects : A source object is a software object that is composed manually, for instance with an interactive editor. Creating a source object requires human action; it cannot be produced automatically. A derived object is generated fully automatically by a program, usually from other software objects. A program that produces derived objects is called a deriver. Examples of derivers are compilers, linkers, document formatters, pretty printers, cross referencers, and call graph generators.
- Structure of Software Objects : The body of a software object is either atomic or structured. An atomic object, or atom, has a body that is not decomposable for SCM; its body is an opaque data structure with a set of generic operations such as copying, deletion, renaming, and editing. A configuration has a body that consists of sub-objects, which may themselves have sub-objects, and so on. Configurations have two subclasses: composites and sequences. A composite object, or simply composite, is a record structure made up of fields. Each field consists of a field identifier and a field value. A field value is either an object identifier or a version group identifier. A sequence is a list of object and version group identifiers. Sequences represent ordered multi-sets of objects.
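
To make the atom / composite / sequence distinction concrete, here is a rough Python sketch. The class and attribute names are my own choices for illustration, and identifiers are represented as plain strings, which is an assumption rather than anything the model above requires.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Atom:
    """Atomic object: the body is opaque as far as SCM is concerned."""
    object_id: str
    body: bytes


@dataclass
class Composite:
    """Record structure: field identifier -> object or version group identifier."""
    object_id: str
    fields: Dict[str, str] = field(default_factory=dict)


@dataclass
class Sequence:
    """Ordered multi-set of object and version group identifiers."""
    object_id: str
    items: List[str] = field(default_factory=list)


def referenced_ids(obj):
    """Identifiers of the sub-objects a configuration refers to (none for an atom)."""
    if isinstance(obj, Composite):
        return list(obj.fields.values())
    if isinstance(obj, Sequence):
        return list(obj.items)
    return []
```

A sequence may list the same identifier more than once, which is what "ordered multi-set" means here, while a composite names each of its parts through a field identifier.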

Friday, September 18, 2009

Overview of Change Management - Software configuration management

Software configuration management is an umbrella activity that is applied throughout the software process. SCM identifies, controls, audits, and reports modifications that invariably occur while software is being developed and after it has been released to a customer. All information produced as part of software engineering becomes part of a software configuration. The configuration is organized in a manner that enables orderly management of change.

The software configuration is composed of a set of interrelated objects, also called software configuration items, that are produced as a result of some software engineering activity. In addition to documents, programs, and data, the development environment that is used to create software can also be placed under configuration control. All SCIs are stored within a repository that implements mechanisms and data structures to ensure data integrity, provides integration support for other software tools, supports information sharing among all members of the software team, and implements functions in support of version and change control.

Once a configuration object has been developed and reviewed, it becomes a baseline. Changes to a baselined object result in the creation of a new version of that object. The evolution of a program can be tracked by examining the revision history of all configuration objects. Basic and composite objects form an object pool from which versions are created. Version control is the set of procedures and tools for managing the use of these objects.

Change control is a procedural activity that ensures quality and consistency as changes are made to a configuration object. The change control process begins with a change request, leads to a decision to make or reject the request for change, and culminates with a controlled update of the SCI that is to be changed.

The configuration audit is an SQA activity that helps to ensure that quality is maintained as changes are made. Status reporting provides information about each change to those with a need to know. Configuration management for Web engineering is similar in most respects to SCM for conventional software. However, each of the core SCM tasks should be streamlined to make it as lean as possible, and special provisions for content management must be implemented.

Overview of Formal Methods - Foundation for analysis methods

Formal methods provide a foundation for specification environments leading to analysis models that are more complete, consistent, and unambiguous than those produced using conventional or object-oriented methods. The descriptive facilities of set theory and logic notation enable a software engineer to create a clear statement of facts.

The underlying concepts that govern formal methods are :
- the data invariant, a condition true throughout the execution of the system that contains a collection of data.
- the state, a representation of a system's externally observable mode of behavior, or the stored data that a system accesses and alters.
- the operation, an action that takes place in a system and reads or writes data to a state. An operation is associated with two conditions : a precondition and a postcondition.
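
These three concepts can be illustrated with ordinary code, even though real formal methods use a specification language rather than Python. The sketch below is a hypothetical example of my own: the state is a set of names, `MAX_NAMES` is an invented capacity limit that serves as the data invariant, and `add_name` is an operation with an explicit precondition and postcondition.

```python
MAX_NAMES = 3  # hypothetical capacity limit used for the data invariant


def invariant(state):
    """Data invariant: the table never holds more than MAX_NAMES names."""
    return len(state) <= MAX_NAMES


def add_name(state, name):
    """Operation: add `name` to the state (a frozenset of names)."""
    # Precondition: the name is new and there is still room for it.
    if name in state or len(state) >= MAX_NAMES:
        raise ValueError("precondition violated")
    new_state = state | {name}
    # Postcondition: the new state is the old state plus exactly the new name.
    assert name in new_state and new_state - {name} == state
    # The data invariant must hold after every operation.
    assert invariant(new_state)
    return new_state
```

In a formal specification the precondition and postcondition would be written as predicates over the state before and after the operation; here they show up as runtime checks.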

Discrete mathematics - the notation and heuristics associated with sets and constructive specification, set operators, logic operators, and sequences - forms the basis of formal methods. Discrete mathematics is implemented in the context of formal specification languages, such as OCL and Z. These formal specification languages have both syntactic and semantic domains. The syntactic domain uses a symbology that is closely aligned with the notation of sets and predicate calculus. The semantic domain enables the language to express requirements in a concise manner.

A decision to use formal methods should consider startup costs as well as the cultural changes associated with a radically different technology. In most instances, formal methods have highest payoff for safety-critical and business-critical systems.

Where are Formal Methods applied?
Although a complete formal verification of a large complex system is impractical at this time, formal methods are applied to various aspects, or properties, of large systems. More commonly, they are applied to the detailed specification, design, and verification of critical parts of large systems such as avionics and aerospace systems, and to small, safety-critical systems such as heart monitors.

Thursday, September 17, 2009

Cleanroom Software Engineering - Certification

The verification and testing techniques lead to software components that can be certified. Certification implies that the reliability can be specified for each component. The potential impact of certifiable software components goes far beyond a single cleanroom project. Reusable software components can be stored along with their usage scenarios, program stimuli, and probability distributions.

The certification approach involves five steps :
1. Usage scenarios must be created.
2. A usage profile is specified.
3. Test cases are generated from the profile.
4. Tests are executed and failure data are recorded and analyzed.
5. Reliability is computed and certified.

Cleanroom Certification Models :
- Sampling model : Software testing executes m random test cases and is certified if no failures or a specified number of failures occur. The value of m is derived mathematically to ensure that required reliability is achieved.
- Component model : A system composed of n components is to be certified. The component model enables the analyst to determine the probability that component i will fail prior to completion.
- Certification model : The overall reliability of the system is projected and certified.

Cleanroom Software Engineering - Design Refinement and Cleanroom Testing

Design Refinement & Verification
- If a function f is expanded into a sequence g and h, the correctness condition for all input to f is:
• Does g followed by h do f?
- When a function f is refined into a conditional (if c then g else h), the correctness condition for all input to f is:
• Whenever condition c is true, does g do f, and whenever c is false, does h do f?
- When function f is refined as a loop (while c do g), the correctness conditions for all inputs to f are:
• Is termination guaranteed?
• Whenever c is true, does g followed by f do f, and whenever c is false, does skipping the loop still do f?
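
In cleanroom practice these questions are answered by reasoning in a team review, not by running code. Still, it can help to see them written out mechanically. The sketch below checks each question by brute force over a small finite set of inputs, which is only a stand-in for the real argument. All names (`sequence_ok`, `conditional_ok`, `loop_ok`, the parameter `c` for the condition, and the example functions) are assumptions made for the illustration.

```python
def sequence_ok(f, g, h, domain):
    """Does g followed by h do f, for every input in the domain?"""
    return all(h(g(x)) == f(x) for x in domain)


def conditional_ok(f, c, g, h, domain):
    """Whenever c is true does g do f, and whenever c is false does h do f?"""
    return all((g(x) if c(x) else h(x)) == f(x) for x in domain)


def loop_ok(f, c, g, domain):
    """Whenever c is true does g followed by f do f, and otherwise does skipping do f?

    Termination is NOT checked here; it needs a separate argument.
    """
    return all((f(g(s)) if c(s) else s) == f(s) for s in domain)


# Example: f adds n + (n-1) + ... + 1 to acc and leaves n at 0.
def sum_down(state):
    n, acc = state
    return (0, acc + n * (n + 1) // 2)


def loop_body(state):
    n, acc = state
    return (n - 1, acc + n)


states = [(n, acc) for n in range(6) for acc in range(3)]
print(loop_ok(sum_down, lambda s: s[0] > 0, loop_body, states))  # True
```

For the loop example, termination follows from n decreasing by one on every pass while the condition requires n > 0; that part is argued, not computed.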

Advantages of Design Verification :
- It reduces verification to a finite process.
- It lets cleanroom teams verify every line of design and code.
- It results in a near zero defect level.
- It scales up.
- It produces better code than unit testing.

The strategy and tactics of cleanroom testing are fundamentally different from conventional testing approaches.
Statistical Testing
- Generation of test cases
* each test case begins in a start state and represents a random walk through the usage model ending at a designated end state.
- Control of statistical testing
* a well-defined procedure is performed under specified conditions.
* each performance is a trial and can be used as part of an empirical probability computation.
- Stopping criteria for testing
* when testing goals or quality standards are achieved.
* when the difference between the predicted usage chain and the actual testing chain becomes very small.
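
Test case generation as a random walk is easy to sketch if the usage model is treated as a Markov chain. The model below is entirely made up for illustration (the state names and transition probabilities are not from any real system), and `generate_test_case` is a hypothetical helper name.

```python
import random

# Hypothetical usage model: state -> list of (next state, probability).
USAGE_MODEL = {
    "Start": [("Login", 1.0)],
    "Login": [("Browse", 0.7), ("Exit", 0.3)],
    "Browse": [("Browse", 0.5), ("Checkout", 0.2), ("Exit", 0.3)],
    "Checkout": [("Exit", 1.0)],
}


def generate_test_case(model, start="Start", end="Exit", rng=random):
    """Random walk from the start state to the end state of the usage model."""
    path = [start]
    while path[-1] != end:
        successors, weights = zip(*model[path[-1]])
        # Pick the next state according to the usage probabilities.
        path.append(rng.choices(successors, weights=weights)[0])
    return path


print(generate_test_case(USAGE_MODEL, rng=random.Random(42)))
```

Each generated path is one test case; running many of them exercises the software roughly in proportion to how it is expected to be used. Passing a seeded `random.Random` makes a test run repeatable.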

The Cleanroom Strategy and Functional Specifications

The cleanroom approach makes use of a specialized version of the incremental process model. A pipeline of software increments is developed by small independent software teams. As each increment is certified, it is integrated into the whole. Hence, functionality of the system grows with time.

- Increment Planning : adopts the incremental strategy.
- Requirements Gathering : defines a description of customer level requirements (for each increment).
- Box Structure Specification : describes the functional specification.
- Formal Design : specifications (called “black boxes”) are iteratively refined (within an increment) to become analogous to architectural and procedural designs (called “state boxes” and “clear boxes,” respectively).
- Correctness Verification : verification begins with the highest level box structure (specification) and moves toward design detail and code using a set of “correctness questions.” If these do not demonstrate that the specification is correct, more formal (mathematical) methods for verification are used.
- Code Generation, Inspection and Verification : the box structure specifications, represented in a specialized language, are translated into the appropriate programming language.
- Statistical Test Planning : a suite of test cases that exercise a “probability distribution” of usage is planned and designed.
- Statistical Usage Testing : execute a series of tests derived from a statistical sample (the probability distribution noted above) of all possible program executions by all users from a targeted population.
- Certification : once verification, inspection and usage testing have been completed (and all errors are corrected) the increment is certified as ready for integration.

Functional Specifications :
Cleanroom software engineering complies with the operational analysis principles by using a method called box structure specification. A "box" encapsulates the system at some level of detail. Three types of boxes are used :
- Black box : It specifies a set of transition rules that describe the behavior of system components as responses to specific stimuli, and makes use of inheritance in a manner similar to classes. It also specifies system function by mapping all possible stimulus histories to all possible responses.
S*->R i.e. stimulus history->responses
- State box : It is a generalization of a state machine that encapsulates data and operations in a manner similar to an object. The inputs (stimuli) and outputs (responses) are represented, and data that must be retained between transitions is encapsulated. The state is the encapsulation of the stimulus history. State variables are invented to save any stimuli that need to be retained.
S x T -> R x T i.e. stimuli X state data -> responses X state data
- Clear box : It contains the procedural design of the state box, in a manner similar to structured programming. It specifies both data flow and control flow.
S x T -> R x T i.e. stimuli X state data -> responses X state data
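As a rough illustration of the two mappings above, here is a minimal Python sketch. The example system (a running-total accumulator) and the names black_box, state_box and run_state_box are hypothetical and chosen only for illustration; they are not part of any Cleanroom notation.

```python
from typing import List, Tuple


def black_box(stimulus_history: List[int]) -> int:
    """Black box view (S* -> R): the response depends on the whole stimulus history."""
    return sum(stimulus_history)


def state_box(stimulus: int, state: int) -> Tuple[int, int]:
    """State box view (S x T -> R x T): state data stands in for the history."""
    new_state = state + stimulus
    response = new_state
    return response, new_state


def run_state_box(stimuli: List[int]) -> List[int]:
    """Feed stimuli one at a time, carrying the state between transitions."""
    state = 0
    responses = []
    for stimulus in stimuli:
        response, state = state_box(stimulus, state)
        responses.append(response)
    return responses


history = [3, 4, 5]
# The black box is evaluated on every prefix of the history;
# the state box only ever sees the current stimulus plus its saved state.
black_responses = [black_box(history[: i + 1]) for i in range(len(history))]
state_responses = run_state_box(history)
```

Both views produce the same responses here, which is the point of the refinement: the state variable replaces the need to remember the full stimulus history.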

- Transaction closure of stimuli and responses
* Users and uses are considered including security and error recovery.
- State migration within box hierarchy
* Downward migration of state data is possible whenever new black boxes are created inside a clear box.
* Upward migration of state data is desirable when duplicate data is updated in several places in the tree.
- Common services
* Reusable boxes from library.

Cleanroom Software Engineering Cont...

The Cleanroom Software Engineering process is a software development process intended to produce software with a certifiable level of reliability. The focus of the Cleanroom process is on defect prevention, rather than defect removal. The name Cleanroom was chosen to evoke the cleanrooms used in the electronics industry to prevent the introduction of defects during the fabrication of semiconductors.

Central principles of Cleanroom Software Engineering :
- Software development based on formal methods
Cleanroom development makes use of the Box Structure Method to specify and design a software product. Verification that the design correctly implements the specification is performed through team review.
- Incremental implementation under statistical quality control
Cleanroom development uses an iterative approach, in which the product is developed in increments that gradually increase the implemented functionality. The quality of each increment is measured against pre-established standards to verify that the development process is proceeding acceptably. A failure to meet quality standards results in the cessation of testing for the current increment, and a return to the design phase.
- Statistically sound testing
Software testing in the Cleanroom process is carried out as a statistical experiment. Based on the formal specification, a representative subset of software input/output trajectories is selected and tested. This sample is then statistically analyzed to produce an estimate of the reliability of the software, and a level of confidence in that estimate.

Advantages of Cleanroom Approach :
- Proven Practice : Several major products have been developed using Cleanroom and delivered to customers.
- Low Error Rate, highly reliable code.
- Increased Productivity.
- Can Be Incorporated Within A Quality Management Program Such As CMM Or ISO 9000.

Why are Cleanroom Techniques Not Widely Used ?
- Some people believe cleanroom techniques are too theoretical, too mathematical, and too radical for use in real software development.
- Relies on correctness verification and statistical quality control rather than unit testing (a major departure from traditional software development).
- Organizations operating at the ad hoc level of the Capability Maturity Model do not make rigorous use of the defined processes needed in all phases of the software life cycle.

Cleanroom software engineering

Cleanroom software engineering is a formal approach to software development that can lead to software that has remarkably high quality. It uses box structure specification (or formal methods) for analysis and design modeling and emphasizes correctness verification, rather than testing, as the primary mechanism for finding and removing errors. Statistical use testing is applied to develop the failure rate information necessary to certify the reliability of delivered software.

The cleanroom approach begins with analysis and design models that use a box structure representation. A "box" encapsulates the system at a specific level of abstraction. Black boxes are used to represent the externally observable behavior of a system. State boxes encapsulate state data and operations. A clear box is used to model the procedural design that is implied by the data and operations of a state box.

Correctness verification is applied once the box structure design is complete. The procedural design for a software component is partitioned into a series of sub-functions. To prove the correctness of the sub-functions, exit conditions are defined for each sub-function and a set of sub-proofs is applied. If each exit condition is satisfied, the design must be correct.

Once correctness verification is complete, statistical use testing commences. Unlike conventional testing, cleanroom software engineering does not emphasize unit or integration testing. Rather, the software is tested by defining a set of usage scenarios, determining the probability of use for each scenario, and then defining random tests that conform to the probabilities. The error records that result are combined with sampling, component, and certification models to enable mathematical computation of projected reliability for the software component.
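The scenario-and-probability step described above can be sketched as follows. The usage profile, the scenario names and the function draw_test_cases are all hypothetical; a real profile would come from analysis of the intended user population.

```python
import random
from collections import Counter

# Hypothetical usage profile: scenario name -> probability of use (sums to 1).
usage_profile = {
    "login": 0.50,
    "search": 0.30,
    "checkout": 0.15,
    "admin_report": 0.05,
}


def draw_test_cases(profile, count, seed=0):
    """Draw random test scenarios so that their frequencies follow the profile."""
    rng = random.Random(seed)  # fixed seed keeps the test plan reproducible
    scenarios = list(profile)
    weights = [profile[name] for name in scenarios]
    return rng.choices(scenarios, weights=weights, k=count)


cases = draw_test_cases(usage_profile, 1000)
frequency = Counter(cases)  # how often each scenario was selected
```

Frequently used scenarios are tested most often, so the failures that users would meet most often are the ones most likely to be observed during testing.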

The cleanroom philosophy is a rigorous approach to software engineering. It is a software process model that emphasizes mathematical verification of correctness and certification of software reliability. The bottom line is extremely low failure rates that would be difficult or impossible to achieve using less formal methods.

Wednesday, September 16, 2009

Component Based Development (CBD)

Component-based development is a CBSE activity that occurs in parallel with domain engineering. Using analysis and architectural design methods discussed earlier, the software team refines an architectural style that is appropriate for the analysis model created for the application to be built. Once the architecture has been established, it must be populated by components that are available from reuse libraries and/or are engineered to meet custom needs.

For those requirements that are addressed with available components, the following software engineering activities must be done:
- Component qualification : It examines reusable components. These are identified by characteristics in their interfaces, i.e. the services provided, and the means by which consumers access these services. The interface does not always provide the whole picture of whether a component will fit the requirements and the architectural style, so qualification is a process of discovery by the software engineer. It checks whether a candidate component will perform the function required, and whether it is compatible with or adaptable to the architectural style of the system. The three important characteristics looked at are performance, reliability and usability.

- Component adaptation : It is required because very rarely will components integrate immediately with the system. Depending on the component type, different strategies are used for adaptation or wrapping. The most common approaches are:
* White box wrapping : The implementation of the component is directly modified in order to resolve any incompatibilities. This is, obviously, only possible if the source code is available for a component, which is extremely unlikely in the case of COTS.
* Grey box wrapping : This relies on the component library providing a component extension language or API that enables conflicts to be removed or masked.
* Black box wrapping : This is the most common case, where access to source code is not available, and the only way the component can be adapted is by pre/post processing at the interface level.
It is the job of the software engineer to determine whether the effort required to wrap a component adequately is justified, or whether it would be “cheaper” to engineer a custom component which removes these conflicts.
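Black box wrapping, the most common case above, can be sketched in a few lines. Both classes are hypothetical: LegacyTemperatureSensor stands in for a COTS component whose source cannot be changed, and CelsiusSensorWrapper adapts it purely by post-processing at the interface.

```python
class LegacyTemperatureSensor:
    """Stand-in for a COTS component whose source we cannot modify (hypothetical)."""

    def read(self) -> str:
        return "72.5F"  # vendor-specific format: Fahrenheit with a unit suffix


class CelsiusSensorWrapper:
    """Black box wrapper: adapts the component only at the interface level."""

    def __init__(self, component: LegacyTemperatureSensor) -> None:
        self._component = component

    def read_celsius(self) -> float:
        raw = self._component.read()         # call the unchanged component
        fahrenheit = float(raw.rstrip("F"))  # post-process: parse the vendor format
        return round((fahrenheit - 32.0) * 5.0 / 9.0, 2)


sensor = CelsiusSensorWrapper(LegacyTemperatureSensor())
reading = sensor.read_celsius()
```

The rest of the system depends only on read_celsius, so a later component update changes the wrapper and nothing else.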

- Component composition : Component composition assembles qualified, adapted, and engineered components to populate the architecture established for an application. To accomplish this, an infrastructure must be established to bind the components into an operational system. The infrastructure provides a model for the coordination of components with one another and performs common tasks. Among the many mechanisms for creating an effective infrastructure is a set of four “architectural ingredients” that should be present to achieve component composition.
* Data exchange model: Mechanism that enables users and applications to interact and transfer data. The data exchange mechanisms not only allow human-to-software and component-to-component data transfer but also transfer among system resources.
* Automation: A variety of tools, macros, and scripts should be implemented to facilitate interaction between reusable components.
* Structured storage: Heterogeneous data contained in a “compound document” should be organized and accessed as a single data structure, rather than a collection of separate files.
* Underlying object model: The object model ensures that components developed in different programming languages that reside on different platforms can be interoperable.

- Component update : When systems are implemented with COTS components, update is complicated by the imposition of a third party. The organization that developed the reusable component may be outside the immediate control of the software engineering organization.

Domain Engineering in CBSE process

The CBSE process encompasses two concurrent sub-processes - domain engineering and component based development.
The intent of domain engineering is to identify, construct, catalog, and disseminate a set of software components that have applicability to existing and future software in a particular application domain. The overall goal is to establish mechanisms that enable software engineers to share these components—to reuse them.

The overall approach to domain analysis is often characterized within the context of object-oriented software engineering. The steps in the process are defined as follows:
- Define the domain to be investigated.
- Categorize the items extracted from the domain.
- Collect a representative sample of applications in the domain.
- Analyze each application in the sample and define analysis classes.
- Develop an analysis model for the classes.

Some of the domain characteristics are as follows :
- It is sometimes difficult to determine whether a potentially reusable component is applicable in a particular situation. A set of domain characteristics may be defined to make this determination.
- A domain characteristic is shared by all software within a domain. It defines a generic attribute of all products that exist within the domain. E.g., generic characteristics might include: the importance of safety/reliability, programming language, concurrency in processing.

Examples of application domains are:
* Air traffic control systems
* Defense systems
* Financial market systems

Domain engineering begins by identifying the domain to be analyzed. This is achieved by examining existing applications and by consulting experts on the type of application you are aiming to develop. A domain model is then realized by identifying operations and relationships that recur across the domain and are therefore candidates for reuse. This model guides the software engineer to identify and categorize components, which will be subsequently implemented.

One particular approach to domain engineering is Structural Modelling. This is a pattern-based approach that works under the assumption that every application domain has repeating patterns. These patterns may be in function, data, or behaviour that have reuse potential.

Software Components

Components are software units that are context independent, both in the conceptual and the technical domain. A component is a self-contained entity that exports functionality to its environment and also imports functionality from its environment using well defined and open interfaces. Components may support their integration into the surrounding environment by providing mechanisms such as configuration functionality.

Software components can also be characterized based on their use in the CBSE process. In addition to COTS components, the CBSE process yields :
- Qualified components: Assessed by software engineers to ensure that not only functionality, but also performance, reliability, usability, and other quality factors conform to the requirements of the system/product to be built.
- Adapted components: Adapted (wrapped) to modify or mask unwanted or undesired characteristics.
- Assembled components: integrated into an architectural style and interconnected with an appropriate component infrastructure that allows the components to be coordinated and managed effectively.
- Updated components: replacing existing software as new versions of components become available.

Software Component :
A software component simply cannot be differentiated from other software elements by the programming language used to implement the component. The difference lies in how software components are used.

Component model :
A component model defines specific interaction and composition standards. A component model implementation is the dedicated set of executable software elements required to support the execution of components that conform to the model. The standards have to contain a set of elements or services which are shown in the following alignment :
- Interfaces: Specification of component behavior and properties; definition of Interface Description Languages (IDL).
- Naming: Global unique names for interfaces and components.
- Meta data: Information about components, interfaces, and their relationships; APIs to services providing such information.
- Interoperability: Communication and data exchange among components from different vendors, implemented in different languages.
- Customization: Interfaces for customizing components.
- Composition: Interfaces and rules for combining components to create larger structures and for substituting and adding components to existing structures.
- Evolution Support: Rules and services for replacing components or interfaces with newer versions.
- Packing and Deployment: Packing implementation and resources needed for installing and configuring a component.

Reuse of Components :
The main purpose of software components is software reuse. The main types of software reuse are white-box reuse and black-box reuse. White-box reuse means that the source of a software component is fully available. Black-box reuse is based on the principle of information hiding. Software components are reused in general through black-box reuse. Components hide the inner working as much as possible.

Component Based Software Engineering (CBSE)

Component based software engineering (CBSE) offers inherent benefits in software quality, developer productivity, and overall system cost. And yet, many roadblocks remain to be overcome before the CBSE process model is widely used throughout the industry. The benefits of object-oriented design and component-based development seem obvious:
- Reusing software saves money in the development phase of software projects, i.e., the more components you reuse, the less software you have to build.
- The more applications in which you use a given component, the more valuable that component becomes.
- Reusable components enable application developers to customize applications without high costs and long development cycles.
- Reused software components have fewer bugs because they are used more often, and errors are uncovered and corrected along the way.

The goal of component-based software engineering is to increase the productivity, quality, and time-to-market in software development thanks to the deployment of both standard components and production automation. One important paradigm shift implied here is to build software systems from standard components rather than "reinventing the wheel" each time. This requires thinking in terms of system families rather than single systems. CBSE uses software engineering principles to apply the same idea as OOP to the whole process of designing and constructing software systems. It focuses on reusing and adapting existing components, as opposed to just coding in a particular style. CBSE encourages the composition of software systems, as opposed to programming them.
CBSE should, in theory, allow software systems to be more easily assembled, and less costly to build. Although this cannot be guaranteed, the limited experience of adopting this strategy has shown it to be true. The software systems built using CBSE are not only simpler to build and cheaper, but usually turn out to be more robust, adaptable and updateable.
CBSE allows use of predictable architectural patterns and standard software architecture leading to a higher quality end result.

The CBSE Process :
CBSE is in many ways similar to conventional or object-oriented software engineering. A software team establishes requirements for the system to be built using conventional requirements elicitation techniques. An architectural design is established. Here though, the process differs. Rather than a more detailed design task, the team now examines the requirements to determine what subset is directly amenable to composition, rather than construction. For each requirement, the team will ask:
- Are commercial off-the-shelf (COTS) components available to implement the requirement?
- Are internally developed reusable components available to implement the requirement?
- Are the interfaces for available components compatible within the architecture of the system to be built?
The team will attempt to modify or remove those system requirements that cannot be implemented with COTS or in-house components. The CBSE process not only identifies candidate components but also qualifies each component's interface, adapts components to remove architectural mismatches, assembles components into the selected architectural style, and updates components as requirements for the system change.
There are two processes that occur in parallel during the CBSE process. These are:
* Domain Engineering
* Component Based Development

CBSE Process

Restructuring Concepts

Software restructuring modifies source code and/or data in an effort to make it amenable to future changes. In general, restructuring does not modify the overall program architecture. It tends to focus on the design details of individual modules and on local data structures defined within modules. Reasons for restructuring :

1. The current code would have reached a stage where it is impossible to do any more modifications without breaking something else.
2. The requirements have changed so much that the current architecture/design cannot handle it without a redesign.
3. The current application doesn’t/will not scale, perform well enough as it wasn’t designed to handle the current load/anticipated growth.
4. The folks who worked on the code are no longer there and nobody knows what the code really does.
5. You don’t really like the current way it is implemented and think that there is a better way to do it using framework X or library Y.

Code Restructuring :
It is performed to yield a design that produces the same function as the original program but with higher quality. In general, code restructuring techniques model program logic using Boolean algebra and then apply a series of transformation rules that yield restructured logic. A resource exchange diagram maps each program module and the resources that are exchanged between it and other modules. By creating representations of resource flow, the program architecture can be restructured to achieve minimum coupling among modules.
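A small sketch of the Boolean-algebra idea: nested conditional logic is modelled as a Boolean expression, simplified, and then checked against the original over every input combination. The functions and the three conditions a, b, c are hypothetical examples, not output of any restructuring tool.

```python
from itertools import product


def original_logic(a: bool, b: bool, c: bool) -> bool:
    """Unstructured, deeply nested version of the decision."""
    if a:
        if b:
            return True
        else:
            if c:
                return True
            else:
                return False
    else:
        return False


def restructured_logic(a: bool, b: bool, c: bool) -> bool:
    """Same function after simplification: (a AND b) OR (a AND NOT b AND c) = a AND (b OR c)."""
    return a and (b or c)


# Exhaustive truth-table check that restructuring preserved the function.
equivalent = all(
    original_logic(a, b, c) == restructured_logic(a, b, c)
    for a, b, c in product([False, True], repeat=3)
)
```

The truth-table check is what makes the transformation safe: the restructured code is accepted only if it yields the same result as the original for all eight input combinations.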

Data Restructuring :
Before data restructuring begins, a reverse engineering activity called analysis of source code must be conducted. Once the data analysis is done, data redesign commences. A data record standardization step clarifies data definitions to achieve consistency among the data item names or physical record formats within an existing data structure or file format. Another form of redesign, called data name rationalization ensures that all data naming conventions conform to local standards and that aliases are eliminated as data flow through the system. When restructuring moves beyond standardization and rationalization, physical modifications to existing data structures are made to make the data design more effective.
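Data name rationalization can be pictured as mapping every alias found in legacy records onto one canonical name. The alias table, field names and the rationalize function below are hypothetical and serve only as a sketch of the idea.

```python
# Hypothetical alias table: each legacy alias maps to one canonical field name.
CANONICAL_NAMES = {
    "cust_no": "customer_id",
    "CustNum": "customer_id",
    "custid": "customer_id",
    "nm": "customer_name",
    "cust_name": "customer_name",
}


def rationalize(record: dict) -> dict:
    """Rename aliased fields to their canonical names, rejecting conflicting duplicates."""
    result = {}
    for field, value in record.items():
        name = CANONICAL_NAMES.get(field, field)  # unknown fields keep their name
        if name in result and result[name] != value:
            raise ValueError(f"conflicting values for {name!r}")
        result[name] = value
    return result


legacy_records = [
    {"cust_no": 17, "nm": "Ada"},
    {"CustNum": 18, "cust_name": "Grace"},
]
clean_records = [rationalize(record) for record in legacy_records]
```

After this step every record uses the same field names, so later physical changes to the data structures can be made against a single, consistent vocabulary.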

Tuesday, September 15, 2009

Introduction to Reengineering

Re-engineering occurs at two different levels of abstraction. At the business level, re-engineering focuses on the business process with the intent of making changes to improve competitiveness in some area of the business. At the software level, re-engineering examines information systems and applications with the intent of restructuring or reconstructing them so that they exhibit higher quality.
Business process reengineering defines business goals, identifies and evaluates existing business processes, specifies and designs revised processes, and prototypes, refines, and instantiates them within a business.
Business process reengineering (BPR) has a focus that extends beyond software. The result of BPR is often the definition of ways in which information technologies can better support the business.
Software reengineering encompasses a series of activities that include inventory analysis, document restructuring, reverse engineering, program and data restructuring, and forward engineering. The intent of these activities is to create versions of existing programs that exhibit higher quality and better maintainability: programs that will be viable well into the next century.
Inventory analysis enables an organization to assess each application systematically, with the intent of determining which are candidates for reengineering. Document restructuring creates a framework of documentation that is necessary for the long-term support of an application. Reverse engineering is the process of analyzing a program in an effort to extract data, architectural, and procedural design information. Finally, forward engineering reconstructs a program using modern software engineering practices and information learned during reverse engineering.
The cost/benefit of reengineering can be determined quantitatively. The cost of the status quo, that is, the cost associated with ongoing support and maintenance of an existing application, is compared to the projected costs of reengineering and the resultant reduction in maintenance costs. In almost every case in which a program has a long life and currently exhibits poor maintainability, reengineering represents a cost-effective strategy.
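The comparison described above can be sketched with a deliberately simplified model. It assumes constant annual costs and ignores discounting, business value and risk, so it is an illustration rather than a full cost/benefit method; all figures and function names are hypothetical.

```python
def status_quo_cost(annual_maintenance: float, years: float) -> float:
    """Cost of keeping the application as it is for the remaining life."""
    return annual_maintenance * years


def reengineering_cost(one_time_cost: float, reduced_annual_maintenance: float,
                       years: float) -> float:
    """One-time reengineering cost plus the (lower) maintenance cost afterwards."""
    return one_time_cost + reduced_annual_maintenance * years


def break_even_years(one_time_cost: float, annual_maintenance: float,
                     reduced_annual_maintenance: float) -> float:
    """Years of remaining life after which reengineering pays for itself."""
    saving = annual_maintenance - reduced_annual_maintenance
    if saving <= 0:
        return float("inf")  # no maintenance saving, so it never pays back
    return one_time_cost / saving


# Hypothetical figures for an application with 8 years of expected life.
keep = status_quo_cost(120_000, 8)
redo = reengineering_cost(300_000, 50_000, 8)
breakeven = break_even_years(300_000, 120_000, 50_000)
```

With these assumed numbers reengineering is cheaper over eight years and breaks even after a little over four, which matches the observation that long-lived, hard-to-maintain programs are the strongest candidates.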

Software Reverse Engineering Techniques

Software reverse engineering is done to retrieve the source code of a program because the source code was lost, to study how the program performs certain operations, to improve the performance of a program, to fix a bug (correct an error in the program when the source code is not available), to identify malicious content in a program such as a virus or to adapt a program written for use with one microprocessor for use with another. Reverse engineering for the purpose of copying or duplicating programs may constitute a copyright violation. In some cases, the licensed use of software specifically prohibits reverse engineering.

- Decompilers
These are programs which will convert object code back to high level languages such as C.
- Functional Analysis
The input and output states of a chip can be monitored using an oscilloscope, or special purpose probes such as logic state analyzers or protocol analyzers, to acquire a picture of the behavior of the chip over time or in response to input signals.
- Patents
Many patented goods are not sold with restrictive licenses, and hence a bonafide purchaser cannot usually be prevented by the patent from doing what they like with the patented product. Indeed, the patent itself may give the reverse engineer valuable information on how the patented product operates.
- Software anti-tamper technology
It is used to deter both reverse engineering and re-engineering of proprietary software and software-powered systems. In practice, two main types of reverse engineering emerge. In the first case, source code is already available for the software, but higher-level aspects of the program, perhaps poorly documented or documented but no longer valid, are discovered. In the second case, there is no source code available for the software, and any efforts towards discovering one possible source code for the software are regarded as reverse engineering.
- Analysis through observation of information exchange, most prevalent in protocol reverse engineering, which involves using bus analyzers and packet sniffers, for example, for accessing a computer bus or computer network connection and revealing the traffic data thereon. Bus or network behavior can then be analyzed to produce a stand-alone implementation that mimics that behavior.
- Disassembly using a disassembler, meaning the raw machine language of the program is read and understood in its own terms, only with the aid of machine-language mnemonics.
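As a loose analogy for disassembly, Python's standard dis module lists the bytecode mnemonics of a compiled function. This works on CPython bytecode rather than native machine language, and the exact mnemonics differ between Python versions; the function add_tax is a hypothetical example.

```python
import dis


def add_tax(price, rate):
    """Small function whose compiled form we want to inspect."""
    return price + price * rate


# dis.get_instructions yields one Instruction per bytecode operation.
listing = [
    (instr.offset, instr.opname, instr.argrepr)
    for instr in dis.get_instructions(add_tax)
]
mnemonics = [opname for _, opname, _ in listing]

for offset, opname, argrepr in listing:
    print(offset, opname, argrepr)
```

Reading such a listing is the same kind of work as reading a native disassembly: the analyst has only mnemonics and operands, and must reconstruct the intent (here, a multiply followed by an add) from them.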

Hardware Reverse Engineering Techniques

Reverse engineering is taking apart an object to see how it works in order to duplicate or enhance the object. The practice, taken from older industries, is now frequently used on computer hardware and software.
Hardware reverse engineering involves taking apart a device to see how it works. In general, hardware reverse engineering requires a great deal of expertise and is quite expensive.

- REFAB (Reverse Engineering - Feature Based)
This tool uses a laser digitizer to scan the part, and the analysis software then analyzes the shape of the part, using features which are based on typical machining operations, to generate a computerized manufacturing description which can be displayed, used to copy the product, or produce new products using the design.

Computer vision has been widely used to scan PCBs for quality control and inspection purposes, and based on this, there are a number of machine vision systems for analysing and reverse engineering PCBs.

* The first step is to get through the encapsulating material into the product itself, by chemical etching or grinding.
* Once at the chip surface, each layer of components is photographed, then ground away to reveal the layer below. This process reveals the structure of the chip.
* Although these processes can reveal the structure of the chip, they do not indicate the voltages at each point. However, if the chip is undamaged, voltage contrast electron microscopy can be used to scan the chip in use, and watch the voltage level change over time.
These processes are generally referred to as "stripping" or "peeling" the chip.

Overview of Reverse Engineering

Reverse engineering is the attempt to recapture the top level specification by analyzing the product - I call it an "attempt", because it is not possible in practice, or even in theory, to recover everything in the original specification purely by studying the product.

Reverse engineering is difficult and time consuming, but it is getting easier all the time thanks to IT, for two reasons:
- Firstly, as engineering techniques themselves become more computerised, more of the design is due to the computer. Thus, recognisable blocks of code, or groups of circuit elements on a substrate, often occur in many different designs produced by the same computer program. These are easier to recognise and interpret than a customised product would be.
- Secondly, artificial intelligence techniques for pattern recognition, and for parsing and interpretation, have advanced to the point where these and other structures within a product can be recognized automatically.

Reverse engineering generally consists of the following stages:
1. Analysis of the product
2. Generation of an intermediate level product description
3. Human analysis of the product description to produce a specification
4. Generation of a new product using the specification.
There is thus a chain of events between the underlying design specification and any intermediate level design documents lying behind the product, through the product itself, through the reverse engineered product description, through the reverse engineered specification, and into the new product itself.

Reasons for reverse engineering:

- Interoperability.
- Lost documentation: Reverse engineering often is done because the documentation of a particular device has been lost (or was never written), and the person who built it is no longer available. Integrated circuits often seem to have been designed on obsolete, proprietary systems, which means that the only way to incorporate the functionality into new technology is to reverse-engineer the existing chip and then re-design it.
- Product analysis : To examine how a product works, what components it consists of, estimate costs, and identify potential patent infringement.
- Digital update/correction : To update the digital version (e.g. CAD model) of an object to match an "as-built" condition.
- Security auditing.
- Military or commercial espionage : Learning about an enemy's or competitor's latest research by stealing or capturing a prototype and dismantling it.
- Removal of copy protection, circumvention of access restrictions.
- Creation of unlicensed/unapproved duplicates.
- Academic/learning purposes.
- Curiosity.
- Competitive technical intelligence.
- Learning.

Monday, September 14, 2009

The Extended Memory (XMS)

The extended memory refers to memory above the first megabyte of address space in an IBM PC or compatible with an 80286 or later processor. The term is mainly used under the DOS and Windows operating systems. DOS programs, running in real mode or virtual x86 mode, cannot directly access this memory, but are able to do so through an application programming interface called the eXtended Memory Specification (XMS).
With the exception of the first 65,520 bytes, extended memory is not accessible to a PC when running in real mode. This means that under normal DOS operation, extended memory is not available at all; protected mode must be used to access extended memory.
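The 65,520-byte figure follows from real-mode address arithmetic, where a physical address is formed as segment × 16 + offset. The short calculation below shows where the number comes from; it assumes the A20 address line is enabled so that addresses past 1 MB do not wrap around, and the function name is hypothetical.

```python
def real_mode_address(segment: int, offset: int) -> int:
    """Physical address formed in real mode: segment * 16 + offset."""
    return (segment << 4) + offset


ONE_MEGABYTE = 0x100000

# Highest address a real-mode program can form (segment FFFF, offset FFFF).
highest = real_mode_address(0xFFFF, 0xFFFF)

# Bytes from the 1 MB boundary up to and including that highest address.
reachable_above_1mb = highest - ONE_MEGABYTE + 1

print(hex(highest), reachable_above_1mb)  # 0x10ffef 65520
```

This small region just above the first megabyte is commonly called the high memory area (HMA).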

Note: Extended memory is different from expanded memory (EMS), which uses bank switching and a page frame in the upper memory area to access memory over 1 MB.

There are two ways that extended memory is normally used. A true, full protected mode operating system like Windows NT, can access extended memory directly. However, operating systems or applications that run in real mode, including DOS programs that need access to extended memory, Windows 3.x, and also Windows 95, must coordinate their access to extended memory through the use of an extended memory manager. The most commonly used manager is HIMEM.SYS, which sets up extended memory according to the extended memory specification (XMS). XMS is the standard that PC programs use for accessing extended memory.
The main uses of extended memory are:
- RAM-disks : It is a chunk of semiconductor memory that behaves like an ordinary disk but is extremely fast. It also loses its data instantly once power is turned off but is great for temporary files such as index files, extracted data from Lotus to be imported into another application etc.
- Disk caches : It is a program to speed up disk access by storing the most frequently used information in the computer's memory and reading ahead from the disk in anticipation.
- Print spoolers : It utilizes the computer's memory as a high speed buffer so that a fast computer is not slowed down by a slow printer. For example you can print a 100 page database report and then load a spread sheet program, print reports and graphs, then use your word processor while the database report is still printing. Print spoolers that use extended memory usually come with the memory card.
- OS/2 : The operating system OS/2 can make good use of extended memory.
- UNIX : UNIX is another operating system that can use extended memory.

Object Query Language (OQL)

Object Query Language (OQL) is a query language standard for object-oriented databases modelled after SQL. OQL was developed by the Object Data Management Group (ODMG). Because of its overall complexity no vendor has ever fully implemented the complete OQL. OQL has influenced the design of some of the newer query languages like JDOQL and EJB QL, but they can't be considered as different flavours of OQL, and should be treated separately.

Key differences between OQL and SQL :
- OQL supports object referencing within tables. Objects can be nested within objects.
- Not all SQL keywords are supported within OQL. Keywords that are not relevant to Netcool/Precision IP have been removed from the syntax.
- OQL can perform mathematical computations within OQL statements.

General Rules of OQL :
- All complete statements must be terminated by a semi-colon.
- A list of entries in OQL is usually separated by commas but not terminated by a comma.
- Strings of text are enclosed by matching quotation marks.

OQL was designed to be object-oriented. Queries are specified using objects and their attributes (data members). Similarly, queries return sets of objects. The complex relationships in an object model can be navigated easily, using the same class-member paradigm used by object-oriented programming languages. This can often lead to better performance than SQL, where resource-consuming joins are necessary to capture relationships. Another big advantage is that table names and column names are not needed in the query strings: queries are formulated using class names and attribute names, so the application needs no mapping knowledge.
OQL may be used as an embedded language or as a standalone query language. Both of these are supported by OpenAccess. As an embedded language, OQL queries can be used directly in your application programs. Programs can embed OQL queries, and receive results in the native data types of the programming language being used. OQL statements are simply text strings, which means that the standard string representation of your programming language is used to express the query.

Simple query :
The following example illustrates how one might retrieve the CPU-speed of all PCs with more than 64MB of RAM from a fictional PC database:
SELECT pc.cpuspeed
FROM PCs pc
WHERE pc.ram > 64;

Query with grouping and aggregation :
The following example illustrates how one might retrieve the average amount of RAM on a PC, grouped by manufacturer:
SELECT manufacturer, AVG(SELECT part.pc.ram FROM partition part)
FROM PCs pc
GROUP BY manufacturer: pc.manufacturer;
Note the use of the keyword partition, as opposed to aggregation in traditional SQL.

Sunday, September 13, 2009

Object Definition language (ODL)

Object Definition Language (ODL) is the specification language that defines the interface to object types conforming to the ODMG Object Model.
Its purpose is to describe, in textual form, the kind of structure that an Entity-relationship diagram captures.

- Class Declarations
* interface < name > {elements = attributes, relationships,methods }
- Element Declarations
* attribute < type > < name >;
* relationship < rangetype >< name >;
- Method Example
* float gpa(in: Student) raises(noGrades)
# float = return type.
# in: indicates that the Student argument is read-only. Other options: out, inout.
# noGrades is an exception that can be raised by method gpa.

ODL Type System :
- Basic types: int, real/ float, string, enumerated types, and classes.
- Type constructors: Struct for structures and four collection types: Set, Bag, List, and Array.

ER versus ODL :
- E/R: arrow pointing to "one".
- ODL: don't use a collection type for the relationship in the "many" class.
* The collection type remains in the "one" class.
- E/R: arrows in both directions.
- ODL: omit collection types in both directions.
- ODL only supports binary relationship.
- Convert multi-way relationships to binary and then represent in ODL.
- Create a new connecting entity set to represent the rows in the relationship set.
- Problems handling cardinality constraints properly!!

Keys in ODL :
- Indicate with key(s) following the class name, and a list of attributes forming the key.
* Several lists may be used to indicate several alternative keys.
* Parentheses group the members of a key, and also group the keyword key with the declared keys.
* Thus, (key (a1, a2, ..., an)) = "one key consisting of all n attributes", while (key a1, a2, ..., an) = "each ai is a key by itself".
- Keys are not necessary in ODL. Object identity, not keys, differentiates objects.

Friday, September 11, 2009

Introduction to Data Binding

Data binding is the process that establishes a connection between the application UI and business logic. If the binding has the correct settings and the data provides the proper notifications, then, when the data changes its value, the elements that are bound to the data reflect changes automatically. Data binding can also mean that if an outer representation of the data in an element changes, then the underlying data can be automatically updated to reflect the change. A typical use of data binding is to place server or local configuration data into forms or other UI controls.

Basic Data Binding Concepts :
Data binding is based on a component architecture that consists of four major pieces : the data source object (DSO), data consumers, the binding agent, and the table repetition agent. Data source objects provide the data to a page, data-consuming HTML elements display the data, and the agents ensure that both the provider and the consumer are synchronized.

Direction of the Data Flow :
The data flow of a binding can go from the binding target to the binding source and/or from the binding source to the binding target.
- OneWay binding causes changes to the source property to automatically update the target property, but changes to the target property are not propagated back to the source property. This type of binding is appropriate if the control being bound is implicitly read-only.
- TwoWay binding causes changes to either the source property or the target property to automatically update the other. This type of binding is appropriate for editable forms or other fully interactive UI scenarios. Most properties default to OneWay binding, but some dependency properties default to TwoWay binding.
- OneWayToSource is the reverse of OneWay binding; it updates the source property when the target property changes. One example scenario is when you only need to re-evaluate the source value from the UI.
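
The three directions can be illustrated with a small observer sketch. This is only a toy model of the idea, not the API of any UI framework: the `Property` class and the `bind` function are hypothetical names invented for this example.

```python
class Property:
    """A value that notifies subscribers when it changes (hypothetical helper)."""

    def __init__(self, value):
        self._value = value
        self._listeners = []

    def get(self):
        return self._value

    def set(self, value):
        if value == self._value:
            return  # unchanged: stops a two-way binding from echoing forever
        self._value = value
        for listener in list(self._listeners):
            listener(value)

    def subscribe(self, listener):
        self._listeners.append(listener)


def bind(source, target, mode="OneWay"):
    """Connect two properties in the given direction (hypothetical helper)."""
    if mode in ("OneWay", "TwoWay"):
        target.set(source.get())        # initial push from source to target
        source.subscribe(target.set)
    if mode == "OneWayToSource":
        source.set(target.get())        # initial push from target to source
    if mode in ("TwoWay", "OneWayToSource"):
        target.subscribe(source.set)


# OneWay: the label follows the model, but not the other way round.
model_name, label_text = Property("Ann"), Property("")
bind(model_name, label_text, "OneWay")
model_name.set("Bob")
label_text.set("edited in UI")

# TwoWay: an edit on either side reaches the other.
model_age, text_box = Property(30), Property(0)
bind(model_age, text_box, "TwoWay")
text_box.set(31)
```

After this runs, `label_text` has been overwritten locally while `model_name` still holds "Bob", and `model_age` holds 31 because the edit in `text_box` was propagated back.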

Data Source Objects
To bind data to the elements of an HTML page in Windows Internet Explorer, a DSO must be present on that page. DSOs implement an open specification that leaves it up to the DSO developer to determine the following:
- How the data is transmitted to the page. A DSO can use any transport protocol it chooses. This might be a standard Internet protocol, such as HTTP or simple file I/O. A DSO also determines whether the transmission occurs synchronously or asynchronously. Asynchronous transmission is preferred, because it provides the most immediate interactivity to the user.
- How the data set is specified. A DSO might require an Open Database Connectivity (ODBC) connection string and a Structured Query Language (SQL) statement, or it might accept a simple URL.
- How the data is manipulated through scripts. Since the DSO maintains the data on the client, it also manages how the data is sorted and filtered.
- Whether updates are allowed.

Data Consumers
Data consumers are elements on the HTML page that are capable of rendering the data supplied by a DSO. Elements include many of those intrinsic to HTML, as well as custom objects implemented as Java applets or Microsoft ActiveX Controls.
A DSO typically exposes this functionality through an object model that is accessible to scripts.

Binding Agents
The binding and repetition agents are implemented by MSHTML.dll, the HTML viewer for Internet Explorer, and they work completely behind the scenes. When a page is first loaded, the binding agent finds the DSOs and the data consumers among those elements on the page. Once the binding agent recognizes all DSOs and data consumers, it maintains the synchronization of the data that flows between them.

Introduction to Database Concurrency

DATABASE CONCURRENCY : Database concurrency is the situation in which a single database is accessed by multiple programs at the same time. Databases are, by design, shared resources in most cases, but here they are shared across multiple applications.
Database concurrency controls ensure that transactions occur in an ordered fashion. The main job of these controls is to protect transactions issued by different users or applications from each other's effects. They must preserve the four characteristics of database transactions: atomicity, consistency, isolation and durability. Concurrency control is one of the main issues in the study of real-time database systems. In addition to satisfying consistency requirements as in traditional database systems, a real-time transaction processing system must also satisfy timing constraints.

Conflicts between transactions can be detected in two ways.
Pessimistic methods detect conflicts before access to the data object is granted. When a transaction requests access to a data item, the concurrency control manager examines the request and decides whether to grant it.
Optimistic schemes are designed to avoid the locking overhead. They are optimistic in the sense that they explicitly assume that conflicts among transactions are rare. Concurrency control is deferred until the end of the transaction, when the transaction is checked for potential conflicts; any conflict is then resolved, taking into consideration the progress the transaction has made and the nature of the conflict.
When concurrency control detects a conflict among concurrent transactions accessing the same object, a conflict resolution mechanism must be invoked. The concurrency control manager decides which transaction (the victim) to penalize, the lock holder or the requester, and chooses an appropriate action and suitable timing. Two actions are most common: blocking (wait) and abort (restart). In pessimistic concurrency control either blocking or abort can be used to resolve the conflict. In optimistic concurrency control, however, only aborting is appropriate, since the conflict is detected after the transaction has accessed the data object and performed some computation.

OPTIMISTIC CONCURRENCY CONTROL : The basic idea of an optimistic concurrency control (OCC) mechanism is that the execution of a transaction consists of three phases: read, validation and write. In all OCC schemes a conflict is detected after the data object has been accessed. Conflict detection and resolution are both done at certification time: when a transaction completes its execution, it asks the concurrency control manager to validate all the data objects it accessed. If the transaction has not been marked for abort, it enters the commit phase, where it writes all its updates to the database.
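
The read, validation and write phases can be sketched with a version number per data item. This is a minimal single-threaded illustration under assumed names (`VersionedStore` and `Transaction` are hypothetical); a real system would also have to make validation and writing atomic.

```python
class VersionedStore:
    """Toy in-memory store that keeps a version number for every key."""

    def __init__(self):
        self._data = {}  # key -> (value, version)

    def read(self, key):
        return self._data.get(key, (None, 0))

    def commit(self, read_versions, writes):
        # Validation phase: every item read must still be at the version seen.
        for key, seen_version in read_versions.items():
            if self.read(key)[1] != seen_version:
                return False  # conflict detected: the caller must abort/restart
        # Write phase: install the buffered updates and bump their versions.
        for key, value in writes.items():
            version = self.read(key)[1]
            self._data[key] = (value, version + 1)
        return True


class Transaction:
    """Buffers writes locally and remembers the version of each item it read."""

    def __init__(self, store):
        self.store = store
        self.read_versions = {}
        self.writes = {}

    def read(self, key):
        if key in self.writes:
            return self.writes[key]
        value, version = self.store.read(key)
        self.read_versions.setdefault(key, version)
        return value

    def write(self, key, value):
        self.writes[key] = value  # read phase: nothing touches the store yet

    def commit(self):
        return self.store.commit(self.read_versions, self.writes)


store = VersionedStore()
setup = Transaction(store)
setup.write("balance", 100)
setup.commit()

t1, t2 = Transaction(store), Transaction(store)
t1.write("balance", t1.read("balance") + 10)
t2.write("balance", t2.read("balance") - 30)
t1_committed = t1.commit()   # validates first, so it succeeds
t2_committed = t2.commit()   # its read is now stale, so it is aborted
```

Here `t2` is the victim: it read the balance before `t1` committed, so its validation fails and it has to restart, which matches the point that abort is the only resolution available to an optimistic scheme.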

Introduction to Database Encryption

Encryption can provide strong security for data, but is that enough? Data in a database can be accessed by many systems, so developing a database encryption strategy must take many factors into consideration. Where should the encryption be performed, for example: in the database, or in the application where the data originates? Who should have access to the encryption keys? How much data must be encrypted to provide security? What is an acceptable trade-off between data security and application performance?
Data encryption is the process of converting stored or transmitted data to a coded form in order to prevent it from being read by unauthorized persons. It applies a specific algorithm to alter the appearance of the data, making it incomprehensible to those who are not authorized to see the information.
There are 2 types of encryption algorithm: -
- Secret key or Symmetric key algorithm: - In this encryption algorithm, a single secret key is shared between the sender and the receiver. The sender encrypts the data using this key and the receiver decrypts it using the same key. It is assumed that no one else knows the key.
- Public key or Asymmetric key algorithm: - In this algorithm, every sender and receiver has a pair of keys. One is made public to the network and is called the public key; the other is kept private to that node and is called the private key. The pair is constructed so that data encrypted with one key of the pair can only be decrypted with the other key of the pair. When a sender has data to send, it encrypts the data with the receiver's public key, and the receiver decrypts it with its own private key.
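
The key-pair property can be seen in a toy RSA calculation. The sketch below uses deliberately tiny textbook primes and no padding, so it is insecure and only illustrates the arithmetic; the function name `apply_key` is an assumption made for this example, and real systems should rely on a vetted cryptography library instead.

```python
# Toy RSA key pair built from tiny primes (insecure, for illustration only).
p, q = 61, 53
n = p * q                      # modulus, part of both keys
phi = (p - 1) * (q - 1)
e = 17                         # public exponent
d = pow(e, -1, phi)            # private exponent: modular inverse (Python 3.8+)


def apply_key(message: int, exponent: int) -> int:
    """Encrypt or decrypt a small integer with one half of the key pair."""
    return pow(message, exponent, n)


message = 65
ciphertext = apply_key(message, e)          # sender uses the receiver's public key
recovered = apply_key(ciphertext, d)        # receiver uses its own private key
wrong_key_result = apply_key(ciphertext, e) # the public key cannot undo itself
```

Only the matching private exponent `d` turns `ciphertext` back into `message`; applying the public exponent a second time yields a different number.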

Advice on how to overcome some of the challenges in database encryption:
- Regulatory drivers
Advanced security through database encryption is required across many different sectors, increasingly in order to comply with regulatory mandates. One approach that can help companies address the encryption challenges associated with regulation is the defense-in-depth principle, which advocates many layers of security, ranging from physical security and access controls to rights assignment and network security, including firewalls and, crucially, encryption of data both at rest and in transit.
- Overcoming key management issues
Database encryption must be accompanied by key management; however, key management is also commonly reported as the main barrier to database encryption. It is well recognized that key use should be restricted and that key backup is extremely important. An additional best practice is that an encryption key should never be stored alongside the data it was used to encrypt. Placing encryption keys within a hardware security module (HSM) enforces this policy.
- Separation of duties and dual control
Many organizations pay close attention to separation of duties and dual control. These are needed to pass audits, by showing that internal controls protect against rogue administrators or unauthorized employees, and they are often required by the regulatory mandates discussed above.

Thursday, September 10, 2009

Database Locking - Control concurrent access of the database

Locking is a procedure used to control concurrent access to data. A lock may deny access to other transactions in order to prevent incorrect results (for example, when multiple people are updating the same data).

Locks can be of two types: -
• Exclusive lock or Write lock
• Shared lock or Read lock

- Exclusive lock: Provides exclusive use of a data item to one transaction. A transaction must hold an exclusive lock to modify the value of data in a table. Once a transaction has obtained an exclusive lock, no other transaction can access the data item, even to read it, until the lock is released.

- Shared lock: Provides read permission to the transaction. Any number of transactions can hold a shared lock and read the data item at the same time. This helps when many transactions only read the data, a situation in which exclusive locks would block them needlessly.

Basic rules for locking: -
• If a transaction has a read lock on the data item, it can read the item but cannot update it.
• If a transaction has a read lock, other transactions can obtain a read lock on the data item, but no write lock.
• If a transaction has a write lock, it can both read and update the data item.
• If a transaction has a write lock, then other transactions can't obtain either a read lock or a write lock on the data item.
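
These rules amount to a small compatibility check. The following sketch shows one way to express them; the `LockTable` class is a hypothetical name for this example, and it simply refuses a conflicting request instead of queueing it as a real lock manager would.

```python
class LockTable:
    """Grants 'read' (shared) and 'write' (exclusive) locks per data item."""

    def __init__(self):
        self._locks = {}  # item -> (mode, set of transactions holding it)

    def request(self, txn, item, mode):
        """Return True if the lock is granted, False if it conflicts."""
        held = self._locks.get(item)
        if held is None:
            self._locks[item] = (mode, {txn})
            return True
        held_mode, holders = held
        if holders == {txn}:
            # The sole holder may re-request, or upgrade read to write.
            if mode == "write":
                self._locks[item] = ("write", holders)
            return True
        if mode == "read" and held_mode == "read":
            holders.add(txn)   # any number of readers may share the item
            return True
        return False           # read vs. write, or write vs. anything


table = LockTable()
first_reader = table.request("T1", "x", "read")
second_reader = table.request("T2", "x", "read")
writer_on_read_locked = table.request("T3", "x", "write")
writer = table.request("T1", "y", "write")
reader_on_write_locked = table.request("T2", "y", "read")
```

Both readers of `x` are admitted, the writer on `x` is refused, and once `T1` holds the write lock on `y` even a read request from `T2` is refused.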

TWO PHASE LOCKING (2PL) : One way to handle concurrency control is the two-phase locking (2PL) mechanism.
Every transaction is divided into two phases: -
• A growing phase
• A shrinking phase or Contracting phase
In the growing phase, the transaction acquires all locks needed but can’t release any locks. The number of locks increases from zero to maximum for a transaction.
In the contracting phase, the number of locks held decreases from maximum to zero.
The transaction can acquire the locks, proceed with the execution & during the course of execution acquire additional locks as needed. But it never releases any lock until it has reached a stage where no new locks are required anymore.
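
The two-phase rule itself can be sketched in a few lines. The example below only tracks which phase a transaction is in and rejects an acquire after the first release; it does not model conflicts between transactions, and `TwoPhaseTransaction` is a hypothetical name used for illustration.

```python
class TwoPhaseTransaction:
    """Tracks the growing and shrinking phases of a single transaction."""

    def __init__(self, name):
        self.name = name
        self.held = set()
        self.shrinking = False   # becomes True at the first release

    def acquire(self, item):
        if self.shrinking:
            raise RuntimeError(
                f"{self.name}: 2PL violation, cannot lock {item!r} after a release"
            )
        self.held.add(item)      # growing phase: lock count only goes up

    def release(self, item):
        self.shrinking = True    # shrinking phase: lock count only goes down
        self.held.discard(item)


txn = TwoPhaseTransaction("T1")
txn.acquire("account_a")
txn.acquire("account_b")
locks_at_peak = len(txn.held)
txn.release("account_a")

try:
    txn.acquire("account_c")
    violation_detected = False
except RuntimeError:
    violation_detected = True
```

The number of held locks rises to its maximum, and the attempt to lock `account_c` after the first release is rejected.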

2PL comes in the following variants: -
• Basic two-phase locking
• Conservative two-phase locking
• Strict two-phase locking
• Rigorous two-phase locking.

Lock Time-out :
Locks are held for the length of time needed to protect the resource at the level requested. If a connection attempts to acquire a lock that conflicts with a lock held by another connection, the connection attempting to acquire the lock is blocked until:
· The conflicting lock is freed and the connection acquires the lock it requested.
· The time-out interval for the connection expires.
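
The two outcomes can be imitated with an ordinary in-process lock. This sketch uses Python's `threading.Lock` as a stand-in for a database lock, which is an assumption made purely for illustration; `try_update` is a hypothetical helper, and actual databases expose their own time-out settings.

```python
import threading

row_lock = threading.Lock()   # stands in for a lock on one database resource


def try_update(timeout_seconds):
    """Wait for the lock up to the time-out, then either work or give up."""
    acquired = row_lock.acquire(timeout=timeout_seconds)
    if not acquired:
        return "timed out"        # the time-out interval expired
    try:
        return "updated"          # the conflicting lock was freed in time
    finally:
        row_lock.release()


row_lock.acquire()                # another "connection" holds a conflicting lock
blocked_result = try_update(0.05)
row_lock.release()                # the conflicting lock is freed
free_result = try_update(0.05)
```

While the lock is held elsewhere the request gives up after the interval; once it is freed the same request succeeds.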

Database Integrity

Database integrity is the preservation of correct data; it implies keeping the database safe from accidental deletion or alteration.
There are the following types of integrity constraints:-
• Entity integrity constraints
• Referential integrity constraints
• Domain integrity constraints

DATABASE SECURITY : Database security is a measure of confidence that the integrity of a system and its data will be preserved.
Database security is designed to address the following issues:-
• Privacy of data elements
• Preserving policies of organization
• System related security level
• Maintaining integrity of the database

Data integrity can be compromised in a number of ways:
- Human errors when data is entered.
- Errors that occur when data is transmitted from one computer to another.
- Software bugs or viruses.
- Hardware malfunctions, such as disk crashes.
- Natural disasters, such as fires and floods.

There are many ways to minimize these threats to data integrity. These include:
- Backing up data regularly.
- Controlling access to data via security mechanisms.
- Designing user interfaces that prevent the input of invalid data.
- Using error detection and correction software when transmitting data.

Using declarative integrity constraints to enforce these rules has the following advantages, and one cost:
* Declarative Ease
Integrity constraints are defined declaratively using SQL statements rather than in program code. For this reason, declarative integrity constraints are preferable to application code and database triggers. The declarative approach is also better than using stored procedures, because a stored procedure solution enforces integrity by controlling data access, whereas integrity constraints do not eliminate the flexibility of ad hoc data access.
* Centralized Rules
Integrity constraints are defined for tables (not an application) and are stored in the data dictionary. Any data entered by any application must adhere to the same integrity constraints associated with the table.
* Maximum Application Development Productivity
If a business rule enforced by an integrity constraint changes, then the administrator need only change that integrity constraint and all applications automatically adhere to the modified constraint.
* Superior Performance
The semantics of integrity constraint declarations are clearly defined, and performance optimizations are implemented for each specific declarative rule.
* Flexibility for Data Loads and Identification of Integrity Violations
You can disable integrity constraints temporarily so that large amounts of data can be loaded without the overhead of constraint checking.
* The Performance Cost of Integrity Constraints
The advantages of enforcing data integrity rules come with some loss in performance.

Overview of Database Security

Database security is the set of systems, processes, and procedures that protect a database from unintended activity. Unintended activity can be categorized as authenticated misuse, malicious attacks or inadvertent mistakes made by authorized individuals or processes. Database security is also a specialty within the broader discipline of computer security. The database is the entity where all the data is stored, so protecting it from unauthorized access and change is extremely critical.
Traditionally, databases have been protected from external connections by firewalls or routers on the network perimeter, with the database environment existing on the internal network as opposed to being located within a demilitarized zone.
Database security can begin with the creation and publishing of appropriate security standards for the database environment. The standards may include specific controls for the various relevant database platforms; a set of best practices that cross over the platforms; and linkages of the standards to higher-level policies and governmental regulations.
One of the easiest steps to take concerns passwords. Default or weak passwords are still often used by enterprises to protect an online asset as important as a database, but this is a problem that is easy to fix. The remedy is to enforce a strong password policy: passwords must be changed regularly, be at least 10 characters long, and contain both alphanumeric characters and symbols.
SQL injection attacks pose tremendous risks to web applications that depend upon a database back-end to generate dynamic content. In this type of attack, hackers manipulate a web application in an attempt to inject their own SQL commands into those issued to the database. To prevent this type of attack, it is essential to validate all user-supplied data before letting it anywhere near your scripts, data access routines and SQL queries, and preferably to use parameterized queries. Another reason to validate and clean data received from users is to prevent cross-site scripting (XSS) attacks, which can be used to compromise a database connected to a Web server.
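
The difference between concatenating user input and using a parameterized query can be shown with Python's built-in `sqlite3` module. The table, column names and sample data below are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a1"), ("bob", "b2")],
)

# Input crafted by an attacker: it closes the quote and adds an always-true test.
user_input = "nobody' OR '1'='1"

# Unsafe: the input becomes part of the SQL text and rewrites the WHERE clause.
unsafe_sql = "SELECT secret FROM users WHERE name = '" + user_input + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the ? placeholder passes the input purely as a value to compare against.
safe_rows = conn.execute(
    "SELECT secret FROM users WHERE name = ?", (user_input,)
).fetchall()
```

The concatenated query returns every user's secret, while the parameterized query looks for a user literally named `nobody' OR '1'='1` and finds none.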
A database security program should include the regular review of permissions granted to individually owned accounts and accounts used by automated processes. The accounts used by automated processes should have appropriate controls around password storage such as sufficient encryption and access controls to reduce the risk of compromise.
The software used for the database, for the middle layers and for all other layers should be updated regularly with patches, updates and fixes. Falling behind in this task is pretty painful if you end up exposing holes in the software to attackers (and attackers know that a number of companies do not upgrade their systems on an immediate basis).

Tuesday, September 8, 2009

Structured Query Language (SQL)

The structured query language (SQL) is the language used to query and manipulate information within a relational database such as SQL Server. SQL is an ISO and ANSI standardised language. However, many RDBMS products add their own proprietary extensions; SQL Server's variant, for example, is called Transact-SQL (T-SQL).

The basic building block of the structured query language is the SQL statement. Using statements, information in a database can be manipulated and queried.
* CREATE - a data structure.
* SELECT - read one or more rows from a table.
* INSERT - one or more rows into a table.
* DELETE - one or more rows from a table.
* UPDATE - change the column values in a row.
* DROP - a data structure.

Language Structure :
SQL is a keyword-based language. Each statement begins with a unique keyword. SQL statements consist of clauses, each of which begins with a keyword. SQL keywords are not case sensitive.
The other lexical elements of SQL statements are:
* names -- names of database elements: tables, columns, views, users, schemas; names must begin with a letter (a - z) and may contain digits (0 - 9) and underscore (_)
* literals -- quoted strings, numeric values, date time values.
* delimiters -- + - , ( ) = < > <= >= <> . * / || ? ;
Basic database objects (tables, views) can optionally be qualified by schema name. A dot -- ".", separates qualifiers: schema-name . table-name
Column names can be qualified by table name with optional schema qualification.

Syntax of Simple SELECT : SELECT column FROM tablename
- Using "Where"
WHERE SALARY >= 50000;
- Compound Conditions
- Using "IN"
WHERE POSITION IN ('Manager', 'Staff');
- Using "Between"
- Using "LIKE"
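
The clauses listed above can be tried out against a throwaway table. The sketch below uses Python's `sqlite3` module; the `employee` table, its rows and the `names` helper are assumptions made for the example, and the WHERE fragments are fixed literals rather than user input.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, position TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [
        ("Ann", "Manager", 72000),
        ("Bob", "Staff", 48000),
        ("Cara", "Staff", 51000),
        ("Dev", "Intern", 20000),
    ],
)


def names(where_clause):
    """Run SELECT name FROM employee with the given (hard-coded) condition."""
    sql = "SELECT name FROM employee WHERE " + where_clause + " ORDER BY name"
    return [row[0] for row in conn.execute(sql)]


simple_where = names("salary >= 50000")
compound = names("salary >= 50000 AND position = 'Staff'")
using_in = names("position IN ('Manager', 'Staff')")
using_between = names("salary BETWEEN 40000 AND 60000")
using_like = names("name LIKE 'C%'")
```

BETWEEN includes both end points, and the `%` in the LIKE pattern matches any sequence of characters, so `'C%'` selects names that start with C.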

Introduction to Relational Databases

Relational databases are probably the most common type of database used for general-purpose tasks. In a relational database, information is grouped according to its type, generally in tables (see below). For example, in a database designed to hold fleet information you may include a table of employees and a table of vehicles.
- In addition to separating information according to its data structure, a relational database allows relationships to be created. A relationship defines a possible link between data types; the actual linkage of data is dependent upon the information held.
- Relational databases use the concept of normalization. Normalization is a design technique that minimizes the duplication of information. It also reduces the risk of errors. By using relationships, the duplication required can be lessened or eliminated completely.
The relational model is the basis for any relational database management system (RDBMS). A relational model has three main components:
- A collection of objects or relations.
- Operators that act on the objects or relations.
- Data integrity methods.

Elements of a Relational Database Schema :
There are several key elements to a relational database. Each of these forms a part of the database's schema. The schema is the logical data model that determines the information that may be stored in the database and how it is to be arranged. The main elements are:
- Table : A table is one of the most important ingredients in the design of a database. It is also known as a relation, and is a two-dimensional structure used to hold related information. A database consists of one or more tables.
- Rows : A table contains rows. Each row holds the data for one instance of the thing the table describes.
- Columns : A table contains columns. Each column contains all the information of a single type, and is a category of information referred to as a field.
- Indexes : One of the greatest benefits of holding information in a database is the ability to retrieve it quickly. When querying a database, it is possible to apply criteria to ask for a specific set of rows. Indexes on the columns used in such criteria make these lookups faster.
- Keys : A primary key is a single column, or group of several columns (compound key), that can be used to uniquely identify rows in a table. Each table in a database may have a single primary key. Once defined, no two rows in the table may contain matching data in the primary key columns. Foreign keys are used when defining relationships between tables. A foreign key is a single column, or group of columns, in a table that reference the primary key in another table. This creates a link between the two tables.
- Constraints : Constraints are rules that are applied to the information in a database. These are usually used to enforce business rules upon the tabular data.
- Views : Views provide the useful concept of virtual tables. A view gathers specific information from one or more sources and presents it in the format of a single table. The information may be filtered within the view to remove unnecessary information.
- Stored Procedures : A stored procedure is a predefined set of statements that can be executed when required. Stored procedures provide the main means of creating programs within SQL Server databases.

Domain and Integrity Constraints :
* Domain Constraints
o limit the range of domain values of an attribute
o specify uniqueness and `nullness' of an attribute
o specify a default value for an attribute when no value is provided.
* Entity Integrity
o every tuple is uniquely identified by a unique non-null attribute, the primary key.
* Referential Integrity
o rows in different tables are correctly related by valid key values (`foreign' keys refer to primary keys).
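
Each kind of constraint can be declared in the table definition and left to the database to enforce. The sketch below uses SQLite through Python's `sqlite3` module; the `department` and `employee` tables and the `rejected` helper are invented for the example, and SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` has been issued.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
conn.execute(
    """CREATE TABLE department (
           dept_id INTEGER PRIMARY KEY,
           name    TEXT NOT NULL UNIQUE)"""
)
conn.execute(
    """CREATE TABLE employee (
           emp_id  INTEGER PRIMARY KEY,                            -- entity integrity
           salary  INTEGER NOT NULL DEFAULT 0 CHECK (salary >= 0), -- domain
           dept_id INTEGER NOT NULL
                   REFERENCES department(dept_id))                 -- referential"""
)
conn.execute("INSERT INTO department VALUES (1, 'Fleet')")
conn.execute("INSERT INTO employee (emp_id, dept_id) VALUES (1, 1)")


def rejected(sql):
    """Return True if the database refuses the statement."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


domain_violation = rejected("INSERT INTO employee VALUES (2, -5, 1)")
entity_violation = rejected("INSERT INTO employee VALUES (1, 100, 1)")
referential_violation = rejected("INSERT INTO employee VALUES (3, 100, 99)")
default_salary = conn.execute(
    "SELECT salary FROM employee WHERE emp_id = 1"
).fetchone()[0]
```

The negative salary, the duplicate primary key and the reference to a non-existent department are all refused, and the first employee receives the default salary because none was supplied.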

Monday, September 7, 2009

Database Normalization

Normalization is the process of efficiently organizing data in a database. There are two goals of the normalization process: eliminating redundant data (for example, storing the same data in more than one table) and ensuring data dependencies make sense (only storing related data in a table).
Database normalization can essentially be defined as the practice of optimizing table structures. Optimization is accomplished as a result of a thorough investigation of the various pieces of data that will be stored within the database, in particular concentrating upon how this data is interrelated. An analysis of this data and its corresponding relationships is advantageous because it can result both in a substantial improvement in the speed in which the tables are queried, and in decreasing the chance that the database integrity could be compromised due to tedious maintenance procedures.

Why Normalize?
- To reduce redundancy : One obvious drawback of data repetition is that it consumes more space and resources than necessary. Redundancy also introduces the possibility of error, because copies of the same data can fall out of step.
- To avoid unforeseen scalability issues : As a database grows in size, initial design decisions play an ever greater role in its speed and in the resources it consumes.

The process of database normalization progresses through a series of steps, known as Normal Forms.
- First Normal Form (1NF): A relation is in 1NF if and only if all underlying domains contain scalar values only. 1NF is often referred to as the atomic rule. In a database, this means that each column should be designed to hold one and only one piece of information.
- Second Normal Form (2NF): It further addresses the concept of removing duplicative data:
* Meet all the requirements of the first normal form.
* Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
* Create relationships between these new tables and their predecessors through the use of foreign keys.
- Third Normal Form (3NF): It goes one large step further:
* Meet all the requirements of the second normal form.
* Remove columns that are not dependent upon the primary key.
- Fourth Normal Form (4NF): It has one additional requirement:
* Meet all the requirements of the third normal form.
* A relation is in 4NF if it has no multi-valued dependencies.

Temporal Database Concepts

A temporal database is a database with built-in time aspects, e.g. a temporal data model and a temporal version of structured query language. More specifically the temporal aspects usually include valid-time and transaction-time. These attributes go together to form bitemporal data.
* Valid time denotes the time period during which a fact is true with respect to the real world.
* Transaction time is the time period during which a fact is stored in the database.
* Bitemporal data combines both Valid and Transaction Time.
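
A bitemporal record can be sketched as a fact carrying two independent time periods. The class, the helper functions and the sample address history below are all hypothetical, and periods are treated as half-open intervals, which is one common convention rather than the only one.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class BitemporalFact:
    statement: str
    valid_from: date            # valid time: when the fact is true in reality
    valid_to: Optional[date]    # None means "still true"
    tx_from: date               # transaction time: when the row was stored
    tx_to: Optional[date]       # None means "still current in the database"


def _covers(start, end, when):
    """Half-open interval test: start <= when < end."""
    return start <= when and (end is None or when < end)


def known_facts(facts, valid_at, as_of):
    """What did the database, as of one date, say was true on another date?"""
    return [
        f.statement
        for f in facts
        if _covers(f.valid_from, f.valid_to, valid_at)
        and _covers(f.tx_from, f.tx_to, as_of)
    ]


# A move that happened on 1 Aug 2009 but was only recorded on 10 Sep 2009.
facts = [
    # Original row, logically deleted (never physically removed) on 10 Sep.
    BitemporalFact("lives in Oslo", date(2009, 1, 1), None,
                   date(2009, 1, 5), date(2009, 9, 10)),
    BitemporalFact("lives in Oslo", date(2009, 1, 1), date(2009, 8, 1),
                   date(2009, 9, 10), None),
    BitemporalFact("lives in Rome", date(2009, 8, 1), None,
                   date(2009, 9, 10), None),
]

before_correction = known_facts(facts, date(2009, 8, 15), as_of=date(2009, 9, 1))
after_correction = known_facts(facts, date(2009, 8, 15), as_of=date(2009, 9, 20))
```

Asking about 15 August as the database stood on 1 September gives the outdated answer, while the same question asked as of 20 September gives the corrected one; nothing was overwritten, so both views remain available.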

- A temporal DBMS manages time-referenced data; times are associated with database entities.
- Modeled reality.
- Database entities.
- Fact : any logical statement that can meaningfully be assigned a truth value, i.e., that is either true or false.
- Valid Time (vt) :
* Valid time is the collected times when the fact is true.
* It may span the past, present and future.
* Every fact has a valid time.
- Transaction Time (tt) :
* The time during which a fact is current in the database.
* It may be associated with any database entity, not only with facts.
* The TT of an entity has a duration: from insertion to deletion.
* Deletion is a purely logical operation.
- Time domain :
* The time domain may be discrete or continuous.
* It is typically assumed to be finite and discrete in a database.
* Time is assumed to be totally ordered.
- Uniqueness of “NOW” :
* The current time is ever-increasing.
* All activity happens at the current time.
* The current time separates the past from the future.
* “NOW” <> “HERE”.
* Time cannot be reused!
* A challenge to temporal database management.

Most applications of database technology are temporal in nature:
- Financial applications : portfolio management, accounting & banking.
- Record-keeping applications : personnel, medical record and inventory management.
- Scheduling applications : airline, car, hotel reservations and project management.
- Scientific applications : weather monitoring.
