Transparent bridges are devices that connect two or more network segments and make all forwarding decisions themselves. Transparent bridges are sometimes called learning bridges. When they are turned on and receive data packets from a network segment, they:
- Learn the relation between MAC address and segment/port, and
- Forward the packet to all other segments/ports (flooding, until the destination's port is learned).
There are two types of Transparent Bridge Modes:
- Store-and-Forward: Stores the entire frame and verifies the CRC before forwarding the frame. If a CRC error is detected, the frame is discarded.
- Cut-Through: Forwards the frame just after it reads the destination MAC address without performing a CRC check.
Transparent bridges are so named because their presence and operation are transparent to network hosts. When transparent bridges are powered on, they learn the workstation locations by analyzing the source address of incoming frames from all attached networks.
How Transparent Bridging Operates
The bridge uses its table as the basis for traffic forwarding. When a frame is received on one of the bridge’s interfaces, the bridge looks up the frame’s destination address in its internal table. If the table contains an association between the destination address and any of the bridge’s ports aside from the one
on which the frame was received, the frame is forwarded out the indicated port. If no association is found, the frame is flooded to all ports except the inbound port. Broadcasts and multicasts also are flooded in this way.
Transparent bridges successfully isolate intra-segment traffic, thereby reducing the traffic seen on each individual segment. This is called filtering and occurs when the source and destination MAC addresses reside on the same bridge interface.
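The learning, forwarding, flooding, and filtering behavior described above can be sketched in a few lines of Python. This is a minimal illustration, not a real bridge implementation; the port names and the reduced frame fields (source MAC, destination MAC) are invented for the example.

```python
# A minimal sketch of a transparent (learning) bridge, assuming
# invented port names and a frame reduced to (src MAC, dst MAC).

BROADCAST = "ff:ff:ff:ff:ff:ff"

class LearningBridge:
    def __init__(self, ports):
        self.ports = ports      # e.g. ["p1", "p2", "p3"]
        self.table = {}         # learned MAC address -> port

    def receive(self, src_mac, dst_mac, in_port):
        """Process one frame; return the ports it is forwarded out of."""
        # Learn: associate the source MAC with the inbound port.
        self.table[src_mac] = in_port
        out_port = self.table.get(dst_mac)
        if dst_mac != BROADCAST and out_port is not None:
            # Filtering: src and dst on the same segment -> drop.
            return [] if out_port == in_port else [out_port]
        # Unknown unicast, broadcast, multicast: flood.
        return [p for p in self.ports if p != in_port]
```

Note the filtering case: once the table shows that the source and destination share a port, the frame is discarded rather than forwarded, which is exactly the intra-segment isolation described above.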
Benefits of Transparent Bridging :
Transparent bridging can deliver many benefits, including more efficient use of finite network resources, improved responsiveness, reduced wait delays, fewer network errors resulting from collisions, plug-and-play installation, reduced administrative requirements and overhead, better network bandwidth allocation, and fewer instances of the harmful effects of broadcast storms, loops, and redundancy issues.
Saturday, February 27, 2010
Friday, February 26, 2010
Bridging is a forwarding technique used in packet-switched computer networks. Unlike routing, bridging makes no assumptions about where in a network a particular address is located. Instead, it depends on flooding and examination of source addresses in received packet headers to locate unknown devices. Once a device has been located, its location is recorded in a table where the MAC address is stored so as to preclude the need for further broadcasting. The utility of bridging is limited by its dependence on flooding, and is thus only used in local area networks.
A network bridge is a device which connects two parts of a network together at the data link layer (layer 2 of the OSI model).
Situations Where Bridging Is Appropriate :
- Connecting Networks.
- Filtering/Traffic Shaping Firewall.
- Network Tap.
- Layer 2 VPN : Two Ethernet networks can be joined across an IP link by bridging the networks to an EtherIP tunnel or a tap(4) based solution such as OpenVPN.
- Layer 2 Redundancy : A network can be connected together with multiple links and use the Spanning Tree Protocol to block redundant paths.
Advantages of Bridging :
- Isolate collision domains.
- Reduce the size of collision domains by micro-segmentation in non-switched networks.
- Transparent to protocols above the MAC layer.
- LANs interconnected are separate, and physical constraints such as number of stations, repeaters and segment length don't apply.
- Helps minimize bandwidth usage.
Disadvantages of Bridging :
- It does not limit the scope of broadcasts.
- It does not scale to extremely large networks.
- Buffering and processing introduces delays.
- Bridges are more expensive than repeaters or hubs.
Thursday, February 25, 2010
A bridge device filters data traffic at a network boundary. Bridges reduce the amount of traffic on a LAN by dividing it into two segments. A network bridge connects multiple network segments at the data link layer (layer 2) of the OSI model. Bridges forward broadcasts to all ports except the one on which the broadcast was received.
Bridges inspect incoming traffic and decide whether to forward or discard it. Bridges serve a similar function to switches, which also operate at Layer 2. Traditional bridges, though, support one network boundary, whereas switches usually offer four or more hardware ports. Switches are sometimes called "multi-port bridges" for this reason.
Bridges come in three basic types:
- Local bridges: Directly connect local area networks (LANs)
- Remote bridges: Can be used to create a wide area network (WAN) link between LANs. Remote bridges, where the connecting link is slower than the end networks, have largely been replaced with routers.
- Wireless bridges: Can be used to join LANs or connect remote stations to LANs
Today, bridges are slowly but surely falling out of favor. Ethernet switches offer similar functionality; they can provide logical divisions, or segments, in the network. In fact, switches are sometimes referred to as multi-port bridges because of the way they operate.
Wednesday, February 24, 2010
A Network Interface Card (NIC) provides the hardware interface between a computer and a network. A NIC technically is network adapter hardware in the form factor of an add-in card. Networked computers communicate with each other using a given protocol or agreed-upon language for transmitting data packets between the different machines, known as nodes. The network interface card acts as the liaison for the machine to both send and receive data on the LAN.
Some NICs work with wired connections while others are wireless. Most NICs support either wired Ethernet or Wi-Fi wireless standards. In new computers, many NICs are now pre-installed by the manufacturer. NICs can be differentiated by their type of connectivity to the computer itself.
- 10/100 Ethernet : These are the NICs most frequently used in home or small-office settings. They are capable of speeds of 10 or 100 megabits per second.
- Gigabit Ethernet :These NICs provide network transfer speeds of up to one Gigabit per second.
- Fiber Optics : These NICs use fiber optic cabling to reach speeds of 10 gigabits per second currently, with a specification under review to push this limit to 100 gigabits per second.
- Wireless NICs : These NICs provide the same networking capabilities as their wired counterparts, but at different transfer rates. Speeds of 54 Mb/s are the most common for wireless NICs unless several NICs are teamed together to combine bandwidth.
- Wireless Dongles : A wireless dongle is a small networking device that lets an individual machine reach a wireless router, so each additional machine on the network needs only a dongle rather than its own router.
Saturday, February 20, 2010
A bus network is an arrangement in a local area network (LAN) in which each node (workstation or other device) is connected to a main cable or link called the bus. Bus networks are the simplest way to connect multiple clients, but may have problems when two clients want to transmit at the same time on the same bus. A true bus network is passive – the computers on the bus simply listen for a signal; they are not responsible for moving the signal along.
The bus topology makes the addition of new devices straightforward. The term used to describe clients is station or workstation in this type of network. Bus network topology uses a broadcast channel which means that all attached stations can hear every transmission and all stations have equal priority in using the network to transmit data.
Advantages of Bus Network :
* Easy to implement and extend.
* Well-suited for temporary or small networks not requiring high speeds (quick setup).
* Cheaper than other topologies.
* Cost effective; only a single cable is used.
* Easy identification of cable faults.
* Reduced weight due to fewer wires.
Disadvantages of Bus Network :
* Limited cable length and number of stations.
* If there is a problem with the cable, the entire network goes down.
* Maintenance costs may be higher in the long run.
* Performance degrades as additional computers are added or on heavy traffic.
* Proper termination is required.
* Significant Capacitive Load.
* It works best with a limited number of nodes.
* It is slower than the other topologies.
Friday, February 19, 2010
In a ring network, every device has exactly two neighbors for communication purposes. All messages travel around the ring in the same direction (effectively either "clockwise" or "counterclockwise"). A failure in any cable or device breaks the loop and can take down the entire network. Token Ring technology is commonly used to implement a ring network: a token is passed from one computer to another, which gives each computer equal access to the network.
Because a ring topology provides only one pathway between any two nodes, ring networks may be disrupted by the failure of a single link. A node failure or cable break might isolate every node attached to the ring. Each packet is sent around the ring until it reaches its final destination. Today, the ring topology is seldom used.
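The token-passing scheme described above can be sketched as a simple round-robin simulation. This is an illustrative model only: the station names, the dictionary of senders, and the one-transmission-per-token-visit behavior are assumptions for the sketch, not the actual Token Ring protocol framing.

```python
# Toy simulation of token passing on a ring: the token visits each
# station in order, and a station may transmit only while holding it.

def token_ring_rounds(stations, wants_to_send, rounds=1):
    """Return the order in which stations transmit as the token
    circulates `rounds` times around the ring."""
    sent = []
    for _ in range(rounds):
        for station in stations:        # token travels one way only
            if wants_to_send.get(station):
                sent.append(station)    # transmit, then release token
    return sent
```

Because every station gets the token once per circulation, access is orderly and fair, which is the "equal access" property mentioned above.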
Advantages of Ring Network :
* Very orderly network where every device has access to the token and the opportunity to transmit.
* Performs better than a star topology under heavy network load.
* Can create much larger network using Token Ring.
* Does not require network server to manage the connectivity between the computers.
Disadvantages of Ring Network :
* One malfunctioning workstation can create problems for the entire network.
* Moves, adds and changes of devices can affect the network.
* Much slower than an Ethernet network under normal load.
Thursday, February 18, 2010
Star Topology is the most common type of network topology used in homes and offices. In the Star Topology there is a central connection point, called the hub, which may be a hub or just a switch.
In local area networks where the star topology is used, each machine is connected to a central hub. In contrast to the bus topology, the star topology allows each machine on the network to have a point-to-point connection to the central hub. All of the traffic which traverses the network passes through the central hub.
Advantages of Star Topology :
- A Star Network Topology is very easy to manage because of its simplicity in functionality.
- Problems can be easily located in a Star Topology, so it is also easy to troubleshoot.
- The Star Topology is very simple in format so it is very easy to expand on the Star Topology.
Disadvantages of Star Topology :
- The Star Topology is fully dependent on the central device: if the hub or switch fails, the entire network goes down.
- If there are many nodes and the cable is long then the network may slow down.
Extended Star Network :
A type of network topology in which a network that is based upon the physical star topology has one or more repeaters between the central node and the peripheral or 'spoke' nodes, the repeaters being used to extend the maximum transmission distance of the point-to-point links between the central node and the peripheral nodes.
Distributed Star Topology :
A type of network topology that is composed of individual networks that are based upon the physical star topology connected together in a linear fashion – i.e., 'daisy-chained' – with no central or top level connection point (e.g., two or more 'stacked' hubs, along with their associated star connected nodes or 'spokes').
Wednesday, February 17, 2010
In its simplest form, only hub devices connect directly to the tree bus, and each hub functions as the "root" of a tree of devices. This bus/star hybrid approach supports future expandability of the network much better than a bus (limited in the number of devices due to the broadcast traffic it generates) or a star (limited by the number of hub connection points) alone.
This type of topology suffers from the same centralization flaw as the Star Topology: if the device at the top of the hierarchy fails, the entire network goes down. Obviously this is impractical, and a pure tree is not used a great deal in real applications. Each node in the network has a specific, fixed number of nodes connected to it at the next lower level in the hierarchy; this number is referred to as the 'branching factor' of the hierarchical tree.
- A network that is based upon the physical hierarchical topology must have at least three levels in the hierarchy of the tree, since a network with a central 'root' node and only one hierarchical level below it would exhibit the physical topology of a star.
- The total number of point-to-point links in a network that is based upon the physical hierarchical topology will be one less than the total number of nodes in the network.
- If the nodes in a network that is based upon the physical hierarchical topology are required to perform any processing upon the data that is transmitted between nodes in the network, the nodes that are at higher levels in the hierarchy will be required to perform more processing operations on behalf of other nodes than the nodes that are lower in the hierarchy. Such a type of network topology is very useful and highly recommended.
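The structural properties above (node count determined by the branching factor, and a link count that is always one less than the node count) can be checked with a short sketch; the branching factor and level count used are just example parameters.

```python
# Sketch of the node/link arithmetic for a full hierarchical tree.

def tree_nodes(branching, levels):
    """Nodes in a full tree: 1 + b + b^2 + ... + b^(levels - 1)."""
    return sum(branching ** i for i in range(levels))

def tree_links(branching, levels):
    """Every node except the root has exactly one uplink, so the
    link count is always one less than the node count."""
    return tree_nodes(branching, levels) - 1
```

With a branching factor of 2 and three levels there are 7 nodes and 6 point-to-point links, matching the "one less than the total number of nodes" rule.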
Tuesday, February 16, 2010
The sites in the system can be connected physically in a variety of ways. While choosing a topology, the following criteria should be kept in mind: basic cost, communication cost, and reliability.
Mesh Networks : The value of a fully meshed network grows roughly exponentially with the number of subscribers, assuming that communicating groups can form between any two endpoints, up to and including all the endpoints.
- Fully Connected Networks
The fully connected network topology, also referred to as a mesh topology, requires that all the terminals be connected to all the other terminals, as its name implies.
Advantages: A fault in one terminal on the network will not affect the rest, as the data has multiple redundant paths (depending on the size of the network) open to it. When network usage is high, data packets can be transmitted via different cables, reducing network congestion and keeping data transfer rates at an acceptable level.
Disadvantage : A large amount of cabling is required.
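The cabling disadvantage can be quantified: a full mesh needs a dedicated link for every pair of terminals, so the link count grows quadratically. A quick sketch:

```python
# Full-mesh cabling cost: one dedicated link per pair of terminals,
# i.e. n * (n - 1) / 2 links for n terminals.

def full_mesh_links(n):
    return n * (n - 1) // 2
```

Four terminals need 6 links, but ten already need 45, which is why cabling is the dominant cost of a fully connected mesh.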
- Partially connected Networks
The type of network topology in which some of the nodes of the network are connected to more than one other node with a point-to-point link; this makes it possible to take advantage of some of the redundancy provided by a physical fully connected mesh topology without the expense and complexity of a connection between every node in the network.
In most practical networks that are based upon the physical partially connected mesh topology, all of the data that is transmitted between nodes in the network takes the shortest path (or an approximation of the shortest path) between nodes, except in the case of a failure or break in one of the links, in which case the data takes an alternative path to the destination.
Monday, February 15, 2010
A distributed system is a collection of processors that do not share memory or a clock. Each processor has its own clock and memory, and the processors communicate with each other through various communication lines. These processors are referred to by different names such as sites, machines, hosts, nodes, computers and so on.
A distributed system provides the user with access to various resources that the system maintains. A distributed system must provide various mechanisms for process synchronization and communication, for dealing with the deadlock problem, and other failures which are not encountered in a centralized system.
There are four reasons for building distributed systems :
- Resource sharing : If a number of different sites are connected to one another, then a user at one site may be able to use the resources available at another.
- Computation speedup : If a computation can be partitioned into sub computations that can run concurrently, availability of a distributed system may allow us to distribute the computation among various sites, to run the computation concurrently.
- Reliability : If one site fails in a distributed system, the remaining sites can potentially continue operating. The failure of the site must be detected by the system, and appropriate action may be needed to recover from the failure.
- Communication : Information can be exchanged when several sites are connected to one another by a communication network.
The advantage of a distributed system is that these functions can be carried over great distances.
Sunday, February 14, 2010
Hierarchical Storage Management (HSM) is a data storage technique which automatically moves data between high-cost and low-cost storage media. HSM systems exist because high-speed storage devices, such as hard disk drive arrays, are more expensive (per byte stored) than slower devices, such as optical discs and magnetic tape drives. While it would be ideal to have all data available on high-speed devices all the time, this is prohibitively expensive for many organizations.
Instead, HSM systems store the bulk of the enterprise's data on slower devices, and then copy data to faster disk drives when needed. In effect, HSM turns the fast disk drives into caches for the slower mass storage devices. The HSM system monitors the way data is used and makes best guesses as to which data can safely be moved to slower devices and which data should stay on the fast devices.
A hierarchical storage system extends the storage hierarchy beyond primary memory and secondary storage to incorporate tertiary storage — usually implemented as a jukebox of tapes or removable disks. It usually incorporates tertiary storage by extending the file system.
* Small and frequently used files remain on disk.
* Large, old, inactive files are archived to the jukebox.
HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data.
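A migration pass like the one described above might be sketched as follows. This is a hedged illustration of the idea, not a real HSM policy engine: the file records, the tier names, and the age/size thresholds are all invented.

```python
# Toy HSM migration pass: large, old, inactive files go to the
# slow "jukebox" tier; everything else stays on fast disk.
# Thresholds and tier names are made-up example values.

def hsm_migrate(files, max_age_days=90, min_size_mb=100):
    """files maps name -> (size_mb, days_since_last_access).
    Returns the chosen tier ("disk" or "jukebox") per file."""
    placement = {}
    for name, (size_mb, age_days) in files.items():
        if age_days > max_age_days and size_mb >= min_size_mb:
            placement[name] = "jukebox"   # large, old, inactive
        else:
            placement[name] = "disk"      # small or recently used
    return placement
```

In effect this treats the disk tier as a cache in front of the jukebox, mirroring the "best guesses" behavior described above.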
Saturday, February 13, 2010
Major OS jobs are to manage physical devices and to present a virtual-machine abstraction to applications. Most operating systems handle removable disks almost exactly as they do fixed disks: a new cartridge is formatted, and an empty file system is generated on the disk.
Tapes are often handled differently. A tape is presented as a raw storage medium: an application does not open a file on the tape; it opens the whole tape drive as a raw device. Usually the tape drive is then reserved for the exclusive use of that application. The operating system does not provide file-system services when the tape drive is presented as a raw device, so the application must decide how to use the array of blocks. Since every application makes up its own rules for how to organize a tape, a tape full of data can generally be used only by the program that created it.
The basic operations of a tape drive differ from those of a disk drive.
* The locate operation positions the tape at a specific logical block, not an entire track (it corresponds to a disk seek).
* The read position operation returns the logical block number where the tape head is.
* The space operation enables relative motion.
Tape drives are “append-only” devices; updating a block in the middle of the tape also effectively erases everything beyond that block. An EOT (End of tape) mark is placed after a block that is written.
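The three operations and the append-only behavior can be modeled with a toy in-memory tape. The class and its method names are illustrative, not a real driver interface; a real drive exposes these operations through device ioctls, not Python methods.

```python
# Toy model of a raw tape device: locate, read position, space,
# and append-only writes that truncate everything past the head.

class TapeDrive:
    def __init__(self):
        self.blocks = []   # logical blocks currently on the tape
        self.pos = 0       # tape head position (logical block number)

    def locate(self, block_no):
        """Position the head at a specific logical block."""
        self.pos = block_no

    def read_position(self):
        """Return the logical block number under the head."""
        return self.pos

    def space(self, count):
        """Relative motion: move the head forward or backward."""
        self.pos += count

    def write(self, data):
        """Append-only: writing at pos implicitly erases everything
        beyond it, as if an EOT mark followed the written block."""
        self.blocks = self.blocks[:self.pos] + [data]
        self.pos += 1
```

Rewriting block 1 of a three-block tape leaves only blocks 0 and 1, which is exactly the mid-tape update behavior described above.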
The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer and then use the cartridge in another computer. Contemporary operating systems generally leave the name-space problem unsolved for removable media, and depend on applications and users to figure out how to access and interpret the data. Some kinds of removable media (e.g., CDs) are so well standardized that all computers use them the same way.
Friday, February 12, 2010
There are three aspects of tertiary-storage performance :
- Speed : The two aspects of speed in tertiary storage are bandwidth and latency. Bandwidth is measured in bytes per second.
* Sustained bandwidth – average data rate during a large transfer; number of bytes divided by transfer time. This is the data rate when the data stream is actually flowing.
* Effective bandwidth – average over the entire I/O time, including seek or locate and cartridge switching. This is the drive's overall data rate.
Access latency is the amount of time needed to locate data.
* Access time for a disk – move the arm to the selected cylinder and wait for the rotational latency; typically less than 35 milliseconds.
* Access on tape requires winding the tape reels until the selected block reaches the tape head; tens or hundreds of seconds.
* Generally, random access within a tape cartridge is about a thousand times slower than random access on disk.
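The distinction between sustained and effective bandwidth can be made concrete with a small sketch; the transfer size and timing figures below are made-up examples.

```python
# Sustained vs. effective bandwidth, per the definitions above.

def sustained_bandwidth(bytes_moved, transfer_time_s):
    """Data rate while the stream is actually flowing."""
    return bytes_moved / transfer_time_s

def effective_bandwidth(bytes_moved, transfer_time_s, overhead_s):
    """Average over the whole I/O, including seek/locate time
    and cartridge switching."""
    return bytes_moved / (transfer_time_s + overhead_s)
```

For example, a 100 MB transfer that streams for 10 seconds but needs 90 seconds of locate and cartridge-switch time has a sustained bandwidth of 10 MB/s but an effective bandwidth of only 1 MB/s, which is why latency dominates tertiary-storage performance.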
The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives. A removable library is best devoted to the storage of infrequently used data, because the library can only satisfy a relatively small number of I/O requests per hour.
- Reliability : A fixed disk drive is likely to be more reliable than a removable disk or tape drive. An optical cartridge is likely to be more reliable than a magnetic disk or tape. A head crash in a fixed hard disk generally destroys the data, whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed.
- Cost : Main memory is much more expensive than disk storage. The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive. The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years. Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives.
Thursday, February 11, 2010
Magnetic tape is another type of removable medium. A tape is less expensive than an optical or magnetic disk, and the tape holds more data. The device that performs the actual writing or reading of data is a tape drive. Tape drives and disk drives have similar transfer rates, but random access to a tape is much slower than a disk seek.
Where are tapes used ?
- Tapes are commonly used to store backup copies of disk data.
- They are also used in large supercomputer centers to hold the enormous volumes of data.
Auto-loaders and tape libraries are frequently used to automate cartridge handling.
When storing large amounts of data, tape can be substantially less expensive than disk or other data storage options. Tape storage has always been used with large computer systems. The surface area of a tape is usually much larger than the surface area of a disk.
A robotic tape changer is used for large tape installations. It lowers the overall cost of data storage. A disk-resident file that will not be needed for a while can be archived to tape, where the cost per megabyte is substantially lower; if the file is needed in the future, the computer can stage it back into disk storage for active use. A robotic tape library is sometimes called near-line storage, since it sits between the high performance of on-line magnetic disks and the low cost of off-line tapes sitting on shelves in a storage room.
Wednesday, February 10, 2010
Tertiary storage or tertiary memory, provides a third level of storage. Typically it involves a robotic mechanism which will mount (insert) and dismount removable mass storage media into a storage device according to the system's demands; this data is often copied to secondary storage before use. It is primarily used for archival of rarely accessed information since it is much slower than secondary storage (e.g. 5–60 seconds vs. 1-10 milliseconds). This is primarily useful for extraordinarily large data stores, accessed without human operators.
Examples : Removable disks, magnetic tapes, CD-ROMs etc.
Removable Disks : A removable disk is a type of media that enables a user to easily move data between computers without having to open their computer. Examples :
* Floppy diskettes : A Floppy Disk Drive, or FDD for short, is a computer disk drive that enables a user to easily save data to removable diskettes. Diskettes are made from a thin flexible disk coated with magnetic material, enclosed in a protective plastic case.
* CD disc/DVD disc/Blu-ray disc : A compact disc is a flat round storage medium that is read by a laser in a CD-ROM drive. The standard CD is capable of holding 74 minutes of music or 650 MB of data. 80-minute CDs are also commonly used to store data and are capable of containing 700 MB.
Digital Versatile Disc or Digital Video Disc, DVD or DVD-ROM is a type of disc drive that stores large amounts of data on one disc the size of a standard Compact Disc.
Blu-ray Disc, BD or BD-ROM is an optical disc that is capable of storing up to 25 GB on a single layer disc and 50 GB on a dual layer disc.
* Tape drive cartridges : A thin piece of plastic coated with magnetic material and wound around reels, capable of storing data. Tape is much less expensive than other storage media but commonly much slower; it is typically used for backup.
* Thumb drives : A portable drive, often the size of your thumb, that connects to a computer's USB port. Today flash drives are available in various sizes, including but not limited to 256 MB, 512 MB, 1 GB, 5 GB, and 16 GB and beyond, and are widely used as an easy and small way to transfer and store information.
Tuesday, February 9, 2010
Hybrid networks use a combination of any two or more topologies in such a way that the resulting network does not exhibit one of the standard topologies (e.g., bus, star, ring, etc.).
A standard hybrid network uses something called a hybrid access point, a networking device that both broadcasts a wireless signal and contains wired access ports. The most common hybrid access point is a hybrid router. The typical hybrid router broadcasts a Wi-Fi signal using 802.11 a, b or g and contains four Ethernet ports for connecting wired devices. The hybrid router also has a port for connecting to a cable or DSL modem via Ethernet cable.
There are several different possible network configurations for a hybrid network. The most basic configuration has all the wired devices plugged into the Ethernet ports of the hybrid router. Then the wireless devices communicate with the wired devices via the wireless router.
Computers aren't the only devices that can be linked over a hybrid network. You can now buy both wired and wireless peripheral devices like printers, Web cams and fax machines.
Common hybrid networks are the star-ring network and the star-bus network:
* A Star ring network consists of two or more star topologies connected using a multi-station access unit (MAU) as a centralized hub.
* A Star Bus network consists of two or more star topologies connected using a bus trunk (the bus trunk serves as the network's backbone).
While grid networks have found popularity in high-performance computing applications, some systems have used genetic algorithms to design custom networks that have the fewest possible hops in between different nodes. Some of the resulting layouts are nearly incomprehensible, although they function quite well.
Monday, February 8, 2010
Not every team goes in for something like Scrum; there are some problems with Scrum implementation, especially when you need to plan across multiple Sprint cycles. So, teams will stick to their existing methods of development, such as Waterfall or Incremental / Iterative modes of development.
However, to ensure that your software development cycle is as effective as it can be, you should get frequent updates from your team. If you are the project manager / program manager of the team, or the developer or QE manager, you should set up periodic post-mortems for the team.
Collecting feedback from the team members is an important part of the overall way to help your processes become more efficient and effective. From your location, everything may seem to be fine, but unless you talk to the people in the trenches, you will never find out whether there are issues that are under the radar, or if there are improvements that can be made in the existing processes.
Also, if you hold periodic sessions of introspection and act on them, the team feels that their thoughts and feedback are being incorporated, and they also start thinking about how to improve the overall functioning of the team.
Friday, February 5, 2010
Implementation of transparent naming requires a provision for the mapping of a file name to the associated location. Keeping this mapping manageable calls for aggregating sets of files into component units, and providing the mapping on a component unit basis rather than on a single file basis. To enhance the availability of the crucial mapping information, methods like replication, local caching, or both can be used.
A non-transparent mapping technique:
name ----> < system, disk, cylinder, sector >
A transparent mapping technique:
name ----> file_identifier ----> < system, disk, cylinder, sector >
So, when changing the physical location of a file, only the file identifier need be modified. This identifier must be "unique" in the universe.
Remote File Access
In remote-service mechanism, requests for accesses are delivered to the server, the server machine performs the accesses, and their results are forwarded back to the user. The remote-service method is analogous to performing a disk access for each access request. To ensure reasonable performance of a remote-service mechanism, caching is used. In DFSs, the goal is to reduce both network traffic and disk I/O.
Thursday, February 4, 2010
Large networks often use a systematic naming scheme, such as using a location (e.g. a department) plus a purpose to generate a name for a computer. However, smaller networks will frequently use a more personalized naming scheme to keep track of the many hosts. Network naming can be hierarchical in nature, such as the Internet's Domain Name System. Indeed, the Internet employs several universally applicable naming methods: Uniform Resource Names (URN), Uniform Resource Locators (URL), and Uniform Resource Identifiers (URI).
A good naming scheme is scalable, unique, and easy to remember. The purpose of these naming schemes is to name network computers, but they can also be used to name projects, variables, streets, pets, kids, or any other project where unique, memorable names are required.
There are three main approaches to naming files:
- Files are named with a combination of host and local name.
* This guarantees a unique name.
* It is neither location transparent nor location independent.
* Same naming works on local and remote files.
- Remote directories are mounted to local directories.
* A local system seems to have a coherent directory structure.
* The remote directories must be explicitly mounted. The files are location independent.
* SUN (Network File System)NFS is a good example of this technique.
- A single global name structure spans all the files in the system.
* The DFS is built in the same way as a local file system.
* It is location independent.
There are a few aspects that differentiate location independence and static location transparency :
- In location independence, divorcing data from location provides better abstraction fro files. Location independent files can be viewed as logical data containers that are not attached to specific storage location.
In static location transparency, the file name still denotes a specific, although hidden, set of physical disk blocks.
- In static location transparency, users can share remote files by simple naming the files in a location transparent manner, as though the files are local but nevertheless, logical names are still attached to physical storage devices.
Location independence promotes sharing the storage space as well as the data objects.
A possible benefit of such a view is the ability to balance the utilization of disks across the system.
- Location independence separates the naming hierarchy from the storage devices hierarchy and from the inter-computer structure.
In static location transparency, the correspondence between component units machines can be easily exposed.
Once the separation of name and location has been completed, files residing on remote server systems may be accessed by different clients. In fact, these clients may be diskless and rely on servers to provide all files, including the operating system kernel.
Wednesday, February 3, 2010
Naming is the mapping between logical and physical objects. In a conventional file system, it's understood where the file actually resides; the system and disk are known. In a transparent DFS, the location of a file, somewhere in the network, is hidden. File replication means multiple copies of a file; mapping returns a SET of locations for the replicas.
- Location transparency :
* The name of a file does not reveal any hint of the file's physical storage location.
* File name still denotes a specific, although hidden, set of physical disk blocks.
* This is a convenient way to share data.
* It can expose correspondence between component units and machines.
- Location independence :
* The name of a file does not need to change when the file's physical storage location changes. This is a dynamic, one-to-many mapping.
* Better file abstraction.
* Promotes sharing the storage space itself.
* Separates the naming hierarchy from the storage devices hierarchy.
A location independent naming scheme is a dynamic mapping, since it can map the same file name to different locations at two different times. Therefore, location independence is a stronger property than location transparency.
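The dynamic, one-to-many mapping described above can be sketched as a small name-service table. The server and path names below are hypothetical; a real DFS would keep such a table on a name server rather than in a single process.

```python
# Minimal sketch of a dynamic name-to-locations mapping.
# Server and path names are hypothetical examples.

class NameService:
    def __init__(self):
        self.table = {}   # logical name -> set of physical locations

    def register(self, name, location):
        self.table.setdefault(name, set()).add(location)

    def migrate(self, name, old, new):
        """Move a replica; the logical name never changes."""
        locs = self.table[name]
        locs.discard(old)
        locs.add(new)

    def lookup(self, name):
        """Replication: mapping a name returns a SET of locations."""
        return self.table.get(name, set())

ns = NameService()
ns.register("/projects/report.txt", "serverA:/vol0/1234")
ns.register("/projects/report.txt", "serverB:/vol2/5678")   # a replica
ns.migrate("/projects/report.txt", "serverA:/vol0/1234", "serverC:/vol1/9")
print(ns.lookup("/projects/report.txt"))
# The file has moved, but clients still use the same logical name.
```

Because the same name maps to different locations at different times, clients are insulated from migration; this is exactly what a statically transparent scheme cannot offer.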
Most DFSs today support location-transparent naming. They do not support migration, and files remain permanently associated with specific disk blocks.
A distributed file system or network file system is any file system that allows files to be shared and accessed from multiple hosts via a computer network. This makes it possible for multiple users on multiple machines to share files and storage resources.
In order to understand the structure of a distributed file system, the terms service, server and client should be defined. A service is a software entity running on one or more machines and providing a particular type of function. A server is the service software running on a single machine. A client is a process that can invoke a service using a set of operations that forms its client interface.
A distributed file system (DFS) is a file system whose clients, servers, and storage devices are dispersed among the machines of a distributed system. Service activity has to be carried out across the network, and instead of a single centralized data repository there are multiple, independent storage devices. The distinctive features of a DFS are the multiplicity and autonomy of clients and servers in the system.
A DFS should look to its clients like a conventional, centralized file system. The client interface of a DFS should not distinguish between local and remote files. The most important performance measurement of a DFS is the amount of time needed to satisfy various service requests. In a DFS, a remote access has additional overhead attributable to the distributed structure. This overhead includes the time needed to deliver the request to the server, as well as the time for getting the response across the network back to the client. A DFS manages a set of dispersed storage devices; this is its key distinguishing feature.
Tuesday, February 2, 2010
Disks used to be the least reliable component of the system. They still fail relatively often, and a failure causes loss of data and significant downtime: it can take hours to recover from a disk crash. Improving the reliability of disk systems is therefore very important, and several improvements in disk-use techniques have been proposed.
To improve speed, disk striping uses a group of disks as one storage unit. Each data block is broken into several subblocks, with one subblock stored on each disk. The time required to transfer a block into memory improves because all the disks transfer their subblocks in parallel. If the disks have their rotation synchronized, performance improves further because all the disks become ready to transfer their subblocks at the same time, rather than waiting for the slowest rotational latency. This organization is usually called a redundant array of independent disks (RAID).
The simplest RAID organization, called mirroring or shadowing, just keeps a duplicate copy of each disk. This solution is costly but it is about twice as fast when reading, because half of the read requests can be sent to each disk.
Block-interleaved parity, another RAID organization, uses much less redundancy. A small fraction of the disk space is used to hold parity blocks. For example, with eight data blocks, each bit position in the parity block contains the parity of the corresponding bit positions in each of the eight data blocks. If one disk block becomes bad, all its data bits are essentially erased, but they can be recomputed from the other data blocks plus the parity block.
A parity RAID system has the combined speed of multiple disks and controllers. But write performance is an issue, because updating any single data subblock forces the corresponding parity subblock to be recomputed and rewritten.
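The parity idea above can be demonstrated with bitwise XOR: the parity block is the XOR of all the data blocks, so any single lost block equals the XOR of the surviving blocks plus the parity block. A toy sketch (the block contents are arbitrary examples, and real RAID works at the controller level, not in application code):

```python
# Toy sketch of block-interleaved parity: the parity block is the
# bitwise XOR of the data blocks, so any one lost block can be rebuilt.

from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks column by column."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]   # blocks on 4 data disks
parity = xor_blocks(data)                      # stored on the parity disk

# Disk 2 fails: recompute its block from the survivors plus parity.
survivors = [blk for i, blk in enumerate(data) if i != 2]
recovered = xor_blocks(survivors + [parity])
print(recovered == data[2])   # True: the erased block is rebuilt
```

The write penalty mentioned above follows from the same identity: updating one subblock requires new_parity = old_parity XOR old_data XOR new_data, i.e. a read-modify-write of both the data and the parity subblocks.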
Virtual memory uses disk space as an extension of main memory. Since disk access is much slower than memory access, using swap space has a large effect on system performance.
Swap space can reside at two places :
- It can be carved out of the normal file system : In this case, normal file-system routines can be used to create it, name it, and allocate its space. This approach is easy to implement but inefficient, because navigating the directory structure and the disk-allocation data structures takes time and extra disk accesses. In addition, external fragmentation greatly increases swapping times. Performance can be improved by caching the block-location information in physical memory, but the cost of traversing the file-system data structures still remains.
- It can be on a separate disk partition : No directory structure or file system is placed on this space; instead, a separate swap-space storage manager is used to allocate and deallocate the blocks. This manager uses algorithms that are optimized for speed rather than for storage efficiency. Internal fragmentation may increase, but this is acceptable because data resides in swap space for much shorter periods. This approach creates a fixed amount of swap space during disk partitioning; adding more space can be done only by repartitioning the disk.
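A dedicated partition's swap manager can trade space for speed, for example by handing out fixed-size slots from a free list in constant time instead of searching file-system structures. A minimal sketch (the slot size and count are arbitrary assumptions, not how any particular OS sizes its swap):

```python
# Minimal sketch of a speed-optimized swap-slot allocator for a
# dedicated partition: fixed-size slots, O(1) allocate/free, and no
# directory structure to traverse. Slot count is an arbitrary example.

class SwapManager:
    def __init__(self, num_slots):
        self.free = list(range(num_slots))   # free list of slot numbers

    def allocate(self):
        if not self.free:
            raise MemoryError("swap space exhausted")
        return self.free.pop()               # O(1), no on-disk search

    def release(self, slot):
        self.free.append(slot)               # O(1)

swap = SwapManager(num_slots=4)
slots = [swap.allocate() for _ in range(3)]  # page out three pages
swap.release(slots[0])                       # page back in, slot reused
print(len(swap.free))   # 2 slots remain free
```

Fixed-size slots can waste space inside a slot (internal fragmentation), which matches the trade-off described above: acceptable, because swapped data is short-lived.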
Monday, February 1, 2010
Swap space is an area on a high-speed storage device (almost always a disk drive) reserved for use by the virtual memory system for process deactivation and paging. At least one swap device (primary swap) must be present on the system. Virtual memory uses disk space as an extension of main memory, and since disk access is much slower than memory access, the use of swap space has a large effect on system performance.
It is perfectly normal for the swap file or page file to grow in size, sometimes by several hundred megabytes. Below is a listing of common Microsoft operating system swap file information; however, it is important to realize that this information may vary. Finally, by default the swap files are hidden.
Operating system         Swap file name    Location
Windows 3.x              386SPART.PAR      C:\WINDOWS
Windows 95 / 98 / ME     WIN386.SWP        C:\
Windows NT / 2000 / XP   PAGEFILE.SYS      C:\
Swap space is used in various ways by different operating systems :
- Systems that implement swapping may use swap space to hold an entire process image, including the code and data segments.
- Paging systems may use swap space simply to store pages that have been pushed out of main memory.
- Some operating systems, such as UNIX, allow the use of multiple swap spaces. These swap spaces are generally put on separate disks, so the load placed on the I/O system by paging and swapping can be spread over the system's I/O devices.
The amount of swap space needed on a system can vary depending on the amount of physical memory, the amount of virtual memory it is backing, and the way in which the virtual memory is used.