Friday, March 2, 2012

What are the different data interpretation defects?

A software system or application can perform an assigned task only when it is capable of interpreting, or analyzing, the data it receives. Proper interpretation of the data is essential to the correct execution of the task: if the interpretation itself is wrong, you cannot expect accurate results.

The interpretation of data involves the following steps:
- Inspection of data
- Cleaning of data
- Transformation of data
- Modeling of data

Together, these steps turn raw data into meaningful conclusions and supporting decisions. Data interpretation has many approaches and facets.
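The four steps above can be sketched as a tiny pipeline. This is a minimal illustration in pure Python, using hypothetical sensor records; the record format and the choice of mean as the "model" are assumptions for the example only.

```python
from statistics import mean

# Hypothetical raw records: (reading_id, value) pairs from a sensor log.
raw = [("r1", "10.5"), ("r2", "bad"), ("r3", "12.0"), ("r4", ""), ("r5", "11.5")]

# 1. Inspection: look at what arrived.
print(f"received {len(raw)} records")

# 2. Cleaning: drop records whose value is not a parseable number.
def is_number(s):
    try:
        float(s)
        return True
    except ValueError:
        return False

clean = [(rid, v) for rid, v in raw if is_number(v)]

# 3. Transformation: convert the surviving strings to floats.
values = [float(v) for _, v in clean]

# 4. Modeling: summarise the cleaned values (here, a simple mean).
print(f"kept {len(values)} records, mean value = {mean(values):.2f}")
```

Note how a bug in step 2 (say, letting "bad" through) would corrupt every later step, which is exactly the failure mode discussed below.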

- Data analysis employs different interpretation techniques in different domains.
- Some common data interpretation techniques are:
1. Data mining
These techniques focus on modeling the data as well as on describing it.

2. Business intelligence
- This technique suits large databases where a great deal of aggregation is required.
- It is mainly used in the business domain.

3. Statistical Analysis
This further comprises exploratory data analysis (EDA), which discovers new features in the data and describes it, and confirmatory data analysis (CDA), which tests whether existing hypotheses hold up.

4. Predictive Analytics
It is employed for forecasting future outcomes.

5. Text Analytics
It is used to extract and classify data from various textual sources.

Different data types call for different interpretation techniques. Data is classified into the following categories:

1. Qualitative Data
The data denotes the presence or absence of a particular characteristic (e.g., pass/fail).
2. Quantitative Data
The data is numerical: either a continuous value within a specified range or a whole counting number.
3. Categorical Data
The data is drawn from several different or similar categories.
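A minimal sketch of telling the three categories apart. The `classify` helper and its rules are illustrative assumptions, not a standard API:

```python
# Hypothetical helper that labels a column of raw values using the three
# categories above; the rules and names are illustrative only.
def classify(values):
    if set(values) <= {"pass", "fail"}:               # presence/absence of a trait
        return "qualitative"
    if all(isinstance(v, (int, float)) for v in values):
        return "quantitative"                          # continuous or counting numbers
    return "categorical"                               # named groups

print(classify(["pass", "fail", "pass"]))   # qualitative
print(classify([1.2, 3, 4.5]))              # quantitative
print(classify(["red", "green", "blue"]))   # categorical
```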

The interpretation or analysis of data is not a simple process; it involves complex processes, and complex processes are prone to defects and errors.
- In a data interpretation process, defects can appear in every phase.
- Let us start from the first step of the process and discuss the defects as we move through it.
- Data cleaning involves the removal of erroneous data.
- If the program performing the data cleaning is itself defective, it can let erroneous data through, which in turn can cause many defects throughout the process.
- Any changes made to the data should be reversible and documented.
- It is recommended that the data be quality-checked as early as possible, since defective data is the cause of many defects later in the interpretation process.
- There are several ways of checking quality, such as:
  - Descriptive statistics
  - Normality checks
  - Associations
  - Frequency counts
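These four checks can be run with nothing more than the standard library. The data below and the two-standard-deviation outlier rule are assumptions for illustration:

```python
from collections import Counter
from statistics import mean, stdev

scores = [12, 15, 14, 13, 15, 40, 14, 13]   # hypothetical measurements

# Descriptive statistics: a glance at centre and spread.
m, s = mean(scores), stdev(scores)
print(f"mean={m:.2f} stdev={s:.2f}")

# Crude normality/outlier screen: flag values far from the mean.
outliers = [x for x in scores if abs(x - m) > 2 * s]
print("possible outliers:", outliers)

# Frequency counts: spot suspicious repeats or impossible values.
print(Counter(scores).most_common(3))

# Association check: Pearson correlation between two paired columns.
def pearson(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

hours = [1, 2, 3, 4, 5, 6, 7, 8]
print(f"correlation with hours: {pearson(hours, scores):.2f}")
```

Here the value 40 stands out on every check, which is the kind of defective input that should be caught before interpretation begins.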

- In some cases data values might be missing.
- This can cause the whole interpretation process to falter or even come to a halt.
- In such cases the missing values can be imputed.
- Defects can also occur if the data is not uniformly distributed.
- To detect this, the randomization procedure should be checked for success.
- If no randomization procedure was included, a non-sampling randomization check can be used.
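The simplest imputation strategy mentioned above, mean imputation, can be sketched as follows; the column values are hypothetical and `None` is assumed to mark a missing entry:

```python
from statistics import mean

# Hypothetical column with missing entries represented as None.
column = [4.0, None, 6.0, 5.0, None, 5.0]

# Mean imputation: replace each missing value with the mean of the
# observed values. (Many other strategies exist; this is the simplest.)
observed = [v for v in column if v is not None]
fill = mean(observed)
imputed = [v if v is not None else fill for v in column]

print(imputed)   # missing slots now carry the observed mean, 5.0
```

Imputation keeps the pipeline from halting, but the documentation requirement above still applies: record which values were filled in, so the change is retrievable.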

There are some possible data distortions that also give rise to data interpretation defects:
1. Item Non-Response
The data should be analyzed for this factor at the very start of the analysis. Whether randomization was used does not matter here.
2. Drop-Out
As with item non-response, the data should be checked for this at the beginning.
3. Quality Treatment
Poor-quality data should be treated with various manipulation checks.
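The first two distortions can be screened for mechanically. This sketch assumes survey-style rows where `None` marks an unanswered item; the data and helper names are hypothetical:

```python
# Hypothetical survey rows: one list of answers per respondent;
# None marks an unanswered item.
rows = [
    [5, 4, None, 3, 4],     # item non-response: one skipped question
    [5, 4, 3, None, None],  # drop-out: stopped answering partway through
    [4, 4, 4, 3, 5],        # complete response
]

def item_nonresponse_rate(row):
    # Fraction of items this respondent left unanswered.
    return sum(v is None for v in row) / len(row)

def dropped_out(row):
    # Drop-out pattern: the answers end in a run of missing values.
    answered = [v is not None for v in row]
    return not answered[-1] and any(answered)

for i, row in enumerate(rows):
    flag = "drop-out" if dropped_out(row) else ""
    print(i, f"missing={item_nonresponse_rate(row):.0%}", flag)
```

Running such a screen at the start of the analysis, as the article recommends, catches both distortions before they propagate into the interpretation.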
