Friday, May 29, 2015

ETL Testing Interview Questions and answers


.1   What is a Data Warehouse?

:    A Data Warehouse is a collection of data marts representing historical data from different operational data sources (OLTP). The data from these OLTP systems is structured and optimized for querying and data analysis in a Data Warehouse.

.2   What is a Data mart?

:   A Data Mart is a subset of a data warehouse that can provide data for reporting and analysis on a section, unit or department like the Sales Dept, HR Dept, etc. Data Marts are sometimes also called HPQS (Higher Performance Query Structures).

.3   What is OLAP?

:      OLAP stands for Online Analytical Processing. It uses database tables (Fact and Dimension tables) to enable multidimensional viewing, analysis and querying of large amounts of data.

.4   What is OLTP?

:   OLTP stands for Online Transaction Processing. Apart from data warehouse databases, most operational databases are OLTP systems. These OLTP databases use a normalized schema structure and are designed for recording the daily operations and transactions of a business.

.5   What are Dimensions?

:   Dimensions are categories by which summarized data can be viewed. For example, a profit Fact table can be viewed by a time dimension.

.6   What are Conformed Dimensions?

:    Conformed Dimensions are dimensions which are reusable and fixed in nature; they are shared across fact tables or data marts with the same meaning. Examples: customer, time and geography dimensions.

.7   What are Fact Tables?

:    A Fact Table is a table that contains summarized numerical data (facts) and historical data. The Fact Table has a foreign key-primary key relation with the dimension tables. The Fact Table maintains its information in 3rd normal form.


.8    What are the types of Facts?

:    The types of Facts are as follows.

1.     Additive Facts: A Fact which can be summed up for any of the dimensions available in the fact table.

2.     Semi-Additive Facts: A Fact which can be summed up for a few dimensions but not for all dimensions available in the fact table.

3.     Non-Additive Fact: A Fact which cannot be summed up for any of the dimensions available in the fact table.
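The three fact types above can be sketched with a small (made-up) fact table. In this hypothetical example, a sales amount is additive, while an account balance is only semi-additive: it sums across accounts for one date, but summing it across dates would double-count money.

```python
# Hypothetical fact rows illustrating additive vs. semi-additive facts.
fact_rows = [
    {"date": "2015-01-01", "account": "A", "sales_amount": 100.0, "balance": 500.0},
    {"date": "2015-01-01", "account": "B", "sales_amount": 200.0, "balance": 300.0},
    {"date": "2015-01-02", "account": "A", "sales_amount": 150.0, "balance": 650.0},
    {"date": "2015-01-02", "account": "B", "sales_amount": 50.0,  "balance": 250.0},
]

# Additive fact: total sales over ALL dates and accounts is meaningful.
total_sales = sum(r["sales_amount"] for r in fact_rows)

# Semi-additive fact: balances sum across accounts for ONE date only.
balance_jan2 = sum(r["balance"] for r in fact_rows if r["date"] == "2015-01-02")
# Summing balances across dates would double-count money; for the time
# dimension we would take the latest snapshot per account instead.
```

A non-additive fact such as a profit margin (a ratio) cannot be summed along any dimension; it has to be recomputed from its additive components.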

.9   What are the types of Fact Tables?

:    The types of Fact Tables are:

1.     Cumulative Fact Table: This type of fact table generally describes what happened over a period of time. It contains additive facts.

2.     Snapshot Fact Table: This type of fact table describes the state of things at a particular point in time. It contains non-additive and semi-additive facts.

.10   What is Grain of Fact?

:   The Grain of Fact is defined as the level at which the fact information is stored in a fact table. This is also called Fact Granularity or Fact Event Level.

.11   What is Factless Fact table?

:   A Fact Table which does not contain any facts (measures) is called a Factless Fact Table. Generally, when we need to combine two data marts, one data mart will have a factless fact table and the other a common fact table.

.12   What are Measures?

:    Measures are numeric data based on columns in a fact table.

.13   What are Cubes?

:   Cubes are data processing units composed of fact tables and dimensions from the data warehouse. They provide multidimensional analysis.

.14   What are Virtual Cubes?

:     These are combination of one or more real cubes and require no disk space to store them. They store only definition and not the data.

.15   What is a Star schema design?

:     A Star schema is defined as a logical database design in which a centrally located fact table is surrounded by one or more dimension tables. This design is best suited for a Data Warehouse or Data Mart.
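A minimal sketch of the star layout, using hypothetical in-memory tables: the fact table holds only foreign keys and measures, and a query resolves each key against its surrounding dimension table.

```python
# Hypothetical dimension tables, keyed by their surrogate keys.
dim_product = {1: {"name": "Widget", "category": "Tools"},
               2: {"name": "Gadget", "category": "Toys"}}
dim_time = {10: {"year": 2015, "month": "May"}}

# Central fact table: foreign keys into the dimensions plus a measure.
fact_sales = [
    {"product_key": 1, "time_key": 10, "amount": 250.0},
    {"product_key": 2, "time_key": 10, "amount": 100.0},
]

# "Querying" the star: join each fact row to its dimensions by key.
report = [
    (dim_product[f["product_key"]]["name"],
     dim_time[f["time_key"]]["month"],
     f["amount"])
    for f in fact_sales
]
```

In a real warehouse the same joins would be expressed in SQL, with the fact table's foreign keys referencing the dimension tables' primary keys.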

.16   What is Snow Flake schema Design?

:      In a Snowflake design, the dimension table (a de-normalized table) is further divided into one or more normalized tables to organize the information in a better structural format. To design a snowflake schema, we first design the star schema.

.17   What is Operational Data Store [ODS] ?

:     It is a collection of integrated databases designed to support operational monitoring. Unlike the OLTP databases, the data in the ODS is integrated, subject-oriented and enterprise-wide.

.18   What is Denormalization?

:    Denormalization means combining data into a table that may contain duplicate (repeating) key values. Dimension tables follow the denormalization method, together with the surrogate key technique.

.19   What is Surrogate Key?

:    A Surrogate Key is a sequence-generated key which is assigned as the primary key in the system (table).
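Surrogate-key assignment can be sketched as follows (the function name and customer IDs are made up for illustration): each new natural (business) key receives the next value from a sequence, independent of any source-system identifier, and an existing key always maps back to the same surrogate.

```python
import itertools

_sequence = itertools.count(start=1)  # the "sequence generator"
_key_map = {}                         # natural key -> surrogate key

def surrogate_key(natural_key):
    """Return the surrogate key for a natural key, assigning a new one if unseen."""
    if natural_key not in _key_map:
        _key_map[natural_key] = next(_sequence)
    return _key_map[natural_key]

k1 = surrogate_key("CUST-0042")  # first customer seen: gets key 1
k2 = surrogate_key("CUST-0099")  # second customer: gets key 2
k3 = surrogate_key("CUST-0042")  # same customer again: same surrogate key
```

In a database, the same role is typically played by an identity column or sequence object rather than application code.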

Data Normalization


In this section you will learn the concepts of data normalization. The most important thing in database design is to make sure that the data is properly distributed among the tables. Simply put, designing the tables in a proper manner is called Normalization.

Normalization is a process used in relational database design to organize the data so as to minimize duplication. In normalization, we divide the database into two or more tables and create relationships between them. Once the data is isolated, an addition, deletion or modification of a field can be made in one table and then propagated to the related tables, removing duplicate data.

The main goals of the normalization process are: 
- To eliminate the redundancy of data
- To make sure that the data dependencies (relationships) make sense.
Together, these two goals reduce the space occupied by duplicate data in the database tables and ensure that the data is logically stored.

Some of the benefits of data normalization in a database are:

- Data integrity
- Optimized queries on the normalized tables, producing fast, efficient results
- Faster indexing and sorting
- Increased performance of the database

First normal Form (1NF)
The First Normal Form requires atomic values in each column. Atomic means that a single column cannot hold a set of values. In addition, in First Normal Form a table must have at least one candidate key and must not contain any duplicate records. Repeating groups are not allowed in First Normal Form, that is, no attribute may occur a different number of times on different records.
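The 1NF rule can be sketched with a hypothetical employee table whose multi-valued "phones" column is not atomic; bringing it into 1NF means giving each phone value its own row.

```python
# Unnormalized: the "phones" column holds a set of values (not atomic).
unnormalized = [
    {"emp_id": 1, "name": "Asha", "phones": ["555-0101", "555-0102"]},
    {"emp_id": 2, "name": "Ravi", "phones": ["555-0200"]},
]

# 1NF: flatten the repeating group so every column holds a single value.
first_nf = [
    {"emp_id": row["emp_id"], "name": row["name"], "phone": phone}
    for row in unnormalized
    for phone in row["phones"]
]
# first_nf has one row per phone number; (emp_id, phone) is a candidate key.
```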

Second normal Form (2NF)
The Second Normal Form can be achieved only when a table is already in 1NF. We reach 2NF by eliminating partial dependencies: while First Normal Form deals only with the atomicity of data, Second Normal Form deals with the relationships between composite keys and non-key columns. In Second Normal Form, a subset of the data is removed and organized in separate tables. This process is applied until every non-key column depends on the whole key, which reduces duplication.
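A sketch of a 2NF decomposition, using a made-up order-items table: the key is (order_id, product_id), but product_name depends on product_id alone, which is a partial dependency, so it moves into its own products table.

```python
# 1NF table with a composite key (order_id, product_id).
# product_name depends only on product_id -> partial dependency.
order_items = [
    {"order_id": 1, "product_id": "P1", "product_name": "Widget", "qty": 2},
    {"order_id": 1, "product_id": "P2", "product_name": "Gadget", "qty": 1},
    {"order_id": 2, "product_id": "P1", "product_name": "Widget", "qty": 5},
]

# 2NF: extract the partially dependent column into a products table...
products = {r["product_id"]: r["product_name"] for r in order_items}

# ...leaving only columns that depend on the full composite key.
order_items_2nf = [
    {"order_id": r["order_id"], "product_id": r["product_id"], "qty": r["qty"]}
    for r in order_items
]
```

Note how the duplicated "Widget" value now appears only once, in the products table.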

Third Normal Form (3NF)
The Third Normal Form can be achieved only when a table is in the Second Normal Form. We reach 3NF by eliminating all transitive dependencies among the fields of a record. In Third Normal Form, all columns should depend on the primary key only, i.e. remove any column that is not directly dependent upon the primary key.
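A sketch of a 3NF decomposition, with made-up employee data: dept_name depends on dept_id, which in turn depends on emp_id (the primary key). That transitive dependency is removed by moving dept_name into a departments table.

```python
# 2NF table with a transitive dependency: emp_id -> dept_id -> dept_name.
employees = [
    {"emp_id": 1, "name": "Asha", "dept_id": 10, "dept_name": "Sales"},
    {"emp_id": 2, "name": "Ravi", "dept_id": 20, "dept_name": "HR"},
    {"emp_id": 3, "name": "Mira", "dept_id": 10, "dept_name": "Sales"},
]

# 3NF: dept_name moves to a departments table keyed by dept_id...
departments = {r["dept_id"]: r["dept_name"] for r in employees}

# ...so every remaining column depends directly on the primary key.
employees_3nf = [
    {"emp_id": r["emp_id"], "name": r["name"], "dept_id": r["dept_id"]}
    for r in employees
]
```

Renaming department 10 now requires a change in exactly one place, instead of in every employee row.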
 
Boyce-Codd Normal Form (BCNF)
Boyce-Codd Normal Form requires a table to first meet the Third Normal Form. In Boyce-Codd Normal Form, for every non-trivial functional dependency, the determinant must be a superkey.

Fourth Normal Form (4NF)
The Fourth Normal Form can be achieved when a table meets the Boyce-Codd Normal Form. Fourth Normal Form ensures that independent multivalued facts are efficiently and correctly represented in the database design.

Fifth Normal Form (5NF)
The Fifth Normal Form is achieved only when a table meets the Fourth Normal Form. In this normalization it is ensured that all non-trivial join dependencies get eliminated.