Sunday, June 7, 2015

BI DWH OLTP DRAFT

Introduction to Basic BI Concepts
What is Business Intelligence
Definitions
•Howard Dresner’s definition - business intelligence is an umbrella term to describe ‘concepts and methods to improve business decision making by using fact-based support systems’
•Business intelligence (BI) is the set of techniques and tools that transform raw data into meaningful and useful information for the purpose of business analysis

What can we understand from the Definitions?
•concepts and methods / techniques and tools
–BI is more than a set of tools and technologies
•business decision / business analysis
–BI is driven by the business's need to make informed decisions
•fact-based / raw data → useful information
–Analysis is based on raw data / facts

What is data / “raw facts”
Data / Raw Facts
•Facts generated by operational workflow of a business process
–This training has generated data
»Topic, trainer name, date, duration, info about participants …
•Facts generated by non-business events
–Weather information
–Information gathered by traffic sensors
•Can be digital, paper, structured, unstructured, semi-structured
•Can be recorded somewhere or just in a person’s head


Where are raw facts / data created?

OLTP System
•An application, or modules of an application, that automates a specific area of the business operations workflow, such as pre-sales, sales, procurement, R&D, accounting, etc.
•Implements current business rules that govern operations
•Focuses on managing current data slice rather than historical data
–historical data may be purged / archived to improve database performance, to reduce data storage costs, or to implement new business rules with which earlier data does not comply
•Focuses on managing large volumes of simultaneous transactions, involving low volumes of data per transaction
•May support operational reporting, but full-fledged analysis is not the core purpose
•Data repository is usually an RDBMS (exceptions include mainframe / COBOL applications and non-transactional data such as ATM logs, network signals, traffic signal data, etc.). The data model is usually normalized (see the sketch below)
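As a minimal sketch of the transaction pattern described above, assuming a hypothetical normalized orders / order_lines schema (ANSI-style SQL):

    -- A typical OLTP unit of work: small, touches only current data, and
    -- runs concurrently with thousands of similar transactions.
    -- (Hypothetical schema: orders and order_lines are illustrative names.)
    BEGIN TRANSACTION;

    INSERT INTO orders (order_id, customer_id, order_date)
    VALUES (1001, 42, CURRENT_DATE);

    INSERT INTO order_lines (order_id, line_no, product_id, qty)
    VALUES (1001, 1, 555, 2),
           (1001, 2, 777, 1);

    COMMIT;

Each transaction involves a low volume of data, which is why the normalized model and index-based access work well here.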


Saturday, June 6, 2015

OLTP - On-line Transaction Processing

Introduction to OLTP:


What is OLTP?

Definition:
OLTP stands for "On-line Transaction Processing"; it is an operational database that supports high-volume, transaction-oriented applications. Operational data is the data involved in the daily transactions of a particular system.

It is used in traditional databases and covers insert, update, and delete operations, while also serving queries against the current data.

Example of OLTP:
Banks, Airlines, Retailers, Railways etc.

Data warehouse (DWH) Introduction - Design & Concepts

Data warehouse (DW or DWH) also known as an enterprise data warehouse (EDW)


What is Data warehouse (DW or DWH):

There are many different definitions of a data warehouse. The simplest and most popular definition was given by Bill Inmon in 1990.

"A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of the management's decision-making process."
-- Bill Inmon

Subject-oriented: Used to analyze a particular subject area. For example, Sales, Customers, Patients, Students, Products etc.

Integrated: A data warehouse integrates data from multiple data sources into a consistent format, with consistent naming conventions, formats, and encoding structures. Data should be free of issues such as naming conflicts and variations among units of measure.

Time-Variant: Historical data is stored in a data warehouse, e.g. trends and changes for reporting. Data can be queried for any time period. An OLTP system may hold only the most recent records, whereas a data warehouse can hold all records.

Non-volatile: Once entered into the warehouse, data is never changed or deleted. The data is fixed, read-only, and retained to generate reports, and it enables you to analyze what has occurred.

A single-subject data warehouse is typically referred to as a data mart, while data warehouses are generally enterprise in scope. Also, data warehouses can be volatile. Due to the large amount of storage required for a data warehouse, (multi-terabyte data warehouses are not uncommon), only a certain number of periods of history are kept in the warehouse. For instance, if three years of data are decided on and loaded into the warehouse, every month the oldest month will be "rolled off" the database, and the newest month added.
-- (Source: "What is a Data Warehouse?" W.H. Inmon, Prism, Volume 1, Number 1, 1995)

A data warehouse is "a copy of transaction data specifically structured for query and analysis".
-- Ralph Kimball, book-"The Data Warehouse Toolkit, Page 310"

“A data warehouse is simply a single, complete, and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use it in a business context.”
-- Barry Devlin, IBM Consultant

Below is a list of points to remember:
  • A data warehouse is a centralized repository built by integrating data from multiple heterogeneous sources.
  • It stores large volumes of data (gigabytes to terabytes).
  • Updates are infrequent and append-only.
  • Queries are long-running and complex.
  • Many users connect to it (e.g. decision-makers, analysts).
  • It is used for analytical reporting, ad hoc querying, and analysis for decision making (see the sample query below).
  • A data warehouse is a relational database that contains historical data for analysis.
  • Updates and deletes do not happen in a data warehouse, and it is kept separate from the OLTP system.
  • A data warehouse environment includes an ETL process (Extract, Transform, and Load), an online analytical processing (OLAP) engine, client analysis tools, and other applications that manage the process of gathering data, transforming it into useful, actionable information, and delivering it to business users.
  • It maintains historical records.
  • Data marts are also used; a data mart is a simple form of a data warehouse focused on a single subject (or functional area), such as sales, finance, or marketing. Data marts are often built and controlled by a single department within an organization.
  • See the Data Warehouse Architecture / Diagram post below for the architecture components.
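To make the analytical-reporting point concrete, here is a minimal sketch of a typical warehouse query, assuming a hypothetical sales_fact table joined to date and product dimensions:

    -- Typical DWH query: scans many historical rows, aggregates them,
    -- and groups the result by dimension attributes.
    -- (sales_fact, date_dim, product_dim are hypothetical names.)
    SELECT d.calendar_year,
           p.product_category,
           SUM(f.sales_amount) AS total_sales
    FROM   sales_fact f
    JOIN   date_dim    d ON f.date_key    = d.date_key
    JOIN   product_dim p ON f.product_key = p.product_key
    WHERE  d.calendar_year BETWEEN 2012 AND 2014
    GROUP BY d.calendar_year, p.product_category
    ORDER BY d.calendar_year, total_sales DESC;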

+Vivek Kumar 

Friday, May 29, 2015

ETL Testing Interview Questions and answers

ETL Testing Interview Questions and answers.

.1   What is a Data Warehouse?

:    A Data Warehouse is a collection of data marts representing historical data from different operational data sources (OLTP systems). The data from these OLTP systems is structured and optimized for querying and data analysis in the Data Warehouse.

.2   What is a Data mart?

:   A Data Mart is a subset of a data warehouse that provides data for reporting and analysis on a section, unit, or department like the Sales Dept, HR Dept, etc. Data Marts are sometimes also called HPQS (Higher Performance Query Structures).

.3   What is OLAP?

:      OLAP stands for Online Analytical Processing. It uses database tables (fact and dimension tables) to enable multidimensional viewing, analysis, and querying of large amounts of data.

.4   What is OLTP?

:   OLTP stands for Online Transaction Processing. Apart from data warehouse databases, most other databases are OLTP systems. OLTP systems use a normalized schema structure and are designed to record the daily operations and transactions of a business.

.5   What are Dimensions?

:   Dimensions are categories by which summarized data can be viewed. For example, a profit fact table can be viewed by a time dimension.

.6   What are Conformed Dimensions?

:    Conformed dimensions are dimensions that are reusable across fact tables and data marts with a fixed, consistent meaning, e.g. customer, time, and geography dimensions.

.7   What are Fact Tables?

:    A Fact Table is a table that contains summarized numerical data (facts) and historical data. A fact table has a foreign key-primary key relationship with its dimension tables. The fact table maintains the information in 3rd normal form.


.8    What are the types of Facts?

:    The types of Facts are as follows.

1.     Additive Facts: facts that can be summed across all of the dimensions available in the fact table (e.g. sales amount).

2.     Semi-Additive Facts: facts that can be summed across some dimensions but not all (e.g. an account balance, which cannot be meaningfully summed over time).

3.     Non-Additive Facts: facts that cannot be summed across any of the dimensions available in the fact table (e.g. ratios or percentages).

.9   What are the types of Fact Tables?

:    The types of Fact Tables are:

1.     Cumulative Fact Table: this type of fact table generally describes what has happened over a period of time. They contain additive facts.

2.     Snapshot Fact Table: this type of fact table captures the state at a particular point in time. They contain non-additive and semi-additive facts.

.10   What is Grain of Fact?

:   The Grain of Fact is the level of detail at which fact information is stored in a fact table. It is also called Fact Granularity or Fact Event Level.

.11   What is Factless Fact table?

:   A fact table that does not contain any facts (measures) is called a Factless Fact Table. Generally, when we need to combine two data marts, one data mart will have a factless fact table and the other a common fact table.

.12   What are Measures?

:    Measures are numeric data based on columns in a fact table.

.13   What are Cubes?

:   Cubes are data processing units composed of fact tables and dimensions from the data warehouse. They provide multidimensional analysis.

.14   What are Virtual Cubes?

:     These are combinations of one or more real cubes and require no disk space to store; they store only the definition and not the data.

.15   What is a Star schema design?

:     A Star schema is a logical database design in which a centrally located fact table is surrounded by one or more dimension tables. This design is best suited for a Data Warehouse or Data Mart (see the sketch below).
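A minimal sketch of a star schema in ANSI SQL, with hypothetical table and column names:

    -- Dimension tables surround the central fact table.
    CREATE TABLE date_dim (
        date_key      INTEGER PRIMARY KEY,
        calendar_date DATE,
        calendar_year INTEGER
    );

    CREATE TABLE product_dim (
        product_key      INTEGER PRIMARY KEY,
        product_name     VARCHAR(100),
        product_category VARCHAR(50)      -- denormalized into the dimension
    );

    -- The fact table references each dimension by its surrogate key.
    CREATE TABLE sales_fact (
        date_key     INTEGER REFERENCES date_dim (date_key),
        product_key  INTEGER REFERENCES product_dim (product_key),
        sales_amount DECIMAL(12,2),       -- additive fact
        qty_sold     INTEGER,             -- additive fact
        PRIMARY KEY (date_key, product_key)
    );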

.16   What is Snow Flake schema Design?

:      In a Snow Flake design, the dimension tables (denormalized in a star schema) are further divided into one or more normalized tables to organize the information in a better structural format. To design a snowflake schema, we should first design the star schema (continued in the sketch below).
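Continuing the hypothetical example above, snowflaking product_dim moves the category attribute into its own normalized table:

    -- Snowflake: the category is split out of the product dimension.
    CREATE TABLE category_dim (
        category_key  INTEGER PRIMARY KEY,
        category_name VARCHAR(50)
    );

    CREATE TABLE product_dim_sf (
        product_key  INTEGER PRIMARY KEY,
        product_name VARCHAR(100),
        category_key INTEGER REFERENCES category_dim (category_key)
    );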

.17   What is Operational Data Store [ODS] ?

:     It is a collection of integrated databases designed to support operational monitoring. Unlike OLTP databases, the data in an ODS is integrated, subject-oriented, and enterprise-wide.

.18   What is Denormalization?

:    Denormalization means intentionally allowing redundancy in a table (e.g. duplicate keys and repeated attributes) to reduce joins and speed up queries. Dimension tables follow the denormalization method, together with the surrogate key technique.

.19   What is Surrogate Key?

:    A Surrogate Key is a sequence-generated key that is assigned as the primary key in the system (table); see the sketch below.
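A minimal sketch of surrogate key generation, assuming a database that supports standard identity columns; the natural (business) key from the source system is kept alongside:

    -- The surrogate key is a meaningless sequence-generated integer;
    -- the natural key from the OLTP source is stored as an attribute.
    CREATE TABLE customer_dim (
        customer_key  INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        customer_id   VARCHAR(20),    -- natural key from the source system
        customer_name VARCHAR(100)
    );

    -- The surrogate key is assigned automatically on insert.
    INSERT INTO customer_dim (customer_id, customer_name)
    VALUES ('C-1001', 'Acme Corp');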

Data Normalization

In this section you will learn the concepts of data normalization. The most important thing in database design is to make sure that the data is properly distributed among the tables; designing the tables in this proper manner is called normalization.

Normalization is a process used in relational database design to organize data and minimize duplication. In normalization, we divide the database into two or more tables and create relationships between them. Once the data is isolated, an addition, deletion, or modification of a field can be made in one table and then propagated to the related tables through the defined relationships, removing duplicate data.

The main goals of the normalization process are:
- To eliminate the redundancy of data
- To make sure that the data dependencies (relationships) make sense.
Meeting these two goals reduces the space occupied by duplicate data in the database tables and ensures that the data is stored there logically.

Some of the benefits of data normalization in a database are:

Data integrity
Optimized queries on the normalized tables, producing fast, efficient results
Faster indexing and sorting
Increased overall database performance

First Normal Form (1NF)
The First Normal Form requires atomic values in each column; atomic means a column may not hold a set of values (no repeating groups). In addition, a table in First Normal Form must have at least one candidate key and must not contain any duplicate records. Repeating groups are not allowed: no attribute may occur a varying number of times on different records (see the sketch below).
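A minimal sketch of a 1NF fix, assuming a hypothetical table that stored several phone numbers in one column (a repeating group):

    -- Before (violates 1NF):
    --   student(student_id, name, phones = '555-1234, 555-9876')
    -- After: one atomic value per column; the repeating group gets its own table.
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name       VARCHAR(100)
    );

    CREATE TABLE student_phone (
        student_id INTEGER REFERENCES student (student_id),
        phone      VARCHAR(20),
        PRIMARY KEY (student_id, phone)
    );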

Second Normal Form (2NF)
The Second Normal Form can be achieved only when a table is in 1NF. We reach 2NF by eliminating partial dependencies: where the First Normal Form deals only with the atomicity of data, the Second Normal Form deals with the relationships between composite keys and non-key columns. Any subset of data that depends on only part of a composite key is removed and organized into a separate table, reducing duplication.

Third Normal Form (3NF)
The Third Normal Form can be achieved only when a table is in the Second Normal Form. We reach 3NF by eliminating all transitive dependencies among the fields of a record. In Third Normal Form, every column must depend on the primary key only; any column that does not depend on the primary key is moved to its own table (see the sketch below).
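A minimal sketch of removing a transitive dependency, with hypothetical names: in employee(emp_id, emp_name, dept_id, dept_name), dept_name depends on dept_id rather than on the primary key emp_id, so it moves to its own table:

    CREATE TABLE department (
        dept_id   INTEGER PRIMARY KEY,
        dept_name VARCHAR(50)
    );

    -- employee now depends on its primary key only; dept_name is
    -- reached through the foreign key instead of being duplicated.
    CREATE TABLE employee (
        emp_id   INTEGER PRIMARY KEY,
        emp_name VARCHAR(100),
        dept_id  INTEGER REFERENCES department (dept_id)
    );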
 
Boyce-Codd Normal Form (BCNF)
Boyce-Codd Normal Form requires a table to meet the Third Normal Form. In Boyce-Codd Normal Form, the determinant of every non-trivial functional dependency must be a superkey.

Fourth Normal Form (4NF)
The Fourth Normal Form can be achieved when a table meets the Boyce-Codd Normal Form. Fourth Normal Form ensures that independent multivalued facts are efficiently and correctly represented in the database design.

Fifth Normal Form (5NF)
The Fifth Normal Form is achieved only when a table meets the Fourth Normal Form. This normalization ensures that all non-trivial join dependencies are eliminated.

Wednesday, February 4, 2015

BI Data Warehouse Architecture / Diagram

Data Warehouse Architecture

Data Warehouse Architecture can be designed in different ways; it depends on, and varies with, the specifics of an organization's business requirements.

Below is a standard Data Warehouse Architecture:

[Diagram: Data Warehouse Architecture]
Most Data Warehouse Architectures contain the components or layers listed below:
  1. Metadata Layer
  2. Data Sources
    1. Production Data
    2. Internal Data
    3. Archived Data
    4. External Data
  3. Data Staging Area
    1. Data Extraction
    2. Data Transformation
    3. Data Loading
  4. ETL Layer
  5. Data Mart - Target
    1. Data Mart 1
    2. Data Mart 2
    3. Data Mart n…
  6. ODS - Operational Data Store
  7. Data Storage Layer
  8. Data Logic Layer
  9. System Operations Layer
  10. Multiple Data Marts
  11. Data Information / Presentation Area
  12. Management and Control Component 

The most common Data Warehouse Architectures are as follows:
  1. DWH Architecture – Basic
  2. DWH Architecture – with Data Staging Layer
  3. DWH Architecture – with Data Staging and Data Mart Layers
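As a minimal sketch of what the data staging and ETL layers do, assuming a hypothetical stg_sales staging table (filled by extraction) and the sales_fact target:

    -- Transform + load step: conform staged rows to the dimension keys,
    -- apply a simple cleansing rule, and load the target fact table.
    -- (stg_sales, sales_fact, and the dimension tables are hypothetical.)
    INSERT INTO sales_fact (date_key, product_key, sales_amount, qty_sold)
    SELECT d.date_key,
           p.product_key,
           s.amount,
           s.qty
    FROM   stg_sales s
    JOIN   date_dim    d ON d.calendar_date = s.sale_date
    JOIN   product_dim p ON p.product_name  = s.product_name
    WHERE  s.amount IS NOT NULL;    -- cleansing: drop incomplete rows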

Data Warehouse and ETL Testing Concepts

This post contains the topics related to Data Warehouse and ETL Testing concepts; if you want to be an ETL Tester, you should know the terminology below.


BI CONCEPTS

            WHAT IS DATA WAREHOUSE?
            INTRODUCTION TO OLTP, OLAP, ROLAP, MOLAP
            DIFFERENCE BETWEEN OLTP AND DATA WAREHOUSING
            DIFFERENCE BETWEEN OLTP AND OLAP
            OLTP VS DWH
            OLTP VS OLAP
            DM VS DWH
            METADATA
           
            DATA ACQUISITION:
                        DATA EXTRACTION
                        DATA TRANSFORMATION
                        DATA LOADING
                       
            DATA MARTS AND ODS TYPES:
                        DEPENDENT DATA MART
                        INDEPENDENT DATA MART
                        DATA BASE DESIGN
                        STAR SCHEMA
                        SNOW FLAKE SCHEMA
                        FACT CONSTELLATION SCHEMA
                        SCD (SLOWLY CHANGING DIMENSION)
                        GALAXY SCHEMA
                        FCS
                        TYPE-1 SCD
                        TYPE-2 SCD
                        TYPE-3 SCD

DATA WAREHOUSING LIFE CYCLE (DWH LIFECYCLE)

TERMS To know:
            FACT AND DIMENSION TABLE
            DIMENSIONAL MODELING
            DWH TOOLS
            DATA CLEANSING
            DATA SCRUBBING
            DATA MASKING
            NORMALIZATION
            ODS
            STG AREA
            DSS
            DIMENSION MODELING
            FACTS
            AGGREGATES
            SLOWLY CHANGING DIMENSIONS
            INITIAL LOAD
            INCREMENTAL LOAD
            FULL LOAD
            CDC- CHANGE DATA CAPTURE

ETL TESTING CONCEPTS:
            INTRODUCTION OF ETL - EXTRACT, TRANSFORM, AND LOAD
            ETL ARCHITECTURE/ CYCLE
            COMPONENTS IN ETL
            ETL TOOLS AND DIFFERENT TYPES OF ETL TOOLS
            ETL PROCESS / ETL TESTING LIFE CYCLE
            TEST STRATEGY
            TEST PLANNING
            SDLC METHODS/MODELS
            STLC METHODS/MODELS
            SDLC VS. STLC
            REVERSE ENGINEERING
            QC (QUALITY CENTER AND BUGZILLA)
            ETL TESTING ROLES AND RESPONSIBILITIES
            ETL TESTING RESPONSIBILITIES IN SSIS, ABINITIO, INFORMATICA, SQL-SERVER, DATASTAGE ETC;
            ETL TESTING TEMPLATES (TEST CASE, BUG REPORTING & ETC..)
            TYPES OF ETL BUGS/DEFECTS

ETL TRANSFORMATION TYPE/RULES:
            SAMPLE LOADING FROM SOURCE TARGET
            SORTER
            FILTER
            ROUTER
            EXPRESSION
            JOINER
            UNION
            AGGREGATOR
            RANK
            STORED PROCEDURE
            SEQUENCE GENERATOR
            FLAT FILE
            SQL

ETL TYPES OF TESTING:
            WHITEBOX TESTING
            BLACKBOX TESTING
            GRAY BOX TESTING
            REGRESSION TESTING
            PERFORMANCE
            OPERATIONAL QUALIFICATION
            SMOKE TESTING VS SANITY TESTING
            USER TESTING
            UNIT TESTING
            INTEGRATION TESTING
            MODULE TESTING
            SYSTEM TESTING
            UAT
                       
ETL TOOL AND TESTING:
            DATA EXTRACT
            DATA TRANSFORM
            DATA LOAD
            IMPORT SOURCE
            IMPORT TARGET
            MAPPINGS, MAPLETS
            WORKFLOWS WORKLETS
            TRANSFORMATIONS, FUNCTIONALITY, RULES AND TECHNIQUES
            IMPORT AND EXPORT
            COPYING AND RULES
            QUERIES PREPARATION BASED ON TRANSFORMATIONS
            IMPORTANCE OF ETL TESTING
            CREATING OF MAPPINGS, SESSIONS, WORKFLOWS
            RUNNING OF MAPPINGS, SESSIONS, WORKFLOWS
            ANALYZING OF MAPPINGS, SESSIONS, WORKFLOWS
            TASKS AND TYPES
           
RESPONSIBILITIES OF AN ETL TESTER:
            COUNT QUERYING
            MINUS QUERYING
            DUPLICATE QUERYING
            INTERSECTION, ETC. (SAMPLE QUERIES AFTER THIS LIST)
            NO DATA LOSSES
            VALIDATION OF TRANSFORMATION RULES
            BUSINESS PROCESSES TESTING
            TESTING SCENARIOS, CREATION OF TEST CASES AND SCRIPTS
            TEST CASE EXECUTION AND DEFECT TRACKING AND REPORTING
            PREPARATION OF TEST DATA
            DETECT DEFECTS
            DEFECTS LOGGING AND REPORTING
            BUG REPORTING
            TEST PLAN PREPARATION
            TEST CASES PREPARATION
            PREPARATION OF TEST DATA
            MAKE SURE DATA IS TRANSFORMED CORRECTLY
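A minimal sketch of the count / minus / duplicate checks listed above, assuming hypothetical source and target tables src_customer and tgt_customer_dim:

    -- Count check: row counts should reconcile between source and target.
    SELECT (SELECT COUNT(*) FROM src_customer)     AS src_count,
           (SELECT COUNT(*) FROM tgt_customer_dim) AS tgt_count;

    -- Minus check: rows present in the source but missing from the target
    -- (EXCEPT is the ANSI keyword; Oracle uses MINUS).
    SELECT customer_id, customer_name FROM src_customer
    EXCEPT
    SELECT customer_id, customer_name FROM tgt_customer_dim;

    -- Duplicate check: natural keys loaded more than once into the target.
    SELECT customer_id, COUNT(*) AS cnt
    FROM   tgt_customer_dim
    GROUP BY customer_id
    HAVING COUNT(*) > 1;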
                       
BEST PRACTICES FOR ETL TESTING:

            AUTOMATION OF ETL TESTING
            DESIGN THE TEST CASES
            REPORTING THE BUGS
            DETECT THE BUGS THROUGH DATABASE QUERIES
            PREPARE QUERIES QUICKLY WITH THE HELP OF MAPPING
            PERFORMANCE TUNING AND PERFORMANCE TESTING, REPORT TESTING, UI TESTING

You Should also know:
            ETL INTERVIEW QUESTIONS AND ANSWERS
            REAL TIME EXAMPLE
            SQL
            UNIX  

SDLC (SOFTWARE DEVELOPMENT LIFE CYCLE):
            WATERFALL MODEL
            V-MODEL
            AGILE MODEL & METHODOLOGY
            PROTOTYPE MODEL
            SPIRAL MODEL

Thursday, January 22, 2015

WhatsApp Web launched and now available on the Desktop via Google Chrome browser

WhatsApp Web

WhatsApp, the popular messaging app, has launched a web client known as WhatsApp Web. WhatsApp Web can now be accessed through the URL https://web.whatsapp.com in your Google Chrome browser; see below.
[Screenshot: WhatsApp Web]

As per the WhatsApp blog: - "Web client is simply an extension of your phone: the web browser mirrors conversations and messages from your mobile device -- this means all of your messages still live on your phone."

  • Media can be downloaded directly to your Desktop
  • Media can be uploaded from your Desktop hard drive
Limitations and points to know before using WhatsApp Web:

  1. Works only on Android, Windows Phone, BlackBerry, or Nokia S60 devices
  2. iOS devices are not supported (due to Apple platform limitations)
  3. Requires the latest version of WhatsApp (Version 2.11.498 at the time of writing) to be installed on your smartphone
  4. It mirrors your phone
  5. The phone will need to be connected to the Internet at all times
  6. Works with the Google Chrome web browser only
  7. You can't create or leave Groups
  8. You can't use New Broadcast
  9. You can't block users

How to connect your web browser to your WhatsApp client?
  • Below are the simple steps to connect WhatsApp Web:
  1. Open the URL https://web.whatsapp.com in your Google Chrome browser (works only in Chrome)
  2. Open WhatsApp and tap the Menu option (which contains options like New Group, Status, etc.)
  3. Tap on WhatsApp Web
  4. Scan the QR code shown in the Chrome browser
  5. After the QR code is successfully scanned, you should see the page as below:


  • You can also see the connectivity status on your phone.
  • You can disconnect or log out from the connected computer; refer to the steps below:

  1. Open WhatsApp and tap the Menu option
  2. Tap on WhatsApp Web
  3. Tap on Log out from all computers


Thanks for visiting :)!!
I will keep updating this post with more details.
If you have any questions, please ask here, and kindly share this post.

Wednesday, January 21, 2015

SQL Server Reporting Services (MSSQLSERVER) - ERROR: 1053

Description: Windows could not start the SQL Server Reporting Services (MSSQLSERVER) service on local computer. Error 1053: The service did not respond to the start or control request in a timely fashion.


Cause / Problem: The state ("started", "stopped", "paused", etc.) of every installed Windows service is controlled by the Microsoft Windows Service Control Manager.
  • By default, the Service Control Manager will wait 30,000 milliseconds (30 seconds) for a service to respond.
  • By editing or creating the ServicesPipeTimeout DWORD value, the Service Control Manager timeout period can be overridden, thereby giving the service more time to start up and report ready to the Service Control Manager.
When attempting to start, stop or pause a service, one of the following error messages is encountered:
  • 'Error 1053: The service did not respond to the start or control request in a timely fashion.'
  • 'Error 7009: Timeout (30000 milliseconds) waiting for the service to connect.'
  • 'Error 7000: The service failed to start due to the following error: The service did not respond to the start or control request in a timely fashion.'
Affected operating systems:
  • Windows XP 32/64 bit
  • Windows Vista 32/64 bit
  • Windows 7 32/64 bit
  • Microsoft Server 2003/2008 - 32/64 bit
  • Microsoft Server 2008 R2

Solution:

Note: Editing the Microsoft Windows Registry incorrectly may cause serious problems with your computer; edit at your own risk.

Follow the below steps:

  1. Open Start =>> Run and type =>> "regedit"
  2. Navigate to the following registry key =>> HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control
  3. In the right pane =>> locate the ServicesPipeTimeout entry
Note: If the ServicesPipeTimeout entry does not exist, you must create it, follow the below steps to create:
  1. Right Click on Control =>> New =>> DWORD Value
  2. Rename the new DWORD as =>> ServicesPipeTimeout
  3. Right-click ServicesPipeTimeout =>> Modify
  4. Click Decimal, type 120000 (2 minutes) =>> OK
  5. Restart the computer
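The same change can also be made in one step from an elevated command prompt with the built-in reg utility (then restart the computer):

    reg add "HKLM\SYSTEM\CurrentControlSet\Control" /v ServicesPipeTimeout /t REG_DWORD /d 120000 /f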

After that, you should be able to start the SQL Server Reporting Services (MSSQLSERVER) service.

-Vivek Kumar 

Tuesday, January 20, 2015

ETL Job or Components

ETL Components:


[Diagram: ETL Job or Components]
  1. Data Extraction
  2. Data Transformation
  3. Data Loading
  4. Data Refresh Method
  5. Data Sourcing Method
  6. Data Landing / Staging Design
  7. Job Scheduling
  8. Job Sequencing
  9. Historical Data Migration
  10. Changed Data Capture (see the sketch after this list)
  11. Audit & Verification
  12. Admin
  13. Validation & Cleansing
  14. Aggregate
  15. Logging & Monitoring
  16. Surrogate Key Generation & Mgmt
  17. History / SCD Mgmt
  18. Exception Handling
  19. Data Conformance & De-duping
  20. Restart / Reload
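As a minimal sketch of component 10 (changed data capture) driving an incremental load, assuming the source rows carry a last_updated timestamp and a hypothetical etl_control(job_name, last_run_ts) bookkeeping table:

    -- Timestamp-based CDC: pull only the rows changed since the last run.
    -- (src_sales, stg_sales, and etl_control are hypothetical names.)
    INSERT INTO stg_sales (sale_id, sale_date, product_name, amount, qty)
    SELECT s.sale_id, s.sale_date, s.product_name, s.amount, s.qty
    FROM   src_sales s
    WHERE  s.last_updated > (SELECT last_run_ts
                             FROM   etl_control
                             WHERE  job_name = 'sales_load');

    -- Record the new high-water mark for the next incremental run.
    UPDATE etl_control
    SET    last_run_ts = CURRENT_TIMESTAMP
    WHERE  job_name = 'sales_load';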



Monday, January 12, 2015

List of Popular ETL Tools/Software

List of Popular ETL Tools/Software

  • Commercial ETL Tools:

  1. Informatica – PowerCenter
  2. IBM – Infosphere Information Server
  3. Oracle Data Integrator (ODI)
  4. Oracle Warehouse Builder (OWB)
  5. Microsoft - SQL Server Integration Services
  6. SAS - Data Integration Studio
  7. SAP - BusinessObjects Data Integrator
  8. IBM - WebSphere DataStage (Ascential DataStage)
  9. IBM - Cognos Data Manager (Cognos DecisionStream)
  10. IBM - DB2 Warehouse Edition
  11. Ab Initio
  12. Information Builders - Data Migrator

  • Open source ETL Tools: (Freeware)

  1. Talend - Open Studio for Data Integration (Integrator Suite)
  2. Apatar
  3. CloverETL
  4. GeoKettle
  5. Jaspersoft ETL
  6. KETL (™)
  7. Pentaho Data Integration (PDI, Kettle)
  8. Actuate Corporation
  9. SpagoBI
  10. HPCC Systems
  11. Jedox
  12. EplSite ETL
  13. GETL ETL
  14. Scriptella ETL
  15. RapidMiner
  16. Anatella

  • Other ETL Tools:

  1. Embarcadero Technologies - DT/Studio
  2. IKAN - ETL4ALL
  3. Pervasive - Data Integrator
  4. Group 1 Software (Sagent) - DataFlow
  5. Sybase - Data Integrated Suite ETL
  6. Expressor Software - Expressor Semantic Data Integration System
  7. Elixir - Elixir Repertoire

-Vivek Kumar