Showing posts with label ETL. Show all posts
Showing posts with label ETL. Show all posts

Saturday, June 6, 2015

OLTP - On-line Transaction Processing

Introduction to OLTP:


What is OLTP?

Definition:
OLTP stand for "On-line Transaction Processing", it is an operational database that supports or facilitates high transaction-oriented applications. Operational data means those data occupied in the daily transactions from a particular system.

It is used in traditional database that includes insert, update, and delete operations, while also perform to obtain information using on inquiry requirement.

Example of OLTP:
Banks, Airlines, Retailers, Railways etc.

Data warehouse (DWH) Introduction - Design & Concepts

Data warehouse (DW or DWH) also known as an enterprise data warehouse (EDW)


What is Data warehouse (DW or DWH):

There is a lot of different definitions for a data warehouse. The simple and popular definition was first defined by Bill Inmon in 1990.

“A data warehouse is a subject-oriented, integrated, time-varying, non-volatile collection of data in support of the management's decision-making process."
-- Bill Inmon

Subject-oriented: Used to analyze a particular subject area. For example, Sales, Customers, Patients, Students, Products etc.

Integrated: A data warehouse integrates, consistent naming conventions, formats, encoding structures from multiple data sources into a consistent format. Data should be free of issue such as naming conflicts and variation among units of measure.

Time-Variant: Historical data is stored in a data warehouse i.g. trends and changes for reporting. For example, Query of data can be from any time of period. OLTP may hold the most recent records, whereas data warehouse can hold all records.

Non-volatile: Data once entered into the warehouse, data is never change or delete. The data is fixed, read-only, and retained to generate report and it enables you to analyze what has occurred.

A single-subject data warehouse is typically referred to as a data mart, while data warehouses are generally enterprise in scope. Also, data warehouses can be volatile. Due to the large amount of storage required for a data warehouse, (multi-terabyte data warehouses are not uncommon), only a certain number of periods of history are kept in the warehouse. For instance, if three years of data are decided on and loaded into the warehouse, every month the oldest month will be "rolled off" the database, and the newest month added.
-- (Source: "What is a Data Warehouse?" W.H. Inmon, Prism, Volume 1, Number 1, 1995)

A data warehouse is "a copy of transaction data specifically structured for query and analysis".
-- Ralph Kimball, book-"The Data Warehouse Toolkit, Page 310"

“A data warehouse is simply a single, complete, and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use it in a business context.”
-- Barry Devlin, IBM Consultant

Below is the list of point to remember:
  • A data warehouse is a centralized by integrating data from multiple heterogeneous sources.
  • It stores the Large volume of data (GB/TB).
  • Updates infrequent and append-only.
  • Queries are very long and complex.
  • More number of users connect to it (e.g. decision-makers, analysts).
  • It is used to analytical reporting, ad hoc or user querying and analysis for decision making. 
  • A data warehouse is a relational database that contains historical data for analysis.
  • In Data warehouse update or delete doesn't happens and it is kept separate from OLTP system.
  • Data warehouse environment includes an ETL process (Extract, Transport, and Load), an online analytical processing (OLAP) engine, client analysis tools, and other applications that manage the process of gathering data, transforming it into useful, actionable information, and delivering it to business users .
  • Maintaining historical records
  • Various data mart is used that is a simple form of a data warehouse that is focused on a single subject (or functional area), such as sales, finance or marketing. Data marts are often built and controlled by a single department within an organization.
  • Click on the link to know about Data Warehouse Architecture / Diagram / Components

+Vivek Kumar 

Wednesday, February 4, 2015

BI Data Warehouse Architecture / Diagram

Data Warehouse Architecture

Data Warehouse Architecture can be design in different way. Its depend on the specific business requirement that means it’s vary depending upon the specifics of an organization's need.

Below is standard Data Warehouse Architecture:

Data Warehouse Architecture / Diagram
Data Warehouse Architecture / Diagram















Most Data Warehouse Architecture contains the below components or Layers as mentioned below:
  1. Metadata Layer
  2. Data Sources
    1. Production Data
    2. Internal Data
    3. Archived Data
    4. External Data
  3. Data Staging Area
    1. Data Extraction
    2. Data Transformation
    3. Data Loading
  4. ETL Layer
  5. Data Mart - Target
    1. Data Mart 1
    2. Data Mart 2
    3. Data Mart n…
  6. ODS - Operational Data Store
  7. Data Storage Layer
  8. Data Logic Layer
  9. System Operations Layer
  10. Multiple Data Marts
  11. Data Information / Presentation Area
  12. Management and Control Component 

The most common Data Warehouse Architectures are as follow:
  1. DWH Architecture – Basic
  2. DWH Architecture – with Data Staging Layer
  3. DWH Architecture – with Data Staging and Data Mart Layers

Data Warehouse and ETL Testing Concepts

This post contains the topics related to Data Warehouse and ETL Testing Concepts, if you want to be a ETL Tester you should know the below terminology. Click on the link to get more information.


BI CONCEPTS

            WHAT IS DATA WAREHOUSE?
            INTRODUCTION TO OLTP, OLAP, ROLAP, MOLAP
            DIFFERENCE BETWEEN OLTP AND DATA WAREHOUSING
            DIFFERENCE BETWEEN OLTP AND OLAP
            OLTP VS DWH
            OLTP VS OLAP
            DM VS DWH
            METADATA
           
            DATA ACQUISITION:
                        DATA EXTRACTION
                        DATA TRANSFORMATION
                        DATA LOADING
                       
            DATA MARTS AND ODS TYPES:
                        DEPENDENT DATA MART
                        INDEPENDENT DATA MART
                        DATA BASE DESIGN
                        STAR SCHEMA
                        SNOW FLAKE SCHEMA
                        FACT CONSTELLATION SCHEMA
                        SCD (SLOWLY CHANGING DIMENSION)
                        GALAXY SCHEMA
                        FCS
                        TYPE-1 SCD
                        TYPE-2 SCD
                        TYPE-3 SCD

DATA WAREHOUSING LIFE CYCLE (DWH LIFECYCLE)

TERMS To know:
            FACT AND DIMENSION TABLE
            DIMENSIONAL MODELING
            DWH TOOLS
            DATA CLEANZING
            DATA SCRUBING
            DATA MASKING
            NORMALIZATION
            ODS
            STG AREA
            DSSDIMENTION MODELING
            FACTS
            AGGREGATES
            SLOWLY CHANGING DIMENSIONS
            INITIAL LOAD
            INCREMENTAL LOAD
            FULL LOAD
            CDC- CHANGE DATA CAPTURE

ETL TESTING CONCEPTS:
            INTRODUCTION OF ETL - EXTRACT, TRANSFORM, AND LOAD
            ETL ARCHITECTURE/ CYCLE
            COMPONENTS IN ETL
            ETL TOOLS AND DIFFERENT TYPES OF ETL TOOLS
            ETL PROCESS / ETL TESTING LIFE CYCLE
            TEST STRATEGY
            TEST PLANNING
            SDLC METHODS/MODELS
            STLC METHODS/MODELS
            SDLC VS. STLC
            REVERSE ENGINEERING
            QC (QUALITY CENTER AND BUGZILLA)
            ETL TESTING ROLES AND RESPONSIBILITIES
            ETL TESTING RESPONSIBILITIES IN SSIS, ABINITIO, INFORMATICA, SQL-SERVER, DATASTAGE ETC;
            ETL TESTING TEMPLATES (TEST CASE, BUG REPORTING & ETC..)
            TYPES OF ETL BUGS/DEFECTS

ETL TRANSFORMATION TYPE/RULES:
            SAMPLE LOADING FROM SOURCE TARGET
            SORTER
            FILTER
            ROUTER
            EXPRESSION
            JOINER
            UNION
            AGGREGATOR
            RANK
            STORED PROCEDURE
            SEQUENCE GENERATOR
            FLAT FILE
            SQL

ETL TYPES OF TESTING:
            WHITEBOX TESTING
            BLACKBOX TESTING
            GRAY BOX TESTING
            REGRESSION TESTING
            PERFORMANCE
            OPERATIONAL QUALIFICATION
            SMOKE TESTING VS SANITY TESTING
            USER TESTING
            UNIT TESTING
            INTEGRATION TESTING
            MODULE TESTING
            SYSTEM TESTING
            UAT
                       
ETL TOOL AND TESTING:
            DATA EXTRACT
            DATA TRANSFORM
            DATA LOAD
            IMPORT SOURCE
            IMPORT TARGET
            MAPPINGS, MAPLETS
            WORKFLOWS WORKLETS
            TRANSFORMATIONS, FUNCTIONALITY, RULES AND TECHNIQUES
            IMPORT AND EXPORT
            COPING AND RULES
            QUERIES PREPARATION BASED ON TRANSFORMATIONS
            IMPORTANCE OF ETL TESTING
            CREATING OF MAPPINGS, SESSIONS, WORKFLOWS
            RUNNING OF MAPPINGS, SESSIONS, WORKFLOWS
            ANALYZING OF MAPPINGS, SESSIONS, WORKFLOWS
            TASKS AND TYPES
           
RESPONSIBILITIES OF AN ETL TESTER:
            COUNT QUERYING
            MINUS QUERYING
            DUPLICATE QUERYING
            INTERSECTION, ETC
            NO DATA LOSSES
            VALIDATION OF TRANSFORMATION RULES
            BUSINESS PROCESSES TESTING
            TESTING SCENARIOS, CREATION OF TEST CASES AND SCRIPTS
            TEST CASE EXECUTION AND DEFECT TRACKING AND REPORTING
            PREPARATION OF TEST DATA
            DETECT DEFECTS
            DEFECTS LOGGING AND REPORTING
            BUG REPORTING
            TEST PLAN PREPARATION
            TEST CASES PREPARATION
            PREPARATION OF TEST DATA
            MAKE SURE DATA IS TRANSFORMED CORRECTLY
                       
BEST PRACTICES FOR ETL TESTING:

            AUTOMATION OF ETL TESTING
            DESIGN THE TEST CASES
            REPORTING THE BUGS
            DETECT THE BUGS THROUGH DATABASE QUERIES
            PREPARE QUERIES QUICKLY WITH THE HELP OF MAPPING
            PERFORMANCE TUNING AND PERFORMANCE TESTING, REPORT TESTING, UI TESTING

You Should also know:
            ETL INTERVIEW QUESTIONS AND ANSWERS
            REAL TIME EXAMPLE
            SQL
            UNIX  

SDLC (SOFTWARE DEVELOPMENT LIFE CYCLE):
            WATERFALL MODEL
            V-MODEL
            AGILE MODEL & METHODOLOGY
            PROTOTYPE MODEL
            SPIRAL MODEL

Tuesday, January 20, 2015

ETL Job or Components

ETL Components:


ETL Job or Components
ETL Job or Components
  1. Data Extraction
  2. Data Transformation
  3. Data Loading
  4. Data Refresh Method
  5. Data Sourcing Method
  6. Data Landing / Staging Design
  7. Job Scheduling
  8. Job Sequencing
  9. Historical Data Migration
  10. Changed Data Captrue
  11. Audit & Verification
  12. Admin
  13. Validation & Cleansing
  14. Aggregate
  15. Logging & Monitoring
  16. Surrogate Key Generation & Mgmt
  17. History / SCD Mgmt
  18. Exception Handing
  19. Data Conformation & De duping
  20. Restart / Reload