💾

Data Warehousing and Ab Initio Overview

Jan 29, 2025

Lecture Notes on Data Warehousing and Ab Initio

Introduction to Data Warehousing

  • Data Warehouse (DW) is a database for storing business data.
  • DW is used for analytical purposes, gathering data from multiple sources.
  • It stores both current and historical data for reporting and analysis.
  • Data volumes in DW are larger than in normal databases.

ETL Processes

  • ETL stands for Extraction, Transformation, and Loading.
  • ETL extracts data from various source formats, transforms it, and loads it into the DW.
  • Once data is in the DW, reporting teams generate reports for end-users.

Components of Data Warehousing

  • Data Warehousing involves two main areas: ETL and Reporting.
  • ETL tools prepare the data warehouse; reporting tools utilize the data.

Introduction to Ab Initio

  • Ab Initio is an ETL tool, a Latin term meaning "from the beginning."
  • Provides ETL solutions across various domains: banking, telecom, healthcare, etc.
  • Predominantly used in financial services due to its performance and features.
  • Developed by a US-based company starting in 1996.
  • Ab Initio is recognized for top performance and comes with a comprehensive package (repository, client, server).
  • Current version: 3.2; training version: 2.10.

Ab Initio Architecture

  • Client component: Graphical Development Environment (GDE) on Windows.
  • Server component: Co-operating System (on UNIX server).
  • GDE used for designing, executing, and checking graph status.
  • Graphs are collections of components saved with a .mp extension and related shell script files (.ksh).
  • All files are stored on the UNIX server, known as the host machine.

Graph Execution Process

  • Triggering graphs creates processes on the UNIX server (host machine).
  • Steps: Host Process Creation, Agent Process Creation, Component Execution, Termination.
  • Results are viewed in GDE after execution.

Enterprise Meta Environment (EME)

  • EME is a repository and version control tool, separate UNIX server from host.
  • Used for storing code and managing project paths.
  • Enables lifecycle management (development, SAT, UAT, production).

Working with Sandboxes

  • Sandbox: a working environment for developers, created upon check-out.
  • Code initially stored in sandbox, then checked into EME for version control.
  • Check-out brings missing or updated objects from EME to sandbox.
  • Check-in process transfers updates from sandbox to EME.

Locking and Version Control

  • To modify an object in EME, lock must be acquired.
  • Prevents simultaneous edits by multiple users.
  • EME maintains multiple versions; sandboxes do not.
  • Regular check-in/check-out processes ensure synchronization with EME.

Real-time Examples and Practical Applications

  • Advanced concepts and real-time examples will be covered in actual sessions.
  • Assignments and guidelines provided throughout the course for practice.

Conclusion

  • Course duration: 1 month, with 1.5-hour sessions.
  • Focus on practical, real-time applications of Ab Initio and ETL processes.