Informatica PowerCenter Notes

Jun 23, 2024

Informatica PowerCenter Tutorial and Practical Sessions

Introduction

  • Focus: Learning complete Informatica PowerCenter with practical sessions
  • Importance: Practice each session hands-on without skipping any

Agenda

  1. Business Intelligence (BI)
  2. Data Warehouse vs. Data Lake
  3. ETL (Extract, Transform, Load)
  4. Informatica PowerCenter Basics
  5. Informatica Architecture
  6. Client Components
  7. Load Types and Transformations
  8. Slowly Changing Dimensions (SCD)
  9. Performance Tuning

Business Intelligence (BI)

  • Definition: Tracking business performance and making informed decisions
  • Users: Managers, directors, VPs, CEOs
  • Examples: Banking, insurance, automobile, healthcare, and software companies; e.g., analyzing online food delivery performance
  • Tools Used: Tableau, Power BI, etc.
  • BI vs. AI: AI involves making machines take decisions; BI provides data for human decision-making
  • Evolution: In the 1980s, databases handled both transactions and reporting (OLTP); today, OLAP systems and data lakes serve business analytics

Data Warehouse vs. Data Lake

  • Data Warehouse: Structured data in table format (OLAP)
  • Data Lake: Handles structured, semi-structured, and unstructured data; larger volume than data warehouses
  • Sources: CRM, ERP, flat files, different databases
  • OLTP vs. OLAP: OLTP for transactions; OLAP for analytics and reporting

ETL (Extract, Transform, Load)

  • Purpose: Load data from various sources to a data warehouse
  • Process: Extract data, Transform data, and Load data into target systems
  • Processing modes: Data moves via batch or streaming processing
    • Batch Processing: Scheduled data movement (hourly, daily)
    • Streaming Processing: Near real-time data movement (IoT devices, sensors, etc.)
  • Tools Used: Informatica PowerCenter, Informatica IICS, Microsoft SSIS, Oracle Data Integrator
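
As a rough SQL sketch (table and column names are hypothetical), a single batch ETL step often boils down to an extract-transform-load statement like this:

```sql
-- Hypothetical batch ETL step: extract from a staging table,
-- transform in flight, and load into a warehouse table.
INSERT INTO dw_orders (order_id, customer_name, order_amount, load_date)
SELECT o.order_id,
       UPPER(TRIM(o.customer_name)),       -- transform: standardize names
       o.amount,                           -- extracted measure
       SYSDATE                             -- audit column for this batch run
FROM   stg_orders o
WHERE  o.order_date >= TRUNC(SYSDATE) - 1; -- yesterday's batch window
```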

Informatica PowerCenter Basics

  • Tool: ETL tool for data integration
  • Functions: Extract, Transform, Load data into data warehouses
  • Implementation: Can handle data from various sources (SAP, Salesforce, Oracle, SQL Server, etc.)
  • ETL & ELT: Informatica supports both ETL and ELT processes
  • Automated Processing: Scheduling for periodic data movement

Informatica Architecture

  • Components: PowerCenter server, repository, client tools (Designer, Workflow Manager, Monitor, etc.)
  • Server: Executes workflows, handles data movement
  • Repository: Stores metadata for ETL processes
  • Client Tools: Interface for developers to create, monitor, and manage workflows

Client Components and their Use

  • Repository Manager: Admin tasks, folders, and access management
  • Designer: Define sources/targets, map transformations
  • Workflow Manager: Create and manage workflows
  • Monitor: Monitor workflow execution

Load Types and Transformations

Table to Table Load

  • ETL Process: Moving data from one table to another within the same database (e.g., Oracle)

Flat File to Table Load

  • Scenario: Reading from flat files, loading into database tables
  • Components: Source file directory structure, delimiter settings in source definition

Table to Flat File Load

  • Scenario: Exporting data from tables to flat files
  • Components: Defining flat file targets, configurations for delimiter, optional quotes, etc.

Transformations

Source Qualifier

  • Converts source data types to Informatica native data types
  • Active and Connected transformation
  • Properties: SQL query, User-defined join, Source filter, Number of sorted ports
    • Distinct: Removes duplicates
    • Pre/Post SQL: SQL commands run before/after the data fetch
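
The SQL query property overrides the SELECT the Source Qualifier generates. A minimal sketch, assuming hypothetical EMP and DEPT tables, that exercises the user-defined join, source filter, and sorted-ports properties in one query:

```sql
-- Hypothetical Source Qualifier SQL override; the column order must
-- match the transformation's port order.
SELECT e.emp_id,
       e.emp_name,
       d.dept_name
FROM   emp e
JOIN   dept d ON e.dept_id = d.dept_id   -- user-defined join
WHERE  e.status = 'ACTIVE'               -- source filter
ORDER  BY e.emp_id;                      -- sorted ports
```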

Filter Transformation

  • Filters records in the pipeline
  • Example: Filter out records based on a condition
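
The filter condition behaves like a WHERE clause applied mid-pipeline; a SQL equivalent with a hypothetical salary threshold:

```sql
-- Rows where the condition evaluates to FALSE are dropped from the
-- pipeline, just as a WHERE clause drops them from a result set.
SELECT emp_id, emp_name, salary
FROM   emp
WHERE  salary > 50000;   -- filter condition such as SALARY > 50000
```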

Expression Transformation

  • Apply business logic to each row
  • Ports: Input, Output, Variable
  • Examples: Uppercase conversion, string concatenation, conditional logic
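
In SQL terms, the row-level derivations an Expression transformation performs look like this (names hypothetical; Informatica's expression language uses IIF() where SQL uses CASE):

```sql
-- One derived output port per expression, evaluated row by row:
SELECT UPPER(first_name)               AS first_name_uc,  -- uppercase conversion
       first_name || ' ' || last_name  AS full_name,      -- string concatenation
       CASE WHEN salary > 50000
            THEN 'HIGH' ELSE 'LOW' END AS salary_band     -- conditional logic
FROM   emp;
```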

Joiner Transformation

  • Joins heterogeneous sources
  • Join Types: Normal (Inner), Master Outer (Left Outer), Detail Outer (Right Outer), Full Outer
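
The join types map onto standard SQL joins. A sketch with a hypothetical detail source (emp) and master source (dept); a master outer join keeps all detail rows, which reads as a LEFT OUTER JOIN when the detail source is written on the left:

```sql
-- Normal join = INNER JOIN: only matching rows survive.
SELECT e.emp_id, d.dept_name
FROM   emp e
INNER JOIN dept d ON e.dept_id = d.dept_id;

-- Master outer join: all detail (emp) rows, matching master rows.
SELECT e.emp_id, d.dept_name
FROM   emp e
LEFT OUTER JOIN dept d ON e.dept_id = d.dept_id;
```

Detail outer reverses this (all master rows are kept), and full outer keeps unmatched rows from both sides.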

Sorter Transformation

  • Sorts data in ascending or descending order
  • Properties: Key ports, Case sensitivity, Null handling
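
A SQL equivalent of those properties, with hypothetical ports; null handling corresponds to NULLS FIRST/NULLS LAST:

```sql
-- Two key ports: dept_id ascending, then salary descending,
-- with NULL salaries sorted to the end.
SELECT emp_id, dept_id, salary
FROM   emp
ORDER  BY dept_id ASC,
          salary DESC NULLS LAST;
```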

Aggregator Transformation

  • Performs aggregate calculations
  • Functions: SUM, AVG, MIN, MAX, etc.
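
Equivalent SQL, assuming a hypothetical emp table with dept_id as the group-by port:

```sql
-- One output row per group-by port value:
SELECT dept_id,
       SUM(salary) AS total_salary,
       AVG(salary) AS avg_salary,
       MIN(salary) AS min_salary,
       MAX(salary) AS max_salary
FROM   emp
GROUP  BY dept_id;
```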

Router Transformation

  • Routes data into multiple pipelines based on conditions
  • Example: Segregate records based on geography
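
A router is effectively a multi-way filter with a default group. Oracle's multi-table insert is a close SQL analogy (hypothetical region column and target tables):

```sql
-- Each row is tested against every group condition; rows that match
-- no condition fall into the default group (the ELSE branch).
INSERT ALL
  WHEN region = 'APAC' THEN
    INTO customers_apac (cust_id, cust_name) VALUES (cust_id, cust_name)
  WHEN region = 'EMEA' THEN
    INTO customers_emea (cust_id, cust_name) VALUES (cust_id, cust_name)
  ELSE
    INTO customers_other (cust_id, cust_name) VALUES (cust_id, cust_name)
SELECT cust_id, cust_name, region
FROM   customers;
```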

Rank Transformation

  • Ranks data based on a specific port
  • Properties: Top/Bottom ranking, Group By for segmented ranking
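
Equivalent SQL using an analytic function (hypothetical top-3 earners per department; Group By port = dept_id, rank port = salary):

```sql
-- Top/Bottom = Top, Number of Ranks = 3, grouped by department:
SELECT *
FROM  (SELECT emp_id, dept_id, salary,
              RANK() OVER (PARTITION BY dept_id
                           ORDER BY salary DESC) AS rnk
       FROM   emp) ranked
WHERE  rnk <= 3;
```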

Sequence Generator

  • Generates unique sequence numbers
  • Use Case: Surrogate keys in target tables
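
A database sequence plays the same role in plain SQL (hypothetical dimension table):

```sql
-- NEXTVAL hands out a unique surrogate key per inserted row,
-- like the NEXTVAL port of a Sequence Generator.
CREATE SEQUENCE customer_key_seq START WITH 1 INCREMENT BY 1;

INSERT INTO dim_customer (customer_key, customer_id, customer_name)
SELECT customer_key_seq.NEXTVAL, customer_id, customer_name
FROM   stg_customer;
```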

Union Transformation

  • Combines data from multiple sources into one
  • Behavior: Acts like SQL UNION ALL (includes duplicates)
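
SQL equivalent (hypothetical regional tables); note that plain UNION would deduplicate, which the Union transformation does not:

```sql
-- Duplicates across the two sources are kept:
SELECT cust_id, cust_name FROM customers_us
UNION ALL
SELECT cust_id, cust_name FROM customers_eu;
```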

Lookup Transformation

  • Looks up values in a table to enrich the data
  • Types: Connected (in-pipeline), Unconnected (out-of-pipeline)
  • Multiple Match Policies: Use any value, use first value, use last value, report error, return all values
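
A connected lookup behaves roughly like an outer join to the lookup table (hypothetical names); rows with no match receive NULL in the return port:

```sql
-- Enrich each emp row with dept_name from the lookup table.
SELECT e.emp_id,
       e.emp_name,
       d.dept_name                   -- return port
FROM   emp e
LEFT OUTER JOIN dept d               -- lookup table
       ON e.dept_id = d.dept_id;     -- lookup condition
```

The "return all values" policy corresponds to the join producing multiple output rows when several lookup rows match.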

Slowly Changing Dimensions (SCD)

  • Types: Type 1 (No history), Type 2 (Full history), Type 3 (Limited history)
  • Type 1: Simple updates, no historical data
  • Type 2: Full history (flag, version, and date methods)
    • Flag Method: Active (flag=1) and Inactive (flag=0)
    • Version Method: Maintains versions of records
    • Date Method: Maintains effective and end dates
  • Type 3: Limited column-level history
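
A minimal SQL sketch of the Type 2 flag method, assuming a hypothetical dim_customer table with a tracked address column (in PowerCenter this is typically built with Lookup, Expression, Router, Update Strategy, and Sequence Generator transformations):

```sql
-- Step 1: expire the current version of customers whose tracked
-- attribute changed (flag=1 -> flag=0).
UPDATE dim_customer d
SET    d.active_flag = 0
WHERE  d.active_flag = 1
AND    EXISTS (SELECT 1 FROM stg_customer s
               WHERE  s.customer_id = d.customer_id
               AND    s.address <> d.address);

-- Step 2: insert new and changed customers as the active version.
INSERT INTO dim_customer (customer_key, customer_id, address, active_flag)
SELECT customer_key_seq.NEXTVAL, s.customer_id, s.address, 1
FROM   stg_customer s
WHERE  NOT EXISTS (SELECT 1 FROM dim_customer d
                   WHERE  d.customer_id = s.customer_id
                   AND    d.active_flag = 1
                   AND    d.address    = s.address);
```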

Performance Tuning

  • Indicators: Long-running sessions, timeout issues, high CPU consumption
  • Bottleneck Analysis: Analyze session logs, identify bottlenecks in reading, transforming, or writing data
  • Optimization Techniques: Index creation, source qualifiers, filter conditions, caching
  • Advanced Techniques: Pushdown optimization, partitioning
    • Pushdown Optimization: Converts transformations into SQL operations on the database side
    • Partitioning: Parallel processing to handle large datasets efficiently
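
With full pushdown optimization, the Integration Service translates the mapping logic into SQL that runs inside the database rather than moving rows through the server; conceptually (hypothetical mapping):

```sql
-- Instead of fetching emp rows, filtering and aggregating them in the
-- Integration Service, and writing results back, one statement runs
-- entirely on the database:
INSERT INTO dept_salary_summary (dept_id, total_salary)
SELECT dept_id, SUM(salary)
FROM   emp
WHERE  status = 'ACTIVE'   -- Filter pushed down
GROUP  BY dept_id;         -- Aggregator pushed down
```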

Additional Concepts

Parameter and Variables

  • Purpose: Pass dynamic values to mappings or sessions
  • Parameter File: Defines variable values used in workflows
  • Usage: Used in scenarios that require flexibility and reusability of workflows
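
A minimal parameter-file sketch, assuming a hypothetical folder, workflow, and session; mapping parameters and variables use the $$ prefix, while session-level parameters such as connections use a single $:

```
[MyFolder.WF:wf_load_customers.ST:s_m_load_customers]
$$LoadDate=2024-06-23
$$Region=APAC
$DBConnection_Source=Oracle_Src_Conn
```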

Debugging

  • Purpose: Step-by-step execution of workflows
  • Breakpoint: Set points to pause execution for detailed analysis
  • Session Log Analysis: Identifies specific areas causing issues

Best Practices

  • Modular Workflows: Create modular workflows for better maintenance and reuse
  • Index Management: Proper index management on source/target tables for improved performance
  • Scheduled Maintenance: Regular monitoring and tuning of workflows