🔧

ETL Pipeline Setup for Sales Data

Jun 6, 2025

Summary

  • The meeting covered setting up an ETL pipeline using SSIS in Visual Studio to load sales and related data from various Excel and CSV files into a Microsoft SQL Server "my shop" database.
  • Step-by-step tasks included creating SSIS projects, configuring data sources and destinations, designing and executing data flows, and defining new tables in SQL Server.
  • Attendees ensured tables were properly structured and verified successful data loading.
  • Discussions included handling possible data conversion errors and structuring fields for optimal storage.

Action Items

  • (not specified – Team/Owner TBD): Review data conversion for any future data type issues encountered during ETL loads.
  • (not specified – Team/Owner TBD): Add primary and foreign keys to tables where required after initial data load.

Setting Up ETL Pipeline for Sales Data

  • SSIS in Microsoft Visual Studio was used to create a new project to handle the ETL process.
  • A data flow task was created with an Excel file as the source and SQL Server as the destination.
  • Database "my shop" on server "Acer" was selected as the target; tables in the database were initially empty.

Creating and Configuring Tables

  • A new table "sales" was created with columns mapped from the spreadsheet, assigning appropriate data types (integers for IDs and keys, varchar for names, float for cost/price, etc.).
  • Adjustments were made to field sizes for efficiency (e.g., setting integer types where feasible, reducing varchar lengths).
  • Primary key designated on "transaction ID"; other potential constraints mentioned but not fully implemented in meeting.

Data Loading Execution and Validation

  • Data flow was executed successfully with no errors; SSIS process confirmed as complete.
  • Verification performed by checking the existence and structure of the new "sales" table in SQL Server.

Loading Additional Tables (Category, Customer, Marketing Expenses)

  • Additional ETL tasks were created for other Excel files: "category" and "customer", each with their own new table definitions in SQL Server.
  • A CSV file ("marketing expenses") was also loaded, using a flat file connection in SSIS and mapped to a new table.
  • All destination tables were validated in SQL Server upon completion.

Potential Issues and Error Handling

  • Mentioned the potential for data type mismatches and need for data conversion, though detailed error resolution steps were not shown as the demonstration was educational.
  • Advised that foreign and primary keys could be added to tables after initial data loading.

Decisions

  • Loaded sales, category, customer, and marketing expenses data from Excel/CSV into "my shop" SQL Server database — Rationale: To enable structured storage and access for business analytics.

Open Questions / Follow-Ups

  • Should any data type conversion errors arise in the future, develop a standard resolution process.
  • Confirmation needed on data integrity and if all foreign/primary keys should be enforced post-load.
  • Assign clear owners for ongoing data management and key constraint tasks.