Overview
This lecture provides a comprehensive, step-by-step guide to using Power Query as an ETL tool in Excel and Power BI, focusing on data import, cleaning, transformation, appending, merging, and loading from various formats and sources.
Introduction to Power Query
- Power Query is an ETL (Extract, Transform, Load) tool embedded in Excel and Power BI.
- It allows data extraction from multiple sources: Excel, CSV, PDF, websites, folders, and SQL databases.
- No separate installation is needed in Excel 2016 and later; accessible via the Data tab.
- Used primarily for data cleaning, transformation, and automation before analysis.
Accessing Power Query
- Access Power Query through Excel's Data tab, under "Get & Transform Data" and "Queries & Connections."
- Options include importing data from tables, ranges, text/CSV files, web, and folders.
Data Cleaning & Transformation Steps
- Identify and remove blank rows and columns, handle null/missing values, and correct inconsistent data formats.
- Standardize categorical values (e.g., male/female, M/F) and combine or split columns as needed.
- Use built-in functions: Remove Rows, Merge Columns, Split Columns, Replace Values, and Change Data Type.
- Apply transformations step-by-step; all steps are recorded and can be edited or removed.
- Keyboard shortcuts (Ctrl+A, Ctrl+Shift+Right/Down) help in selecting data ranges efficiently.
Using Power Query for Various Data Sources
- CSV Files: Import and process large datasets (over 1 million rows) that Excel can’t natively open; use Power Query to load, filter, and analyze data.
- PDF Files: Extract tables from multiple pages, combine/apply headers, and append data.
- Image Files: Use "From Picture" to convert scanned tables/images into editable Excel data.
- Web Data: Import tables from website URLs directly into Excel.
- Folder Append: Combine data from multiple Excel/CSV files inside a folder automatically.
Appending & Merging Data
- Append: Stack data from several sources vertically; column numbers and headers must match.
- Merge: Join two tables based on a key column (similar to SQL joins); match and enrich data across tables.
- Use Power Query formulas for complex transformations, such as
Table.PromoteHeaders.
Loading & Refreshing Data
- Use "Close & Load" or "Close & Load To" to output cleaned data to a new or existing worksheet or as a PivotTable.
- Refresh destination queries to auto-update cleaned data when the source changes.
Power Query in Power BI
- Power Query in Power BI works similarly but supports additional sources and has a slightly different interface.
- Perform ETL operations before building interactive dashboards.
Key Terms & Definitions
- ETL — Extract, Transform, Load; process of preparing data for analysis.
- Power Query — Excel/Power BI tool for ETL steps.
- Append — Combine tables vertically.
- Merge — Join tables horizontally based on a key.
- Transform — Change data structure (cleaning, splitting, merging, formatting).
- Promote Headers — Use first row as column headers in data.
Action Items / Next Steps
- Practice loading, cleaning, and transforming sample datasets in Power Query.
- Experiment with appending and merging files from folders and other sources.
- Review provided homework or practice files, and comment or ask questions for further clarification.
- (If interested) Explore and enroll in advanced Power Query or Power BI courses.