Talk to anyone who has used Power Query and they'll tell you how amazing it is. Stories of automating tasks that used to take three hours now taking only three minutes is not uncommon or an exaggeration. Now if you haven't heard of Excel's Power Query tool or you've heard of Power Query but you're not sure if it'll be useful to you, then stay tuned because I'm going to showcase what the fuss is all about. Before we get started, I want to reassure you that everyone on Excel 2010 or later has access to Power Query because it's a tool built by Microsoft for Excel and Power BI.
It's the same tool in both apps so you only need to learn it once and you can use it in either app. Now if you have Excel 2016 or later you'll find the Power Query tools on the data tab of the ribbon in the get and transform group of tools. If you have Excel 2010 or 2013 then you can download the free Power Query add-in and your Power Query tools will have their own dedicated tab in the ribbon. Now Power Query isn't available in Excel 2007 or earlier.
In Power BI Desktop, the Power Query tools are on the Home tab of the ribbon. So why should you use Power Query? Well, when you're spending a big chunk of your time gathering and cleaning data, the time savings generated by Power Query can be huge.
A survey by Kaggle of nearly 16,000 respondents shows that on average they're spending 40% of their time on these laborious tasks. Now keep in mind that the respondents of this survey are working in data specialist roles where the data probably isn't that messy. But many of us performing more ad hoc tasks are likely to spend a lot more time getting and cleaning our data. Now there have been other surveys by Crowdflower that indicate nearly 80% of an analyst's time can be spent on the data gathering and cleaning process.
Personally, I think that's at the extreme end, but even 40% is a considerable amount of time to save. The purpose of Power Query is to automate the getting and cleaning of data. Traditionally, if you wanted to automate these laborious tasks, you'd use Excel's programming language, VBA. But Power Query doesn't require programming knowledge because most of its tools are available from the GUI with point and click ease, as you'll see shortly. Power Query can therefore reduce time consuming tasks that take hours down to as little as a few seconds.
Power Query can get messy data from a huge range of sources, including tables in your Excel worksheet, other Excel files, text or CSV files, even multiple files in a folder, or tables of data from PDFs. as well as JSON and XML data sources, and databases including SQL, Access, Analysis Services, and even your own proprietary systems with ODBC or OLEDB connectors. You can also get data from Azure services, OData and the web, and from cloud-based systems like SharePoint, Exchange, Salesforce and Dynamics.
And if you're using Power Query in Power BI, there are even more data sources available. From there we can clean and transform the messy data into the perfect tabular format using the intuitive GUI full of common data cleaning tasks at the click of a button. And this allows us more time for the fun part of analyzing and visualizing the data. Let's switch over to Excel and I'll run through a practical example. In this folder I've got three Excel files.
containing sales data for the last three years. They could equally be CSV or text files. Power Query can get data from those file types too. Now the data in these files is a bit messy, so let's just take a look at the first one. We can see here that the headers are spread over two rows, which is no good for pivot tables and Power Pivot.
In column L, we've got the shipping mode and container stuffed into one column, which will make it difficult to analyze the data by those two separate criteria. We have order quantity, sale price and discount but we don't have a column for the sale amount so we'll need to add that. I'd like to know the days it takes from order to actually shipping the product so we'll need to add a column for that. And my salesperson data is a little formal for the purpose of the report that I want to build.
So we're going to tidy up this data before we combine it into one table that can be loaded into Excel or Power Pivot. So I'm going to close this file, we'll go back to the... browser I'm going to right click, copy the address because I need the folder path. Then on the data tab of the ribbon I'm going to get data from a file from a folder.
I'm going to control v to paste in the folder path. You can browse to it if you need to. Click OK and Power Query goes away and grabs a list of all the files in that folder.
We can see them here. Down the bottom I'm going to combine and transform the data. Remember I need to use the transformation tools because my data is a bit messy, but if your data is in the perfect format you could simply skip that step and combine and load. So we'll go combine and transform data.
Power Query has gone away and it's grabbed the first file as the sample file. but I could choose a different one if I prefer. Down here we've got a list of the sheets, there's just one sheet in each file, and over here we get a preview of the data.
So that looks good, I'll click OK, and Power Query has gone away and it's combining the data from those three files into one table. And that's what we see here. If I scroll across you can see all of the data, and if I click on the drop down here and load more you can see The data for all three files is available in this one table.
So on the left here we see the final query called data that's combined all of the files and then above that are all the helper queries that Power Query has automatically generated for me in order to combine the files. On the right hand side in the query settings under applied steps you can see all of the steps it's taken in this particular query. So a bit like a macro recorder, It's recorded those steps for me. Now I need to make some changes to my data and I'm going to do that in the sample file. So this is just the first file for 2017 data.
I'm going to collapse the query pane just to give me a bit more room. The first thing I need to do is fix the headers. It's automatically promoted the headers.
We can see that in the step here. I need to delete that step so that I've got my headers in regular rows. Then I can transform the data and transpose it. That way I've got my two headers in separate columns.
Selecting them both I can right click and merge the columns into one. We're going to have just the space as the separator, the column name doesn't matter, I'm not going to keep that anyway. So click ok. Now I can transpose the data back and I can use the first row as my headers. So we've tidied up the headers Next, I want to separate the shipping mode and the container.
So again on the transform tab, we're going to split the column by delimiter. In this case, I've got a space, a hyphen and a space as a delimiter. But you can see there are lots of options for different ways to split your data.
I'm going with delimiter and in here I want a custom delimiter. Remember space, hyphen, space. I can split it at any of these.
There's only one instance of it. but if you had multiple instances of a delimiter then you have options here to handle that. I'll click OK and now I have my shipping mode and container in separate columns.
It's going to double click to rename them so we'll cut that out and then I'll paste it in there and we'll get rid of the dot one and press enter. So that's done. Notice in the right hand side it's recorded all of the steps that I'm taking and we can use them on any new files that are added to that folder. I don't have to recreate any of these steps again. Next I want to add a column for my sale amount.
So I need the order quantity times the sale price times the discount percent. We're going to add a column and we're going to multiply them. Now you've probably realized that it should be the order quantity times the sale price times 1 minus the discount percent. So I'm just modifying the formula.
This is a formula bar just like we have in Excel. What you see here is a Power Query formula and it's quite similar to Excel in a lot of ways. So I'll press Enter. Now we get the correct amount.
Let's rename the column Sale Amount. The other thing I'd like to do is round this number. So let's transform it. Round and round. We'll round it to two decimal places.
and that's better. Let's move it over here so that it's in with the related columns. The next thing I want to do is split out just the salesperson's first name.
So I want to find the text between the salutation and their surname. Now in Excel this would be quite tricky with formulas but in Power Query it's dead easy. I'm going to add a column from examples and I'm going to use the selected column and here I'm just going to type in Robert. I'm just giving it the first example.
and you can see it's correctly detected my pattern and if I'm happy with that I simply click OK. Notice up here it's giving me a preview of the formula that it's written in order to split that text out so I didn't even need to know how to write the formula. I'll click OK.
Let's give it a new name, we'll call this Salesperson. And now I'll go back and delete this column, just select it and press the delete key. I don't need it anymore. The next thing I want to do is calculate the number of days it took from order to shipping.
So selecting the ship date first and then the order date, just holding down control or shift. I'm going to add a column and I want to subtract the days. So it's out of my column here, we'll call this days to ship. And let's bring it over with the dates.
So they're together, pop it in there. And the last thing I want to do is remove not specified from the order priority. I'm only interested in analyzing the data that had an order priority.
So we're just going to deselect it, click OK. And now I only have data. that relates to orders that have been given a specific priority.
Now notice at the bottom it's showing me 999 plus rows. What I see here in the Power Query editor is only the first 1000 rows of data. Power Query only gets a preview of the data so that it can quickly respond to transformations that you apply. But don't worry because when it closes and loads the data, it will get the complete data set.
based on the transformations that you apply here in the editor. Now we've performed all these transformations, it's recorded all the steps so I can use them again on any files that I add to that folder in the future. But this is only on the sample file, remember we're working in the sample file.
If we go to the data query we get an error and that's because the last step here for change type is trying to change the type based on the specific column names. And these are the column names before I made the changes and fixed the column headers. Remember it's now Order ID and Customer ID and so on. So all I need to do is remove this last change type step. Now we can see our data.
I don't need this column here with the file name. So I'm simply going to select it and press the Delete key. And then what we need to do is fix the data types for all of our columns. At the moment it's using the Any Data type. So just with Shift held down, I'm going to scroll across to the right and select all of the columns.
And then on the Transform tab, we can detect the data type. Now this isn't formatting. Often people get confused between applying a data type and applying formatting. Formatting is done in Excel in the grid. Data types are done in Power Query.
And we set the data type so Excel knows what type of data we're putting into those cells. So here, 123 is a whole number. ABC is text, this is the date icon. You can click on these and choose a different data type if it's incorrectly detected it. I'm just going to quickly scroll across and make sure it's detected the data types correctly.
This one here is a percentage so I'm just going to choose percentage data type and I'm going to replace the data type in the current step. Let's keep scrolling across and the rest looks perfectly fine. So we're ready to close and load. We've got all of the data relating to the three files. All of the transformations that I made to the sample file have been applied to all three files so I don't need to do any more cleaning.
I'm ready to simply close and load. I'm going to choose close and load two. This brings up a dialog box where I can select where I want to load the data. Now in earlier versions of Excel you don't have pivot table report or pivot chart.
Here I can choose to load it to a table. which is an Excel table in the grid, or a PivotTable report or pivot chart which will load the data into the pivot cache and that will make it more efficient for Excel. I can just create a connection which will allow me to reference this query in other queries or I can load it to the data model.
Now if you load it to the data model you only want a connection, you don't want to also load it to a table. I'm going to load it to a PivotTable report and I'll click OK. Remember this is loading it to the pivot cache and there's my pivot table ready to build. Let's bring the field list over here so we can see it beside the pivot table.
So here I want to see the date by order priority. I want to see the order date. It's automatically grouped it for me if we go into group months, quarters and years. We'll leave it at that.
That's fine. And I want to know the sale amount. So there's our data, let me just format the numbers.
I have a touch of OCD about this, I like to see commas and no decimal places when we have big numbers like that. Let's insert a pivot chart, we'll put a line chart in. And let me just do a little bit of formatting.
We'll put the legend at the top. Okay, so there's my data for the three years. I've got it with Power Query.
I've analyzed it in a pivot table and a pivot chart. But what happens when I get data for the next year? Well, this is where Power Query really earns its worth. So let me go and grab the file.
Here's my 2020 file. I'm just going to control x to cut it out. We'll go in here and control v to paste it in. So there's my new file.
Now all I need to do is come into my excel file that contains my query and my pivot table. On the data tab I'm going to simply refresh all and there's my 2020 data. Power Query has grabbed the new file from the folder, it's run through all of those transformation steps and it's added it to my pivot cache.
and at the same time my pivot table and pivot chart have updated so you can see that if you're spending a lot of time gathering and cleaning and massaging data that power query can be a life changer i think you'll agree it's amazing i haven't needed to learn any programming languages everything was point and click easy so you can imagine how impressed your boss will be when you're able to get your work done in a fraction of the time Well I hope you're excited to give Power Query a try, whether that's in Excel or Power BI. If you have Excel 2010 or 2013 you'll find the Power Query download link in the video description, along with a link to my course for those who want to get up to speed with Power Query fast. You can download the file for this lesson from the link here and have a go yourself.
If you like this video please give it a thumbs up and subscribe to my channel for more. And why not share it with your friends who might also find Power Query useful. Thanks for watching.