Running. Hangout is live, it says. Ah, let's wait for a second. Oh, can you actually turn off your screen sharing? Making sure that we can turn it back on... and I see it live. Outstanding. So good morning, good afternoon, good evening, or in Phil's case, very, very early. His house isn't even awake, and he's got little ones out there. So, very early. Is it still dark there, Phil? It's very dark, yes. It's five in the morning. I appreciate you waking up. Soon it won't actually be dark at five in the morning; it's going to start being light. This is an interesting story. This whole presentation, this webinar, and this topic that we've chosen: at last year's Microsoft Data Insights Summit, now rebranded as the Microsoft Business Applications Summit, one of the people from Apress came over to me and said, hey Chuck, we're always interested in getting new authors on books in the data science space. What topics do you think would be interesting?
And I said, well, DAX is definitely one of the hottest topics in our webinars, our blog posts, and our content in general. She goes, that's great. Do you have anybody that would be interested in a DAX topic? And I'm like, you know, one of my new MVPs, actually, because you were freshly minted at that point, if I remember correctly.
That's right. He was standing nearby, and I said, I'd like to introduce you to Phil Seamark, because he said he would be interested in doing things like that. How's the journey been?
What is that like? I want you to turn on your sharing, see if we can get you re-shared, and actually tell us all about that journey and talk about what we're going to present today in today's session. It looks great.
I can see it fine. Phil, over to you. Wonderful. Thank you, Chuck. I'm really looking forward to coming to this year's Data Insights Summit.
I already have my flights booked and I'm excited to meet as many people as I can over there. On the book: yeah, it was a great opportunity, which I jumped at. I wasn't quite sure how I was going to go. I'd never written a book before.
I hadn't written much before apart from blogs, but yeah, it was good fun. It was a lot of work, and certainly as it got close to deadlines I did have to put a lot of time into it, but I learned a lot more about DAX than I knew before, which I really enjoyed. When you have to put yourself in a position to not only explain a concept to someone else, but explain it in a way that you think they will understand, you definitely polish off a lot of the edges, and if it's useful to other people learning DAX, then that will be a brilliant thing. It's available on Amazon, Barnes & Noble, and the Apress site.
It's ideal for someone who is learning Power BI, but there is also some advanced content as well. In fact, what I'm going to be talking about today in this webinar is covered in one of the chapters in quite a bit of detail.
So if you do enjoy today's session and you would like to know more, then certainly go and get the book; it's covered in one of the chapters with quite a bit more explanation and detail. So, on that, let's jump in, because we've only got an hour.
I do have about two hours worth of content. Don't worry, I'm not gonna rush all of that through or go the full two hours, but let's see how much of this we can get through. Please feel free to fire questions via the chat window.
Chuck will let me know and we'll look to address that. At the end of the session, I will make the PBIX file that we're gonna be using available for download, including my notes. So don't worry about writing anything down furiously because the content will be available for detailed study. So on that, let's get into it.
What we're going to be doing today is not importing any data whatsoever into Power BI. We're going to be using DAX to fully generate all of the data that we need. The exercise of today is to generate... a fake data model that's semi-representative of a sales model for a company.
What we're going to do is create a date table, we're going to create a sales table, and we're going to do some things like summarize that sales table and look at different ways that we can optimize DAX. I'll show you how some useful free tools such as DAX Studio and SQL Server Management Studio can be used to help make your DAX run faster. So let's launch into it and go over to my data view, and here's my favorite tab for today, which will be Create a New Table. Now, if you're using DAX in Excel, this New Table button doesn't exist for you.
So this is very much optimized for Power BI, but a lot of the DAX techniques I will share with you will be useful in Excel. So the very first table I want to make is a numbers table. I'm going to call my table Numbers, and the code I'm going to use is GENERATESERIES, 1 to 100. I don't want to add columns. Here we go. We've now generated our first table, which is a single-column table with 100 rows.
The 100 rows are controlled by the first and second parameters of this function. The third parameter can control the skip of the number generation, so if I change that to a two, we can see that it now starts with one and jumps every second number. And that's optional, so we can take that out and just have a table with values between one and a hundred. Now, this is going to be a utility-style table which I'm not going to use in my end data model, but I am going to use it to help create data in a random way. Let's say I don't like the column name here. I could right-click on the column header and simply rename it using the UI, or if I want to rename it in DAX, I can use the SELECTCOLUMNS function.
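A rough sketch of the expressions described here (an approximation, not necessarily the exact code shown on screen):

```dax
-- Single-column table with the values 1 to 100.
Numbers = GENERATESERIES ( 1, 100 )

-- The optional third parameter controls the skip:
-- GENERATESERIES ( 1, 100, 2 ) returns 1, 3, 5, ...

-- Renaming the default [Value] column to [N] in DAX:
Numbers = SELECTCOLUMNS ( GENERATESERIES ( 1, 100 ), "N", [Value] )
```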
I'm going to select the columns that I specify from the nested table, and I'm going to call this column N, and the value I'm going to use is the Value column from GENERATESERIES. If I press Enter, it's effectively a DAX way of renaming the column, so I have a single table called Numbers with a column called N, with values 1 to 100. Now, inevitably, every single data model you're likely to work with is going to need a date table, so let's see how you might generate that in DAX. We're going to go New Table here, I'm going to call it Dates, and we're going to use variables to construct this. So let's start with our first one: VAR BaseTable equals. When you use the VAR keyword, what you're doing is instigating a new level of variable scope. Now, I'm quite surprised even still today how some very experienced people who have used Power BI and DAX for a long time, and we're talking some MVPs as well, suddenly go, hey, wow, I never realized you could use variables in DAX.
So if you already knew this, fantastic. If you didn't know this, I think you'll find it a really nice way to help break up and make your complex DAX statements easier to build and maintain. So we're going to create a variable called BaseTable, and we're going to store into this variable the output of a function called CALENDAR. CALENDAR is going to return a table.
We're going to have to provide a start date; I'm going to start with the year 2016, the 1st of January, and I'm going to end with today. When I click OK, I'm going to get an error, and that's because every time you introduce a new level of variable scope using the VAR keyword, you need to close out that level of variable scope with a RETURN statement, and I'm going to be returning that variable. So we now have a calendar table with a single column that starts on the 1st of January 2016 and goes all the way down to today, which happens to be the 4th of April here in New Zealand. I know it's probably the 3rd of April for most of you. OK, now let's add some columns.
I'm going to create a new variable called AddYears. Now I'm going to use the ADDCOLUMNS function to add a column to my BaseTable variable. The column I'm going to add is going to be called Year.
And the DAX expression I'm going to use is going to suck out the year value from the date column in the base table. And when I click OK, nothing changes. The reason why nothing is changing here is because we're still returning the base table. So let's return the content of the AddYears variable now. Click OK.
And we can see that we've added a column. Cool. I'm going to paste code in here so you don't have to suffer my typing. Now hopefully this is visible and not too small.
Usually on a projector I have to zoom in here. But what I have done here is added another variable called AddMonths. You can call it whatever you like, so long as you don't use any other DAX keyword as your variable name, and you can't use spaces like you can with column names and table names; even if you use square brackets around the variable names, that won't help. But the variable names aren't exposed anywhere apart from in the DAX calculation that you're working on. So what this is going to do is add two columns, in this case to the AddYears variable: a Month ID and a Month column. And if I bring that down to the RETURN statement and click the OK tick, that's going to add two columns here. Two things you can't do in DAX are the Sort By and Hide features, so if I highlight this column and say I would like to sort it by Month ID, and hide this just to tidy up my report, they're two things you still need the UI for. What I'm going to do now is add two more columns.
One of them is called AddDay, which is using the same pattern of the ADDCOLUMNS DAX function: it's going to add columns to the previous variable, with the name of the column and then a DAX expression which is going to generate the value for the cell. And we have a start of week here as well; this shows a nice, tidy way that you can easily block your days into groups of seven. And if we finally return the last variable and click OK, here we go. Here's a nice, simple date table.
Now, I can copy and paste this and save this code off to somewhere central in your source control system so that when you go and start a brand-new DAX model, you can just paste this in. Alternatively, you could have pretty much the same sort of logic set up in Power Query, or if you're lucky enough to have a data warehouse that already has calendar tables that are optimized and configured for your organization, then they're good too. Probably the main thing I wanted to show you was the DAX pattern of using variables as a nice tidy way to break up what otherwise might be quite a large, complex DAX statement. I could have generated all this in a single statement in one big long line as well.
But I think breaking it out using variables allows me to easily maintain, see what's going on. And plus you can throw in comments as well. So here's one way you can create a date table.
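Pieced together, the date table built up over the last few steps might look roughly like this (a sketch; the month and start-of-week expressions are plausible stand-ins for what appears on screen):

```dax
Dates =
VAR BaseTable = CALENDAR ( DATE ( 2016, 1, 1 ), TODAY () )
VAR AddYears = ADDCOLUMNS ( BaseTable, "Year", YEAR ( [Date] ) )
VAR AddMonths =
    ADDCOLUMNS (
        AddYears,
        "Month ID", MONTH ( [Date] ),
        "Month", FORMAT ( [Date], "MMM" )
    )
VAR AddDays =
    ADDCOLUMNS (
        AddMonths,
        "Day", DAY ( [Date] ),
        -- Block days into groups of seven.
        "Start of Week", [Date] - WEEKDAY ( [Date] ) + 1
    )
RETURN
    AddDays
```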
A nice alternative way to create a date table, I'll just throw this in as dates alternative, also uses variables. So we see here a variable of base table establishing the same single column set of rows. But we use the generate function, which is kind of like a...
cross join, Cartesian join, inner join style function to allow two tables to join together. We'll have a look at this a little more detail later in the presentation. But here we can see we have some DAX expressions being assigned to variables and we're returning the row.
Year returns the BaseYear DAX expression, Month ID returns the BaseMonthID DAX expression, and when I click OK, we get exactly the same table, just using a different notation. I do prefer the readability of this alternative one, and like I said, you can extend this to customize your own start of week or anything that's useful to your own organization. So we have three tables in our model.
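A sketch of the alternative pattern using GENERATE and ROW (variable names are illustrative):

```dax
Dates Alternative =
VAR BaseTable = CALENDAR ( DATE ( 2016, 1, 1 ), TODAY () )
RETURN
    GENERATE (
        BaseTable,
        -- For each row of BaseTable, evaluate the expressions once
        -- and join the resulting one-row table back on.
        VAR BaseDate = [Date]
        VAR BaseYear = YEAR ( BaseDate )
        VAR BaseMonthID = MONTH ( BaseDate )
        RETURN
            ROW (
                "Year", BaseYear,
                "Month ID", BaseMonthID,
                "Month", FORMAT ( BaseDate, "MMM" )
            )
    )
```

The readability benefit is that each new column reads as a name paired with a variable in the ROW call, with the DAX expressions kept separate in the variables above.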
We have our dates alternative which we will park, we have a date table and we have a numbers table. So the next table I'd like to generate is a sales table and I'm going to combine my numbers and dates table to generate what hopefully will be a semi-realistic looking sales table with product IDs and quantities and prices and totals. So let's jump into that.
Jumping back to our modeling window, we're going to create a new table, and we're going to call this Sales. VAR FirstTable equals: what I'm going to do is GENERATE dates with numbers. Now, remember, we've created two tables already in our model.
They are called Dates and Numbers. The GENERATE function is going to smash them together in a Cartesian way, and we're going to get every row from Dates combined with every row from Numbers. So let's see what that looks like: return FirstTable, OK, here we go. So when I sort by date, the 1st of January, we see every single column from the Dates table and every single column from the Numbers table, which happens to be one. If I scroll down, we will see the Numbers table hit 100, and then we'll flick over to the 2nd of January. So what we have is a table with 82,500 rows, which reflects that we have 825 days between the 1st of January and today, and that's being multiplied out by 100 rows from the Numbers table. But I don't want exactly the same number of sales for every single day.
I want a mix. I want a randomized mix of... sales per day.
So I'm going to filter my table, and I'm going to use as a filter condition that the value of N, which is our last column, is less than a random number: RANDBETWEEN, say, two and seven. What this is going to do, if I sort by the 1st of January, is for every single day return a random number of rows; I'm always going to have at least two rows, but I'm never going to have more than seven rows.
So we see here for the 1st of January, we have four rows. For the 2nd of January, we have three rows. For the 3rd of January, we have two.
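So far the sales table sketch looks something like this (whether the comparison on screen was `<` or `<=` affects the exact row counts, so treat the operator as an approximation):

```dax
Sales =
VAR FirstTable = GENERATE ( Dates, Numbers )
-- Keep a random number of rows per day; RANDBETWEEN is evaluated
-- per row, so the number of surviving rows per date varies.
VAR Filtered = FILTER ( FirstTable, [N] <= RANDBETWEEN ( 2, 7 ) )
RETURN
    Filtered
```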
So it's kind of mixed up, not massively, but I could easily tweak this by simply changing the parameters of the two and the seven here if I wanted to create more. Equally, I could increase the number of values in my Numbers table, so if I really wanted to quickly generate two million rows, so I could test a DAX calculation on production-like data, then this is a useful and quick way to generate data. Now, I don't want to have all of these columns in my table, I just want to grab the date, so I'm going to use the SELECTCOLUMNS function that I used earlier. But in this case it's not designed to just simply do a rename; what I'm doing here is picking and choosing the columns that I want to bring back to my core sales table. So I want to suck out the... Hey Phil, I have two requests, from Sot and Safat: is there a zoom? That's actually in that status bar; if you do a Control-plus... It doesn't work here, does it? It may do. I don't think it does. Go ahead and do a Control-plus for me. No... yeah, it doesn't work. So I have been listening to your comments. Oh, it is working. So you actually have Zoom installed.
Okay, because it's not a native function. Okay, go ahead. Sorry. Yeah, I do this in presentations when it's on a big screen and there are people down the back of the room that are squinting at the board.
And we have people in, I think it's Greece and Turkey, who are squinting. So if you could zoom in once in a while, that'd be great. Okay, but just as a reminder, I will be making this PBIX file and my notes available
in links so that you can download this and study it in more detail as well. But I will use the zoom. So that's fantastic feedback.
Thanks, Chuck. Now we have a basis of our sales table. We have a random number of days which we can tweak. I'll stick with the values that I put in.
And now what I'm going to start doing is adding columns to our sales table. So the next column I'm going to add, and I can zoom in on this, is going to be a product column.
We're going to use the same pattern that we've been using for the date table. So I'm going to add a column to my variable. I'm going to, in this case, add a column called product. Now here's what I want to sort of focus on for this excerpt.
What I'm doing here is I'm using the var keyword again. Now what this is doing at this point is instigating a new level of variable scope. So we now have what we call a nested variable.
So I've created a variable called Make, and I'm going to assign it a random number between 65 and 90. I'm going to create another variable called MyModel, which is going to have a random number between 1 and 5. And then finally, I'm going to smash these two variables together to generate a product code. So let's have a look at what that looks like, and then I'll explain a little more about what it's doing. Okay.
So you can see I'm creating sort of realistic-looking product codes: triple U, hyphen 1; triple L, hyphen 5; triple T, hyphen... The triple nature of the letters is because the REPT function is creating three of my Make. Now, my Make is just a number between 65 and 90, and those are the ASCII codes for the uppercase letters A through Z.
So what we're generating in this case is probably a number quite late in the range, like this 85, and we're using the UNICHAR function to convert that back to its character. Now, that's a fantastic function that you can use for things like KPIs and emojis and widgets and thumbs up and thumbs down. It's a great one to explore.
And we're simply concatenating with the MyModel, which is generated here, so we'll only ever have a number between one and five, and we're using the standard Microsoft format string syntax, which places the number into the hash. So that's how I'm generating that. But probably the most important aspect of this function is that I'm using nested variables. Nested variables can reference variables that were declared and used at a higher level, but you can't do that the other way around. And I am using the RETURN statement here to close out this scope. I'm going to use a nested variable in my next statement, and I'll show you that the IntelliSense is smart enough to know that I can't use Make or MyModel. So let's do that. Let's add a quantity column.
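The product column described here might be sketched like this (the exact format string is an assumption):

```dax
VAR AddProduct =
    ADDCOLUMNS (
        Filtered,
        "Product",
            -- Nested scope: Make and MyModel only exist inside
            -- this column expression.
            VAR Make = RANDBETWEEN ( 65, 90 )   -- ASCII codes for A-Z
            VAR MyModel = RANDBETWEEN ( 1, 5 )
            RETURN
                -- e.g. "UUU-1": the letter repeated three times,
                -- a hyphen, then the model number.
                REPT ( UNICHAR ( Make ), 3 ) & "-" & FORMAT ( MyModel, "#" )
    )
```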
Oh, let's actually... yep, we've done that. So the AddQuantity statement is a little easier to follow. Run it and let's zoom in. Okay, so we're using the pattern of adding columns to the variable, in my case the variable that I just created. I'm going to call the column Quantity, and here I'm instigating a new level of variable scope with this VAR statement, and that VAR statement has a RETURN. If I try to access my Make and MyModel variables from earlier, VAR x equals... it just doesn't exist. Whereas if I had tried this higher up, it works.
That one's closed. So coming back down. Now what I'm trying to do here is for my quantity, I again want to make the data semi-realistic.
So I don't want to have the same number of single purchases as double purchases as triple purchases. So I'm using a logarithmic function to try and tail off the number of purchases made. Now, this might be a little easier to show if I visualize it using a great tool called Power BI.
So let's have a look to see if this logarithmic function has allowed me to distribute the values of one to five in this column. So if we zoom back out, jump to Power BI, go to the report canvas view, and have a look at the field that I've just created: Quantity into a bar chart, and Quantity on the axis. So we can see here that the number of single purchases was almost double those of double purchases, which again is higher as we tail off down to the fives, so that works nicely. But you can see how we're using DAX expressions inside the RETURN statement, inside nested variables, to generate data on the fly that we need, and it's very quick, because DAX, as you know, is a very powerful and fast engine. So finally, let's close out our sales table with a price and a total column.
We throw these in. So we're simply adding a Price column, which is going to take a random number between 1 and 10,000.
It's going to divide that value by 100 so that we can have some dollars and cents. And then we're going to add a total column, which is using an expression based on columns added in the same DAX statement. So price and quantity as columns only existed a couple of lines up, but this shows that you can access those and use them inside DAX expressions.
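The remaining columns might be sketched as follows. The quantity expression here is a stand-in for the logarithmic formula on screen, but it produces the same kind of tailing-off distribution (ones roughly twice as common as twos, and so on), and the price and total columns follow the description directly:

```dax
VAR AddQuantity =
    ADDCOLUMNS (
        AddProduct,
        "Quantity",
            -- 0 - LOG ( x, 2 ) of a uniform random x in (0, 1] is
            -- exponentially distributed, so after INT + 1 the value
            -- is 1 about half the time, 2 a quarter of the time, etc.
            VAR Q = INT ( 0 - LOG ( 1 - RAND (), 2 ) ) + 1
            RETURN
                MIN ( Q, 5 )   -- cap quantities at 5
    )
VAR AddPrice =
    -- Random dollars-and-cents price between 0.01 and 100.00.
    ADDCOLUMNS ( AddQuantity, "Price", RANDBETWEEN ( 1, 10000 ) / 100 )
VAR AddTotal =
    -- [Price] and [Quantity] were added earlier in this same
    -- statement, but can still be referenced here.
    ADDCOLUMNS ( AddPrice, "Total", [Price] * [Quantity] )
RETURN
    AddTotal
```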
So let's have a look to see what that now looks like as our sales table. There we go. So hopefully we might have some double sales here. If I sort by something else, there we go.
So we sold three here for a price of this with a total. Oh, that's quite a nice total. So that's pretty cool. Let's tidy up our model and create some relationships. Sales to date.
And we can hide the numbers table just to be tidy. Wonderful. Okay, so the next thing I'd like to do is imagine our sales table is very, very large, 10 million, 100 million rows. A trap I often see people fall into is they bring in a very large fact table into DAX, which is fine. But what they then do is they create visuals sitting on top of that very large table.
And often the granularity of the visual is much higher than the lowest level of detail in the fact table. So... So what does that mean?
Let's say we have a sales table with lots and lots of sales data that even goes down to the minute of the day, and it's very, very large, but in the report canvas someone has created a line chart that only goes down to the day at best, or perhaps even the month. A good technique to make the report engine run faster is to create a summary version of your sales table that's aggregated to the lowest level of grain needed for the visuals on the canvas. What you'll find is that DAX will obviously have a lot less work to do, and your report user experience will be a lot nicer. So our sales table has, how many rows? 2,800. That's pretty fast anyway, but let's make a summary version of this table down to day, which in our case is going to take as many as seven rows down to one. We're going to go to New Table, and I'm going to call this Daily Summary.
I'll paste this in. I need to... why is this complaining? I'm going to just scratch that and start again. New Table, paste this in.
Okay, Daily Summary is already in our table, so I'll just fix this typo. Columns... Sales... days... I see what the problem is: there's an S. Wonderful. So we're using the DAX function called SUMMARIZECOLUMNS.
There are three aggregation functions in DAX: SUMMARIZE, SUMMARIZECOLUMNS, and GROUPBY. They each have slightly different use cases where they will be the best of the three, and I do detail that in my book. Generally, for the most common scenarios, they're fairly interchangeable, particularly on smaller row counts. My go-to and my preferred one is SUMMARIZECOLUMNS. It will, in most cases, be the fastest and easiest one to use.
So what this is going to do is summarize our sales table down to one row per day, and we're creating two columns, Daily Items Sold and Daily Revenue, and here's the DAX expression that's going to be used. So that's great. If I go back to my relationship view and relate this to my date table, what that now means is that if I was to create a sum of daily revenue using my new table versus the original table, it's going to be a lot faster. Remember, every single dot on a line chart or bar chart has to execute its own distinct DAX calculation.
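The summary table might be written like this (column names as described; a sketch rather than the exact code on screen):

```dax
Daily Summary =
SUMMARIZECOLUMNS (
    Sales[Date],
    "Daily Items Sold", SUM ( Sales[Quantity] ),
    "Daily Revenue", SUM ( Sales[Total] )
)
```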
So the less work it has to do, the faster your report will run. So let's go Daily Revenue. Cool. Now, if I copy this visual down and, rather than the Daily Revenue from Daily Summary, use the revenue from my sales table, these charts, the top and the bottom one, will be identical.
However, if we're talking about a sales table or a fact table with a large number of rows, the one at the bottom, while it returns exactly the same result, will have to work a lot harder to generate it. So this is why, when you use a summary table, it's going to be a better experience for your users. And I believe there might be some things on the radar, which we saw in a roadmap release about a week ago from Microsoft, that are going to make this a little easier to do. I'm really excited to see a lot more detail about this.
Hopefully we'll find out more at the Data Insights Summit. Okay, so now that I've created a calculated table on a calculated table, let's add a calculated measure to this, just to show that you can.
So in my Daily Summary table, I'm going to add a measure, which is going to be a rolling average. I'm not going to go into detail about the pattern here; this is the standard DAX pattern that you are likely to use, but it's running a rolling average over the new summary table that I just created, not the original sales table. Click OK, and let's bring that rolling average into the visual. And if I put a slicer over the top of this... well, let's make it a line chart so we can see the smooth line. So the smooth dark line is the measure that we just created.
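One common shape for such a rolling-average measure (the 30-day window here is an assumption, not necessarily what appears on screen):

```dax
Rolling Average =
VAR DaysToAverage = 30
VAR LastDate = MAX ( 'Daily Summary'[Date] )
RETURN
    CALCULATE (
        AVERAGE ( 'Daily Summary'[Daily Revenue] ),
        FILTER (
            ALL ( 'Daily Summary'[Date] ),
            'Daily Summary'[Date] > LastDate - DaysToAverage
                && 'Daily Summary'[Date] <= LastDate
        )
    )
```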
So that's showing the calculated measure on a calculated table, on a calculated table that combines our dates and sales calculated table. Okay, so one of the things I always like to address in my DAX is the common question of when should I create a calculation as a calculated column? And when should I create a calculation as a calculated measure?
For a lot of things you do, they almost look the same, and there is an awful lot of overlap. So the way I like to explain the difference between a calculated column and a calculated measure is that the calculations will execute at different points in time. So with a calculated...
I'll get it up real quick. Yeah. Pestovey Andre, I'm sorry if I just destroyed your first name, Andre: if there are multiple date fields in the order table, such as order date, payment date, and shipping date, how do you use the same date table for all of the date fields? Okay, what you can do is take advantage of relationships.
Now, you can create as many relationships between your fact table and your date table as you like. So if you do have three date columns in your fact table, such as order, ship, and payment dates, you can create three relationships to your date table. You can only specify one of those to be the default, or in the DAX terminology, active.
Now, the one you specify as being active means you don't have to write any code; all your DAX calculations that reference columns in both tables will automatically assume that's the relationship you want to use. You can control which one is active, but you can only have one at a time. If you want to use one of the other relationships, you have to write that relationship into your DAX calculation.
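The function Phil goes on to name is USERELATIONSHIP. A sketch, assuming hypothetical Sales[Order Date] (active) and Sales[Ship Date] (inactive) relationships to Dates[Date]:

```dax
Revenue by Ship Date =
CALCULATE (
    SUM ( Sales[Total] ),
    -- Activate the inactive relationship for this calculation only.
    USERELATIONSHIP ( Sales[Ship Date], Dates[Date] )
)
```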
And there's a function called USERELATIONSHIP which allows you to do that. I'll add that to my notes and update that. But that's the mechanism that allows you to have a single date table connected to your fact table multiple times. So yes, it's possible, but you do need to hard-code a USERELATIONSHIP function into your calculation. It works well; it will work for you. Thank you very much, that was a great question too, by the way. Yeah, I hope that has made sense.

So, calculated columns versus calculated measures: it depends on the point in time that they execute, because they execute at different points. A calculated column will only ever execute at the point in time the data is refreshed or loaded into your model. It executes once, the output of the calculation is put into the column, and then that column is written to your data footprint. It's as if you got that column in that format from the source system you were importing from. Now, the calculated column code does not execute again, so the values in each row of that column cannot be changed by anything. Remember that DAX has no concept of update, insert, or delete. So they're locked in stone.
Filters can be applied to limit which rows are visible or which rows might be used by calculations, but they cannot change the values inside a column generated by a calculated column. I'm saying lots of C words here; hopefully I don't trip up over myself.
Calculated measures, on the other hand: when you write your calculation and save it, it doesn't execute. It's sitting there, dormant; it's not going to do anything. The only time a calculated measure will actually execute is when you drag it onto the report canvas and use it in a visual. So if you have a line chart with 10 points, your calculated measure will execute 10 times when you drag it onto the canvas. If you have a bar chart with a hundred values, a hundred columns, your calculated measure will execute 100 times, over and over and over again, each in the filter context of the axis where it's sitting. Now, this is a practical session, so we're not going to be talking much about the theory of filter context, but when calculated measures execute, the result is used to generate the value for the point. It's not stored in the physical model, so it makes no impact whatsoever on your data footprint size. But you have to understand that every time you make a data selection or a filter change, every data point on your visual will re-execute that calculated measure. It's busy recalculating over and over and over again. So what you need to do is make sure that any code you put in a calculated measure is fast, snappy, and really efficient. Calculated columns, on the other hand, you can afford to let take maybe 10 or 15 or 20 seconds to execute, because the chances are, if you have a model that's only importing data once a day and it's happening at two in the morning, it doesn't matter.

Okay, so let's have a look at this in practice, because the session is called Practical DAX after all. What I'm going to do is show you two very, very similar calculations that do exactly the same thing: one is going to be a calculated column, and one is going to be a calculated measure. So let's have a look at this. What I'm going to do to my summary table is add a cumulative value.
I'm going to have a column that shows the calculated revenue accumulating, and the other will be a measure. And you'll notice that the pattern is very similar. Both start with a CALCULATE function. Both use exactly the same DAX expression, which is to sum a revenue column.
We're going to be using FILTER to work out the cumulative nature of this calculation. So let's add these calculations. First of all, I'm going to take the calculated revenue column; over here, a new column, paste that in, and this is going to be the daily revenue, and dates, and dates, and go okay. And just to make sure, jump to my data view: Daily Summary now has a new column that is accumulating by the amount of daily revenue, so that's good. Now, that executed at that point and was stored; that calculation will not execute again. Let's go and grab my calculated measure version. I'm going to add a measure to the same table, add this guy in, okay, fix my pluralizing, okay. Remember, this code is not going to execute yet; it will not execute until I use it in my report canvas. So let's add that to our report canvas. I'm going to my date table, I'm going to create a table visual in this case, and I'm going to add my calculated... let's turn this into days... Calculated Revenue as Column and Calculated Revenue as Measure. And if we zoom in, what we'll see is that these are producing exactly the same values.
If I scroll all the way to the end, which I won't do because there's 800 days there, they are exactly the same value. So in this particular case, there's no difference. And if this is what I was trying to achieve, I would be far better to create this as a column rather than a measure.
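The two cumulative calculations follow the classic patterns below (a sketch; not necessarily the exact code on screen):

```dax
-- Calculated column: executes once, at refresh. EARLIER refers to
-- the date of the row the column value is being computed for.
Calculated Revenue as Column =
CALCULATE (
    SUM ( 'Daily Summary'[Daily Revenue] ),
    FILTER (
        ALL ( 'Daily Summary' ),
        'Daily Summary'[Date] <= EARLIER ( 'Daily Summary'[Date] )
    )
)

-- Calculated measure: executes per data point, inside whatever
-- filter context the visual and slicers provide.
Calculated Revenue as Measure =
CALCULATE (
    SUM ( 'Daily Summary'[Daily Revenue] ),
    FILTER (
        ALL ( 'Daily Summary'[Date] ),
        'Daily Summary'[Date] <= MAX ( 'Daily Summary'[Date] )
    )
)
```

Because the measure's FILTER uses ALL on only the Date column, a year slicer still restricts the rows it can see, which is the behaviour the demo goes on to show.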
So what's the difference between the two? Okay, let's copy this, turn it into a line chart. And you see the lines are sitting on top of each other. We have the two items there. If I throw a filter onto the canvas, and let's filter by year.
Turn this into a list of years. If I click 2017, what's happening here? For the calculated revenue as column, the filter is just simply throwing out some rows, and we're just showing on the screen the values that were calculated five minutes ago.
It's not thinking; all it's doing is retrieving some data and sticking it on the screen, so it's very quick. However, the measure is recalculating, but in the context of the filters and the rows it's now allowed to see. The reason why the numbers are now different is because, for the measure, the value for the 1st of January 2017 is all it's got to work with; it can't see any of the rows of 2016, so it has to start from zero. But it is calculating: it is taking however long it takes to execute that calculation for every single cell in this column. Now what I'm going to do is show you some quite useful tools to help optimize and see what's going on under the covers, and we're going to have a look at the difference in speed between these two calculations. A tool that I really, really recommend is DAX Studio, and I've got an icon here on my desktop; we're going to fire this up. If you haven't used this already, I highly recommend it. It's a free tool, optimized for helping you write better DAX. Actually, there's a couple more questions. I was actually answering a question for Hero about splitting columns, nothing to do with what you're talking about.
And Sunil is asking, why is the grand total changing? So I actually wasn't looking at your screen. I don't know what he's talking about. Oh, the very bottom?
Yeah. So the total line? Yes.
So the reason why they are different is because they are calculations in their own right. The grand total in the middle column is actually summing, I think, every single value in here, because this is what's known as an implicit calculation. We're not just simply placing the value from the column on the screen; DAX is creating an implicit sum of the values in the rows.
So the very large number here is just a sum of every single value above it, which is a static value. Whereas over here, this calculation is a calculation in its own right, and it should probably match the very last total there, because this is doing what the DAX calculation asked: it's doing a sum here with respect to these filters. But these filters are taking on the explicit filter coming from the slicer. Again, I do cover this in quite a lot of detail in my chapter on filters in my book, but that's my quick-fire answer; I could probably spend an hour just talking about it. I think you got it.
Rathbardo actually answered it in situ as well. I think we're there. Thank you. Fantastic. Excellent.
Let's jump back to DAX Studio. If you've ever used a query IDE, this should be quite familiar to you. One thing I really like about DAX Studio is that you can connect directly to your data model, in this case my Power BI Desktop file.
I'm going to do that by clicking here in my connection dialog. I'm going to connect. So I am in fact connected to my Power BI desktop file.
I can write DAX against this. Evaluate. I want to evaluate my dates table. And that's going to, when I run this, that's going to return my dates table.
I can filter my dates table to only show me dates where the year equals 2016. Run that. And this is a great place to test and build your DAX calculations, because once you have it right in here, you can paste it back into your Power BI window.
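The queries being typed into DAX Studio are along these lines; the 'Dates' table and [Year] column names are assumptions:

```dax
-- Return the whole dates table:
EVALUATE 'Dates'

-- Return only the 2016 rows:
EVALUATE
FILTER ( 'Dates', 'Dates'[Year] = 2016 )
```

Every DAX query starts with EVALUATE followed by a table expression, which is why a bare table name or a FILTER over one both work here.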
And there's some neat features like just simply formatting your query. This will tidy your very, very long strings into nice even breaks. It'll introduce line breaks, et cetera. You can output the results of the queries to files. If you want to just do one-off ad hoc extracts of your data model, that might be a useful thing to do.
You can clear cache when you run. But the features that I really quite like DAX Studio for are understanding what's going on under the covers. So let's have a look.
at the two calculations that we just created, the cumulative revenue by column and by measure. I'll just grab those as DAX calculations and paste them in, and you'll see what I mean. So let's evaluate this; paste this into this window. What this is going to do, when I clear the cache and hit the run button, is return to the screen exactly the same table that we just generated in Power BI Desktop. So we have a column of dates, and we're showing our calculated revenue as column and calculated revenue as measure. But let's see how each one performs. If I comment out the second column and rerun, we're only querying the calculated revenue as column. Let's turn on our server timings and our query plan. I'm going to run this now and, not focusing on the output, I'm going to jump to the server timings. If we zero in down here in this corner, what I'm really interested in is the output of this section of the result panel.
And it's telling me that it took 10 milliseconds to satisfy that query. And of that 10 milliseconds, it spent nine milliseconds in the formula engine and one millisecond in the storage engine, SE for storage engine, FE for formula engine. Now, If you want your DAX queries to run quickly, you want them to be doing as much work as they can in the storage engine.
And all the storage engine does is simply go to the memory and retrieve the data and present it. It doesn't do anything. Whereas the formula engine actually performs calculations and does your number crunching. So let's have a look at how the other calculation, the measure, stacks up. So we comment out our calculated column.
and show the calculated measure, clear the cache, and run. We're going to get the same result set, but when we look at the server timings we'll see some quite different numbers here. It wasn't 10 milliseconds; it was 220 milliseconds. We spent a large amount of that time, 98 percent, in the formula engine, and that's because, to generate this data, it had to be doing this calculation over and over and over again.
How many rows were there returned? I think it was 816 from memory. It's just going to tell us 825 rows. So our calculated measure actually executed 825 times, each in a slightly different context because it had to consider the day that was being passed into it and the total amount of time it took.
to satisfy those 825 calculations is 217 milliseconds. Now, that's not bad, but compared to 10 milliseconds, I know which I would prefer. So, getting back to the original question: when should I use a calculated measure, and when should I use a calculated column? We know there's lots of overlap, but if you're bringing data into your model that you don't want to have changed by the filters you're going to present to your user, then by far a calculated column is the better one to use. However, if you want your numbers to be responsive, to be dynamic, to respect filters and selections, then yes, definitely use a calculated measure. But use tools like DAX Studio, where you can see and understand how much of the time the calculated measure is spending in the formula engine versus the storage engine; then you can rewrite it. You can often write a calculation in six different ways. Often there are new DAX functions coming out that on the surface look like they do the same as a previous DAX function, and it's not clear why we're getting a new function when we've had another one for years that seems to work fine, but the new function can satisfy the result by using the storage engine much more effectively and spend a lot less time in the formula engine. So this is a great tool to use to get into optimizing your calculations. Hopefully that makes sense and is useful. What you can also see, for the T-SQL gurus amongst us, in the execution plan are these pseudo T-SQL statements saying exactly what the calculation actually did. In this case it selects date and a sum of daily revenue; there's probably a GROUP BY somewhere along this statement. If I double-click on it... maybe not. Another great use of DAX Studio is being able to run DMVs against your model. Some useful DMVs that I like to run: perhaps object memory usage.
If I double click on this line here and run this as what looks like a T-SQL statement, this will spit out a result set that I can export to a file or import to a Power BI model that shows for every single object inside your model exactly how much memory it's using. So I have used this quite effectively on models where I will point it at a Power BI desktop file, and it will say, hey, you've got these columns over here that are using 90% of your memory. I'll double check to see whether those columns are being used anywhere, and if they're not, I'll be able to delete them safely and shrink the data model down.
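If you want to try the memory-usage DMV yourself, it can be run as a query in DAX Studio (or SSMS) against the connected model; this is the standard Analysis Services DMV, shown here as a plain SELECT:

```sql
-- Memory used by every object in the connected model, one row per object.
-- The result set can be exported to a file or loaded into another
-- Power BI model for analysis.
SELECT * FROM $SYSTEM.DISCOVER_OBJECT_MEMORY_USAGE
```

DMVs accept only a restricted SQL-like syntax (no joins, limited WHERE clauses), so a simple SELECT like this is usually the right shape.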
In an extreme case, I worked with an analyst who sucked in every table they thought they needed to use. And when they went File > Save As, their Power BI Desktop file was 800 megabytes. It was a very big model. I used this DMV and identified probably two or three identity columns; these were highly unique columns in very large tables. We knew we weren't going to be needing or using them for the report, so we deleted the columns, resaved the Power BI file, and it was 40 megs.
It was a massive difference. It was so much faster to use, so much faster to save, so much faster to publish and import data for. So I do highly recommend some of these DMVs as alternative ways to make your models faster.
Another one I like is the tabular model schema measures. When we run this DMV, this is going to output a result set that will show you the calculated measures in the table. We can see their names and we can see the formulas that we used as well. So this could be a useful calculation to help you build out your documentation. You can run multiple queries at a time.
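The measures DMV being described here is TMSCHEMA_MEASURES; a minimal query that pulls the name and DAX expression of every measure might look like this:

```sql
-- One row per calculated measure in the model,
-- handy as raw material for documentation.
SELECT [Name], [Expression]
FROM $SYSTEM.TMSCHEMA_MEASURES
```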
What else can you do? There's all sorts. DAX Studio is fantastic.
It's free and they update it regularly. So I highly recommend you download it if you haven't already got it. If you have got it, use it more. The other...
tool that I like to use when I'm optimizing is SQL Server Management Studio. If you're already using T-SQL or other SQL Server tools, you probably have this already. It is free, but it is an 800-megabyte download. Actually, on that topic, I'm going to interrupt just for a second. Sunil's asking, and you're actually going to cover that right now, you're on the cusp of it: calculated columns may take... oh, we just scrolled off the top.
Calculated columns may take a lot of time; how do you go and look at the performance and actually optimize that? Now, take it away, Phil. Okay, so you can use DAX Studio to suck out the calculations and run them in DAX Studio, and that'll give you the breakdowns of the query, the execution plan, and the timings, as we've just shown.
Or you can use SQL Server Profiler, which we'll jump into. So what I need to do is jump back to Dex Studio to get the port number that Power BI is listening on. And if we zoom down into the status bar, we can see that we're using 61776. If I close and reopen Power BI Desktop, it'll be on a different port. There are tricks that you can use to lock this down.
Let's grab that 61776. See how my memory is at this time: 61776. Now, SQL Server Management Studio is about 800 megs if you want to download it. It is free. And when you do fire it up, you want to connect to the Analysis Services server type. That's a fancy way of saying
my Power BI Desktop model, and you do need to put in the port number that's relevant for your instance. Let's connect to this, and we can actually see my model. I can even run MDX. Who here likes MDX? You all do? Fantastic. So this is running MDX against a Power BI model. Not useful for anything at all, except for the fact that... it's not returning anything, but that doesn't matter. The main thing I wanted to share with you about SQL Server Management Studio is a fantastic tool called Profiler. Now that I'm connected to my Power BI Desktop model, what I can do is hit Profiler. I can turn on a bunch of events, such as Query Begin; there are other ones I could turn on, but let's stick with that. This is clear. If I jump back to my Power BI model now and start doing some things, you'll see the events appearing here. And let's go to my calculated column and maybe make a slight change.
Let's just break this calculation by doing this. Run. There we go. So in the background, I have captured using the profiler a lot of detail about that calculation, including the DAX itself. So this is the DAX that Power BI runs against it.
And I can see some server timings and there are other events I can turn on to capture the inner steps that the calculation actually took. Now, this is really getting into an advanced topic. All I wanted to share and highlight.
was that it's available, and here's how you can get into that. And there are some fantastic blogs, particularly by SQLBI's Marco Russo and Alberto Ferrari, about how to interpret the execution plan and how you might use that to try and get your measures using more of the storage engine and less of the formula engine. So we have five minutes left; let's jump back to our model. Are there any other questions at this point? How are we going there, Chuck? We're actually doing really good. There's a couple of questions about the exam, 778. Sure. Have you taken that yet? I have not taken that yet. No, no.
I'm just too cheap. I know it's a fantastic... You know, it hasn't been updated, so it's going to be missing a ton of the latest information.
So I can tell the people that are asking: you probably don't need to study anything that's been introduced in the last eight months, because it hasn't been touched since then. I am aware that it's quite broad; it covers a lot of aspects of Power BI, including some Power Query syntax. So if you are used to just using Power Query through the UI... I mean, let's face it, you definitely have to know M, yes. But if you can pretty much do 98% of whatever you need using the buttons on the toolbar in Power Query, and that's all you do, then you probably need to brush up on your basic and really common M syntax. I agree. There will be some helpers. I think there are some sample tests and ways that you can brush yourself up before you do that. Devin Knight at Pragmatic Works has a huge blog post on it.
So, Kieran, check out Devin Knight's blog post. And he's one of our MVPs. Okay, go ahead. I don't want to steal your last nine minutes away from you. No, I'm always jealous when I look at the people who post on Twitter that, hey, I've passed it now.
So, I do need to get around to that. So, okay. So, how long have we got?
Five minutes or nine minutes? My clock is wrong, as we know. So probably whatever your clock is, we don't want to go late.
No, I don't want to go late. It's saying 57 minutes past. And the next thing I'm going to launch into is quite long.
I'll just share the screen of my notes, because this is what I'm going to post into the channel, probably the YouTube channel. So if I scroll up to the top, we can see all the calculations that I have used. I know this is a bit boring. But as we come down, I have got comments here.
Today we've managed to get up to here. These are the calculations. After here, I have got some quite neat DAX that talks about how you might use different SUMMARIZE approaches and techniques. Again using variables, we're going to look at using a natural inner join.
So these will be concepts more familiar to T-SQL people. Summarize, cross-join, alternative techniques for ranking, double grouping. So when you want to do an average on an average, that's often not as straightforward as you might expect using DAX.
And it's where the group by function comes into its own. So you can see that used here. Again, I try and explain what the calculation is going to do. I'm available. You can definitely contact me on Twitter, LinkedIn, email for questions.
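The double-grouping idea, an average of an average, can be sketched with GROUPBY and CURRENTGROUP; the table and column names here are assumptions:

```dax
-- Average the daily revenue within each month first, then average those
-- monthly averages. A plain AVERAGE over the raw daily rows would instead
-- weight months with more days more heavily.
Average of Monthly Averages =
AVERAGEX (
    GROUPBY (
        'Daily Summary',
        'Daily Summary'[Month],
        "MonthlyAvg",
        AVERAGEX ( CURRENTGROUP (), 'Daily Summary'[Daily Revenue] )
    ),
    [MonthlyAvg]
)
```

GROUPBY requires its extension-column expression to iterate CURRENTGROUP(), which is what makes it suited to this two-step aggregation where SUMMARIZE alone gets awkward.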
Obviously my book is available. Probably the last section of these notes, which I find really helpful, are showing different techniques of inner joins, some hacks around how to get inner join working when they're coming from two different lineage tables. And finally, a neat example about how you can pivot perhaps staff and days into a visual that you can present to see how many people we have active at any particular time.
So we're really showing how you might use DAX to pivot columns to rows or rows to columns. So I will make sure that that's available attached to this webinar. So unless there are any other questions, Chuck?
There aren't. We're actually doing really, really good. And I'm going to actually post in the last link for the exam.
And that's actually by one of my co-workers named Dusty, a good guy out of Florida. And we're actually going to have Phil close it. So thank you very much, Phil. No, no worries. No, it's been good fun.
Okay, we'll go ahead and stop the broadcast. Hang on a second for me, unless you've got the Google ones out. Okay, thank you.