Transcript for:
DAX Basics and Concepts

Okay, it's 10 o'clock, so I guess it's time to start. Welcome, everybody. I'm... Welcome, everybody. I heard... No, no, no, no, no. It was just I heard a noise. I wasn't asking for you to say welcome again. Remember, you are American, I'm not, so I'm not used to this kind of, yeah, welcome, I'm super excited of being here, that's not my style. Before we start, let me understand a bit of, one thing which is important and I was forgetting about it, I need to take a picture of the room. What's wrong with you? That's a Windows phone. Hopefully it worked. It's a beautiful room full of PC. Now, before we start, I need to understand just a bit of things. So a couple of questions about you. The first one is very easy. Have you ever used DAX? Who ever used DAX? Most of you. Did you author, let's say, complex measure or just some average and stuff like this? Do you author complex measures with DAX? Few of you. That's perfect. Okay. I don't need other questions. I don't need other information about you. We are going to deliver two sessions, which in reality is a single long session of four hours, which is basically an introduction. This morning we will see the basic stuff and you are expected To do some typing on the screen just to get used to the language And try several formulas. There is nothing complex. So don't expect to have tests. or something like it, just type in some DAX code to see the effect of what is happening. Whereas this afternoon is much more theory. So we will need to dive into the concepts of evaluation context, understand how they work. And so there is much less lab, much less things to write, but much more important concepts to understand. That basically means don't eat too much during lunch because it's very easy to start sleeping during a theory uh last question how many of you are bi consultants and how many of you are data analysts let's go for the bi consultants data analysts oh that's scary okay because i'm used to speak with bi consultants so i would probably need to manage things now let's get started who am i i'm alberto Ferrari. I'm coming from Italy, but I guess you hear from the accent. Coming from Italy. Mentally means a couple of things. The first one is that I'm very used to loud people. So if you have any kind of question, don't raise your hand and wait for me to give you permission to talk, because that is the perfect way to die for starvation. Just start speaking loud enough so that I can hear and interrupt me at any time. I'm totally used to that and I love interaction with people. So don't be shy. If you have any question, just ask. And the second one I forgot, so it will come sometime later. I'm a BI consultant. I founded with marcosqlbi.com. If you work or if you play with DAX, it's very likely that you will land to sqlbi.com to learn. because there we publish all the articles, papers, and thoughts that we have in mind with business intelligence. And then we have SAS, MIS, MDP, MCP, a lot of different names, but these are not important. One thing that is important is that Definite guide to dax because if you are serious about dax so if you want to learn something about dax that is the books to buy the book to buy it it's not an easy book it's pretty hard but it's a really necessary if you want to author complex tax code that is as of today i think it's the most comprehensive guide to the tax language that's it let's start speaking about dax That is the language of power people. Of analysis services, of Power BI, all the tools that implement the analysis services The analysis services inside it. The best definition that I ever heard About DAX was given from a student a few years ago in Ireland. At the end of my course, I asked Them, now, what do you think about DAX? And he answered, DAX is simple, but it's not easy. The concepts of DAX are simple. The ideas, the code, everything is straightforward. Let's say don't expect DAX to be an easy language. Say in other words, if you are totally new for DAX, Don't expect to learn DAX in four hours. That is simply impossible. What I expect to teach you in these four hours are where are the hard parts where you will need to spend time studying. So I, through the new buys, I hope to set a good background so you have an idea about what DAX is and to understand where you will need to study more. For the ones of you who already know DAX, well... My hope is just to give you a more solid, a more sound background about what evaluation contexts are and what is the right way of writing DAX code. It's a new programming, it's a new programming language, and it has a small problem. It's not a small problem, it's a characteristic of DAX. That is a... Typically, people learn DAX the same way you learn the Excel formula language or Visual Basic or SQL or any other programming language. That is, you do what everybody does. You go on Google, you look for some formula that resembles somewhat what you want to compute, and then you start using it. Good chances are that the formula does not totally compute what you want, and at that point, you need to modify it. With Excel, that is easy, and that's the normal way of learning a language. With DAX, this technique does not work. And I'm telling you because I did it at the beginning, and it took me a few months before I said, OK, I need to start again from scratch because I'm not going anywhere. DAX requires you to study theory. And I know that Excel people and data analysts don't like the theory of programming language and this stuff, but it's really, really needed. If you don't understand theory, you will end up with formulas that typically compute the correct value. And then for some special case, they compute something totally different, and you have no idea what is happening and what number is coming out. And you can guess people taking decisions based on numbers that you don't trust. It's not a good scenario where you want to live. So it's much better to spend some time, learn the theory. I will be very pedantic from time to time, hopefully from time to time. And when this happens, just... try to follow me don't think that you understood everything because as soon as we start mixing formulas and writing more complex code you will see that theory was important okay that said this is a typical dax piece of code can you read it from the bottom or should i zoom in it's good okay this is a typical expression in dax there's no need for you to understand what it does right now But it's just to show you that that is a functional language. Functional language means it's like the Excel formula language. Also Excel has a functional language, meaning that the full execution of code is just a single function code. In that case, you see there is a sumX, which is the first function code, and then sumX has two parameters. SumX has, that formula is wrong, sumX of a filter. Filter, it contains an error. That's totally stupid. It filters the values of the date year and then for each date year it checks if the date year is greater than 2000, then it computes the value, otherwise the value multiplied by 90. But the point is not understanding the formula, it's more looking at how it looks. It's just a single function code and then the parameters of the function code define the logic of the behavior. The execution index starts from the innermost. You go to the innermost parameter and that is the first part that is executed. Then this result is used by outer function until they reach the first one. The second point why this formula is important is because people coming from Excel typically write all their formulas into a single line. So everything is written in a single line, a text box that is nearly impossible to read. And they tend to do the same with DAX. It turns into a nightmare. It's nearly impossible to read a dax expression if it is not Properly formatted. Formatting is pretty simple. You just write the function name and then all the parameters Indented inside it. Otherwise if you don't follow Some formatting rules, using dax is nearly impossible. Now, dax is a programming language and all the programming have some data types. DAX is strongly typed. If you are coming from Excel, you probably are not used to that idea of defining a data type for every column, because in Excel you typically have your worksheet, and on that you put values into cells. Every cell can contain whatever you want. DAX does not work this way. DAX is based on tables. Tables is divided into columns, and each column has a strong data type. The data type can be a number of columns. or any other data type. Basically, there are only two important data types, which are numbers and everything else. Numbers are integer, decimal, currency, daytime, and Boolean values. They call them true, false values. I'm used to call them Boolean. And these are all considered numbers. Other data types, like strings or pictures, binary objects, whatever kind of data type you might have, these are other data types. The difference is somewhat important because there are functions that work only on numbers and there are functions that work on any data type. If a function works on any number, then it works on any of the first set of data types. So you can, for example, take the sum of a boolean, doesn't make much sense, but you can do that if you want, whereas of course you cannot take the sum of a string. Since the people that created DAX They basically took the Excel formula language and the SQL language and MDX. They shake everything together and then DAX came out from the recipe. There are functions that work only on numbers just because they wanted all functions to be compatible with Excel. So if you are familiar with Excel, you will be in a very cozy environment because it basically has the same set of functions and the same behavior. DAX has a somewhat strange type handling. Can you still read it? That seems really, really small. Can you read it from the bottom? Say yes. Say no. No. Of course no. I cannot read it from here, so I wonder how you could read it from there. I'm scared that this might happen more than once because they edited those slides after I sent them. Now it's much better. Okay. What does operator overloading mean? It's meaning that even if you have a lot of data, you have a lot of data. if DAX is strongly typed, you can write complex expressions mixing data types at your will. And conversion happens when needed. So you can have two columns, one column containing One column containing a string and if you want to sum them That permits you to do that. It tries to do conversion in an Automatic way. So for example, you can take Five and four that are strings. You see they are enclosed in Double quotes but because you use the plus operator, plus Transforms the two strings into numbers and returns you a number As a result. So the sum of two strings is a number, whereas the ampersand, which is the string concatenator concatenates to string and you can pass it pass to it to different numbers they are converted to strings and then concatenation happens the idea the idea of Microsoft is that conversion happens when needed and you don't need to worry about the data types of your columns this might be useful in the case where you don't know the data type of the columns in the data model you are working with now I have nothing against people that don't know the data type of color Of their data model. But if you are not used to data Types and if you are not used to know your data model, then There are a lot of jobs which are interesting, but doing data Analysis and working with the data model is probably not the Best for you. What you should do is know Your data model, know exactly the data type of each column And never, ever trust automatic conversion. If some kind of conversion needs to happen, You know it and you do and you handle it by yourself. Don't trust the engine because that means you are introducing Errors in your code that might happen because of conversion Failing. And errors will be your worst Enemy, especially speaking about performance. The presence of errors in your code is going to kill the Performance of any dax measure. So you basically don't want Them to happen. So I had to show the slide and now that you know the behavior you can forget about it and never use it in your life. I told you this was going to happen again and again. You know the feeling when you prepare a Zydeck and you find it totally modified by somebody? It's not nice. Okay. Among the many data types, there is one which is worth a bit more attention, which is the daytime. are handled in DAX, I think, the same way they are handled in Excel. I'm not an expert at Excel at all. So, day time are stored as a floating point. It's just a number that contains an integer part and then a decimal part. The integer part is the number of days after the 30 of December 1899. The slide is right. It's not the 31 of December, it's the 30 of December. So, 1 means the 31 of December of the previous century. I honestly have no idea why the team decided to use zero for the 30th of December, but that's it. Whereas the decimal part is the fraction of a day. So if you want to represent 12 hours, you divide 1 by 2, you write 0.5. That is 12 hours in the DAX language. And you can trust this behavior. So you can basically create a... Create a measure, create a calculated column that sums and subtracts numbers from dates, and that is going to work in a straightforward way. If you want to remove the fractional part of a date, then you simply remove the decimal part from a number, and as a result, you get the date without the time. Keep in mind that, again, speaking about date times, any calculation on date, like sum, subtraction, or summing dates, is expected to work after the first of March of 1900, so of the previous century. Any calculation that happens before might lead to errors and mistakes. You have an idea why they introduced this strange behavior? No is the totally legal answer. Many of you are much younger than me, unfortunately, for me, but... In the 80s there was a software that was called Lotus 1-2-3. It was a very popular spreadsheet. And the guys who wrote Lotus 1-2-3 introduced a bug. They thought 1900 was a leap year, whereas 1900 is not a leap year because 19 cannot be divided by 4. So centuries are leap years only 1 out of 4. The other ones are not leap years. That was a bug. They didn't know it, and so they introduced the problem. When the first version of Excel came on the market, they said, oh, we need to be compatible with the competitors. So we are going to introduce the same bug. And so Excel believes that 1900 is a leap year. And since the first version of Excel, that bug was deliberately introduced in all the versions of Excel. And now we have it in DAX, in Power BI, and all the stuff because we are compatible with a software that does no longer exist in any place on the planet. and i believe that at some point somebody will say okay let's say that 1900 is a leap year let's change the rules because we need to be compatible with excel now let's move to the demo machine instead of looking at the slide let's move to the demo machine Keyword. Okay. I will basically do some demos and write some code. If you want, if you think it's useful, you can replicate Everything on your pc. I will try to not run too fast. If you prefer just to look at the screen and follow the Lecture, that is fine for me, too. You're not expected to follow a lab in a very specific way. So do nearly whatever you want. But what i wanted to show you How to create, for example, a column. In order to do that, you should have on your desktop probably a Folder that contains only one pbix file, that is a power bi Desktop file, that you can use. You can simply open it. Where is it? Open it. And we will run all the demos and all the code only on top of This data model. This is a very simple data model. Oh, I forgot to ask a question. Who has never used Power BI? Only a few of them. I'm not going to explain to you what Power BI is and how to open Power BI Desktop, so just try to follow what I will do, just because all the remaining people would be worried if I start explaining all these details. So I will simply use Power BI, expecting that you know where to go and what to do. We will use Power BI for all the DAX code. Model contains date, customer, sales, store, product, subcategory, and category. So it contains many, many tables with a very simple set of relationships. The only complex one is the one starting from sales and going to product, and then subcategory and category. And the data set is coming from Contoso that sells good in stores and on the internet. It does the same job that before was done by Adventure Works, but Adventure Works was selling bikes and this is selling electronics and other stuff. So it's a very standard sales data model. Now, you can easily create new columns by writing a new column either in the data view, so when you see all the data. or stride in the report view. I typically do that in the data view because so I can immediately see the result of the column that I compute instead of creating the column and then using in the report. I prefer to write the code stride here. And for example, you can create a calculated column here in the sales by creating a new column. You need to provide a name and for example, let's say use Name sales amount where sales amount you can click on unit Cost and it provides sales unit cost. It's probably easier to zoom Both for you and me. Because getting older, I'm no longer able to read small scripts, small things. Sales unit cost, and then you typically, not unit cost, but sales unit price. And then you multiply by sales. quantity this is going to create a calculated column and the code that you write in your formula says oh I want to multiply unit price by the quantity the general syntax the complete syntax of referencing a column is slightly more complex than this. You should enclose into single quotes the table name and then into square brackets the column name. So the standard syntax is... Single quotes and then the table and then square brackets and the column name. In reality, the single quotes can be avoided whenever your table is a good name, So it doesn't start with a number and it doesn't contain spaces. In that case, you can avoid the quotes, which are annoying when looking at the complex formula. And if you want, you can also avoid putting the table name And just write unit price multiplied by quantity this is legal too it simply multiplies unit price by quantity and if the table name is not included in the specification of the column it simply search searches in the current table for that column as it already happened this This is something that I tell you once and I ask you to forget immediately. Whenever you write a reference to a column, like in this case, please always, and I really mean always, put the table name in front of it. I can I cannot tell you the reason right now. It will come later today. But there is a very good reason For that. It's not dax. It's not any problem with the dax language. The formula is totally clear. The problem is huma. When i read an expression, i expect to find the table name In front of any column. And then if you hit enter, all Sales amount is already there. So we do sales amount without without space. And now you see that we created a column, a new column in the model, that is basically the quantity multiplied by the unit price and contains the sales amount. This is very simple. It's very Excel-like. typically all the users coming from Excel love calculated columns because they are simple they show the calculation in a very simple and easy way but what I wanted to tell you is that we are creating a calculated column calculated Calculated columns are computed at refresh time. So when you refresh your model, calculated columns are Computed. And this is fine because they Might include some very complex code that is executed only once At process time, at refresh time, and not at query time. But they have a drawback. That is, they use ram. They use space in your model. So if you create a calculated Column on a table that contains a few hundred thousand rows, That is not an issue at all because the table is very small and the model works. just fine but if you have a very large table with a few hundred millions or few billions rows and you create a calculated column there expect it to use a lot of ram and ram is expensive you don't want to waste ram just by creating calculated columns so use them with care the second point and it is the first pedantic moment of the day is this point You see that the sales amount value is computed for all the rows. And in order to get the value of sales quantity, the value which is highlighted there, I write sales, that is the table name, and then quantity, the column name. There is a small piece of information which is missing. So when I... Write sales quantity, and I want to transform it into a value that is a number. What is the information that is missing? Or try to think it in a different way. If I ask to anybody of you, can you please give me the value of sales quantity, what would you ask me? Which row? Sales quantity doesn't mean anything unless you have a current row. If you have a current row, then sales quantity has a different value for every row. Of course, I chose quantity that doesn't have different values, but if you take, for example, a unit cost or other values, every row has a different value. When you write a calculated column, you don't need to specify the current row, so the row from where you want the value to come, because this is somewhat implicit. it's It's a calculated column. It's computed row by row. So for every row, this will have a different value. As we will see later, this is not always the case. In case you want to get the value of a single row, you will always need to specify for which row you want the value to come from. Otherwise, that will return you an error. Okay? Oh, I'm not used to that. Let's go to the slides again. So the standard syntax is table name in quotes, then column name in square brackets. The square brackets and the column name are the only mandatory part, but get used to write the code in the right way. And we have... Just created a calculated column probably a different one than what is Here in the slide you can create a sales amount column by multiply sales amount Sales quantity by sales net price that creates a column and the column is then Visible for all the rows now calculated columns are basically simple ducks expression that compute values compute the value for the current row and whenever you write product price what you really mean is the value of the price column in the product table for the current row please remember that current row because it's going to be important later from later now calculated columns are easy to author they are easy to understand especially for excel users but they are not the only way of creating calculation index there is another kind of calculation which is known as measures let's see that with an example and again you can do that with me while i while i write the code now we create a calculated column that computes the sales amount imagine that we want you want to compute the gross margin so you want to show the gross margin in a report let's go to the demo machine please thanks now we have the sales amount and we have Net price. Imagine that we want to compute the gross margin. The gross margin can be computed as another calculated column. So I create a new column and I call it gross margin. Well, the gross margin is basically the sales amount. No, sorry. The gross margin is the net price, so sales net price minus the sales unit cost. And that is a good number. It says that for each of those products I have $4.9 of gain. And of course for different rows I will have different values. I can use the gross margin in a report. So I can go on the report where I have calendar year, I have the month, I have the quantity. And in the sales I can take for example the sales amount. and i can put the gross margin and this number have some meaning because i have the sales amount which is the amount sold and i have the gross margin how much i gained from the sales that i did imagine that now we want to do something slightly different i don't want to see the gross margin as a value, I want to see it as a percentage against the sales amount. So I don't want just a number, I want to show it as a percentage. I can follow the same technique. So I go... here on the gross mark on the sales table and i create a new column and this time we call it gross margin percentage gross margin percentage or whatever you prefer and i compute it by taking the sales gross margin and divide it by the sales amount The one without the space because we need to use the column we created a few seconds ago. The one with the space is a different measure. I get a result. It's 0.49 and blah, blah, blah. In this way it doesn't mean anything. It's much better if we format it as a percentage. Okay. Okay, now it says we sell at 49.05% and that is fine. For each individual row we computed the value and looks like useful. Now I can go to the report, I have the gross margin and I simply click on the gross margin percentage. And the result, as you see, it's not really sexy. Because what is happening is that it's not computing the gross margin per month. It's not computing the gross margin per color or whatever you put in your report. It's summing the gross margin percentage that happens. happens row by row. So this number has no meaning at all. The point is that in such a case, I need to do a calculation that cannot be done row by row and aggregated later. What I need to do is do the calculation in a different way. First, I aggregate the values, and then I perform the division only later. To do that, I cannot use a calculated column. I need to use the other way of computing values in DAX, which are measures. Measures, which if you are used to Excel 2013, in Excel 2013, 13 are called calculated fields then likely they gain again some brain and they went back to measures measures are created instead of using a new column you go to a new measure here it is you can create the new measure from here the user interface is the same you don't see the measure in the columns because the measure is not computed row by row And you can compute the measure, our gross margin percentage. Let's add measure so we know it's a measure. And this time I can start to write the same expression, which is sales amount, sales, sales amount. divided by sales, no sorry, it's sales gross margin divided by sales amount. That's the very same expression that I used a few seconds ago. The point is that if you try to write it, as a result, you get an error. The error, as it happens for all the errors coming from Microsoft, is so long that you typically don't even read it because it contains somewhere the description of the problem, but it's probably easier to look at the code and try to understand what is wrong. What is wrong here is that we are writing a measure. We are no longer writing a calculated column. And so if I write sales gross margin, you should ask me, oh, you want sales gross margin. For which row do you want sales gross margin? and where whereas i had a current row in a calculated column so i could use the column name to get a value in a measure there is no concept of current row the measure is computed at the report level over an aggregate of values so you can you simply cannot get a value for the gross margin and in that case what you need to do is surround your column reference with an aggregator so you write sum of sales gross margin divided by sum of sales amount this is going to state something totally different that is i don't want to get the value of gross margin i want to get the sum of the values of gross margin for all the role that are in the current cell of the chart I'm using. So if I add just some around the two, then at this point I should be a bit more happy, just because the measure at least computes correctly. I can format it as a percentage, and then I can go in my report, add the gross margin, hopefully, yes, it's a measure. And now the gross margin shows as a correct, And of course, it will compute the value no matter how you make the report. So this is just to say that you can easily live with Power BI without ever writing a single line of DAX, as long as the only kind of aggregation that you want to perform is a simple sum, count, or distant count, the calculation that happened to be striped inside Power BI. But as soon as you want to compute something like a simple percentage, then it's time to roll up your sleeves and start writing some sort of DAX code. The difference between measures and calculated column is that measures, calculated columns are computed at refresh time and stored in the model, whereas measures are not stored in the model, meaning that just the source code of the measure is stored in the model. But then the calculation happens at query time when the users start to build reports based on that. Now let's go back to the slides. So you see that calculated columns, if you try to compute the gross margin as a calculated column, it simply does not work because you get the sum. And let me replay, put everything as it should. I'll do that once and forever, so if you have any questions, you can ask it now, because I don't want to stop every time, because I need to fix the slides. No questions at all? So I have a question for you. Yes and no. Or in reality, you already know the answer. It's always the same to nearly all the questions you might do. That is, it depends. Saving space is always a good thing because you are reducing the amount of RAM the tabular needs to scan whenever it needs to query your model. So having a smaller memory footprint typically means having a faster engine. In the case of calculated columns, if you think about performance, many people think that using a calculated column is good to improve performances. In reality, this is very seldom true. Most of the scenario a measure works. the same if not faster than then imagine a measure works faster than a calculated column so the if you need to do any kind of calculation using a measure is your best option if the calculation you need to do is very very heavy very very complex then aggregating everything into a calculated column might make sense imagine for example that you work in the the stock market and you want to compute the moving average over 200 periods of some value the value of the stock or the number of customers or whatever so you want to compute a moving average and then you are typically you typically want to use this number in a chart that has the day as the granularity so you need to compute this value for hundreds of times where every time you need to compute the measure for 200 different values it's a very heavy calculation and you require that value for many many times into a single a a simple chart. In that case, it makes sense to pre-aggregate the moving average at the day level into a calculated table or into some table so that to have the value already present and when you draw the chart, everything is much faster. But that is a very special case. In most of the cases, when you do some aggregates, several simple calculations using a measure is your best option. Because the performances are good, are typically comparable, if not faster, than the same with a calculated column. And then you rely on calculated columns only when you are in trouble. So performance are bad, and then you pre-compute values earlier. now measure the difference between measures and calculated column is that they are written using that they do not work row by row instead they work at the aggregate level and so they don't have the concept of current row as well as an example we have seen the gross margin and the gross margin as a percentage a quick question for you am i running too fast or too slow or just fine okay if at some point i'm running too fast just slow me down and if i'm running too slow i typically see because people start to sleep and so it's easy to understand what is happening so the gross margin percentage that we computed was computed as a measure and that's it we already did it so we can go there this is what we were speaking a few seconds ago so when to use what when to use a measure and when to use a calculator Column. You typically use a calculated column whenever you want the value to have a physical representation. So imagine that I want to be able to slice customers by doing ABC analysis or I want to compute a column in the customer table that divides customer by age range. Then I want to use the age range in a slicer or in a report or to make filtering, in that case a calculated column is needed because I want to be able to say, okay, now show me all the customers that are in a given age range, and to apply this filter, I need to have a calculated column. Whereas in all the other scenarios, typically a measure is your best friend. Keep in mind another small detail, that you might be used, you might... We already used to use a tabular without ever writing, not tabular, sorry, Power BI without ever writing a measure, because Power BI pre-aggregates everything for you, using sum or whatever. So here, you don't see that, but on the demo before, you probably have still on the screen, when I put gross margin in the report, I get the sum of gross margin in an automatic way. This works as long as you use Power BI. but if you plan to use the beautiful feature that this morning you have seen that is the ability to connect from excel to power bi then excel does not do any kind of automatic pre-aggregation and if you want to see the sum of gross margin then you need to author a measure that shows the sum of gross margin so that then excel will be able to show you a sum make sense now When you author measures, and we will spend most of our time writing measures instead of calculated columns, you typically use aggregators. Aggregators are functions that compute some kind of aggregation over a set of values. So typically aggregators are sum, to get the sum of something, average, as you might imagine, to get the average, mean, and max. Those functions have two. limitations the first one is that they only work on numeric columns this somewhat makes sense you cannot take the sum of a string you cannot take the average of a string you might want to take the minimum or the maximum of a string but this simply does not work because this function have this behavior in excel and for compatibility with excel they have the same behavior index but that is not a strong limitation you typically live well with that there is another limitation which is much worse And that is, all these functions are able to aggregate a single column. You can easily compute the sum of sales amount, or the sum of order price, or whatever column you want to aggregate, but if you want to do the aggregation of price multiplied by quantity, that simply does not work. You cannot do that using sum. And that's a much worse limitation, because a lot of times, imagine you want to compute a weighted average, the average you want to computer Whatever kind of calculation you might want to prepare, then you might be in trouble because you cannot aggregate expressions. You can solve this in two ways. The first one is create a calculated column that multiplies order by quantity and then you use sum to aggregate it. But this means that you create a calculated column. And we learned that calculated columns are bad, so we don't want to create a calculated column. The other option is to get rid of these aggregators and instead use a... one of the X aggregators. The X aggregators have nearly the same name of the standard aggregators, just with an X at the end. And I typically call them aggregators or iterators because this is what they do. Aggregators, the X aggregators receive always two parameters. The first one is a table, and the second one is an expression. And then they iterate row by row on the table and compute the expression. As an example, look at the total sales computed in this way. And by the way, if you want, you can, oh, we can write it on the demo machine. So we try writing a sum with an aggregator. Demo machine. Demo machine. Okay, here we are. Okay, we can create a new measure that computes the sum of sales amount. But instead of computing it with the column, it computes as a standard measure. So we simply go there, create a measure, and we need to use a different name. On the slide, I called it total sales, which is fine. And total sales uses sumX. SumX is an iterator, so it requires two people. parameters. The first one is the table that you want to iterate. In this case, it's sales. Let me zoom in a bit so we can read better. And then for each row in the sales, we compute sales quantity multiplied by sales net price. This calculation will iterate row by row on the sales table, and for each row compute quantity multiplied by price. Okay? A very quick question for you. Do I have a current row when I'm here evaluating this expression? Let's make it simple. Who says no? Who says yes? So few people. I have a, first of all, I need to have a current row because if I didn't have a current row, I could not retrieve the value of sales quantity. So a current row need to be there. The point is only understanding who provided that current row. And the current row is provided by sum x sum x is an iterator so it goes on the first line of sales and computes the value then goes on the second line and computes the value and this happens row by row for all the rows in the sales table no matter how many hundreds of millions are there so i I have a current row and I typically say that I have, here I refer to the quantity value in the sales table for the currently iterated row by some x over sales. Make sense? Yep. And if you want to multiply, time is coming. That is coming in a few seconds. Okay. There was somebody there ready to die for starvation because you raised your hand while he was much more direct. Okay. So, if you hit enter at this point, we create the measure. The measure cannot be visible here, but we can put it on the report. Under sales, we have it somewhere. Here it is, total sales. You click on total sales and you get as a result total sales. Now this sales amount probably was multiplied by something different. Here I used the net price while here on sales amount I used unit price. Now it's easier to modify the measure. So go to total sales and use unit price. just to make sure that the numbers we are computing are correct. And now the two numbers should be the same, 117 and 117, at least. For this calculation, everything is simple. Now, in this case, using... an iterator we are able to create to compute a total sales without having the need to create a calculated column that means if we want and if we are very precise we can simply kill the sales amount column because basically we don't need it we don't need to have a calculated column if all we want to do is a perform an iteration of our table or aggregate some kind of expression okay now let's go let's go back to the slides we will use iterator a lot Now, we need to run through a set of functions. I'm going to run this part very, very fast, so we save more time for later. Just because I want to show you very quickly the set of functions that are available. You have function to count values. They are coming, again, from the Excel experience. You have count that only counts numeric columns, and then you have countA that counts anything, but it forgets about counting blanks. and And then you have COUNT BLANK that only counts the rows that contain a blank. A blank is the equivalent of a null or no value. So a value, a colon containing nothing at all. In my life, I think that I have never used COUNT, COUNT A, COUNT BLANK because they simply count the number of values. Whereas there is COUNT ROWS, which is much more interesting because COUNT ROWS does not count the different values, but count the number of rows that exist inside a table. And it's very, very common to use. And then DISTINCT COUNT. If you look at them, there are the same kind of functionalities that you can get by using Power BI when you choose the aggregation that you want in a report. So either you choose the aggregation in Power BI using distant count or count or simply count, or you create a measure that computes the value for you. Then there are logical functions, standard Boolean logic like an or not if any Error. Keep in mind that in DAX you can use a Boolean Logic in two ways. You can use the standard Excel syntax, AND AB, or you can use the double ampersand, which is the logical AND in a more C-sharp like. In C-sharp and in modern programming language, they use the double ampersand for AND, the double pipe for OR, and the exclamation mark for NOT. At the beginning, I typically was using the double ampersand because it made me feel much cooler. Oh, I'm a developer of C-sharp, I'm not an Excel programmer. Over time I started forgetting about this as an operator and I started using the functions instead because they tend to make the code easier to read if your formula is a very complex one, provided that you intend the code in the right way. The only limitation of the function is that you can only pass two parameters, and so if you have n of three parameters you end up having n, n, n, n, n, and the code becomes much harder to read. Then there are information functions. Information functions provide you information about the expression you want to test. So if you want to check whether a measure is a blank... Sorry? Yep? Louder. I'm getting older, so I'm no longer here. Yes? So if you use the ampersand, I typically need to repeat the question because they are recording and otherwise they don't hear you. If you use the ampersand you can put many many conditions one after the other in a logical land in a very easy way. So that's the only reason why the double lamprasana. The double pipe are useful. But if the condition is a very Simple one, it's totally fine to use them. If the conditions start to be very complex, then formatting The code becomes much harder with the operators. Then there is a set of information that is used to information function that tells you whether an expression is a blank, is a number, is text, is non-text, or is an error. IsError is somewhat useful because it tells you whether an expression that you provide as a parameter returns an error or not. All the other ones are useful in case you don't know the data type of your columns, and you already know what I think about people not knowing the data type of columns. So, these functions are there, but they are not really useful, and I think that I have never used them in my entire career. Then there is MAX and MIN. They introduce a beautiful feature, well beautiful, a very useful feature in 2015, that is the opportunity of computing the MAX, using MAX to calculate the maximum of a column. but if you pass two parameters to max then you can take the maximum of two values you can always get the same using an if statement if a is greater than B then return me a otherwise return B but that is turns out to be a bit more complex and a bit harder to read and write, whereas Max and Min are much more simple to use. If you are used to Power BI, oh, by the way, one question, how many of you use both Power BI and Excel Power Pivot? Too many, oh my God. Get used to the fact that many of the features that are in Power BI are not present in Power Pivot. So for example, Max, I don't know if Power Pivot uses Max Power Pivot 2016, because basically in Power Pivot they implemented one release of the Analysis Services engine, and then they stopped there and they do not upgrade it over time. So currently the version of Analysis Services that runs in Excel 2013 is the previous version of Analysis Services. Power BI always contains the latest version, so all the new and cool features are in Power BI. And Power Pivot for Excel in 2016 contains not the oldest one, but an intermediate one that understands some of the functions that are in Power BI, but not all of them. I don't honestly remember if Power Pivot understands Max or not. Just try it out. But I get used to the fact that DAX has many different versions, and Excel and Power BI are not intended to be compatible at all. I don't know why, again, the reason why Microsoft decided for it this way, but that's it, so you better get used to it. That's an interesting question. Is there any reason to use Power Pivot instead of Power BI? Well... If you want to open an Excel pivot table, so to pivot over your data model, Power Pivot is the easiest way to do that. Today you can create the model with Power BI Desktop, publish it on Power BI and then connect using Excel to the Power BI file which is on the cloud. It might be convenient for simple operations, but in case you want to do the analysis while you are flying from one city to another on a plane and the Wi-Fi is there but it's terrible so you typically don't want to use the cloud, then you are in trouble because you cannot connect to Power BI when you are on a plane. So Power Pivot makes some sense. What I really wish Microsoft does is the ability to from Excel in a simple way to connect to a Power BI Desktop file which I create on my model. That means that at this point I'm able to do all the modeling stuff using Power BI Desktop and all the analysis stuff using Excel, separating the two worlds. if you want to merge the two operations in the same, then Power Pivot is a good option. I have no idea what the future of Power Pivot will be and how many products will remain doing nearly the same thing. Because if you think Power BI Desktop and Power Pivot do nearly the same thing, Power BI Desktop is the new kid in town, while Power Pivot, as of today, looks like the old that does its job, but in a different way. Yes, but the cloud model. So you can connect to a Power BI model which is in the cloud, and you can connect from Excel to the Power BI model which lives in the cloud. If you have the same model on your local PC, then there is no easy way to connect from Excel to that model. There are ways to do that, but I cannot speak about that unsupported feature here. so you can easily do that but it's not totally supported and it's not user friendly so if you need to do it once a day that makes sense but if you need to open the same workbook again and again it turns out into a nightmare because it's not totally supported by the engine Is the feature of connecting to Power BI through Excel a pro version? I don't think that James did say anything about that. So, I guess the correct answer is that I know the answer, but I cannot give you an official answer. And the best thing that you can do is ask to the guys at Microsoft, because they might have changed their mind in the last few weeks. And so what I think might well be wrong. That's a problem of being recorded. Now, going back to Excel, to DAX, we have a lot of mathematical functions. I'm not going to explain you how the logarithm or the square root work. These are the kind of functions that when you need them, you use them and they work exactly as you expect. There's nothing strange and nothing complex about them. There is a function which is extremely useful, which is divide. You know that whenever you divide one number by another one, division by zero might happen. And so you typically use a pattern that says if the column that I want is greater or different than zero, then perform the division, otherwise return zero, a default value or whatever, a blank or whatever you prefer. If you avoid this if statement, good chances are that this will lead to an error. And remember that errors are the evil for... performances of DAX. Divide is an easy way to perform the same division without having the if statement and writing by the way it is that can be a very complex expression only once because divide tries the division and if any error happens then it returns the default value instead of returning the standard one. And the last thing which is extremely useful and new in Power BI But this is also supported in Excel 2016 is the usage of variables. Variables are extremely powerful and they are extremely useful to use. Even if I can, I understand that if you are new to DAX right now, you look at them and say, well, why are they so useful? It will come later in the day. But basically, you can define a variable wherever you have a DAX expression. You define the variable saying total quantity is the sum of sales quantity. And then you refer to the variable inside your code. So the pattern is always var and then return, where var defines the variable, and return returns the return. result of the expression. Inside the return part, you can use the variable multiple times. It is important, first of all, from readability of the code, because this might be a very complex expression that you don't want to repeat multiple times in your code. You write it only once, and once it's there, you can use it as many times as you want. And the second very important point, is that variables guarantees you that evaluation happens only once. So if you use sum of sales quantity, sum is a very fast operation and you typically don't worry about performances. But if that was a much more complex expression, then repeating its description multiple times might lead to the evaluation of the expression multiple times. And this of course is going to kill the performance of your measure. Using variables guarantees that evaluation happens only once. It happens here and here it's just used as a value which is stored in the cache of the model. Sorry? Not only, I need to repeat the question, are all data types supported for variables? Yes, all data types are supported for variables and much cooler you can also store a full table inside a variable. So you can write an expression that computes a complex table, does a filtering or some kind of summarization, you store it in a variable and then you use it, iterating over it and doing further calculations. Variables are really really useful even if at the beginning typically people don't use them, but when you will get used to it to them they are really really powerful. That's a very interesting question. The question is what's the difference between creating a variable and the measure basically. The difference is huge but we will speak about that later today. So not this morning, this afternoon when we speak about evaluation context I will be able to explain you the difference. Basically if you write the measure here multiple times you might be in danger of evaluating it multiple times. In reality, the optimizer is smart enough to avoid this, so it evaluates only once because it detects sub-trace in the expression. In the latest version of DAX, in the previous, it was not. But then the evaluation context under which it is evaluated is different. And we will use that later, I guess, early phase this afternoon. And then we have a set of text functions, which I'm not going to again explain as I did before, because I'm not going to explain you how to replace search upper value do. Value computes a value, concatenates simply, concatenates two strings, and is the very same as the ampersand, and so on. They are very similar to Excel ones, and if you need them, you simply use them to compute the values that you might want to compute. And then there is a set of date functions that at the beginning look very useful because you can extract the date, get the value of a date, get the end of month, extract hours, minutes, seconds, or sum values to dates and do calculations like those. These functions look useful at the beginning because people tend to think that time intelligence can be done using these calculations. So if I want to compare the sales of the current year against the sales of the previous year, or the current month with the same month in the previous year, you can use these functionalities. In reality, time intelligence is done in a totally different way, and we will see that this afternoon. So these functions have some limited usage. I don't remember having used many of them apart from extracting parts from a date, the date, the month, the month name, or values like those. So again, I want just to show you that they are there. but then you can when you need them you google or you bing for them and then you search for whatever for syntax sounds stuff like that finally there is a set of functions which are useful to follow relationship and that answer a question that somebody asked a few seconds ago how if i want to compute for example the sum or compute values with columns which are not in the current table but are in another table provided that there is a relationship between two tables you can use both related and related table and it's easier again to move to the demo machine wow it worked now thanks and and try using related and related table for example on the sales let's go to the model on the model you see that sales no not you sales is here sales has a relationship with product with store with customer and so on but basically it has a relationship with product and imagine that i want to compute this time the sales amount computing the sum an iteration over sales get the quantity and multiply by the unit cost or by the unit price which is not in the current table but it's in a related one okay i can easily try to modify my measure i guess it was called total sales And you see that total sales iterates over sales and computes sales quantity multiplied by sales unit price. What happens if I write product? Unit price. If I try to write product unit price, even though there is a relationship between sales and product, I simply cannot access a column which is in a different table, so in a table that is not the current one over which I'm iterating. If I want to do that, I can do that, but I need to surround the function with related. Related let you follow a relationship, a many to one, so go from sales to product, grab the value of the product, the single line containing the product referred by the current sale, and then use the value of the unit price if i do that this way i create my measure and that works just right so if i want again get rid of the unit price column in my fact table and just relate just use a relationship to grab the value from the other table. Related works through any chain of relationship from the many side to the one side. Are you familiar with many side and one side of relationships? Say yes or no so I hear. Cool. So, Related works from the many side to the one side because you need to grab a single value. So, from the many side, sales, there is only one product for a given sales and Related gives you access to that value of the individual product. And it works just fine. fine so it works on any chain of relationship provided that they are always a many to one many to one many to one sometimes you want to do the opposite imagine that you want from the one side to get access to the many side of relation Now, related will be in trouble because related expects to return you a value. And if I am on the one side, imagine that I'm on a product, and I want to get access to the many side, I will not get a single row. I will get many of them because there are many sales for any given product. So what I need to do in that case is not use related, but instead use related table. If I go, for example, on product, we can go to... to the end and create a, imagine that we want to create a calculated column that tell me how many sales I did, just counting the number of sales for each given product. I can easily do that by creating a new column. Let's call it number of sales. I hate when this happens. Number of sales. How do I compute the number of sales? Any idea? Sorry? Distant count? Count? Count of what? Count is, the idea is nice, it's just not count. You need to use count rows. Count counts the number of values in a column. Count rows count the number of rows in a table. And then I use related table. sales this is going row by row to go on the sales table and simply count the number of rows and then store the result in my column and now I have for any given product the number of sales over all time over all whatever okay Oh That's a big You still have not understood I'm Italian. Could you zoom in? It's hard to hear. Zoom! It's much better. That I can understand. It's more Italian. So, number of sales is just simply count rows, number of sales. Now, what if I want to compute for every product the amount sold? The pattern is very similar. By related table sales, I can get access to the sales of that current product. But then I don't want to simply count the number of rows. I need to create a new column. This time I zoom earlier. Let's call it sales of product. And then I use sumX over related table sales. And then I get sales quantity multiplied by sales unit price. That's an interesting question. Why do I, I'm doing a column? Because I want to compute for this given product the amount sold for that product. This might be useful imagine, imagine that I want to perform something like ABC analysis. or per-rater analysis and I want to be able to identify the product that sold most and then divide them in categories, so the most sold product and the intermediate ones and the product that never sold anything, then I need a column because Consolidate the value once and forever. Or if i want to classify Products based on the number of times i sold them, having that As a calculated column is useful as an intermediate step in Computing something more complex. So these are just the sales of That given product and the product needs to be given. Whereas if i want to show the sales in a report, then i don't Need a calculated column. I can simply put the product on the on my I can report the sales amount and i get the value there. It's useful to have it as a calculated column only if it's Part of a more complex calculation that need to happen Later. Make sense? I guess the last parenthesis was wrong. Real quick on that. If you're doing a filter, I'm going to treat the calculated table, I mean the calculation. Again, another interesting question. Is it going to change the value if you put it in the report and then you further filter the report? And the answer is no. The reason is very simple. Calculated columns are computed that way. refresh time so they are computed once and forever once they are computed this is their value if you put if I put the sum of this calculated column into a pivot table or into a reporter and then I slice the report by year I will always see the total sales of the product for every year the same value. I will not see any kind of filtering. So I have that sales of product, and if I put it aggregating by sum, it's in the product table, sales of product. You see that the number is always the same. It does not depend in any way from the data. make sense because the number has been computed on the product table and no matter how you filter the data date is not able to filter the product and in any case they would never filter the product in the correct way because I pre-computed it as a calculated column. If I do the same code as a measure, then everything is different. It's computed at query time, and in that case, calculation will depend on the filtering that you put on the report. Okay? Now, let's go back to the slides. So, you can use related to move values from one place to another. In this case, you can use related to move category and subcategory from the product table to the sales or from the categories to the sales. Or you can use related to do further calculation like sumX and compute the sum of sales. Speaking about related table, I started introducing the idea of a function that does... It does not return a value, but returns a full table. If you are coming from an Excel background, this is a kind of function that is somewhat new because it does not return you one value to put into a cell. Instead, it returns you a full table containing rows of some table. And then you typically need to do something else on this table. You cannot simply take the result of a table and store it into a cell or show it in a report. You need to do something else. DAX is extremely powerful and has a very rich set of table functions. We don't have time to see them all, but at least to see the basic ones and get some feeling of how they work and... what you can do with table functions. The basic table functions are only four or five. You have filter, all values, distinct and related table. We already spoke about related table, so you know what it does. I just want to introduce the other ones because you will use them very, very often and in most of your calculation. The result of table functions is typically used for other functions. You never publish it as a final value and of course the power of table function is that you can combine them combine them in more complex way let's start discovering filter filter is very simple for the next few demos or for the next few slides i will not use the contoso model i will use a much simpler data model that only contains one table with these rows. So it contains orders, oh, there I can read it. It contains the city, the channel, the color, size, quantity, and price, just those columns, and only eight rows. And imagine that I create a measure that does a SUMX over filter orders where order price is greater than one. I put that into an Excel pivot table. This is not using Power BI, it's an old slide using still Excel. but there are many of them similar. And you see that for each cell, I see a different value. Let's see how it works. For example, to compute the value of this cell, 192, which is the value for green. When it's time to compute that value, the first thing that the engine does is say, oh. You are probably, you are interested in getting only the green value and the internet value because the slicer is filtering only internet and on the rows I put the color green. So I'm not seeing the whole model but only the two rows which are at the bottom that contain green and internet values. Then remember that DAX is a functional language so in order to follow its behavior you go to the innermost function because some X will iterate the result of filter. Filter will start iterating orders. So the first function that is called is filter, and filter says, oh, I'm gonna iterate the orders. Remember that filter is an iterator. So filter iterates the order, and for each line it checks if order price is greater than one. On the first line, this is fine. On the second line, the order price is one. One is not greater than one. And so, that line is not returned by filter so the result of filter is a table in this case containing a single row that is then fed to some x that computes quantity multiplied by price make sense and the question for you there are two questions for you in reality the first one is do we have a current row here. I guess you can... I need to get used to do that on both sides. There are just too many here. Who says yes? Who says no? Blank is not a valid answer. So you will need to take a decision at some point, either yes or no. Who says yes? No. if i didn't have a current row i could not even write order price so i need to have a current row and the current row here which is providing the current row there louder the filter iterator that's fine filter is iterating over orders and is providing the current row so this is the price of the currently iterated row by filter over orders okay and do we have a current Throw here in the, let me see, I can reach it from here. We have a current row introduced by some x over the result of filter. Okay? So we have two iterations happening, one after the other. The first one is some x, it is a filter iterating orders. The second one is some x iterating the result of filter. Okay? For the ones of you who come from a developer experience, typically people look at this and say, oh, iteration over a table containing millions of rows, this is going to be extremely slow. No, DAX is designed to work this way. So you can iterate billions of rows. without ever worrying about performance. Performance issues in DAX never come from iteration. They typically come from other aspects. But feel totally free of iterating large tables. DAX is going to help you and write good code for you. So finally, the result is put on the pivot table. Looks like I missed some of the wrong slides. Okay. So filter is typically useful to add conditions. So it basically starts from a scenario and adds more conditions. It restricts the number of rows. a table by iterating over it and removing the rows you are not interested into. You might use it to compute the sum of products which are red, to compute the sales of whatever, whatever kind of filter you want to produce, filter is your best friend. But FILTER restricts the number of rows that your calculation performs. There is another function which is extremely useful and you will use it many, many times, which is ALL. ALL that you can see here on SUMX over all orders quantity multiplied by price. ALL is a simple function that always returns a full table. So it ignores whatever filtering happened and always returns the full table. So if you write a measure that is sum X over all orders of quantity multiplied by price, then we follow the execution as it happened before. Imagine that we want to compute that number, 789. Initially, you have the same filter because you have a filter for green and a filter for internet, which are applied on the table. But then sum X requires the result of all. All wakes up and says, oh, guess what? There is are filtering on the sales table or on the orders table but i'm gonna ignore everything and return the two full table anyway so all always returns the full table and once you have the full table then sum x iterates over the eight rows and computes quality multiplied by price that have yes the filters contained within the all search say all orders but what if all contains a filter expression within it. All cannot contain a filter. So the question was, what if all contained a filter? You can do a filter of all. That makes sense because you remove a filter and then you restrict it. You cannot do all over a filter just because all requires a table as an expression. And besides, it wouldn't make much sense to add the filter and then kill it a few seconds later or a few nanoseconds later. You see that the number is the same in all the cells in the pivot table that show all orders. Of course, using all in this way, it doesn't make much sense to compute the same value for all the rows in a pivot table or in a report. But it becomes useful if you want to show values as a percentage. Imagine that you want to compute a value like the percentage of sales against all the sales. Then to compute that percentage at the numerator, you simply compute the sum. Of sales at the denominator you cannot compute sum of sales otherwise the Result will always be one so you compute the sum X over all sales of whatever you Want to compute and so you have a numerator one number and that the Denominator the grand total of all sales so all is not useful to create a measure Like in this case that shows always the same number but you can easily imagine That if you want to compute percentages it makes a lot of sense okay Related table returns only the rows in a table which have a relationship with the current row whereas a related all that always returns all the rows up if we switch for a second on the demo machine here on number of sales If we switch for a moment on the demo machine, who's gonna list? Oh, here it is. Okay, number of sales does a count rows of related table sales. If instead of having related table, I write all sales, then the result will always be the same value no matter which row i'm in because it always returns the full sales of the model okay let's go back to the slides thanks So, SUMX ALL ignores any filter that exists on the model. ALL is extremely useful. You can use ALL in two ways. You can use ALL with a table, and that returns all the rows of a table. And then ALL has a very special functionality that we will use again later. But you can use ALL passing a single column as a parameter. And if you pass it to only a single column, it returns a table. Contains all the possible values of that column, all the distinct Values of that column. So all customers, customer name Returns me one table containing one column and many rows, one Row per each name of my customers. Duplicates are removed. Only the distinct values of all the distinct values. If you want, you can use all with multiple columns, and in that case you get a table containing, in this case, channel color and size, and you get all of the possible combinations that exist of channel color and size. It's not very widely used, all with multiple rows, but it's useful to know that you can do that, and it makes sense to have it as a functionality. In this case, it's only used inside a calculation. So it's not stored in the PBIX. It's computed during the calculation of your model, and then it's used to restore the value. Yep. It's coming in a few seconds. We are going to speak about values in a few seconds. And then we are going to spend most of the afternoon spending time about values and what it computes. of the yes in that case you get all the existing combination of the three values so if i do that if i do all of customers gender and customer name i will get only the combination where i have a female name with female but i will not have combination of male name with males okay hopefully whatever Sometimes you want to do all or many, many, many, many, many columns and you want to only avoid a few of them, maybe only one. So you can want to do all of channel, color, size, quantity, price, amount and you don't want to put city there. In that case, instead of writing the list of everything, you can do the opposite and say, well, let's use all except. All except orders, order city does all on all the columns except on the city. Again, it's not very commonly used. I just wanted to show you it exists because we will use it probably later today. Of course, you can mix filter. You can write a filter of all of a table. And as an example, if you do a... some x over filter of all orders and then filter the orders where channel equal internet and then compute orders amount. Amount is a column which is not visible on the screen, which is the multiplication of quantity by price. if we follow the execution of this code the first function that imagine that we want to go there we still have the filter for green we have a filter for internet so ideally we should see only these two rows but then all kicks in and all it's the innermost function the first innermost function so it's the one which is executed all returns always all the orders and so it's going going to return all the rows in the table. then FILTER takes all the orders and returns only the ones where channel equal internet. So it will return those four rows, and this happens for all the rows. Finally SUMX computes orders amount and aggregates the value. So again you see that we have the same value for all the rows. rows that contain our internet. First we remove any filter and then we apply the filter that we want to apply. There is another function which is called distinct that returns this thing is very similar to all it's very similar to values this thing returns all the distinct values of the column you pass as a parameter and it looks It looks like it's very similar to all. The difference between distinct and all is that all ignores any filter, whereas distinct obeys any filter. So if I create a report and I slice by color, and then I take the count rows of distinct product names, I will get only the number of products of that given color. Whereas if I do a count rows of all product names, I will always get the controls of all the products regardless of the filtering that is happening. Okay? Now, speaking about the number, the values returned by these tables, it's important to understand how many values does a column have. I know the answer is pretty simple. You simply count the distinct values of a column and then you are done. You don't need to know anything more. But imagine that you have a table like this one. amount that contains the amount sold and then product id one two three three and then four and then we have a product table it's very clear i'm italian by the product table but what i consider a product a coffee pasta and tomato anything else is not a product and the product id is one two three if you look at the sales table of course there is a relationship between the two two tables. For some rows I have 1, 2, 3, 3, but for the last one I have 4, and 4 does not exist in the product table. So what would you expect at this point? The previous versions of Analysis Services Multidimensional would complain about, oh, the data is wrong, I will never be able to process, and so you are going to get wrong results. The latest version, the current version, Tabular and Power BI, is much more relaxed. and says well if your data is wrong that's your problem i will show you some numbers anyway but i need to to follow the relationship imagine that i slice by product so i put the product in the pivot table or in the report and i want the sum of amount i do i expect to see this 14 somewhere if i start slicing by product 14 cannot be reached so theoretically i shouldn't see it Problem is that the total will be wrong. The total will not be the total of sales that I can grab from a different report that does not follow the relationship. So in order to solve this ambiguity and give some meaningful result, the engine does one trick, which is important to understand. That is, if the relationship is broken, because you reference values which are not there, it adds a new row. to the related table so to the table which is the target of the relationship that contains blanks for all the rows and then creates the relationship between the wrong row and the blank one makes sense and it's pretty clever because in this way if i slice by product i will see coffee pasta tomato and blank so any user can see the blank and see the number which is of values unrelated to anything else. The point is, at this point, how many values does product has? It might have three values or four values, depending whether I count or I do not count the blank row. Now, some functions count, take into account the blank row, some others do not take into account the blank row. For example, all considers the blank row as a valid value. If I get all product, product, product, I will get coffee, pasta, tomato, and blank. So the result will be four. Distinct, I never remember what distinct does. Distinct does not return, does not take into account the blank rows, whereas values takes into account the blank row. All takes into account the blank row, and there is another function that is all no blank row. It's pretty simple to understand what all no blank row does. All returns all the values but does not return the blank row. sense it's much less complicated than it sounds right now you will get used to it of course if your model is perfect then all those functions return the very same value if your model is not you might be in trouble and need to define which function to use yep only one blank row doesn't matter how many wrong rows are you having your black in your table there is only one blank row which is added. So if you have a hundred values different, all non-existing, they all match to the single blank row which is created in the model. The next slide, hopefully it's useful to see the difference, I created a few measures. One does a count rows distinct, the other one does a count rows of values, then count rows of all and count rows of all no blank row. And then I put everything in a pivot table. and you see that count rows of distinct since distinct does not consider the blank row you see one one one then blank and aggregates for three whereas if i do a count rows of values since values considers the blank row as a valid row i have one one one one and then four as the total the same happens for if i do a count rows of all or if i do a count rows of all no blank row It is not so important to remember all these details, but it is important to know that this happens because for sure at some point you will compute some ratio between count rows and count rows of all or count rows of values or count rows of all, no blank row. And if the result is 99.99999%, it means that you are using a function that considers the blank row somewhere and the function that does not consider the blank row somewhere else. So either you fix your model, which is always the best option, so that all relationships are followed in the right way, or if you have issues with your model, at least you use the function that computes the current number. We already use related table, so I basically did this part of the demo earlier than what I did later. So if you want to compute number of sales, you do a COUNT rows of related table sales. and if you want to compute the sales per product then you simply do account some x over related table and then you use whatever expression you you need to use okay one thing which is well it's intuitive at the beginning but they need to be well understood when your dax code start to be a bit more complex is that When I write related table sales, the result is the set of rows in the sales table that have some value. Now, you might be used to SQL. In SQL, if I write SELECT start from sales, as a result, I get all the rows in my sales table. And that's simple to do. If I do a SELECT start from customers, I get as a result the values of the customers in a table. DAX is a bit more complex than that. INDUX all the table functions, return tables, but these tables are not just the values. These tables are the real tables. So imagine that I write an expression which is slightly more complex than the previous one. I do a SUMX over filter of category where I count rows for each category the related table product and I check if it's greater than 10. Is there any brave that is going to tell me what this function is going to compute yep the first filter returns the categories that have more than one product More than 10 products, yeah. It's very hard to read at that size, believe me. So it filters category, and for each category, it controls the related table product. That is fine. The result of this is a set of categories. But it's not just the category names. It's not just the category. They are the categories. So the result of this filter is Then used inside sum x. And when i write sum x of a Related table sales, this related table to which Relationship is going to follow? because the result of the first Filter is a table containing categories, these are the same of the given categories, so the currently iterated category, by some x over the result of filter. Am I losing you? No? Okay. I always try to look at faces, and it's hard to understand whether you're concentrated or not understanding. So filter returns a table. The table iterates all the... the relationships that it has in the model. In SQL, relationships are not part of the model. You can define them as foreign keys, but then you need to specify relationship in every query. In DAX, relationships are part of the model, and you don't need to specify. specify them. Whenever you use a function that returns a table, the table return has all the relationships with the rest of the model and you can trust it. So you can use related table sales and knowing that these are the sales of the current literated results. So the full function basically returns you the sales of categories that contain more than 10 products. Make sense? It's intuitive, I mean it's not hard, it's not complex to read and use, it's only a bit delicate to you understand and use. The last thing that I want to show you about calculated tables is that calculated tables, sorry about table functions is that the result of any table functions can then be used inside your model. So if we switch for a second to the demo machine. If we switch for a second to the demo machine. Oh, you're right, sorry. I was looking there thinking that it was the wrong place. Now you see that we have new column, we have new table, sorry we have new measure, we have new column, but we also have new table. If you go to new table you can create a new table and create for example a table containing red products. And red products might be a filter of the product table where product color equals red. Using an American keyboard is harder than expected. Okay. And you see that I have created a table which is called Red Products. It's there in my model. The point is that this table only contains red products. All the rest is gone. So it's a subset of the products. The table that I create in this way belongs to the model. Starting from this table, I can further create relationships with other tables. I can summarize, I can do nearly whatever I want. Calculated tables are not always useful, but when you need them, They become extremely useful. You typically use them to Pre-aggregate values or to filter or to do summarizations Because keep in mind that the set of table functions that we Have seen so far is just a basic set of table functions. In reality there is a much more, much richer set of table Functions that you can use to summarize, to grab values, to Compute different expressions. And table functions are besides besides very useful if you want to test and experience. you can for example create a table and you want to learn how values work or how all works, you can do all product color. This is a table expression and if you confirm it as a result you see you get all the values of the product color. if you do all product color all product brand then you get as a result all the combinations of color and bread and when you are learning ducks when you are playing for the first time with table functions it's very useful to use that just to test some features yes The question is, would it be wise to use it to normalize a flat file? Well, you already know the answer, which is, it depends. Let's say that the best way to perform operations on the data is not that of using calculated tables, but that of using power query. or however how they call it today so the query editor which exists in Power BI that you can use to do most of the ETL steps because in doing it this way you avoid storing in the model the flat file which probably is a very large one but you only store in the model the already normalized and already well-defined data model that is that much easier so it's smaller and probably faster to use With that said, if we are speaking about a simple prototype or a single model that you want to do quick and dirty, then this is somewhat dirty, but it works just fine. So if you have a small model and you want to do things the fast way, this is totally fine. You can use a calculated table and that works just fine. So calculated tables are, that's very usual. If your model is... Already well-defined calculated tables are nearly useless because the data is already written in the correct form. That said, a lot of times you work on models that are not perfect. Data might come from reports, data might come from the web, and you have the option of using Power Query or using calculated tables. I would prefer using Power Query, but if I want to do something quicker and I'm familiar with DAX, then using a calculated table is totally fine. Then I wouldn't build a data warehouse on top of that. So I would not do that in a corporate data warehouse. But for a small Power BI piece of code, that works just fine. OK. Sorry? No. The question is, do calculated tables inherit relationships? And the answer is no. Calculated tables, once computed, are tables in the model. If you want, you can create relationships. So you can go in the diagram view and, well, now I cannot create any relationship because I have only color and brand. But if I had a color table or a brand table, I can bring it wherever I want and build relationships in the model. but the result of a calculated table is stored in the model and loses its lineage so it's no longer able to follow relationship you need to rebuild them okay yes Does a calculated table refreshes when you recompute the data? Yes, as soon as you refresh your model everything is recomputed in the engine. Calculated tables are not available in PowerPivot. If you need something like it in PowerPivot what you need to do is materialize a query. So create a query that queries the model, materialize it as an Excel table and then take this Excel table and link it back to the power pivot model. In that case you obtain something which is similar to a calculated table. Yeah? You can use data coming from multiple tables? Totally. You can use data coming from many tables and as I said we are seeing only the basic functions in DAX but you can imagine that I can write something more complex like add columns. To all product color and product brand and I compute sales... Where is it? Sales... Sales amount. Sales amount. And oh, I need to use a name, yep. Sales. That's American keyboard. Sales. You have an idea how hard it is when I connect it remotely with somebody else that I see the keyboard in Italian, but then it writes in American. Okay, now there is a measure called sales amount, okay? And so you see that I created a query that computes by call, computes the existing combination of color and brand, and returns the sales of that given color and brand. Then this table might be useful then to do other stuff. Now, add columns is one of them. many, many, many, many table functions. There are a lot of them that you can combine to really express very, very complex queries. So you might take the top 10 sellers products, or you really can use DAX as a fully fledged query language. You start learning the basics of table. functions but then writing more complex query makes it even more powerful let's go through the slides so we have the last slide that goes to a conclusion then we will go to it and we will spend time later speaking about the evaluation context all what we have seen so far i guess it's simple say yes it's very simple it is or the complex part it's coming later as soon as we introduce evaluation context calculate and all this stuff so with what you know so far you can already write some interesting calculations but we have just scratched the surface later after lunch we will start speaking about evaluation context So the idea of DAX is that the modeling part is simple. You just have tables, columns, relationships, calculated columns that somewhat resemble Excel and measures. And that's it. There is nothing more to learn. So the building blocks are pretty simple to grasp and to start using. There is nothing more. There is a lot more to learn, which is coming, as I said this afternoon, about evaluation context. Evaluation context are where DAX start to rise its complexity. and of course it makes it much more interesting. So for all the ones of you who are coming this afternoon, we will... is there anybody who is not coming this afternoon? One? Okay, that's fine. So we will go on this afternoon with the remaining part and you will not need to read that later. For now, thank you and I guess we can go to eat something and meet here later to go on with the evaluation context.