Alright, so with that are you all ready to tango with tableau prep? Man, there's a lot of you today. My name is Mara Key. I'm one of the product managers on the Tableau prep team. It's been an honor working on this product.
And it's been an honor working with all of you to help inform the making of this product. We had over, I believe, over 4,000 beta participants. And a lot of our features really changed throughout the alphas and the betas.
I have lots of really great stories about that if you want to sit down and hear about them during the happy hour. So I'm gonna sort of walk you through a little bit of the reason why we went out and built Tableau Prep and then we'll go through and do a quick hands-on training just to give you a little bit of a feel of what Tableau Prep is and what we're bringing in and how it might help you, your team, your organization. with your data prep and data cleaning needs. So while I'm doing this, if you need help installing the product, if you need help getting the data for the hands-on training, please raise your hand and we will get you one of the installations. So there are two individuals in this section that need the one individual back there that still needs the installs.
And please be patient. We have a limited number of those, but we'll get it to you as soon as we can. All right.
So I have a question for you guys. How many of you have ever cleaned data? That should be everybody.
If you've ever analyzed data, if you've ever done anything with data, you've cleaned data in some way, shape, or form. How many of you have done this in Tableau Prep? Oh, all right, like 1% or so?
I hope to change that today. So how many of you have ever had to write complex calculations or expressions in Excel? Yes?
No, not today. We're not doing anything with Excel today. How many of you enjoyed that?
Was that fun? Some people enjoyed it, okay. There were a couple moments where it was fun.
But sometimes you didn't really know what was happening. happening to your data as you were applying some of these calculations or expressions along the way. And getting started, it was really hard to understand how to go about doing that work. Now, a lot of our customers A lot of the individuals that we've talked to typically depend on the tools that are available to them through Excel, and a lot of them get really frustrated. Others will use scripts, whether it's a SQL script or Python, and I have a computer science degree and I don't really like doing this.
That is the honest truth. I like writing other code, but writing code to clean data is hard. And you take a lot of time to do that.
You typically have to be an expert in this area to be really successful with it. And then an even smaller percentage will use specialized tools to go ahead and prep their data. Now, we at Tableau believe that if you are analyzing your data, you should have access and the ability to clean your data along the way.
Because what happens is people get to Tableau and they say, oh, wait a minute, this data is not really ready to be analyzed. there's a lot more prep that I need to do. And so data prep has been a blocker for a lot of our customers for a really long time, even though there are some data prep functionalities that we offer as part of the data pane and the bringing of the data and that experience. So when we decided to go out and build Tableau Prep, we were really aiming at those Tableau users, those analysts who are working with the data every single day.
and want to go back and make manipulations or joins or work with different disparate data sources to really arrive at the analysis, to really answer the questions that they felt were important. So in essence, we were aiming to help business users get their data ready for analysis. Our end goal at Tableau is to make your analysis easier than ever.
And with Tableau Prep, we're helping you easily clean, manipulate, and get your data ready for analysis. So with that, I want to jump into the hands-on. activity.
Now this is intended to just give you a quick peek into what we have built in. We're only gonna walk through like maybe 5% of what this product can do. So if you have questions along the way please raise your hand and let me know. There are two mics in the middle here.
I'll also try to leave some room at the end for us to answer some questions. Okay so Here's our first scenario. We work at a nonprofit organization that records donations along with an indication of whether or not a donor is likely to contribute again. So we're going to analyze the donations by donor potential, but before we do that, we need to prep our data. So go ahead and open Tableau Prep and click.
to connect to your data. So I'll do this along with you guys so you all have an example to follow. So here's the connections dialog. I'm just gonna click the plus sign and I'm gonna connect to, oops, the CSV file, nonprofitdonationsouth.csv.
So my text file, nonprofitdonationsouth.csv. All right. So here's the input experience. You can get a quick view at what different columns you have as part of your data. And you can also see some of the sample values.
Like for example, I know that state is actually really showing me state because I have values for different states like Kentucky and Georgia and Mississippi and whatnot. Now, what I'm going to go do is I'm actually going to go to this plus sign here, and I'm going to add a step. And when I do that, I get the three coordinated views that Francois mentioned earlier.
Now, just to sort of go over what these views really are, at the top is what we call the flow pane. It allows... you to see all the ways in which you've manipulated and changed your data along the way.
In the middle is the profile pane. It's here that you can see summaries and how different values relate to each other. For example, simply clicking on a value will highlight how that value is reflected across all of, like what the distribution is across all of my fields.
And at the bottom is a data grid, which should be a familiar spreadsheet view of your data. Now, is everybody with me at this point? Awesome.
Okay. Okay. So if you... Oh really? Could I get some people to take a look at their versions of prep?
Could you raise your hand if the data is not populating for you? So we'll just pause for a moment. So what I did, just to quickly repeat it, is I brought in my, I connected to my CSV file, and then I brought in the input. I clicked the plus sign and inserted a step.
That's what I did. Yeah, you got it? So I'll just quickly walk through it one more time. So I'll add my input. I'll go ahead and add a step.
And this is the view. So if you're still having problems, please raise your hand and we'll get you some help. Alright, so I'll keep going and explain some of the work that we're going to do here.
So, if you take a look at donor potential, notice that we have different classifications of donors. There's high, there's medium, and there's low. But there are multiple classifications within a classification. Like, for example, high HI is the same as HIGH.
What I need to do is I need to group these values together. And with Tableau Prep, that's super easy. Let's look at the medium example. I know that Mead and Medium are the same thing.
So what I'm going to do is I'm going to double-click on Mead. Oops. Double click on Mead and just make the adjustment right there.
Click Enter. And just like that, Mead and Medium are now grouped. Because I know that those two values are the same.
Everybody with me so far? Sorry. Okay, the data is still not loading. Could I get a quick show of hands for the people whose data is not loading? So, quick note, if you're having trouble seeing the data, Make sure that this bar here has sort of stopped moving before you say add a step.
I hear that that might be what some of you are going through. Okay. So what we did here, just to quickly review, is we went in and manually grouped mid and medium. Now, there are still some errors in my data.
Right? There's still low LO and low. And then there's also still the problem with high HI and high HIGH. But I don't want to do this manually.
And in Tableau Prep, you don't have to, because as Francois showed you, we've introduced brand new smart algorithms that allow you to group these types of issues and make adjustments along the way. So here's what I'll do. I'll go into this drop-down menu for donor potential, and under the group and replace submenu, I'll go ahead and group by pronunciation.
And just like that, high is now grouped, so is low. I'll just pause for a moment and make sure that I take questions or catch up anyone that is still trying to get their data to load. Yes? So you have to be inside what we call the edit experience to know that they're grouped. So for example, if I just say done, actually no, that's not true.
Yes, if you don't see the paper clip, they're not grouped. And then in order to know what's within the group, you have to be within the edit experience. Yes.
Question? So your question is, if I'm not changing the underlying data, what will people see when I share it out? So what people will see if you're not, so if I give you this flow, but I don't give you the CSV files, all they'll see is the flow that you created, and then when they go into this experience, the different panes, the profile pane will be empty, the data grid will be empty, but the changes that you applied along the way, these guys right here, that indicate the two steps that I just took, they'll be able to see that.
Yeah, yes. Okay, so her question is mine is blue. Why is yours orange in the flow pane?
Mine's orange. Yes, so we determine the colors So there's a special color algorithm that we're using to help you separate your different chains So automatically, it's going to be blue. So when I first started, it was blue. And then I deleted some nodes to re-show what I did, and it turned orange.
So there are new capabilities that are coming in the future that will allow you to change the color to reflect it to be whatever you want it to be. All right. So I'm going to keep going. So at this point, my data looks roughly...
prepped and what I can do is I can actually view this data, preview it in Tableau Desktop if I want to. So all I have to do is simply click preview desktop and the way that I got that menu is I right clicked and said preview in Tableau Desktop and what it will do is it will run the flow and it will open Tableau Desktop. And I can go ahead and construct a quick viz to see, hey, does my data look the way that I want it to look?
Is it ready for analysis? So, yeah, city. Let's put, like, duration, amount.
Let's do some color magic. But anyway, so you can open your data in Tableau directly from Tableau Prep. That's really all that I was trying to do. And you don't really have to do that here.
The other thing that you can do is actually create an output. So I just went back into this plus sign right here, this plus sign menu, and said Add Output. And from here, I can either create a hyper, TDE, or CSV. And that creates the data source that contains all of the different, where I've cleaned and transformed my data the way that I want to. Or I can go ahead and publish my data up to Tableau Server or Tableau Online.
All right, any questions here? Everybody with me so far? Yes. Great question. So his question is, if you are working with data that's being constantly refreshed, will this flow be automatically applied to it?
Is that your question? That's a really great question. So today in Tableau, prep, what you have to do is actually you have to go back into the application, press this refresh button up here, and then rerun your flow. And what that will do is it will grab your data, grab the brand new refreshed data, whether it's from... Oracle or Snowflake or whatever.
And it will, once you press this play button, it'll run your whole flow to however many outputs you had. And it will deliver them to whatever source you had them going to, right? Whether it was your local file or OptiTableau server.
That's a little bit of a manual process and we're working on ways to make it a little bit more automated. But this is the functionality that we have. One thing that I would say, It is the top requested piece of functionality.
And if you know Tableau, you know that we're on it. OK. I'm going to, yes. Sorry? The final step.
So what I did was I went into, I'll just delete it, this plus sign menu here. And I just said add an output. And at that point, I can either specify that I want my data source to be a hyper, TDE, or CSV file, and I can also publish my data as one of those three formats up to Tableau server.
All right. The next example is even more fun, so I'm going to jump into that quickly. But before that, let's quickly catch up the slides. So we went ahead and grouped by pronunciation. I showed you where the changes are being captured along the way.
The other thing that I want to quickly talk about is you can save your flows. and share them with whoever you need to share it with. You can either save the raw flow file, which only contains information about what you've done to clean and prep your data, but you can also share it as a TFLX, which is equivalent to a TWBX, which packages your data, especially for file-based connections, along with your flow.
And the way that you do that is you just go into the file menu and you either save or export. Package flow. OK. So in the next exercise, we're going to quickly walk through the join experience.
So here's our scenario. We have a data file with the number of internet users by country, and we want to compare those against a country's population when it's available. So we're going to need to join to combine. the two different data files and we're going to have to experiment a little bit with some of the join capabilities that we have here to clean our data along the way.
All right, so first we're going to connect to our two Excel files. So go ahead and connect to your Excel files and I'm just going to quickly wipe this clean. So I'll go into the connections dialog, Microsoft Excel. And I'll just select the two Excel files and say open. Now you should be seeing this view where the two tables from the two different Excel files are now in Tableau Prep.
Everybody seeing that? Let me know if you're not. Awesome. OK.
So the next thing that I want to do is I want to actually create a join. Now, can anyone guess how you create joins in Tableau Prep? I know Francois gave it away a little bit.
Like, he stole my thunder and stuff. I don't think he's in the room right now. No, he's not.
But how do I create a join in Tableau Prep? Just drag and drop. So one of the really cool things about Tableau Prep is that we've introduced some of the same paradigms that exist in Tableau Desktop into the Data Prep experience.
That's sort of the power, that's the secret sauce behind Tableau Prep, and really Tableau, is that it makes it really intuitive to do the work that you need to do. So I'm going to go drag and drop to join my data. So let me know if you're seeing this screen right here.
If you're not, please raise your hand and we will help you. All right. There's one person here.
Okay. So right away, we have an error message. And this error message tells us that we have to have at least one join clause in order to create a join.
So I'm just going to go to the left here. And I'm going to add my join clause. So country and country area sounds about right. Okay. But we have still a problem.
Does anyone know what the problem is? Sorry? They don't match, but why don't they match?
Sorry? Country codes, yes. There are country codes trickled in to this country field. So I need to fix that in my join experience because right now I'm kind of stuck. So before I do that, let me quickly explain what's going on here.
So on the left side here, you can see all the things that make up your join. You can see the join clause at the top. In the middle here, you can see the type of join that you've created and you can switch between if you really want to. And then at the bottom here is a summary of your join. So you can see how many rows came from this table and were joined from this table, and how many rows resulted in the join results.
You can also see what's excluded. So right now, everything is excluded. Now in the middle here is where you can see the values from your join clause.
So here's country and country area. And right now, everything is in red because everything is excluded. And then to the right is the profile pane and data grid. So you can see what's happening to your results as you go along in the join experience.
OK. So we know that the issue is with total population, right? So what I'm going to do is I'm going to actually go into the summary of joins and click on the bar associated with total population, so the blue bar for me and probably for most of you.
So now you should be seeing a view that looks something like this. Okay, so here's my favorite part. I need to clean this data. How would you guys clean this data, typically?
Sorry? Yeah, go into Excel and write an expression of some sort that cuts off the last five, right? What I really need is I just need to split this data in two.
And I need to split it by the separator hyphen. Okay, let's do that. So I'm going to zoom in so you can all see what I'm doing. So here's the drop-down menu.
I'll go into the drop-down. And under the split values submenu, I'm going to select custom split. Let me know if you're seeing this dialogue. If you're not, please raise your hand.
Alright. Oh, you're not seeing it yet? So I'm going to walk through it one more time, just so you guys are with me.
So I went into the country drop-down, and under the split values sub-menu, I selected custom split. And I got this dialogue right here, this custom split dialogue. Okay, so now I'm going to zoom in here so you all can see.
I know that I want to split by the hyphen, and I know that the pieces that I want to keep are just the last half. I just want to keep the country names, right? So I'm going to select split off the last and I only need to create one field from that.
And then I'll press split and just like that I now have my brand new split field. So you should all be seeing a field. I see someone celebrating that makes me really happy. That's why this part is my favorite. There's also other really awesome stuff that I don't have time to show you guys today, unfortunately.
But, so I've created my custom split, and if you need help with this stage, please raise your hand and we'll get you some help. I want to make this name more meaningful to myself, so what I'll actually do is I'll just double click and call it country name. And now my country name field.
Notice what happened when I did the custom split. It actually created a cleaning step right beforehand. So a long total population had just created a cleaning step called clean1. And those changes that I just did are actually captured here.
So I did a... What I did was a custom split, but it's actually a calculation under the hood. And I can see that here. And then I also renamed the field... From country split one to country name.
But let's go back to the join experience. We still have a problem. And that's because we are still joining on country and country area. But we have a brand new field that we just created.
Oh man, sorry about that. I accidentally did a command Z on that. My bad.
Command Z, as you know, just undoes and I accidentally did that when I was trying to zoom. So I have country name, so I'm going to switch country to country name and just like that my join looks a little better. So if you're seeing this particular view, awesome.
If you're not, please raise your hand. Okay, one individual up here. So just to walk through it again. We went ahead and we split country so that we could only keep the country names and we labeled that field country name and what I did was I went in to the apply join clause and I switched the join clause for clean one from country to country name and I have this resulting view. Yes.
Yes. So the join, so we moved away from the nonprofit data set. So the join is on a data set around total population and number of internet users. So you should be just working with the two Excel files and joining on country. All right, so I'm going to keep going.
So notice that here in the join clause, we still have some red going on, which means that there are still some values that are being excluded. And I can actually see that here in my summary of joins. Clean one, which is this step right here that was created when we did the custom split. has 15 values that are excluded. And then number of internet users has about 10 values that are excluded.
Now, I want all of the internet users, so I actually need to create a write join. Did you guys see what I did there? All I did was previously an inner join. All I did was I just clicked on the right-hand side and it created the right join. If I wanted, I could create a full outer join.
I could just play with this all day to make sure that I'm capturing the right type of join. And now the number of internet users went from 10 to 0, because they're now being included as part of my join. Everybody still with me?
Let me know if you're not. OK. But I still have some excluded values. Okay, so take a look. It's basically country names that either don't exist or are mismatched, right?
So, for example, take a look at Bahamas. So under country name, it's Bahamas the. And under country area, it's just Bahamas. So of course it's excluded. But I can fix this very easily.
So I know that the issue is in clean one. So I'll actually go back into this summary of joins. And I'll click on all of the excluded values.
And when I do that, I get the same view back where we did the original split. And I'll just find Bahamas the, double click it, and just delete the pieces that I don't need, and press Enter. And now if we go back to our join results, I should no longer have Bahamas, the, because the Bahamas are now included as part of my join. So the number of excluded values went down from 15 to 14. Yeah, let me walk through it again. So notice how here that there's a mismatch for the Bahamas.
What I did was I went into the clean one operation. Inside the summary of joins, I looked at the excluded values. And then under country name, I just double clicked on Bahamas the and deleted the pieces that I didn't need and pressed enter.
And if I go back to my join results, I can see that the Bahamas are now included as part of my join. And the amount of excluded values went down from 15 to 14. All right. So at this point, I want to point out one more thing that I will leave for all of you as homework. There is one more country that has a mismatched value. So I would encourage you on your free time just to sort of practice this to basically follow the steps that we did with Bahamas The and correct this particular country.
All right, everybody with me here so far? Awesome, okay. So I'm going to go back to my slides. I'm going to talk through a couple things before we before I take some questions. So this was just a sneak peek into what we are offering.
You only just saw maybe five to ten percent of what this product is capable of. Now you saw a really cool portion of it because the joint experience is actually one of one of my personal favorites. I know I shouldn't have favorites but I do. But there's There's a lot of really awesome instructor-led classrooms that are coming up.
There are some in New York, Atlanta, DC, San Francisco, Massachusetts, Chicago. So if you are in one of those areas, I would encourage you to attend. But there's also instructor-led virtual sessions that you can sign up for.
And here are the dates if you are interested in those. But if you really want to learn more, This is your URL, tableau.com slash learn slash classroom. The other piece that I want to say is that we have a lot of really awesome free online content that has been created for Tableau Prep.
So you can go on there and watch those videos. There's even those skill pills that were made by some of my fellow teammates. So there's a lot of really awesome tools out there that you can use to learn a little bit more about Tableau Prep.
And then we are a new product. We're only about two and a half months old at this point. We are releasing on a monthly cadence and your feedback is really important to us.
So far, you know, we have added a Snowflake connector, some of the zoo animals in Hadoop, as well as, you know, stats file connectors and brand new experiences that will make your data prep needs a lot easier. And the best way to engage with us is through the community forums and through the ideas forums. So if you have ideas, if you're running into issues, let us know and we would be happy to work with you.
And as always, please vote. Vote those ideas up because we pay attention. And then that's my email.
Please feel free to email me with your questions. I'm happy to help out. And that's all that I had and I'm happy to take questions as well.
Yes, there's one question there and we'll be running the mic so there's two mics and we'll try to spread them out. Yes, I think like I want to give a huge round of applause to my awesome helpers like I wouldn't be able to do this without them so thanks guys. Yes, what's your question, sir? So it looks like when we run, we're clicking through the different pieces, different data sources, different cleaning steps, it looks like it runs the entire query for the whole data set. Some of the ones I've poked around with, I've got some pretty big...
data sets, is it possible to pause that? Or does it query everything all at once, every time? So what it's doing is it's actually creating a cache.
So it's creating like a temp table. And the reason why we do that is to make sure that you're seeing the results. of your transformations along the way.
When you have really large data sets, what we'll do is we'll sample your data in this interactive experience while you're building out and cleaning and whatnot, and then we will apply your operations to your whole data set once you've created the output experience and clicked run. Okay. Yes. Other questions? This is actually going to be the last question because we're out of time, but you'll be demoing afterward for happy hour, everybody.
But here we go. No pressure. I'll make it good.
All right, so the question we have is when we compare our data, one of the issues we run between our systems is some... Sometimes the client name, perfect, but somebody will put LLC behind it. Sure.
Other system will have no LLC, will have ink. You know, the comparison between the two columns, is there a way to filter out easily within here, or is it better to do that on the source? data before even uploading?
So, yes, there are easy ways to do it. I would have to talk to you about specifically your case. So if we take the LLC example, where I want to filter out the label LLC, is that the example? Yeah, because what we'll run into is it'll be comma space LLC.
Next one will be space LLC period. You'll have so many different variations trying to filter all that out of the data. It's not not just like common. So it's one of those things where I'll go through and replace with nothing in Excel just to get rid of them all.
So then both tables match and can line up. Sure. Yeah.
So one of the things that you can do is you can align the data values by removing punctuation. Right. So it might help to remove all. of the periods and the spaces and then go in and create like a calculated field that says replace LLC with blank. So that's one thing that you could do.
There are filtering experiences that were we're working to introduce. Like for example, a wildcard filter would be very useful for you, where we give you a certain type of, where you put in a search string that says, anything that contains LLC, please take it out. So we're working on improved methods to allow you to continue to prep your data and additional filtering experiences that make that easier.
Perfect, thank you. Yeah, you're welcome. Hey, so I'll be at the happy hour if you have questions.
I would love to hear about them. So. So thank you all so much. Really appreciate your time.