Transcript for:
Generating Data Sets with AI in Excel

Hi everyone, welcome back to the channel. If you're new here, my name's Deb Ashby, I'm a Microsoft MVP, and today we're diving into something really cool. Using AI to make your work in Excel easier and faster.

Whether you're handling complex data, managing spreadsheets for work, or just need a quick way to generate sample datasets, then this is the tool for you. And the latter is exactly what we're going to cover in today's video. So... imagine this scenario. You're working on a new project and you need a data set, but creating one from scratch sounds like a complete time suck, which trust me, from experience, it is.

You also don't want to waste time browsing through the internet looking for free data sets that you can use that are not only relevant to what you're doing, but also are available in the public domain so you're not infringing on anyone's copyright. This is where ChatGPT is your best friend. And I have to say, since the release of ChatGPT, this is probably the number one reason that I use it.

And that is to generate sample data sets so I can practice my Excel skills or create tutorials like this one. And there are many people out there who are always looking for data sets. It's one of the number one comments I get underneath my TikTok videos. Can we have the data set? Where can I find data sets?

And these days, I always direct them to ChatGPT. So let's dive in and take a look at how we can create a really meaningful data set that we can use to practice our Excel skills. So the first thing we're going to do here is we're going to fire up ChatGPT.

And you do that simply by opening a browser of your choice and going to chatgpt.com. Now I'm using ChatGPT 4.0 or 4.0, however you want to call it. And this is the free version.

I haven't paid for anything. Now, if you've never been into ChatGPT before, This is what it looks like. It's a really, really simple interface. We have some suggestions in the middle here. And then at the bottom, this is where we type in what we call a prompt.

And a prompt is basically what you want ChatGPT to do. And there is a ton of information out there related to prompt engineering and how best to word your prompts so that you get relevant results. Because the more detailed your prompt, the more accurate the results are going to be that you get from ChatGPT. And I always say with software like this that's in its infancy, it's not perfect at the moment.

So sometimes you have to tweak and change a few things. But in general, I find this a great time saver. So let's start to construct our prompt. And I'm going to start out just by giving it some context.

So let's say I work in HR and I need a sample data set to practice my Excel skills with. So I'm setting the scene so that ChatGPT has a little bit of context. Now I'm going to type in what I actually want ChatGPT to generate. And a little tip here, when you're working in this prompt area at the bottom, if you want to get onto a new line, so you don't have everything just written in one big long paragraph, you need to press Shift Enter, and that will move you down. If you just press Enter, it's going to send your prompt through, which we don't want to do at the moment.

So now let's type in the rest of our prompt, and I'm going to be really specific about what I want. So this is my prompt. Please create me a sample data set that has the following columns. Employee name, job title, department, extension number, hire date, employment status, salary, and manager. So I've given it the column headings that I need.

And then I've said, I would like this data to be in tabular format so I can work with it in Excel. And I always like to add this on the bottom of my prompts. If I didn't put that, a lot of the time what ChatGPT will return is the information but delimited.

So we'd have the employee name, comma, then the job title, comma, then the next column, so on and so forth. Now that's fine. You can copy that and paste it into Excel, but then you need to go back into Excel and you need to do some data cleaning because it's all going to be in one column separated with commas. Now it's not difficult to break it up.

You could use text to columns, you could use various different functions and formulas. but we want this to be as easy as possible. So I always like to specify tabular format. I found that's a lot easier.

So let's send it through and see what it produces. Okay, we're looking pretty good. So I can see it's given me 50 rows of data.

Now I actually want a larger data set than this. I want 200 rows. Another thing I can see in here is that ChatGPT hasn't quite got the context right.

It's given me all my columns. I'm perfectly happy with that. But let's take a look at the first record where it says Alison Hill. Her job title is HR Specialist, but her department is Operations. If we take a look at Noah Rhodes, his job title is Software Engineer and his department is Finance.

So these two columns don't quite seem to match up. I would expect an HR specialist to be working in HR and a software engineer to be working in IT. So it hasn't quite got it right. So let's make some modifications. And the good thing about ChatGPT is that you don't have to retype everything in because it remembers what you've already typed.

So I've set the scene. It has the context. I don't need to type it again. So in my prompt, I could say, that's great.

However, I would like 200 rows of data and I would like the job title column and the department column to be contextual. And then I'm going to give it a little example, i.e. if someone has the job title HR specialist, they work in HR. Please try again. Let's send this through and see what it does. So this is looking a lot better.

If I scroll down, I can see that I now have 200 rows of data, which is perfect. And if we take a look at the first employee, Cody Larson, they're an operations manager and they work in operations. Cynthia is an HR specialist and she works in HR.

So now everything is tying together a little bit better. And like magic, just with a couple of prompts, we've created ourselves a really nice sample Excel data set that we can now use to practice formulas, functions, and other Excel utilities. So now that we have this, what do we do with it?

How do we get it into Excel? Well, for things like this, notice that we have this little download button at the top. So we can download the entire table into a CSV file.

So if we click this, I can see it's now downloading at the top. Let's double click to open it. And there is our lovely data set.

So all we need to do now is a little bit of tidying up. I'm going to select all of the columns. Let's double click to widen those out. Maybe I want to put this into an Excel table. Control T. Yes, my table has headers.

Let's click on OK. I can go in. I can change the table style.

So maybe let's go for this one. And then maybe I want to change the formatting on the salary column. Let's just do comma formatting with zero decimal places.

But in a very short amount of time, we've managed to create ourselves a really nice 200 row data set. And if you need a much larger data set, you just tell ChatGPT that. Now, what about if I wanted to generate a data set that I can use to practice pivot tables? Because pivot table data is a little bit different. With this type of data, we have unique records.

So each record here, each row is a different employee with different information. Whereas for pivot tables. We use aggregated data. So numerous records might have columns with the same values. For example, we might have a list of sales for certain stores within specific regions.

There won't just be one sale per store per region, there will be multiple. That's the type of data that we put into a pivot table. So can ChatGPT cope with that?

Can it create us data that we can use in a pivot table? Let's give it a go. Now, when we go back to ChatGPT, you can just carry on in this thread if you wanted to. I tend to like to clear the chat when I'm doing something different. So if we go up to the top here, we have a new chat button.

If we click that, it's just going to clear out everything. So now I need to set the context again. I'm a data analyst and I need you to generate some sample data that can be used in a pivot table.

Let's press shift enter. Now, this time I'm not. going to give ChatGPT any of the columns.

I'm just going to let it do it itself and see what it comes up with. But I do need to tell it the type of data that I want. Please use data related to coffee shop sales.

I would like at least a thousand rows of data in tabular format. Let's send this through and see how ChatGPT handles it. And here we go. Let's have a look. Now underneath it says...

I've generated a day set of a thousand rows of coffee shop sales data that can be used in a pivot table. The data includes fields such as the date and time of sale. Perfect. Store location.

Great. Coffee type. Size.

Payment method. Quantity sold. Total sales amount and the day of the week. You can now use this data for pivot table analysis in a tool like Excel, Google Sheets or a Python environment.

And if we take a look at this data, this is actually excellent data. So here I haven't even provided the columns that I want. I've just said, you decide.

And I think what it's come up with is actually really brilliant. The only thing I might change here is in the store column, it's just given me some very generic store names. I think it's store A, B, C and D it's added in here.

So I could modify my prompt and just add in some fictional store names to make it a little bit more interesting. So let's do that. I'm going to say, great.

Always good to praise ChatGPT. Can I have the same data set, but instead of store A, B, C, and D, replace with jarbucks, coffee bean, aromas, and coffee pit? Let's see if it can do it. And there we go, like magic. This amazes me every time it works.

So you can see that it's done exactly what I've asked. So now it's time to download the data set. We're just going to download the table again, because it's a lot of data. It's going to download as a CSV file, which we can then double click to open, and then we can do add tidying up.

So let's just double click to widen out these columns. And if I press control down arrow, you can see that, yes, we have a thousand rows. It says a thousand and one, because remember we have that header in there.

Now, I might want to do something with column A because we have The date and the time in there, I don't necessarily want the time, but that could possibly be quite useful. I might want to split these up across different columns though. Now I could go in and ask ChatGPT to do that. I could say where we have the date and time, put those in two separate columns. That could be the next things I ask ChatGPT to do.

Alternatively, I could just do it in Excel. I'll leave that for you to decide. Once again, I'm going to do control T. Let's put it into a table.

I'm going to turn off banded rows. And now I have a data set that I can use in a pivot table. So if we click on summarize with pivot table, let's put it on a new worksheet. And maybe I want to do an analysis by store. So let's drop that down into rows.

And I want to see the total sales. Let's drop it into values. And there we go.

Maybe I want to see by store and coffee type. Let's drop that down. And we can see how cool this is.

So for me, ChatGPT is absolutely awesome for generating data sets that we can use to practice with. Have you tried it out? Let me know down in the comments.

And if you enjoyed this video, make sure you give us a thumbs up and maybe consider subscribing. That's it for now. I will see you guys next time.