My God, okay, it's happening! Everybody,
stay calm - what's the procedure everyone? So, Python for Excel is here and it's ready to test in the Beta channel of Office
365. Excel has been getting a lot of updates, but this... This is going to put it in a different
league. So, you might be thinking, "That's not for me. I'm not a programmer, I'm not a geek. I just
do my daily Excel stuff. I don't need this." Wait until you see what I'm going to show you, and
I recommend that you watch the whole thing and take some time to digest it, because I didn't
want to give you a stroke. So, I've sprinkled all the cool stuff throughout the video.
So, sit back and let's start from scratch. Okay, so when you open up Excel and you
have Python, you are going to find it in the Formulas tab. There is a Python section here.
You can get started by inserting your own custom Python formula or exploring Python
samples. And then, start by taking this tour. But we are going to get started together
right now, and I'll show you how this works. The first thing we need to do is get into Python
mode. One way we can get there is by going to Insert Python and then inserting a Custom Python
Formula. Now, keep your eye on the formula bar and see what happens when I select that. The box
changes into a Python box. Now, another way you can get in Python mode is by just typing in "=PY".
This function creates a Python formula. So, the moment I press Tab, I'm already in Python
mode. What happens now? Well, let's say as our first example we want to give some insights
from this dataset. If you're working with data, the first step you need to do is to send your data
to Python, because it needs to be able to see it to work with it. The way we can do that here is
by just referencing the cell. I'm just going to select it, use the keyboard shortcuts, Ctrl + Shift
and the down arrow key, to select the whole range. And notice what we get in the formula bar. We
have our data range, "headers=True". That was automatically picked up, and everything is wrapped
inside an XL function. Now, when I press Enter, nothing happens. What Enter does is it adds
new lines to my Python script, because Python scripts could get long. So, it makes it easier to
work with these scripts. So, for me to be able to run this, instead of Enter, I need to press Ctrl
- Enter, and I get what is called a DataFrame. If you know Python, you know what this is. If not,
it's basically a condensed version of your table. It's a two-dimensional data structure that's
all sitting in a cell. Now, take a look at this. Here we have a dropdown. What is being returned
here is a Python Object, that's our data frame. But I can switch my view and return Excel Values.
When I do that, I get my data set back. But this takes up a lot of space, so I'm going to switch
it to a Python object and get my DataFrame. Now you can think of this DataFrame as an Ikea
table. Right, when you buy a new table from Ikea, it's all condensed in a single package. If
you want to get a glimpse of what's inside, you can click on this card, and you get
to see the beginning rows of your data set and the ending rows. This DataFrame is
a fundamental structure of a specific Python library that's called Pandas. The Pandas
Python library is great for data analysts, and you need to know the name. I know it sounds
weird, but you need to know it if you want to look up other functionalities in Python's library.
But I'm going to explain more about libraries and Pandas later. What I want to show you right
now is some of the cool things you can do here. So, for example, I'm going to go to another
cell, go to Python mode by typing in "=PY". I'm going to reference my DataFrame
and then use the Describe method. So, I'm just going to type in ".describe". So,
I'm in Python mode now, not Excel mode. I'm going to open and close brackets, press
Ctrl + Enter to run this, and I get a DataFrame back. But I want to see the Excel Values.
I'm going to switch my view to Excel Values, and I get to see some information about my data
set. So, I have the Quantity and Sales columns here. This is the count of each column,
the mean, standard deviation, and so on. Now, if we take a look at our data set, we
have date, product, quantity, and sales. Now, let's say I wanted to describe the Product column.
I'm just going to go and adjust my function. So, right after I reference the DataFrame, which is
the data set here, I'm going to put in my column "Product". The way you reference columns is
by opening up the square brackets, similar to referencing Excel tables, and then typing in, in
quotation marks, the name of the column headers. In this case, "Product". And I'm going to close
the square bracket, press Ctrl + Enter, and I have some information about products. So, I have
96 products, 15 are unique. The top is "Laptop Bag Red" because that occurs 15 times. It's a
quick way of getting insights from your data. Now, if you're planning on using the data that
you uploaded to Python and doing different types of calculations, it helps to give it a name. This
way, you don't always have to reference the cell. The way you do that is by going to your first
DataFrame and then just typing in any name, as long as it doesn't have spaces in it. So,
I'm going to call it "df" for DataFrame, equals this data set. So, when I run this right now,
nothing changes. All I've done is just give this a name. But when I come to do other types of
calculations based on this DataFrame, I no longer need to reference it by referencing the cell.
Instead, I can just type in "df", and that's it. Now, talking about shortcuts, another shortcut that I
like is to avoid these square brackets. So, whenever I can avoid them, I will. And the great thing with
the Pandas library is that it transforms your headers into attributes. So, as long as you don't
have spaces in the headers, you can actually type them out like this. So, I can mention the DataFrame
'dot' the header 'dot' describe, and when I run this, I get the same thing. But if they happen to have
spaces, you need to use the square brackets. Now, there are lots of functionalities in the
Pandas library. So, some simple ones that you're familiar with in Excel is, for example, the SUM
or the AVERAGE function. So, let's say I wanted to use Python and sum the sales column. I'm going to
go to Python mode, but this time using my shortcut keys Ctrl + Alt + Shift + P. Now, this sounds like
a complicated shortcut, but it's actually easy to reach for on the keyboard. Then, I'm going to
mention the DataFrame. I gave it a name, it's df, 'dot', I want to sum the sales column. So, I'm going
to go with ".sales", it doesn't have any spaces. I can reference it in this way. .sum method. Run
this, and I get the total of my sales column. So, let's just double-check this. That's 174510.
So, if I sum this, 174510. Now, if I want to get the average, I'm going
to use mean here. And that's the average. All of this you can do with simple Excel
formulas, right? Why do we need Python? So, let's take this a step further. Let's say I
want to get my total sales values for each date. Notice, my dates are repeated, but
I want to aggregate them for each date, similar to what you would do in a pivot
table. I'm going to go to Python mode, then reference my DataFrame, which was df,
.groupby I want to group everything by the date column. And I want to have the total values
from sales. I'm going to go with .sales, .sum, and I'm going to press Ctrl + Enter. And
I'm going to get my series returned as a Python object. But if I want everything as Excel
values, I'm just going to spill them, and I have the total sales for each date. And if you don't
believe me, let's just double-check this. So, this is for the first of June, and I have these
four numbers here. That's 9680. That's 9680. Now, what if I change my mind and I want
everything on a monthly basis? Well, check this out. I'm going to go inside
groupby, and I'm going to use a class called pd.Grouper. My key equals the date
column, and my frequency equals, I'm going to put M here, and close the bracket. Run this,
and I get my sales on a monthly basis. So, total sales for June should be 38060. Let's
just double-check this. And we have 38060. Now, okay, you might say, "Well, a pivot table
can do this." The great thing about this is that these are formulas. So, if I change this value
to let's say 6000. Keep an eye on the June value here. And I press Enter, all of these refresh
immediately. I'm just going to press Ctrl + Z to go back. Something to keep in mind is that you
always get the last expression returned. So, you could be writing a lot of code, but whatever
you end up seeing in the cell is based on the last expression. Let me show you something even cooler.
I'm going to give this a name. I'm going to call this "forchart," and we are going to put a chart
in a cell. So now, I've called this series "for chart," and I want to plot this series. I'm going
to do forchart.plot. I want to put a line chart. My x-axis is going to have my dates, which are
in the date column. My y-axis is going to have my sales values. And the kind of chart I want
to plot is a line chart. I'm going to run this, and I get a tiny little line chart in a
cell. You probably really can't see this, so let me go and merge the cells, and we get
to see our monthly sales values. How cool is that? If I wanted an area chart, I'm going to just
change this to "area," and I have an area chart. If I wanted frequency to be on a weekly basis,
I'm going to change this to "W," and I get my weekly sales data shown as an area chart. I prefer
line charts. I'm going to change that. So, notice that the last expression is the one that plots
the chart for me, and that's what's returned. Now, if you're wondering how I'm coming
up with this syntax, how do I know these classes and the methods? Well, there is
great documentation on this in the Pandas library. You have lots of information here on
this webpage to help you get started and lots of code to play around with. Plus, you've got
AI. There is Bing Chat and ChatGPT that can help you when you get stuck. Just double-check
your results to make sure it's what you want. Now, before I move on and I show you some even
more cooler stuff, there is an important thing to keep in mind, and that's the calculation logic
of Python Cells. Python Cells calculate from left to right and top to bottom. So, notice here,
first, I've created the DataFrame, and here, I'm referencing the DataFrame. I'm on the
right side. Here, I'm below it. But if I copy and paste this here, I'm going to end up with
a hash value. I get diagnostics pop up here, and it tells me "name 'df' is not defined."
That's because Python Cells calculate from left to right and top to bottom. I
would have to bring my DataFrame here for everything to calculate fine. So,
I'm just going to demonstrate that. I'm going to Ctrl + X and Ctrl + V this,
and now, everything calculates fine. Now, let me show you some cool examples from
the default libraries that we have. So here, I'm using pd.melt function to restructure this
data set. So, I have quantity and sales in two separate columns. When I use pd.melt, I
can bring them in the same column. So, this is similar to "unpivot" in Power Query.
And look at the formula bar here. It's very simple to write this. I just need to mention my
non-value columns, which are date and product, and my value columns, which are quantity and
sales. And it does the whole restructuring for me. Now, another cool one is query. Take a look
at this. I'm going to go into Python mode, reference my data set, which is df, right? That's
my DataFrame that I uploaded to Python. And I'm going to use query. Then, I'm going to put
this in. What this does is it takes a look at the sales column. It grabs everything
that's above 2000 and takes a look at my product column to see if anything contains
the word "black". So, this is similar to the QUERY function in Google Sheets, which is also
similar to the syntax from SQL. When I run this, by default, I get a DataFrame back.
But I'm going to show the Excel values, and I get everything that has the
word "black" and sales is above 2000. Now, let's move on to something else. We can
create charts like this. I have small multiples returned as a single object. Take a look at
this one. This is a bit more complicated, but I also wanted to show you that you can write
longer scripts and get Python to do what you want. So here, I just have dates and students from 2018
to 2022. And what I wanted to do here was this: I want to return the last four years, and
I wanted to arrange the charts beside one another. I also wanted to make sure that
the y-axis is showing the same values, right? So that I can easily compare each of these
with one another. Now, I've even added labels to this. And you can do all of that in Python here.
You can set the title, set the label, and so on. And yeah, I didn't do all of this on
my own. I didn't know I had to flatten the axis, so I asked ChatGPT for help, and it
helped me get what I wanted right here. Now, this is my favorite part of the video. What
we're going to do is this: We have an Excel table, and we have some feedback in each row. What
we want to do is grab any URLs that we have inside each row. The tricky part is that
they might be in the middle of the text, and the pattern isn't so obvious. So, sometimes
they start with www, sometimes we don't even have that, they just end with .com. Sometimes
we have https and some might just have http. Some might not even have a link. What we want
to do is just grab any URLs in there. So here, you can see we're grabbing both of these
in the cell, and then we're omitting this one because it didn't have any links. Then
we have this and this. How can we do this? The great news is you can import
any Python libraries or modules that might not be there by default. So,
the ones we used are there by default: the Pandas library for data analysis,
Matplotlib for data visualization. But Python has so many different libraries
and modules. There are ones for AI, for web, for regular expressions. So this RE that I'm
importing here works with regular expressions. All I have to do is go to Python mode and import the
library or module, and I've returned "re loaded." That's just because I want to see something
in the cell aside from none, which is what I would see if I don't have this. Everything still
works, but I'm going to see none or 0 in the cell. Right, so I'm just going to press Ctrl + Z
to go back. Now, after I imported the library, what I did is create a data frame
based on the table. And the way you do that is similar to before, except
you just have to make sure that you have the right referencing for the table
that you are including everything. So, I have "#All" to make sure that I'm also including
the header. Now, when I hover over the DataFrame here and click on the card, I can see that
the column header is a part of my DataFrame. Now, the next step is to write some Python code to
get this done. What is done here is that I've combined the patterns using this. This is based
on the patterns that I can see in my data set. This makes sure that anything that ends with
.com is included, that anything that has https, http, or www is also included. Then, there
is a function here that grabs the URLs, and this function is run on each single cell, so on
each single text that I have in my DataFrame. You can also see the syntax here. Wherever you
come across the hash sign, this means that this is a comment. This is a good way of also testing
your code. You can comment things out to see what is the last expression that's returned or just
add in some documentation to your code. Now, the end here, we're just filtering out any
rows that don't have any URLs. Now, the great thing is because it's referencing the table, it is
dynamic. So, if I go in and add another URL, so let's say youtube.com, this is going to
be automatically added to my URL list here. Now, if you're wondering whether you can connect
Python to your Power Queries, yes, you can. You don't need to have the data physically loaded in
your workbook. They can be there via Connection Only, and you can create a DataFrame if you want
based on these and use Python code on them. Now, this video is getting super long, so I'm
going to leave this topic for a future video. As you can see, I'm super excited about this.
It just opens up a whole new dimension in Excel. What do you think about it? Let me know
in the comments. Thank you for being here, thank you for watching, and I'll
catch you in the next video.