Transcript for:
Python for Excel Lecture

My God, okay, it's happening! Everybody,  stay calm - what's the procedure everyone? So, Python for Excel is here and it's   ready to test in the Beta channel of Office  365. Excel has been getting a lot of updates,   but this... This is going to put it in a different  league. So, you might be thinking, "That's not for   me. I'm not a programmer, I'm not a geek. I just  do my daily Excel stuff. I don't need this." Wait   until you see what I'm going to show you, and  I recommend that you watch the whole thing and   take some time to digest it, because I didn't  want to give you a stroke. So, I've sprinkled   all the cool stuff throughout the video.  So, sit back and let's start from scratch. Okay, so when you open up Excel and you  have Python, you are going to find it in   the Formulas tab. There is a Python section here. You can get started by inserting your own   custom Python formula or exploring Python  samples. And then, start by taking this   tour. But we are going to get started together  right now, and I'll show you how this works. The first thing we need to do is get into Python  mode. One way we can get there is by going to   Insert Python and then inserting a Custom Python  Formula. Now, keep your eye on the formula bar and   see what happens when I select that. The box  changes into a Python box. Now, another way you can   get in Python mode is by just typing in "=PY".  This function creates a Python formula. So,   the moment I press Tab, I'm already in Python  mode. What happens now? Well, let's say as   our first example we want to give some insights  from this dataset. If you're working with data,   the first step you need to do is to send your data  to Python, because it needs to be able to see it   to work with it. The way we can do that here is  by just referencing the cell. I'm just going to   select it, use the keyboard shortcuts, Ctrl + Shift  and the down arrow key, to select the whole range.   And notice what we get in the formula bar. We  have our data range, "headers=True". That was   automatically picked up, and everything is wrapped  inside an XL function. Now, when I press Enter,   nothing happens. What Enter does is it adds  new lines to my Python script, because Python   scripts could get long. So, it makes it easier to  work with these scripts. So, for me to be able to   run this, instead of Enter, I need to press Ctrl 

  • Enter, and I get what is called a DataFrame.   If you know Python, you know what this is. If not,  it's basically a condensed version of your table.   It's a two-dimensional data structure that's  all sitting in a cell. Now, take a look at this.   Here we have a dropdown. What is being returned  here is a Python Object, that's our data frame.   But I can switch my view and return Excel Values.  When I do that, I get my data set back. But this   takes up a lot of space, so I'm going to switch  it to a Python object and get my DataFrame. Now you can think of this DataFrame as an Ikea  table. Right, when you buy a new table from Ikea,   it's all condensed in a single package. If  you want to get a glimpse of what's inside,   you can click on this card, and you get  to see the beginning rows of your data   set and the ending rows. This DataFrame is  a fundamental structure of a specific Python   library that's called Pandas. The Pandas  Python library is great for data analysts,   and you need to know the name. I know it sounds  weird, but you need to know it if you want to look   up other functionalities in Python's library.  But I'm going to explain more about libraries   and Pandas later. What I want to show you right  now is some of the cool things you can do here. So, for example, I'm going to go to another  cell, go to Python mode by typing in "=PY".   I'm going to reference my DataFrame  and then use the Describe method. So,   I'm just going to type in ".describe". So,  I'm in Python mode now, not Excel mode.   I'm going to open and close brackets, press  Ctrl + Enter to run this, and I get a DataFrame back. But I want to see the Excel Values.  I'm going to switch my view to Excel Values,   and I get to see some information about my data  set. So, I have the Quantity and Sales columns   here. This is the count of each column,  the mean, standard deviation, and so on. Now, if we take a look at our data set, we  have date, product, quantity, and sales. Now,   let's say I wanted to describe the Product column.  I'm just going to go and adjust my function. So,   right after I reference the DataFrame, which is  the data set here, I'm going to put in my column   "Product". The way you reference columns is  by opening up the square brackets, similar to   referencing Excel tables, and then typing in, in  quotation marks, the name of the column headers.   In this case, "Product". And I'm going to close  the square bracket, press Ctrl + Enter, and I   have some information about products. So, I have  96 products, 15 are unique. The top is "Laptop   Bag Red" because that occurs 15 times. It's a  quick way of getting insights from your data. Now, if you're planning on using the data that  you uploaded to Python and doing different types   of calculations, it helps to give it a name. This  way, you don't always have to reference the cell.   The way you do that is by going to your first  DataFrame and then just typing in any name,   as long as it doesn't have spaces in it. So,  I'm going to call it "df" for DataFrame, equals   this data set. So, when I run this right now,  nothing changes. All I've done is just give this   a name. But when I come to do other types of  calculations based on this DataFrame, I no longer   need to reference it by referencing the cell.  Instead, I can just type in "df", and that's it. Now, talking about shortcuts, another shortcut that I  like is to avoid these square brackets. So, whenever I   can avoid them, I will. And the great thing with  the Pandas library is that it transforms your   headers into attributes. So, as long as you don't  have spaces in the headers, you can actually type   them out like this. So, I can mention the DataFrame  'dot' the header 'dot' describe, and when I run this,   I get the same thing. But if they happen to have  spaces, you need to use the square brackets. Now, there are lots of functionalities in the  Pandas library. So, some simple ones that you're   familiar with in Excel is, for example, the SUM or the AVERAGE function. So, let's say I wanted to   use Python and sum the sales column. I'm going to  go to Python mode, but this time using my shortcut   keys Ctrl + Alt + Shift + P. Now, this sounds like  a complicated shortcut, but it's actually easy   to reach for on the keyboard. Then, I'm going to  mention the DataFrame. I gave it a name, it's df,    'dot', I want to sum the sales column. So, I'm going  to go with ".sales", it doesn't have any spaces. I   can reference it in this way. .sum method. Run  this, and I get the total of my sales column. So,   let's just double-check this. That's 174510. So, if I sum this, 174510. Now,   if I want to get the average, I'm going  to use mean here. And that's the average. All of this you can do with simple Excel  formulas, right? Why do we need Python? So,   let's take this a step further. Let's say I  want to get my total sales values for each   date. Notice, my dates are repeated, but  I want to aggregate them for each date,   similar to what you would do in a pivot  table. I'm going to go to Python mode,   then reference my DataFrame, which was df,  .groupby I want to group everything by the date   column. And I want to have the total values  from sales. I'm going to go with .sales, .sum,   and I'm going to press Ctrl + Enter. And  I'm going to get my series returned as a   Python object. But if I want everything as Excel  values, I'm just going to spill them, and I have   the total sales for each date. And if you don't  believe me, let's just double-check this. So,   this is for the first of June, and I have these  four numbers here. That's 9680. That's 9680. Now, what if I change my mind and I want  everything on a monthly basis? Well,   check this out. I'm going to go inside  groupby, and I'm going to use a class   called pd.Grouper. My key equals the date  column, and my frequency equals, I'm going   to put M here, and close the bracket. Run this,  and I get my sales on a monthly basis. So,   total sales for June should be 38060. Let's  just double-check this. And we have 38060. Now, okay, you might say, "Well, a pivot table  can do this." The great thing about this is that   these are formulas. So, if I change this value  to let's say 6000. Keep an eye on the June value   here. And I press Enter, all of these refresh  immediately. I'm just going to press Ctrl + Z to   go back. Something to keep in mind is that you  always get the last expression returned. So,   you could be writing a lot of code, but whatever  you end up seeing in the cell is based on the last   expression. Let me show you something even cooler.  I'm going to give this a name. I'm going to call   this "forchart," and we are going to put a chart  in a cell. So now, I've called this series "for   chart," and I want to plot this series. I'm going  to do forchart.plot. I want to put a line chart.   My x-axis is going to have my dates, which are  in the date column. My y-axis is going to have   my sales values. And the kind of chart I want  to plot is a line chart. I'm going to run this,   and I get a tiny little line chart in a  cell. You probably really can't see this,   so let me go and merge the cells, and we get  to see our monthly sales values. How cool is   that? If I wanted an area chart, I'm going to just  change this to "area," and I have an area chart.   If I wanted frequency to be on a weekly basis,  I'm going to change this to "W," and I get my   weekly sales data shown as an area chart. I prefer  line charts. I'm going to change that. So, notice   that the last expression is the one that plots  the chart for me, and that's what's returned. Now, if you're wondering how I'm coming  up with this syntax, how do I know these   classes and the methods? Well, there is  great documentation on this in the Pandas   library. You have lots of information here on  this webpage to help you get started and lots   of code to play around with. Plus, you've got  AI. There is Bing Chat and ChatGPT that can   help you when you get stuck. Just double-check  your results to make sure it's what you want. Now, before I move on and I show you some even  more cooler stuff, there is an important thing   to keep in mind, and that's the calculation logic  of Python Cells. Python Cells calculate from left   to right and top to bottom. So, notice here,  first, I've created the DataFrame, and here,   I'm referencing the DataFrame. I'm on the  right side. Here, I'm below it. But if I copy   and paste this here, I'm going to end up with  a hash value. I get diagnostics pop up here,   and it tells me "name 'df' is not defined."  That's because Python Cells calculate   from left to right and top to bottom. I  would have to bring my DataFrame here   for everything to calculate fine. So,  I'm just going to demonstrate that. I'm   going to Ctrl + X and Ctrl + V this,  and now, everything calculates fine. Now, let me show you some cool examples from  the default libraries that we have. So here,   I'm using pd.melt function to restructure this  data set. So, I have quantity and sales in two   separate columns. When I use pd.melt, I  can bring them in the same column. So,   this is similar to "unpivot" in Power Query.  And look at the formula bar here. It's very   simple to write this. I just need to mention my  non-value columns, which are date and product,   and my value columns, which are quantity and  sales. And it does the whole restructuring for me. Now, another cool one is query. Take a look  at this. I'm going to go into Python mode,   reference my data set, which is df, right? That's  my DataFrame that I uploaded to Python. And I'm   going to use query. Then, I'm going to put  this in. What this does is it takes a look   at the sales column. It grabs everything  that's above 2000 and takes a look at   my product column to see if anything contains  the word "black". So, this is similar to the   QUERY function in Google Sheets, which is also  similar to the syntax from SQL. When I run this,   by default, I get a DataFrame back.  But I'm going to show the Excel values,   and I get everything that has the  word "black" and sales is above 2000. Now, let's move on to something else. We can  create charts like this. I have small multiples   returned as a single object. Take a look at  this one. This is a bit more complicated,   but I also wanted to show you that you can write  longer scripts and get Python to do what you want.   So here, I just have dates and students from 2018  to 2022. And what I wanted to do here was this:   I want to return the last four years, and  I wanted to arrange the charts beside one   another. I also wanted to make sure that  the y-axis is showing the same values,   right? So that I can easily compare each of these  with one another. Now, I've even added labels to   this. And you can do all of that in Python here. You can set the title, set the label,   and so on. And yeah, I didn't do all of this on  my own. I didn't know I had to flatten the axis,   so I asked ChatGPT for help, and it  helped me get what I wanted right here. Now, this is my favorite part of the video. What  we're going to do is this: We have an Excel table,   and we have some feedback in each row. What  we want to do is grab any URLs that we have   inside each row. The tricky part is that  they might be in the middle of the text,   and the pattern isn't so obvious. So, sometimes  they start with www, sometimes we don't even   have that, they just end with .com. Sometimes  we have https and some might just have http.   Some might not even have a link. What we want  to do is just grab any URLs in there. So here,   you can see we're grabbing both of these  in the cell, and then we're omitting this   one because it didn't have any links. Then  we have this and this. How can we do this? The great news is you can import  any Python libraries or modules   that might not be there by default. So,  the ones we used are there by default:   the Pandas library for data analysis,  Matplotlib for data visualization.   But Python has so many different libraries  and modules. There are ones for AI, for web,   for regular expressions. So this RE that I'm  importing here works with regular expressions. All   I have to do is go to Python mode and import the  library or module, and I've returned "re loaded."   That's just because I want to see something  in the cell aside from none, which is what I   would see if I don't have this. Everything still  works, but I'm going to see none or 0 in the cell. Right, so I'm just going to press Ctrl + Z  to go back. Now, after I imported the library,   what I did is create a data frame  based on the table. And the way you   do that is similar to before, except  you just have to make sure that you   have the right referencing for the table  that you are including everything. So,   I have "#All" to make sure that I'm also including  the header. Now, when I hover over the DataFrame   here and click on the card, I can see that  the column header is a part of my DataFrame. Now, the next step is to write some Python code to  get this done. What is done here is that I've   combined the patterns using this. This is based  on the patterns that I can see in my data set.   This makes sure that anything that ends with  .com is included, that anything that has https,   http, or www is also included. Then, there  is a function here that grabs the URLs,   and this function is run on each single cell, so on  each single text that I have in my DataFrame. You can also see the syntax here. Wherever you  come across the hash sign, this means that this   is a comment. This is a good way of also testing  your code. You can comment things out to see what   is the last expression that's returned or just  add in some documentation to your code. Now,   the end here, we're just filtering out any  rows that don't have any URLs. Now, the great thing   is because it's referencing the table, it is  dynamic. So, if I go in and add another URL,   so let's say youtube.com, this is going to  be automatically added to my URL list here. Now, if you're wondering whether you can connect  Python to your Power Queries, yes, you can. You   don't need to have the data physically loaded in  your workbook. They can be there via Connection   Only, and you can create a DataFrame if you want  based on these and use Python code on them. Now,   this video is getting super long, so I'm  going to leave this topic for a future video.   As you can see, I'm super excited about this.  It just opens up a whole new dimension in Excel.   What do you think about it? Let me know  in the comments. Thank you for being here,   thank you for watching, and I'll  catch you in the next video.