Transcript for:
Managing SharePoint Files with Pandas

what up guys what up what up guys i am lou and uh we got another video guys another another video this is from a request from a viewer and ultimately guys we're going to be talking about how do we download a file let's say from sharepoint in this case going to be sharepoint but this really applies to any other source doesn't matter download a file from sharepoint we're going to save this in memory we're going to modify the file and in my case I'm going to use pandas to modify the file. After I finish modifying the file, then I'm going to go ahead and take that updated version and save it back to SharePoint. So all we're doing is pulling the file that we want. We're going to modify it and then push it back. And not everything's being done in memory as well. So it's not like we're downloading it, saving it locally. Then we got to read it. put in memory to modify and then we save it locally then upload it again we're not doing all that so this method would work would be very ideal especially if you want to publish this code in a um you know aws lambda or azure functions or things of that nature right so but but guys but but before we get started hit that like button give me a follow give me a follow as well guys on twitch So I've been putting some effort on Twitch. That channel, I'm going to be talking more hardware related tech. So in this channel, programming. On my Twitch channel, it'll be more hardware. I'm going to be reviewing two mechanical keyboards this Friday. I'm reviewing different stuff. Headsets, mouses, keyboards, GPUs, motherboards, the list goes on, right? So I'm very hardware more specific. But go ahead and give me a follow there, man. I appreciate it. Uh, let's get started. Alright guys, so... One of the things that, um... So, we're not gonna start from scratch. you're right because i already got code that's going to do a majority of what we're trying to do so there's no need to start over from scratch so i'm going to be using this office 365 api file that i put together you could access this through my um github account so go ahead and find imlu coding and there's a repo called python sharepoint office 365 api So if you take a look at it, you'll, you know, pretty much this is what I have. You can just copy and paste this. If you need help on how this should be configured, your environment variables, go ahead and look at the video. Here's the video right here. If you go to my channel, I am Lou. There is a video that is called Python download SharePoint files part one. okay so i have a multi-part series you can tell you got part two part three of other stuff but this one is downloading files from sharepoint pirate one this part goes very detailed when it comes to the environment variables how to configure kind of how to get that configure how does the the site url looks compared to like my site um the site name all of this stuff right goes very detailed so take a look at that guys if you're not familiar with it and that video will kind of give you some guidance over that piece nevertheless if you are familiar with all of that then just literally copy and paste this in your project this is what we're going to be using um but then of course i'm going to have another file this is where i'm going to be downloading i just call it download and re-upload so we're downloading we're going to modify it and we're going to literally push back that's that's what we're doing here right so for this if i go back to my um My repo, I have, if you go to the project under examples, click on download files. Pretty much I'm going to take this, but I'm going to tweak it, right? Now, there's a lot of stuff in here that I need because it's exactly what I want to do, right? So let me go ahead and copy it, but I'm not going to use it as is. As you could tell, if you take a look at this, it is saving it. i don't want to save it right like we don't want to save it nowhere as we downloaded we want to keep it in memory and then do whatever we need to do with it and then push it back so there's no need to save it anywhere and you can tell i have arguments here where it says argument 2 is pretty much doing a local remote folder destination well this is where i want to save it locally or i'm not saving it locally so this could actually go away Right, so now we have for arg2, then arg3. So I do need to change my values to arg. and two and argument three so these i do want right as you could tell first argument is sharepoint i'm going to specify the sharepoint uh pass so here's one of my examples youtube 2022 and then course I have file name this could be set to none or you could provide if you want a specific file that you want to download right you don't want that on all the files maybe just want one file so our case we're actually going to use this because we didn't want to also the one file specifically um and then of course you have patterns you know if you are let's say you're not going to download one file but you're looking at a folder with many files and and you're trying to access a specific pattern you can do it that way as well and it will give you the list of files of the pattern but again we're going to focus on one file in our case we don't need the save function because we're not saving nothing this is going to go away you can tell we have um you know error here but we'll fix that but ultimately this is going to go away because we were now gonna save it nowhere right and all of this could stay this could stay this is our normal so all of this could stay right these are just some tweaks that then we're making there's a few things that we're gonna have to bring in so in my case I'm gonna be modifying that the goal here that I'm gonna do is I want to download an excel file specifically from SharePoint so because I'm doing it and I'm gonna put it into a data frame right pandas data frame so because of that make sure to install pandas i already have it installed in my virtual environment um and then i'm going to go ahead and bring that in so i'm going to call this import uh pandas as pd all right so that's number one another thing i need to bring in here that this this you know the code that copy that they have I'm gonna use the IO package and there is a bytes IO class that I'm gonna be utilizing and ultimately that's where as I bring in my car content i'm going to put that i'm going to end up uh package that into in memory and to do that use bytes io class so that's kind of what i'm going to be doing here and i believe this is it guys i believe this is that well i don't need the pure pass because i'm not reading nothing right i'm not reading no local directories or whatever so that could go away this is it so now let's go ahead and uh let's add what we need right that's the other thing now we need it what what are we going to add add so the first thing that i'm going to do is since we know we want to get files well we have a function already here where again because we downloaded it from the download.py file so there they function here to get files and could tell it's calling the sharepoint class which is up here and then it's calling the download files function and then ultimately it's able to download it right now i'm getting back my file object so that that's all good that's going to stay but i now i do need to upload files right as i download files i'm going to be modifying it then i'm going to be uploading them back so let's go ahead and create that class right quick and we're going to call this upload upload file that's what i'm going to call it right upload file so in this case this is going to take a few arguments it's going to be a file name so this is the actual textual file name whether whether it be sales report dot x y uh uh sx right is the excel file extension because i'm dealing with files if you're dealing with csv it'll be you know sell the report.csv you know whatever that name is that's what we're going to have here with the file extension as well uh the next thing that we're going to specify is going to be the folder name or the folder name is where where do you want to upload it to right in our case we're going to upload it to the same folder but it doesn't have to be like we could specify a whole different different folder to upload it to you but again rk same folder and then the final thing is going to be the context this is going to be the actual data like the file data itself that we're pushing on over and since the sharepoint class that i have here i really have a function for uploading which is called upload files i'm going to end up calling that function so in this case it would be share point oops uh sharepoint upload file you can do it in chunks but in our case we're gonna stick with just upload file like the whole file all at once then i'm gonna specify file name all right and then folder name um and then you the context it just happened to be that these these are the same name well uh yeah no context so if we take a look at this this function you could tell we have file name folder name context happening the same name so coincidence but it doesn't matter that they're the same name because i'm passing them in the same placement right so even if these this will call new file name um folder only or maybe data again it's it's in the right placement so that's all that matters uh really this is it though guys this is all you need to pass here the function this function will upload files back to where they need to go which is is pretty easy right again taking out what we already have we're doing some tweaks to it that would get the data back to SharePoint so we have this function to download a file for SharePoint we have the file here to upload the file back to SharePoint so now what we need to do is we need to create a function to let me make it a bit bigger because I think it's kind of small there it goes so now what I need to do is create another function and function is going to be more so to modify the file right do something with the file there's some kind of manipulation that we're gonna do to it's out for now I'm gonna call it modify file and this would be whatever action you're doing right if you're downloading a file maybe you're doing some sort of cross reference of some sort where you're you're iterating over the data by email and maybe you're cross-referencing that email by another data set to retrieve back some kind of identifier and then maybe you're going to add the the identifier to a new column i mean get the list goes on or what you could do with it we're going to do something basic to kind of give you the concept and the idea so in this case we're only going to have one um one argument which is file object so again if i go back see where we get our get file we get back this file object well this file object is going to get get passed into this function which is called modify file that's pretty much what we're doing here right so this is where as we get back that context from um sharepoint this is where i'm going to now utilize my uh bytes io class and i'm going to end up passing in this file object so by me passing it to the class class ultimately what it's doing is now i'm i have this data set this data object um that i could read now like i could pass this in through open py excel if i want to like read it through that package in my case i'm going to read it through pandas which is ultimately using open py excel so keep in mind if you're usually doing something like what i'm doing like with pandas and reading excel files that is a package that you're gonna have to install as well so if you're not familiar with open pyxl you know that is a package that you're gonna have to install um but that's kind of what i'm doing here first so once i do that then what um what i'm gonna end up doing is before i even read the data i want to be able to get a list of worksheets in the file right so this is the kind of this is key uh i'm gonna call this worksheet list and this is gonna call my panda and there is an excel file class which ultimately takes in a data set like takes in an object pretty much in this case it could be the file directory pass like if you're working locally it'll be your c drive you know your folder name then your file name right your full pass pointing to your folder your file and ultimately this will read that file in our case where we we're dealing with a an actual object um a um that we're going to read it we're pretty much dealing with in-memory object so it's going to react the same way they're going to read it no different if you're if you're reading it locally and then we're going to call sheets name so sheets name is going to ultimately give me back all the sheets in the workbook uh in the list which is why I called this worksheet list so if you have an excel file that has I don't know maybe ten sheets but you only want to modify one of them then you're gonna have to specify which one you're going to modify in our case um we're only dealing with a workbook with one sheet right so just kind of FY I'm going to make another video later on how to handle workbooks with many worksheets that you're only going to modify one because it's slightly different but I'm not going to do it in this video that's a whole different video. All right so once I specify to be able to get my worksheets list now what I could do is I could go ahead and read my in-memory data into a data frame. In order to do that I'm going to call this data frame and it will be panda read excel and then see where we have our io file pass or buffer bytes again it takes one or the other it takes a a file pass slash bytes which is ultimately what we're doing here in my case is going to be we're passing a bytes object which i call data so i'm passing that in one of the other arguments see what's the sheet name right so for sheet name i'm going to call my worksheet list index zero i only have one sheet in here so i'm specifying the first sheet so for me i could just leave this blank and that will be fine because by default it's going to read only the first worksheet anyways right but um you let's say you weren't you didn't want the first sheet maybe you wanted sheet number two that you want to specify maybe index number one right because again index zero is really the first sheet and then one is the second sheet you would have to do that so i'm just doing it more from um just kind of give an example if you want to pull other sheets so this is going to be the first sheet put the information into a data frame which is exactly what we want um do some then in here I'm gonna I'm gonna I'm gonna I'm gonna I'm gonna call this apply mod of Apply changes. So this is where I'm going to do something to it, right? I'm going to be changing something. So in my case, I'm going to create a new column. I'm just going to literally call this new column for testing so we could see what it is. And then I'm going to call this. testing new column that's something basic again in your case this would be completely different right maybe you're adding two columns together you get a total uh maybe doing some logic based on if it's greater than 30 you know change the value to this or whatever right the list maybe you you want to iterate and and change all of your values to uppercase values instead of lowercase values i mean again the list goes don't know what you're trying to do but you're trying to give the concept so once i do that something simple right nothing crazy we're just doing a basic change um this so the next step is um this step will uh now create the Excel object in... damn it man I don't know cannot spell today... in memory. So now we're going to be the next step is pretty much what as we make changes as you could tell what we did so far we read from a SharePoint site. We have our context. We pretty much package this context now into a bytes object. And then we're now able to read it into a data frame. Now we modify the data. data frame well now we want to take that data frame and now we want to save it back but before we save it we can just save it directly to sharepoint you know the way like normally to save it you would just do like a save and specify local directory and it would just save there and that's you know so straightforward but now we want to save back to sharepoint so it's not that straightforward what we're going to have to do here is now we're going to create because again think about we have a data frame object we need to create that data frame object into an excel you in-memory object because we need it in memory right we want it to be saved in memory not locally and once we do that then we could take that object and now push it back to sharepoint right so that's kind of what we're doing here so in order to do that i'm going to create this this is going to be this is going to call be called output object that's what i'm going to call it and now i'm going back to my i o so i o bytes and in this case case is going to be blank. So this is just a, I'm creating output object. That's empty. There's nothing in it, right? that's all i'm doing it's empty for now all right so that's number one um In Panda, there is something called Excel Writer. So we're going to be calling that class. And ultimately, this is to create Excel objects is what we're ultimately doing here. So we have an in-memory object to begin with. And now we're going to be creating an Excel object. So what I'm going to do, I'm going to call this Writer. And this is going to call PWExcelWriter. And I'm going to... My... my see where you could specify your pass whatever but in my case i'm specifying a bytes object that's what i'm going to be specifying in here which is called output object okay so now that we have that done uh the next thing is we're gonna now write to our we're gonna um export out our data frame so the way you would normally do it in any scenario right if you're going to do it locally it would be like to excel and then see where it says um it will be to excel and you kind of specify your pass and ultimately that's pretty much it right you know you're done but in this case it's not we're not writing it to some sort of directory pass we're writing it to our our Excel writer, which ultimately our Excel writer object happens to be an in-memory object. So, I mean, hopefully that's, it's not too confusing, but just kind of, let's kind of walk through it, right? The next thing is, see where I'm going to call, it's called index. Index zero. So this is key. If you don't specify index zero, what that means is it's going to add a column and that column that it adds is going to be for to. show like the sequential index of all of your rows so i'll show you an example i guess i'll try let's say you have you have a worksheet that you have first column is name second column phone number and then it's age you well if you don't put index zero when i export it out you're gonna have those three columns but column the very first column though in your excel export it's not going to be name it's going to be um index and it's going to show a sequence of one two three four five pretty much listing them but an actual it's taken up which is going to be placed in column a in excel so if that's something that you want by all means you could leave it there like it will give you that most people i'm going to see them don't want want that so this is where you specify index zero to remove that get that removed so the next thing too is sheets name if you leave it blank what happens is it'll export it out and save it but you're going to save it as default which is sheet one well in our case we want to save it back to the same name that we got it right again maybe you don't but if you do this is where i'm going to now call my worksheet list index zero So again, this worksheet list is going to give me back a list of worksheet names. The worksheet that we're going to be dealing with only has one name anyways, right? So there's only one name. But nevertheless, that same name, whatever that name happens to be, I want it to, whenever we save it, you know, to Excel, I want it to have the same sheet name. If that's what you want, then this is what we would do. If you do not want that, you can just remove it, leave it blank, and the default will be sheet one. Just kind of keep that in mind. And once we do that, next I'm going to do will be writer save. I'm going to save the changes. And then it's going to be output seek zero. So this is what we're going to specify. Seek zero for our bytes object. Um... this is it so this is it what we just did right now we have this output object and that object which is in memory now think of it as it has our excel file in there with the same worksheet name with all of the data and all that stuff has everything in there. So now we could go ahead and use that object output, you know, output object, um, to push back to SharePoint ultimately. Right. Um, so now see how we have this function, which we call upload file. So now I'm going to end up calling that upload file. and then again it wants a file name in my case i'm gonna create i want it to be a unique file name for testing so i'm gonna call it um i don't know let's call it what uh testing modify excel file then the file extension because this is what we're testing now so i'm just again calling it that if it's going to be the same name that you had you could do that as well but i'm just calling something different so you kind of see the two right um Then the next thing is going to be folder. I wanted to save it under the same folder that we pulled the file from. If you wanted to go into a different folder, then you got to specify where in SharePoint you wanted to go to. But in our case, same folder, folder name. So keep in mind this folder name, it's actually coming from up here when it's one of our arguments folder name. And then the last thing is going to be, um, output object, right? What you're going to be this, but, but keep in mind, we have to read the context of it as well. So we're going to call this read. Method. That's that's part of this object. So it'd be the. object output dot read so this actually reads the data now it's able to grab all the context and everything and that's what's gonna get pushed so by us calling this function that we call upload file and then ultimately again it's passed everything into our SharePoint upload file class which is over here and this will know where to save it to you know to me past the context and do all of that I like to separate my stuff by functions which is why I could have easily added this to here but again it gets a little bit more tricky because the purpose of this function is to modify the file it wasn't designed to modify and upload a SharePoint nevertheless though that just the way you know how i've been i've been i've been the more that i have been trying to clean up my code and keep them separated and more manageable you know that just kind of the way i have done it this is it guys this is it like um oh my bad this is not it so see we're having this get filed function well we need to call this modify file function down here so this is now where we're going to call this it's going to be come on modify files then we're going to pass in our file object and argument and this is it as we pass this file context over to this function this function now I'm gonna take it you know boom now we have it where we'll be able to read it the reason why reading this is to be able to get the sheets name so that's kind of that's the part this is the only purpose why I'm doing this to be able to get the sheet's name that's it no other reason i get in the form of a list then this one this is actually going to read that file and put it to data frame that's what i'm doing here then of course i apply my changes whatever it is that i'm doing right whether it be modification adding whatever it is you do it you know i did it here then once once i finish with that then now i'm going to create another empty object that's called output obj that's it's um it's going to be in memory object but it's empty then that's where i start adding stuff to it which i you know using the excel writer then uh and then ultimately i start writing to it which is to excel starts writing to this object and then you know then i'll save it and then ultimately yeah that's it i'm done and then once i send this to my upload file my upload file function which ultimately all it's doing is passing this information over to my sharepoint upload file which is this function over here it's able to um save it under this name this is a folder pass where i want to save it to you then it's going to read the context that we have in memory so again everything i'm doing here is all in memory downloading the file it's in memory modifying it put it back in memory and then it's sending it back to sharepoint so now let's look at the file that we're going to be testing out so if i look at my terminal so if i look at my uh you know what you can see it there it goes so it's python and i'm calling this this file my first argument here is in sharepoint so let's take a look at it right in sharepoint so if we go to sharepoint we have data so documents data 2022 cells oops cells and then it's going to be this file right here i know the name says csv it's not a csv you could tell it's an excel file so if i click on it let's just take a look at it this is what we're dealing with look at our tab you know we have our tab name here and then we have our id column title name so on and so on right pretty straightforward and then what should happen when we run the process should save a new file in here under What name did we call it? Testing Modify Excel. This is a new name that it should show up if things work successfully. And it's going back. As you can tell, this is the second argument is when I'm specifying the specific file that I want to process. And then none is because if we look at argument three, it's saying patterns. Well, I'm not looking for files and different patterns. Now keep in mind what I have here for my code is more designed to modify one file specifically and one tab specifically. There's a lot of different scenarios where it could be completely different. You know, maybe you want to modify a worksheet. a workbook that has many sheets and maybe the sheet that you want to modify is sheet number 10. that would be slightly different maybe you're dealing with a scenario where you have 50 files in sharepoint you actually want to download all 50 and you want to be able to modify all 50 you know all 50 you're doing something to it you want to be able to modify and push all of those back again that would actually be slightly different as well so i'm trying to give at least the core the basic concepts of it but i am going to make some videos related to those two scenarios that we just talked about so there will be videos coming out soon related to that so now let's go ahead and test it out guys if i go ahead and run this code cross fingers hopefully no errors uh boom no errors everything looks good let's come to sharepoint if i hit refresh boom we got a new file testing modify excel file So again let's take a look at the original file. Original file these are our columns right that we're dealing with and remember on the new one uh let me take a snippet just so we kind of compare the two right so if I take a snippet of this so now let's go ahead and take a look at the new one. So we take a look at it. It has the same column name. This one slightly has a little format to it because it kind of does it by default. It's bold, but you have ID, title, name, amount, sales amount. So this goes all the way to column J. Well, in our case, we have column K. the reason why we have a k column because this is what we call it the new column and we added some some value here right so this this kind of shows you how it does have the new column that we added which again in my case with a new column in your case it could be um maybe you're adding these two values together and you're creating a new column i mean it could be whatever it is you're trying to do maybe you're moving stuff around right maybe you're like saying okay well i want to move this column this reference column over and column b and you want to shift everything over like the list goes on or what you could do with it but they're just a quick example of kind of what i did so again guys hopefully this helps man this was a request i got from a viewer um again guys any kind of requests you have send it over I do I do get a lot of requests keep in mind you have you get a lot of requests but I am gonna be put in effort every week to try to knock out a few these requests that I get in so It may take a little time. It just depends. But nevertheless, I do appreciate the support that y'all be giving me. I do appreciate all of that, guys. Hopefully, this helps. Again, and if you have any questions, man, just go ahead and leave it in the comments or email me at contactiamlew.net, and I'll try to get back to you as soon as I can. Again, guys, I appreciate all the support and all the love I've been getting, and I'll talk to y'all later. Peace.