Transcript for:
Downloading ERA5 Data and Creating Time Series

Hello there, welcome to my YouTube channel. Today I'm going to show you how to download ECMWF data, specifically the ERA5 data set, and how to create time series from the NetCDF files, because one of our YouTube followers requested this. I'm pretty new to this data set, so let's work out together how to download it and create the time series.

ERA5 provides hourly estimates of a large number of atmospheric, land and oceanic climate variables. The data cover the Earth on a 30 km grid and resolve the atmosphere using 137 levels from the surface up to a height of 80 km. ERA5 also includes information about uncertainties for all variables at reduced spatial and temporal resolution. We are going to download this data at hourly intervals.

If you just search on Google for ERA5, E-R-A-5 (ERA5 is the fifth generation ECMWF reanalysis), you can click on any of these results. If you click here, on the right side, "Download ERA5 from Copernicus", it will take you there. From here you can select any of these, because these are the reanalysis data sets, and we're going to download the hourly data. So here: ERA5 hourly data on pressure levels from 1979 to present. We'll have different pressure levels from 1 hPa to 1000 hPa, and different variables, so we can download from here. Let's try it this way. I'm clicking on this one, and here you have the description and everything. If you click on "Download data", you need to make some selections. You can see the options for product type: reanalysis, ensemble members, ensemble mean and ensemble spread. You can tick all of them, but you won't be able to download all of it, because they have a limit
to download. Here you can select any variable you want: divergence, geopotential, anything. Velocity too: the U component of velocity, the V component of velocity, vorticity, everything. Relative humidity: sometimes we need relative humidity, or we need to calculate it in order to calculate evapotranspiration. For this case I'm keeping it simple and just showing how to download, so I'm clicking on temperature. Then you have the pressure levels from 1 hPa to 1000 hPa. I'm selecting only three, because otherwise the total number of data items will be more than 120k and we won't be able to download it.

Then you have the year options. I'm selecting 2020, or you can select any other year, even the latest one. So 2020 and 2021, because it should give us data up to the present day, hopefully, I don't know. Then all months, all days and all times, because it is hourly, so the whole variable. You can select the entire area, the whole available extent, or you can specify a sub-region. For the USA I'm going to select a region like this: for east and west, something like minus 20 to minus 1 0 8, and for north and south, 60 and 20. That will cover the USA. I'll show you how to plot the extent later. Then you have the format option. I'm going to use the NetCDF format; even though it is marked experimental, that format is pretty easy to extract from.

Once you're done, you can submit the form. If you select another variable or more years, see, it's still working. If I select one more year here, 2008, it's still working. Let's see if I can select another year, say 2017. Now it's not working, because it says we have more than the allowed number of items, more than 120k. So we have to be careful
about that. Now you can submit the form. When you submit it, it gives you this option. I tried it previously, and you need to create an account there. Since I'm already logged in, they're just processing the request and showing its status. It is queued; in a short moment it will show "in progress", and then it will take time before it shows the download. The last one took about 22 minutes to show me the download, and if I click there, it's about 5.47 gigabytes. I have already downloaded this data and put it here. Let me show you where I have it: ERA5, here. I have this temperature ERA5 NetCDF file, which is about five gigabytes.

I'm going to create time series for these stations. Let me open the stations file. I have 100 stations, from 0 to 99, and these are their locations, longitude and latitude. If I show them in my GIS, you can see the locations here. I'm going to extract the time series for these stations. If I zoom in, they are located along the coast of North Carolina, USA. So I'll extract the temperature time series for these red stations.

How can I do that? We have our data ready here as NetCDF, and I'm going to pass this CSV file with the list of stations and their locations, and then I'm going to store everything inside this time series folder. Or I can delete this folder; I'll create it automatically. To do that I'm going to use Python. You can use any programming language, the logic will be the same; you can do it in MATLAB. Usually I use Spyder, but I'm writing this code for the first time. I have similar code from before, so let's copy and modify it, and I'll show you how to do it so you can follow the steps. That's why I'm going to use a Jupyter
notebook here, because I can run it cell by cell and see what is happening. Then I'll put the code together as a complete script and run it. So let's try it together. I'm going to open it here; let's see if it works or not.

What do you need? I'm importing os, the operating system module, and then the projection module if you need it. These are the modules you need to process this data. The most important module is netCDF4, and you need to import Dataset from it, because this is the method we're going to use. Since I'm going to do some processing with NumPy arrays, I'm importing numpy, and I'll do some processing as a data frame, so I'm importing pandas as pd. I'll deal with dates and times, so I'm importing datetime as well, and I also need to import timedelta directly, because I'll use that one too. These are the things we need to import. The warnings part is optional: if something is off, it will give us a warning message, and I'm just suppressing those, so I'm setting it to ignore.

First I'm going to read the data and see what variables are there. I'm starting from the beginning, clearing everything and restarting the kernel. For reading an NC file you have to use the Dataset method from netCDF4. If you just import netCDF4, that is also fine, not a problem; then you have to write netCDF4.Dataset to read the file. To read any NetCDF file you have to specify its location if it is not in the current directory. So I'm specifying the location where I have the temperature ERA5 .nc file, and then
I'm reading it in read mode. Since I'm only reading, I'm putting 'r'; otherwise you need to provide 'w' if you want to write something. If I run it, we have this data object. If I want to see the variables, the command is data.variables.keys(); since data is an object, this function gives you the variables. These are the variables we have: longitude, latitude, level and expver, which is the experiment version variable, then time, and t, which is the temperature variable.

Let's see what we have inside time; we have to know that. We can check the time variable, and that's why you need to know the variable names. If I print it, this is the time, and its unit is hours since the first of January 1900. It has the shape 15,070, so 15,070 time steps, and these are hourly time steps. If you want to see the time values, I can't show all of them, so I'm showing only the first 10. These values are the hours counted from the first of January 1900. But our data starts at some later time, and that number is the hour referenced from 1900, so we have to find the actual date. We have to write the code in a way that creates our starting date based on that reference, the first of January 1900. We'll do that shortly.

For temperature, our expected variable, let's check what it has. The scale factor doesn't matter here. The unit is Kelvin, it is temperature, and it has this shape. The first dimension is time, then you have to pass expver, then the level, then the latitude and longitude, and then
you'll get the values of temperature for those indices. Let's see what we have for expver. It is dimensionless and has shape two. What are the values? You pass a colon, which means all of the values, and I display them: we have only 1 and 5, two values. What about the level? These are the pressure levels: I have 1, 500 and 1000, so three pressure levels.

So if you want to export or display the temperature data for one time step, you have to pass the index of the time. If you want the data for the first time step, the beginning (we don't know yet what date it starts on), the index should be zero. For expver, taking the first value, it should also be zero, because Python uses zero-based indexing. For the level, if you want the data for 1 hPa you have to pass zero, for 500 hPa you pass one, and for 1000 hPa the index is two. Then you have to pass the latitude and longitude indices, and we'll find those based on our location; I'll show you how to do that.

When you first get your data in a NetCDF file, you have to explore the variables: their names, their shapes and dimensions, and their long and short descriptions along with their units. Then we can go on to our data. Here you can also export the unit of temperature; the command is data.variables, then 't', then .units, and this gives you the unit of this data, Kelvin. If you ask for the unit of time, it gives you the unit of time. We'll use this to generate our reference time, and then we'll use this number; this number of hours
is counted from the year 1900, so we'll find the date based on that number from this year. How can you do that? I'm testing my code here. I'm importing datetime, and here is the reference date. The function we use is datetime.datetime, and I'm passing the year as an integer. I already extracted the unit, so I have this string, and I want to get these values out of it, so I have to provide index limits. Counting the characters: zero, one, two, three, four, five, six, seven, eight, nine, ten, eleven, and the first digit of the year is at twelve. So it's twelve, thirteen, fourteen, fifteen, and to extract the year we have to slice up to 16. Because it is a string, I'm converting it to an integer. How do you know it is a string? Just print type(unit), and it tells you it's a string. So from the string I'm converting to a number, an integer. Then I'm passing the month, which is 17 to 19 if you count it out, and the last one is the day, which is 20 to 22.
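The slicing just described can be sketched on its own. This is a minimal sketch, assuming the time units string has the form "hours since 1900-01-01 00:00:00.0"; the string literal and the first_hours value below are illustrative stand-ins for what the file would report, not values read from the data used in the video.

```python
from datetime import datetime, timedelta

# Assumed units string; in the notebook it would come from the time variable.
unit = "hours since 1900-01-01 00:00:00.0"

# Fixed-position slices: year at [12:16], month at [17:19], day at [20:22].
reference_date = datetime(int(unit[12:16]), int(unit[17:19]), int(unit[20:22]))

# Hypothetical first time value: the number of hours between
# 1900-01-01 and 2020-01-01 (43,829 days * 24).
first_hours = 1051896

# Adding the hours to the reference date gives the calendar date.
start_date = reference_date + timedelta(hours=first_hours)

print(type(unit).__name__, type(reference_date).__name__)
print(reference_date, start_date)
```

Fixed-position slicing only works while the units string keeps exactly this layout. If the layout might differ, splitting the string on spaces and parsing the date part with datetime.strptime is a less position-dependent option.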
So then I get the reference date. Here I'm creating the datetime for the reference from this unit. If I run it, it gives the same date we saw, but this time it is not a string. Previously the unit was a string; now what is the type of the reference date? It's datetime.datetime. So it is not a string, it is a datetime object, and we'll use it to make the time series.

Now, what is the starting time from this value, which is the hours since 1900? How can I get that date? If I use a colon on the time variable, I export all the time values, all 15,070 of them. Let's export that. Done. Then if I want the first value of time, I use times[0]; that gives the first value. How can I get the last one? With minus one, which gives the last value. That is not the starting time; we need our starting time, and basically we need to loop through the times and create the dates. We can easily do that. I have this other thing here because I was working on something different; forget about that one.

So I have to create the starting time, and this is the way. Here, the starting date: I'm creating it from the reference date. I'm adding to the reference date a timedelta, passing the first value of the time as hours, and that gives us the starting date. Let's see what the starting date of our data set is. That's
the starting date: the first of January 2020. Since I know my starting date, I can now create the times, and based on the time I can extract the temperature value or any variable from this NetCDF file. That's the trick, and I just tested it. If you use timedelta and your time is in hours, you have to specify hours; if you have days, you have to specify days equal to one, two, three, whatever it is. If you want to know the date 30 days after today, or after any specific reference date, you specify days equal to 30. If you call now, it gives you today's date and time, and if you want a later date, just use this formula: today plus datetime.timedelta with days equal to 3 gives you three days later, 30 gives 30 days later, whatever number of days you have.

So now it's time to create our data. By this time we know what our data looks like. I'm cleaning everything up and starting here, and I'm going to modify everything. First I'm copying what I have. It may not extract everything because of a memory issue; if not, I'll use my Linux machine. Let me explain what I'm going to do. First, I'm opening the data set, as I already told you, and then the output directory, in the same folder I have here. (I have other processes running and I'm checking their status at the same time.) So I have this location, and inside this folder I'm going to create the time series folder, named temperature_ts. The code checks whether this directory exists; currently it does not, so it
will create it. That is the trick I apply. Then I have the station file, this validation stations file; the locations are in there, so I'm specifying that as well. (The other job is in progress; I'm just giving you the update.) I'm reading the file, setting the first column as the index, and then I have the stations. Let's print them to see what we have. So I have these stations.

Now I have to loop through the stations one by one and create a CSV file name based on each station. To loop through the data frame row by row, we can write it this way: for key, value in station_df, which is the data frame, then .iterrows(). That is the function to use. If you print the key, it gives you the value of the index, because in my file the first column is the index. So key is the index. If you want the station, the index into value is zero; the longitude is one and the latitude is two. See, it prints everything.

This is how we loop through each of the stations, and then we create the file name. The station will be value[0]. For file_name I'm going to use a formatted string with .format: I pass the station and then '.csv', and I have the file name. So I'm creating the file name first, and then I'll create the time series and save
it. I don't need to print all of that, so I'm changing it to print only the file name. Here it is: it gives you a file name for each of the stations, and we'll save our time series data under that name. That is pretty convenient.

What next? Once we are done with the file name, we have to use our location. We already know the longitude comes first, at index one. So long_point, the longitude of our station, will be value[1], and lat_point will be value[2]. Now we have our latitude and our longitude ready.

Then we have to export our data. What variables do we need? We need our latitude, our longitude and our time. So I'm going to copy this line. Here is the line where I'm reading our NetCDF file, and I'm exporting all the times this way. To export the latitude, I use variables and pass 'latitude'. Let me check the name of the variable: it should be the exact name, so it's better to copy it than to type it, otherwise it will create problems. Done. I'm copying this line for longitude too, and this time I just copy the variable name so there won't be any problem. So I have all the latitudes and all the longitudes. Once everything is ready, we can get to the main code. We can also export the
unit. We need the units because we'll use them later. Here, unit is the unit of time, and I'm also taking the unit of our temperature and calling it unit_t. So the unit of temperature is unit_t and the unit of time is unit.

Now it's time to loop through the stations. I'm done with this step: I have the station name and the point for my station. Now, since I know the location of my station, its latitude and longitude, I have to find the closest point in our NetCDF file. How can I do that? I can take the squared difference between all of the latitudes from the NetCDF file and my latitude, and between all of the longitudes and my longitude. I use the colon there to get all of the values.

I have sample code here that I tried previously, so I'll copy it and then explain it. This is my previous code and I use the same notation. I'm making another array called the squared-difference latitude: the latitudes from my NetCDF file minus my lat_point. It takes the difference and squares it, because there will be some negative and some positive values. The same is done for longitude. Next I'm going to find the index of the minimum. The location of my station will be close to one of the data points I have, so I need to know that index. So
the array's argmin function gives you the index of the minimum value of this difference. If you want the index of the maximum value, you can use the argmax function. Why do we take the index of the minimum? Because the grid point with the smallest difference is the point closest to my point, and we may not have a temperature value exactly at my point.

Think of it this way: these data are gridded. Say I have a point here. I take the squared difference between this grid corner and my point, this one and my point, this one and my point, and this one and my point. The distance to this corner is the minimum, so I take this grid point as my point and extract its value, because the data set may not have a value at my exact location. If my point falls exactly on a grid corner, that's perfect; otherwise we use the nearest one. So that is the idea: we have gridded data, and I select the closest grid point as my point. That is the trick I'm applying here.

Once I have the index of the minimum value, I can use that index to export the data. Before that, I need to create the times. Now, this line: I'm going to generate the time series for each station. I'm looping through each of the stations, getting the location of the station, taking the squared difference
between that station location and all the latitudes and longitudes, and getting the index of the minimum value, the closest point on the grid. Then I have the reference time, built from the unit the same way as before. Then the date range: I'm creating an empty list for it, and another for the temperature data that I'm going to extract. I already exported all the times, so I have 15,070 values of hours from our reference time. I'm looping through each of these and making the datetime. The datetime is the reference date, which we already know, the first of January 1900, plus a timedelta with hours equal to the integer value of the time. When I use the enumerate function, it gives you the index and the value, and the value is an element of the times array, the number of hours. I add it, which gives the date, and I append it to the empty list with .append.

For the temperature data, I export from the temp variable. I need my temp variable, all of temp; if I use a colon it gives me everything. The data is five-dimensional, so it needs the indices. The first index is the time, so for the first time step it should be zero. Then expver, which should be index zero, the first value. I exported the temperature for three pressure levels, 1 hPa, 500 hPa and 1000 hPa, so I have indices zero, one and two. I'm going to take the near-surface temperature, that is the 1000 hPa level. Then the minimum index of the latitude and the minimum index of the longitude. That gives us the value for the first time step, the first hour of the first of January 2020, because we have data for 24 hours,
hourly data. So it appends, and it populates the empty temperature list and at the same time populates the date range list, starting from our first time step. That is the trick.

When it's done, let me quickly print the date range before saving to file. Maybe it will just show out of memory, because it does all this processing and needs a lot of memory; the data set is huge, and it's reading everything.

So, in theory, when that is done, we save. Since we have our date range, we create a data frame from it. I'm naming the column datetime and setting datetime as the index. Then I add the populated temperature data to that df. I'm naming the column temp, with the unit inside parentheses, from unit_t, which gives us K, Kelvin. Then the command is df.to_csv, and I'm saving to my output directory. That's why at the very beginning I specified out_dir: if the directory exists it won't create it, otherwise it will. So we are done with this part, and it will save the file.

If you want to plot this data, we can do that too; I have similar code. Usually when I export any data, I also plot it. Here I'm plotting with the title
atmospheric temperature at that station, which will be the heading. Then I'm using figure, and for that you need to import matplotlib's pyplot as well. If you want to see the plot inline, there's matplotlib inline, but we don't need that here.

It's taking too much time. Maybe it's going to show out of memory, because it needs a lot of memory and computation. It has already created the output folder, but it's empty, because I haven't finished running this yet. When I run this code on the Linux machine, it shows each plot for two seconds and then closes it automatically and goes on to the next one; that's why I put that in. These are normal plots, if you already know plotting: I'm declaring the figure size, A4 size, then the title, then the x label as time, the y label as temperature with the unit, and the tick_params size, which you can change.

So it's showing this error: unable to allocate 21.8 gigs. That's why I'm going to run the same code on my Linux machine. Here is the code I have; it's the same code, but for the Linux machine I have to specify the output directory as this one. Here is the data: I stored the ERA5 temperature data here, the same file, and this is its location. I'm going to delete that old output. Then the code is specifying the data set and reading it; everything is the same, there is nothing different. Then it will show the plot and save.
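The nearest-grid-point lookup that the script relies on can be sketched without the large file. This is a minimal sketch under assumptions: the 0.25-degree grid, the station coordinates and the small constant-valued array are hypothetical stand-ins for the latitude, longitude and t variables in the NetCDF file, not the values used in the video.

```python
import numpy as np

# Hypothetical 0.25-degree grid, standing in for
# data.variables['latitude'][:] and data.variables['longitude'][:].
lat = np.arange(60.0, 19.75, -0.25)     # 60 N down to 20 N, 161 values
lon = np.arange(-120.0, -69.75, 0.25)   # 120 W to 70 W, 201 values

# Hypothetical station location (not taken from the stations file).
lat_point, lon_point = 35.2, -75.6

# Squared differences are never negative, so the smallest one
# marks the grid coordinate closest to the station.
sq_diff_lat = (lat - lat_point) ** 2
sq_diff_lon = (lon - lon_point) ** 2
min_index_lat = int(sq_diff_lat.argmin())
min_index_lon = int(sq_diff_lon.argmin())

# Small stand-in for the temperature variable with dimensions
# (time, expver, level, latitude, longitude), filled with 280 K.
temp = np.full((4, 2, 3, lat.size, lon.size), 280.0)

# All time steps, first expver value, third level, nearest grid cell.
series = temp[:, 0, 2, min_index_lat, min_index_lon]

print(lat[min_index_lat], lon[min_index_lon], series.shape)
```

Indexing one grid cell across all time steps in a single slice, rather than loading the whole variable into an array first, is one way to keep memory use small. With netCDF4, applying the same index expression to the variable object reads only the requested values from the file, which may avoid the kind of allocation error shown above.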
Let me run this code. To run it, since it is Python, you call python and then the name of the script, the ERA5 one. It will show the temperature plot and everything at the same time. Let's see if it works; it should work. It's reading, it's running. Maybe I don't have a free processor; let's check whether it runs. It should run, otherwise it will show the same error, but I have run this code on even bigger data sets than this. I'm running this on my HPC, and this HPC has 256 gigabytes of RAM and 64 processors, so it should definitely run. See, it's generating, it's creating the times.

Let me switch my window and share it again. On that window, see what it creates: that is the plot. Every time it plots, it creates the next one. This is the temperature for each of the files it is creating, and it's showing this message. That's why I usually extract and plot together. I'm going to switch again, because I have three different monitors: I'm processing on this one and it's plotting there. If I switch back to my previous monitor, see, it's still running, plotting and showing.

Let me show you the output directory, ERA5, the temperature folder. See, this number is increasing; one by one it's saving all the CSV files and plotting. And I can show you what it is generating: this is the time series we have, from the first of January, with temperature in Kelvin. It's huge because we have hourly data. We have fifteen thousand and
seventy rows; it shows 15,071 because the count includes the header. We can plot it here as well, and it gives the same type of plot. Here it is. This plot is not that good-looking, but it's similar to the other one.

So that's it. This is how we can download the ECMWF data, ERA5, which is basically the fifth generation of this data set. Then you can use Python code similar to what I just demonstrated to extract time series for any station you want within your data range.

Let me also quickly plot the spatial extent of the data, so you'll understand what I told you about the spatial location. I'm plotting the temperature from the data I have here, and it shows the areal extent, the spatial extent it covers. It's covering the USA, part of the USA, and you can see I'm exporting data from this location, because this is North Carolina here.

So if you want to download and export this ERA5 data, you can do it this way. I'm going to finish up here. If you like it, or if you have any query, you can ask and I will try to answer your question. Thank you very much for watching, and thank you very much for supporting. Bye.