Hello, good morning, good afternoon, and good evening. On behalf of Petro and Military Search Hub, I'd like to welcome you all to our fourth session of the summer internship program. My name is Mayar Tara, a senior gas and petrochemicals engineering student and an SPE Egypt Young Professionals member, and I'll be your moderator for today. Before we start, please keep the chat box professional and ethical, and don't forget to drop your questions in the Q&A section below.

So without further ado, let us welcome Dr. Ramy Alou. Dr. Ramy Alou is the technical product owner of upstream data at S&P Global Commodity Insights in the United States. He brings extensive experience in reservoir engineering, digitalization, research, and data science to his role. Prior to joining S&P Global, Dr. Alou was a research assistant at the University of Wyoming's Center of Innovation for Flow through Porous Media, where he earned his PhD and was awarded a patent for research on digitalization and automation. Dr. Ramy's research focused on fluid phase behavior at the nanoscale in unconventional reservoirs and the application of machine learning and artificial intelligence within the oil and gas industry. Dr. Alou also holds a master's degree from King Fahd University of Petroleum and Minerals in Saudi Arabia; during that time he served as a consultant at Saudi Aramco's Reservoir Characterization Division and as a teaching assistant at King Fahd's College of Petroleum Engineering and Geosciences. His earlier career includes roles as an assistant consultant at the Egyptian petroleum consultancy company and a teaching assistant at the American University in Cairo, following his Bachelor of Science degree in petroleum engineering from Suez University in Egypt. A recognized expert in his field, Dr. Alou holds machine learning engineer and data analyst certifications from leading companies such as Amazon, Microsoft, and Google, and he has an impressive portfolio of projects and publications addressing energy-related challenges through machine learning and data analysis techniques. Thank you so much, Dr. Ramy, for coming today; the mic is yours.

Thank you, thank you so much, Mayar. Is the sound clear? Very good, alright. Hello everyone, and again thank you for the introduction. My name is Ramy, and I will be chatting with you about machine learning. I was actually expecting a much smaller class; usually my lectures are a bit more hands-on and practical, and we will still do that, I'll just try to go a little slower. Maybe for the next session you could be ready with your computer, your laptop if you can, and try to follow along so you really get that hands-on feel. My belief when it comes to machine learning is that it cannot just be told: you can sit through hundreds or thousands of lectures and slides, but it's really about getting your hands dirty, as we call it. Hands-on experience is what really matters and what makes the real difference, and that's what I encourage you to do. What I'm sharing in this course is not a typical machine learning course where we go through the definitions and so on. That's even why I changed the course title; I think it was shared with you as "Machine Learning Applications in the Oil and Gas Industry," and that topic can be covered in one slide. I already have it, and it turned out to be just one slide.
We could keep talking about how and where you can use machine learning in the oil and gas industry, but again, I'm trying something with you in this course: I thought it would be much better if we made it more practical, if we really did something with machine learning in the oil and gas industry, and maybe even earned some money while doing so; you'll understand later. We will be exploring a public challenge, so it will be fewer slides and more hands-on coding: looking at the data, exploring the data, and so on. That's what we should be doing here, trying to get things done rather than just learning about what could be done.

With the current webinar setup this will not be as interactive as I was planning, but let's make it work. Let's use the chat, and if it's okay with you we might stop for questions a little more frequently rather than only at the end, if that works with your style, and we'll see how it goes. We will be dynamic, and that's actually the main word I have for the next slide: dynamic. For me the course is all about you getting benefit, learning something new, and I believe that comes through a more practical approach. You can ask ChatGPT "what can I do with machine learning in the oil and gas industry?" and the list is very long; there are so many things you can find titles about. What a practical way of approaching this looks like is what I want to share with you today.

As a general approach to this course, we will start with a very generic overview of ML, machine learning, which I assume you already have some experience with or awareness of, and then that one slide I told you about: how machine learning can be used in the oil and gas industry. Based on your feedback today, we will decide the rest of the course; we still have three sessions to go. We might pivot to something more standard, where we dive deeper and share case studies of how machine learning is being used in the oil and gas industry, rather than my original plan of going hands-on. So please share your feedback; I'm very flexible and we can pivot if that setup would be more beneficial to you, studying how machine learning is used in the industry rather than trying to do it ourselves. Unfortunately we cannot do both; the time is pretty tight. We either explore all the ideas and all the ways ML is used in this domain, or we pick one example and work on it, which is my current plan. But again, I'm flexible.

The hands-on part would be through public challenges. There are multiple platforms that put out data challenges, machine learning challenges, some of them very specific to the oil and gas industry, and you can actually win money, a prize, if you solve the challenge. That is what we will do.
We will not solve that particular challenge, it has been going for ten weeks now, but we will go through it and introduce you to the concept, which I believe is very valuable, and maybe try to make one or two submissions together just to show you the process from A to Z. There are still three weeks remaining in the challenge I'm planning to work with, so it might not be too late for some of you to get your hands on it if we decide to go this route. And then the hands-on work, which is my current plan, and the wrap-up.

I see a question here from R: which programming or coding language should a petroleum engineer learn to help him in his work and give him a better chance of being hired? Very good question. The de facto standard, and we will go through this, is Python. But I have a very particular opinion: it's not about coding anymore, and this changed just last year. If you listen to the CEO of NVIDIA, for example, he says don't learn to code, you don't really need that skill. I wouldn't say I agree with him 100%, but I would say don't let coding be a blocker. As you will see, we can submit a whole challenge without really knowing how to code; we will just have ChatGPT, or Google Gemini, which is embedded in Google Colab, write the code for us. Having coding knowledge will definitely help, but you do not need to spend three years learning to code before getting into machine learning. That is no longer the case; you can get into coding, get into machine learning applications, and have results available to you much faster than you think.

I see another question that is a little long, so let's defer it a bit; I don't want to hold everybody up. We'll have a break and then go through the longer questions.

So again, let's start with this quick overview of machine learning and the different terminologies we have these days: machine learning, artificial intelligence, and so on. Artificial intelligence is the general umbrella: having the computer do something, basically. You give it a command and it can obey the command. Then we have machine learning, which is more specific, and this is where we are: you have an algorithm whose performance improves, and that is the key part. The machine learns something; it doesn't know something, and then it becomes knowledgeable of that thing, and what happens in between is what we call training, giving it data to learn from. Previously you would write very specific instructions: do this, then this, and if this is the case do this, otherwise do that. But you don't always know what you need the machine to do. That's where the difference comes in: nowadays you have a very complicated problem and you don't really know what to tell the machine, so the idea is that the machine itself learns, the machine becomes the teacher. How? By seeing huge amounts of data, amounts a human brain cannot absorb in the given time. The machine, or the algorithm, is able to absorb huge amounts of data
and figure out and deduce patterns that a human wouldn't see, or wouldn't see in a reasonable amount of time; maybe if you stared at the data for 10 years you would spot the pattern, but that's not practical. Deep learning is then a subset of machine learning with multiple layers, the more complicated form. At a certain limit the data we feed a simple model stops helping, so we keep adding more and more data, and the more data we add, the more knowledgeable the model becomes; deep learning has multiple layers of learning, if you put it in simple terms. And finally, today we have a new subset, generative AI or GenAI, where the machine learns so much that it can actually start producing output based on everything it has learned. It's no longer one small, specific task; the model is so knowledgeable about a topic that it can start creating and generating content about it. We will not go into generative AI in this course, but having this mind map and understanding these distinctions is very important.

This is the same overview, but in terms of time. As you can see, the artificial intelligence concept, having the machine learn or do something, is very old; it dates from the 1950s. Fun fact: a lot of the algorithms we use today, the ones that power ChatGPT and other LLMs and machine learning models, are actually that old, 50, 60, 70 years old. So why didn't we have this breakthrough in technology until recently? We basically had two breakthroughs: one around 2012, as you'll see in the next slide, and then the recent one with ChatGPT a couple of years ago, not even two years if you can believe it. So: artificial intelligence in general; then, with advances in compute and in the amount of data available, we get to machine learning; then, with the internet boom around 2000 and more and more data available, we finally got to deep learning; and now, around 2022, we are in the GenAI era.

When we go through this, I wouldn't call it a history class: the Turing test, again the 1950s, the first concept of AI, and some of the boom times. The real revolution started recently, so what did we miss, what do we have now that we didn't have then, why now? That's the main question, and there are a few things. First, we have powerful algorithms, but we had algorithms back then too. These two key factors are what really made it happen. One is compute power, meaning the availability of sophisticated hardware: GPUs, CPUs, TPUs, hardware able to crunch billions and billions of numbers and matrices in no time. That wasn't available; back then we were barely building graphical user interfaces, the first Mac or the first PC. The other is big data: there was no global internet yet, so the amount of data that had been digitalized, put in a digital form a computer could read, was almost nothing. Compare that to now and it's all the difference in the world.
Now we have huge amounts of data thanks to the internet. We have the Google projects that digitized everything, every single book; most of the books written in human history, the whole human species' knowledge, is now digital. You have every post, all the social media content; the social media boom gave us user-generated content on Reddit, Twitter, Facebook, where the users generate the data and the content rather than a select number of companies. All these factors contributed to huge amounts of data being available for these algorithms to read, crunch, understand, absorb, and learn from, which is the core part, together of course with the compute power and all the hardware.

If you don't think the sheer amount of data is a big deal, it cannot be overstated; it is very important. OpenAI, the maker of ChatGPT: it's funny that the challenge they face now is that they are running out of data. ChatGPT has read basically everything that was ever written, the whole internet. The largest open dataset now sits at around 44 terabytes of text, roughly 15 trillion tokens. Let's assume every word is a token: 15 trillion, not 15 billion. And that dataset is open source, it's what everybody has, not just OpenAI or Google. So the amount of data these models crunch, comprehend, read, and then understand and learn from is huge; this is especially true for the LLMs but still applies to machine learning in general.

This slide is actually a fun one that shows how much data we have. One panel is from 2017 and the other from around 2020, just for comparison: it's the amount of data that gets generated every minute, every single minute. If you read some of the numbers for YouTube or Twitter, or let's look at Instagram: around 46,000 photos every minute, and that's 2017. Look at Instagram again in 2022 and we're talking about 340,000 stories or user posts every single minute, not per day, not even per hour. I'm not sure we can really comprehend the amount of data we generate in one day, let alone weeks and months. This explosion of information, this explosion of data, is what allowed the machine learning boom and the recent GenAI boom as well.

Thank you, Safa, for the question; that was exactly my next slide. Okay, Ramy, we understand this: there is a huge amount of data, there is huge compute power, now what? How does this apply to the petroleum industry, how does it impact the oil and gas industry? That's where this course comes in. We have all this very general knowledge and general data available, but how can we apply it in the oil and gas industry, how do we understand what's going on there, and how can we use it, learn from it, and stay ahead of the curve? Safa's question is specific to the computing impact, but it's the same concept: the big data and the computing impact, the increase in compute power and the increase in the amount of data.
All of this gives us the opportunity for much more powerful machine learning algorithms, and even traditional models benefit: with the current compute capacity you can run much more complicated seismic processing, 4D seismic processing, 3D reservoir modeling and simulation, in shorter times, at higher resolution, or finishing faster so we can do a larger number of runs. All of this is becoming possible with the advances in computing power and the amount of data available. We will go through each of these and explain a little about what they can be used for, but I'd like to stop for a minute: if you have any questions so far, please post them in the chat and then we can continue.

I'll try to read Dava's question; I hope I get your name correct. "In my thesis recently I'm doing an evaluation of economics, well performance, and a well intervention program using a VBA macro. My mentor and my lecturer suggest I use SQL, as I have already made basic code in VBA. Is it common in the oil and gas industry to use SQL as a basis for machine learning data processing?" This really depends on what you have, the type of data; I'm not sure I fully understand the amount of data and its nature, what parameters and attributes you have, as we will learn. But to be specific to your question: SQL is definitely the way to go if you have a lot of data. SQL is a language that helps you extract specific data from a large dataset: you have a database with a huge amount of data, and you use SQL to pull out the specific information you need for your use case. So they are not apples to apples; we wouldn't compare VBA directly to SQL, they are two different things. VBA executes commands inside a Microsoft Office application, specifically Excel, or Access, or even Word, whereas SQL is used to extract information from large datasets. So learn SQL, definitely, and if you already have a VBA macro and are already using the power of programming inside a Microsoft Office application, keep doing that as well. I'll drop a tiny sketch of what that SQL extraction can look like right after this round of questions.

Lb, or Ib, I'm not sure I got the name correct, sorry, asks: how can we use ML to boost production of reservoirs? I assume you mean boost as in improve production. The challenge we will work on in this course is actually about production data, so the short answer is: the more data we have, the better we can understand reservoir performance, and then we can find the bottlenecks, what's holding back our reservoir model or our production, and improve from there.

Next: a guide on the application of ML to seismic interpretation, petrophysical evaluation, and seismic conversion. Unfortunately this course will not be about seismic, because I am a petroleum engineer by training, not a geophysicist. I think there is another training session that will cover ML and seismic, if I remember correctly. But what we are trying to do here is acquire the general knowledge.
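Going back to Dava's SQL question, here is the promised sketch. It is a minimal, hypothetical example; the table, columns, and toy rows are invented for illustration, and in practice the database would already exist and hold far more data. It shows the general shape of running a SQL query from Python (sqlite3 plus pandas) to pull only the records you need out of a larger production table:

```python
# Minimal sketch: using SQL from Python to pull specific rows out of a larger table.
# Table and column names are hypothetical; the toy rows stand in for a real database.
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")          # stand-in for a real production database
conn.execute("""CREATE TABLE monthly_production
                (well_name TEXT, field_name TEXT, report_month TEXT, oil_rate_bbl_d REAL)""")
conn.executemany(
    "INSERT INTO monthly_production VALUES (?, ?, ?, ?)",
    [("W-01", "FIELD_A", "2020-01-01", 950.0),
     ("W-01", "FIELD_A", "2020-02-01", 910.0),
     ("W-07", "FIELD_B", "2020-01-01", 430.0)],
)

# SQL does the filtering on the database side; only the slice we need comes back
query = """
    SELECT well_name, report_month, oil_rate_bbl_d
    FROM monthly_production
    WHERE field_name = 'FIELD_A' AND report_month >= '2020-01-01'
    ORDER BY well_name, report_month
"""
df = pd.read_sql_query(query, conn)         # a DataFrame ready for analysis or a model
print(df)
conn.close()
```

The point is simply that the filtering happens in the database, and what comes back to Python is already the slice of data you want to analyze or feed to a model, which is exactly how SQL and a VBA or Python workflow complement each other.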
You acquire the base knowledge, the technical skills, and the understanding, which is the most important factor in all of this: properly understanding how machine learning works and what you need in order to capture value from data. That is the important part, and that is what you can then apply.

Alright, let's just take a quick pause and then continue; let me check if we have one more question. "Does the data for ML need to be provided by companies working in oil and gas?" That's a very good question, Amir, and this is exactly where the challenges come in. In the kinds of challenges I'll explain today, you have the data: the companies sponsor the challenges and put the data out there; you just have to download it and work with it. That's why I feel this is so valuable and why it's what we'll learn to do during this course.

Another question: is it possible to use ML for piping stress analysis, and if yes, how? I don't have a specific use case on hand, but let me take this as a note and I'll try to bring some examples to the next session if I can find use cases. Ahmed, thank you for your question: "Do you mean that coding is not important for machine learning studies?" Coding is important, that's for sure; it's just not a blocker. Don't say "I want to wait a year or two until I'm perfect at coding and then start machine learning." Do not do that. Start with machine learning even if your coding skills are very weak, and improve as you go. That's what we're trying to do in this cycle: we will get into this challenge together assuming we have no coding knowledge, assuming we know nothing. That is what I will be doing with you: I'll assume I know nothing and just search Google and ask ChatGPT, live. I know it might seem strange, but that is actually my intention for this course, to break the barrier of "I want to wait until I know this or that, I want to wait until I'm good at coding." The core message of this course is: do not wait. Start with what you have, start with what you know; I'll give you some tools to help you do exactly that. Just start, and this will open up doors for you.

"Can ML be used by regulatory authorities to aid in regulating the oil and gas industries in their countries? Has it been done somewhere in the world?" It can definitely be done, especially with generative AI more than machine learning in general, because the idea is taking a specific text, say regulations from country X, or even here in the States, from Louisiana to Texas, which have different regulations. A generative model, a large language model or LLM, can take the inputs from one state, revisit those regulations based on the other state's rules, and rewrite them; the same goes for policies. This is really a very good use case, the type of case you'd love to use machine learning, specifically generative AI, for. Has it been used or not? I don't have an answer; I took another note and will try to see if there are cases. To be honest, even if they used ML, they might not disclose it. At this point we still have the stigma that if you used machine learning or ChatGPT you are cheating, and this is changing, and it needs to change.
It's exactly what happened, I'm not sure, maybe 20 years ago: if you had a calculator you were cheating, and then it became normal and everybody had a calculator in the exam. The same thing is happening again with machine learning. Public opinion is not really settled yet, and we still lack regulations and policies around it, but we will get to the point where you can say "yes, I used ML in this work, and it is still original work." So this is a work in progress.

Another couple of questions before we continue. "Is predictive-maintenance ML done using vibration analysis?" There are multiple ways; again, it depends on the type of data you have. I personally worked on a predictive maintenance project, and it actually didn't include vibration analysis. It was for gas compression, and what we had were the turbine pressures, their temperatures, pipeline pressure, and multiple other variables. After a lot of analysis we got a very good prediction model relying just on temperature and pressure, even though they are correlated, and the flow rate together with the pressure and temperature across multiple units was able to predict a general failure: not a failure of a specific unit, but a general failure. And that was with relatively little analysis; we are actually planning a phase two of the project to dive deeper into more data.

"What if there is not enough data; can we use ML to build more data?" Very good question, and the answer is yes, but it's tricky; you have to really understand the data. We call this data augmentation. The idea is: I have 50 data points but I need 250, so based on those 50 I can generate a bit more data, change it a little, add noise to it, because I know the data coming from my sensor is not perfect anyway, it already has some problems. So the answer is yes, but it's tricky. The more ground-truth data you have, data that I am positive is correct, real data from the field, the better; that is usually the best data to work with. But you can augment the data to increase it, of course to a certain extent. I'll put a small sketch of this augmentation idea right after these questions.

"How do we know the right data to use for machine learning?" You try, basically, and this is actually the fun of it: there is no specific answer, no dataset where you know in advance "if I have this data, I can do that." You would be really surprised, and this is the core of machine learning that I was just explaining: the algorithm is able to extract relationships and information that we didn't see and didn't think existed. With the naked eye you would say, how can pressure and temperature readings predict that the pump will fail? Surely you need something advanced like vibration analysis. But no, there is hidden information embedded in the data that we don't see and that a machine learning model can see. That's the beauty of it. So don't decide in advance "I don't have good data for this, so I won't do it." Whatever data you have, start and try. Maybe you fail, maybe you succeed, maybe you succeed to a certain extent, but it's all useful output; maybe you learn what data you actually need.
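Here is the augmentation sketch I mentioned: a minimal, made-up example where a small set of "real" pressure readings is expanded by adding sensor-like noise. The numbers, the 1% noise level, and the number of copies are assumptions purely for illustration, not a recipe.

```python
# Minimal sketch of data augmentation: start from a small set of real ("ground truth")
# measurements and create extra, slightly perturbed copies by adding sensor-like noise.
# Everything here is synthetic; in practice you would load your own real points.
import numpy as np

rng = np.random.default_rng(42)

# pretend these are 50 real pressure readings (psi)
real_pressure = rng.uniform(2000, 3500, size=50)

augmented = [real_pressure]
for _ in range(4):                                             # 4 noisy copies -> 250 points total
    noise = rng.normal(loc=0.0, scale=0.01 * real_pressure)   # ~1% noise, similar to sensor error
    augmented.append(real_pressure + noise)

augmented_pressure = np.concatenate(augmented)
print(real_pressure.shape, "->", augmented_pressure.shape)     # (50,) -> (250,)
```

The caveat from the answer above applies: augmentation only stretches information that is already in the ground-truth points, so it helps to a certain extent and never replaces real field data.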
So don't prejudge the data you have as being the right or wrong data for a specific model. Whatever you have, try to play with it, try to model it, and see how far you can go.

"If they are using techniques to tweak policies, how can we conclude that the machine is correct in most cases? We are just relying on the machine to get insights, but how can we know the insights it gives us are correct?" This correctness challenge is mainly about the LLMs, the large language models, so it doesn't really apply directly to this course; but for LLMs the idea of checking correctness is actually doable. It works through prompting: you ask the model, for whatever information it gives, to go back, check that piece of information, and make sure it can be tied to a specific source. This is what Bing does now: if you use Copilot with Microsoft Bing search, for every statement, if you hover over it, it shows you the specific source the statement came from. Of course, it remains a challenge, the top challenge if you follow the news: we had Google's AI Overviews, which started giving wrong information, telling people to eat rocks and put glue on pizza, and Google is fighting that. So it is a bleeding-edge problem, but it is doable. There are also legal cases; I was just reading an article yesterday about legal solutions, because in the legal field, if you go in front of a judge, you cannot give fake information, and they have worked out solutions there. So it is challenging, but it is doable, and it keeps improving every single day.

"Are there specific sites to extract real oil and gas field data?" That's what we will explore. We'll look at a couple of websites that have oil and gas data for the challenges, but we can also share some other public sites from around the world. Usually you go to the government sites, Norway, here in the States, Canada; a lot of these governments share public oil and gas data, production data, well log data, so you can go and collect some of it yourself. But again, my recommendation, and the approach for this course, is to start with the challenges; we'll explain what that means in a bit.

A couple more questions. "How do we know the right data we can use?" We answered that. "As you mentioned, the development of AI took off around 2012; before that, how could the huge petroleum companies succeed?" When it comes to oil and gas, still to this day, we rely on what we call analytical models: equations, mathematical models, physics-based models. And this is what the industry is now exploring, because with physics-based models you solve the equations, the partial differential equations. For my master's I was solving fractional partial differential equations; I had to write a program, a whole reservoir simulator, to implement those equations and solve them, and it's a lot of work, it takes a lot of time, and it's at a very small scale. So the industry is now exploring data-driven modeling, which doesn't care that much about the physics. The degree to which you should rely on the physics is exactly what's being discussed: going 100% physics, solving all the mathematical equations, is one extreme, and going all the way to data without respecting the physics at all has its own problems. So we're trying to find a middle ground where we are data-driven but still constrained by the physical model itself; that's the approach the industry is trying to move toward.
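To make the "data-driven but physics-constrained" idea a bit more concrete, here is a minimal sketch, entirely my own illustration rather than anything from the course material: synthetic rate data, an assumed exponential-decline form, and a loss that combines a data-misfit term with a penalty on physically impossible parameter values (negative initial rate or negative decline).

```python
# Minimal sketch of "data-driven but physics-constrained": fit q = qi * exp(-D * t)
# to noisy rate data, penalizing unphysical parameters. Data and weights are made up.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
t = np.arange(0, 36)                                 # months
true_qi, true_D = 1200.0, 0.08
q_obs = true_qi * np.exp(-true_D * t) + rng.normal(0, 25, size=t.size)

def loss(params, lam=1e4):
    qi, D = params
    q_pred = qi * np.exp(-D * t)
    data_misfit = np.mean((q_obs - q_pred) ** 2)                # purely data-driven term
    physics_penalty = max(0.0, -qi) ** 2 + max(0.0, -D) ** 2    # punish unphysical values
    return data_misfit + lam * physics_penalty

result = minimize(loss, x0=np.array([800.0, 0.01]), method="Nelder-Mead")
print("fitted qi, D:", result.x)
```

In practice the constraint could be anything from simple bounds like this to a material-balance or flow equation, but the general pattern, a data term plus a physics term, is one common way of expressing the middle ground described above.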
"If you are using ML techniques to tweak policies and get insights..." I think we answered this. "We depend on a trial-and-error technique until we reach the closest answer." That's exactly it, Safa, 100%; that is what machine learning is, and it's exactly how we learned everything. The whole field of machine learning is successful because it tries to mimic the human brain, the most sophisticated, most advanced thing we know. The brain is still a mystery to us even with all the technology we have, and this is the concept of the neuron; that's why it's called a neural network, because it's based on the brain's neural idea. When it comes to the actual learning of the machine, it's again human feedback, the same reinforcement we do. Reinforcement, very quickly, means you do something and then you are told whether it's right or wrong. Go back to your childhood; this is how we learned a lot of things. I'm trying to memorize the ABCs: this is a B. Next time I say "this is a D." No, it's a B. Okay, it's a B. Next time, "it's a D." No, it's a B. The third time, yes, it's a B. I guess something, I have a tutor, my parents, telling me whether that's correct, and I do it again and again until I really know it's a B, because I tried to guess it multiple times, sometimes got it wrong, and eventually knew it correctly. It's exactly that, and we will see this through code; we'll try to do it hands-on.

"How do you think machine learning can be applied to improve exploration and production?" That is very generic. The best way to think about it is: what specific problem, what specific challenge do I have, and how can machine learning help me with it? For exploration, what kind of exploration are we doing, is it magnetic, is it seismic, whatever the problem is, and what is the challenge we're facing? Maybe it takes us three months to analyze or process the data; can machine learning help us with that specific task, make the data processing faster or more efficient? That's one way of looking at it. "Improve production and exploration" is very broad, and we will see some examples here, but the idea is: if you narrow it down to a specific problem, or revisit anything you do and ask yourself "can a machine learning model help me with this?", that will be much more effective.

"What steps can be taken as a beginner in machine learning to become proficient in it and use it in petroleum engineering applications, and what projects can we work on to build these skills?" This course is exactly that. In this course we'll take one example, one approach; again, it's my approach, which is move fast and break things. It's also called learning by doing. Some people really love it, for others it's hard to adopt, but it is what I think is the fastest way to become proficient and to gain skills.
We will try to do this together in this course. But if you don't love that, if you think "this is not how I like to learn, I'm really a books person, give me a couple of books, I'll read them cover to cover and then I'll start working," and that works for you, that's fine. The whole idea is simply to start acting on it: don't wait until you've finished ten books, don't wait until you get a PhD. Don't wait; that's my core advice.

"What do you think about using machine learning to automatically increase or decrease the choke on a gas well, and is that a new technology being used?" Again, I'm taking these as notes, and maybe each session, before we dive into the hands-on part, we can cover some case studies for these specific questions. I have the piping stress analysis one, and now the choke one; off the top of my head I can't name specific use cases, so I'll try to bring use cases for these points, thank you for the input. "What do you think about using machine learning..." Yes, we got the same question. "Do you have a YouTube channel with other lectures?" I don't, actually, I'm not that famous, but thank you, Doctor, for giving me the opportunity and this platform to try to add some value.

Alright, let's get back to the slides; those were very good questions. Let me get my sip of water. So, ML can be applied in various ways; I think we discussed a lot of this during the Q&A break, but let's revisit them quickly. Predictive maintenance: analyzing equipment sensor data, and again, which sensors? It doesn't have to be vibration data; whatever data we have, get it and see what we can extract from it. In parallel, and this is important because the more data the better, what I would do in that situation, say I'm working at a plant, is start reading papers about people who did predictive maintenance or failure analysis in a similar setting, a similar setup, similar equipment, and then make the comparison: they had this amount of data, they had that sensor data, was it important, how much did it improve their model, and if we remove the data I don't have, is the model still useful? So I evaluate my own data, and I also read other case studies, see what data they had, how they used it, and how we can improve on it. I'll put a tiny sketch of what such a failure-prediction setup can look like right below.
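Here is that predictive-maintenance sketch: a minimal, entirely synthetic example. The sensor columns, the "failure soon" label, and the thresholds used to fake that label are all invented, so treat it only as the general shape of training a classifier on routine pressure, temperature, and flow readings.

```python
# Minimal sketch: flag units likely to fail from routine sensor readings.
# All columns, values, and the failure label are synthetic stand-ins.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(1)
n = 1000
df = pd.DataFrame({
    "suction_pressure": rng.normal(300, 20, n),
    "discharge_pressure": rng.normal(900, 40, n),
    "temperature": rng.normal(85, 6, n),
    "flow_rate": rng.normal(50, 5, n),
})
# fake label: units running hot with low flow are more likely to fail soon
df["failure_soon"] = ((df["temperature"] > 92) & (df["flow_rate"] < 48)).astype(int)

X = df.drop(columns="failure_soon")
y = df["failure_soon"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))      # how well failures are caught
```

In a real project the label would come from maintenance records and the features from the historian, but the workflow, build a labeled table, train, and check how well failures are caught on held-out data, is the same.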
Reservoir modeling: we already discussed accurate models, from reserve estimates to optimization, everything about reservoir modeling, and building what we call data-driven models versus physics-based models. Until now we still rely on the physical models, and there is a lot of pushback on purely data-driven models; it seems the industry will converge on the combination, a data-driven model constrained by the physics, so it doesn't actually solve the full physical equations, but whatever solutions it produces have to honor the physical requirements. I believe that's where we're going and where the most potential is.

Seismic data analysis: from identifying potential locations to improving the accuracy of exploration, and so on. There is a lot happening in seismic, especially since seismic data is huge, gigabytes or sometimes terabytes, so there are many active efforts in the seismic data analysis domain. Process optimization: for everything you do, ask whether ML can help you with it. Say I'm doing well planning: what variables go into a well plan, can this be fit into a model, can I quantify it? Thinking from a machine learning perspective is important. Predictive analytics: any trends or outcomes, and actually the challenge we will be discussing falls in this domain; we want to predict the bottomhole pressure, basically, as we will see. Quality control: ML can help us detect anomalies, something machine learning is very good at, because again it's all about trends. If you have a specific or expected trend, in production, in a specific machine, in a specific plant, whatever it is, anomalies can be detected. So anything you think has a specific trend, where you want to know whether you are deviating from that trend, think about applying an ML model to it; I'll put a small sketch of this trend-deviation idea at the end of this list. Supply chain optimization in general. And of course HSE, safety: preventing incidents or minimizing environmental impact, typically by analyzing the sources of hazards in your environment and collecting as much data as possible about the different incidents.

Again, it's about data; you'll keep finding us coming back to this, and it's what differentiates a company, or a country, a nation, that really uses machine learning from one that doesn't: the amount of data available in a digital, structured format. Structured data is the hot keyword here, and it's important to understand and utilize that. As we discussed, this is the whole reason for the boom: more and more data, and more and more computing power to digest it. If you show up with a couple of paper files and say "I want to do machine learning," I'm sorry, I cannot help you. So when someone asks "how specifically can machine learning help me in this domain," the question comes back to you: what data do you have about it? Do you really have detailed reports about your incidents, what happened, when it happened, why it happened? If you have a hundred or a thousand incidents, we can take those reports, digitize them, convert them into a form the model can read, and then look for a pattern. Maybe it turns out the incidents always happen about two weeks after we do maintenance; nobody noticed that, nobody saw the pattern, but the machine learning model was able to see it, provided it had detailed information, detailed data. So the more data and the more structured data we have, the better the conclusions we can draw using the models.
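Here is the small trend-deviation sketch promised above: a minimal, synthetic example that flags points falling far outside the recent rolling trend of a daily production series. The injected upset and the 3-sigma threshold are arbitrary choices for illustration only.

```python
# Minimal sketch of trend/anomaly detection on a daily production series (synthetic data).
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
days = pd.date_range("2023-01-01", periods=180, freq="D")
rate = 1500 * np.exp(-0.003 * np.arange(180)) + rng.normal(0, 15, 180)
rate[120:123] -= 400                                   # inject a short upset, e.g. a choke problem

s = pd.Series(rate, index=days, name="oil_rate")
rolling_mean = s.rolling(window=14, min_periods=7).mean()
rolling_std = s.rolling(window=14, min_periods=7).std()
z = (s - rolling_mean) / rolling_std                   # how far each day sits from the local trend

anomalies = s[z.abs() > 3]                             # points more than 3 sigma off the trend
print(anomalies)
```

The same idea, "learn the expected trend, flag deviations," scales up from this rolling statistic to proper ML models once more variables and more history are available.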
Alright, so the million-dollar question: where can I start? Again, my advice is very straightforward and very specific: get your hands dirty. Not literally, don't go play in the mud; I mean actually start doing, start typing, start exploring. This is the most important thing, and it's the barrier I'm trying to break with you in this course: having the confidence to actually start doing machine learning without "I don't have data, I don't have coding skills, I don't have anything." Let's do it anyway. That is the barrier I'm trying to break with this course; don't let anything hold you back. And, repeating myself, coding is no longer the challenge, so don't wait to be a good coder.

There are two platforms I suggest we start with. We will be using ThinkOnward; there's a specific ongoing challenge there that we will be working on. And there is always Kaggle. What are these platforms? They are prize-oriented challenge platforms, and we will explore them live now; I just have a couple more slides and then we go to the demo, hands-on. These are all the slides I have; the rest is really practical. These platforms are challenge-oriented: they give you data. They have a lot of datasets, they get the data from different sources, they clean it to a certain extent, of course, and they prepare it for you to start working. And then there's the community associated with them: you find other people's solutions, you see other people's code, you ask people questions, and this is the real value, because you really learn. You have the opportunity to participate in a challenge and win a prize if you succeed, but at the beginning, don't worry about winning. I don't win these challenges; they need time and dedication that I don't have. But I learn from them, and that's the core. I go there, I explore the dataset, I try to participate, and when the challenge is over they sometimes share the winning code, and that is the real value: to see, okay, this is how I thought about the data, this is how I thought about my model, oh, they did something different, how did I not think of that? That's very valuable, and I learn from it and apply it to my next challenge or my next project. This is what I call a 100% hands-on approach; it's learning by doing. No books; I'm not sharing books with you, I'm barely sharing slides. Just dive in: go there, sign up, create an account, it's all free, and start playing. Of course, you need some guidance to get started, and that's what I'm trying to do here: get you started and hopefully guide you a little, give you an idea of what you can be doing and how to approach it.

For the last couple of slides, here are some definitions I'd like us to be aware of, which we'll come back to later; they are simply the typical steps in any challenge. We'll go through them quickly before we jump into the actual websites. Any questions before we continue? I'll give you a second, and then we continue. Okay. So, when it comes to a challenge, the most important thing I need to look at is the problem definition: what are we trying to do here? This is the core.
When I have a challenge, I want to know what we need to achieve. If I don't have a specific goal, if I don't want to achieve anything specific, then I'm basically wasting my time. So the problem definition is what really drives the whole thing: what do you want to predict? For example, you want to predict future oil production from historical data, or predict the price of oil, or detect that something is going wrong, or classify different types of wells. Say I have specific wells that seem to sit in different production zones or different reservoirs, but I don't know what's going on: which wells should be grouped together and treated with a specific treatment? Classification is one of the ways a machine learning model can be used, and we will talk about these different types of machine learning models and how we deal with them. So that's the first step.

The second step is the data; again, it always comes back to the data. We need to gather data from various sources, and we cannot rely on just one source; the more sources the better. In oil and gas this includes sensors, production logs, geological surveys, any kind of data we have. Then we need to clean it, and this is actually where our work starts: the problem definition is given to us, the data is given to us, and our work starts from here, cleaning the data. It comes from real life, so don't assume the sensor buried in the well thousands of feet down, at extreme temperatures and pressures, gives you accurate, precise data. No: that data will have deviations, it will have errors, it will have problems, and it's important to be aware of that. So you need to clean the data, handle missing values, and remove inconsistencies and outliers. As we said, the more data the better, but really, the more good data the better. A crucial step in our work is to clean the data and get rid of anything we believe is not helpful or not clean enough.

Then comes what we call feature engineering. This is the hardest part; some people say it's 80% of the job. ("Could you share the PDF?" I don't have a lot of slides, but we can definitely share them; I'll check with the team and Dr. G.) Feature engineering is crucial: it means asking how this data can be put into the best format for the model to understand. Do I need to normalize it, and do we understand what normalization is? Do I need to convert it to another format, do I need to change anything about it? There are a lot of things that can be changed, believe me, to put it in a form the model can understand, so a lot of work goes into feature engineering. You can never say "I now know everything about feature engineering"; it's an ongoing process, and that's another reason I don't recommend waiting to start. You would need a lot of knowledge before calling yourself an expert, but as you go you will learn and your skills will improve, and at every stage you can produce some output, some value. Take that value and don't wait. A small sketch of what this cleaning and feature-engineering step can look like follows below.
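This is a minimal sketch of the cleaning and feature-engineering step just described: handle missing values, treat obviously impossible readings, and normalize the features. The column names, the toy values, and the choice of median imputation are assumptions for illustration only.

```python
# Minimal sketch of cleaning + feature engineering on a tiny, made-up field export.
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "whp_psi":   [1200, 1185, np.nan, 1170, 99999, 1150],   # 99999 looks like a sensor error
    "wht_degF":  [180, 181, 179, np.nan, 178, 177],
    "choke_pct": [32, 32, 34, 34, 36, 36],
})

print(df.describe())                                          # quick look: ranges, counts, problems

df["whp_psi"] = df["whp_psi"].where(df["whp_psi"] < 10000)    # treat impossible readings as missing
df = df.fillna(df.median(numeric_only=True))                  # simple imputation with the median

scaler = StandardScaler()                                     # normalization: zero mean, unit variance
X_scaled = scaler.fit_transform(df[["whp_psi", "wht_degF", "choke_pct"]])
print(X_scaled[:3])
```

Real feature engineering goes much further (derived ratios, lags, encodings, and so on), but the pattern of inspect, clean, then transform into a model-friendly format is the starting point.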
EDA, or exploratory data analysis, is very important, and it feeds into the feature engineering by helping you really understand the data. I want to visualize the data: what is its distribution, is it balanced, does it have correlations, does it have problems I need to fix? A lot of effort goes into understanding the data you're dealing with and the different variables, or attributes, you have.

Then comes model selection. This needs a bit more knowledge, but the LLMs, the chatbots, now help you a lot with it: do I have a regression problem, a classification problem, a clustering problem? We will try to identify these types and dive a little deeper when we get to this step in our challenge, but for now it's about choosing which machine learning model or algorithm we need. As the slide says, the right algorithm, because all of these are just algorithms; for every problem and every dataset you have to choose the right algorithm, and that's what we'll try to do.

Then we have training and evaluation. I think it was Safa who asked whether it's trial and error, and I said yes; this is where that happens. You really train the model: you give it the data and say, here's the data, try to learn something from it. The evaluation is done on data the model doesn't see; you keep it for yourself and test whether the model really learned. Say I'm teaching you how to do addition: I tell you 1 + 1 is 2, 1 + 2 is 3, 1 + 4 is 5, and so on, and then I ask you what 10 + 11 is. I didn't teach you that, but I expect you to figure it out on your own, because you've been trained a lot, you've learned a lot. So I ask you what 10 + 1 is and you say 12; I say wrong, and I give you more data. I still don't tell you that 10 + 1 is 11, but I keep showing you other examples and asking again and again, until, without being explicitly told, having seen so many other addition operations, you tell me 10 + 1 is 11. Now you've learned it, on your own, basically by trial and error. That is what we call training and evaluation. The training data, what we call the ground truth, is where I tell you what 1 + 2 is, what 2 + 3 is, and so on; but I never tell you what 10 + 1 is, because 10 + 1 is my evaluation, how I evaluate you. I cannot evaluate you on something I already taught you; then you just memorized it, and that is what we call overfitting, the model memorizing what you tell it. I don't want the model to just memorize; back in school this is exactly what they tried to fight, don't memorize the book, understand the concept. Again, there is a lot of similarity between how we learn and how we try to make the machine learn. So for evaluation, we ask the model about data we didn't show it and didn't explicitly teach it. But where do I get that data from? That's why we split the data: if I have 100 data points, I don't give all 100 to the machine to learn on. I give it 90 and keep 10 percent, 10 data points, for myself, and those 10 points are how I test whether the model really learned. If I gave the model all 100 points, I wouldn't have any data left to test or evaluate it with. This is a very important concept, and despite how complicated it sounds, in code it's just one line; one line and you've split the data, this is the training set, this is the test set. That's why understanding what training and evaluation mean matters much more than memorizing how to type the code that does the splitting.
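For reference, here is roughly what that looks like with scikit-learn: a minimal sketch on made-up regression data, where the split really is one line and the rest is training on the 90 visible points and evaluating on the 10 held-out ones.

```python
# Minimal sketch of the "one line" split plus training and evaluation (synthetic data).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 3))                        # 100 data points, 3 features
y = 4 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 0.3, 100)

# the "one line": keep 10% of the points hidden from the model for evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=0)

model = LinearRegression().fit(X_train, y_train)     # training: learn from the 90 visible points
y_pred = model.predict(X_test)                       # evaluation: ask about the unseen 10

print("MAE:", mean_absolute_error(y_test, y_pred))
print("R2 :", r2_score(y_test, y_pred))
```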
Then we train the model, the model starts to learn, and then we evaluate how well it learned. We have mean absolute error, root mean squared error, accuracy, precision, recall, the F1 score; these are different metrics we use to evaluate the model, and they don't all apply to every algorithm. Based on the problem type, and based on the algorithm you used, which in turn is based on the problem type, you choose your evaluation; you cannot evaluate all models the same way. It's the same idea as people: you cannot evaluate a two-year-old, a five-year-old, and a twenty-year-old the same way; each one learned differently, so you should evaluate them differently. The same is true for algorithms: based on the problem you have and the algorithm you chose, you also have to choose how you will evaluate the model.

Then we have model tuning, our fine-tuning: the model is working fine, but we can improve it, tweak it a little to improve its performance, what we call hyperparameter tuning. And there is cross-validation, to ensure the model generalizes, meaning it didn't just memorize what I gave it but actually understood the real relationships between the different attributes and extracted the information properly. Last but not least, deployment: how will I be using this model? I can deploy it on a server, or just use it at home on my own machine; but if I need a lot of people to use it, I need to deploy it somewhere in the cloud, on a Google or Amazon server. I don't think we'll be covering deployment, but it is the natural next step. And that's it, actually; those are the typical steps in a challenge.
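For the tuning and cross-validation step just mentioned, this is a minimal sketch: synthetic data, an arbitrary small parameter grid, and 5-fold cross-validation to pick the combination that generalizes best. The specific parameters and scoring choice are illustrative assumptions.

```python
# Minimal sketch of hyperparameter tuning with cross-validation (synthetic data).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(5)
X = rng.normal(size=(300, 4))
y = X[:, 0] ** 2 + X[:, 1] + rng.normal(0, 0.2, 300)

param_grid = {
    "n_estimators": [100, 300],
    "max_depth": [3, 6, None],
}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=5,                                    # 5-fold cross-validation: checks generalization
    scoring="neg_mean_absolute_error",
)
search.fit(X, y)
print("best params:", search.best_params_)
print("best CV MAE:", -search.best_score_)
```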
We will not get to actually work the challenge today, but at least I want to introduce it, show you the different websites, Kaggle and ThinkOnward, and then next session we can answer some of your questions, maybe have a little theory at the start, and then dive into the hands-on work. Any questions before we get into the demos and I show you the websites? I'll give you a couple of minutes; if you want to type any questions while I get the right screens shared, now is a good time. I don't see any questions, so let's continue. Oh, I do see questions, but I think I haven't read them yet.

"Dr. Ramy, after we end the session we will have a quick Q&A session; we have 17 questions." Yeah, but I was going through them live; do you want me to answer them now? I answered most of them; we only have about three new ones here. I had a couple of stops and we did answer some of the questions along the way. "Okay, so you changed the strategy a little to be more interactive." Yes; again, I was just telling them I'm used to a more interactive format, so this is how I drive, and they have been great at posting questions, so it has been going well. I'll answer these three questions and then share my screen again for the last part of today's class; I know we're about to run out of time.

Alright. "How can we understand what model to choose, or what method to use, after we have done exploratory data analysis?" Very good question. The short answer: it's about the problem definition. What are you trying to do? Again, the big categories: is it classification, is it regression, is it anomaly detection? Whatever type of problem it is, every problem has a typical set of models. For our challenge we actually have something a bit different; we'll be dealing with time series. But your problem definition and the data you have are a big part of deciding this, and this is the part you do need to learn, although with ChatGPT and the recent LLMs you get answers very quickly, so it's a much easier job now with tools like ChatGPT and Google Gemini around.

"Could you mention the website links?" Sure, here is the first one; I'm not sure if you can see the links, because I think my chat only goes to the panelists, not the attendees. Anyway, we will share the PDF and the links will be in it, and we'll also open them together now.

And then the last question: "a case of having good training results but reduced test-data performance, say in terms of the R-squared value; what could be the issue?" This is typically a case of what we call overfitting. Overfitting means the training really just memorized your data: every example I showed you, you memorized, but when I ask you something new, you don't know it, which means you did not learn how to do addition, you just memorized all the examples I showed you. That's a very important distinction: it's not about memorizing. And this is about identifying that the problem is overfitting; how to solve the overfitting is a different story. There are a lot of tools, how you build the model, something called early stopping, a lot of techniques we can use, but at the very least you need to recognize that your model is overfitting, and then you can work on solving it.
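As a small illustration of spotting that situation, here is a sketch on synthetic data comparing training and test R-squared for an unconstrained decision tree, which tends to memorize, versus a depth-limited one; the large train/test gap in the first case is the overfitting signature just described. The data and the depth limit are arbitrary choices for illustration.

```python
# Minimal sketch of diagnosing overfitting: compare training score with held-out test score.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(9)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 0] + rng.normal(0, 1.0, 200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# an unconstrained tree can memorize the training set (near-perfect train R2, weaker test R2)
overfit_model = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
print("deep tree    train R2:", overfit_model.score(X_train, y_train),
      " test R2:", overfit_model.score(X_test, y_test))

# limiting depth (a simple form of regularization) usually narrows the gap
simpler_model = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X_train, y_train)
print("shallow tree train R2:", simpler_model.score(X_train, y_train),
      " test R2:", simpler_model.score(X_test, y_test))
```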
We only have about ten minutes left, so let me at least quickly show you the website; I'm not sure whether you got the links or not. There are still a couple of questions in the queue, but the one about good training results we have just answered. Okay, let's go to the website. The website I want to share with you here, as I think you can see now, is Kaggle, "your machine learning and data science community." The whole idea is that learners, developers, and researchers are all here, so when you have a big project you get the whole community working on it, and you have all the data sets. As we discussed, where do you get the data from? Kaggle has, as they say, 339,000 high-quality public data sets, everything from avocado prices to video game sales, so this is very valuable and we will see some examples. Then there are the notebooks, more than one million public notebooks. These are written by very smart people, much smarter than me at least, and their code is available for free, showing how they solved each problem. Much of the time these are winning solutions: the code that won the competition or that best solved the problem. That is why I keep telling you it is not really about the actual code; the code is out there, published for free as you can see here, or you can have ChatGPT write a lot of it for you. How to think about the problem, understand it, and approach it is where the real value lies.

If I go here and search for "kaggle oil and gas", and yes, you can tell I have done this search before, you can see open data sets with oil and gas production data. This one is six years old, so pretty dated; this one is two years old. These are data sets ready for you to download and start playing with: production date, country, town, field. Here is another one, a global data set for oil and natural gas production, prices, exports, and net exports. So there are huge amounts of data available, and you can then explore which notebooks and models people have built using this data. Kaggle is one of the very valuable sources we talked about for actually getting hold of data and looking at it (a short example of pulling a data set down appears at the end of this part).

We will not be utilizing Kaggle in this course, though. What we will be using is Onward, an innovation platform. They do multiple things, but the core of what they do is the challenges on their platform. You have to register, but it is free, so please sign up, and you can see all the challenges. Right now they have two active challenges, and this is the one we will be using for this course, at least that is my plan: spotting the trend, pattern detection in downhole pressure data. This is oil and gas production data, and we need to find the pattern in it. The challenge has been running for ten weeks and only has about three weeks left, so during the course we will try to at least submit something. I don't think we will win, to be honest, but at least we get to submit something and understand how to read a challenge. You can also see the upcoming challenges; there is a seismic challenge, and I know a lot of people ask about seismic. The prize will be 40K, so that is intriguing. There are a lot of professionals competing, but it is not impossible; plenty of first-timers have actually won, and either way you will learn a ton. So my message to you from this course is basically to open your eyes to these kinds of challenges and opportunities, and to encourage you to go explore them, participate, and learn while doing them.
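As a short, hedged example of that first step, here is one way you might pull a public Kaggle data set down and take a first look with pandas. The dataset slug and file name below are placeholders, not any specific data set from the search above; substitute whatever the data set's Kaggle page shows.

```python
# Assumes the official Kaggle client and an API token are set up:
#   pip install kaggle
#   kaggle datasets download -d <owner>/<dataset-name> --unzip
# Then a quick first look at the downloaded CSV (placeholder file name):
import pandas as pd

df = pd.read_csv("oil_gas_production.csv")   # placeholder; use the real file name
print(df.shape)          # number of rows and columns
print(df.dtypes)         # column types (country, field, year, volumes, ...)
print(df.head())         # first few records
print(df.isna().sum())   # missing values per column, a first data-quality check
```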
So this is our challenge. Again, they are texting me that we are at time, so we will really go through it next week. As petroleum engineers you can recognize this: measured and calculated bottom-hole pressures. This is the data they will give us, along with how the model will be evaluated. It is, I would say, a little bit on the advanced level, so I will try to slow down. You do not have to care about the exact code; what matters are the concepts and the steps we mentioned: how to do feature engineering, how to select the model. I want to give you an example of how to do this, how to participate in such a thing and just get a submission in, and you can already see people jumping to the top of the leaderboard, so it is a very good and challenging community to be part of. Trying to make a submission in this challenge is my original plan for the course. Again, give me your feedback: if you think this is too advanced and we really need to slow down or go back to more basics, I am happy to do that, but my original plan was to try, live with you, to get a submission into this competition. It will teach us a lot of things: how to get the data, how to understand the data, and how to run a basic machine learning model (a rough sketch of those steps appears below). The good news, and I know this sounds very scary, is that they actually give you starter code that should get you about 70%. You can already see people scoring lower than 70%, but the code they give you should reach 70% if you just know how to run it. The code is already written for us, and my plan is simply to go through this code, understand it, walk through the steps, and make a submission to the competition. So again, the code is not always the challenge; a lot of the time you have the code ready, or you can use an LLM such as ChatGPT to write it for you.

It is 2:20, so I need to wrap things up here. Again, thank you so much for your time and for attending. I know this might have been too much; I was not sure what the expectations were. Usually I collect the audience's expectations at the beginning, but this setup is a little different, so I hope you learned something today and will learn more in the future sessions. One last question: "Can we start at the beginner level, work on a small project, and really see the result working?" That is 100% correct, and especially if you go to Kaggle, the competitions vary in how hard or easy they are; some have no prize or a very small one, so you do not have to start with a million-dollar prize. Let me tell you, I usually go for the knowledge ones, where there is no prize, because people share knowledge there. A very famous one is the Titanic machine learning competition: from some passenger data, predict who survived the sinking of the Titanic. It is a very interesting problem and you always have the answers; you can go to the notebooks and see that people already have the code. This one is the top-voted code, submitted two years ago, with people still voting for it, so you really can read the code, understand it, and execute it; it is 100% ready for you even with zero prior knowledge.
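As a very rough sketch of those steps for a bottom-hole-pressure time series, here is what loading the data, engineering a few rolling features, fitting a simple model, and writing a submission file can look like. Every column name, the target, and the submission format below are made-up assumptions for illustration only; the actual challenge defines its own data schema and scoring, and it ships its own starter code.

```python
# Hypothetical workflow sketch only; column names, target, and submission
# format are placeholders, not the real challenge schema.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

train = pd.read_csv("train.csv", parse_dates=["timestamp"])   # placeholder files
test = pd.read_csv("test.csv", parse_dates=["timestamp"])

def add_features(df):
    """Simple time-series features over a measured bottom-hole pressure signal."""
    df = df.sort_values("timestamp").copy()
    df["bhp_roll_mean_24"] = df["measured_bhp"].rolling(24, min_periods=1).mean()
    df["bhp_roll_std_24"] = df["measured_bhp"].rolling(24, min_periods=1).std().fillna(0)
    df["bhp_diff"] = df["measured_bhp"].diff().fillna(0)       # step-to-step change
    return df

train, test = add_features(train), add_features(test)
features = ["measured_bhp", "bhp_roll_mean_24", "bhp_roll_std_24", "bhp_diff"]

# Fit a baseline model and write a submission file.
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(train[features], train["target"])                    # placeholder target column

submission = test[["id"]].copy()                               # placeholder id column
submission["prediction"] = model.predict(test[features])
submission.to_csv("submission.csv", index=False)
```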
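And for the Titanic example just mentioned, a minimal end-to-end pass might look like the following, using the train.csv the competition provides and a handful of its well-known columns; the winning notebooks do far more, but this is the whole loop from data to a first score.

```python
# Minimal Titanic baseline: load the competition's train.csv, encode a few
# columns, fit a simple classifier, and check accuracy on a held-out split.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

df = pd.read_csv("train.csv")                       # downloaded from the Titanic competition page
df["Sex"] = (df["Sex"] == "female").astype(int)     # encode sex as 0/1
df["Age"] = df["Age"].fillna(df["Age"].median())    # fill missing ages

features = ["Pclass", "Sex", "Age", "Fare", "SibSp", "Parch"]
X_train, X_val, y_train, y_val = train_test_split(
    df[features], df["Survived"], test_size=0.2, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("validation accuracy:", round(model.score(X_val, y_val), 3))
```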
But this is a course, and you are learning here, and that is the core of what I want to share with you: you can reach this knowledge without going and paying $1,000 for a course. It is out there, it is free, and I am bringing it to your attention; that is my main message. Thank you.

Thank you so much, Dr. Ramy, for this informative session and for your time and effort as well; we really benefited a lot from it. Thank you so much. Thank you again, and thank you everyone; sorry if I went over time. Best of luck with the whole internship, and thank you, May and Dr. Ramy, for your amazing work in organizing this. Bye-bye.