Transcript for:
Overview of Predictive Asset Maintenance

I think we're live now. Let me quickly check across platforms. Yes, yes, yes. How is everyone doing? Good morning, good afternoon, good evening to whichever part of the world you're joining from, and if you're watching this later, hi. I'm Mudit, one of the founding members of DeFi. We're a global community of AI enthusiasts from 150-plus countries, and our vision is to educate and build AI for all. If you're a learner or someone who's into data science, you can come and learn on our platform free of cost; every single resource is available for free. You can also take part in AI challenges, interact with other community members, and give back to other people's growth.

Today's session is part of our weekly series where we host an AI leader, an AI changemaker, who is willing to come forward for the community and create an educational session that helps you gain hands-on skills and a holistic viewpoint on the chosen topic. Today's topic is predictive asset maintenance. Briefly, it is about machines: how you can predict how machines can be run better and improve their life. The nitty-gritty will be explained by our guest for today, and he's none other than Divyanshu Vyas. As many of you might know, he's the founder of Petroleum From Scratch, a YouTube channel and community where they teach data science, analytics, and related content; he can obviously introduce it better. He's also working as a data science researcher at Shell, and has previously worked with Accenture, L&T Infotech, and a startup. With that, I would love to invite him on board, as he's waiting in the background.

Hi, Mudit. Thanks for having me. Hope you're doing good. Yeah, I'm doing well, how are you? Thank you so much for agreeing to be a part of this. It's my pleasure; I was just waiting to be a part of this. And I think a lot of people might not know that Divyanshu comes from the community, he has been on DeFi, so it's even more enjoyable for me to have you on the show. Thank you so much. I think we can head straight into the session; you can talk about what you do and then about PAM. One important thing for the attendees: if anyone has questions, feel free to put them in the comment section. We have also received a few questions beforehand, so we'll pick them up towards the end, and if you post them live we will curate them and pick them up when the time comes for the Q&A. Over to you, Divyanshu, I'll go backstage.

Thanks, Mudit. Welcome, everyone, and thanks, Mudit, for inviting me. I have been following DeFi from, I think, the start of its existence, and I have been a great fan of the work they do. They reach out to the community, not into their pockets but into their brains, and help them understand the fields of AI and ML. I'm really fascinated by the kind of mentors they've brought on board, and I'm happy that they are including me now as well. And with that being said, I think
Petroleum From Scratch, the venture that my teammates and I take care of, kind of shares the same vision. We also try to reach more and more people with whatever we know, and our aim is not just one-directional knowledge sharing: as we reach more people, we gather more mindsets and try to connect them through the webinars we do on weekends. Petroleum From Scratch is basically the DeFi of petroleum, if I can say that. There is a YouTube channel and so on, so feel free to reach out, and I'd be happy to contribute in whatever way I can.

With that, just to confirm, my screen is visible and my camera is fine, right? Yes, it is all fine. Perfect. So I'm cutting my introduction short and heading straight into the session, because it's an important one. I'd like to lay out whatever I know so that anyone who is completely new to the subject of predictive asset maintenance can get a feel for it through this session. I want everyone to take away some part of it, and if someone has worked on it before and wants to suggest better practices on top of what I have done, feel free to type them in the chat for everyone's benefit.

So let's get started. This is a picture that I clicked. I was recently travelling to Bangalore, and I took this picture because of the recent predictive maintenance project I had worked on. I was thinking that this particular piece of equipment you see here, its proper functioning, decides the fate of all the people travelling on the airplane. That's how big the impact is. If you realize how important this equipment is, you realize how important predictive asset maintenance is. An asset is nothing but any equipment, any site, any sub-equipment that you're looking at. And the approach we'll be looking at today is, in fact, to make sure that
equipment like this does not malfunction and does not cause any issues while operating, and even if something is going to go wrong, we as engineers get to know it beforehand. That's the basic idea. Just imagine, God forbid, if this particular piece of equipment has even a slight error in its functionality: it can cause problems, and not just in terms of money but in terms of lives as well.

Connecting it to other parts of the world, you can think about any industry. Take drug manufacturing: the industries developing drugs for cancer patients, or the companies that were making COVID vaccines, for example. All these companies have manufacturing sites with bioreactors and similar equipment in their plants. If those end up malfunctioning without us noticing, they may not fail outright, but if they rotate at a lower RPM or run in a way they are not supposed to and we do not notice it, it can lead to results that affect humanity. That's how big the impact is, and that's why this is one of the most impactful projects I have personally come across. So let's move on, and I hope we develop that mindset and look at this project with the impact it can generate. (And yes, that picture is my photography skill.)

Moving on. Like I said, predictive asset maintenance is three words together. Asset, I already told you, is any equipment we want to pamper, to take care of, to make sure it does not fail. Predictive, I will come to in a bit. Maintenance we already know: we are taking care of it, maintaining it. Our aim is to monitor the data that we get from an equipment or asset, whatever you want to call it. You can run a predictive maintenance project for an entire field, or a downstream refinery, or for a particular piece of equipment such as a compressor or a pump; it depends on the use case you are following. Initially the industry went with a reactive maintenance philosophy, and we'll come to that in a bit. But our aim in predictive maintenance, just to introduce it as a subject, is to use advanced data analytics or machine learning algorithms on the data the equipment sends out to us, to create a proactive model. That is a very important word, and I hope everyone notices it: proactive modelling of the equipment or assets on the rig or at the sites is what we want. That's the ideal thing to have in our hands. Machine learning models can be trained on historical data to detect any abnormal behaviour that could indicate signs of failure. Our aim is to know something before it happens, before it even slightly deviates from its original functionality.
And our aim is early detection. We are engineers, we will detect something eventually, but PAM focuses on detecting it as early as we can. If we do that, we can take timely preventive actions before the failure even commences, and that's what the predictive approach is; that's why the word predictive is there. You might have come across the term predictive analytics. It's about being proactive, nothing else. There is also a passive approach: something has happened and you are analysing why it happened, just making sense of what really occurred. But if that analysis connects to something that can affect the future, you are moving into the predictive domain. So that's the word predictive.

Now, let's talk about the old-school approach, what used to happen, and then what I just said will make much more sense. The old-school approach was reactive maintenance. There were two kinds of approaches that were, and still are, very common in most industries. The first is waiting for the equipment to fail and then repairing or replacing it. That works for gadgets like your phone charger: even if the charger fails, it is not causing major problems in your life. If you think that is a problem, look at this picture again. You cannot wait for the equipment to fail in cases like this, because, well, just imagine; you know what I'm talking about. But a lot of plants actually do operate that way, and there is a huge problem with it: even if it does not hurt or kill people, it can lead to financial losses.

The second kind of maintenance is holding periodic maintenance activities. The problem here is: what if the equipment was still perfectly good when you replaced it? Say a company follows a replacement schedule of once a year for a compressor. What if the compressors built these days are much better and can run for ten years without any problems, and you are replacing them year after year? That's not good practice, because you are throwing away nine or more years of potential by replacing too early. So early replacement is a problem and late replacement is a problem. Approach one is late replacement, approach two is early replacement, and both have their own issues, as you can imagine, and both can prove very costly. If you are talking about oil and gas assets, for example an electric submersible pump located offshore, and you carry out a maintenance activity that was not required, you are wasting time.
You are causing non-productive time, and every petroleum engineer knows that non-productive time means losses in the millions of dollars. You do not want that. So let me read out what I've written here: traditionally, equipment, sub-equipment, or whatever hardware you are looking at used to be handled, and in many companies still is, by replacing it after set time intervals. There is a protocol that says we will replace it every year, which means it could be replaced even while it is working in good condition, like I just told you. Alternatively, they would wait until the equipment fails. Neither is a good approach; that is reactive maintenance, and that is what we do not want to do. That's why we are attending this session today.

Then there is a better approach that people started following back in the day and are still continuing with; a lot of oil and gas service providers still carry out this practice. It is called physics-based condition monitoring. This approach is better than reactive maintenance, of course, but it has its own drawbacks. What we do in this case is monitor the data we are receiving; data is important for physics-based monitoring too. For example, we have a piece of equipment with a sensor mounted on it, generating data, and we are sitting in the office with the live data streaming to our computers. We monitor the data coming from the assets or equipment, and our aim is again to help detect any issues, malfunctions, or abnormalities. It does not necessarily have to be a failure, just anything which is not normal, so that we can be proactive. Yes, it is a proactive approach, but will it be good enough? Let's read what I've written. Suppose you have a refrigerator, you somehow attach a temperature sensor to the back of it, and you are recording that temperature value
on your computer. As per the manual given to you by the refrigerator manufacturer, the venting temperature of the refrigerator's compressor should be around, say, 50 degrees Celsius (I'm just making numbers up). You now have a sensor there, and you are recording the time series, how the temperature looks at every time interval. Then you observe that today the temperature is getting closer to 80 or 90 degrees Celsius, something like that. What will you do? Because it is approaching the threshold that the SME or the hardware manufacturer has given, you would go to the refrigerator, call someone for maintenance, and check why the temperature is abnormal or getting close to abnormal. That is physics-based condition monitoring: there is a set of rules that the field people, the engineers who know the equipment better, would give you.

For example, take physics-based maintenance of ESPs, the electric submersible pumps we have on our offshore wells. What we would do is check or repair only when the rotational speed of the pump goes above A RPM or below B RPM (RPM being rotations per minute), where the values A and B are suggested by field SMEs, basically the petroleum engineers with 30 years of experience.

The problem with such approaches is that they are only good if the time series or sensor data is highly consistent throughout normal operation and is not noisy at all. For example, the refrigerator runs at 45 degrees Celsius day in and day out, and all of a sudden it jumps to 80 degrees Celsius; our human eyes can see what happened and suggest a maintenance operation. Those are the ideal cases where these approaches work well. But that is not possible everywhere. We will shortly look at a project where the dataset is very noisy; it is almost impossible to create a physics-based rule from that data, and even if we could, it would take a lot of time. With increasing reliability requirements, we are asking for better solutions and better algorithms, and that's why we move on to AI- and ML-based predictive maintenance of assets.

So why could AI and ML be a good candidate for predictive asset maintenance? Like I just showed you, most industrial assets capture many thousands of data points, and that is one more challenge. With the refrigerator you might not have that many points, and physics-based rules may not work with huge datasets because it is impossible for us to set rules for data of that size. Most industrial datasets capture data in milliseconds, or thousands of data points per minute. Can you observe a pattern in data at that frequency? Of course not. The dataset size is huge; that's far too many points for human brains and eyes to capture any pattern, and that's why computers are brought in.
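To make that concrete before moving on, here is a minimal sketch of what a physics-based condition-monitoring rule looks like in code. Everything in it is made up for illustration (the temperature band, the timestamps, the readings), in the spirit of the refrigerator example above; in practice the allowed band would come from the SME or the equipment manual.

```python
import pandas as pd

# Hypothetical SME-provided "normal" band for the venting temperature (degrees C)
TEMP_LOW, TEMP_HIGH = 40.0, 60.0

def rule_based_alarms(temps: pd.Series) -> pd.Series:
    """True wherever a reading falls outside the SME-defined band."""
    return (temps < TEMP_LOW) | (temps > TEMP_HIGH)

# Hypothetical minute-level readings from the sensor
readings = pd.Series(
    [45, 46, 44, 47, 83, 88, 46],
    index=pd.date_range("2021-05-29 10:00", periods=7, freq="min"),
    name="venting_temp_c",
)

print(readings[rule_based_alarms(readings)])  # the 83 and 88 degree samples get flagged
```

On noisy data arriving at millisecond frequency, a fixed band like this either fires constantly or misses slow drift, which is exactly the limitation just described.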
And that is probably one of the most intuitive reasons for machine learning: a computerized brain we are trying to develop to understand the patterns that we as humans could not. Maybe physics was not able to capture abnormalities which a computer could capture. That is our hope, basically, in using AI and ML for predictive asset maintenance. Remember, machine learning wants quality data and a good quantity of data. The quantity part is usually fine in predictive maintenance projects with IoT sensors; data size is never the problem, but data quality can be, and we will look at that.

So, assuming we have the data, we use the historical data and train machine learning models on it to recognize anomalies. Just as we called anything beyond the A-to-B threshold an anomaly in the physics-based approach, machine learning will, through various algorithms, figure out what the anomalous behaviours in this data are, and based on those anomalies we trigger alerts or alarms, which then lead to further investigation. We might not take every alarm straight to action, so please read this statement very carefully: a lot of people think the generated alarms are directly conveyed to the field people, which is not the case. Because the alarms are based on data, we need a careful human-in-the-loop setup; we need people from the field taking a closer look based on their domain understanding, because this is not purely data-driven and we also need SME inputs. The alarms are first conveyed to the SMEs or domain experts, maybe a production engineer on a production site, a drilling engineer if it is drilling equipment we are looking at, or someone who is good with pumps and compressors if that is the equipment in question. We tell them that around 100 alarms are expected, and they take a closer look, cross-check with whatever reports and surveillance they have, and narrow down the alarms our machine learning model has suggested. Only n out of those 100, or however many alarms the ML has generated, get forwarded to the field engineers, whoever is actually operating in the field, to carry out the necessary actions. So, as you can see, it's a good blend of data, machine learning, physics, and field operations, and a good project for engineers, because everyone works together on it.

That's all about the mindset; I was talking through this to build the context in everyone's head. Now, coming to the common assets we can handle: control valves are one of the most important and interesting success stories as far as PAM is concerned. Then there are ESPs; downstream refineries are also using PAM a lot; and pumps, compressors, and aircraft engines. Feel free, as the audience, to type in whatever PAM applications you have seen, and I already gave you examples from other industries, like the bioreactors that drug manufacturing companies have.
Moving on, let's go to a hands-on project and understand the workflow. I will not go very deep into coding in this session, because that might take more time than I have, but in this workflow I will try to talk about the practices I have actually used in the projects I have been a part of, and convey the sense of a PAM project's workflow. This dataset is openly available; it's a very popular aircraft engine dataset that you can find on GitHub, and if you need it you can ask me as well. We imported a few libraries; I'm expecting a bit of Python background from whoever is attending, but even if not, don't worry, I will not focus too much on the code.

First up, we import the dataset, and it's a time series dataset. As a company, if you are expecting a PAM project to be applied on your site, you have to start with the data strategy: you need to set up sensors on your equipment, because without sensors there will be no data, and without data there will be no PAM and no machine learning. These days every equipment and sub-equipment, systems of systems, has its own sensors; if there is a piece of equipment with three sub-equipments, all three will have their own sensors and the bigger equipment will have its own as well. So it depends on what kind of PAM project you are working on: is your predictive asset maintenance focused on the details, at the sub-equipment level, or are you looking at the site as a whole?

You might have a wide variety of sensors. For example, I was working on a mining project where we had a huge plethora of sensors, and we spent a few months just figuring out what they all were. We connected back and forth with the field people to understand which sensors measure temperature, which measure torque, which measure velocity, and which measure the load or weight applied on a particular piece of equipment, things like that. Your physics knowledge has to be very active in these scenarios. For example, for submersible pumps, the speed at which they rotate is of course very important because that speed governs how much fluid gets pumped up. That speed, the kinetic energy, will be
directly related to the potential energy involved in lifting the fluid. All these high-school physics concepts become very relevant. For the mining project, for example, we realized that the torque has to stay below a particular threshold, because if the torque applied on the moving conveyor belt is huge, it means you are loading it more than required. That's the kind of intuition you need to develop, and the field people will always be in the loop as you do it.

As we import the data, the first thing we do is visualize it, look at it. You can see we have a lot of sensors in this dataset: s1, s2, s3, and so on. Right now this is a purely data-driven exercise and I don't have any SME talking to me, but if I were working on this project in a company, I would ask the field people to tell me what s1 is, what s2 is, what s3 is, and so on, and then make my notes. If we are trying to predict the failure of a compressor, I do not care whether it was raining that day (a very vague example), but the point is that I need to narrow down my sensor list so that I only pick the sensors relevant to the kind of failure I'm looking at. If I'm looking at the failure of a compressor, I do not care about sensor data from equipment located far away in the field. That's the kind of narrowing down you might need to do. But be ready to face challenges, be ready to be a one-man army as you work on it, because you might not get all these inputs from the field directly, or even if you can, they might take a long time to reach you.

So the data will look like this, and that's a consistent thing: you will have sensor data spread out across the time domain. This is called a time series because the index is made of time samples, and this is minute-level data. If I just look at the index, df.index, you can see the frequency of the data; pandas is not able to infer the frequency outright, but you can see the samples going 410, 411, 412, 413, so the frequency of the data is minute-wise. And that is actually a fairly low frequency; I've seen datasets at three-second or millisecond resolution, so that kind of high-resolution data is also out there.

Now that we have the data, we might want to look at the time series plots for all the sensors just to make sure there is data in all of them. As I look at it, you can see that sensor one, at the top, does not have much data; it has no signature, and that might tempt us to exclude it from the analysis. And if you zoom in on these plots, there might be some sensors with null values as well. It is very important to handle these problems first. If I look at df.info, it tells you how many non-null values each column has. So you
can see that nulls are not a problem with this dataset, at least. But if there are null values, which will be the case in a lot of industrial datasets, you might have to use interpolation, which is done with something like df.interpolate. The interpolation method can be specified by you: linear interpolation, backfill, forward fill, and so on. It is advisable not to interpolate with the average if your time series is expected to follow a trend. Filling nulls with the average means, for example, that if the value the day before yesterday was 1,000, yesterday it was 1,100, today is a null, and tomorrow it will be 1,300, and the overall dataset average happens to be 50, you would fill in 50. That's an error; you are taking the average of the entire dataset just to fill a point that sits on a trend. It is always better to interpolate along the trend the time series is following than to fill with averages or other central tendencies. This data cleaning and processing will take up a lot of time in your PAM projects, because the data you get will be very, very bad; be prepared for that. This is an open-source dataset that most people have already processed, and I used a fairly clean version of it given the time I had, which is why we are not challenged that much today.

Now, assuming we have sorted out the problems in the data, the numeric inputs are there, the time index is there, the nulls are handled, and the basic requirements are met, we have data we can work with. We can look at the descriptive statistics just to get a feel for the data, one column at a time. I have written a function here that does nothing but display the distribution of one column, one sensor, at a time. We want to find the sensors that are generating a lot of outliers. What is an outlier? An outlier is a data point that does not look like any of the normal data points that sensor has ever produced. For example, if you look at a kindergarten class and you see a kid with a height of seven feet, that's an outlier: it's a height that is physically possible but one that schools normally never see. That's a very intuitive example, but in data we have the same thing. Say an oil well normally produces at high rates, and there is one day where the recorded oil production is around two barrels per day. Maybe someone mis-entered that value, but when you look at the entire distribution, that point is way off the normal trend of the data. That's an outlier. In statistics it is usually defined as any data point lying more than two to three standard deviations away from the mean of the data.

Let me explain it better. Let me pick up S7, the seventh sensor in this dataset, and visualize its distribution. There are three things we are looking at on this plot.
We are looking at the mean, which is the most expected value for this sensor. If on any day you asked me what the most expected, most frequent, most common value is, I would say that S7 generates a magnitude of around 0.5 to 0.6 on its most normal days. There are other normal values as well: sensor 7 basically generates data from about 0.25 up to 0.82 (again, I'm making up numbers), and that is the most common range of magnitudes sensor 7 can produce. Any data point beyond that range is abnormal for this particular sensor. So if on any given day you find a data point outside the band between the green line and the black line, for example the S7 sensor has generated 0.15 or 0.2, or the sensor has been continuously generating data below the green line for the last hour, you might want to raise an alarm. Similarly, if sensor 7 is continuously generating data beyond the black line, again you might want to raise an alarm, because your data science knowledge, your statistics knowledge, tells you this is not normal for this sensor. That's something you would convey to the experienced folks in the industry, and they will tell you whether they agree or disagree, and ask for more evidence. Then you would go and check other sensors in that same locality and see whether they are also misbehaving.

This part of a data science project is called exploratory data analysis, and it holds major importance, but you need to know how to tell the story. Now that I know S7 is giving me some outliers, some anomalous values, I would pick up those data points, go back to the time series, find the dates and times where those anomalies occur, and take that to my seniors to check what was actually happening in the field. This happens a lot in the initial phases of a PAM project. Similarly you can keep looking at S8 and the rest; you can see S8 has fewer outliers than S7.
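Here is a minimal sketch of the cleaning and screening steps just described: interpolate gaps along the trend rather than with a global average, then flag, per sensor, any reading lying more than three standard deviations from that sensor's mean. The file name and the s7 column name are assumptions matching the dataset described above, and the three-sigma cutoff is the rule of thumb mentioned earlier; both should be agreed with the field SMEs.

```python
import pandas as pd

# Assumed file and column names for the aircraft-engine dataset described above
df = pd.read_csv("engine_sensors.csv", index_col="timestamp", parse_dates=True)

# Fill gaps along the trend (linear interpolation), not with a dataset-wide average
df = df.interpolate(method="linear").ffill().bfill()

def flag_outliers(series: pd.Series, k: float = 3.0) -> pd.Series:
    """True where a reading lies more than k standard deviations from the mean."""
    mu, sigma = series.mean(), series.std()
    return (series - mu).abs() > k * sigma

# Screen every numeric sensor column and count how many points look abnormal
sensor_cols = df.select_dtypes("number").columns
outlier_counts = {col: flag_outliers(df[col]).sum() for col in sensor_cols}
print(pd.Series(outlier_counts).sort_values(ascending=False).head(10))

# For one sensor, pull the exact timestamps to discuss with the field team
print(df.loc[flag_outliers(df["s7"]), ["s7"]])
```

Whether a flagged point is dropped, smoothed, or kept is then the field team's call, as discussed next.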
Let's look at S11: you can see it has a good number of extreme-value outliers. You have to give real attention to the outliers in your dataset. There are some sensors whose data is skewed, which means they might have outliers at some timestamps, and you need to check those with the field people. If they give you a green signal, if they say this is an expected value (for example, if your oil field is shut in, you might not observe any ESP rotation), then it is not an outlier in the problematic sense, because your field operations agree with what was happening, and they may want your analysis to include that data; in that case do nothing. A problematic outlier is one that can create a problem. If the field people tell you this is an outlier, a noise point, a bad data point, something someone entered by mistake, and they want you to exclude it, then you might want to replace that outlier with a better, smoother, more normal value. Because if you include bad outliers in your machine learning models, the model might give them importance, and you do not want the machine giving importance to bad data points; your machine should look at data points that convey real information.

You can also look at the descriptive statistics numerically using df.describe. That's the exploration part, going back and forth with what's actually happening. Once you have figured out that this is the data you want to build a model on, your exploratory data analysis can pause for a bit, and now we ask the field people to give us our labels. There can be two or three kinds of PAM projects: first, the dataset has its own labels; second, the data has no labels; third, you sit with the field people to generate labels. In the case where your data does not have labels and the field people still want a PAM project, you might have to do unsupervised anomaly detection modelling, and
that's a more involved project; if it is a very complex dataset, it might require some deep learning and so on. For now, today we are focusing on a simpler case where the labels are available to us, not straightforwardly, but in the form of the remaining useful life. In fact, we did not even have the remaining useful life directly. What we had was, say, 200 small units of equipment with all these sensor values, plus the number of cycles each has run. What you do is group similar equipment into bins, one bucket per kind of equipment, then look at the maximum number of cycles that kind of equipment has run for, and subtract the current number of cycles the current unit has been running for. That gives you the remaining useful life. For example, if a particular kind of compressor, say a small two-kilogram compressor, has a maximum possible life of 200 cycles and the current timestamp says it is at cycle one, then 199 cycles are still remaining; that's the remaining useful life at that point.

I found this good illustration of RUL from a website. On the x-axis we have the timestamp: point A is your current state and B is the worst-case scenario, and you want to know how much time you have before the worst case; that's your remaining useful life. Remember, many projects rely purely on RUL prediction. What they typically do is take five to ten units with their RUL values captured, run a Monte Carlo-style simulation to generate an ensemble of similar degradation curves, take your current equipment, find the most similar degradation curve, and that curve tells you that, say, 10 days are remaining. That's the RUL approach: intuitive, understandable, more of a simulation-style method, and there are other approaches that people follow as well. I think MATLAB and MathWorks have great documentation on RUL prediction.

In this dataset, we set up the remaining useful life column using the approach I just described, and using that RUL column we created labels. If the RUL is less than 30 cycles, if your current compressor has fewer than 30 cycles remaining, you can call all those timestamps failure timestamps, because the failure is about to happen; that's an urgent scenario. If the failure is going to happen within 15 timestamps, that's even more urgent. The labelling is done with that in mind. So you can see there is an RUL value available to us and there are labels available as well: the 0 labels mark normal timestamps and the 1 labels mark failure timestamps. Corresponding to the 1 labels, the failure labels, the RUL is four, which is very bad: the failure can happen about four cycles later, then three cycles later, then two cycles later.
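A minimal sketch of that RUL and labelling logic, assuming the raw frame has an equipment identifier and a running cycle counter (the unit and cycle column names here are assumptions): RUL is the unit's maximum observed cycle minus its current cycle, label 1 flags the last 30 cycles before failure, and label 2 adds a more urgent class for the last 15.

```python
import pandas as pd

def add_rul_and_labels(df: pd.DataFrame, w1: int = 30, w2: int = 15) -> pd.DataFrame:
    """Add RUL plus binary and three-class failure labels, as described above."""
    out = df.copy()
    # Remaining useful life = max cycle seen for that unit minus the current cycle
    max_cycle = out.groupby("unit")["cycle"].transform("max")
    out["RUL"] = max_cycle - out["cycle"]
    # label1: 1 if failure is within w1 cycles from now, else 0
    out["label1"] = (out["RUL"] <= w1).astype(int)
    # label2: 0 normal, 1 failure within w1 cycles, 2 failure within w2 cycles
    out["label2"] = out["label1"]
    out.loc[out["RUL"] <= w2, "label2"] = 2
    return out

# Usage, assuming df holds the unit, cycle and sensor columns:
# labelled = add_rul_and_labels(df)
# labelled[["unit", "cycle", "RUL", "label1", "label2"]].tail()
```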
That's the kind of labelling approach we have used here. In 95% of cases you will have to go with some such labelling approach; you will never have straightforward labels available in the field. You might have to sit with the field people and manually label every timestamp, or, in the worst case, where the field people are not ready to sit with you and label things, you might have to do unsupervised machine learning, where the machine learns the normal trend and calls any abnormal trend a failure, raising an alarm. That will not be highly accurate, of course, but there are deep learning techniques like autoencoders that work very well in those cases.

All right, moving on, we now have the labels and we have the data. And this brings us to a very interesting part, one of the most interesting challenges of predictive maintenance projects. Just imagine: even if you have the last five years of data for a piece of equipment, what is the maximum number of failure instances you might have? If the failure instances are huge, that's lucky for the model but unlucky for the field, because it means the field has not been functioning well. Most often, the failure class will make up only 1-2% of your entire dataset, so there is roughly a 99-to-1 class imbalance, with 99% of the data being the normal scenario and only 1% being the abnormal scenario or failure mode. Here I am checking the class imbalance of the labels: we are going to use label 1, and you can see that label 1 has 17,531 normal data points and only 3,100 failure points. And remember, we have not even done much data cleaning yet; we talked about anomaly removal but have not done it. So this is a real problem.

There are techniques for it: synthetic minority oversampling (SMOTE), undersampling, augmenting the data with synthetic samples, and so on. You might undersample the normal events down to the failure count, or follow some other approach, or increase the failure labels. For example, if the failure happens at 4 pm in the evening, I would label all the data points from 3 pm to 4 pm as failure labels, so that when I train a machine learning model, the machine understands that every data point within one hour of the failure is a failure sample. The benefit is that we can look further into the future compared to other simple models, and the second benefit is that we are oversampling the failure class at the same time. That's one approach that has been followed industrially in a lot of projects, and it helps handle the class imbalance, though not completely: you may still need other ways of handling class balance. So that's one huge word of caution I'm throwing at you; note it down if this is your first look at the subject: class imbalance is going to be a major challenge when you do PAM projects.
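A minimal sketch of checking and handling that imbalance, continuing from the labelling sketch above. Random undersampling of the normal class is shown because it needs nothing beyond pandas; SMOTE-style oversampling would come from a separate package such as imbalanced-learn, which is an assumption about your environment.

```python
import pandas as pd

def undersample_majority(labelled: pd.DataFrame, label_col: str = "label1",
                         seed: int = 42) -> pd.DataFrame:
    """Randomly drop normal rows until both classes are the same size."""
    normal = labelled[labelled[label_col] == 0]
    failure = labelled[labelled[label_col] == 1]
    return pd.concat(
        [normal.sample(n=len(failure), random_state=seed), failure]
    ).sort_index()

# Usage, continuing from the labelling sketch above:
# print(labelled["label1"].value_counts())   # e.g. ~17,531 normal vs ~3,100 failure rows
# balanced = undersample_majority(labelled)
# print(balanced["label1"].value_counts())   # roughly 50/50 now

# Alternative (assumes the imbalanced-learn package is installed):
# from imblearn.over_sampling import SMOTE
# X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)
```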
Label 1 and label 2 are both based on RUL, and which one you use depends on the kind of use case you want to create. Label 2 gives a three-label model, and it might generate three kinds of alarms: 0 is the normal scenario, a green alarm meaning nothing is happening; 1 means a failure can happen within 30 minutes, or 30 cycles, from now; and if the alarm is the third colour, label 2, the failure is going to happen very, very soon. So that is urgency-style modelling. But let's focus on the binary modelling for now; it depends on us what kind of label we want. For simplicity, if you are taking this project forward, you can start with a binary model: whether or not a failure is going to happen, that's it. We are not talking about the urgency of the failure, how soon it will happen; all we are saying is whether, within this particular horizon from now, a failure is going to happen or not. And as I've already told you, you are lucky if you have RUL and such information; otherwise you might have to simulate it or follow your own other best practices.

So, moving on: modelling. Of course, a lot has to be done before we model, data preprocessing and so on, but one thing we can surely do now is create condition-indicator features. If you just look at a plain time series, say the plot of S7, does it convey any kind of pattern to you? One thing you can do is apply autoencoders, or any other unsupervised learning technique, and figure out where the anomalies are, for example that peak somewhere around the 29th or 30th of May, which looks like a very abnormal value; run that kind of algorithm, exclude the anomalous data points, and then proceed.

Now, one of the better practices followed in machine learning for predictive maintenance is to create four statistical condition-monitoring features for every single time series feature, because a plain time series like this one is not believed to convey that much information on its own. So we create four different kinds of features for each sensor. One is the moving average feature: take, say, 10 rows of data, calculate their average, and that becomes a data point; then the next 10 rows, average, next data point; the next 10 rows, and so on. That's the moving average. Similarly you can create a moving standard deviation: the standard deviation of the first 10 rows, then the second 10, then the third 10. Similarly you can create rolling skewness metrics, rolling kurtosis, and so on. The benefit is that if you replace a raw sensor with statistical, condition-conveying features like these, they can tell you far more than the raw data. Because what is an anomaly? An anomaly is a data point lying many standard deviations away. So if any 10-row window has a huge standard deviation, it means your data is fluctuating a lot in that particular time window, and the standard deviation feature will be strongly amplified around that timestamp. Your plain time series will not tell you anything about that fluctuation.
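Here is roughly how those four condition indicators can be built with pandas rolling windows; a minimal sketch, assuming the sensor columns are named s1, s2, and so on, and using the 10-sample window from the example above.

```python
import pandas as pd

def add_condition_indicators(df: pd.DataFrame, sensor_cols, window: int = 10) -> pd.DataFrame:
    """For each sensor, add rolling mean, std, skewness and kurtosis features."""
    out = df.copy()
    for col in sensor_cols:
        roll = out[col].rolling(window=window, min_periods=window)
        out[f"{col}_mean"] = roll.mean()
        out[f"{col}_std"] = roll.std()
        out[f"{col}_skew"] = roll.skew()
        out[f"{col}_kurt"] = roll.kurt()
    # The first (window - 1) rows have no full window behind them, so drop them
    return out.dropna()

# Usage, continuing from the earlier sketches:
# feats = add_condition_indicators(labelled, sensor_cols=["s7", "s8", "s11"])
```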
If you explicitly create a standard deviation feature like this and then train a machine learning model, the model will learn that wherever there is a huge standard deviation, a huge amount of kurtosis, a lot of fluctuation in the data, that is where failures are most probably happening. And that is what you will observe as well: wherever there is a failure label, in its close proximity you will see a lot of large standard deviation values. The way to create these in pandas looks like this: this is the rolling average feature, and this is the rolling standard deviation feature. You can see the standard deviation spiking at a lot of points; that's not a good sign, and you need to manually check what is happening. So add these features to your dataset, create them for every sensor, remove the raw sensor columns, train a machine learning model, and observe the performance. Such statistical features, called condition indicators, can be made for every sensor, and they help extract more statistically significant information, well, "significant" used loosely, but they extract much more value from the sensor data than the raw readings do. I have personally observed this. So those are the four statistical features we can build.

Once all those features are created and the data processing is done, we can go with a classification model, binary classification. A few words of caution: because the classification is based on RUL-derived labels, it automatically becomes a forecasting classifier. Label 1 means that 30 or fewer cycles are remaining, so if your machine learning model predicts the failure class, it means you have at most 30 cycles remaining before the next failure. If you had used label 2 for this, you would have three kinds of labels the model can generate. Either way, every prediction, every classification, is not talking about that particular timestamp as a failure that has happened; it is talking about a failure that is about to happen. That's why I like to use the term forecasting classifier.

And for all those trying this out for the first time: note that 60% is great performance in this case. Go and try your luck for 99% and so on, but 60% is okay. If you tell the field people that some failure can happen and you are 60% sure of it, they would be more than happy to check on their end. Just make sure you are not satisfied with 30% metrics. Also, remember that because there is a huge class imbalance, accuracy is not a good metric: even if your model has learned to predict only the normal scenario, your accuracy will still be 99%.
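A minimal sketch of that binary "forecasting classifier" and its evaluation, assuming scikit-learn is available and reporting precision, recall and F1 rather than accuracy alone, for exactly the reason above. The plain random split is only for brevity; for a real time series you would split by time or by equipment unit to avoid leakage.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

def train_failure_classifier(feats, label_col="label1"):
    """Train a binary failure classifier on the condition-indicator features."""
    feature_cols = [c for c in feats.columns
                    if c.endswith(("_mean", "_std", "_skew", "_kurt"))]
    X, y = feats[feature_cols], feats[label_col]

    # NOTE: plain random split for brevity only; split by time or unit in practice
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=42)

    model = RandomForestClassifier(n_estimators=200, class_weight="balanced",
                                   random_state=42)
    model.fit(X_train, y_train)

    # Precision, recall and F1 on the failure class matter far more than raw accuracy
    print(classification_report(y_test, model.predict(X_test), digits=3))
    return model, feature_cols

# Usage, continuing from the rolling-features sketch above:
# model, feature_cols = train_failure_classifier(feats)
```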
But a model like that is not capable of predicting a failure, so accuracy should not be the metric, especially if you are not using sampling or balancing techniques; in that case accuracy should never be used. Make sure you are using the right metric: precision, recall, F1 score, whatever fits. I remember a meeting we were having with the site people where they said, "I don't care whether your failures are correct or not, just make sure that whatever failures you suggest based on your machine learning model do not cause unnecessary downtime." So make sure you assess what the situation is asking for. If your current project is an aircraft kind of problem like this one, going back to my favourite image, then additional alarms are okay, because a failure is very, very disastrous. But if it's a field site where human life is not in danger and money is the major criterion, say the company is going through an economic crisis, then you might want to be very cautious about the number of alarms your model generates. In that case your classification model has to go through a lot of scrutiny, and you have to check a lot of scenarios before you suggest something.

I think that's the entire story I wanted to tell. Like I said, my aim was not to dive deep into deep learning or anything like that; I thought it would be much better to induce a mindset in the audience. I'm looking forward to the remaining few minutes we have for discussion, because that's what I'm here for. Mudit, I'll stop here; maybe we can discuss, or I can learn something from the audience.

Thanks a lot, Divyanshu, for that really elaborate presentation and for drawing on those instances from your work. I see a few comments; one highlight from Jayesh saying "big fan, sir." That being said, first of all I wanted to understand one thing from you. You mentioned that typically there will be a lot of data points available beforehand, but in practice, are companies able to predict these errors well in advance? Because if the prediction comes too late it can cause a lot of issues, and if it comes only just before the event, it may not be implementable. How is it in practice? Apart from that, another question I saw was about the challenges you typically face in implementation, but it was along similar lines.

Yeah. So what I understand from your question is that you're asking how to tackle
the proximity to the failure, right? Yes, correct. So basically, we did have this kind of discussion in the mining project. You can tackle it in two ways. Either you build a time series forecasting project where you forecast the most representative sensor, the one most closely tied to the failure. For example, if temperature is the loudest feature that can speak about failure with a huge amount of confidence, how about you build an LSTM model that forecasts temperature into the future based on the other inputs? Then, if your forecast says a temperature spike is supposed to happen two hours from now, you still have two hours. That's one way: you forecast something, and LSTMs are used a lot in these projects. The thing is, most industries do not want black-box models; they want more understandability.

So you can also tackle it from the labelling side, like I told you. If a failure is happening at 5 pm and you want to be very safe: normally, when you receive the data, only the 5 pm timestamp has a label of one, and every other data point around it has a label of zero; as far as the raw data is concerned, even 4:59 pm is a normal data point. But we cannot afford that, because if the live stream comes in showing a 4:59-pm-like scenario and you are happy, then one minute later, boom. You cannot accept that. So what we do in that case is label every data point from, say, 3 pm to 5 pm, as far back as you want to be safe, as one. Now, if you train a classification model, it calls the failure not only at 5 pm but at every data point from 3 pm to 5 pm. That will of course reduce the accuracy; the further you go from the actual failure (pardon me for using the word accuracy), the more the performance drops. But I hope that answers the question.

Yeah. I have one question from Alan. Towards the end you mentioned, he's probably drawing the context from the meeting you had at Shell with the site team, where they mentioned unnecessary downtimes. On that point, is explainability in models a must-have? Absolutely. Absolutely. We've had discussions where the field people, if you are suggesting a failure, also want to know the RCA, the root-cause-analysis side of it. You cannot just say, "my deep learning model with 10,000 neurons is suggesting that a failure is going to happen." They also want to know which sensor suggests that a failure is going to happen. So either you do that based on statistical inference, or based on, say, a regression model whose coefficients speak to it, or maybe a random forest model with feature importances. Something must be there as evidence for your suggestion of failure.
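Continuing the hypothetical classifier sketch from earlier, here is a minimal way to attach that kind of evidence to an alarm using a random forest's feature importances, so an alarm can be handed over together with the sensors that drove it (regression coefficients, statistical comparisons, or SHAP values are alternatives).

```python
import pandas as pd

def alarm_evidence(model, feature_cols, top_n: int = 5) -> pd.Series:
    """Rank the condition-indicator features by how strongly they drive the model."""
    importances = pd.Series(model.feature_importances_, index=feature_cols)
    return importances.sort_values(ascending=False).head(top_n)

# Usage, continuing from the classifier sketch earlier:
# print(alarm_evidence(model, feature_cols))
# For one flagged timestamp (alarm_timestamp is hypothetical), show the driving
# sensors next to their actual values so the SME sees what triggered the alarm:
# print(feats.loc[alarm_timestamp, alarm_evidence(model, feature_cols).index])
```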
So if you are going to the field people with an alarm, you must have evidence, and a lot of industrial platforms that work on PAM actually have that kind of evidence bar on the side: one section of the platform talks about the failure, and another section talks about which sensors are pointing to the failure. So yeah, that's the approach followed.

Okay, there is one question from Rahul, which we may have covered briefly, but you can look at it once: how can we determine an anomaly in the future? He has added to it just now: how can we predict the anomaly in the future, and what reasons or parameters does it depend on? Yeah, so think about the future through history. Let's talk about autoencoders, for example. There are algorithms which are smart in the sense that they capture what a normal trend looks like; autoencoders focus on understanding the pattern of normal events. Whenever a data point comes in, in the future as you said, the anomaly-detector model first predicts what the normal value would be in that situation, compares it with the actual data point, and if the deviation between the expected normal prediction and the current data point goes beyond a particular allowable threshold, then that's a problematic anomaly; otherwise it's okay. That's how we do it on live streaming data. And again, if you want to generate a forecast, not that reliable of course, you can generate one and compare the expected normal trend with the forecast; that can also tell you that, say, ten minutes from now the forecast will deviate from the normal trend, which is anomalous behaviour. So there are a lot of approaches, and you can find your own as well.

Okay, thanks. I have one question that I received beforehand on Discord. Shell is one of those companies which uses AI in PAM and in the manufacturing and oil and gas sector primarily, and a lot of people want to join the company. Do you have any recommendations for somebody who's trying to get into Shell as a data science researcher? Yeah, at Shell I think the focus is more on the engineering aspect; this is my understanding and should not be taken as a conclusion. Shell is a company which, at least in our team, focuses on the engineering side of things, and data science is a tool that helps solve engineering problems. As you must have realized from my talk, I was using words like torque and potential energy and kinetic energy. So your mindset should be about solving problems for the industry, not about developing fancy models just for the sake of it. Try to think about where the oil and energy industry is heading and work on projects along those lines; I think Shell would like that. That's my answer for that.

Okay, thanks. I see a follow-on question, from Rahul again: for live dashboarding, can you suggest some software that you use? You can go for Dash.
I think Plotly Dash is a good tool. Streamlit is also a great tool; I've not used it for live streaming data, but Dash is very popular, and you can directly use the pre-built wireframes from the Dash gallery and replace their components with your own. I think Dash is a great tool to learn. Is Dash as popular as Streamlit? Yeah, similar; Dash is a bit different, but Plotly plots can also be created in Streamlit. Streamlit is the simplest, quickest dashboarding tool, closely knit with Python. I've created a lot of Streamlit web apps, and I've also created Dash apps; Dash requires a bit more coding compared to Streamlit.

Okay. There are also live sessions and courses on Plotly Dash by Plotly educators, so if you want to check those out, you can look at their documentation, and it's on DeFi as well. That was a shameless plug from me towards the end. I think we are done with the questions. I see a good evening from Mohan sir, who has been an active community member, so thank you for being here. This is where we wrap up the session. I would again thank you, Divyanshu, for being here; it was a pleasure hosting you, and we look forward to having more such engagements with you in the future, if you have the time. Always. Sure. Thank you, everyone.

We'll see you next Saturday. Till then, please feel free to explore more about data science, interact with people in the community, and continue learning. We are also starting bootcamps next Friday, which is the 11th, I believe. You can go to defi.tech/bootcamps to check them out. These are mini bootcamps that will run for 10 days; they don't cover a lot of breadth, so you won't learn a ton of different things, but the idea is that you will cover something substantial within those 10 days. It will be intensive enough and you will get something out of it. If you want to check that out, go ahead and do it. Thank you, everyone, and we'll see you soon.