Transcript for:
MLOps Workflow with Kubeflow Lecture

Hey everybody, I'm a developer advocate at Cisco, and today I'd like to walk you through my MLOps workflow with Kubeflow. This is my GitHub repository where I put everything, and this is the whole pipeline I want to go through with you: you can deploy Kubeflow on any Kubernetes cluster of your choice (I'll show you in a bit which kinds are supported, but usually you should be fine), then I use a notebook instance to write the machine learning code, then I create Kubeflow Pipelines components — the goal is getting the data and putting everything into Kubeflow pipeline components — and finally we serve the trained model with a KServe inference service, so that in the end the program can recognize digits. Along the way I'll also show you MinIO, the data and model storage I used. It should be a good starting point for getting started with Kubeflow, with the pipelines, with notebooks and so on.

We also have a simple demo — sorry, I hadn't mentioned it yet — a simple digit recognizer, because I think you know this one: it's the classic MNIST handwritten-digits image dataset, the hello-world 101 dataset with all these digits. We basically ask "what digit is this?": our KServe model inference server lets you send an image to the server, and back comes the probability of which digit it is.

So let's get started. I've put all the Kubernetes cluster information in the repository. First of all, on the Kubeflow website you can find all the information, and getting started means installing Kubeflow: the page lists the packaged distributions from the different maintainers, or you can do what I did and install the Kubeflow manifests manually. With that done, you should see all the specific pods running with kubectl, so let me open a new tab — you can see all the namespaces and the pods running there; this is what it should look like, completely normal. Let me close this. Then let's move forward and access the Kubeflow central dashboard: if everything is in good shape, ready and running, then — depending on your setup, of course — what I did is a kubectl port-forward. It's a bit of a hack, but it lets you access the dashboard. So again I open my terminal, do the port-forward, and if I now go to localhost:8080 you should see Kubeflow. I was already logged in, so let me log out so you can see the login screen, sign in again, and here it is: this is the Kubeflow central dashboard.
Just for you to know, this is Kubeflow version 1.5.1 — 1.5 for sure — and here you can see an overview of everything they put in there. Cool, then let's go further: number three is about setting up the Jupyter notebooks. As you probably know, Jupyter notebooks let you write Python interactively, visualize it in graphs, and work together with your team — I'll show you that in a bit. The thing with Kubeflow, however, is the following: as a security measure — Kubeflow uses Dex for authentication and has some security measures in place — you need to explicitly allow access to Kubeflow Pipelines from within a notebook. What does that mean? Kubeflow Pipelines is a component, and in order to gain access to that component from the Jupyter notebook instance you basically have to say "I would like to allow access". This is all written in the documentation, but what you need to do is simply run one command and apply a YAML file, and that YAML file you can actually see here in the GitHub repository — oh no, this is the wrong one — it's in the kubeflow_configs folder, the one for access to Kubeflow Pipelines. You just put your namespace in there: if you did the manual install and followed along, you can go to Manage Contributors and see your namespace, which is kubeflow-user-example-com in my case. So you just edit that one field, and that's basically it: with this we say that a notebook instance in this namespace is allowed to access the machine learning pipelines. I did this already, but I could apply it again — let me just go back to number three: when I go to my documents folder, this is my GitHub checkout, these are the kubeflow_configs, and when you open the file you just edit the namespace — type in kubeflow-user-example-com — and then (I won't do it now) execute the kubectl apply command and it will work. You only need to do this once; you execute the command and then it should be fine.

One important point while we're at it: when you use Kubeflow you actually should know about Kubernetes — about kubectl, about how to apply a YAML file, about how YAML works. I'm sorry to put this as a prerequisite, but once you use Kubeflow you need at least someone who knows Kubernetes: if you're a data scientist or a machine learning engineer, definitely grab that person — he or she will show you how it works, because you definitely need some of that knowledge.
Now, spinning up a new notebook instance is pretty easy and straightforward. We go to the Notebooks section — I already have one, but I'll show you anyway. Let's name it test, check that the namespace is right, and pick an image: it comes with predefined images that will be downloaded from a registry — PyTorch with CUDA, full TensorFlow with CUDA, SciPy, whatever you like. Because my example uses TensorFlow, I pulled the full TensorFlow image. You can select the CPUs — two or four — and the memory, and if you have GPUs you can use those too. And now comes the most important part, because we need access to Kubeflow Pipelines: this is what we prepared with the YAML file. In the configuration section there is now a checkbox, "Allow access to Kubeflow Pipelines", and this is what you need to tick, because then this container has the privilege to talk to the Kubeflow Pipelines component — the kfp SDK inside the notebook talks to Kubeflow Pipelines. Then you launch it. It takes some time to come up, but you can see that everything is running smoothly — oh, it's done already, that was fast; the image was already in the local cache.

Once it's pulled and running, the cool thing is that you just click Connect and it opens a JupyterLab instance. Maybe you're familiar with it: you can create a new notebook, print "hello world" in Python, hit Shift+Enter and it executes — interactive Python. You can also turn a cell into Markdown and write text or a heading, and you can open a terminal directly. One very important part, which also comes up in the next docs: check with pip list whether your Python libraries and packages are up to date. Why am I saying this? Because in 1.5, at least, you don't get the latest Python packages: the kfp packages I had were at 1.6, but for the newer pipelines you need at least 1.8, and the kserve Python package needs to be there as well, so you definitely need to update them. How do you do that? You just pip install kserve and the other packages — Python 101. If you scroll down in the README, I have a section on updating the Python packages that lists exactly this. Then comes access to the Jupyter notebooks: I put the final IPython notebooks, which I tested and which work, into the repository, so you'll find a digit recognizer notebook and the pipeline notebook. What's the difference? I'll open both for you.
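As a rough sketch of that version check (assuming kfp is importable inside the notebook image; the upgrade command is just the standard pip invocation):

```python
# Quick check from inside the notebook: the kfp SDK should be >= 1.8
# for the newer pipelines, and kserve has to be installed as well.
import kfp
print("kfp version:", kfp.__version__)

# If the preinstalled packages are too old, upgrade them from a notebook cell:
#   !pip install --upgrade kfp kserve
```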
Let's go to JupyterLab and start with the digit recognizer notebook. The notebook is the data-exploration mode: imagine you're the data scientist and you're exploring the data. We import numpy, pandas, tensorflow and the other Python packages, and — that's the nice part, or maybe I just picked an easy example — Keras, the Python library, already has a function that loads the MNIST dataset for you, nicely split into train and test sets as x_train, y_train, x_test and y_test. That's handy, because you get the data right away; of course you could pull it from an existing dataset elsewhere and do a lot more data exploration, but the most important part here is the Kubeflow component and process, so I left it at that.

Then we explore the data — really classic data-science work: what is the shape of the data? We have 60,000 examples in the train set and 10,000 examples in the test set, and the images are 28 by 28 pixels. If we visualize one, you can see the 28 pixels from 0 to 27 and that the correct label for this image is five. Next comes the data preparation needed to build the model: we reshape the images to (28, 28, 1), using just one channel — grayscale, since we don't need any colour — and we can do all of this with the Keras API. Then we normalize the data (I won't go too deep into this): we know the pixel values run from 0 to 255, i.e. how dark each pixel is — in the five you can see some pixels are fully black, some white, some grayish — and those values are a bit large for training the machine learning model, so we scale them down by dividing every value by 255, which gives values from zero to one. We do this mostly because the model trains better and faster that way. Now the shapes are (60000, 28, 28, 1) for the train set and (10000, 28, 28, 1) for the test set — this looks good, so we can go on to the model building. Like I said, I'll only go over the model quickly; there are tons of other videos out there explaining this part in much more detail.
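A minimal sketch of those load, reshape and normalize steps, assuming TensorFlow/Keras is available in the notebook image:

```python
# Load MNIST, already split into train and test sets
import numpy as np
from tensorflow import keras

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
print(x_train.shape, x_test.shape)        # (60000, 28, 28) and (10000, 28, 28)
print("label of first image:", y_train[0])

# Add the single grayscale channel and scale pixel values from 0-255 down to 0-1
x_train = x_train.reshape(-1, 28, 28, 1).astype("float32") / 255.0
x_test = x_test.reshape(-1, 28, 28, 1).astype("float32") / 255.0
print(x_train.shape, x_test.shape)        # (60000, 28, 28, 1) and (10000, 28, 28, 1)
```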
For the model building I just put together a simple Sequential model: the input shape is (28, 28, 1), a convolution with ReLU activation, then max-pooling — I repeat that three times — then I flatten and add a dense layer. You can of course tweak it and make it more elaborate however you like; the more important part is the last dense layer with an output of 10, meaning the digits from zero to nine, with a softmax activation — so you have ten classes and one probability per class. Then model.summary() gives the usual Keras model summary, pretty straightforward. We compile the model — nothing special in there — with categorical cross-entropy as the loss; the comment still says binary, which is wrong: we want a categorical outcome here, so I'll change that. Then we fit the model, i.e. we train it; you can of course change the number of epochs — I just used one epoch and still get an accuracy of around 93%, which for this dataset is definitely good and fine.

With that we also have the model evaluation, and the trained model is there. Now what do we do? We save the model to the models/detect-digits folder with the simple save-model function. Note that we don't save it into a single file; we save it into a folder — this is the newer option for saving a TensorFlow model, and it matters later. The next bit already involves MinIO: just for testing, you can also upload this model to your MinIO instance. MinIO is an object storage solution that is already installed and comes with Kubeflow, which is why I used it, and here we upload the model to it. You could also export the model in the H5 file format, but we don't do that here, because when you go to KServe, the inference server does not support the H5 option: it needs the output of the newer save-model-to-a-folder export — H5 is basically the old way. Finally we evaluate on the test data and create a confusion matrix out of it; the accuracy is about 97% (I think this particular output is from another test run, but you'll run it yourself anyway), and this is the predicted outcome. It looks pretty neat, I'd say: some digits, like the zero and the six, are not recognized quite as well, but all in all we have a decent amount of accuracy.
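A condensed sketch of the model described here, building on the x_train/x_test arrays from the previous sketch — the exact layer sizes and counts are placeholders, only the overall structure follows the talk:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),   # 10 classes: digits 0-9
])
model.summary()

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",   # categorical, not binary
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=1)

# Evaluate, then export in the SavedModel *folder* format (KServe can't use .h5)
model.evaluate(x_test, y_test)
model.save("models/detect-digits")
```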
All right, a quick recap in case you're not that familiar with deep learning: we explore the data, we do some preparation and splitting, we do the model building, we do some evaluation to see how our model performs, and we save the model in the format of our choice — or rather the format our framework, TensorFlow in our case, gives us. With that said, let me go over to the next part. One more thing about accessing the Jupyter notebooks: if you don't know how to get the files in there, you can simply git clone the repository directly into the container — I already had it in here — and that copies all the files, so you have the IPython notebooks available.

Cool, so our next step is to set up MinIO for object storage; let me scroll up again. Imagine this: you work in the Jupyter notebook, you have some output — for example the model we just created — and you need to store it somewhere. A classic and good choice is definitely MinIO, which gives you object storage. There's another reason MinIO is useful here: we haven't created a pipeline yet — this was just notebook code — and the Kubeflow pipeline components can't access files inside your Jupyter notebook, so you need some single source of truth, and this is where MinIO comes into play. The nice part is that there is already a MinIO tenant in Kubeflow: it's used internally, and you can use it as well. You can for sure set up your own MinIO, but I wanted to make it as smooth as possible, so I didn't — I just use the internal one. So how do you obtain the access key and secret key for MinIO? First of all, you can see that in the kubeflow namespace there is already a MinIO service on port 9000. So again I go to my terminal, open a new tab, and port-forward it to port 9000.
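Once the port-forward is up, a quick way to verify the connection from Python — assuming the minio client package and the default in-cluster credentials that we read out in the next step — could look like this:

```python
# A minimal check that the port-forwarded MinIO is reachable from Python.
from minio import Minio

minio_client = Minio(
    "localhost:9000",
    access_key="minio",       # default credentials from the Kubeflow MinIO tenant
    secret_key="minio123",
    secure=False,             # the in-cluster MinIO is plain HTTP
)
print([b.name for b in minio_client.list_buckets()])  # should list the mlpipeline bucket
```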
If I now go to localhost:9000, I actually land in MinIO — this is the MinIO that comes with Kubeflow. What do I do here? I need the access key and the secret key, and you get them with these commands: run them in another terminal tab and you can see the access key is minio, and the secret key — let's check this one — is minio123. So it's minio and minio123; I had saved them already, and there you go, you're in. Maybe this is left over from my testing, but you can see the mlpipeline bucket is there, you can create a new path, and depending on your setup there may be other buckets, but you can see you have all this storage available. What you also see here are the default values we'll use: minio as the access key, minio123 as the secret key, and the mlpipeline bucket is where our models and artifacts are saved. That was basically it — we'll use it later for the Kubeflow pipelines, but for now, just know we have a storage option available.

All right, now let's get KServe ready. We're actually going to the pipeline in the step after this, but this part is about KServe, so let me look it up in the Kubeflow docs. KServe is about model serving — model inference — i.e. setting up a server for your model. KServe was previously named KFServing and was one of the core Kubeflow components, so it used to be listed here, but it has since been moved out into an external project, and KServe is now an external add-on to Kubeflow. In Kubeflow 1.5 you still have a bit of a mix of KFServing and KServe, but I think in 1.6 they made everything clear: it's KServe now. Basically it's this picture: you have your cluster, you have Kubernetes, and KServe uses Knative and Istio — this is important to know — and on top of that you can do model monitoring and, most importantly, deploy your trained model as an instance with an interface, an API, that you can query: "please predict this one". You send the image as a JSON object and you get back, as the response, what kind of digit it is. You can see all of this in the documentation, and KServe now has its own website, which is really good: there is an administrator guide where you can see how to install it, and a user guide where you can see how KServe works. There is also a transformer service, in case you need to transform something before it goes to the predictor service — I won't go into that; we just use the predictor service.
What I do want to show you is the single-model serving page, and specifically TensorFlow, which is what we'll use to create the inference service. If you're interested, I'll come back to this in a bit: what you see here is that you create the inference service with a Kubernetes YAML file — kind: InferenceService — using KServe; in the metadata you give it a name (in our case it will be the digits recognizer example), as the predictor you use tensorflow, and the storageUri is this one. You basically put your object storage in there — our MinIO instance — where the model lives; in the docs example it's the flowers model, but in our case it will be the digit recognizer. We won't use this YAML directly — I mean, we could, and I also put it on GitHub — because you can also do this with Python, with the KServe Python SDK, to create the inference service.

Cool, so that was KServe, just as an overview. The thing is that we now need to set KServe up, and again this is Kubernetes-related: we need to give the MinIO secret to KServe, so that KServe knows it can access the MinIO object storage. You can see it described here: KServe will copy the saved model into the newly created inference container, and we need to allow this. So we go to kubeflow_configs and the set-minio-secret YAML. Here again you need to check the namespace: is kubeflow-user-example-com really your namespace? The name of the secret can stay the same. Then, as written here, it's declared as an S3 endpoint, because the S3 API is compatible with MinIO — that's why it says s3 — and the endpoint is the minio-service in the kubeflow namespace on port 9000; we're not using HTTPS, and then come the credentials: the access key and secret key we used for getting into MinIO. That's one part of the YAML file — the secret we create. And then we also create a service account, again in the kubeflow-user-example-com namespace — the same namespace — and we attach this MinIO KServe secret to it, so that the inference service we create later with Kubeflow can actually access the MinIO instance. Once we've done this — again, go back to the README, down to "setting up KServe" — you basically just copy-paste the command; I won't do it now because I already did, but you kubectl apply it and it just says created (or unchanged), so that's good. And if you run into any troubleshooting issues, you can find good documentation on the KServe website.
What I needed to do, for example, was this: KServe couldn't fetch a Docker image, so I had to adjust the Knative configuration and move the registries-skipping-tag-resolving entry for index.docker.io from the example section directly under the data section. You just copy-paste the command, edit the config map in vim (or any editor of your choice) and move the entry under data — it's all written up in the README, so maybe you won't have any problems at all.

All right, with that done we're getting to the major part: creating a machine learning pipeline with Kubeflow Pipelines. Kubeflow Pipelines is the most used component of Kubeflow — it's the reason a lot of people use it in the first place — and it allows you to create, for every step or function in your specific machine learning project, a reusable containerized pipeline component, and these components can be chained together into a machine learning pipeline. That's the cool part. You basically have it all in the digit recognizer pipeline IPython notebook, and we'll go over it right now. If you're unsure about different things: the documentation, I unfortunately have to say, is not the best, so you definitely need to figure out a lot — a bit of learning by doing, a bit of reading the Kubeflow docs — and I think they moved things around again, so this link here might be wrong and I'll change it. Here is what I'd call pretty good documentation, where you can see how to write the functions, how to create the components, and how to work with Kubeflow Pipelines — I think it's important to know this. For me it was a bit tricky because of the documentation, but this tutorial helped me, and I hope my pipeline notebook will help you as well.

Now let's go to JupyterLab and open the digit recognizer pipeline. What do we have here? I'll go through it with you. We import kfp, the Kubeflow Pipelines Python library, and then we have a number of functions: the definition get_data_batch, the definition get_latest_data, then reshape_data, then model_building with some inputs and outputs, then further down model_serving, which is for serving the model — and there we stop. These are all the functions, and you can look at it like this: every function I define there is one component, and that function runs in its own container, which will be created for it. And now comes the next part: with these next statements we actually create the components — the containers — from those definitions.
Here we have the component for get_data_batch, and this comes from kfp: components is part of the kfp Python library, and we create a component from this function — from a Python function. So we say: this function, get_data_batch from above, is the one we want to create a component from, and this is the base image to use. To be honest, I just used the same base image that was already in the examples; you can of course use other images, or your own — if you plan to build a really serious machine learning pipeline you'd create your own container image that fits your needs. We do this for every function: we create a component for get_latest_data, reshape_data, model_building and model_serving, all with the same base image. Only with model_serving there's an extra bit: kserve is not available by default in this container image, and there's a really handy option, documented in the docs, to specify which packages should be installed — so I say, please install the KServe Python SDK, because we need it, before starting the container. And note that create_component_from_func is just one option: you can create a component from a function, from a YAML file, or from a different base image that already contains your code — there are lots of other ways to create a component; I just used a very simple one that says, for this model_serving component, run this code and create the KServe instance (I'll come back to that later).

But that's not all — one more thing is important: you also need to define the whole pipeline. We've defined the components, but which component should come after which, what do the steps look like, which one runs first, which second, which in parallel? This is where we actually define our pipeline: we name it digit recognizer pipeline, we say it detects digits, and then: step one — get_data_batch and get_latest_data run in parallel; step two, which comes after step one — the reshape_data component; step three, which comes after step two — the model_building component, and here we pass in some options for the training.
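A condensed sketch of that component creation with the kfp v1 SDK; the base image string is an assumption, and the function bodies are stubs standing in for the notebook code:

```python
import kfp
import kfp.components as comp

def get_data_batch():
    # ... the function body shown in the notebook ...
    pass

def model_serving():
    # ... creates the KServe InferenceService ...
    pass

comp_get_data_batch = comp.create_component_from_func(
    get_data_batch,
    base_image="tensorflow/tensorflow:2.7.0",   # assumed; reuse the image from the repo
)

# Same pattern for get_latest_data, reshape_data and model_building; only
# model_serving additionally installs the kserve package into the container.
comp_model_serving = comp.create_component_from_func(
    model_serving,
    base_image="tensorflow/tensorflow:2.7.0",   # assumed
    packages_to_install=["kserve"],             # kserve isn't in the base image
)
```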
These options show up in the Kubeflow UI — you'll see in a bit that you can define the number of epochs and the optimizer there, and the run will be executed with those values. Then step four, which comes after step three, is the component model_serving — this is where we create our KServe instance. And that's basically it; of course I added a few more things, so let me point them out: you connect the pipeline client, there are default arguments — number of epochs is 1, the optimizer is adam — and then there are two ways to run it. If I want to run it directly, I create the run straight from the pipeline function, which means I create this specific pipeline and run it immediately under an experiment named test with these arguments. Or I can just compile it to a YAML file — put the whole pipeline into a YAML — and then upload it as a pipeline version: with this YAML file I upload it under a pipeline version name, the pipeline name test and a description. In the end it really depends on what you want to do.

Let me go back to the dashboard, because this is how it's usually organized — there are three things here. You have experiments: you can create an experiment, which represents one machine learning experiment, in our case recognizing digits; you can see the test experiment here with its outputs, its status, whether there was a failure, the duration and so on — I'll come back to this in a bit. Then you have pipelines, which is where your pipelines are stored — if I went for the second option, the compiled pipeline would show up here. And finally there are the runs: all the runs that have been executed (you can archive old ones and so on), basically all the runs per pipeline. With that said, we can execute the cell: when we do, you just see the "Experiment details" and "Run details" links, which tells you it's running. Now let's check what's happening: if I go to Runs — let me just refresh — you can see it has started and everything is good, you get the runtime, and if I go to the experiments view you can also see that everything is fine. And now the cool thing is the following: you can see the outputs, model accuracy and model loss, the specific parameters, the start time and the end time, and — more importantly — let me open a really nice pipeline run that already worked: you can see how the whole pipeline looks, and you can click on each node in the graph and see the inputs, the outputs and all the information about that specific component.
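Putting the wiring and the run submission together, a rough sketch (the remaining component handles are created the same way as in the previous sketch; names, defaults and the upload call are assumptions based on the walkthrough):

```python
import kfp
import kfp.dsl as dsl

@dsl.pipeline(name="digits-recognizer-pipeline",
              description="Detect digits")
def digits_recognizer_pipeline(no_epochs: int = 1, optimizer: str = "adam"):
    step1_1 = comp_get_data_batch()        # step 1: these two run in parallel
    step1_2 = comp_get_latest_data()

    step2 = comp_reshape_data()
    step2.after(step1_1)                   # step 2 starts after both step-1 tasks
    step2.after(step1_2)

    step3 = comp_model_building(no_epochs, optimizer)
    step3.after(step2)                     # step 3: training, parameterized from the UI

    step4 = comp_model_serving()
    step4.after(step3)                     # step 4: create the KServe InferenceService

client = kfp.Client()

# Option A: run it immediately under an experiment called "test"
client.create_run_from_pipeline_func(
    digits_recognizer_pipeline,
    arguments={"no_epochs": 1, "optimizer": "adam"},
    experiment_name="test",
)

# Option B: compile to YAML and upload it as a new pipeline version
# kfp.compiler.Compiler().compile(digits_recognizer_pipeline, "pipeline.yaml")
# client.upload_pipeline_version("pipeline.yaml", pipeline_version_name="v1",
#                                pipeline_name="digits-recognizer-pipeline")
```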
For example, take get_latest_data: the graph says we are "adding the latest data", and if we go to the Python code in JupyterLab we can see that get_latest_data really just adds the latest data — it's a dummy function for showcasing that steps can run in parallel. So let's look at get_data_batch instead, starting with the code. Here we actually get a dataset, and we return the number of training data points, the number of test data points, and the dataset version we want to have. This is part of how the kfp library works: you have to define these outputs as a NamedTuple — that's what's written in the documentation, that's how you need to define it. And note where the imports happen: this code will run inside its own container, which is why we can't import everything at the top of the notebook — so inside the function we import tensorflow and everything else. Then we connect to our MinIO client — this is the address of the MinIO service — and call fget_object on the client: we say, I'd like to get the MNIST dataset from MinIO, and save it temporarily. Of course you could download it directly, as I did in the digit recognizer notebook, but for dataset versioning and keeping things together I thought it would be good to keep it in a MinIO bucket as well. At the end we save x_train, y_train, x_test and y_test as NumPy files back to MinIO, we define the dataset version here, we print the shapes just so you can see them, and the output collection is returned as the NamedTuple. So in the end we save everything back to MinIO, and when we look at the run we can see the outputs: 10,000 test data points and 60,000 training data points.
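A condensed sketch of such a component — in-function imports, MinIO access, NamedTuple outputs; the endpoint, credentials, bucket and object names are assumptions:

```python
from typing import NamedTuple

def get_data_batch() -> NamedTuple("Outputs", [("no_of_training", int),
                                               ("no_of_test", int),
                                               ("dataset_version", str)]):
    # Imports live inside the function: this code runs in its own container
    import numpy as np
    from minio import Minio

    minio_client = Minio("minio-service.kubeflow:9000",
                         access_key="minio", secret_key="minio123", secure=False)

    # Pull the raw MNIST archive from the bucket and load it locally
    minio_client.fget_object("mlpipeline", "datasets/mnist.npz", "/tmp/mnist.npz")
    with np.load("/tmp/mnist.npz") as data:
        x_train, y_train = data["x_train"], data["y_train"]
        x_test, y_test = data["x_test"], data["y_test"]

    # Save the splits back to MinIO as .npy files for the next components
    for name, arr in [("x_train", x_train), ("y_train", y_train),
                      ("x_test", x_test), ("y_test", y_test)]:
        np.save(f"/tmp/{name}.npy", arr)
        minio_client.fput_object("mlpipeline", f"datasets/{name}.npy", f"/tmp/{name}.npy")

    print("x_train shape:", x_train.shape)
    return (len(x_train), len(x_test), "1.0")
```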
You also see the dataset version we defined, and these are the logs: we fetch the data, this is the shape, and so on — that's fine; you can ignore this error arrow here. There are more details, like events and machine-learning data, and I'm not sure why they show up under the logs now, but anyway. Next we go to reshape_data. If we look at its logs, we again fetch everything from MinIO — we load the test and train data — then we reshape the data, exactly like we did before in the digit recognizer notebook, and save it back to MinIO. That's all this step does: basically reshaping data. It looks good, the details are there, and it should be fine.

The next step is model building. Here you have the input parameters, and if you want to run this with your own pipeline, you can actually set these parameters directly in the run configuration in the dashboard — the number of epochs is one, the optimizer is adam. Let's see what the output is: we have the UI metadata and the metrics. The metrics contain the model accuracy and so on — these are the values you actually see in the experiments view; that's why, when we go to the experiments page, you see model accuracy and model loss there. If I go back to the run and open model building: yes, the number of epochs was one — maybe I should have put three — but you can see the accuracy and how the model did. It also offers visualizations: you already have a confusion matrix here, and you can add your own visualizations. I have to say it was pretty tricky to put this confusion matrix together, because the documentation on it isn't — or wasn't — the best, especially around the outputs: it's a confusion matrix with a category schema, target, predicted and so on, and it took some reading through the documentation to see how it actually works. There are examples, and the confusion matrix I put in worked; I also added a Markdown output, which means you can see the model summary, the model performance, the accuracy and the loss as Markdown. Usually the logs would also be shown here. ML Metadata is a different story which I didn't go into, but being able to put visualizations and specific summaries in here is actually pretty cool, I'd say.
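A minimal sketch of how a KFP v1 component can expose those metrics plus a confusion-matrix and Markdown visualization; the numbers and CSV rows below are placeholders:

```python
from typing import NamedTuple

def model_building(no_epochs: int, optimizer: str) -> NamedTuple(
        "Outputs", [("mlpipeline_ui_metadata", "UI_metadata"),
                    ("mlpipeline_metrics", "Metrics")]):
    import json
    # ... train and evaluate the model here ...
    accuracy, loss = 0.97, 0.08                      # placeholder results

    metrics = {"metrics": [
        {"name": "model_accuracy", "numberValue": accuracy, "format": "PERCENTAGE"},
        {"name": "model_loss", "numberValue": loss, "format": "RAW"},
    ]}

    # Confusion matrix as inline CSV: target,predicted,count per row
    csv = "0,0,980\n0,6,12\n5,5,870\n"               # placeholder rows
    ui_metadata = {"outputs": [
        {"type": "confusion_matrix", "format": "csv", "storage": "inline",
         "schema": [{"name": "target", "type": "CATEGORY"},
                    {"name": "predicted", "type": "CATEGORY"},
                    {"name": "count", "type": "NUMBER"}],
         "source": csv},
        {"type": "markdown", "storage": "inline",
         "source": f"## Model summary\naccuracy: {accuracy}, loss: {loss}"},
    ]}
    return (json.dumps(ui_metadata), json.dumps(metrics))
```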
How does that look in the code? It's not that different: you can see the input parameters, again of course with the NamedTuple for the outputs, and then, as before, I connect the MinIO client and do everything in there: I create the Sequential model, show the model summary, compile the model — and of course we need some data for model.fit, and afterwards we also fetch x_test and y_test — then we evaluate how it did, we run the predictions, and this is also the code that generates the confusion matrix. And this is the metadata: the model summary you see in the UI — I formatted it so you get the metric, the model summary, the model accuracy and the loss — this part is purely Kubeflow Pipelines related, so that it's visible in the visualization.

Then, finally, the part I really wanted to point out: after this we save the model from the temp folder to detect-digits and, as in the other code, upload it to MinIO. And there is one very important detail that took me some time to realize: the TensorFlow model inference server that KServe uses needs a version number in the path — models/detect-digits/1, i.e. a folder named 1 for version one. The model needs to be versioned, and that's why it's very important to put the 1 there: if you go to the MinIO browser — it wasn't in the pipelines bucket, it was under models/detect-digits — you need to have that 1 folder (this screenshot is from an old try-out), and the inference server will pick the model up from there. Otherwise you'll get an error saying there's no version available for this pre-trained machine learning model. All right, in the end this is the whole code: first, very similar to the notebook, we compile the model, we get all the data from the MinIO bucket, we do some evaluation, we predict on the test dataset, and then we create the confusion matrix out of it — I think the axes might still be the other way around, and as I said the documentation could be a bit better there, but all in all you can create something like this confusion matrix. And then, as I said, you save everything into the MinIO bucket, and you need to version it.
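A small sketch of that versioned export and upload, assuming a trained Keras model object and the bucket, paths and credentials from the walkthrough:

```python
# KServe's TensorFlow serving runtime expects <model dir>/<version>/saved_model.pb,
# so the model is saved under a "1" folder and every file is uploaded to MinIO
# under models/detect-digits/1/.
import os
from minio import Minio

export_dir = "/tmp/detect-digits/1"      # the trailing /1 is the version folder
model.save(export_dir)                   # SavedModel export (a folder, not .h5)

minio_client = Minio("minio-service.kubeflow:9000",
                     access_key="minio", secret_key="minio123", secure=False)

for root, _, files in os.walk(export_dir):
    for f in files:
        local_path = os.path.join(root, f)
        # keep the folder structure, e.g. models/detect-digits/1/variables/...
        object_name = os.path.join("models/detect-digits/1",
                                   os.path.relpath(local_path, export_dir))
        minio_client.fput_object("mlpipeline", object_name, local_path)
```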
Once that's done, we move on to model serving, where we create the KServe instance. What's this about? As mentioned, we first need to install the kserve package in this container; then we do the imports, we get the default target namespace — that comes from the Kubernetes client — and then we create an inference service: with the KServe Python library we create this specific inference service. You can see the URI here that tells it where the model is located, and the model is located via s3:// — the S3 storage API you have in AWS is compatible with MinIO in this module — and it points to the mlpipeline bucket, models/detect-digits. Don't put the /1 in there: here it's correct not to version it, because the server will grab the latest version by itself. Then you say: I would like to create this service; you connect the client, of course, and create the service. For you to try out whether everything works — because it can be tricky with KServe — I also put the YAML file for the inference service into the kubeflow_configs folder. So what you can do, theoretically (or practically), is copy-paste it in the UI: if you go to Models, these are all the model servers you have, and when you create a new one you just paste the YAML and the model server gets created.

In this Kubeflow example there is one more thing related to Istio: KServe uses Knative and Istio, and since I set everything up from scratch and as fast as possible, in order to get access to the inference service from inside the cluster I set the Istio sidecar injection to false, so it's accessible from inside the cluster — accessing it from outside the cluster is a different story. You definitely need to check what your cluster looks like, what your URL is, and how your lab environment is set up; for now I think it's a bit tricky to access the inference servers from the outside, especially with a quick-and-dirty demo like this, but you can for sure set it up properly in a lab, and I think they're even working on a less locked-down option. As for the spec of the predictor: you have the service account sa-minio-kserve there, so that KServe actually has access to the MinIO instance, and the storage URI where the model lives. The service from this run isn't there yet — it may be a bit slow — so I'll just open the digit recognizer service that's already there, and you can see the external URL and the internal URL; again, it really depends on your setup how you can access it. And this is the interesting part: this is the URL of the service where you send the REST POST request for predictions.
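A minimal sketch of that SDK call; the service name, namespace, service account and storage URI follow the walkthrough and should be treated as assumptions:

```python
from kubernetes import client
from kserve import (KServeClient, constants, V1beta1InferenceService,
                    V1beta1InferenceServiceSpec, V1beta1PredictorSpec,
                    V1beta1TFServingSpec)

namespace = "kubeflow-user-example-com"   # the default target namespace in this setup

isvc = V1beta1InferenceService(
    api_version=constants.KSERVE_V1BETA1,
    kind=constants.KSERVE_KIND,
    metadata=client.V1ObjectMeta(
        name="digits-recognizer",
        namespace=namespace,
        annotations={"sidecar.istio.io/inject": "false"},  # reachable from inside the cluster
    ),
    spec=V1beta1InferenceServiceSpec(
        predictor=V1beta1PredictorSpec(
            service_account_name="sa-minio-kserve",          # has the MinIO secret attached
            tensorflow=V1beta1TFServingSpec(
                storage_uri="s3://mlpipeline/models/detect-digits"  # no /1: latest version
            ),
        )
    ),
)

KServeClient().create(isvc)
```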
The path is /v1/models/digits-recognizer:predict — that's where you send what you want predicted. And the service is actually up: you can see the CPU limits, that it's the tensorflow predictor, there are some logs, and here is the YAML file for it — the digit recognizer is up and running, so if you send anything there for prediction it should work. So there are two options now, and with this we're already at the last part, part seven: test the model inference. The simplest way is of course to use the Python script, which is directly in the GitHub folder, or you can do it with the web app. I created a simple digit recognizer web application — you can see it up there — where you draw a digit with your mouse. It's just a simple index.html: a canvas plus some JavaScript code where the user draws with the mouse, and then with the TensorFlow.js library I reshape the image from the canvas to 28x28x1, as the model input expects, send it with fetch to the predictor — localhost, or wherever your endpoint is — and get the example output back. Like I said, feel free to adapt it however you like: the output still needs to be formatted — right now the log basically just says "this is the data", and I didn't add any specific output — so how you want to display the result in the web application is really up to you.

With that said, let's go the easier and faster way, back in JupyterLab: there is a KServe Python test IPython notebook. In it I import the kserve client and so on, and the actual label of this sample array is five — note that this is not the scaled-down data; the values here go from 0 to 255. I just took a random sample, print the number out so you can see the five, and put it into an array. Then, with the KServe client, I say: give me this inference service, and give me its address, the URL; then I create and reshape the tensor; and then the inference input has to be a list, so you put the tensor under "instances" — this is also a bit tricky, how exactly you have to send it to the service — and you have to call the tolist() method on the reshaped tensor. Finally, you can actually send the request, you get the response back, and you take the argmax of the predictions.
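A rough sketch of that request, assuming the in-cluster URL of the inference service and a placeholder input array:

```python
import numpy as np
import requests

sample = np.zeros((28, 28), dtype=np.float32)   # placeholder; real pixel values are 0-255
payload = {"instances": sample.reshape(1, 28, 28, 1).tolist()}

url = ("http://digits-recognizer.kubeflow-user-example-com.svc.cluster.local"
       "/v1/models/digits-recognizer:predict")

resp = requests.post(url, json=payload)
predictions = resp.json()["predictions"]        # one softmax vector per instance
print("predicted digit:", int(np.argmax(predictions[0])))
```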
If we run this now, it actually works: you have the number five, and the predicted digit is five as well. One thing I maybe want to show you: you can see that I already wrapped the predictions in an argmax, so let me show you what you actually receive back. This is what we defined: in the softmax layer we have an output of 10 options, 10 classes from zero to nine, and what comes back is the score for each of them — this is the value for zero, this for one, two, three, four, five — and on the fifth position (counting from zero) you find the predicted number. That's what argmax does: it tells you at which position the largest — the maximum — value of all of them sits, and since everything else is basically zero, the largest value is the one at position five, and that's why it's predicted as a 5. All right — like I said, you can also try it with the web application, but for now, with the Python test, I think we've shown that it works. And that's it. I hope you enjoyed it, and I hope you could see how everything works: how Kubeflow works, how the pipelines work, and of course the model serving. I wish you all the best — see you soon, bye!