Welcome to the Inside Builder channel, large language model automators, Python experts, and my dear friends. Today we are mastering Hugging Face model evaluation: a detailed walkthrough of the measurement, metric and comparison processes you need when you are building ML pipelines.

Why are we discussing this evaluation process at all? For that we first need to understand the steps required for training an NLP model. If you have watched my earlier videos in this playlist, where I discuss the various tasks and the models available as open-source models on Hugging Face, you will already have some idea of why you need to evaluate a model. If you come from an artificial intelligence or machine learning background, you must already have come across the concept of checking accuracy and precision in the case of classification, or checking how well the model fits real-world data in the case of regression. This way of testing the model is called evaluation, and in Hugging Face there is an entire library designed for it. The library, and the functions and methods inside it, are designed in such a way that these processes can be automated; that is the main intention of the evaluate library.

The most important point is, as you can see here in the steps you go through for NLP model training: you start with the data, you load the pre-trained model, and you instantiate the trainer for training. But when you instantiate it for training, you need something to calculate the metrics, because the machine learning model has to learn, and it has to learn with respect to the ground truth, correct? For that process of learning, the model has to continuously calculate metrics, and that is where the evaluate library comes into the picture. Once you create the trainer, you include the metric as well; the metric instance sits inside the trainer object. I will be going into the trainer object in the next video.

If you see, let me go back to the browser for a moment, the discussion flow that I am taking right now: I have already introduced Hugging Face datasets, and I will be linking that video along with this discussion. Now I am discussing the evaluation process, so I have actually skipped the training part. You might be wondering why I skipped fine-tuning and training: because once you know the dataset and once you know the metric, the training part actually has a minimal amount of new detail. In fact, you will be using a pre-trained model that is already trained by someone else, you add your own dataset, and you decide what kind of metrics to use; the model itself can still remain a black box. If you are thinking you need to build the entire model from the ground up, even then Hugging Face provides various methods in the Transformers library, which we will discuss in detail in the next video, where you can use the knowledge created by many top researchers and scientists at the click of a mouse. You don't even need a click; all you need to do is program it, I mean use the Python primitives, to work with those models. You will still need to learn a bit about deep learning and neural network architectures before you can really dabble at the level of model architecture and fine-tuning, but for starters, the path I am introducing here will be more than enough.
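To make the earlier point concrete, that the metric instance sits inside the trainer object, here is a minimal sketch of that flow. It is an illustration only; the checkpoint name, the dataset and the training settings here are assumptions I am making for the example, not the exact code from the upcoming trainer video.

    # a minimal sketch; checkpoint, dataset and settings are illustrative assumptions
    import evaluate
    import numpy as np
    from datasets import load_dataset
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)

    dataset = load_dataset("imdb")                         # 1) start with the data
    tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "distilbert-base-uncased", num_labels=2)           # 2) load the pre-trained model

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True)

    tokenized = dataset.map(tokenize, batched=True)

    accuracy = evaluate.load("accuracy")                   # 3) metric from the evaluate library

    def compute_metrics(eval_pred):
        logits, labels = eval_pred
        predictions = np.argmax(logits, axis=-1)
        return accuracy.compute(predictions=predictions, references=labels)

    trainer = Trainer(                                     # 4) the metric travels inside the trainer
        model=model,
        args=TrainingArguments(output_dir="out"),
        train_dataset=tokenized["train"].shuffle(seed=42).select(range(2000)),
        eval_dataset=tokenized["test"].shuffle(seed=42).select(range(500)),
        tokenizer=tokenizer,
        compute_metrics=compute_metrics,
    )
    # trainer.train() fine-tunes; trainer.evaluate() runs compute_metrics on the eval set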
The most important point is for you to understand that there are three different areas: one is the dataset that we use for training the model; then there are the metrics, the evaluation process we need in order to actually train the model; and then there is the training loop itself. The training loop is taken care of by libraries like PyTorch, TensorFlow or JAX, and these libraries are interfaced with the Hugging Face Transformers library, so most of the details are abstracted away from us. Even after that abstraction we still need to learn a lot of things, and that is the reason I am following this slightly different path. When I started learning, I faced a challenge: if I went directly into learning about training, I needed to learn about metrics and understand datasets; if I went to learn about datasets, the concept of metrics came into the picture again. So I decided: first learn datasets, then learn metrics, and then go to training. In this way I felt I could get a good overview, and that is exactly the information I am sharing with you.

I hope you like this content; do leave a like and share it with others, because this is a new technology that keeps coming up. The datasets on Hugging Face are abundant: there are 33,000-plus datasets, spanning multiple languages, multiple media, multiple domains and even multiple applications of research and development. The Hugging Face Hub is one of the center points of artificial intelligence development, so do learn about Hugging Face, get an account there and start working with it. I have a series of videos on Hugging Face and the various open-source models around large language models and AI models. With that said, let me go back to the presentation; do subscribe to my channel for further updates on similar videos.

Now let me go back to the start. What kind of challenges are solved by the evaluate library? The library itself is called evaluate, and it starts with the objective of evaluating the machine learning pipeline as a whole: from the dataset, to making the model learn, to comparing the learned model with other models. All three steps are taken care of by the evaluate library. Metric is the module that helps the machine learning model to learn. Comparison is the module that helps to compare two different models, because, as you know, depending on the dataset and on how you have architected the model, the output or the performance of the model can be different. Finally we have measurement: measurement helps to investigate the properties of the dataset, because the dataset you are using also needs to be appropriate for the particular task you are working on.
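Before moving on, here is a quick illustration of the comparison module I just described: a small sketch using the McNemar comparison to compare the predictions of two hypothetical models against the same ground truth. The prediction lists are made up for illustration, and the exact output keys may vary slightly with the library version.

    # a minimal sketch; the two prediction lists stand in for two different models
    import evaluate

    mcnemar = evaluate.load("mcnemar")   # a comparison: two models against one ground truth

    references   = [1, 0, 1, 1, 0, 1]    # ground-truth labels
    predictions1 = [1, 0, 1, 0, 0, 1]    # output of model A
    predictions2 = [1, 1, 1, 1, 0, 0]    # output of model B

    results = mcnemar.compute(predictions1=predictions1,
                              predictions2=predictions2,
                              references=references)
    print(results)   # McNemar statistic and p-value for the pair of models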
The next challenge that evaluate solves is that each module inside the library comes with its own description and features. If you are new to the machine learning or artificial intelligence domain, you must already be overwhelmed by the number of metrics, the number of ways to check a model, and the definitions that come along with each of these metrics and comparison parameters. What happens with Hugging Face is that all of this is, in a sense, offloaded to the library itself. Yes, you still need to know what precision is, what an F1 score is, what accuracy is, and so on; I am not saying you shouldn't know them. What I am saying is that whenever you want to refer to any of these descriptions or definitions, all you need to do is look at the features and description of a particular metric or comparator, and you can get the details from the evaluate library itself. I will show you examples in a couple of minutes.

Computing the metrics can be done either one by one or batch-wise, so there are various ways of computing them, and you don't need to do it manually. I will show you how the metrics are computed as a standalone process, but when it comes to training the machine learning model, the metric instance will be passed along with the trainer object, so you will not be doing it by hand. The trainer module, which I will cover in a separate video, takes care of all these processes. However, you still need to understand how the metrics are computed, so I will show you a couple of examples and you will understand it in a matter of minutes.

Finally, different tasks can need very different evaluations. For example, text classification will need accuracy, an F1 score and a precision score, while a vision segmentation model might require a completely different metric altogether, and sometimes you might have to combine multiple metrics if you are using multimodal models. So what is multimodal? A mode, in artificial intelligence, is the text mode, the vision mode (that is, the image mode), the audio mode or the video mode; these are the various modalities in which an AI model can be trained and used. And finally, on top of metric, comparison and measurement there is another module called evaluator that automates the model evaluation; we will take a look at that as well. These are the challenges that the evaluate library solves. Right, let us now move further.

So what are metrics and measurements? For any improvement you want to make in your life, you need a yardstick, something to measure with; only then can you improve, right? In the same way, a machine learning model needs to know where it currently stands: whether it can predict something, whether it can classify a particular text, whether it can generate text in a particular format. It has to know where it stands, and for that we use metrics. There are generic metrics that can be applied in different situations, like accuracy and precision, which work across multiple tasks. There are task-specific metrics, like seqeval, which you can only use for NER, that is, named entity recognition. And there are dataset-specific metrics like BLEU. I will introduce a couple of those as well, and I will show you how to get the details of these metrics.
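As a small illustration of task-specific versus more general metrics, here is a sketch loading seqeval for NER-style tag sequences and BLEU for generated text. The tags and sentences are invented for illustration, and seqeval additionally needs the seqeval Python package installed in the runtime.

    # a minimal sketch; the example tags and sentences are made up
    import evaluate

    # task-specific: seqeval only makes sense for NER-style tag sequences
    seqeval = evaluate.load("seqeval")          # may require: pip install seqeval
    references  = [["O", "B-PER", "I-PER", "O"]]
    predictions = [["O", "B-PER", "O", "O"]]
    print(seqeval.compute(predictions=predictions, references=references))

    # BLEU compares generated text against one or more reference texts
    bleu = evaluate.load("bleu")
    print(bleu.compute(predictions=["the cat sat on the mat"],
                       references=[["the cat is sitting on the mat"]]))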
All of these things are inside the evaluate library, so you don't need to go searching for them, but you do have to know how to access them, and that is the primary intention of this particular discussion. Once you are comfortable with that, using it in your product or your project becomes much simpler.

How do you find the metrics for a particular task? There are a couple of ways; you can use the task pages on the Hugging Face Hub. Let me go to the browser for a moment and show you what I mean. Here is the quick tour of the evaluate library, which you can take a look at, and if you go to the Hub you will see something called Tasks; click on Tasks and you will see the various tasks organized by modality. Since we are on the Hugging Face Hub, I really urge you to sign up for a Hugging Face account so that you get a better idea of what is going on. The details related to the tasks and the models were already discussed in two of my earlier videos; take a look at those and you will get a deeper idea. Now let us go back to the presentation.

The point I was trying to make is that you need to find the metrics you can use for your particular task. You can use the leaderboards, you can use the task pages, and the datasets themselves often provide the necessary information. Apart from that, you can create custom metrics as well. At this moment I am just introducing what is possible and what kind of automation can be enabled: with the evaluator module nine tasks can be automated, and the evaluate library also has methods to visualize the results, like creating radar plots and working with bar graphs. The point I wanted to make in this slide is that you have to know how to find the metrics; this topic is really vast, and it is not something I am going to cover exhaustively.

You might be wondering, "Kamal, why do it this way?" Because there is a huge amount of information out there on machine learning and artificial intelligence processes, and you must already have come across various metrics and the various steps you need to follow in order to create a model. What I am trying to show with the evaluate library is that you can make your life easier by using the metrics, measurements and comparators available on the Hugging Face Hub and avoid writing your own scripts for doing the measurements. I hope you understand the reason behind introducing evaluate this way: you don't need to implement the accuracy calculation as a separate Python function and then wire up the model outputs and inputs yourself. All of that is taken care of; the only thing you need to do is initiate the particular instance, and the calculation happens automatically. I will show you how this works, and then you will understand why the introduction is happening in this particular way.
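Going back to the visualization helpers I mentioned a moment ago, here is a small sketch of the radar plot utility. The metric values and model names are invented for illustration, the helper assumes matplotlib and pandas are installed, and you should treat this as a sketch of how it is typically used rather than the definitive API.

    # a minimal sketch; the scores below are made-up results for two hypothetical models
    from evaluate.visualization import radar_plot   # needs matplotlib and pandas installed

    data = [
        {"accuracy": 0.92, "f1": 0.90, "precision": 0.88, "recall": 0.93},   # model A
        {"accuracy": 0.88, "f1": 0.91, "precision": 0.94, "recall": 0.87},   # model B
    ]
    plot = radar_plot(data=data, model_names=["model A", "model B"])
    plot.show()   # one radar chart overlaying both models, one axis per metric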
As I was saying, when it comes to metrics there are almost 53 different metrics; the ones I am showing here are just examples. There are correlation metrics of different types, there is sequence evaluation (seqeval), and there is METEOR. We have comparison functions like McNemar; these are comparisons between models, not just between two parameters, and they can take into account how big the models are, how fast they provide inference and how accurate they are. Finally we have measurements, which measure the dataset itself: label distribution, word count, word length, perplexity, honesty, toxicity and so on.

Most of the time datasets are curated, or generated through various other means, and these datasets might have certain biases or some level of toxicity. Toxicity here refers to the kind of words and the kind of concepts that are discussed inside the corpus; a corpus is simply a set of text data. All of these things can be measured, and this is the key point I wanted to discuss with you. If you had to measure perplexity or toxicity on your own, you would have to think about the various algorithms you need for doing that measurement, right? But all of this has been abstracted away from us and automated through the Hugging Face evaluate library. That is the key takeaway I want you to carry from this discussion.

Many of us never foray into natural language processing or artificial intelligence precisely because of this barrier: the feeling that you have to do everything yourself, that you have to write the algorithms and implement every machine learning concept. That is actually not the case. Once you understand what is available in the open-source libraries, on the Hugging Face Hub, on GitHub and so on, you can use that both for learning and for implementing in your own projects. Yes, you have to learn about the licenses attached to the datasets, but most of these functions and algorithms are open source, so you can even inspect them before you implement anything. Most of what is available on Hugging Face consists of open-source functions, open-source algorithms and open-source models; in fact, the word count, word length and tokenizing algorithms are explained in detail in the Hugging Face documentation, and if you take a look there you will find them. The level of openness in Hugging Face is pretty good, and we have to learn to use what is available to us; that is the point I keep reiterating here.
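To make the measurement side concrete before we move on, here is a small sketch running a few of the dataset measurements mentioned above on toy texts and labels. The texts and labels are invented for illustration, and the toxicity measurement downloads a classifier model behind the scenes, so the first call takes a while.

    # a minimal sketch; the texts and labels below are toy data for illustration
    import evaluate

    texts  = ["the movie was wonderful", "what a boring, terrible film"]
    labels = [1, 0, 1, 1, 0]

    word_length = evaluate.load("word_length", module_type="measurement")
    print(word_length.compute(data=texts))               # average word length in the corpus

    label_distribution = evaluate.load("label_distribution", module_type="measurement")
    print(label_distribution.compute(data=labels))        # how skewed the labels are

    toxicity = evaluate.load("toxicity", module_type="measurement")
    print(toxicity.compute(predictions=texts, aggregation="ratio"))  # share of toxic texts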
Now let us move forward. As in all my videos, I keep talking about practicing, and I hope you like the way this particular discussion is going; if you have any feedback, do leave a comment so that I can improve. Do subscribe to my channel for further updates similar to this Hugging Face discussion; I will also be posting more videos related to the libraries surrounding large language models and AI models, so stay tuned for those as well.

With that said, it is now time for us to understand the evaluate library by actually getting our hands dirty. How are we going to do that? We are going to go to the Colab notebook. For that you will need the notebook itself, the Jupyter notebook; it is shared in the GitHub repo, and that link will be in the YouTube description below, so take a look at that. Let me go to the browser for a moment: yes, this is the notebook that is already hosted inside the GitHub repo, so you can take a look at it. I am closing these tabs so that I get some bandwidth back.

When you start the Colab notebook, you will have to pip install three libraries: transformers, datasets and evaluate. The reason for installing datasets and transformers you will learn at the end of this particular notebook, so stay tuned for that. It is a straightforward installation process: just import the evaluate library, and after that, first and foremost, use the load method. Whenever you want any metric, comparison function or measurement function, all you need to use is load. You might be wondering: how will Hugging Face know whether I am trying to load a metric or a measurement if there is a name clash? If that is the case, you can add the module type. In this example I am saying METEOR is of the metric module type, so I am giving an additional parameter here to say that I need it from the metric module. You will see that this loading process, whether it is loading accuracy or loading METEOR, pulls the build script for that particular metric or function. Once you have done this, the resulting Python instance, say accuracy, has the necessary functions and methods for doing the measurements; you will see how to work with it in a couple of minutes.

Here I am showing a couple of examples of loading various functions: I am loading the METEOR metric, I am loading the comparison function exact_match, and I am loading the measurement function label_distribution, and so on. These are just a few examples; you can try other options as well. Let me go back to the PDF for a moment: I would request that you practice with various other metrics, comparators and measurements here and see how the Google Colab notebook responds. The most important point, as you can see, is that when I imported METEOR, an additional dependency, the NLTK data, got installed. Likewise, when you pull other comparators or metrics, you will have additional activity going on, and you will have to learn about that as well; it doesn't stop there.
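Here is roughly what those first notebook cells look like; a sketch under the assumption that you are in a fresh Colab runtime, so the install line and the exact functions loaded may differ slightly from the shared notebook.

    # a minimal sketch of the first cells; run the install line in a notebook cell first:
    #   !pip install transformers datasets evaluate
    import evaluate

    # plain load: downloads the build script for the metric from the Hub
    accuracy = evaluate.load("accuracy")

    # disambiguate with module_type when a name could clash across modules
    meteor = evaluate.load("meteor", module_type="metric")        # also pulls NLTK data
    exact_match = evaluate.load("exact_match", module_type="comparison")
    label_distribution = evaluate.load("label_distribution", module_type="measurement")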
So let me go forward. If you want to check the list of functions, this part is a little bit confusing on the Hugging Face side, because there are lots of interrelated names here. When they say the list of evaluation modules, they are talking about the list of evaluation metrics, or functions, available inside a particular module. When I say module, I mean measurement is a module, comparison is a module and metric is a module; and when I say function, accuracy is a function, METEOR is a function, exact_match is a function. With this split it is easier for us to understand. Hugging Face has its own reasons for naming these as modules too, but for me, calling them functions is more logical, because they behave like functions: you give some input, they return some output. Basically, when you use evaluate.list_evaluation_modules and give the module type, you get the list of functions available under that module. The metric list has 53 entries, as I was explaining earlier; you can check that in the browser, and I have checked it in the Jupyter notebook as well. You can see I have done some coding for this: I have used list_evaluation_modules for comparison, for metric and also for measurement, and you can see the number of different functions by using len in Python. These are all basic Python functions; there is nothing new here.

You can also see the various attributes, and this is the point I was making about how the evaluate functions make your life easier. All you need to do is initiate the Python variable, the instance, say accuracy, and then you can use the description, the features, the citation and so on to get the necessary information. For example, if I call accuracy.description, which is an attribute, I get the definition as well as the formula; and if I look at accuracy.features, I can see how the inputs and outputs are laid out, namely predictions and references. All of this is built in, so you don't need to do anything extra; and when you build your own metrics, make sure you provide all of these as well, because it is important. Let me zoom in a bit; I hope it is readable. In any case, the notebook itself is available in the GitHub repo, so you can download it and refer to it as the video goes on. I seriously urge you to practice in parallel; once you do, you will get up to speed really fast, and once you have gained some confidence you can start using it for your own application. That is the main intention of these kinds of introduction videos.

Now that we have seen how the instantiation of the metrics and comparators is done, next we have to compute. When it comes to computing a metric, there are three different ways: you can use the compute method directly, you can use add and then compute, or you can use add_batch and then compute. With compute, you give all the values at once; you can see that I am passing every value I have and computing in one go. There is no batching here; even if the list has 1,000 or 10,000 entries, I push everything into the compute method, run accuracy.compute, and get the value out. If I want to do it one by one, I can do that too; this is just one example implementation, and you can do it other ways as well. I simply add the prediction, add the reference and then compute. Understand that the references are the ground truth, and the predictions are what your model produces. In real usage, when you train with these metrics, you will pass the accuracy or METEOR object directly into the trainer; I will show that in a couple of minutes, and then you will not do the computation manually; the computation will be done by the trainer object itself.
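Here is a small sketch of those attribute look-ups and the three ways of computing, using the accuracy metric with toy values; the predictions and references are made up for illustration.

    # a minimal sketch; the predictions and references are toy values
    import evaluate

    print(evaluate.list_evaluation_modules(module_type="metric")[:5])   # a peek at the metric list

    accuracy = evaluate.load("accuracy")
    print(accuracy.description)   # definition and formula
    print(accuracy.features)      # expected inputs: predictions and references

    references  = [0, 1, 1, 0]    # ground truth
    predictions = [0, 1, 0, 0]    # model output

    # 1) everything in one call
    print(accuracy.compute(predictions=predictions, references=references))

    # 2) one example at a time, then compute
    for pred, ref in zip(predictions, references):
        accuracy.add(prediction=pred, reference=ref)
    print(accuracy.compute())

    # 3) batch by batch, then compute
    for batch_preds, batch_refs in [([0, 1], [0, 1]), ([0, 0], [1, 0])]:
        accuracy.add_batch(predictions=batch_preds, references=batch_refs)
    print(accuracy.compute())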
I am showing you these things so that you can get comfortable with the process that is followed inside Hugging Face. Then we have the batch process, where you send the values in batches and then compute; all of this works in pretty much the same way.

The next step, though, where you can combine multiple metrics like accuracy, F1, precision and recall, is where the Hugging Face evaluate library really shines. If you had to do all of this manually, it would take a lot of code, believe me, and it would require you to write a lot of functions, call them, attach them to various harnesses and then send everything into your machine learning pipeline. It becomes a mess, to be honest; I have tried it, and a lot of us actually give up at that point, because it is extremely challenging to keep track of what is going on between multiple functions and objects, and machine learning is all about functions, objects and lots of instances. The Hugging Face team, the evaluate team and the Transformers library team have done a great job of abstracting a lot of this away from us, so do make use of it. As I was explaining, all you need to do is call combine, and then you can use the combined metric as-is in the computation. You can see the output: accuracy, F1, precision and recall all come out of a single compute call. This is the benefit of the automation I was talking about, and this is not even the full automation; what comes next is the peak of it.

Now, in Transformers, and I hope you have seen the earlier videos (if not, do take a look at the playlist I shared on my YouTube channel, where I discuss the tasks and the respective models and the Transformers pipeline, as well as the datasets library video that will be linked here; take a look at those two if you are unable to follow this part), what happens in this particular cell is that there is something called the evaluator class. This evaluator class can take three things: the model, the data and the metric, and it does the entire evaluation for you automatically, in one go. This is the level of automation that Hugging Face has built. You will see that I am loading a particular model and then passing the IMDb dataset; if you are new to loading datasets, take a look at the attached video, it will help you with that. I am only loading the test split, and then all I need to do is call the evaluator class on the text-classification task. Currently the evaluator supports only nine tasks, mainly NLP tasks and a couple of image and audio recognition tasks; you can take a look at the documentation for the list. Once you run it, you get results like these: it shows not just the metric but also the total time taken, the samples per second, the latency, and a lot of other information, all provided by this task evaluator.
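Here is a sketch of those last two steps: combining several metrics and then running the task evaluator end to end. The model checkpoint and the 200-example slice are assumptions I am making to keep the example small, not necessarily what the shared notebook uses.

    # a minimal sketch; the model checkpoint and the small test slice are illustrative choices
    import evaluate
    from evaluate import evaluator
    from datasets import load_dataset

    # combine several classification metrics into a single object
    clf_metrics = evaluate.combine(["accuracy", "f1", "precision", "recall"])
    print(clf_metrics.compute(predictions=[0, 1, 0], references=[0, 1, 1]))
    # accuracy, f1, precision and recall from one compute() call

    # the task evaluator ties model + data + metric together
    task_evaluator = evaluator("text-classification")
    data = load_dataset("imdb", split="test").shuffle(seed=42).select(range(200))
    results = task_evaluator.compute(
        model_or_pipeline="lvwerra/distilbert-imdb",     # any IMDb-tuned classifier works here
        data=data,
        metric="accuracy",
        label_mapping={"NEGATIVE": 0, "POSITIVE": 1},    # map model labels to dataset labels
    )
    print(results)   # accuracy plus total time, samples per second and latency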
Let us go back to the presentation. By now we have seen the entire process: how evaluate solves the various challenges we face when measuring machine learning models, and how those measurements are used for training them. In the next video I will be introducing the trainer module, only the trainer module, and it will be pretty simple, because the datasets module has already been discussed and the evaluate library has been discussed in this video. The trainer module is going to be fairly straightforward, because as a beginner there is not a lot you can tweak in the trainer: you might adjust the learning rate, you might change the number of epochs, basic things like that. As for the overall architecture of a particular model, the number of neurons, the type of neurons, the type of activation functions, many of you may not even be aware of what those things are, what an activation function is or what a neuron is, to be honest. Yes, you have to learn these things eventually, but if you are going to practice NLP, you first need to learn their significance; you don't need to learn how to implement the neurons, and you don't need to learn how to implement the metrics. You need to know what these things are doing and, as I always say, what kind of challenges they are solving. Once you know that, implementation comes next. There are lots of smart people, smart programmers and open-source contributors who want to help us; that is why they created these libraries, so why would they want you to suffer again by implementing the neurons yourself? There are also excellent videos by Mr. Andrej Karpathy, who explains how to implement a neural network from scratch, with each and every line of code explained; people like that exist too, and I am not denying that you should learn that material. What I am saying is: to come up to speed, understand the basics, understand how to come up with a product or an idea, try to implement it as soon as possible, and then move forward. As you go forward you will find some additional time; you can learn these concepts then and improve yourself. That way you will keep a good momentum of motivation, and along with the motivation you will make a good deal of progress too.

With all of that said, I hope that you liked this and that you understood the evaluate library: how it is useful, what metrics are available, what comparators and what measurements are available. I think you have an overview of all of this now, so do practice, do leave a like for this particular video and share it with others, and subscribe to my channel for further updates. As I said, I will be working on the trainer module from the Transformers library next, so do subscribe, and later I will be discussing various other libraries like Haystack, and also more on vector stores such as Weaviate; I am planning such videos as well. With that said, I would like to come to the conclusion of this particular video with three words: practice, practice, practice. See you, and have a great time.