Transcript for:
Emergence of Multi-Agent AI Innovations

Hello community. I don't know if you noticed, but a new class of AI has emerged: now robots do stuff with AI, and language is just an input. Look at those two robots: we now have multiple agents that perform together. And you might ask: interesting, but how can you train an AI complex with multiple agents that includes the robot task?

Well, you know, at the beginning we had our large language model, based on an autoregressive Transformer, and we had three typical phases of training: the pre-training, the fine-tuning (here the supervised fine-tuning), and then an alignment phase with reinforcement learning from human feedback. Later, when we had the vision language models, we switched to diffusion models with more advanced training techniques, and after the diffusion model we had the diffusion Transformer. I have some videos on this if you are interested, because we also moved on to video language models, and you know Sora from OpenAI: about five months ago I did a video where I showed you, hey, we can update the technology of Sora to make it even more competitive.

But time moved on, and in research we are now talking about vision language action models, and the latest research shows us that we move back again to diffusion models as our main architecture. In the pre-training phase we now have imitation learning, with behavior cloning as the main pre-training instrument, and for the fine-tuning phase we now have reinforcement learning based fine-tuning. And you might ask why. Because the complexity is extreme: you have vision with multiple cameras, you have your language model, and you have active interaction with the environment through the robotics. And if you think about one robotic arm, this is nice, but think about multiple.

So this imitation learning is the very first attempt from the research community, where you say: hey, this is a type of learning where an agent learns to perform a task by mimicking an expert's behavior. A human now guides the robotic arm in a first demonstration to show it exactly what to do, and this coordinate path is recorded (see the minimal code sketch below). So instead of learning from scratch, as we do with our large language models, where Google pre-trains for a month and the latest Llama 3 was pre-trained for multiple months, half a year, our agents, our vision language action agents, now learn from real demonstrations provided by an expert. And the expert can be a human who shows the robotic structure what to do, or a virtual AI, or a huge vision language action model that can do the complete environment simulation and the robotic simulation in one run.

So you understand, in the research community we are not talking anymore about Claude 3.5 Sonnet or whether there will be, I don't know, a GPT-5; we are much further advanced in research. And if you want to have a look, I have a video where I explain the reinforcement learning based fine-tuning as published by Princeton and UC Berkeley; you see, we also get into some mathematical concepts there. I also showed you that, on top of the latest behavioral cloning, we now have fast residual reinforcement learning, and these algorithms were developed by MIT and Harvard to address a different aspect of training robots for real precise manipulation tasks. So we are here at an AI assembly task in a non-controlled environment where the robot should perform. I told you we have two obstacles to tackle, and the first one is this precision that is not there.
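To make the behavior cloning idea above concrete, here is a minimal, purely illustrative sketch in Python/PyTorch: the recorded expert demonstrations become observation-action pairs, and the policy is trained by plain supervised regression on the expert actions. All the names and dimensions (OBS_DIM, ACT_DIM, the small MLP, the random stand-in data) are my own assumptions for illustration, not code from the papers mentioned in the video.

```python
# Minimal behavior-cloning sketch (illustrative only, not from the cited papers).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

OBS_DIM, ACT_DIM = 64, 7   # hypothetical: fused camera/proprioception features, 7-DoF arm command

# Stand-in for a recorded demonstration buffer of (observation, expert action) pairs.
obs = torch.randn(1000, OBS_DIM)
expert_actions = torch.randn(1000, ACT_DIM)
loader = DataLoader(TensorDataset(obs, expert_actions), batch_size=64, shuffle=True)

# Small MLP policy; in the setting above this would be a large diffusion / VLA model.
policy = nn.Sequential(
    nn.Linear(OBS_DIM, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, ACT_DIM),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

for epoch in range(10):
    for o, a_expert in loader:
        a_pred = policy(o)                               # imitate: predict the expert action
        loss = nn.functional.mse_loss(a_pred, a_expert)  # behavior cloning = supervised regression
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

The point, as said above, is that this replaces learning from scratch: the policy starts from recorded expert trajectories instead of months of pre-training.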
In this video I also showed you how, with a residual reinforcement learning based fine-tuning on top of a frozen BC model, they could establish the precision and accuracy that we need for very delicate operations (see the residual-policy sketch below). Plus, in my very last video I showed you the second problem that we have in imitation learning, the domain shift problem; there is now a solution with a particular predicted-reward fine-tuning methodology that I introduced in that video.

Now you understand that the basis we are working with here is Markov decision processes as our main instrument, and of course we are working with entropy-regularized Markov decision processes. I went through what the definition of an MDP is, how we have to regularize an MDP, what our equation is, what the reference policy is that we want to optimize, and then we said, okay, this is the crucial question in reinforcement learning: how to devise algorithms to solve these optimization problems, because those algorithms are later used as the fine-tuning algorithms of our new diffusion models. We went to the soft Q function, and we touched on the new toolbox that we now have in AI robotics (the standard equations are written out below).

But you know, this is all nice and beautiful, but the real challenge is when we have a multi-agent system, like in the first image of this video. If you have four robotic arms for a medical operation, you have a multi-agent system, and those agents have to be coordinated. And I know what you would like to say: yeah, but we have LangGraph or LangChain or LangSQL or Lang-whatever; or, if you want to go into the trial-and-error automating phase: hey, a year ago, one and a half years ago, we had DSPy, and just months ago, I think Harvard did, no, Stanford did an optimization with the TextGrad methodology that is even better than DSPy. But you know what? No. We want to go a step further. In AI research we do not do trial and error, we apply science to the topic.

So we now have a multi-agent AI system where we have LLMs, VLMs, VLA systems, and they all work in our environment, and there is a particular task that no single LLM or VLM can do, but together, as a group of AI intelligences, they can perform the task. So now you understand that we are working here on a deeper level, with a Markov decision process for all of those agents, with all of their AI structure, LLM, VLM, VLA, whatever. And now the interesting part is: how do we do this, how do we train this?

You might say, well, the easiest approach is this: (a) you can see every other AI system in your given environment, and you, as an active agent, simply react to the environment; this is more or less the definition of an agent. So this is the simplest: you are waiting, looking around, and then, when it is time for you to act, you just react. Or (b) you, as an active agent, are allowed to communicate with the other AI agents, or maybe just the majority of those AI agents, to inform yourself about their next actions, so you can share information with the other AI systems in this defined environment for a particular task. Of course, you know we have multi-agent reinforcement learning, where multiple agents use reinforcement learning algorithms in a multi-agent setting, considering the actions and rewards of the other agents. But you know what is much nicer? Game theory. And yes, you guessed it, in my next video we will apply game theory. But first we have to understand the mathematics of game theory, because we are not just looking at a simple form of game theory.
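To illustrate the residual fine-tuning idea mentioned above (a trainable correction on top of a frozen behavior-cloned model), here is a hedged sketch of a generic residual policy in PyTorch. It is not the exact MIT/Harvard or Princeton/UC Berkeley algorithm, just the common residual-policy-learning pattern, with class and parameter names that are my own assumptions.

```python
# Residual fine-tuning sketch (illustrative): a frozen behavior-cloned policy
# plus a small, trainable residual correction head. Only the residual is updated
# by the downstream RL fine-tuning loss; the BC backbone stays frozen.
import torch
import torch.nn as nn

class ResidualPolicy(nn.Module):
    """Frozen BC policy plus a small trainable residual correction head."""

    def __init__(self, bc_policy: nn.Module, obs_dim: int, act_dim: int, scale: float = 0.1):
        super().__init__()
        self.bc_policy = bc_policy
        for p in self.bc_policy.parameters():   # freeze the pre-trained BC model
            p.requires_grad_(False)
        self.residual = nn.Sequential(          # small correction network, trained with RL
            nn.Linear(obs_dim, 128), nn.Tanh(),
            nn.Linear(128, act_dim),
        )
        self.scale = scale                      # keep corrections small around the BC action

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            base_action = self.bc_policy(obs)   # nominal action from imitation learning
        correction = self.scale * torch.tanh(self.residual(obs))
        return base_action + correction         # final action = BC proposal + learned residual
```

A reinforcement learning fine-tuning algorithm would then update only the residual head against the task reward, which is where the extra precision for delicate assembly comes from, while the frozen BC backbone keeps the behavior close to the demonstrations.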
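And since the entropy-regularized MDP, the reference policy, and the soft Q function are only mentioned verbally above, here is the standard textbook form of those equations, with $\pi_{\text{ref}}$ as the reference policy (for example the frozen BC or diffusion policy) and $\alpha$ as the regularization temperature; the notation in the cited papers may differ.

```latex
% Standard KL/entropy-regularized RL objective with a reference policy \pi_{\text{ref}}.
\[
J(\pi) \;=\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\Big(
    r(s_t, a_t) \;-\; \alpha\, D_{\mathrm{KL}}\!\big(\pi(\cdot \mid s_t)\,\|\,\pi_{\text{ref}}(\cdot \mid s_t)\big)
\Big)\right]
\]

% The associated soft Q-function and soft value function satisfy
\[
Q_{\text{soft}}(s,a) \;=\; r(s,a) \;+\; \gamma\, \mathbb{E}_{s' \sim P(\cdot \mid s,a)}\!\big[V_{\text{soft}}(s')\big],
\qquad
V_{\text{soft}}(s) \;=\; \alpha \log \sum_{a} \pi_{\text{ref}}(a \mid s)\, \exp\!\big(Q_{\text{soft}}(s,a)/\alpha\big),
\]

% and the optimal regularized policy is a softmax re-weighting of the reference policy:
\[
\pi^{*}(a \mid s) \;\propto\; \pi_{\text{ref}}(a \mid s)\, \exp\!\big(Q_{\text{soft}}(s,a)/\alpha\big).
\]
```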
We are looking at Markov games. This will be an extension of our entropy-regularized Markov decision process to a multi-agent scenario, where our multiple agents interact in the same environment at the same time, in parallel. It includes some elements of a classical Markov decision process, but with a joint action and a reward for each agent, and we will build a theory of how we can integrate Markov game theory into our reinforcement learning or imitation learning phase (the standard definition is sketched below).

Now, I think this is so important that, starting today, you will find a new playlist on my YouTube channel, and this playlist is called Imitation Learning of Agents, Advanced RL. And we do this for single-agent imitation learning and multi-agent imitation learning, because this is really a completely new branch of AI systems. We are not talking anymore about how do I fine-tune my 3.5 Sonnet, or what do I do with a Mistral Large 2 or I don't know what system. Now those things do stuff, and they are not just passive in the background, they are actively approaching you in your environment.

So if you are interested in this topic, my next video will be about the details of Markov game theory. It would be great to see you.
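As a preview of the Markov game formalism announced for the next video, here is the standard definition of an N-agent Markov (stochastic) game, i.e. the multi-agent extension of the MDP tuple discussed above; this is the common textbook form, not necessarily the exact notation the next video will use.

```latex
% Standard definition of an N-agent Markov (stochastic) game,
% i.e. the multi-agent extension of the MDP tuple.
\[
\mathcal{G} \;=\; \big(\mathcal{N},\; \mathcal{S},\; \{\mathcal{A}^{i}\}_{i \in \mathcal{N}},\; P,\; \{r^{i}\}_{i \in \mathcal{N}},\; \gamma\big),
\qquad \mathcal{N} = \{1, \dots, N\},
\]
\[
P : \mathcal{S} \times \mathcal{A}^{1} \times \cdots \times \mathcal{A}^{N} \rightarrow \Delta(\mathcal{S}),
\qquad
r^{i} : \mathcal{S} \times \mathcal{A}^{1} \times \cdots \times \mathcal{A}^{N} \rightarrow \mathbb{R}.
\]

% Each agent i maximizes its own (possibly entropy-regularized) discounted return,
% but transitions and rewards depend on the joint action of all agents:
\[
J^{i}(\pi^{1}, \dots, \pi^{N}) \;=\; \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r^{i}\big(s_t, a_t^{1}, \dots, a_t^{N}\big)\right].
\]
```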