Hello and welcome, everybody, and thank you for joining us for today's webinar. My name is Graham Steele, I'll be your host today, and I'm really excited for this webinar, which we're doing with Toolhouse, on overcoming the challenges of building agentic AI. Now, to help with today's conversation, I'm joined by Daniele Bernardi, Toolhouse CEO, as well as, now a regular on our program, Dan Lohman, a cloud engineer on our GroqCloud team. Daniele, Dan, thank you guys and welcome.
Thanks for having us. Thank you. Awesome. So agentic workflows are all the rage these days, and I know many members of our community are tuning in right now to hear how they can use Toolhouse and Groq to accelerate their development and deployment.
So I'm excited to dive in, but here's how today's going to work. I know you both have some prepared content and perhaps some demos to show the audience on using Toolhouse and Groq, which we'll dive into shortly. But we will also be taking questions that were submitted ahead of time by our community, as well as live from the audience. So for all of you tuning in today, if you have a question you want us to hit on during the conversation, relevant to today's topic, please submit it in the chat window, and my team working behind the scenes will pass those over to us so we can address them accordingly. But before we dive into those questions, let's dive into the content we prepared.
Daniele, would you like to go first? Yeah, absolutely. I'm going to share my screen, obviously. So bear with me while I do that.
There we go. I can see my face, so I hope you can see that too. Yes. Hi, everyone.
My name is Daniele. Thanks, Graham. Thanks, Dan. I'm the CEO of Toolhouse.
Today, I obviously want to talk about agents and overcoming the challenges associated with them. First of all, real quick: if you look up the word agentic in the dictionary, and this is taken from an actual dictionary, it's an individual's ability to control their goals, actions, and destiny. It comes from the word agency, which means the capacity to act or exert power.
In AI terms, what does it mean? It means AI that can perform actions, hopefully autonomously, with little to no help.
How would that work, really? What is an action? Well, think about the architecture of agents, and I'm going to stay very high level here, and then we'll go back into the details with Dan. Think of an agent as a very well-equipped LLM with a defined prompt that helps the LLM be very specific about the things it can do. When you prompt an agent this way, you basically equip the agent to come up with a plan. A plan for action, if you will.
This is actually a step-by-step set of things the LLM can do. So those turn into true, proper action steps the AI can take. This usually takes the form of two things: tools, more on that later, and memory, the ability to retrieve previous conversations, previous details the user has shared, or previous generations.
All that, obviously, becomes the result that the user gets to see. So we mentioned tools. What exactly are tools?
You might be familiar with this under the name tool use, or function calling. That's the ability to interact with external client-side code to perform a wider variety of tasks. Very generic.
What does it mean in practice? What are examples of tools? We know LLMs can notoriously generate email content, but they cannot send it. So you can build a tool to send the email the LLM generates, directly from the LLM itself.
Imagine you are chatting with the model, the model drafts the email, and then you can say, okay, send this email to such and such. That can be achieved by building and running an external tool.
Get the current time. LLMs notoriously cannot tell the time reliably; they will probably tell you that they don't have that capability. But you can feed a real-time clock into it, and the LLM will be aware of the current time. Remember something the user says, like we said, memory. This is the prime example of that.
Search the web: one of the most common scenarios, where you might want to retrieve fresh results from the internet. Get the contents of a page: LLMs are trained on data, but some of the data on the internet will be fresh, or new, or simply something the model wasn't aware of when its knowledge cutoff happened, so you might want to corroborate that knowledge with fresher, newer content. And then, obviously, run the code the LLM generates itself. This is similar to drafting an email and then sending it, but with extra steps: you need a sandbox environment where that code can run safely on an external surface. So when the LLM generates code, the code runs there and the result of the execution is fed back to the LLM.
So it's kind of complex if you think about it. So let's think of how to use those tools together. Suppose you had a prompt like this: create a recipe for my favorite pizza.
It's the pizza I eat in Italy, remember? When you're done, send it to me via email. So this implies that the LLM can corroborate its own knowledge with something it cannot perform firsthand, like sending an email, obviously, or remembering things I said. In order to do that, we're probably going to plug in three tools: a search tool to fetch favorite pizza recipes, memory to retrieve what my pizza preference was back when I first told the LLM, and then the ability to send the email.

So when you need to build an agent, usually you need to build those tools as well. And the way of doing that, and again I'm going to be very high level here, is basically: you build those tools, then you need to write a JSON definition schema to inform the LLM that those tools exist and can be used in a certain way. So basically, you need to prompt your own functions and each one of the arguments of each function you have in your local code base. Then you need to send that definition to the LLM and finally make the request.

But there's more, obviously. The other thing you have to do is, every time the user sends a message, process all the function calls. So when a user sends a message like, send an email to Daniele to congratulate him, the LLM will respond with a message. If you fed the send email tool to the LLM, the LLM might be willing to call that tool. And so you have to handle the call as it comes from the LLM, make sure that the LLM has passed the right parameters, and then produce a response that the LLM can understand, so that it can generate the right output back to the user.

This seems okay on the surface, but when it comes to code, it translates into large JSON descriptions and then code that you will have to maintain. I took a screenshot of just a section of the boilerplate code I had to write for different projects, and this is a lot of code that you have to maintain, and it obviously keeps growing the more tools you have. And we think it's an unnecessary evil. There are obviously frameworks and things that may help, but the learning curve is steep. They're generally painful and opinionated: they all want you to conform to their way of building those experiences, and they might not fit the architecture you have in mind. Most of them only work with select LLMs, usually the big private ones, and the open-weight ones that are actually cheaper and faster might not be as compatible. And you still need to build and run your tools on your local machine, so there's no real, true advantage.
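To make that pain concrete, here is a minimal sketch of the hand-rolled round trip Daniele is describing, in Python, using the send-an-email example from above against Groq's OpenAI-compatible chat completions API. The send_email stub, the schema, and the model id are illustrative assumptions, not code shown in the webinar.

```python
import json
import os

from groq import Groq  # assumes the Groq Python SDK is installed

client = Groq(api_key=os.environ["GROQ_API_KEY"])
MODEL = "llama3-groq-70b-8192-tool-use-preview"  # assumed model id; check the Groq console

def send_email(to: str, subject: str, body: str) -> str:
    # Illustrative stub: a real tool would call your email provider here.
    return f"Email sent to {to}"

# The JSON definition you have to write (and keep in sync) for every tool.
tools = [{
    "type": "function",
    "function": {
        "name": "send_email",
        "description": "Send an email with a subject and body to a recipient.",
        "parameters": {
            "type": "object",
            "properties": {
                "to": {"type": "string", "description": "Recipient email address"},
                "subject": {"type": "string"},
                "body": {"type": "string"},
            },
            "required": ["to", "subject", "body"],
        },
    },
}]

messages = [{"role": "user", "content": "Send an email to Daniele to congratulate him."}]

# First request: the model may answer with a tool call instead of plain text.
response = client.chat.completions.create(model=MODEL, messages=messages, tools=tools)
reply = response.choices[0].message

if reply.tool_calls:
    messages.append(reply)  # keep the assistant's tool call in the conversation history
    for call in reply.tool_calls:
        args = json.loads(call.function.arguments)  # parse the arguments the model chose
        result = send_email(**args)                 # run the client-side function
        messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
    # Second request: the model turns the tool output into the final user-facing answer.
    final = client.chat.completions.create(model=MODEL, messages=messages)
    print(final.choices[0].message.content)
else:
    print(reply.content)
```

Multiply this by every tool, every argument description, and every model quirk, and you get the ever-growing boilerplate described above.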
And that's why I actually built Toolhouse. Toolhouse is basically the first marketplace for AI function calling. Instead of you writing and building these functions, they run in the cloud, and you install them as you would install apps on your phone. Basically, think of an app store: all the manual developer steps are taken away, because we take care of the prompting, the execution, the actual building and testing of the function, and we optimize inputs and outputs for all the LLMs we support. And obviously, the LLMs that Groq supports in GroqCloud are some of our favorites. So we optimize for discovery and speed. You install tools like you install apps, like I mentioned. You don't have to write the JSON definition schema and prompt things. We run those functions in the cloud.

You can still run your tools locally if you have them, but obviously you can take advantage of our cloud if you want. You can also publish and monetize your tools. And they work across any LLM framework you are currently using. It's simply one of the best developer experiences in AI, and we'll show you how. It looks like this. It's really an app store: you click install, and in just one click you have a code interpreter, a web search, or many other tools that we have. And the developer experience is really three lines of code. You instantiate the SDK, you feed the tools that you previously installed into your completion call, and then you append the response from Toolhouse directly, because we take care of all the execution, even if you're streaming. You have debugging tools as well, so you can always see the inputs and the outputs from each function that your LLM calls. And I'm going to put this just here: if you joined today, as a token of appreciation for listening, you can get $150 worth of Toolhouse credits. Just go to join.toolhouse.ai, or simply go to our landing page, toolhouse.ai, and follow the instructions there. That's pretty much it for a small introduction. I wanted to hand it over to Dan. I know, Dan, you have built something with Toolhouse, so I'm actually very excited to see you in action with that.

Great. Thanks, Daniele. That was awesome. Definitely, the more I learn and use tools and agents, it's just a really awesome application of LLMs. So I'll share my screen now and just go over a quick little demo that I have on how to use Toolhouse and Groq. While you pull that up, Dan, I just want to say, Daniele, that was probably one of the best high-level overviews of tool use and the background behind it that I've seen to date. So thank you for that. Thank you. Definitely.

Awesome. So yeah, thanks for having me. Really excited. When I played around with Toolhouse just last week, just getting familiar with it, I was really excited about it for all the reasons Daniele outlined. Just really easy. And I think it's a great match, Toolhouse and Groq, because we're both focused on speed: Toolhouse on the speed and efficiency for developers building tools, and obviously Groq, we hang our hat on inference speed. So really, we're both focused on speed and lowering the time to create and execute these tools. So I'll show you a quick demo here, and this will be available in our Groq API cookbook for any viewers who get access to Toolhouse, have a Groq API key, and want to play around for themselves. So we'll import packages first. Now, we're also going to use this pretty new model that we came out with at Groq: it's our tool-use fine-tuned model. It's been specifically fine-tuned from Meta's Llama 3 70B model, and there's an 8B version as well, specifically for tool use and function calling. We've noticed great improvement, I've heard great feedback, and it has achieved state-of-the-art performance on the Berkeley Function Calling Leaderboard. So just another reason for Toolhouse and Groq to work really well together: we have this model that works great for tool calling, and we're going to use that here today. So first we're just going to get the tools we're going to use, and this get tools function just shows you all the tools that we have available at Toolhouse. So we only have one. We just have the code interpreter function, to keep it simple.
But as Daniele mentioned, the tool definition is already right here. And it's really non-trivial: if I didn't have Toolhouse, writing the function and the JSON schema would be time consuming. Writing all of these descriptions and parameters manually takes a bit of time and quite a bit of tweaking as well. The description, when you're describing a tool, is how the LLM knows which tool to call and how to pass in parameters, and that can take a lot of trial and error to really get right. So having this out of the box is really a game changer for tool use.

So we have that. Really, it's pretty simple: we're going to call a tool just like you normally would with the Groq API. We have our user message here, and pretty simply, we're going to ask the LLM to run this Python code. Very simple: a equals this number, so print that division. What we want the tool to do is to recognize this Python code and then execute it. So we have just a chat completion response, again, very similar to everything else you would use the Groq API for, but for tools, we're going to pass in the tools that are already available from Toolhouse. And so we'll run that here. As you can see, it just takes a second, but you can see the tools that are called: we have a code interpreter, it gave it this ID, and it successfully identified the code string that we want to execute. So again, that's really what's going on under the hood: it's identifying the tool that we want to use and how to properly invoke that tool, given the user prompt going into the LLM.

So now we can execute the tool call. The way you do that with Toolhouse is this run tools function: basically, we're getting the LLM response and then passing it back into run tools and extending our messages from there. So you can see now, on top of our user message, we have an assistant message that shows the tool call that was invoked, and then the actual result of that tool call. And you can see here that the result of the tool call contains the division we asked the LLM to do in that Python code, which, if you compare it to the actual code, lines up. And finally, we can just send all of those messages: again, we have user, which is our initial prompt, assistant, which is the tool call being made, and then the tool response, which is the output of that tool. Feed that all back into an LLM chat completion, and the final output is the result, just as the LLM would respond with it. So there you have a quick little demo of how to use Toolhouse with Groq. Super easy. They made it really easy to just get started and use tools out of the box.
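For reference, here is a condensed sketch of the flow Dan walks through above, reconstructed from his description rather than copied from the cookbook notebook: Groq's tool-use model plus Toolhouse's get_tools and run_tools. The constructor arguments, environment variable names, and model id are assumptions; check the Groq API cookbook and the Toolhouse docs for the exact code.

```python
import os

from groq import Groq
from toolhouse import Toolhouse

client = Groq(api_key=os.environ["GROQ_API_KEY"])
th = Toolhouse(api_key=os.environ["TOOLHOUSE_API_KEY"])  # assumed constructor; see Toolhouse docs

MODEL = "llama3-groq-70b-8192-tool-use-preview"  # Groq's tool-use fine-tune; id may change

messages = [{
    "role": "user",
    "content": "Run this Python code: a = 124309 / 7; print(a)",
}]

# First completion: the model decides to call the Toolhouse code interpreter.
response = client.chat.completions.create(
    model=MODEL,
    messages=messages,
    tools=th.get_tools(),  # the tools installed in your Toolhouse account
)

# Toolhouse executes the tool call in its cloud and returns the messages to append:
# the assistant's tool call plus the tool result.
messages += th.run_tools(response)

# Second completion: the model turns the tool output into the final answer.
final = client.chat.completions.create(model=MODEL, messages=messages)
print(final.choices[0].message.content)
```

Compared with the hand-rolled version sketched earlier, the schema writing, execution, and result formatting are all handled by Toolhouse.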
Fantastic. Thank you, Dan. And great example. And again, folks, this will be available in our cookbook after today's webinar for you to take advantage of, so you can hit the ground running instead of starting from scratch. But Daniele and team definitely make it very easy if you want to start from scratch with Groq and Toolhouse. With that, I did want to dive into the questions we've been getting from the audience, as well as some that were submitted ahead of time by our community. So the first question we have from folks is, let's see, Daniele, this is for you: what is the best way to incorporate a self-checking mechanism into your agentic workflow, if you would?

Yep. Absolutely. So that's one of the common challenges of building agents. The one technique that works pretty well is to give your agent an out. Obviously, when you build an agent, it's kind of difficult to separate how it should use actions, how to prompt those actions, and how you should prompt the general behavior of your agent. So one common problem here is, for example, when you are equipping your agent with tools, say you have a send email tool, and the tool for some reason fails to send the email or perform the action, the LLM may want to retry that action indefinitely. So a way to incorporate that self-checking mechanism is to give the agent an out, for example by saying: hey, if any of the tools fail, retry up to, say, three times, or do not retry; instead, prompt the user for the action they want to take. So making sure that you're explicitly clear in equipping your agent with guardrails and outs is extremely useful and efficient. In our tests, we saw that this is probably the most effective technique we've seen so far.
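As a minimal sketch of what giving the agent an out can look like in practice, the guardrail can simply live in the system prompt that accompanies every request. The wording below is an illustration, not a prompt taken from Toolhouse or Groq.

```python
GUARDRAIL_SYSTEM_PROMPT = (
    "You can use the tools provided to you. "
    "If a tool call fails, retry it at most three times. "
    "If it still fails, do not keep retrying: tell the user what went wrong "
    "and ask them how they would like to proceed."
)

messages = [
    {"role": "system", "content": GUARDRAIL_SYSTEM_PROMPT},
    {"role": "user", "content": "Send an email to Daniele to congratulate him."},
]
```

Pairing an instruction like this with a hard retry limit in your own tool-handling code keeps a failing tool from being retried indefinitely.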
Fantastic. Thank you, Daniele. Dan, this next one's for you. What models do you recommend using for agentic workflows? I know you were showing off our fine-tuned version of Llama 3 70B for tool use, but is that the one you'd recommend? Are there any others?

Yeah, I mean, I think those are probably our state of the art. Again, they have been fine-tuned specifically for tool use, so they're really optimized for agentic workflows. We have the 70B model, which I shared, and also an 8B model, which is trained off of Llama 3 8B. I will also mention that Groq recently added support for Meta's new Llama 3.1 models. And while those aren't optimized for tool use, they're really just good all around. So specifically for tool use, I would use those tool-use models, but the best general models we have available are Llama 3.1.

Fantastic. Thank you, Dan. And this one was submitted by our live audience, and I think it goes to both of you. Have Groq or Toolhouse looked into specialized fine-tuning of models and data communication methods that are higher density than normal text? Dan, do you want to take this? Can you repeat that, Graham? Sorry? Yeah: have Groq or Toolhouse looked into specialized fine-tuning of models and data communication methods that are higher density than normal text? I'm not sure. I mean, we do have the fine-tuned Llama 3 tool-use models; as of right now, those are Groq's only fine-tuned models. We're always looking for developer feedback and suggestions, and since we have already fine-tuned those, we have some experience doing that already. So I wouldn't say never. On our side, we don't deal specifically with models ourselves. We might look into that in the future, maybe, but so far I'm personally a big fan of the Llama 3 fine-tuned models from Groq. Our job is to make sure that, whatever LLM you choose and whatever data you are passing to the LLM and the functions, we provide this horizontal infrastructure for you to run those tools efficiently, no matter what the LLM is. So if you switch from Llama 3 to Llama 3.1, or you go to Mixtral or Gemma, you should consistently see the same performance, and that's what Toolhouse enables you to do.

Awesome, thank you both. And Dan, this one's for you. What's the difference between the fine-tuned model and the original Meta models? And I assume the audience member is referring to our fine-tuned Llama models for tool use. Yeah, so they are. We fine-tune them specifically with examples for tool use. It's a little bit over my head exactly how that works under the hood, but I do know, from having used it personally and from hearing developer feedback, that it outperforms the regular Llama 3 models by a pretty substantial margin in tool use. So for general purpose, you probably want to use just the Llama 3 or Llama 3.1 models, but for agentic workflows, for tool use specifically, the fine-tuned models are trained to better recognize how to use tools, so they're definitely superior there.

Fantastic, thank you, Dan. Daniele, this next one's for you. Does Toolhouse support remote tool execution? Say I want to run a long-running task; does Toolhouse allow me to dispatch that task so it gets executed in the cloud and sends back the results? It does. That's actually one of the things we excel at. Everything you saw in Dan's demo is actually remotely executed. That one happens to be inline, because we need to wait for the response from the code interpreter, but that's obviously quite fast. If you want to dispatch a remote task, you can do so through Toolhouse. One thing to add is that, if you want, you can still run your local tools like I mentioned before, but you can also run that same tool that you have locally in our cloud, and so you can do that dispatching using our own cloud. Yes, in short.

Fantastic. Let's see the next question. I think this is for you, Daniele, but Dan, please feel free to chime in if you have anything to add. What would you recommend for memory and context handling in agents, particularly when they need to revisit the user for the next action to take?

Yeah, that's a great question. There are so many techniques and aspects to consider, so the answer is going to be a little nuanced. I'll preface with the fact that memory is also one of the components we offer in the tool store. And the major challenge there is really to prompt the LLM to use the memory. Usually, memory is done in context, in the sense that we see so many agents that are prompted by just dropping all the contents of the memory directly into the system prompt, before you even need that context, which may be a bit overkill. Some others are implemented with forced tool use, so the first function call you make is memory, which is not necessarily bad, but you might risk overflowing the context or adding more details than the LLM actually needs. So in terms of components and techniques together, our approach has been to offer the memory component as a tool, and then we work with developers to make sure they ask their agents to retrieve information about the user, and to save context about the user when specific circumstances occur. That gives us the best performance we've seen when it comes to retrieving only the right information from the context, and keeping that memory to a minimum, really. Try to figure out, based on the role of the agent, which aspects of memory you really need to retrieve. Implementing something like RAG for memory has worked, and having a specific agent that's only tasked with summarizing some of the memories from the user and passing them back to the main agent that returns the output to the end user has worked pretty well. And those are things you can implement with our fetch memory and store memory components right there on Toolhouse. Fantastic.
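To illustrate the memory-as-a-tool approach Daniele describes, here is a hedged sketch of exposing memory through two ordinary function-calling tools. The in-process store, function names, and schemas are hypothetical stand-ins; Toolhouse provides its own memory components, so this only shows the shape of the technique.

```python
import json

# Hypothetical in-process store; in production this would be a database or a hosted memory service.
MEMORY: dict[str, str] = {}

def save_memory(key: str, value: str) -> str:
    """Persist a short fact about the user under a descriptive key."""
    MEMORY[key] = value
    return f"Saved '{key}'."

def fetch_memory(query: str) -> str:
    """Return any stored facts whose key mentions the query."""
    hits = {k: v for k, v in MEMORY.items() if query.lower() in k.lower()}
    return json.dumps(hits) if hits else "No memories found."

# Described to the LLM like any other tool, so memory is only pulled in when needed,
# instead of dumping the whole store into the system prompt on every request.
memory_tools = [
    {
        "type": "function",
        "function": {
            "name": "save_memory",
            "description": "Save a fact the user shares about themselves, e.g. their favorite pizza.",
            "parameters": {
                "type": "object",
                "properties": {"key": {"type": "string"}, "value": {"type": "string"}},
                "required": ["key", "value"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "fetch_memory",
            "description": "Look up previously saved facts about the user before answering.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    },
]
```

Because memory is fetched on demand through a tool call, the context stays small, and the agent's prompt only needs to say when to save and when to look things up, rather than carrying the entire memory on every request.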
And then let's see, the next question is for you, Dan, and then Daniele, I guess, adding anything from the Toolhouse side as well. Are there current production use cases happening, and how is monitoring slash performance measurement integrated?

So I know that we do have customers that are using tools. One example, and I don't know how widely used this is in production, but an example that our CEO, Jonathan Ross, gave recently of tool use might be booking a trip to Hawaii, where there are a lot of steps involved. So you have an agent that first wants to ask you questions: okay, well, where in Hawaii do you want to stay? Do you want beachside, et cetera? And then it's going to look for hotels, it's going to look for flights. So there are a lot of steps involved, and that's something an agentic AI workflow could help solve. What was the second part of that question? I'm sorry. Let me pull this one back up: are there current production use cases happening, and how is monitoring slash performance measurement integrated? Yes. So we actually just had a webinar recently with Weights and Biases on tracing and monitoring for tool use. That might be something to link to; it was a really great resource. Anish from Weights and Biases gave an awesome presentation there, and the tools they had were really great for monitoring. I think the example was AI-generated code and making sure that it works properly. So probably something to check out as well.

Definitely, definitely. If you're interested in checking out any of our previous webinars, they are all available on the Groq YouTube channel for you to watch at your own pace. But Daniele, this next question is for you, and it is: what is the pricing model for Toolhouse? Is it based on the amount of compute or storage or something else?

Right now, we're in private beta. So when you get to the early access stage, you just get credits. We're not charging during this period; we want to build an awesome product and we want usage. Looking at the medium to long term, our pricing model is going to be on a per-execution basis, and then there will be monthly pricing if you want additional functionality, for example team features, which we were actually talking about right before this webinar. So usage will stay pay as you go, similar to what you pay per execution on other services, and then you can expect to get some advanced functionality for a reasonable monthly fee. On the store side, it has to be mentioned that if you're a publisher and you're interested in publishing tools you have built, then you can actually monetize every time a developer calls your function. So that's obviously a dynamic similar to an app store.

Fantastic. Thank you, Daniele. This one's for both of you guys. Would you recommend installing the model locally, or do you have some API to use the tool models? So for the LLMs, I mean, for Groq, we have a cloud-based service, GroqCloud. If you go to console.groq.com, you can create an account, create an API key, and everything is run there via that API, and there are a lot of examples there. I assume they're referring to the open-source LLMs, and it's much easier to run those via the Groq API than to download them and run them locally.
Plus one: you get the best possible mileage out of just going full cloud, even for development purposes. Your usage in development is going to be reasonable, and by having the model hosted on GroqCloud, you get the best speeds, and, especially if you're optimizing for latency, you can remove all the bias that your local environment may add. So if you're trying to figure out whether an execution is going to take longer or shorter, at least you eliminate that concern by hosting everything in the cloud. You know that the performance is going to be exactly the same as what your users see in production, so you can optimize your code based on that and get the best possible performance very consistently. So I will always recommend going cloud. There might be some very niche use cases that require installing models locally, but I would say for 99% of cases, go cloud.

Fantastic. Thank you both. And then for both of you guys: other than tool use, what are other challenges you see when it comes to building and running AI agents reliably?

I can take that. Yeah, go for it. Go for it, Daniele. Yeah. So tool use is definitely a challenge, even from an education perspective: tool use as we know it right now is probably a year and a few months old. People still do a ton of things in the prompt, and that is one of the other challenges, because we try to cram everything into the agent's behavior, but we ignore that we can offload things to different components. To me, the major challenge is that. The other major challenge has been maintaining, running, and optimizing your code for time to first token, really reducing latency. And when you have such a widespread code base, so many tools at your disposal, and an ecosystem that changes so rapidly, you have to spend extra development time and dedicate extra resources just to understanding things and doing some software selection. You've heard of LLM routing, and that's becoming super commonplace now, because many LLMs can perform slightly differently based on the task at hand or how the agent is prompted. So imagine building all that, maintaining all that. That, to me, is the major challenge, and I think with time we're going to see a simplification of code bases happening, because this is simply not sustainable. And that's one of the main goals we wanted to achieve with Toolhouse. That's why our SDK is those three lines of code: behind the scenes, we want to take care of all that complexity. All the boilerplate associated with function calling, you don't have to deal with anymore. And if you switch from model to model, we also optimize the prompts, so they're tailored for specific models, when we know that some models behave differently on the same tool. So that's the kind of simplification I see, given all the challenges of maintaining an ever-growing code base and dealing with different technologies that evolve super rapidly.

Fantastic. And just to dig in there a little bit further, Daniele: so you have an agent working behind the scenes that is essentially reconstructing the prompt to be tailored for the particular model it routes to? Pretty much. It's obviously not fully automated, because there's always human evaluation. Human in the loop, for the time being, is a key asset there.
But yeah, our job is to basically look out for any new models, new changes, and make sure that we internalize that complexity and do most of the legwork for our users and for all developers out there. The other thing we obviously want to ensure, as Dan actually showed in the Jupyter Notebook, is that you can inspect the JSON schema that we create, compose, and then send to the LLM. You, as a developer, always have the final say. If you want to do some custom fine-tuning based on what we provide out of the box, you are still in charge, you're still in control. And I think this is a major advantage, because eventually every single agent is going to be different, and that's going to be the beauty of AI: equipping agents with very custom behavior, just like we equip agents with quirky personalities or a different style or tone of voice. That's where developers should spend most of their time; they should be building awesome experiences for their end users. And for the rest, we hope to provide as much help as possible with the infrastructure we have. All we do is basically make life easier for developers out there, especially when it comes to function calling.

Fantastic. That's super cool stuff. Go ahead, Dan. Oh yeah, so just to answer again about the challenges with agents, and Daniele, you hit on this: latency. I think that's a big one. At Groq, we talk a lot about low latency and speed, and it's really visible if you go to a chatbot and see tokens generated really quickly versus really slowly. And it's nice, obviously you'd prefer it to be fast, but where it's really a game changer is with an agentic workflow. So if you're trying to book a trip to Hawaii and you have maybe half a dozen agents running under the hood, that could be a difference of several minutes in terms of time to first token, just the low-latency speed of each of those agents performing. Because when you have one that's dependent on another, and then you have this complex network, that's where low latency is going to be really important.

Thank you, Dan. And Dan, this next question is for you. The question is, what about privacy? Are my inputs and outputs from Groq private? Yeah, we don't store inputs and outputs. So totally private.

And then for either one of you, can you speak to how many tools enterprises have at their disposal across multi-agent systems? Curious to hear your experience. I'll take this. Yeah. This really depends on how tools are implemented, but it's upwards of tens of tools. So when you are building agents and you need to perform actions, that's the number you're hitting. Even at a smaller scale, with smaller companies, the number is about 10 to 15 at least. And those are a combination of basically the old tools that they need to somewhat adapt and integrate and then have their data massaged. Most of the work there is to really iron out the quirks of LLMs. Thankfully, this is not the case with Groq, especially the Llama 3 models, but I've seen other models, I don't want to mention them, but you probably know what I'm referring to,
We saw enterprises struggling in using that, and so they resorted to things like Toolhouse to build a custom tool to just detect the file type. So they passed a location to a file, then Toolhouse opens it and determines the mine type, sends the mine type back to the LLM, and then the LLM can open the file more reliably based on the information that the toolhouse tool passed. That was able to give them more reliability and accuracy in the open file rates. Those are maybe small tools, but they're also bigger tools like Code Interpreter that runs in its own sandbox, it's completely isolated even from our environment. But when you think about all these tools, they pile up easily and immediately. It that's why you have like maybe tens of tools uh there and um yeah that's why you should be ready to basically build uh out like large function definitions because the the tools uh that the the agent can use may be like multiple and uh and varied we're actually building a functionality right now it's uh in our roadmap uh called bundles where you can actually you know out of the many tools you have you can just give a smaller group of tools to the agent and that reduces the risk of the agent selecting the wrong tool or calling more tools than it should. So you'd say you installed like 30 tools, but in that specific task, in the specific step, the agent only needs three. Then you can only pass those three tools to the agent while keeping all the configuration, all the other tools still installed. But what the agent sees is only those three tools. And that allows you to obviously without changing a line of code, because it's all done by a configuration setting, it allows the agent to be actually more precise. That's called bundles, and it's coming, I think, in a week and a half. We just started working on that, but the team is very fast in releasing. We release basically a functionality a week or so. Fantastic. More goodness to come from the tool house side, for sure. We are just about at time, so we'll do one last question. And Daniele, I think this one's for you. Are you guys seeing enterprises break down their tools to the most subtask level? Or do they aggregate a few actions under one tool? Yeah, no, they try to be as atomic as possible, like I mentioned. Maybe you have a single small tool to equip the airline with the current time. When you look at it, it's just like that single atomic function, probably it's like small, but it's... What? needs to be done or like opening files that's a file open and then determining what is the content of the file to produce a mime type so the more atomic the better but you also see a trend where you have like more sophisticated functions where a tool that you feed to an agent is another agent so it's very interesting to see how those things work together because through function calling you can actually build an agentic network that doesn't really use necessarily a framework. You just tell to a main LLM, to a main assistant, what are the agents at its disposal, and you can implement that sometimes even just in pure function calling. So it really depends. 
But in short, yeah, what I've seen is that people tend to break actions down by function. But as the education and knowledge around function calling expands, we're seeing that even enterprises have started to be more sophisticated in how they organize their agents around function calling, basically turning functions or tools into proper agents that can be used either in a standalone way or together as a hub, where you have many agents equipped with other agents, and the communication happens that way.

That's sick. Well, thank you for that. And that does put us just a little bit over time, so we're going to have to wrap our session for today. But thank you all for tuning in and for all of the great questions submitted on this topic. Daniele, Dan, thank you both for joining and helping with today's webinar. And be sure to take advantage of the Toolhouse credits Daniele presented at the top. And if you're not already building on Groq, you can grab your free API key at console.groq.com. We will make this webinar available afterwards for you to watch at your own pace on our YouTube channel. So with that, we will close for the day, and thank you all for attending. Thank you.