Are we on? It is on. Hi.
Number two. Test, test, test. Hello, hello.
Hi, friends. Hello! Thank you for joining us today.
A lot of people have worked very, very hard to make some really cool stuff here. We have a page and a half of demos, demos, demos, and just an hour to do it, and it's all live. Yes. All right.
Let's do some stuff. Let's get rolling. Well, welcome, welcome, welcome.
We're going to talk about, guess what? AI. Actually, I just realized something. You know, I've never actually had you on my podcast. I know.
Can I record this real quick? We've met like 10 years ago, and we've talked about this so many times. Hang on. Let me just hit record.
All right. Hi, I'm Scott Hanselman. This is another episode of Hansel Minutes.
Today on the show, we have Yina Arenas talking about Azure AI Foundry. Yes. All right, thanks.
Sorry, I'm going to keep that running. Please continue. So we're going to talk about AI. And as Scott said, I'm Yina.
I lead product for Azure AI Foundry. And I'm here representing the work of many, many, many engineers, marketers, data scientists at Microsoft who are bringing the platform to you. But what is Azure AI Foundry?
It is the open, flexible, and secure platform that enables you as a developer to infuse AI into every single application that you build, whether it is a brand new application or something that you've been working on for a while. You can use Azure AI Foundry to bring all of the AI into your apps. And as a developer it gives you, and we're going to talk about that in detail, three core things. One is models.
The second one is an agentic platform. And the third is around all of the observability tools that you need as a developer to take your idea to code and your code to production. Now, today, in order to make Foundry real for you, we wanted to use the example of a process that Scott does every single week. Yeah. As part of publishing
Hansel Minutes, and looking at that end-to-end process and then maybe making it better with AI. So I've got this podcast that I do on the side. Microsoft has a very generous policy called moonlighting: if you do work on the side, it's on the side, on your own computer, in your own spare time.
And I've been doing this podcast for the last 20-plus years. It's something that I do on my own. And I do it on my own money, and then I pay a young lady named Mandy Moore to be my editor. So it's just Mandy and I. And the show has, I just published episode 997. So this is actually, thank you.
This is what we call a corpus, a large corpus of material here. There's over 500 hours of material, talking to cool people.
In fact, I was actually almost delisted by a podcast directory when a young person sent an email saying that they believed that the podcast itself was AI-generated, because they couldn't conceive of that amount of work. Now, I feel very strongly about this work, and as a human who created some art, I want to make sure that I'm going to be using AI in a way that only takes away the parts of the job that suck, like show notes.
And if you've ever listened to the show, I am notorious for saying, "I'm going to put that in the show notes," and then you'll never hear from me again. Never happens.
Never happens. So 1,000 episodes, close to 1,000 episodes. Yep.
Hours and hours and hours of audio files, lots of broken promises, lots of great people in the show too, right? Yeah. Oh, and I have to look up their bios and who they are and if they have a Wikipedia page or something, and it's a whole thing.
And then they never email me back about their bios. There's a bunch of toil. There's figuring out the guest schedule. Packaging up the podcast. This is the part of the job that I don't like.
I think computers should do toil. They should do work that is dirty, dull, or dangerous. The things I don't want to do. That's what we did as part of showing all the capabilities of Azure AI Foundry. We took this process that Scott has been doing for 20 years.
The first step that we did is take those close to 1,000 audio files and process them with AI to get transcriptions. And then we'll continue the process we're going to build throughout the show today to erase that toil at scale. Right.
And we wanted to do it in a way that put me in control because it's my show and I own this show. And I don't want to just copy paste my entire show into some chat bot in the cloud. I want to know what's going on at every step.
Now, there's a lot of things that we do here. There's a lot of pain here on the left that steals time. There's dealing with the show notes. There's links.
If I have an AI generate the links, do I even know if they're real links? There's text summarization, which I'm constantly doing. And I'm spending probably two or three hours of my spare time every single week, and have for the last 20 years, on administrivia that I would like to automate without losing quality and without taking myself out of the loop.
I want a human in the loop. Exactly. Okay, so that's the setup for what you're going to see us building today on stage.
And we're going to do the session in three chapters. We're going to first talk about models, and then we're going to show the different set of models that we used for this process. We're then going to talk about agents, and then we're going to show the different set of agents that we've built. And then last, we're going to
talk about observability and the set of tools that you can use to, as I mentioned before, do all of the monitoring, tracing, and logging that you need in order to take your applications to production. Let's get going with models. First of all, Azure AI Foundry has a lot of models.
That was not always the case. Two years ago, the world was very simple. We had just a few foundational models. But the rate of change has been unprecedented over the last couple of years. We now have tons of models that are pushing the efficient frontier and creating new choices.
It's a great opportunity for you as a developer because you have great capabilities and functionality, but it also makes it quite complex to figure out what is the right model for the job. Well, in Azure AI Foundry we have all of these different sets of models, and at the conference we are announcing a new set of capabilities around models. First, we are extending the set of models that we both host and sell. In addition to the OpenAI family, we are bringing in DeepSeek, Mistral, Grok, Meta, Black Forest Labs, and a set of models that not only provide that unified access but also enable you to easily switch between them from your code.
Now I'm going to show how you go into the product. I'm going to switch to... here we are. This is the UI for Azure AI Foundry.
Right here in the model catalog we're going to see the set of experiences that it offers for you. First of all is a wide range of model selection. We have over 10,000 models now, based on our partnership with Hugging Face.
We see here at the top the latest news of the announcements that we are making here today, and they're being updated every single day, every single week. All of the models are listed here, and there's a set of dimensions you can filter by, whether it is by model provider, by industry, by the set of capabilities, whether they're serverless or available on managed compute, or by the set of tasks that they support. But if you want to just ask, we have an agent built into our portal where you can ask things like, what is the best set of models if I'm doing, say, protein research. So I can say best model for protein research.
Do you have a model to talk to you about the models to pick? Yeah. Top model.
And I'm going to send that in. And I'm going to also show the leaderboards while that is going on. Get some space.
Leaderboards is the place where you want to come and see what are the best models based on certain benchmarks. Right now we have benchmarks around quality, safety, cost, and throughput. You can see the trade-off charts if you want to select the best model based on a quality and cost comparison or a quality and throughput comparison, and it will show you what the best model is for that scenario.
Are those leaderboards real? That's not hard-coded? No, this is not hard-coded. We're constantly running benchmarks on the models and bringing those into the catalog. So if somebody's model moves lower or higher on the board, it's a competition to be better.
Exactly. It's always a competition. Now, you can also use your own data to evaluate models. So, like, you can bring in and use your existing data set if you have one, or you can use an AI-generated data set as well.
Okay. There's so many people in the room. The other thing that we are announcing at Build, let's see if we get the answer from the models.
Yes, we see the best models for protein research are EvoDiff, BioEmu, and TamGen. Those are models that are coming in from Foundry Labs, which is the way that we bring all of the models from Microsoft Research into the catalog.
Now let's look at another announcement. This is Model Router. Model Router takes the selection toil from your head.
So it is a router on top of the best models, reasoning models, the state-of-the-art models. Right now it's only on OpenAI models, but soon it will work across more models. And here, you can see I asked it a simple question, where is Hanoi? And it used a nano model.
It's a very small model to answer that very simple question. If I ask a little bit more complex question about planning a trip, I have a set of constraints, it's going to use, in this case, well, it used a mini model, which is like a better model for that job. And if I ask a different question, like, for example, what clothes should I bring?
I don't want to go and waste money, waste energy, waste the environment on something like what's the capital of Hanoi. Exactly. So a nano model is tinier. It's going to use a small model again. And if I give it a more complicated query, then it's going to use something like a reasoning model for this.
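As a rough illustration of that routing behavior, calling a Model Router deployment looks like any other Azure OpenAI chat call, and the response reports which underlying model the router actually picked. This is a sketch, not the demo's code; the endpoint, key, API version, and deployment name are placeholders/assumptions:

    from openai import AzureOpenAI  # pip install openai

    # Placeholder endpoint and key; the router is deployed like any other model deployment.
    client = AzureOpenAI(
        azure_endpoint="https://YOUR-RESOURCE.openai.azure.com",
        api_key="YOUR_KEY",
        api_version="2024-10-21",
    )

    resp = client.chat.completions.create(
        model="model-router",  # assumed deployment name for the router
        messages=[{"role": "user", "content": "Where is Hanoi?"}],
    )

    print(resp.choices[0].message.content)
    print("Routed to:", resp.model)  # the underlying model the router selected for this query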
Now, I mentioned that we took close to 1,000 episodes. Right. And then we processed all of those. We actually used a model to take the audio files and generate the transcriptions.
But then we wanted to do a distillation of a model so we can reduce the cost it takes to process the next 1,000 episodes. Because Hansel Minutes International is me and Mandy.
Like, we don't have any money. So we would like to make our cost as low as possible. Upfront cost is concerning to me, but also I got to do another 1,000 episodes over the next 20 years. Right. I'd like that to be very, like, a penny.
So it costs about, like, $100 to do the processing of the first 1,000 episodes. Okay. And then, like, we do it, we did a fine-tuned model, and the next 1,000 is gonna cost us about $1.50.
$1.50. Yeah. Because we're using a custom model.
It's faster and customized specifically for your job. And right now at Build we are announcing the developer tier, which enables you to do experimentation while reducing the hosting fees; there is no SLA, but you get to do all of your fine-tuning experiments.
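For context, the distillation workflow described here is essentially fine-tuning a small model on pairs of transcripts and human-approved show notes. A minimal sketch of kicking off such a job with the Azure OpenAI fine-tuning API; the file name, endpoint, and the assumption that this particular base model is fine-tunable are mine, not from the session:

    from openai import AzureOpenAI  # pip install openai

    client = AzureOpenAI(
        azure_endpoint="https://YOUR-RESOURCE.openai.azure.com",  # placeholder
        api_key="YOUR_KEY",
        api_version="2024-10-21",
    )

    # training.jsonl: one {"messages": [...]} example per line,
    # e.g. a transcript as input and the approved show notes as output.
    training_file = client.files.create(
        file=open("training.jsonl", "rb"),
        purpose="fine-tune",
    )

    job = client.fine_tuning.jobs.create(
        training_file=training_file.id,
        model="gpt-4.1-nano",  # assumed fine-tunable base model for illustration
    )
    print("Fine-tuning job:", job.id, job.status)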
Okay. We're also announcing great things around Foundry Local. Yeah.
So this is another thing that I feel is really cool: I've got a machine here that is very capable. This is a laptop. This is a...
Studio 2. I've got a 4060 here. I can do some of this work locally. So we've got Azure AI Foundry in the cloud. We've got Foundry Local here.
I want to be able to look up people's biographies, like yourself. Now, I could go out there and Google with Bing and then figure out your biography and then rewrite it for myself or copy-paste it. Or I could use a combination of something like Perplexity to go and figure out a lot of information about you,
and then summarize that and distill that down. So here we're going to start up Foundry Local with the Phi-4 mini model, and we're going to start that on this local service. So this is running on localhost. Now, you notice down here in the corner we just saw the GPU kick in, and we just loaded Phi-4 mini into local memory, and we'll go and say Ina. Now, this is going to come...
The first part is going to be a call to Perplexity, which is in the cloud, to collect a huge chunk of data, and then we're going to go and run locally on the GPU and summarize and squish all of that. There we go. Now that's all running locally.
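Roughly, the local half of that pipeline is just an OpenAI-compatible chat call pointed at the Foundry Local endpoint instead of the cloud. A sketch under assumptions: the port and model id below are guesses for illustration, and the cloud research step is represented by a placeholder string:

    from openai import OpenAI  # pip install openai

    # Foundry Local exposes an OpenAI-compatible endpoint on localhost;
    # the port and model id here are assumptions, not confirmed values.
    local = OpenAI(base_url="http://localhost:5273/v1", api_key="not-needed")

    web_research = "...large chunk of bio text collected from the cloud step..."

    resp = local.chat.completions.create(
        model="phi-4-mini",
        messages=[
            {"role": "system", "content": "Summarize this research into a short speaker bio."},
            {"role": "user", "content": web_research},
        ],
    )
    print(resp.choices[0].message.content)  # summarization runs entirely on the local GPU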
Oh. Now this is exciting because you were apparently, is that true? No, that's not true.
It got confused. That's awesome. You're fantastic. So you're not Tina.
Did I misspell your name? Did you? I think I probably spelled it wrong.
No? Let's do it again. Let's try. See, because AI can be wrong sometimes.
I didn't realize that you were such a talent, a multi-talented person. Definitely not Latino. You were in Chicago.
From Colombia. Well, no, you're not. You're Australian.
Oh, no. Hey! There you go. There you go.
So... You can see the little jump right here where the work happened locally on that machine. And then we'll go and we'll see what kind of musical theater background that I have when we run this as well.
And you can notice the GPU memory as well. Cool. And I want to call out that it worked really, really hard for you because you're awesome, and apparently I did not require a lot of effort, which is profoundly sad. But then I can just go and say...
foundry service stop and then watch over here in the corner and then that GPU memory goes away. So I can do that work locally. All righty. That's chapter one, models. We saw all of the demos.
We have packed sessions on models going through the model catalog, Model Router, the fine-tuning capabilities, Foundry Local, Azure OpenAI, multimodal models, reasoning models, and how you optimize all of your Gen AI applications. Take a picture, don't miss those sessions. Now we're going to go to chapter two, and we're going to talk all about agents. So, first of all, what's an agent for you, Scott? Well, I'm confused about the difference between agents and tools, because AIs don't have feet and they don't have hands and they can't do stuff, but sometimes you want an AI to spin through some files.
So I feel like that's a tool. That's a tool. I have been called a tool at work sometimes. It happens. But then I feel like an agent might be able to call multiple tools.
But an agent has a task where it's like, take this and do it, and then let me know when you're finished. I mean, we've been doing process automation for ages. And the difference is now you have an LLM
that is helping you drive the control flow of the program. It's making some of those decisions for you. And it has a set of tools that it can invoke. Those can be retrieving information, grounding with knowledge. They can be actually acting, like making an API request.
And then, you know, keeping a thread with memory of the current conversation, the current state. They can also support multimodal inputs, like conversations, images, video, text, speech. And they can support a set of outputs that let them be invoked by other agents.
So one of the key things that we have is the ability to orchestrate these agents. I think right now there's a lot of overload when we say agents, because people feel like they have to add a whole bunch of functionality to a single agent. What I usually like to recommend is that, instead of that, you create smaller agents that each have a specific set of tasks, a specific functionality, that you can validate and evaluate, just like unit testing, and then you can orchestrate them together to actually execute a process or a task.
And we have two types of them that you will see as you go through the product. The first one is connected agents, which basically treat each other as tools. One agent invokes another agent, just like an API request invocation.
It gets a result, and it continues with the set of things that it needs to do. And the second one is more of a multi-agent workflow. Think of this as where there's actually a human in the loop; it's a process that needs to be followed. It is not just letting the agents kind of do their thing; there's actually a process.
And that second one is the one that we're going to be using today for the Hansel Minutes podcast. Right, because I want that to fit into the way I do things now, where I put my file into a folder in OneDrive, Mandy picks it up, she does the hard work of editing and cleaning it up, then she will interact with an agent that's going to help us, as a team member, do the stuff that we don't want to do, like show notes, and then we'll validate the show notes, make sure that they're cool. So the agents are going to fit into our existing process of guest intake, packaging, and promotion. We're going to show a whole bunch of these different agents and pieces of the functionality that we've built: the voice orchestrator, the bio generator we just saw, guest sourcing, the transcript, the show notes.
And we're going to see different parts of the product as we go through this. Now, there are multiple ways in which you can build agents, and I think this is worth a conversation. Yeah, I would say that it is a little bit confusing because that word agent gets overloaded.
If you want to feel smart at work, just say agentic in a meeting, and people will think that you're fancy. This is an agentic flow. The way I've been thinking about it is when we started thinking about the cloud, right?
You can do infrastructure as a service, where you're buying a virtual machine, and that's very low level. That's your infrastructure. Then there's platform as a service, which is where we're talking about Azure AI Foundry, where it feels like there's a lot of power. It's a managed service. You don't have to think about your own scaling, your own GPUs.
You don't have to think about managing your entire fleet. You don't have to think about geo residency, all of these different things. Exactly. And then software as a service would be more like Copilot Studio, where I basically have this runtime where I don't have to think about much of anything except the business problem. So we're sitting squarely at the platform as a service position, which fits well with the way that I do stuff.
And then at Build, we're announcing the GA, general availability, of the Azure AI Foundry Agent Service, which is that platform as a service that enables you to create agents declaratively and run them in the cloud, using different sets of models and different sets of tools. The key difference from what we've seen offered in the past is the enterprise readiness and the capabilities it has to bring your own file storage, to connect to your enterprise systems, and to run within your organization's network. You can bring your own memory, your own thread storage. It supports authentication.
So, like, if you have to authenticate with an API, it supports passing all of the tokens, as well as connecting with a set of tools, like knowledge tools: Fabric, SharePoint, Bing for web grounding, and Azure AI Search. Now, another key thing I wanted to mention is how the Foundry Agent Service is interoperable. Whether you build the agent in our platform or you build an agent in another platform and want to connect it with a Foundry agent, we support A2A and MCP, and we are working with CrewAI to support their framework as well, in addition to supporting the OpenAI Assistants and Responses APIs.
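As a sketch of what creating one of these agents looks like programmatically with the azure-ai-projects Python SDK; the SDK surface has shifted between preview versions, so the endpoint, method names, and deployment name below should be treated as assumptions:

    from azure.identity import DefaultAzureCredential   # pip install azure-identity
    from azure.ai.projects import AIProjectClient       # pip install azure-ai-projects

    project = AIProjectClient(
        endpoint="https://YOUR-PROJECT.services.ai.azure.com/api/projects/YOUR-PROJECT",  # placeholder
        credential=DefaultAzureCredential(),
    )

    # A small, single-purpose agent, per the "many small agents" guidance above.
    agent = project.agents.create_agent(
        model="gpt-4.1-nano",  # assumed deployment name of the fine-tuned nano model
        name="show-notes-agent",
        instructions="Given a podcast transcript, produce takeaways, notable quotes, and links.",
    )
    print("Created agent:", agent.id)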
Let's see this in action. Let's show them some of the some of the agents that we've built. All right, so we're switching over to your machine, and I think show notes, right, is the one that is hurting me the most. So we're here in agents.
We're going to see the list of some of the agents that we created, and we're going to show, yes, show notes. Let's show them show notes. The first thing that we're going to see here in show notes is that, for this particular agent, we're using the nano model
that we fine-tuned in the past. Yeah, and I really want to call this out, because this is important to me. I understand that, yes, I could upload my MP3 into a giant chat bot.
That's a thing you could do. But that is really bringing a sledgehammer to a scalpel. It's going to cost me way more money, it's going to use way more power, it's completely unnecessary, and if I get to do it a thousand times, it's going to be a problem because I want to spend as little money as possible.
This is a, is it distilled is the word? Yep. The distilled model that is a tiny model, the smallest possible model that doesn't give me relationship advice, it doesn't tell me how tall Brad Pitt is, it is trained on how to do show notes. And that's important. It does the exact thing I ask it to do and no more.
So we're going to paste. This is the transcription of one of your shows. Okay. And then we're going to see... Ooh, which show did you do?
This is actually a show with one of my coworkers and dear friends, Raji. Oh, show 881, Raji Rajagopalan. We did a talk about her book.
Exactly. So we can see here it's generating the show notes. It has all of the different takeaways and notes.
Hang on. Notable quotes. Did I say something pithy?
No, I think you said something very smart. Smart things happened there. Let's zoom in. Imposter syndrome means you're doing... something okay, that's good.
Thoughts are not facts. Ooh, that's deep. I think that's Raji, not you.
Oh, that was probably not me. Okay. There's a couple of cool things that I want you all to see here. View code. It opens up a view of this particular interaction that we're having.
You can open in VS Code. It will open up a sample project for you right there in the browser. It has Python and C#.
I want to make sure to shout out to C#. You know, he was very happy. I'm very happy when he's here. And then the other thing I want to show is the agent catalog. So here, in case you want to get started with a different set of agents, we have Microsoft Build templates.
We have also several from our partners. You saw in the keynote the one that was coming from Stanford, the healthcare example. So we have lots of different set of agents for you to get started. But one of the things that we did, and I'm going to go back to our agents here, my agents, was an orchestrator voice agent.
Yeah. You saw me running things from the command line. I can do that. I can run a Python app.
But I want Mandy on my team to be able to interact with this in a way that's comfortable. And we thought it would be cool to have the orchestrator router agent, which has access to all of these things, be something we could talk to, which would be cool. Well, let's try it.
All right. Hey, are you there, and can you give me a list of the things that you can do to help me with the podcast production? Of course, Scott, I'm here to help streamline and enhance your podcast production.
Here's a list of what I can do for you. Guest intake. Collaborate with the guest intake agent to brainstorm and refine a list of potential guests to invite on your show.
Help you create personalized outreach strategies to approach notable guests in industry, society, and technology. Speaker bio creation. Use the bio generator to create a draft speaker bio. I'm going to go ahead and respectfully interrupt you. Hang in there for a second there, bud.
Okay, cool. Awesome. So that's one of our agents.
And then the next one we did was around search. Yeah. So this was something I thought was really exciting.
Let's switch over to my machine here. Remember, it's my data, and I'm running it in my Azure AI Foundry experience. So the model that we went and distilled is mine. And we can ask it questions now.
And this was freaking amazing. Because my search right now on my site is basically I just load a giant JSON file and I can search for people's names. Now I can chat with 20 years of stuff and I can say, is Scott nice? And it's going to go and give me an AI-generated proof.
Yes. That I'm... That you're actually nice?
Well, they might say, well, I don't know. He's probably not very nice. So this is using Azure AI Search with...
But it looks... Let me see. Look, it's got proof. Look, it's grounded.
It's grounded. I was nice twice in a thousand episodes. Well, that's an easy question, right? Like, what are the characteristics of a person named Scott?
Well, it could have been any Scott at this point. No, this here is a sub-query. Let's ask a more complex question. Yeah, let's ask a sophisticated question, because this is more than just rag. Right, because RAG is this augmented retrieval over data, but what if we asked a complicated series of questions like, you know, over the last 20 years, are there any recurring ideas or universal truths in the last thousand episodes?
that have emerged. So this is not just like a one-shot. No. This is what happens with RAG that we've been doing. It's like going from 10 links to just like you get one shot to get the answer right from your index.
Well, one of the things that we're announcing... It's dropping in proof. All of these different sets of... let's see. It's done.
Let's go look at this. So down here, look, it split it up into multiple questions, spent time on each individual question, and broke the sub-queries up. And those sub-queries are not sub-sentences, they're sub-queries. Then it goes and retrieves the answer for those queries and brings them back into the context of the model so it can give you the answer.
Pragmatism, mentorship, community. Cool. This is what we call agentic retrieval on Azure AI Search. Yeah, that's really cool.
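Conceptually, that agentic retrieval pattern is: decompose the question into sub-queries with a model, run each sub-query against the search index, then hand the merged results back to the model as context. A framework-agnostic sketch of the shape; the three helper callables are hypothetical stand-ins, not the Azure AI Search API:

    def agentic_retrieval(question, plan_subqueries, search_index, answer_with_context):
        # plan_subqueries: LLM call that splits one complex question into focused sub-queries.
        subqueries = plan_subqueries(question)       # e.g. ["recurring themes mentorship", ...]

        # Run each sub-query against the index and collect grounded passages.
        passages = []
        for q in subqueries:
            passages.extend(search_index(q, top=5))

        # Give the model the question plus all retrieved passages, so the final
        # answer is grounded in the corpus rather than a single one-shot lookup.
        return answer_with_context(question, passages)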
I'm going to be able to ask a lot of interesting questions. Okay, let's go to the next one, which is how we did the transcripts. Oh, yeah, yeah, yeah. So you can go in here, and you can define a schema. This is if you don't have 1,000 of something, right?
Let's say that you want that kind of feature, but you don't have a large corpus to work from. So we define the schema. We're using Content Understanding, which is a service in Azure AI Foundry.
And then we processed the same podcast with our friend Raji, and we created all of the different field results: the title, the summary, and some of the quotes here as well? No, we did topics, summary, and title. And then we have here, it's identifying speaker one, speaker two. It turns out it identified me as speakers two and three, because apparently my advertisement voice is different from my actual normal voice.
So when I did the ad, it thought that was a different guy, which I thought was crazy. Yeah, so you can see here, all of this is Raji speaking here, and then here's me jumping in. So it's identifying different speakers. And it's pulled out topics about mindfulness and sustainable habits and all that kind of stuff.
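For reference, the "define a schema, then analyze the audio" flow boils down to declaring the fields you want back and sending the recording to the service. A loose sketch: the field schema mirrors what we showed, but the endpoint paths, analyzer name, and payload shape are assumptions for illustration rather than the documented Content Understanding API:

    import requests

    endpoint = "https://YOUR-RESOURCE.cognitiveservices.azure.com"  # placeholder
    headers = {"Ocp-Apim-Subscription-Key": "YOUR_KEY"}

    # The fields we asked the service to extract from each episode.
    field_schema = {
        "fields": {
            "title":   {"type": "string", "description": "Episode title"},
            "summary": {"type": "string", "description": "Short episode summary"},
            "topics":  {"type": "array",  "description": "Main topics discussed"},
            "quotes":  {"type": "array",  "description": "Notable quotes with speaker"},
        }
    }

    # Hypothetical two-step flow: create an analyzer from the schema, then analyze one episode.
    requests.put(f"{endpoint}/contentunderstanding/analyzers/podcast",
                 json=field_schema, headers=headers)
    with open("episode-881.mp3", "rb") as audio:
        result = requests.post(f"{endpoint}/contentunderstanding/analyzers/podcast:analyze",
                               data=audio, headers=headers)
    print(result.json())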
So that's another one of the agents that we built, connecting Content Understanding with an agent in AI Foundry. Now, once we got all of those agents, did we miss showing anyone? No, I think we're good.
Once we got all of those agents, we wanted to wire them up together in a multi-agent process. And for that, we're using Semantic Kernel and AutoGen. And we actually have a lot of time right now. So let's make sure to dig in
and look at some of this code, because I want to make sure that we're showing real code. Because everything we built here is legit. So this is C#.
Before you show that, let me make a couple of comments on Semantic Kernel. We've been working on agentic frameworks for quite a while. We released AutoGen and Semantic Kernel in 2023, and AutoGen has been coming in as an agentic framework from our Microsoft Research organization.
Semantic Kernel is the one that we've been recommending for use in production as we bring in all of that innovation. We are bringing them together into one framework, and at Build we are announcing that they are using the same runtime.
And then we are going to converge them into a single framework as we go forward. Now, here, we're using Semantic Kernel to bring all of these agents together and wire them up. We have the Semantic Kernel library, and we have all of the different agents that we're using. Yeah.
So over here, we've got our content agent, which is going to go and do our show notes. We've got link verification, because we want to make sure that this is grounded in reality. When it runs through the show notes, oftentimes I'll say things like...
Well, that's great, Gina. I'll be sure to put a link to your book in the show notes, and then you'll never hear from me again. We're recording right now.
Let's ask. What do you want to put in the show notes? We want to put in the show notes, one, the reference for the session for BRK 155, which is our build talk.
Right. We want to put the link to the AI Foundry website, which is ai.azure.com. Right. But let's make it a little bit more complex. I think that you should put a link to my YouTube channel.
I'm Scott, and I'd like you to put a link to my YouTube. Oh, and then put a link to HanselMinutes episode 881 with Raji as a callback link. Yes. So make sure you do that, please.
That'd be cool. That'd be cool. Yeah, yeah, yeah. Okay, so this application here has these agents identified, and if we go and take a look at them, they are listed as Semantic Kernel functions, right? And then Semantic Kernel includes the ability to call an agent and have it participate in these kinds of workflows.
And then from there, we'll just spin out the result, and the result will end up in a Markdown file called show notes. Okay? Perfect.
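Stripped of the Semantic Kernel plumbing, the workflow we wired up is a sequential hand-off between small agents. A framework-agnostic sketch of that shape; the four agent callables are hypothetical stand-ins for the real agents, not the actual C# project:

    def run_podcast_factory(audio_path, transcribe, content_agent, link_verifier, summarizer):
        # 1. Audio -> transcript (the Content Understanding / speech-to-text step).
        transcript = transcribe(audio_path)

        # 2. Transcript -> draft show notes via the fine-tuned show-notes agent.
        notes = content_agent(transcript)

        # 3. Check every link in the draft so nothing made-up or 404ing ships.
        link_report = link_verifier(notes)

        # 4. Summarize the result and write show_notes.md for the human in the loop.
        final = summarizer(notes, link_report)
        with open("show_notes.md", "w", encoding="utf-8") as f:
            f.write(final)
        return final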
All right. Where are we here? Okay.
Oh, and then I wanted to show the YAML. And the Mermaid chart. Oh, the Mermaid chart is good.
Okay. So we all know that I don't like YAML. So we click here, and I don't have to look at YAML. This is a Mermaid chart, and Mermaid is great, that is showing me basically the results of that YAML: okay, we're going to go and do the transcript, we're going to analyze the content for the show, generate the show notes, we're going to verify each one of those links to make sure that they are legit, and then we're going to summarize the result, and that will be the complete thing. That's awesome.
So that's pretty cool. So all of that will happen here. While you're at it.
One of the things that I forgot to mention, and we're kind of speeding up because we have so much content. We're very excited, but we have a lot of content. You know, on Model Router, our current analysis tells us that when you're using Model Router, compared to GPT-4.1, you can get up to 60% off in price.
Okay, it's more efficient. More efficient, less expensive. What am I saying? It's less expensive.
Well, it's got less of an environmental hit. It's more efficient. It's smaller. The idea is you put the router in front of it and it's like a load balancer.
That's the word I was trying to find: cheaper. It's 60% cheaper. And you only take a one-to-two-point hit in accuracy.
Which is pretty significant. Yeah. Help me understand what percent of accuracy turns you into an Australian singer.
I don't know. I think that's a very low one. Okay.
Now let's show. Do we want to stop our podcast? Are we finished with our show? What else do we want to say? We want to say we talked about models.
Okay, we talked about models. Digging into agents. What are the key things that we said about models? Well, there's a metric ton of them, 11,000 of them.
There's models that are specific, not just large language models, but there are models of all different flavors, including specific models for things like protein, health care, finance, all those things. You have... I think you saw 11,000 models in there.
All of those models can be fine-tuned. One of the things I was interested in that I didn't understand was, you showed a graph like this on the fine-tuning, but I don't understand how you know whether it's correct or not. For that one we specifically didn't do an evaluation run. You want to get the loss to zero and the accuracy closer to one. Can you show me it not looking good?
I think it's always cool when we do demos, but show me a failure. We got some failures when we were doing the runs.
Nobody likes to see a demo that works. This one didn't go through. I'm not a scientist, but that doesn't look like a good line. Exactly.
The line was heading towards the middle and not necessarily trending towards one. This is saying whether or not it generated show notes correctly. What we do is verify that against actual show notes and transcripts. This one is a good one. We talked about all the things on models.
We talked about local. Then on agents, we talked about the Agent Service going GA. We forgot to mention, oh, let's look at all of the things that they can connect to. What did we forget? Yeah, exactly.
And we can connect them to a whole bunch of different things. So knowledge: Azure AI Search, Microsoft Fabric, SharePoint, grounding with Bing Search.
You can do custom search, for example, if you just want them to go to a specific set of sites and not to the entire web; then you can just say custom search. And then a few other partners. It can't be overstated how important grounding is in this kind of context, because not only do I want to save money and be efficient, but I don't want my chatbot to talk about things that aren't this.
It has to be restricted. It has to be grounded. And if you're creating a chatbot for your own business, you're going to want to have the same experience.
You don't want your coffee shop chat bot offering relationship advice or opinions about someone's shoes. It needs to focus on what we're doing here. And in this case, the Hansel Minutes one has not only been created into a nano model, but we're going to ground it in the material that we have, the knowledge that we have available.
In addition to knowledge, you can add actions. So, like, if you have some deterministic workflows that you've already created in Azure Logic Apps, you can bring them in and connect them to. That's a great point.
This is a thing that some people who are of a certain age might think about: well, are you reinventing cron jobs? Are you reinventing for loops and things like that? I have existing processes that work just fine. I'm not going to replace them, because they work great. But I have Azure Functions.
I have Azure Logic Apps. What I'll do is I'll tell the orchestrator agent about those things so that it can have them available as tools. And as long as you can describe them with an API, you can describe them with an OpenAPI v3 spec.
And then you can connect to any of those. I think it's time for us to go back to the... Okay, so let's switch back over to my machine.
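A rough sketch of handing an existing API to an agent as a tool, assuming the agents SDK exposes an OpenAPI tool type; the class names, spec file, and Logic App endpoint are assumptions, and `project` is the AIProjectClient from the earlier sketch:

    import json
    from azure.ai.agents.models import OpenApiTool, OpenApiAnonymousAuthDetails  # assumed types

    # An OpenAPI v3 description of an existing Logic App / Azure Function endpoint (hypothetical file).
    with open("publish_episode_openapi.json") as f:
        spec = json.load(f)

    publish_tool = OpenApiTool(
        name="publish_episode",
        spec=spec,
        description="Publishes a finished episode through the existing Logic App workflow.",
        auth=OpenApiAnonymousAuthDetails(),
    )

    # Hand the tool definitions to the orchestrator agent (project = AIProjectClient from above).
    agent = project.agents.create_agent(
        model="gpt-4.1-nano",
        name="orchestrator",
        instructions="Use the publish_episode tool when the show notes are approved.",
        tools=publish_tool.definitions,
    )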
Do you want to call that? Is that the end of the show? I think we should do that at the end of the show. Okay, so that's the end of this show on Azure AI Foundry. Hopefully you made it this far into our podcast.
Did you actually say my name correctly on that one? Lina? With a Y. Arenas. Y-I-N-A.
Is that correctly spelled? That's correct. All right, cool.
This has been another episode of Hansel Minutes, and we'll see you again next week. That's going to be like identified as speaker number four. Okay, we're not done yet.
We're just going to process that through all of the different set of agents that we've created. Then we're going to go and talk about our third chapter, which is all about observability. So I'm going to kick off a process. You're going to kick off that process.
I'm popping this over, putting in recording.mp3. I want to call out... it was just created and it's real, and then we're going to start that, kick this process off here. I'm going to kick off this .NET process. This will run for a little while, and I'll let you know when it's done.
Some of the other things that we saw: we forgot to mention the Visual Studio extension. We can show that. We can show that.
The API, the SDK, the Visual Studio Code extension, Semantic Kernel; there are more sessions for you to go and learn about that. Then we're going to go now to our final chapter, which is around Foundry observability. So for observability, we have the set of tools to support your entire development life cycle.
Whether you're starting to experiment or thinking about going to production, you want to make sure that you can continuously monitor the quality and safety of your solution. So we have those tools right here for you, so that you can generate the traces, send them over to Azure Monitor, and have all of those capabilities that enable you to do debugging and make sure that you hit the right level of reliability that you want for your app. Does this use, like, OTel, OpenTelemetry? Yes.
It's important to call that out. OTel is one of the biggest things for folks in the observability space. If you're not familiar with OpenTelemetry, it is this kind of quiet storm that is just making everything better. In the old days, it was just log files and grep and a regular expression. Right, and that was how we observed our software.
And everyone knows that if you've got a problem and you've decided to use a regular expression to solve it, now you've got two problems. What OpenTelemetry does is it's effectively distributed log files, and when you have large, complicated, distributed systems like this, you want all of that to come to one place. So you could even use a .NET Aspire dashboard or Grafana or any of these systems, any existing observability system that you have, and you could observe the work of your agents. So when it comes to agents, agents have threads and runs. A thread is the common memory for that conversation.
And each run is a turn of that agent. Like a conversational thread. Exactly. And every time that thread is going through, we can see not only the inputs and outputs and the metadata that is being generated across every single call, but we can also see the evaluations coming in.
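For reference, wiring those traces up is standard OpenTelemetry. A minimal sketch that sends agent-run spans to Application Insights; the connection string and span/attribute names are placeholders chosen for illustration:

    from opentelemetry import trace
    from azure.monitor.opentelemetry import configure_azure_monitor  # pip install azure-monitor-opentelemetry

    # Point OpenTelemetry at Application Insights / Azure Monitor.
    configure_azure_monitor(connection_string="InstrumentationKey=...")  # placeholder

    tracer = trace.get_tracer("podcast-factory")

    with tracer.start_as_current_span("agent.run") as span:
        span.set_attribute("agent.name", "show-notes-agent")
        span.set_attribute("thread.id", "thread_abc123")  # the conversation's shared memory
        # ... invoke the agent here; inputs, outputs, and metadata ride along on the span ...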
So for this particular case, we turned on evaluations around intent resolution and relevance. We have a new set of evaluations as well that are releasing as part of the announcements here at Build, which include intent resolution but also task adherence, because you want to make sure that your agents actually stay on task and they're not going around just doing other things. Or, for example, indirect jailbreak: if you have an agent that is going out and getting information from the web, what if it gets a piece of code that it shouldn't? So we have all of these different sets of things. In the example earlier where we went out to the web to gather information, we spelled your name wrong.
I spelled your name wrong. And then we got the wrong Gina. At some point we could have a score assigned to that, and it would say, I'm not sure if I'm confident in that.
Exactly. It will tell you. It will have a lower quality score on, for example, relevance.
This one was four out of five. It feels that it gave a decent response but it could have been better. It does not omit any key details and deserves a high score. Okay. That's cool.
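A small sketch of running one of those built-in evaluators over a single agent response with the azure-ai-evaluation package; the judge-model config values are placeholders, and evaluator availability varies by package version:

    from azure.ai.evaluation import RelevanceEvaluator  # pip install azure-ai-evaluation

    model_config = {
        "azure_endpoint": "https://YOUR-RESOURCE.openai.azure.com",  # judge model, placeholder
        "api_key": "YOUR_KEY",
        "azure_deployment": "gpt-4o",
    }

    relevance = RelevanceEvaluator(model_config=model_config)

    score = relevance(
        query="Give me a short bio for Yina Arenas.",
        response="Yina Arenas leads product for Azure AI Foundry at Microsoft...",
    )
    print(score)  # e.g. a dict containing a relevance rating for this query/response pair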
Yeah. One of the other things that we have, in addition to that, is integration with your CI/CD pipelines. So here we can see we have the evaluations, we have the evaluation overview where we bring in the different set of evals that we've run on our agent, and it's fully integrated with GitHub as a GitHub Action. So every time that you're pushing new changes into production, you can run your pipelines and have those evaluations run for you as well.
And then we have integration with monitoring as well, where you can see all of the operational metrics, including your tokens, the average inference call, the duration of your inference calls, and the errors. We can see how, in this particular example, our quality significantly improved over the last few days.
As we've been working on this demo, yeah. All of this is also integrated with Azure Monitor. If you have your dashboards, you can go there and see all of your metrics. Yeah, I wanted to call that out.
This integrates with Azure Monitor's Application Insights, which is something I'm already familiar with. We talked a little bit about infrastructure as a service, and we talked about platform as a service, which is where we are sitting right now in the AI landscape. Then we talked about software as a service, where you're going to see things like Copilot.
I have found in my time as a cloud person that I really sit in that platform as a service spot. I run my websites on Azure App Service. I use Azure Application Insights.
This fits for me in the kind of stuff that I was already doing, which is why I was excited to be a part of this talk. Platform as a service is a sweet spot. One last thing that we will mention, which is very, very important, is what we do around end-to-end security, right? And the integration of Foundry across the different stack of Microsoft security products. First, agents that you are creating get a specific identity in Entra.
So for those of you familiar with Azure Active Directory, now Entra, all of the different entities, whether it is users, groups, or devices, are registered in the directory, right? Now agents are going to have their own type of ID where you can assign entitlements and do governance on top of them. You can see,
I'm giving this agent, who is acting on my behalf, a specific set of permissions and not more than that. The second one is around sending all of the information to Purview. So things like data labeling, data protection, and the full integration with all of the security products like Purview and Defender are part of what we have as capabilities in Foundry.
All righty. There are sessions around observability and governance. Please take a look.
They go way deep. And now, let's not forget to go back and... okay. Let's switch back over to the factory.
One, two, three. So I'm a little sad. Your name is so challenging, apparently.
Straight to jail. Nobody got that reference, but that's fine. So it generated that we discussed the platform's capabilities, models, agents, and observability tools.
I shared my experience streamlining the production process, and we discussed various features. So this is all, other than the misspelling of your name, totally legit. And we've got our topics. Then it went and generated a, look, it spelled that correctly.
That's cool. It made a VTT, which is not just a text file, but an actual transcript file that I could use on YouTube or put into my podcast. You can see our different agents over here.
That agent has a leading space. And then we have our link verifier, which is going to check all the links. So we scroll down a little bit. I like a nice ASCII table.
I just feel strongly about that. Oh, did we say it was special, or does it just think we're special? We are special. We also said it's special.
In this special episode, nearly 1,000. Okay, so this is all good. So it gave us a summary generating the notes.
I want to open those notes up. Then we go and verify each of these. Looks like two of these didn't work, and they have a warning. We'll go and dig into those links. And then it went and summarized them.
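The link-verification step itself can be as plain as an HTTP check per URL. A tiny sketch of that idea; this is a hypothetical helper for illustration, not the agent's actual code:

    import requests

    def verify_links(urls):
        """Return 'ok' or a warning for each URL, so bad links get flagged in the notes."""
        results = {}
        for url in urls:
            try:
                r = requests.head(url, allow_redirects=True, timeout=10)
                results[url] = "ok" if r.status_code < 400 else f"warning ({r.status_code})"
            except requests.RequestException as exc:
                results[url] = f"warning ({type(exc).__name__})"
        return results

    print(verify_links(["https://ai.azure.com", "https://www.hanselminutes.com"]))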
So let's jump into the output folder. Here's the show notes and the transcription. We'll go and look at show notes. And then let's do a little Markdown magic. Ooh, scrumptious.
Takeaways. Do we teach it to do quotes? There we go.
Everything we built here is legit. It is indeed. It is pretty legit.
Less expensive. That's good. Putting the crowd in the front.
Grounding. Yeah, this is all cool. Ah, resources mentioned.
Scott's YouTube channel. No link. Ah, scandalous. Oh, and then I don't know if I told you this or not.
Actually, I did not. I was actually given permission last night at 11:30 at night to let people know that Notepad is going to support Markdown. There you go.
So we've got our lovely thing loaded into... I feel like the AI Foundry and Notepad collaboration. I love that. I love that collaboration.
We're going to make markdown. That's pretty cool. I dig it. All righty.
We got ourselves a show. That saved me a lot of work. What are you going to do with that, Scott?
I'm going to go and add a couple of small changes, because I'm the human in the loop. I'm going to give it to Mandy, who will edit the show, and we'll try to put it up as soon as we can. Sounds good. Very cool.
Yeah, so let's do a recap and talk about where people can learn more. So remember that we tried to create this podcast factory that would make my life easier. Hopefully the two or three hours a week that I spend on this turns into 10 or 15 minutes and just an agent that does this work for me, and then I'll double-check the work.
One of the things we didn't get a chance to show you were some ideas on guest sourcing because I realized that there's a trend line on my show. And we can get recommendations about who else I might want to have on the show. Shocking that it hadn't recommended you until just now.
You may have put your thumb on the scale on the AI. We've got the bio generator, and then we also could have an AI doing scheduling, because it's really challenging. There's a lot of emails going back and forth, talking to guests, trying to get them scheduled for the show. We've got the transcript generator that we saw, which is a custom model, and show notes, which was done on a GPT-4.1 nano fine-tuned model.
And then the link resolver: make sure the links it puts up are legit links, make sure they don't 404. Other things we could potentially do would be generating copy for social and things like that. Localization has been a really interesting thing. We have models that could do that as well.
Translation, bringing it to different languages, that's another thing we could have done. All righty. Well, with that, you hopefully saw today how Azure AI Foundry can help infuse AI into a different set of applications.
We built a podcast factory. What factory are you going to build? How are you going to make that investment be, instead of return on investment, return on your effort?
We're hoping that you're going to see how AI can help reduce toil, can help take away all of these sets of tasks that we don't necessarily want to do. You mentioned dirty, dangerous, and? Dull. Dirty, dangerous, and dull. I want to do the fun stuff. And then we have a lot of customers, like Gainsight, Safier, Heineken, and more, that are using AI today.
Over 70,000 customers are using Azure AI Foundry to bring AI into their organizations. Over 10,000 have already used the Azure AI Foundry Agent Service and are now creating millions and millions of agents to bring significant gains into their processes. We want to leave you with a call to action: are you ready to create the future of AI?
Thank you, folks. Yeah. Thank you very much. Take a picture.
Take a picture of that and make sure that you remember all of the notable quotes from the show. And then sign up over here, my friends. Thank you. Thank you, folks.
Good job.