Transcript for:
Exploring Generative AI in API Management

Hi everybody, welcome to today's podcast for Azure OnAir. We're at the Integrate Summit, and today we're joined by Andrey Kamenev, who's part of the API Management team at Microsoft. Thank you for joining us, Andrey. Thanks for having me.

Andrey is joining me, and we're going to talk about generative AI and its relationship with API management. To start with, Andrey, can you tell us a bit about your role at Microsoft and the kinds of things you're involved in? Yes, I've been at Microsoft for almost eight years now.

I started in what the industry typically calls a pre-sales engineer role. So I was a cloud solution architect working with customers on Azure, and then I switched over to the engineering team. For the last year, a little more than a year, I've worked on Azure API Management. I've been covering different API types, like GraphQL and gRPC, and the innovation we were bringing into the product, but lately also all of the Gen-AI capabilities that we've built into it.

One is the Gen-AI gateway capabilities, which help customers build products with Gen-AI, with OpenAI models and so on. The second is the innovation in the Copilot space, where we tackled a couple of problems specific to Azure API Management with Copilot. So if I put myself in a customer's position then: I'm building a solution that I believe needs Gen-AI in it.

Tell me why API management could be important in that kind of architecture. Yeah, so when you start building intelligent applications, that's what we call them, applications that use large language models. Typically, if you go with Azure OpenAI, Azure makes it really simple for you to deploy a model and an endpoint.

You grab your API key, you import the SDK, and you're good to go. But what's important to understand with Azure OpenAI and other LLMs is that there is a main currency you're using, which is tokens. Tokens represent chunks of text that the model generates, and when you send a prompt to a model, that uses tokens too. Everything in this LLM world uses tokens: quotas and limits are all expressed in tokens per minute.

That's also something that you pay for: if you're on a pay-as-you-go model, you pay for token usage, or, with Azure OpenAI Service for example, there is a model where you pay for reserved capacity, called provisioned throughput units. You start small, with a single application: as a customer, as I mentioned, you just import the API key and you're good to go. But what if you want to scale? What if you want to add more applications, or more OpenAI endpoints? For example, you want one endpoint for your customers located in Canada, and another one for customers located in Europe. And now you have different challenges.

Because, as I mentioned, you pay for tokens, and tokens are the main currency. Now you want to understand how your applications, or multiple development teams, use those tokens. You want to track usage so you can do cross-charging and understand the consumption of those tokens. That's one thing.

The second thing is that you want to limit those applications, to apply some sort of rate limiting to them. Because, as I mentioned, at the Azure OpenAI level you have these tokens-per-minute quotas, which means that application A can consume the whole quota, leaving no tokens for application B. Yeah, and that breaks the user experience for those applications, so you want to limit this. These were the two challenges we saw across our customer base, the most requested features. And then, as I mentioned, you can have multiple Azure OpenAI endpoints. How do you load balance across those?

How do you make sure that, where there is a PTU, a provisioned throughput unit, you use it first, because you paid upfront for that provisioned capacity, but then fail over to a pay-as-you-go instance? How do you solve that? So that's why we introduced Gen-AI gateway capabilities in Azure API Management: to solve those particular challenges.
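(To make that PTU-first failover idea concrete, here is a rough sketch of one way it can be expressed in policy. The backend IDs openai-ptu and openai-paygo are hypothetical, and this is an illustration rather than the exact product configuration.)

```xml
<!-- Sketch only: send traffic to the provisioned-throughput backend first,
     and on a 429 (throttled) response retry once against pay-as-you-go. -->
<inbound>
    <set-backend-service backend-id="openai-ptu" />
</inbound>
<backend>
    <retry condition="@(context.Response != null && context.Response.StatusCode == 429)"
           count="1" interval="0" first-fast-retry="true">
        <choose>
            <when condition="@(context.Response != null && context.Response.StatusCode == 429)">
                <!-- PTU capacity exhausted: fail over to the pay-as-you-go backend -->
                <set-backend-service backend-id="openai-paygo" />
            </when>
        </choose>
        <forward-request buffer-request-body="true" />
    </retry>
</backend>
```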

So we introduced a couple of policies. For example, the token limit policy, where you can specify that I want this application to have a 1000 TPM limit at the subscription level, because in Azure API Management you use subscriptions to identify different applications or teams, which is extremely useful in that case.
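(As a rough sketch, that token limit policy looks something like this; the 1000-TPM figure and the subscription-based counter key simply mirror the example above.)

```xml
<!-- Limit each APIM subscription (i.e., each app or team) to 1000 tokens per minute.
     estimate-prompt-tokens lets the gateway estimate prompt size before calling the model. -->
<azure-openai-token-limit
    counter-key="@(context.Subscription.Id)"
    tokens-per-minute="1000"
    estimate-prompt-tokens="true" />
```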

Then there are cross-charges and metrics. You could do that by logging requests in general, but we wanted to simplify it, so we introduced a new policy where you just specify an Application Insights namespace, and we send those metrics to Application Insights for you. Yeah. And then you probably saw that recently we announced the GA of the load-balancer and circuit-breaker capabilities, which are extremely useful for the scenario I described where you have multiple OpenAI endpoints.
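(A minimal sketch of that metrics policy; the namespace and dimensions shown here are illustrative.)

```xml
<!-- Send token-count metrics to Application Insights, split by
     subscription and API so usage can be cross-charged per team. -->
<azure-openai-emit-token-metric namespace="openai-usage">
    <dimension name="Subscription ID" value="@(context.Subscription.Id)" />
    <dimension name="API ID" value="@(context.Api.Id)" />
</azure-openai-emit-token-metric>
```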

As you go along, yeah.

I've played around with that; it's quite cool. So when I hear you describe that scenario, in my head I'm thinking: these are all the great features API Management has had for a long time. As an API developer, I'm building my backend service.

And API Management makes it easy for people to use, with many features in it. It sounds very much like you've taken advantage of everything that was already in APIM to make my investment in generative AI usable by internal and external developers.

Are there any features you'd like to call out that specifically help the AI use case, beyond the things people might have used for typical application development? Yeah, so as you mentioned, what's important to understand here is that the OpenAI API is just yet another API; it just has certain specific requirements. For example, there is a problem with authentication. As you mentioned, we already have capabilities built into the product that can help tackle this challenge. I don't want to share my API key with tens of different applications, because I have only one API key for my OpenAI endpoint. So what I can do instead, with existing capabilities in API Management, is configure managed identity authentication to OpenAI. Now I don't need to store this key anymore, and I don't need to share it with anyone anymore, which vastly improves the security of the overall architecture.
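(That managed identity pattern can be sketched as an inbound policy pair along these lines, assuming the API Management instance's identity has been granted access to the Azure OpenAI resource.)

```xml
<!-- Acquire a token for Azure OpenAI using APIM's managed identity,
     then attach it as a bearer token - no API key stored or shared. -->
<authentication-managed-identity resource="https://cognitiveservices.azure.com"
    output-token-variable-name="msi-token" ignore-error="false" />
<set-header name="Authorization" exists-action="override">
    <value>@("Bearer " + (string)context.Variables["msi-token"])</value>
</set-header>
```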

Then, on the specific policies, as we discussed already, the load balancer is not that specific; it just helps with the multiple-OpenAI-endpoints challenge. But there's the Azure OpenAI token limit policy that we introduced, which allows you to set a rate limit not only on a requests-per-minute basis but also on a tokens-per-minute basis. Yes.

So as I mentioned, we have multiple applications, we assign a tokens-per-minute limit to each application, and that's how you basically solve this specific challenge that comes up with generative AI applications. Yeah. So what about adoption by customers? I'm guessing you've had quite an aggressive ramp-up of customers starting to leverage this use case. Can you give us an idea of kind of...

How many customers do you feel have done this kind of use case already? Are we talking hundreds, thousands? And what do you expect the growth to be like over the coming year?

I guess that's going to be really big. Yeah, so we already saw customers who were using Azure API Management for this even before we introduced these new capabilities, which we introduced during Build, a couple of weeks ago. We already have hundreds of customers using AI with Azure OpenAI and, frankly, other models.

So, yeah, I expect huge, exponential growth, because customers are adopting this technology. We were at the stage where customers were just trying things out, setting up a couple of POCs here and there. But now they're at scale, and now they need a solution like Azure API Management to solve those challenges.

Yeah, I guess with that aggressive adoption of AI, APIM really accelerates the maturity of the type of solution you can build. So I can really see that taking off big time. What about...

In terms of the roadmap over the coming months, what are the kinds of things that excite you about what you do, and that will help customers in this space? Yeah, so there are two areas. First, there are a lot of security challenges when it comes to Gen-AI.

There are things like prompt hacking, prompt jailbreaks, and so on. So we're looking at how we can solve this problem for our customers: maybe introducing policies to, let's say, rewrite the prompt or the system prompt, maybe override it somehow, or maybe just use regular expressions to allow or disallow specific words or lists of keywords in the prompt or in the completion.
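(One way to approximate that keyword filtering with today's policy expressions; purely a sketch, and the deny-list regex is made up.)

```xml
<!-- Sketch: reject a request whose body matches a denied keyword pattern. -->
<choose>
    <when condition="@{
        var body = context.Request.Body.As<string>(preserveContent: true);
        return System.Text.RegularExpressions.Regex.IsMatch(body, "(?i)(internal-codename|secret-project)");
    }">
        <return-response>
            <set-status code="403" reason="Forbidden" />
            <set-body>Prompt contains disallowed terms.</set-body>
        </return-response>
    </when>
</choose>
```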

There are also a lot of challenges around keeping content safety in place. For example, PII detection, detecting harmful content, or leakage of information from the company. So we're looking at this as the next set of features.

That's one thing. The second thing is that currently we're focusing mostly on Azure OpenAI, which makes sense because that's the native Azure service, but there are also a lot of different LLMs out there. So we're looking at how we can integrate with those as well, keeping the same capabilities: limiting token usage, understanding the usage of those LLMs, and load balancing across multiple models. That's something we're looking at how to support. I guess, from what you described and from my other chat with Mike, it sounds a bit like there could be a lot of Defender for APIs opportunities in there as well, around protecting that. Because that OpenAI API is going to be quite a key asset for a lot of companies, isn't it? And you're going to want really strong security around it. Do you have any ideas on things you'll be doing in that Defender space?

Yeah, I'm not sure about the Defender space, because Defender is kind of specific to APIs. And as I mentioned, there are specific needs when it comes to LLMs. Content safety is very specific to content generated by a model, or even user-generated content. There is no easy way to check it.

So that's why you need other models to check the responses coming from LLMs. So yeah, I believe we will most likely be integrating with some of the capabilities that already exist in Azure.

I guess when I was thinking about that question, there's a question with the content about where the line of separation sits between what's really the responsibility of OpenAI versus the responsibility of API Management, isn't it? So let's pivot for a second onto AI within API Management.

So can you tell me about some of the investments the team has been making, with things like Copilot, to help people build better APIs? Yeah. So I believe that during the last Ignite there was an introduction of Microsoft Copilot for Azure. This is a broader Copilot for everything related to Azure. It was announced as a gated public preview, but I believe right now, during Build, they announced that it's fully open.

So we integrated with this Microsoft Copilot for Azure through what are called skills. We introduced a couple of skills for Azure API Management, so that the broader Microsoft Copilot can identify that there is a user in, for example, the policy editor in Azure API Management, and they can ask questions about policies. We started with two scenarios. The first is that we understand it's sometimes really hard to write a policy, to understand what kind of policy you should use for a specific need. So we introduced a generate-policy skill in Microsoft Copilot for Azure, where you can just open up a chat window and ask: Copilot, generate a policy for me to implement a rate limit of, say, 15 requests per second. It then returns an XML document that you can just copy and paste into the editor, something along the lines of the sketch below.
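(For that 15-requests-per-second ask, the returned snippet might look something like this; a hand-written illustration of the shape, not actual Copilot output.)

```xml
<!-- Allow each caller 15 calls per 1-second renewal period. -->
<rate-limit-by-key calls="15" renewal-period="1" counter-key="@(context.Subscription.Id)" />
```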

That's one thing. The second thing: over a lot of interactions with customers, we figured out that it's sometimes hard to understand what's already there in the policy configuration, especially if it's not something that you or someone on your team created, and you're trying to understand what's going on. So we introduced a second skill, which is explain policy. You can select a policy snippet right in the editor, right-click it, and ask Copilot to explain it to you, and it explains it pretty well, in plain English. And it's not just explaining what the policies are used for, from documentation. It also describes things like: oh, you have this named value that is storing this, or there is a validate-jwt policy that looks for this specific claim to allow or disallow specific actions. So I guess with policies, because you've got that hierarchy, where you've got the root policy, then the API policy, then the operation, that kind of Copilot use case is really going to help you understand, sort of reverse engineer, an API somebody else might have built that you can't quite get your head around. Yeah, there's certainly a problem of scopes, because right now we're just focusing on whatever you have in the editor, right in front of you. But one of the next steps could be to expand this explain handler to say what the effective policy is, considering all the scopes I have for this API. What are your thoughts on things you might do in the future with Copilot, based on the feedback you've had so far? Yeah, so we definitely have some room for improvement in the generate and explain policy handlers, or skills, in Microsoft Copilot for Azure.

So we're collecting feedback constantly. As I mentioned, it just went to public preview, so we're getting more and more data to understand how well it works for customers.

And then there are a lot of different use cases, honestly. We hear a lot about generating specifications, generating SDKs, or explaining other configurations in Azure API Management. So yeah, there are a lot of places where we can apply Copilot. I think one of the things I'd find quite interesting, if I can give some feedback, would be something like a what-if scenario. So if I've got Copilot, and I'm thinking about a change, and it knows the API I've got, I'm like: what if I did this, what would happen? So, to simulate that. I think, from a developer perspective, of the kind of red-green-refactor process of making a change, testing it, breaking it, fixing it, you know what I mean?

It's quite a time-consuming process, and if Copilot could help me with some of those what-if scenarios, that would be quite an interesting use case. Yeah, testing is definitely one of the key scenarios we hear about from customers, and not only on the Azure management portal side but also on the developer portal side. As a developer, I want to test this thing, and I don't really want to spend a lot of time writing those tests. Copilot can generate a lot of that for me, because Copilot already has all of the context it needs. So, if I can ask one more question, Andrey, going back to the previous topic a little and the customer scenarios: do you find that the different models I might be using for AI would affect what I do in API Management to expose them? Or do you find that it becomes very generic, in that it doesn't matter too much which model you use, and API Management works in the same way? Yeah, we definitely see that the whole industry is going in the direction of unifying the interface to those models.

For example, if you look at Llama or Mistral, they're pretty similar in their request-response structure. We also see some innovation happening in Azure AI Studio, where, I believe also during Build, they introduced the new inference API, which you can think of as a common API for all of the models.

So whatever you deploy in Azure AI Studio, whether it's Llama, Mistral, or Cohere, we give you a common API that you can use. Yeah. And I think there will be a lot of unification happening going forward.

And I guess from a use case perspective, thinking about scenarios you've seen yourself, is there any particular customer or demo scenario you found really interesting? I'm just thinking to give listeners an idea of what they can do with this technology. We've been talking at quite a low level about some of the models and some of the things API Management gives you, but what's one of the interesting customer use cases you've seen? Yeah, there are a lot of assistant scenarios, like chat assistants. I have this database of documents inside my organization; now I want to train a model on those documents, or use this RAG pattern that was mentioned here at Integrate, and retrieve data from those documents easily for my employees within the organization. Or, for external customers, when there is a question about the product I'm selling as a company, these copilot-like chat experiences can now answer those questions really well, instead of just having a call center respond.

So in that scenario, API Management's role would be: I've got my documents and my data about my processes, OpenAI sits on top, and then API Management exposes it so that application developers can write an app that executes those queries, kind of thing. So yeah, and we're also looking at adding more and more capabilities to the Gen-AI gateway. For example, with this retrieval-augmented generation pattern you essentially have to write everything in code, but you can offload it to the infrastructure layer, in our case API Management, to do that instead. And I didn't mention that the policy we introduced as a preview is semantic caching. There is regular caching, but there is also semantic caching, where you can save a lot of tokens and a lot of latency on the request-response. That's also typically something implemented at the application level, but with this Azure API Management capability you can now offload it to Azure API Management, and your developers can just focus on writing the interface and the general logic of your application, instead of thinking about how to write this for a generative AI use case.
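(A rough sketch of that semantic caching policy pair; the embeddings backend ID and score threshold here are illustrative.)

```xml
<!-- Inbound: look up a semantically similar prior prompt in the cache. -->
<azure-openai-semantic-cache-lookup
    score-threshold="0.05"
    embeddings-backend-id="embeddings-backend"
    embeddings-backend-auth="system-assigned" />

<!-- Outbound: store the completion for 60 seconds so similar prompts can reuse it. -->
<azure-openai-semantic-cache-store duration="60" />
```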

Perfect. So I think we're just about at time there, Andrey. Thank you very much for joining us today.

I've really enjoyed that chat. It's a space I've only had a chance to dip my toe into a little, so it's nice to hear from somebody on the product team telling us about all the features you've got. We hope you enjoy Integrate this week, and hopefully we can get some future chats with people from the team to delve into these topics a bit more. Thank you.