Transcript for:
Innovative Structured Outputs in OpenAI API

Hi there. Hello, everyone. We're here to talk about structured outputs, an exciting new feature we shipped in the OpenAI API in August this year. It's been a huge unlock for developers working with LLMs, and in just a few short weeks, hundreds of thousands of developers have integrated it into their applications. My name is Atty Eleti, and I lead API design at OpenAI. And I'm Michelle Pokrass, and I'm the tech lead here for the API. Today, we're going to talk about three things. First, we're going to tell you why you need structured outputs. Then we'll talk about how the feature works. And finally, we'll tell you how we built it under the hood. So let's get right into it.

Let's start at the beginning. The year is 2020, and OpenAI has just launched GPT-3. GPT-3 can write emails, draft blog posts, and generate movie scripts. It was great for text generation. Quickly, developers, all of you, started to find some exciting new applications for this technology, from generating in-game scripts with AI Dungeon to drafting marketing materials with Copy.ai. Fast forward three years: in 2023, we launched GPT-4, a new breakthrough in LLM intelligence. For the first time, models were capable of advanced reasoning, following complex instructions, extracting information from long documents, and taking action on behalf of users. Developers took it to the next level, building AI-powered productivity tools like Cursor, customer service agents like Klarna's, and language-learning apps like Duolingo.

All of these products had one thing in common: they were connecting LLMs to the outside world, from your code base to external APIs or on-device actions. To do this, they needed outputs to be structured, typically JSON. Here's an example. Let's say you're building a personal assistant, and you want to convert the user's message into an API call. The user might say, "Play the Beatles." Here's what you expect: a JSON object that says play music for the API and the Beatles for the artist. But too often, here's what you get: the model begins with a preamble, saying, "Sure, here's the JSON to call the play music API." Now, this isn't particularly helpful for developers, since we need just the JSON, not the text surrounding it. This is a problem. LLM outputs are not always reliable, making it hard to integrate them into your apps.

Now, as developers, we've tried all kinds of tricks to solve this. Some of you may have written prompts like this. Or even this. Sometimes it works, and sometimes you have to go all the way and write a custom markdown parser. That's obviously not how it should work. So, to solve this problem, in June last year we launched function calling, a native way to define tools the model can call using JSON Schema. You define the schema of the function, and the model outputs JSON adhering to it. Unfortunately, function calling wasn't perfect. Sometimes the model would output invalid JSON, such as by adding a trailing comma. We've all done that before. So in November, at Dev Day last year, we launched JSON mode. JSON mode ensures valid JSON output, so you never see another parsing error. But this still wasn't enough. The model could output the wrong type, like a string instead of a float, as you see here, or hallucinate a parameter entirely. This cat-and-mouse game to get reliable outputs can be really frustrating. Fundamentally, AI applications need reliable outputs. So to solve these problems once and for all, we introduced structured outputs to the API in August.
Here's Michelle to tell you more.

So structured outputs is a new feature designed to ensure that model-generated outputs will exactly match JSON schemas supplied by you, developers. You can think of structured outputs as the difference between suggesting to the model to use your schema and constraining it. You don't have to ask nicely anymore; you can just supply the JSON schema that you need. And you might be wondering, if this is the right solution that finally solves the problem, why did it take us so long to get here? The answer is twofold. First, we still believe that function calling is often the right abstraction for this functionality. And second, it actually took us a little while to build a solution that is performant for inference at scale while constraining your outputs. So we're going to talk a little bit more later about the engineering and research we undertook to make this work. But for now, let's talk about how you can use it.

Structured outputs is available in two modes in the API. The first is function calling, as Atty just showed you, and you're probably already familiar with it. This feature allows our models to generate parameters for your tool calls, and it allows developers to connect LLMs with functionality in their applications. The second mode is the response format parameter. This parameter is useful when the model is responding directly to a user rather than emitting a function call.

So let's start with function calling. If you've used it before, you're probably pretty familiar with the format of a request like the one you see here. You can see we're supplying a JSON schema in the tools section, and there are parameters for the function. This tells the model how to call our tool. In our case, the function is called getWeather, and it has two parameters, location and unit. Location is a string type, and so is unit, but unit is actually limited to just a few values with the enum parameter. When you supply a function like this in our API, our systems will show the model this spec, and it will use that information to generate tool calls when it's appropriate. Enabling structured outputs here is really easy. With just one line of code, I can set strict to true, and this will enable structured outputs. This will use the supplied schema and always make sure the model's response will follow it. So that's structured outputs in function calling.
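As a rough illustration, a request like the one described here might look as follows in the Python SDK. This is a minimal sketch, not the demo's actual code; the model name, enum values, and prompt are assumptions.

```python
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "getWeather",
        "description": "Get the current weather for a location.",
        "strict": True,  # the one line that turns on structured outputs for this tool
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location", "unit"],
            "additionalProperties": False,
        },
    },
}]

completion = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[{"role": "user", "content": "What's the weather in Paris in celsius?"}],
    tools=tools,
)
# With strict on, the generated arguments are guaranteed to match the schema above.
print(completion.choices[0].message.tool_calls[0].function.arguments)
```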
Now let's look at a quick example in the playground. Before we get into it, I'll tell you a little bit about a startup I'm building. I'm actually building an AI glasses product. They're pretty hot right now. They're really futuristic, and they're built on the OpenAI API. These glasses have a speaker in the stem to read the answers out from the assistant, and there's actually a little AR screen in the glasses. I want to make an internal admin dashboard to help my team answer questions about the orders we've received and what their shipping status is. I already have a database with order information, and I want to connect this assistant to that information.

So let's get started in the playground. I've actually already created a query function to tell the assistant how to query my SQL database. Let's take a quick look. So here's my function, and it has the properties that you would expect. The first is table name, and we only have one table name, Orders. That's all we support so far. Then we have all of the columns of my table. These are just straight from my database. And then the meat of this is the conditions that we support for querying. You'll notice here that we have some operators. My database supports equals, greater than, less than, and not equals. And then we have an order-by param. So just the stuff you'd expect for a database. When I use this assistant, I'll set the system message to tell it today's date so it can be useful. And then a user comes in and they're asking to find all of the orders shipped September 1 or later. So let's give this a quick go.

All right. You can see we've got a function call back, and it looks pretty good. I mean, the logic is right. We're checking that shippedAt is greater than or equal to September 1. But the discerning developer will notice that this will likely cause your application to blow up, and the reason for that is that we're using the greater-than-or-equal-to operator. However, our ORM for some reason only supports these operators, and so the greater-than-or-equal-to will just not work. With structured outputs, we can set strict to true, and this will constrain the model to only use what you've provided. So let's save that and retry. Awesome. You can see this time we've actually used the greater-than operator, and the model's done the logic to determine we actually need to check that we're greater than the last day of August, not greater than or equal to the first day of September. So through this, you can see how structured outputs will just eliminate a whole class of errors for your application.

Now let's get back to response formats. In the past, when you wanted the model to respond using JSON, you'd be using JSON mode. So back to my AI glasses startup. Since the glasses have a speaker in the stem, I want them to read some of the assistant's response out loud, and I want to show a summarized version in the lenses. Before structured outputs, I would actually just put these instructions into the system message and use JSON mode. This is pretty nice, and I would always get back JSON, but sometimes if the user asks for something specific, I would get back an extra key or the model would use the wrong type. With structured outputs, I can move these instructions into the response format parameter, like you see here. I'm going to use the description to explain to the model how to respond, with the voiceover and display parameters. So this way the model will always follow the format and use these two keys.
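To make that concrete, a request using the response format parameter might look roughly like this. This is a hedged sketch; the schema name, model name, and exact descriptions are illustrative rather than taken from the demo.

```python
from openai import OpenAI

client = OpenAI()

response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "glasses_response",
        "strict": True,  # structured outputs on
        "schema": {
            "type": "object",
            "properties": {
                "voiceover": {
                    "type": "string",
                    "description": "Read aloud via TTS; write out numbers and acronyms fully.",
                },
                "display": {
                    "type": "string",
                    "description": "Shown on the AR lenses; keep it to about five words.",
                },
            },
            "required": ["voiceover", "display"],
            "additionalProperties": False,
        },
    },
}

completion = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[{"role": "user", "content": "How tall is a giraffe?"}],
    response_format=response_format,
)
print(completion.choices[0].message.content)  # JSON with exactly these two keys
```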
So let's give this a quick go as well. All right, I have another tab here, and I've actually created this schema, so let's take a quick look. Voiceover is our first string property, and we're telling the assistant that this will be read aloud via TTS, and to write out numbers and acronyms fully. Then we have the display property, and we don't have a ton of room on the glasses, so we're going to keep this tight, just five words. So I'm going to put this into the playground. You can see on the right here we have the response format option. I'm going to paste in my schema, and you'll see that strict is set to true, so structured outputs is on. Now I'm going to test out the glasses with a typical query. I might ask something like, how tall is a giraffe? When I run this, you see that I get the output in the form I've asked for. We have the voiceover parameter, which has spelled out the words to make it easier for our TTS models to read them out. We also have the display field, which is just four words, which should fit right on the glasses. So there you have structured outputs with response formats. This is just a really quick demo, but Atty will get into something a little more interesting.

Okay. So that showed us the feature, but let's get into some more interesting demos and how you might use structured outputs in your applications. Let's imagine we're building a fictitious company called Convex. Convex is an AI-powered recruiting tool. It lets recruiters create job postings, submit referrals, and schedule interviews. Behind the scenes, Convex is a Node and React app that uses response formats to extract information from resumes and uses function calling to perform queries over the candidate data. Let's see it in action.

So, I've gone ahead and created a job posting here for an ML engineer role that we're hiring for at OpenAI. You can see the job description, the hiring manager, and some of the candidates that have already applied. Now, before I came on stage, a promising candidate reached out to me and shared his resume. Let's take a look. Okay. Greg seems to have worked at Stripe and has a little bit of experience working in AI. He has a whole bunch of skills, including coding in Python, so he seems like a promising candidate. Let's put in a referral. I'm going to click on the Add Candidate button here and select Greg's resume. And you can see that we're outputting the fields from the resume in real time. The model is using structured outputs to extract this information from the text that's within the PDF. Let's take a look at the model's response behind the scenes. We see a JSON object with name, title, location, contact info, and work experience; all of this has been extracted from the resume.

Now, let's take a look at the code to see how this all works. Behind the scenes, we're using a response format called resume. It has the fields we just saw, like name, title, location, and so on. In particular, for work experience, you'll notice that I'm using an array. Structured outputs supports a wide subset of JSON Schema, including arrays. That allows the model to output more than one work experience. Now, some of the JavaScript developers in the room might also notice that I'm using the library Zod to define my schema. Zod is great. It lets me define my schemas in code and get runtime type safety. The OpenAI Node SDK has native support for Zod. You can define your schemas in it, and when the model starts responding, we parse the response back into the Zod object as well, and it supports streaming. Very similarly, the OpenAI Python SDK supports Pydantic natively.
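As a sketch of what that Pydantic support looks like in the Python SDK: the field names below mirror the demo, but the exact models, prompts, and the resume_text variable are illustrative assumptions.

```python
from pydantic import BaseModel
from openai import OpenAI

class WorkExperience(BaseModel):
    company: str
    title: str

class Resume(BaseModel):
    name: str
    title: str
    location: str
    work_experience: list[WorkExperience]  # arrays let the model return multiple jobs

client = OpenAI()
resume_text = "..."  # raw text extracted from the candidate's PDF

completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Extract the candidate's details from the resume."},
        {"role": "user", "content": resume_text},
    ],
    response_format=Resume,
)
resume = completion.choices[0].message.parsed  # a typed Resume instance
print(resume.work_experience)
```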
Okay, so let's add one more field to the format here. Let's add the GitHub username. I'm going to save that, refresh my page, and let's add a referral one more time. Great. The model is responding, and you can see that the GitHub username was also extracted. So that was the first feature I wanted to demo: extracting information from unstructured data using response formats.

Next, let's see how we can use structured outputs in function calling. I'm going to click View All here to open my candidate analysis screen. I have a helpful AI assistant here that I can ask questions to analyze my data. Now, we'll see that candidates have applied from all over the country, in San Francisco, New York, and Austin. But for this role, we're hiring in the SF office. So let's filter the candidates down. I'm going to say, filter the candidates to those based in San Francisco. Let's press Enter. And we see that the model called the find candidates function with a criteria field. The field is location and the value is San Francisco. This hit our backend API and returned a list of candidates that were filtered and are based in SF. You'll also notice that the UI on the left has been updated to reflect these candidates. In this way, you can use function calling to control the UI of your application. Let's look at this behind the scenes. In our code, I've defined a tool, or a function, called find candidates. Find candidates has a schema that includes a list of criteria, and each criterion has a field, which can be title, location, and so on, and a value, which is the value to filter on.

So that was a quick 101 example of using structured outputs with function calling. Let's try something harder. Let's say, graph these candidates by years of experience. You'll notice that the model is generating UI: a card with a header, a bar graph that shows all the candidates and their years of experience, and a table with the rows as well. Now, this is not a pre-built UI. This is actually the model composing a set of React components dynamically. We can look at the schema here as well. We see that the top-level property is component, and it says card, and card has a list of children. That includes a header, a bar chart, and so on. Here's how the schema is defined behind the scenes. We have a tool called GenerateUI, and GenerateUI has one property called component. Component is an anyOf of card, header, bar chart, and so on. And each of these schemas is defined later on in the $defs parameter. Now, this is a really interesting feature of structured outputs, where you can actually use $defs to define your schemas in one place and use them multiple times, and you can use recursive schema definitions as well. You'll notice that the card schema here has a children property, which then references component again. So in this way, a component can have a list of children that are components, and structured outputs handles this with no problem at all.
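A tool along those lines might be defined roughly like this. This is a hedged sketch; the exact component set, field names, and the generate_ui name are illustrative, not the demo's actual schema.

```python
# A recursive generative-UI tool: component is an anyOf over concrete components,
# and card's children refer back to component via $ref.
generate_ui_tool = {
    "type": "function",
    "function": {
        "name": "generate_ui",
        "strict": True,
        "parameters": {
            "type": "object",
            "properties": {"component": {"$ref": "#/$defs/component"}},
            "required": ["component"],
            "additionalProperties": False,
            "$defs": {
                "component": {
                    "anyOf": [
                        {"$ref": "#/$defs/card"},
                        {"$ref": "#/$defs/header"},
                        {"$ref": "#/$defs/bar_chart"},
                    ]
                },
                "card": {
                    "type": "object",
                    "properties": {
                        "type": {"type": "string", "enum": ["card"]},
                        # children are themselves components, so cards can nest arbitrarily deep
                        "children": {"type": "array", "items": {"$ref": "#/$defs/component"}},
                    },
                    "required": ["type", "children"],
                    "additionalProperties": False,
                },
                "header": {
                    "type": "object",
                    "properties": {
                        "type": {"type": "string", "enum": ["header"]},
                        "text": {"type": "string"},
                    },
                    "required": ["type", "text"],
                    "additionalProperties": False,
                },
                "bar_chart": {
                    "type": "object",
                    "properties": {
                        "type": {"type": "string", "enum": ["bar_chart"]},
                        "title": {"type": "string"},
                    },
                    "required": ["type", "title"],
                    "additionalProperties": False,
                },
            },
        },
    },
}
```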
Great. So now that we have candidates sorted by years of experience, let's go ahead and schedule some interviews. I'm going to say, schedule interviews for, let's say, the top three candidates by years of experience with Michelle and Olivier. What this is going to do is first go and check the availability of Michelle and Olivier on their calendars. Once it picks out a few good slots, it's going to schedule those interviews. And finally, it's going to email the candidates that their interviews have been booked. So let's press Enter. Awesome. We see that the model is calling the Fetch Availability API and got some data back. It's now calling the Schedule Interviews API, and that succeeded. And finally, it's calling Send Emails with custom emails for each candidate, and that succeeded as well. So that's an example of a multi-step workflow using function calling, where each step benefits from structured outputs. Before this feature, if any one of these steps failed, the whole workflow would fail. And in production, if each of these had, let's say, a 1% error rate, the workflow would have an approximately 3% error rate. You can see how important structured outputs is for the reliability of agentic workflows.

Okay, so that's a little demo of how you can use structured outputs in real-world applications. You can use response formats to extract information from unstructured data. You can use function calling to generate UI. And finally, you can build agentic workflows with 100% reliability. To tell you about how all of this works under the hood, here's Michelle.

Thanks, Atty. Super cool, and I can't wait to interview Greg later. Let's get into how structured outputs works under the hood. We're going to talk about three things: the engineering implementation, the research we undertook to make our models better at format following, and finally, some of the interesting API design decisions we made to ship this feature. To ship this, we actually decided to take an approach that combined both research and engineering to create a holistic solution. Doing just one of these would make a pretty good product, but together they are greater than the sum of their parts. There's actually more to structured outputs than just prompting our models differently. It creates some interesting trade-offs for developers, so we thought it would be useful to take you under the hood and walk through those choices.

On the engineering side, we decided to take an approach known as constrained decoding to ensure that our models would always follow schemas supplied by developers. There are three components of constrained decoding that we're about to talk about. The first is LLM inference and how it works. Then we're going to get into token masking. And finally, we're going to talk about the subset of JSON Schema that we support and the grammars behind it.

So let's start with LLM inference. LLMs operate on tokens, which make up the vocabulary of a large language model: all of the labels that the model can output or produce. Let's get into a simple example of a model to show a small vocab. This is the classic example of an AI model: a digit recognition model. I want to train a model to recognize this handwritten 3 and determine what digit it is. So the vocab of this model is just going to be 10 labels, the digits 0 through 9. And this is actually what a model like that would look like. First we convert our input into some sort of computer representation, and in this case, we're going to take every pixel from my handwritten 3 and feed that into the model. Then we have the inner layers of the model. And finally, at the end, the model produces 10 values, and these are the predictions of our labels from 0 to 9. You can actually see here that the digit 3 has a 97% probability, so this digit is probably a 3.

This is also how large language models work for inference, in pretty broad strokes. Rather than predicting the digits 0 through 9, large language models predict, you know, language. In order to do this, they use labels that represent language, and one of the best ways we've found for doing this is to use tokens. Tokens are like words or word fragments. Here we have some English text broken up into its tokens. This is actually the text from our blog post, and you can see that the token boundaries vary. Sometimes they're an entire word, like the word "the," and sometimes they're a word fragment, like "struct." These tokens make up the vocabulary of large language models, and at any point, the model can predict any of these. They're all valid. So this is unconstrained decoding, where there are no constraints on what can be produced. What you see on the screen are actually some of the real tokens in the GPT-4 tokenizer.
So there are all kinds of parts of speech, from individual characters, like the exclamation point, to full words, like conveyor, at the bottom here. Now let's say we're using structured outputs to generate an output with this schema. This schema has a single property, value, and it's a number type. And we're partway through our generation already: we've produced a curly brace, some white space, and then value. By default, when we sample, any token from the vocab can be chosen, so that could be a string or a boolean. But if we did produce a boolean, that would not match the schema, whereas a number like 42 would be valid. So this tells us that we actually need to limit which tokens can be produced next; we can't just have the entire vocab of the model. To do this, we use a technique known as token masking, and this constrains which tokens can be picked at the very end of sampling.

So back to our digit example. Here's an example of token masking. Let's say I have some side info about these numbers. I know that they're all prime for some reason, and so I wouldn't want to decode, you know, 0, 1, 4, 6, any of the non-prime numbers. So this is actually what I would do: I would mask out any of those predictions. What we do after we generate the probabilities with a forward pass is remove any labels that are not allowed, so that we will only sample from valid tokens. This is known as masking, and we're effectively ignoring any tokens that we don't want to be produced.

Large language models are autoregressive, and that means they produce output one token at a time. Each time we sample a token, that output is fed right back in as input for the next inference step. This means when sampling JSON for your schemas, we actually need to update the token masks at every step of inference; we can't just do it once at the beginning of the request. If that's not making sense, look at this example. You can see first we're allowing the curly brace token. And then once we sample it, in step 2, you actually no longer see the curly brace in the mask. So you can see that from step to step, we need to update these masks.

Because this is happening at every inference step, we need this operation to be super, super fast. At the scale that some of your apps run, we need to make things lightning fast and keep inference as fast as possible. So you probably already know this, but token sampling happens on a GPU, and we actually do it one batch at a time. For each batch, we calculate probabilities for everything in that batch, then we apply any token masks, and then we sample from that final probability distribution. That sampling is right here in the orange segments. As you can see here, we can actually calculate the probabilities and determine the masks in parallel, and this frees up precious GPU resources because we can do the mask determination on CPU. This is very helpful for us to keep things fast. You'll also notice that the mask determination needs to take about the same amount of time as calculating the probabilities. This is also known as the time between tokens, and it varies based on the size of the model, so we need to keep this under something like 10 milliseconds. A naive solution to this problem would involve determining the valid tokens from the current state at every step of inference. But like we said, we only have about 10 milliseconds, so we're not super likely to meet that budget.
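To make the per-step masking operation itself concrete, here's a toy sketch of applying a token mask at one sampling step. This is purely illustrative, not OpenAI's implementation.

```python
import numpy as np

def sample_with_mask(logits: np.ndarray, allowed_token_ids: list[int]) -> int:
    """Keep only the tokens the grammar allows, renormalize, and sample one token id."""
    masked = np.full_like(logits, -np.inf)
    masked[allowed_token_ids] = logits[allowed_token_ids]  # everything else gets probability 0
    probs = np.exp(masked - masked.max())
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))

# e.g. right after emitting `{"value":`, only digits, a minus sign, and whitespace stay unmasked
```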
So we wanted to precompute as much work as possible and then reuse it when sampling, to make mask computation more like a lookup and less like a lot of work. Just like you can build an index in your SQL database to speed up queries, we can build up an index for the specific JSON schema that you supply, to make fetching these masks very fast during inference. There's actually a lot of work that goes into producing this index from the JSON schema. First, we convert the JSON schema into a grammar, and a grammar is a formal specification of a language. Once we have the grammar, we can create a parser, and a parser is a program that can check strings and see if they're part of this language. You can think of the parser as a program that takes in a JSON blob and tells you, does this match the schema or not? Finally, once we have a parser, we can iterate over all of the tokens in our vocabulary and all of the states that are possible in the parser, and this will determine which tokens are valid. This work lets us build an index. Our index is actually a trie, a prefix-based data structure that allows for O(1) lookups. So during inference, we're running our parser, building up state, and every time we traverse this trie to find a leaf. The leaf node tells us what the mask is for the next step. Generating this index is pretty computationally expensive, since we have to go over all of the possible states, so we do it just once and then cache it for fast lookups. This is why the first query to structured outputs can take a little time, usually under ten seconds, and the following queries are just as fast as you'd normally expect.

Finally, let's talk about the kinds of grammars we support with structured outputs. A common approach in the open source community is using regular expressions for determining token masks, and this approach works quite well for simple or depth-limited schemas. For example, you can imagine a regular expression for our value schema from before. It's actually pretty tractable to implement with a regular expression, which we have here, and this regular expression has about what you'd expect. First we have the curly brace, then we have some white space, then value, and then finally a regular expression for number types. However, you actually can't implement all of the expressiveness of JSON Schema with regular expressions. They're missing, basically, the memory needed to store information about past outputs. So let's talk about why.

Here's an example of a recursive schema from our blog post. This schema is useful for generative UI, and it's kind of like what Atty showed you just now as well. Each component can have a list of children, and those must also themselves match the schema at the top level. You can see the schema here uses the $ref keyword to reference the parent, and there's actually no way to implement a regular expression to validate this, because of the potential for recursive nesting. This is just a fundamental limitation of regular expressions: they don't have arbitrary memory to encode information about past outputs. To make this easier to grok, here's a really simple example of a language. Our language is defined as all of the strings that have matching open and close parens. And you can see we have a quick example here where all parentheses are matched. This is something that's so easy to explain, but it's not possible to implement with a regular expression.
And to show you why, there are actually two attempts we have here to do so. The first one is very simple: we just allow arbitrary open and close parens. But it's pretty clear why this won't work. You can have one open and two close parens, so it doesn't validate our language. We're missing the memory of what happened before. Our other attempt here is much more complicated, and it works, but only up to three layers of nesting. If we go further than that, the regular expression doesn't behave properly. This is just a toy example, but we believe that recursive and deeply nested schemas are often critical for developers, so we really wanted to find a way to support them. We can do so by adding in a stack, and this gives us the memory we need to encode information about past outputs. This stack lets us keep track of things like, how many open parens have I seen before? This approach is known as the CFG, or context-free grammar, approach. It's a little trickier to implement, but it allows for far more expressiveness. This is the approach that we went with. It's why it takes a little bit of time to build up this index for inference, and it's also why we support recursion and deep nesting.

So that's the engineering side of structured outputs. We just talked about how LLM inference works, why token masking is a useful building block, and how this all comes together in the extensive subset of JSON Schema that we support. We believe that this implementation results in the best set of trade-offs for developers. We've kept inference as fast as possible, and we've supported a very wide subset of JSON Schema. And we believe the trade-off of waiting a little longer on that first request while you're developing is worth it for most developers. So now let's get into how we've actually improved our models with research to work with structured outputs.

Awesome. Thank you, Michelle. So that was a little bit of the engineering behind structured outputs. Next, let's talk about the research side. On the research side, we wanted to ship a model that was much better at following formats as specified. It's not enough to just constrain the model to a valid output, because if those outputs are out of distribution or low probability for the model, the model will often behave erratically. Some of you may have seen this as the infinite newline problem before. This happened with some of our older models that weren't trained on response formats. The model's natural inclination was to generate text that wasn't JSON, but because it was being forced to output JSON, the only valid and sufficiently probable token was the newline character. So the model would just repeat the newline token one after another all the way until it hit max tokens. We wanted to avoid this problem, so we specifically trained our models to understand JSON Schema, and complex schemas in particular, better. It's also not enough for the model to just understand the schema. It needs to know what should actually go into each field; there's a semantic meaning behind the keys. For example, consider this action item schema. Each action item has a description, a due date, and an owner. To produce a good output, it's not enough to know that a description is a string and the owner is a string. The model needs to know what kind of string goes into a description and what kind of string goes into owner. So to ensure the models are good at this, we trained them with a whole bunch of complex schemas, including nested schemas.
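As an illustration of that kind of schema, an action-item extraction format might look something like this. This is a hedged sketch; the field descriptions are assumptions about how a developer might steer the model, not anything from the talk.

```python
from pydantic import BaseModel, Field

class ActionItem(BaseModel):
    # the key names and descriptions carry the semantics the model has to respect
    description: str = Field(description="What needs to be done, as a short imperative sentence.")
    due_date: str = Field(description="Deadline in YYYY-MM-DD format.")
    owner: str = Field(description="Name of the person responsible for this item.")

class ActionItems(BaseModel):
    items: list[ActionItem]
```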
So here are the results from our training process. This graph shows our models on the x-axis and their accuracy on one of our evals on the y-axis. The first three bars from the left are our earlier models, starting with the original GPT-4 and GPT-4 Turbo. You can see that accuracy improved from about 36% to 86% over the last year. Now, at the very end here, you see the results for our latest model. The orange bar shows the model's results with just prompting, and it has an accuracy of about 85%. When you add in the new response format training, we see with the yellow bar that the accuracy improves to 93%. This is much better, but still not 100%. So to get to 100%, we can enable constrained decoding, as Michelle shared earlier, giving us a perfect score on the green bar. So in this way, combining research to improve the model with engineering to implement constrained decoding brings out the best possible results.

Okay, finally, let's get into some interesting design trade-offs we made when creating this feature. We'll talk about three things: additional properties, required properties, and the order of properties.

Let's start with additional properties. One of the controversial API design decisions we had to make was deciding what to do with properties that were not defined in the schema. By default, JSON Schema allows extra properties, so additional properties are always allowed. Now, this is generally not the behavior we as developers expect. You can imagine errors like this, where we have a function, getWeather, that accepts two arguments; if the LLM produces an extra argument, we get a runtime error. So we decided to disallow additional properties by default in the OpenAI API. That said, this meant the default in our API was different from the default in JSON Schema. That's not great, especially if a developer already has a predefined schema from elsewhere in their application. In general, one of our API design principles is that we prefer to be explicit rather than implicit. So we decided to require that developers pass in additionalProperties: false in their schemas. This makes the API a little bit harder to use, since you have to set this property every time, but it sets expectations better with developers.

Okay, let's talk about required properties. By default, in JSON Schema, all properties are optional. This is, again, not what we as developers expect. To go back to our earlier example of the getWeather function, if the LLM decided to skip one of the parameters, we would again get a runtime error. So to make this feature more intuitive, we decided that all properties would be required by default. And once again, to set expectations, we require developers to pass this in via the required field. Now, we do have a workaround for optional parameters, which is to make them nullable. This gives us the best of both worlds: you get optionality without the performance trade-offs.

Okay, finally, let's talk about the order of properties. By default, JSON Schema has no ordering constraints, which means the LLM can produce properties in any order. But in the context of LLMs, order really matters. Having a strict ordering of properties can be really useful. For example, you can use an earlier field in your schema to condition the value of a later field, such as adding a chain-of-thought field to your schema. The model will first generate the chain of thought to explain its thinking and then generate the answer. This often improves the quality of the answer significantly. So to support this use case, we decided to generate fields in the same order that you define them in the schema.
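Putting those three decisions together, a response format that follows them might look roughly like this. This is an illustrative sketch, not a schema from the talk.

```python
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "answer_with_reasoning",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                # order matters: chain_of_thought is defined, and therefore generated, before the answer
                "chain_of_thought": {"type": "string"},
                "answer": {"type": "string"},
                # the workaround for an optional field: make it nullable instead of leaving it out
                "source": {"type": ["string", "null"]},
            },
            # every property has to be listed as required
            "required": ["chain_of_thought", "answer", "source"],
            # additional properties have to be disallowed explicitly
            "additionalProperties": False,
        },
    },
}
```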
So that's API design. We wanted to make sure that the API had good defaults while making the constraints transparent to developers. Okay. So to bring it all home, here's Michelle.

Awesome. So you can see that the engineering and research halves of this project each result in a meaningful improvement, but only when combined do they offer the best possible results. This is actually our ethos behind the work in the OpenAI API. We want to do the engineering and research work on our end to make the easiest-to-use API for developers. We want to solve problems like structured outputs for you so you can spend time working on what matters most: your application. We think structured outputs was the final puzzle piece for unlocking the full power of AI applications. Data extraction is now reliable. Function calls have the required parameters. And finally, agentic flows can work 100% of the time.

Since we launched structured outputs in August, developers like you, from big companies to small startups, have been building amazing apps. We've seen customers like Shopify use structured outputs to reduce hallucinations and improve the reliability of their applications. We've also heard the excitement from all of you. It's been so great to see structured outputs solve so many problems for developers. And this is why we do what we do. Our mission at OpenAI is to build safe AGI for everyone. We build the API because we think working with developers, you all, is just critical to achieving that mission. You see the future before anyone else, and together we can spread this technology to the furthest reaches of the world. As fellow engineers, we feel so lucky to serve you. So thank you for building with us, and we're so excited to see what you do. Thank you.