Good morning, everybody. Today we have one of the most fun, cool things we have ever launched. People have been waiting for this for a long time; we know we've made you wait, but we think it's really worth it, and we think you're going to love it. We are launching native images in ChatGPT. Image generation has been around for a while; in fact, one of the first things we were ever known for was the original DALL·E. But image generation has been largely a novelty: you've been able to make some cool art with it, and people have done amazing things, but it has not had the power to be really useful in a wide variety of ways. The thing we're launching today is native image generation in our 4o model, and it's such a huge step forward that the best way to explain it is just to show it, which we'll do very soon. This is really something we have been excited about bringing to the world for a long time. We think that if we can offer image generation like this, creatives, educators, small business owners, and students will be able to use it and do all kinds of new things with AI that they couldn't before. Really, the best thing to do is just to show it to you, so I'd like to introduce Gabe, who is the lead researcher and really the primary driver of this product, and I'll hand it over.

Hey, I'm Gabe, lead researcher. Hey, I'm Prafulla; I'm the head of multimodal research. Okay, I'm going to jump right in with a demo, and the reason I'm starting with a demo is that I'm also using these demos as my speaker notes, so it's a bit handy. Two years ago, when we first started this project, we were interested in a scientific question: what would native support for image generation look like in a model as powerful as GPT-4? We didn't know the answer, but a year later, when the model was done training, we saw really exciting signs of life.
We featured this in a blog post, as some of you will remember. We saw that the model could render paragraphs of text, for example, and that you could combine images in really interesting and novel ways. I think we spent a lot of time just playing with this model, and I felt a sense of joy and excitement I hadn't felt for a very long time, maybe even since GPT-2. I hadn't either; this was one of those really wow moments. It was a wow moment, but that model was still a bit rough around the edges: it sometimes made typos, and it was kind of unreliable, I would say. So over the last year I've been refining this model to make it more accessible and more user-friendly for the average person.

Okay, the image is generating, as you can see. Let me see... it seems to have gotten all the text; I don't see any typos, which is good. It's still amazing to me to see image generation with perfect text. It shouldn't be that impressive, but somehow we've been waiting for this for so long that every time it happens it's like, wow, that's so cool. Yeah, and think of the number of things this image had to get right from the instructions: what you want focused on, that it should be a point-of-view image, where we are, and then getting the text right. This is still amazing to me. Yeah, and point-of-view images are actually really hard to do, and this kind of looks like what we see right now; it looks like you were just taking it.

All right, I am going to begin my demo by taking a selfie of all of us, so give me a nice expression. Okay, and I'm going to ask ChatGPT to make it into an anime frame. In this case it's not just getting the context of my text prompt; it's also getting this image, and it can use both of these to produce a really nice image for us. And this is
possible because we trained 4o as an omnimodel: it's a model of not just language but images, audio, all modalities, in and out. It understands them, it can generate them, and it can seamlessly work across all of these things. We have spent a lot of effort to make useful products out of this: first Advanced Voice Mode, where audio just works seamlessly, and now this, where images just work seamlessly across the board. It is so cool that we're finally getting towards this truly integrated multimodal model that just does everything. Yeah, and in this case it gives the user a lot more control, because I might want a specific style, or I might want to use a specific previous image I have, or a design palette or something, and I can provide all of this context to ChatGPT, which can use all of it to produce the thing I want. It becomes more controllable.

Oh, okay, you're already seeing the sky behind us, the plants. By the way, this goes live today in ChatGPT and Sora. I think the rollout has already started, so if you also want to make an anime version of yourself, you can now do that. Yeah, I think it's already out to all Pro users, and Plus should be done pretty soon. Amazing. It'll be available to free users too. I see I have my little beard there; I see your expressions, and the hand signs, mine too. Nice. What should we do next with this? Can we make a meme out of it? Ooh, make it into a meme; that's in Gabe's speaker notes. One of the common memes inside of OpenAI is "feel the AGI." I have no idea what the AI will think about that, but let's try it. I do feel the AGI. Yeah, and the anime version is so good. And in this case the model is seeing all of the past context as well, and it uses all of its knowledge of language and memes and everything to give us a new rendition. And this multi-turn nature makes it even more
useful to people, right? I can ask for any edit I want, and if it gets something wrong I can just say, "Hey, fix that thing." I think that's taking us in the direction of making these tools, not toys, for people, and I'm really excited by that. Speaking of memes, how much do you think 4o knows about common internet memes in general? I think it knows a lot, and in fact, when we first put this out to people inside OpenAI, most of what we got back was memes. Maybe Gabe can tell you more about that. Yeah, memes were one of the number-one use cases for this model in our internal version. I was just thinking about memes and why this use case struck such a chord with the company, and what I realized is that over the last nine months, as I've been working on this model, I've been doing a kind of meditative exercise where I look at all the images around me, and I realized I'm surrounded by maybe hundreds of images a day. All these images are not necessarily the most aesthetic or beautiful images, but they were all created with intent; they were all created to persuade, to inform, to educate. These are the workhorse images that comprise our everyday life, and what I'm very excited about is that we'll be giving this power to create workhorse images to everyone in the world in ChatGPT.

Speaking of this power, we are giving a much higher degree of creative expression and creative freedom than we normally do. What we'd like is for the model not to be offensive if you don't want it to be, but if you want it to be, within reason, to really let people create what they need and what they want. We may not get the line there perfectly on day one, but given what Gabe just said, we want to lean
pretty far into creative freedom and let people get maximum utility out of this model, and we're excited to see what people will do with it. Yeah, me too. Let's look at the meme we got. That's great. Okay, thank you both very much, and we're going to welcome a few other research and product people to show some more, unless either of you has anything else. No, thanks. Thank you.

Okay, so in addition to all the great research that went into this, we really wanted to work hard to make it a great product experience as well. If my colleagues want to introduce themselves, maybe starting with Alan, we will then show you a few more things. Hi, I'm Alan, and I'm a research scientist at OpenAI. Hi, my name is Mengchao; I'm an engineer on ChatGPT. Hi, my name is Lou; I'm a research scientist at OpenAI.

So as our models get more capable, their knowledge of the world is deepening, but so far they've really only been able to express themselves in either text or code. I think what's really exciting about this release is that now these models can actually visualize what they know and externalize it in a visual way. The prompt I'm going to try is: make a colorful page of manga describing the theory of relativity, and just for fun, we'll ask it to add some humor. How well do you find that the model understands visual humor versus just funny text? I think that, given that this prompt is so vague, it'll be interesting to see what kind of wild-card stuff the model comes up with. This is really just the model leveraging the world knowledge it has, writing maybe an extended version of the prompt, and then giving us a nice image. But if you have a much more detailed sense of the kind of story you want to convey in something like a manga, or an image in general, you can definitely do that; this model is very good at following instructions, and in the blog post that we just put out there are a lot of nice examples
of exactly how you can do that. By the way, images are much slower than our previous image generation, but unbelievably better; we think it's super worth the wait. We will also be able to make it faster over time, but the ratio of quality to time, we think, is already great. Yeah. And it looks like it's given us not only some English but a different language here as well. I think in general we're hoping that this model's ability to not only generate images but also blend in precise text in the right ways makes it not only a tool for imagination but also for learning and for communication. It did add some humor. Yeah, I like the layout. Definitely quite colorful. That's beautiful. Thanks, Alan.

So Alan just showed us how much this model can shine in professional and educational environments, but what I love most about this model is how accessible it is to everyone, to someone like me who doesn't have professional artistic skills but still enjoys expressing creativity. To show you what I mean, I've prepared something special; let's kick it off. I was inspired by this trading card in my hand that I got from our Sora launch, and I thought it would be really cool if we could design a new one in the same style for 4o image generation. I took a photo of it this morning; this is what it looks like. But instead of having the giant cat king here, I would like my dog Sanji to be the main character, and this is a photo of my dog. He's cute. I've also included a couple of details that I would like to see on the card, including the name of the model, the year, some abilities I'd like highlighted, and Sanji's weight and height. Let's see what the model comes up with. Why is the giant cat king "Visora"? I have no idea, but I believe the Sora trading card was designed by a professional designer, so it would be amazing if we could actually use
our model to generate one. Yeah, I think our model has come a long way in terms of very precise text rendering, so it'll be super cool to see how well it does with this detailed instruction. Can I see the original card? Yes. Oh, it's very nice. It looks like it's already revealed. We should do these for every launch; these are cool. I guess now we can make them with a machine. Yeah, we should definitely do that. And Sanji is snowboarding, which is something I've never seen him do in real life, but it's cool. The text is also very crisp. Yes, and it got all the stats correct. That's amazing. Thank you for letting me share this little creative moment with you, and now I'm excited to pass it on to Lou to show you more innovative ways of using our product.

Sure, I'm very happy to share this with everyone today. We've seen the generations from Alan and Mengchao, so today I'm going to do something very special: I'm going to make a memorial coin based on the generations from those two, plus another two pictures that are in our background. I'll first copy in the picture from Alan and the picture from Mengchao, and the other two are the backgrounds we've shown here in the demo. I would also like to use a special hex code: as you can see, this hex code is a spring color, because 4o and this launch both arrived in spring, so I'd like it to be a unique color for us. And I'd like to include the text "4o image gen" and today's date on this memorial coin, so we can make a souvenir of today. You can see this model is trained in an autoregressive way, so it is able to understand both text and multiple images in context and seamlessly render them together harmoniously on a coin. Can you imagine how this coin will look based on all this? Not easily, but I'm excited to see. Yeah, me too; that's what I'm thinking too. So we're seeing it here: we have the "4o image gen" text, and we
have the bear, the artistic bear there, the radio, and also Alan's manga and Sanji, which is still amazing. That's so cool. That's very cool; I want one of those. Yes, I agree. Now I'm going to give it a transparent background, because we really want this coin to be printable so we can have it physically. As you can see, the model can not only understand context within one turn; it also understands context across multiple turns. So from today, we can just chat with ChatGPT in a more visual way, and this is just a very simple example. I made the background transparent, but you can also ask the model to, for example, imagine how this coin will look on the back side, or make a unique color for each of Alan, Mengchao, and me. Other than making the background transparent, how good is it at keeping the actual coin itself consistent between the two? It's very good at keeping edits consistent, and that's the other thing to see: from today you can use ChatGPT to do image editing and image refinement using very conversational language. Cool. So we see the coin here; it has a transparent background now, and it's keeping consistency with the previous generation. That's awesome. Yeah, it's very cool.

Well, we're so excited to get this out to the world. It goes live today in ChatGPT and Sora, and it will come to the API soon. We really think this is a huge step forward in what AI models are capable of doing visually, and we cannot wait to see what you all will create. Thank you very much, and again, congrats. Cheers. Thank you.
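(Since the API isn't live yet as of this demo, here is a minimal, hedged sketch of what handling a generated image might look like, assuming the response carries a base64-encoded payload in a `b64_json` field like the existing OpenAI Images API; the `save_b64_image` helper and the commented-out call, including the model name, are illustrative assumptions, not the announced interface.)

```python
import base64

# Hypothetical usage once the API ships (names are assumptions):
#   result = client.images.generate(model="gpt-4o", prompt="a memorial coin")
#   save_b64_image(result.data[0].b64_json, "coin.png")

def save_b64_image(b64_data: str, path: str) -> int:
    """Decode a base64-encoded image payload and write it to disk.

    Returns the number of bytes written.
    """
    raw = base64.b64decode(b64_data)
    with open(path, "wb") as f:
        f.write(raw)
    return len(raw)
```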