Google I/O 2024 AI Innovations Overview

Google's ambitions in artificial intelligence. Google launches Gemini. AI is rolling out to work. And it's completely changing the way we work.

You know, a lot has happened in a year. There have been new beginnings. We found new ways to find new ideas and new solutions to age-old problems.

Sorry about your shirt. We dreamt of things. Never too old for a treehouse.

We trained for things. Alright, let's go, go, go! And learned about this thing. We found new paths. Took the next step. And made the big leap. Cannonball!

We filled days like they were weeks, and more happened in months than has happened in years. Three eggs. Things got bigger, like way bigger. And it wasn't all just for him or for her. It was for everyone. And you know what? We're just getting started.

Hi everyone, good morning. Welcome to Google I/O. It's great to have all of you with us.

We have a few thousand developers with us here today at Shoreline. Millions more are joining virtually around the world. Thanks to everyone for being here.

For those of you who haven't seen I/O before, it's basically Google's version of the Eras Tour, but with fewer costume changes. At Google, though, we are fully in our Gemini era. You'll hear a lot about that today.

Before we get into it, I want to reflect on this moment we are in. We've been investing in AI for more than a decade and innovating at every layer of the stack. Research, product, infrastructure, we're going to talk about it all today.

Still, we are in the very early days of the AI platform shift. We see so much opportunity ahead for creators, for developers, for startups, for everyone. Helping to drive those opportunities is what our Gemini era is all about. So let's get started.

A year ago on this stage, we first shared our plans for Gemini, a frontier model built to be natively multimodal from the very beginning that could reason across text, images, video, code, and more. It's a big step in turning any input into any output, an I/O for a new generation. Since then, we introduced the first Gemini models, our most capable yet. They demonstrated state-of-the-art performance on every multimodal benchmark, and that was just the beginning.

Two months later, we introduced Gemini 1.5 Pro, delivering a big breakthrough in long context. It can run 1 million tokens in production consistently, more than any other large-scale foundation model yet. We want everyone to benefit from what Gemini can do.

So we have worked quickly to share these advances with all of you. Today, more than 1.5 million developers use Gemini models across our tools. You're using it to debug code, get new insights, and build the next generation of AI applications.

We've also been bringing Gemini's breakthrough capabilities across our products in powerful ways. We'll show examples today across Search, Photos, Workspace, Android, and more. Today, all of our two-billion-user products use Gemini, and we've introduced new experiences too, including on mobile, where people can interact with Gemini directly through the app.

It's now available on Android and iOS, and through Gemini Advanced, which provides access to our most capable models. Over 1 million people have signed up to try it in just three months, and it continues to show strong momentum. One of the most exciting transformations with Gemini has been in Google Search.

In the past year, we've answered billions of queries as part of our Search Generative Experience. People are using it to search in entirely new ways and asking new types of questions, longer and more complex queries, even searching with photos and getting back the best the web has to offer. We've been testing this experience outside of Labs, and we are encouraged to see not only an increase in Search usage, but also an increase in user satisfaction. I'm excited to announce that we will begin launching this fully revamped experience, AI Overviews, to everyone in the US this week and will bring it to more countries soon. There's so much innovation happening in Search.

Thanks to Gemini, we can create much more powerful search experiences, including within our products. Let me show you an example in Google Photos. We launched Google Photos almost nine years ago.

Since then, people have used it to organize their most important memories. Today, that amounts to more than six billion photos and videos uploaded every single day. And people love using Photos to search across their life. With Gemini, we're making that a whole lot easier.

Say you're at a parking station ready to pay, but you can't recall your license plate number. Before, you could search Photos for keywords, then scroll through years' worth of photos, looking for the right one. Now, you can simply ask Photos.

It knows the cars that appear often, it triangulates which one is yours, and just tells you the license plate number. And Ask Photos can also help you search your memories in a deeper way. For example, you might be reminiscing about your daughter Lucia's early milestones.

You can ask Photos, "When did Lucia learn to swim?" You can even follow up with something more complex: "Show me how Lucia's swimming has progressed."

Here, Gemini goes beyond a simple search, recognizing different contexts, from doing laps in the pool, to snorkeling in the ocean, to the text and dates on her swimming certificates. And Photos packages it all up together in a summary, so you can really take it all in and relive amazing memories all over again. We are rolling out Ask Photos this summer with more capabilities to come. Unlocking knowledge across formats is why we built Gemini to be multimodal from the ground up.

It's one model with all the modalities built in. So not only does it understand each type of input, it finds connections between them. Multimodality radically expands the questions we can ask and the answers we will get back.

Long context takes this a step further, enabling us to bring in even more information: hundreds of pages of text, hours of audio, a full hour of video, or entire code repos. Or, if you want, roughly 96 Cheesecake Factory menus.

For that many menus, you need a 1 million token context window, now possible with Gemini 1.5 Pro. Developers have been using it in super interesting ways. Let's take a look.

I remember the announcement, the 1 million token context window, and my first reaction was, there's no way they were able to achieve this. I wanted to test its technical skills. So I uploaded a line chart.

It was temperatures between Tokyo and Berlin and how they vary across the 12 months of the year. So I got in there and I threw in the Python library that I was really struggling with. And I just asked it a simple question. And it nailed it.

It could find specific references to comments in the code, and specific requests that people had made, and other issues that people had had, but then suggest a fix for it that related to what I was working on. I immediately tried to kind of crash it, so I took, you know, four or five research papers I had on my desktop, and it's a mind-blowing experience when you add so much text and then you see the kind of amount of tokens you added is not even at half the capacity. It felt a little bit like Christmas because you saw things kind of peppered up to the top of your feed about like, oh wow, I built this thing.

Or, oh, it's doing this and I would have never expected. Can I shoot a video of my possessions and turn that into a searchable database? So I ran to my bookshelf and I shot a video just panning my camera along the bookshelf. And I fed the video into the model. It gave me the titles and authors of the books, even though the authors weren't visible on those book spines.

And on the bookshelf, there was a squirrel nutcracker sat in front of the book, truncating the title. You could just see the word, Sight C, and it still guessed the correct book. The range of things you can do with that is almost unlimited.

And so at that point, for me, it was just like a click, like, this is it. I thought, like, I had, like, a super power in my hands. It was poetry.

It was beautiful. I was so happy. It's just, this is going to be amazing. This is going to help people. This is kind of where the future of language models are going.

Personalized to you, not because you trained it to be personal to you, but personal to you because you can give it such a vast understanding of who you are.

We've been rolling out Gemini 1.5 Pro with long context in preview over the last few months. We made a series of quality improvements across translation, coding, and reasoning. You'll see these updates reflected in the model starting today. I'm excited to announce that we are bringing this improved version of Gemini 1.5 Pro to all developers globally.

In addition, today Gemini 1.5 Pro with a 1 million token context window is directly available for consumers in Gemini Advanced and can be used across 35 languages. 1 million tokens is opening up entirely new possibilities. It's exciting, but I think we can push ourselves even further.

So today, we are expanding the context window to 2 million tokens. We are making it available for developers in private preview. It's amazing to look back and see just how much progress we have made in a few months.

This represents the next step on our journey towards the ultimate goal of infinite context. Okay, so far we have talked about two technical advances, multimodality and long context. Each is powerful on its own, but together they unlock deeper capabilities and more intelligence.

Let's see how this comes to life with Google Workspace. People are always searching their emails in Gmail. We are working to make it much more powerful with Gemini.

Let's look at how. As a parent, you want to know everything that's going on with your child's school. Okay, maybe not everything, but you want to stay informed. Gemini can help you keep up. Now we can ask Gemini to summarize all recent emails from the school.

In the background, it's identifying relevant emails, even analyzing attachments like PDFs, and you get a summary of the key points and action items. So helpful. Maybe you were traveling this week and you couldn't make the PTA meeting. The recording of the meeting is an hour long. If it's from Google Meet, you can ask Gemini to give you the highlights.

There's a parents group looking for volunteers. You're free that day. Of course, Gemini can draft a reply. There are countless other examples of how this can make life easier. Gemini 1.5 Pro is available today in Workspace Labs, and Aparna will share more later on.

We just looked at an example with text outputs. But with a multimodal model, we can do so much more. To show you an early demo of an audio output in NotebookLM, here's Josh.

Hey everyone. Last year at I/O, we introduced NotebookLM, a research and writing tool grounded in the information you give it. Since then, we've seen a lot of momentum with students and teachers using it.

And today, Gemini 1.5 Pro is coming to NotebookLM, and it's great. Let me show you. So here we are in NotebookLM.

You can load it up with all the materials here on the left. In this notebook, I've been using it with my younger son and I've added some of his science worksheets, a few slide decks from his teacher, and even an open source textbook full of charts and diagrams. With 1.5 Pro, it instantly creates this notebook guide with a helpful summary and can generate a study guide, an FAQ, or even quizzes. But for my son Jimmy, he really learns best when he can listen to something. So we've prototyped a new feature with Gemini and it's called Audio Overviews.

NotebookLM is going to take all the materials on the left as input and output them as a lively science discussion personalized for him. Let's take a listen. So let's dive into physics.

What's on deck for today? Well, we're starting with the basics. Force and motion.

And that, of course, means we have to talk about Sir Isaac Newton and his three laws of motion. Ah, yes. The foundation for understanding how objects move and interact. Ah, yes. This is where multimodal really shines.

Now, it generated this audio discussion based on that text material. And what's amazing is that my son and I can join into the conversation and steer it whichever direction we want. When I tap, join.

Hold on, we have a question. What's up, Josh? Yeah, can you give my son Jimmy a basketball example? Hey, Jimmy, that's a fantastic idea. Basketball is actually a great way to visualize force and motion.

Let's break it down. Okay, so first, imagine a basketball just sitting there on the court.

It's not moving, right? That's because all the forces acting on it are balanced. The downward pull of gravity... Pretty cool, right?

I gotta say, the first time my son heard this, you should have seen how big his eyes got because he was gripped. They were talking to him. He was learning science through the example of basketball, his favorite sport. Now what's interesting is under the hood, you saw that Gemini had used some of the concepts of gravity, Sir Isaac Newton, but nothing in there was about basketball.

It connected the dots and created that age-appropriate example for him. And this is what's becoming possible with the power of Gemini. You can give it lots of information in any format, and it can be transformed in a way that's personalized and interactive for you. Back to you, Sundar.

Thanks, Josh.

The demo shows the real opportunity with multimodality. Soon you'll be able to mix and match inputs and outputs. This is what we mean when we say it's an I/O for a new generation. And I can see you all out there thinking about the possibilities. But what if we could go even further?

That's one of the opportunities we see with AI agents. Let me take a step back and explain what I mean by that. I think about them as intelligent systems that show reasoning, planning, and memory, are able to think multiple steps ahead, work across software and systems, all to get something done on your behalf, and most importantly, under your supervision.

We are still in the early days, and you'll see glimpses of our approach throughout the day. But let me show you the kinds of use cases we are working hard to solve. Let's start with shopping. It's pretty fun to shop for shoes and a lot less fun to return them when they don't fit.

Imagine if Gemini could do all the steps for you. Searching your inbox for the receipt, locating the order number from your email, filling out a return form, and even scheduling a pickup. That's much easier, right? Let's take another example that's a bit more complex. Say you just moved to Chicago.

You can imagine Gemini and Chrome working together to help you do a number of things to get ready. Organizing, reasoning, synthesizing on your behalf. For example, you will want to explore the city and find services nearby, from dry cleaners to dog walkers.

You'll have to update your new address across dozens of websites. Gemini can work across these tasks and will prompt you for more information when needed. So you're always in control. That part is really important as we prototype these experiences.

We are thinking hard about how to do it in a way that's private, secure, and works for everyone. These are simple use cases, but they give you a good sense of the types of problems we want to solve by building intelligent systems that think ahead, reason, and plan, all on your behalf. The power of Gemini with multimodality, long context, and agents brings us closer to our ultimate goal, making AI helpful for everyone.

We see this as how we will make the most progress against our mission: organizing the world's information across every input, making it accessible via any output, and combining the world's information with the information in your world in a way that's truly useful for you. To fully realize the benefits of AI, we'll continue to break new ground. Google DeepMind is hard at work. To share more, please welcome, for the first time on the I/O stage, Sir Demis.

Thanks, Sundar. It's so great to be here. Ever since I was a kid playing chess for the England junior team, I've been thinking about the nature of intelligence.

I was captivated by the idea of a computer that could think like a person. It's ultimately why I became a programmer and studied neuroscience. I co-founded DeepMind in 2010 with the goal of one day building AGI, Artificial General Intelligence, a system that has human-level cognitive capabilities. I've always believed that if we could build this technology responsibly, its impact would be truly profound and it could benefit humanity in incredible ways. Last year, we reached a milestone on that path when we formed Google DeepMind, combining AI talent from across the company into one super unit.

Since then, we've built AI systems that can do an amazing range of things, from turning language and vision into action for robots, navigating complex virtual 3D environments, solving Olympiad-level math problems, and even discovering thousands of new materials. Just last week, we announced our next-generation AlphaFold model. It can predict the structure and interactions of nearly all of life's molecules, including how proteins interact with strands of DNA and RNA. This will accelerate vitally important biological and medical research from disease understanding to drug discovery.

And all of this was made possible with the best infrastructure for the AI era, including our highly optimized tensor processing units. At the center of our efforts is our Gemini model. It's built from the ground up to be natively multimodal because that's how we interact with and understand the world around us. We've built a variety of models for different use cases.

You've seen how powerful Gemini 1.5 Pro is, but we also know from user feedback that some applications need lower latency and a lower cost to serve. So today we're introducing Gemini 1.5 Flash. Flash is a lighter weight model compared to Pro. It's designed to be fast and cost efficient to serve at scale while still featuring multimodal reasoning capabilities and breakthrough long context.

Flash is optimized for tasks where low latency and efficiency matter most. Starting today, you can use 1.5 Flash and 1.5 Pro with up to 1 million tokens in Google AI Studio and Vertex AI. And developers can sign up to try 2 million tokens.

We're so excited to see what all of you will create with it. And you'll hear a little more about Flash later on from Josh. We're very excited by the progress we've made so far with our family of Gemini models, but we're always striving to push the state of the art even further. At any one time, we have many different models in training, and we use our very large and powerful ones to help teach and train our production-ready models.

Together with user feedback, this cutting-edge research will help us to build amazing new products for billions of people. For example, in December, we shared a glimpse into the future of how people would interact with multimodal AI, and how this would end up powering a new set of transformative experiences. Today we have some exciting new progress to share about the future of AI assistants that we're calling Project Astra. For a long time we've wanted to build a universal AI agent that can be truly helpful in everyday life.

Our work making this vision a reality goes back many years. It's why we made Gemini multimodal from the very beginning. An agent like this has to understand and respond to our complex and dynamic world just like we do.

It would need to take in and remember what it sees so it can understand context and take action. And it would have to be proactive, teachable and personal, so you can talk to it naturally without lag or delay. While we've made some great strides in developing AI systems that can understand multimodal information, getting response time down to something conversational is a difficult engineering challenge. Building on our Gemini model, we've developed agents that can process information faster by continuously encoding video frames, combining the video and speech input into a timeline of events, and caching this for efficient recall.
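As a rough illustration of the timeline-and-cache idea described above, here is a minimal Python sketch. It is not Google's implementation: the `Event`, `merge_streams`, and `TimelineCache` names are hypothetical, plain strings stand in for encoded video frames and transcribed speech, and a simple keyword match stands in for whatever recall mechanism the real agents use.

```python
import heapq
from collections import deque
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class Event:
    timestamp: float  # seconds since the session started
    source: str       # "video" or "speech"
    content: str      # placeholder for an encoded frame or transcribed phrase


def merge_streams(video: Iterable[Event], speech: Iterable[Event]) -> Iterator[Event]:
    """Interleave two time-ordered streams into a single timeline."""
    return heapq.merge(video, speech, key=lambda e: e.timestamp)


class TimelineCache:
    """Hypothetical bounded cache holding the most recent timeline events."""

    def __init__(self, max_events: int = 1000) -> None:
        self._events = deque(maxlen=max_events)  # oldest events fall off first

    def add(self, event: Event) -> None:
        self._events.append(event)

    def recall(self, keyword: str, source: str) -> Optional[Event]:
        """Return the latest cached event from `source` that mentions `keyword`."""
        for event in reversed(self._events):
            if event.source == source and keyword.lower() in event.content.lower():
                return event
        return None


# Made-up sample data, loosely modeled on the demo dialogue.
video = [
    Event(1.0, "video", "speaker on a shelf"),
    Event(4.0, "video", "glasses on the desk near a red apple"),
    Event(9.0, "video", "whiteboard with a system diagram"),
]
speech = [
    Event(2.5, "speech", "tell me when you see something that makes sound"),
    Event(10.0, "speech", "do you remember where you saw my glasses"),
]

cache = TimelineCache(max_events=100)
timeline = list(merge_streams(video, speech))
for event in timeline:
    cache.add(event)

seen = cache.recall("glasses", source="video")
print(seen.content if seen else "not found")
```

The bounded `deque` is one simple way to keep memory use fixed while still answering questions about recent observations; a production system would presumably use learned representations rather than keyword matching.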

We've also enhanced how they sound with a wider range of intonations. These agents better understand the context you're in and can respond quickly in conversation, making the pace and quality of interaction feel much more natural. Here's a video of our prototype, which you'll see has two parts. Each part was captured in a single take in real time.

Okay, let's do some tests.

Tell me when you see something that makes sound. I see a speaker which makes sound. What is that part of the speaker called?

That is the tweeter. It produces high frequency sounds. Give me a creative alliteration about these. Creative crayons color cheerfully.

They certainly craft colorful creations. What does that part of the code do? This code defines encryption and decryption functions. It seems to use AES-CBC encryption to encode and decode data based on a key and an initialization vector (IV).

That's right. What neighborhood do you think I'm in? This appears to be the King's Cross area of London. It is known for its railway station and transportation connections. Do you remember where you saw my glasses?

Yes, I do. Your glasses were on the desk near a red apple. What can I add here to make this system faster?

Adding a cache between the server and database could improve speed. What does this remind you of? Schrödinger's cat.

All right, give me a band name for this duo. Golden stripes. Nice. Thanks, Gemini.

I think you'll agree it's amazing to see how far AI has come, especially when it comes to spatial understanding, video processing, and memory.

It's easy to envisage a future where you can have an expert assistant by your side through your phone or new exciting form factors like glasses. Some of these agent capabilities will come to Google products like the Gemini app later this year. For those of you on site today, you can try out a live demo version of this experience in the AI sandbox area.

Next, let's take a look at how our innovations are helping people bring new creative ideas to life. Today we're introducing a series of updates across our generative media tools with new models covering image, music and video. Over the past year we've been enhancing quality, improving safety and increasing access.

To help tell this story, here's Doug.

Thanks, Demis. Over the past few months, we've been working hard to build a new image generation model from the ground up, with stronger evaluations, extensive red teaming, and state-of-the-art watermarking with SynthID. Today, I'm so excited to introduce Imagen 3.

It's our most capable image generation model yet. Imagen 3 is more photorealistic. You can literally count the whiskers on its snout. With richer details, like this incredible sunlight in the shot, and fewer visual artifacts or distorted images.

It understands prompts written the way people write. The more creative and detailed you are, the better. And Imagen 3 remembers to incorporate small details, like the wildflowers or small blue bird, in this longer prompt. Plus, this is our best model yet for rendering text, which has been a challenge for image generation models. In side-by-side comparisons, independent evaluators preferred Imagen 3 over other popular image generation models.

In sum, Imagen 3 is our highest quality image generation model so far. You can sign up today to try Imagen 3 in ImageFX, part of our suite of AI tools at labs.google, and it'll be coming soon to developers and enterprise customers in Vertex AI. Another area full of creative possibility is generative music. I've been working in this space for over 20 years, and this is by far the most exciting year of my career.

We're exploring ways of working with artists to expand their creativity with AI. Together with YouTube, we've been building Music AI Sandbox, a suite of professional music AI tools that can create new instrumental sections from scratch, transfer styles between tracks, and more. To help us design and test them, we've been working closely with incredible musicians, songwriters, and producers.

Some of them even made entirely new songs in ways that would not have been possible without these tools. Let's hear from some of the artists we've been working with. I'm gonna put this right back into the Music AI tool. The same boom boom boom boom boom.

What happens if Haiti meets Brazil? Dude, I have no clue what's about to be spat out. This is what excites me.

As a hip-hop producer, we dug in the crates. We're playing these vinyls, and the part where there's no vocal, we pull it, we sample it, and we create an entire song around that. So right now, we're digging in the infinite crate.

It's endless. Where I found the AI really useful for me is this way to fill in the sparser elements of my loops. Okay, let's try bongos.

We're going to put viola. We're going to put rhythmic clapping. And we're going to see what happens there. Woo!

And it makes it sound, ironically, at the end of the day, a little more human. So then this is entirely Google's loops right here. These are gloops.

It's like having this weird friend that's just like, try this, try that. And then you're like, oh, okay, yeah, no, that's pretty dope. Yeah, groupin' up, I said I'm groupin', groupin', I groupin' up, I said go out. The tools are capable of speeding up the process of what's in my head, getting it out.

You're able to move light speed with your creativity. This is amazing. That right there?

I think this really shows what's possible when we work with the artist community on the future of music. You can find some brand new songs from these acclaimed artists and songwriters on their YouTube channels now. There's one more area I'm really excited to share with you. Our teams have made some incredible progress in generative video.

Today, I'm excited to announce our newest, most capable generative video model, called Veo. Veo creates high-quality 1080p videos from text, image, and video prompts. It can capture the details of your instructions in different visual and cinematic styles.

You can prompt for things like aerial shots of a landscape or time lapse, and further edit your videos using additional prompts. You can use Veo in our new experimental tool called VideoFX. We're exploring features like storyboarding and generating longer scenes. Veo gives you unprecedented creative control.

Techniques for generating static images have come a long way, but generating video is a different challenge altogether. Not only is it important to understand where an object or subject should be in space, it needs to maintain this consistency over time, just like the car in this video. Veo builds upon years of our pioneering generative video model work, including GQN, Phenaki, WALT, VideoPoet, Lumiere, and much more. We combine the best of these architectures and techniques to improve consistency, quality, and output resolution.

To see what Veo can do, we put it in the hands of an amazing filmmaker. Let's take a look. Well, I've been interested in AI for a couple of years now.

We got in contact with some of the people at Google and they had been working on something of their own. So we're all meeting here at Gilga Farm to make a short film. The core technology is Google DeepMind's generative video model that has been trained to convert input text into output video.

It looks good. We are able to bring ideas to life that were otherwise not possible. We can visualize things on a timescale that's 10 or 100 times faster than before.

When you're shooting, you can't reiterate as much as you wish. And so we've been hearing that feedback that it allows for like more, more optionality. More iteration, more improvisation. But that's what's cool about it, it's like you can make a mistake faster. That's all you really want at the end of the day.

At least in art, it's just to make mistakes fast.

So, using Gemini's multimodal capabilities to optimize the model training process, Veo is able to better capture the nuance from prompts. So this includes cinematic techniques and visual effects, giving you total creative control. Everybody's going to become a director, and everybody should be a director.

Because at the heart of all of this is just storytelling. The closer we are to being able to tell each other our stories, the more we'll understand each other. These models are really enabling us to be more creative and to share that creativity with each other.

Over the coming weeks, some of these features will be available to select creators through VideoFX at labs.google, and the waitlist is open now. Of course, these advances in generative video go beyond the beautiful visuals you've seen today.

By teaching future AI models how to solve problems creatively, or in effect, simulate the physics of our world, we can build more useful systems that can help people communicate in new ways and thereby advance the frontiers of AI. When we first began this journey to build AI more than 15 years ago, we knew that one day it would change everything. Now that time is here, and we continue to be amazed by the progress we see and inspired by the advances still to come on the path to AGI.

Thanks, and back to you, Sundar.

Thanks, Demis. A huge amount of innovation is happening at Google DeepMind. It's amazing how much progress we have made in a year. Training state-of-the-art models requires a lot of computing power.

Industry demand for ML compute has grown by a factor of one million in the last six years, and every year it increases tenfold. Google was built for this. For 25 years, we have invested in world-class technical infrastructure.
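The two growth figures quoted above are consistent with each other, as a quick Python check shows. This is only a sketch of the arithmetic; treating the growth as a steady tenfold increase compounded once per year is an assumption made for illustration.

```python
# Figures quoted in the talk; the steady compounding model is an assumption.
yearly_growth = 10  # "every year it increases tenfold"
years = 6           # "in the last six years"

total_growth = yearly_growth ** years
print(f"{total_growth:,}x")  # prints "1,000,000x"
```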

That ranges from the cutting-edge hardware that powers Search to our custom tensor processing units that power our AI advances. Gemini was trained and served entirely on our fourth- and fifth-generation TPUs. And other leading AI companies like Anthropic have trained their models on TPUs as well. Today, we are excited to announce the sixth generation of TPUs, called Trillium.

Trillium delivers a 4.7x improvement in compute performance per chip over the previous generation. It's our most efficient and performant TPU today. We'll make Trillium available to our cloud customers in late 2024. Alongside our TPUs, we are proud to offer CPUs and GPUs to support any workload. That includes the new Axion processors we announced last month, our first custom ARM-based CPU with industry-leading performance and energy efficiency. We are also proud to be one of the first cloud providers to offer NVIDIA's cutting-edge Blackwell GPUs.

They will be available in early 2025. We are fortunate to have a long-standing partnership with NVIDIA, and we are excited to bring Blackwell's capabilities to our customers. Chips are a foundational part of our integrated end-to-end system. From performance-optimized hardware and open software to flexible consumption models, this all comes together in our AI hypercomputer, a groundbreaking supercomputer architecture.

Businesses and developers are using it to tackle more complex challenges with more than twice the efficiency relative to just buying the raw hardware and chips. Our AI hypercomputer advancements are made possible in part because of our approach to liquid cooling in our data centers. We've been doing this for nearly a decade, long before it became state-of-the-art for the industry. And today our total deployed fleet capacity for liquid cooling systems is nearly one gigawatt and growing. That's close to 70 times the capacity of any other fleet.

Underlying this is the sheer scale of our network, which connects our infrastructure globally. Our network spans more than two million miles of terrestrial and subsea fiber, over 10 times the reach of the next leading cloud provider. We'll keep making the investments necessary to advance AI innovation and deliver state-of-the-art capabilities.

And one of our greatest areas of investment and innovation is in our founding product, Search. 25 years ago, we created Search to help people make sense of the waves of information moving online.

With each platform shift, we have delivered breakthroughs to help answer your questions better. On mobile, we unlocked new types of questions and answers using better context, location awareness, and real-time information. With advances in natural language understanding and computer vision, we enabled new ways to search, with your voice, or a hum to find your new favorite song, or an image of that flower you saw on your walk.

Now you can even Circle to Search those cool new shoes you might want to buy. Go for it. You can always return them later.

Of course, Search in the Gemini era will take this to a whole new level. Combining our infrastructure strengths, the latest AI capabilities, our high bar for information quality, and our decades of experience connecting you to the richness of the web, the result is a product that does the work for you. Google Search is generative AI at the scale of human curiosity, and it's our most exciting chapter of Search yet.

To tell you more, here's Liz.

Thanks, Sundar. With each of these platform shifts, we haven't just adapted. We've expanded what's possible with Google Search. And now, with generative AI, Search will do more for you than you ever imagined.

So whatever's on your mind, and whatever you need to get done, just ask. And Google will do the Googling for you. All the advancements you'll see today are made possible by a new Gemini model, customized for Google Search.

What really sets this apart is our three unique strengths. First, our real-time information with over a trillion facts about people, places, and things. Second, our unparalleled ranking and quality systems, trusted for decades to get you the very best of the web.

And third, the power of Gemini, which unlocks new agentic capabilities right in Search. By bringing these three things all together, we're able to dramatically expand what's possible with Google Search, yet again. This is Search in the Gemini era. So let's dig in.

You've heard today about AI Overviews and how helpful people are finding them. With AI Overviews, Google does the work for you. Instead of piecing together all the information yourself, you can ask your question.

And as you see here, you can get an answer instantly, complete with a range of perspectives and links to dive deeper. As Sundar shared, AI Overviews will begin rolling out to everyone in the US, starting today, with more countries soon. And by the end of the year, AI Overviews will come to over a billion people in Google Search.

But this is just the first step. We're making AI Overviews even more helpful for your most complex questions, the type that are really more like 10 questions in one. You can ask your entire question with all its sub-questions and get an AI Overview in seconds.

To make this possible, we're introducing multi-step reasoning in Google Search, so Google can do the researching for you. For example, let's say you've been trying to get into yoga and pilates. Finding the right studio can take a lot of research.

There's so many factors you need to consider. Soon you'll be able to ask Search to find the best yoga or pilates studios in Boston and show you details on their intro offers and the walking time from Beacon Hill. As you can see here, Google gets to work for you, finding the most relevant information and bringing it together into your AI Overview.

You get some studios with great ratings and their introductory offers. And you can see the distance for each. Like this one, it's just a 10-minute walk away. Right below, you see where they're located, laid out visually.

And you've got all this from just a single search. Under the hood, our custom Gemini model acts as your AI agent, using what we call multi-step reasoning. It breaks your bigger question down into all its parts, and it figures out which problems it needs to solve and in what order. And thanks to our real-time info and ranking expertise, it reasons using the highest quality information out there.
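As a rough illustration of what "breaking a question into parts and deciding the order" can look like, here is a minimal sketch that models the Boston studio query as sub-questions with dependencies and orders them with a topological sort. The sub-question names and their dependencies are assumptions made up for this example; nothing here reflects how Google Search actually implements multi-step reasoning.

```python
from graphlib import TopologicalSorter

# Hypothetical sub-questions for the Boston studio query. Each key maps to
# the set of sub-questions that must be answered first. These names and
# dependencies are illustrative assumptions, not Google's actual plan.
subtasks = {
    "find_studios": set(),
    "get_ratings": {"find_studios"},
    "get_intro_offers": {"find_studios"},
    "get_walking_time": {"find_studios"},
    "compose_overview": {"get_ratings", "get_intro_offers", "get_walking_time"},
}

def plan(tasks):
    """Return the sub-questions in an order that respects their dependencies."""
    return list(TopologicalSorter(tasks).static_order())

order = plan(subtasks)
print(order)
```

Whatever order the three middle steps come out in, finding the studios always comes first and composing the overview always comes last, which is the "in what order" part of the description above.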

So since you're asking about places, it taps into Google's index of information about the real world with over 250 million places and updated in real time, including their ratings, reviews, business hours, and more. Research that might have taken you minutes or even hours, Google can now do on your behalf in just seconds. Next, let me show you another way multi-step reasoning in Google Search can make your life that much easier.

Take planning, for example. Dreaming of trips and meal plans can be fun, but doing the work of actually figuring it all out? No, thank you. With Gemini and Search, Google does the planning with you. Planning is really hard for AI to get right.

It's the type of problem that takes advanced reasoning and logic. After all, if you're meal planning, you probably don't want mac and cheese for breakfast, lunch, and dinner. Okay, my kids might, but say you're looking for a bit more variety. Now you can ask Search to create a three-day meal plan for a group that's easy to prepare.

And here you get a plan with a wide range of recipes from across the web. This one for overnight oats looks particularly interesting, and you can easily head over to the website to learn how to prepare them. If you want to get more veggies in, you can simply ask Search to swap in a vegetarian dish. And just like that, Search customizes your meal plan. And you can export your meal plan or get the ingredients as a list just by tapping here.

Looking ahead, you can imagine asking Google to add everything to your preferred shopping cart. Then we're really cooking. These planning capabilities mean Search will be able to help plan everything from meals and trips to parties, dates, workout routines, and more. So you can get all the fun of planning without any of the hassle.

You've seen how Google Search can help with increasingly complex questions and planning. But what about all those times when you don't know exactly what to ask and you need some help brainstorming? When you come to search for ideas, you'll get more than an AI-generated answer. You'll get an entire AI-organized page, custom-built for you and your question.

Say you're heading to Dallas to celebrate your anniversary, and you're looking for the perfect restaurant. What you get here breaks AI out of the box, and it brings it to the whole page. Our Gemini model uncovers the most interesting angles for you to explore and organizes these results into these helpful clusters. Like you might never have considered restaurants with live music, or ones with historic charm. Our model even uses contextual factors like the time of the year.

So since it's warm in Dallas, you can get rooftop patios as an idea. And it pulls everything together into a dynamic whole page experience. You'll start to see this new AI-organized search results page when you look for inspiration, starting with dining and recipes and coming to movies, music, books, hotels, shopping, and more.

Today you've seen how you can bring any question to Search, and Google takes the work out of searching. But your questions aren't limited to words in a text box. And sometimes even a picture can't tell the whole story. Earlier, Demis showed you our latest advancements in video understanding, and I'm really excited to share that soon you'll be able to ask questions with video right in Google Search.

Let me introduce Rose to show you this in a live demo.

Thank you, Liz. I have always wanted a record player.

And I got this one and some vinyls at a yard sale recently. But when I go to play it, this thing keeps sliding off. I have no idea how to fix it or where to even start.

Before, I would have pieced together a bunch of searches to try to figure this out. Like, what make is this record player? What's the model?

And what is this thing actually called? But now, I can just ask with a video. So let's try it.

Let's do a live demo. I'm going to take a video and ask Google, why will this not stay in place? And in a near instant, Google gives me an AI Overview. I get some reasons this might be happening and steps I can take to troubleshoot. So it looks like, first, this is called a tonearm.

Very helpful. And it looks like it may be unbalanced. And there's some really helpful steps here. And I love that, because I'm new to all this, I can check out this helpful link from Audio-Technica to learn even more.

So that was pretty quick. Let me walk you through what just happened. Thanks to a combination of our state-of-the-art speech models, our deep visual understanding, and our custom Gemini model, Search was able to understand the question I asked out loud and break down the video frame by frame. Each frame was fed into Gemini's long context window you heard about earlier today, so Search could then pinpoint the exact make and model of my record player and make sense of the motion across frames to identify that the tonearm was drifting.

Search fanned out and combed the web to find relevant insights from articles, forums, videos, and more. And it stitched all of this together into my AI Overview. The result was music to my ears.

Back to you, Liz.

Everything you saw today is just a glimpse of how we're reimagining Google Search in the Gemini era. We're taking the very best of what makes Google, Google, all the reasons why billions of people turn to Google Search and have relied on us for decades.

And we're bringing in the power of Gemini's agentic capabilities. So Google will do the searching, the researching, the planning, the brainstorming, and so much more. All you need to do is just ask.

You'll start to see these features rolling out in Search in the coming weeks. Opt into Search Labs to be among the first to try them out. Now let's take a look at how this all comes together in Google Search this year.

I got that new, new, new, new, new thing. I got that extra, got the real deal. When I just do, do my thing, do my thing, do my thing. I got that hustle, that straight up swag. I got that no, no, no, no, no, no thing. Do my thing, do my thing. I just do it like that, do my thing. I got the new show. No, no, no, no.

Since last May, we've been hard at work making Gemini for Workspace even more helpful for businesses and consumers across the world. Tens of thousands of customers have been using Help me write, Help me visualize, and Help me organize since we launched.

And now, we're really excited that the new Gemini-powered side panel will be generally available next month. One of our customers is a local favorite right here in California, Sports Basement. They rolled out Gemini for Workspace to the organization, and this has helped improve the productivity of their customer support team by more than 30%. Customers love how Gemini grows participation in meetings with automatic language detection and real-time captions, now expanding to 68 languages.

We are really excited about what Gemini 1.5 Pro unlocks for Workspace and AI Premium customers. Let me start by showing you three new capabilities coming to Gmail mobile. This is my Gmail account. Okay, there's an email up top from my husband: "Help me sort out the roof repair thing, please."

Now, we've been trying to find a contractor to fix our roof, and with work travel, I have clearly dropped the ball. It looks like there's an email thread on this with lots of emails that I haven't read. And luckily for me, I can simply tap the summarize option up top and skip reading this long back and forth.

Now, Gemini pulls up this helpful mobile card as an overlay, and this is where I can read a nice summary of all the salient information that I need to know. So I see here that we have a quote from Jeff at Green Roofing, and he's ready to start. Now, I know we had other bids, and I don't remember the details.

Previously, I would have had to do a number of searches in Gmail and then remember and compare information across different emails. Now, I can simply type out my question right here in the mobile card and say something like, "Compare my roof repair bids by price and availability." This new Q&A feature makes it so easy to get quick answers on anything in my inbox, for example, "When are my shoes arriving?" or "What time do doors open for the Knicks game?", without having to first search Gmail, then open the email, and then look for the specific information and attachments, and so on. Anyway, back to my roof. It looks like Gemini has found details that I got from two other contractors in completely different email threads, and I have this really nicely organized summary, and I can do a quick comparison.

So it seems like Jeff's quote was right in the middle, and he can start immediately, so Green Roofing it is. I'll open that last email from Jeff and confirm the project. And look at that.

I see some suggested replies from Gemini. Now, what is really, really neat about this evolution of Smart Reply is that it's contextual. Gemini understood the back and forth in that thread and that Jeff was ready to start. So it offers me a few customized options based on that context. So, you know, here I see I have "Decline the service" and "Suggest a new time."

I'll choose "Proceed and confirm time." I can even see a preview of the full reply simply by long pressing. This looks reasonable, so I'll hit send.

These new capabilities in Gemini in Gmail will start rolling out this month to Labs users. Okay, so one of the really neat things about Workspace apps like Gmail, Drive, Docs, and Calendar is how well they work together. And in our daily lives, we often have information that flows from one app to another. Like, say, adding a calendar entry from Gmail, or creating reminders from a spreadsheet tracker. But what if Gemini could make these journeys totally seamless?

Perhaps even automate them for you entirely. So let me show you what I mean with a real-life example. My sister is a self-employed photographer, and her inbox is full of appointment bookings, receipts, client feedback on photos, and so much more. Now, if you're a freelancer or a small business, you really want to focus on your craft and not on bookkeeping and logistics. So let's go to her inbox and take a look.

Lots of unread emails. Let's click on the first one. It's got a PDF attachment, a receipt from a hotel. And I see a suggestion in the side panel: "Help me organize and track my receipts."

Let's click on this prompt. The side panel now will show me more details about what that really means. And as you can see, there's two steps here.

Step one, create a Drive folder and put this receipt and 37 others it's found into that folder. Make sense? Step two, extract the relevant information from those receipts in that folder into a new spreadsheet. Now, this sounds useful.

Why not? I also have the option to edit these actions or just hit OK. So let's hit OK. Gemini will now complete the two steps described above, and this is where it gets even better. Gemini offers you the option to automate this so that this particular workflow is run on all future emails, keeping your Drive folder and expense sheet up to date with no effort from you. Now we know that creating a complex spreadsheet like this can be daunting for most people. But with this automation, Gemini does the hard work of extracting all the right information from all the files in that folder and generates this sheet for you. So let's take a look.

Okay, it's super well organized. And it even has a category for expense type. Now that we have this sheet, things can get even more fun.

We can ask Gemini questions. Questions like, show me where the money is spent. Gemini not only analyzes the data from the sheet, but also creates a nice visual to help me see the complete breakdown by category.
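To make the receipts workflow concrete, here is a minimal sketch of its second step and the follow-up question: writing extracted receipt fields into a sheet, then totaling spend by category. The receipt records, field names, and amounts are invented for illustration, and the sketch assumes the extraction from attachments has already happened; it is not how Gemini for Workspace does this.

```python
import csv
import io
from collections import defaultdict

# Hypothetical records standing in for fields extracted from receipt
# attachments. Vendors, categories, and amounts are made up.
receipts = [
    {"vendor": "Hotel A", "category": "Lodging", "amount": 240.00},
    {"vendor": "Camera Shop", "category": "Equipment", "amount": 89.50},
    {"vendor": "Hotel B", "category": "Lodging", "amount": 160.00},
    {"vendor": "Taxi Co", "category": "Travel", "amount": 35.25},
]

def to_sheet(rows):
    """Write the extracted fields as CSV text, one row per receipt."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["vendor", "category", "amount"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

def spend_by_category(rows):
    """Sum amounts per expense category, largest total first."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["category"]] += row["amount"]
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))

sheet = to_sheet(receipts)
breakdown = spend_by_category(receipts)
print(breakdown)
```

The category totals are the numbers a chart of "where the money is spent" would be drawn from.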

And you can imagine how this extends to all sorts of use cases in your inbox like travel expenses, shopping, remodeling projects, you name it. All of that information in Gmail can be put to good use and help you work, plan, and play better.

Now this particular ability to organize your attachments in Drive, generate a sheet, and do data analysis via Q&A will be rolling out to Labs users this September, and it's just one of the many automations that we're working on in Workspace. Workspace in the Gemini era will continue to unlock new ways of getting things done.

We're building advanced agentic experiences, including customizing how you use Gemini. Now, as we look to 2025 and beyond, we're exploring entirely new ways of working with AI. Now with Gemini, you have an AI-powered assistant always at your side.

But what if you could expand how you interact with AI? For example, when we work with other people, we mention them in comments and docs, or we send them emails, we have group chats with them, etc. And it's not just how we collaborate with each other, but we each have a specific role to play in the team. And as the team works together, we build a set of collective experiences and context to learn from each other. We have the combined set of skills to draw from when we need help.

So how could we introduce AI into this mix and build on this shared expertise? Well, here's one way. We're prototyping a virtual Gemini-powered teammate.

This teammate has an identity, a Workspace account, along with a specific role and objective. Let me bring Tony up to show you what I mean. Hey, Tony.

Hey, Aparna. Hey, everyone. Okay, so let me start by showing you how we set up this virtual teammate. As you can see, the teammate has its very own account, and we can go ahead and give it a name. We'll do something fun like Chip.

Chip's been given a specific job role, with a set of descriptions on how to be helpful for the team. You can see that here.

And some of the jobs are to monitor and track projects (we've listed a few out), to organize information and provide context, and a few more things. Now that we've configured our virtual teammate, let's go ahead and see Chip in action. To do that, I'll switch us over here to Google Chat. First, when planning for an event like I/O, we have a ton of chat rooms for various purposes.

Luckily for me, Chip is in all of them. To quickly catch up, I might ask a question like, "Anyone know if our I/O storyboards are approved?" Because we've instructed Chip to track this project, Chip searches across all the conversations and knows to respond with an answer.

There it is. Simple, but very helpful. Now, as the team adds Chip to more group chats, more files, more email threads, Chip builds a collective memory of our work together. Let's look at an example. To show you, I'll switch over to a different room. How about Project Sapphire over here?

And here we are discussing a product release coming up, and as usual, many pieces are still in flight, so I can go ahead and ask, are we on track for launch? Chip gets to work not only searching through everything it has access to, but also synthesizing what's found and coming back with an up-to-date response. There it is.

A clear timeline, a nice summary, and notice even in this first message here, Chip flags a potential issue the team should be aware of. Because we're in a group space, everyone can follow along, and anyone can jump in at any time, as you see someone just did, asking Chip to help create a doc to help address the issue. A task like this could take me hours, dozens of hours. Chip can get it all done in just a few minutes, sending the doc over right when it's ready.

And so much of this practical helpfulness comes from how we've customized Chip to our team's needs and how seamlessly this AI is integrated directly into where we're already working. Back to you, Aparna.

Thank you, Tony.

Now, I can imagine a number of different types of virtual teammates configured by businesses to help them do what they need. Now we have a lot of work to do to figure out how to bring these agentic experiences like virtual teammates into Workspace, including enabling third parties to make their very own versions of Chip.

We're excited about where this is headed, so stay tuned. And as Gemini and its capabilities continue to evolve, we're diligently bringing that power directly into Workspace to make all our users more productive and creative both at home and at work. And now over to Sissie to tell you more about the Gemini app.

Our vision for the Gemini app is to be the most helpful personal AI assistant by giving you direct access to Google's latest AI models. Gemini can help you learn, create, code, and anything else you can imagine.

And over the past year, Gemini has put Google's AI in the hands of millions of people with experiences designed for your phone and the web. We also launched Gemini Advanced, our premium subscription for access to the latest AI innovations from Google. Today, we'll show you how Gemini is delivering our most intelligent AI experience. Let's start with the Gemini app, which is redefining how we interact with AI.

It's natively multimodal, so you can use text, voice, or your phone's camera to express yourself naturally. And this summer, you can have an in-depth conversation with Gemini using your voice. We're calling this new experience Live.

Using Google's latest speech models, Gemini can better understand you and answer naturally. You can even interrupt while Gemini is responding, and it will adapt to your speech patterns. And this is just the beginning. We're excited to bring the speed gains and video understanding capabilities from Project Astra to the Gemini app.

When you go live, you'll be able to open your camera so Gemini can see what you see and respond to your surroundings in real time. Now, the way I use Gemini isn't the way you use Gemini. So we're rolling out a new feature that lets you customize it for your own needs and create personal experts on any topic you want. We're calling these Gems.

They're really simple to set up. Just tap to create a Gem, write your instructions once, and come back whenever you need it. For example, here's a gem that I created that acts as a personal writing coach. It specializes in short stories with mysterious twists, and it even builds on the story drafts in my Google Drive. I call it the cliffhanger curator.

Now, Gems are a great time saver when you have specific ways that you want to interact with Gemini again and again. Gems will roll out in the coming months, and our trusted testers are already finding so many creative ways to put them to use.

They can act as your yoga bestie, your personal sous chef, a brainy calculus tutor, a peer reviewer for your code, and so much more. Next, I'll show you how Gemini is taking a step closer to being a true AI assistant by planning and taking actions for you. Now, we all know that chatbots can give you ideas for your next vacation.

But there's a lot more that goes into planning a great trip. It requires reasoning that considers space-time logistics and the intelligence to prioritize and make decisions. That reasoning and intelligence all come together in the new trip planning experience in Gemini Advanced. Now it all starts with a prompt. Okay, so here we go.

We're going to Miami. My son loves art. My husband loves seafood.

And our flight and hotel details are already in my Gmail inbox. Now, there's a lot going on in that prompt. Everyone has their own things that they want to do.

To make sense of these variables, Gemini starts by gathering all kinds of information from search and helpful extensions like Maps and Gmail. It uses that data to create a dynamic graph of possible travel options, taking into account all of my priorities and constraints. The end result is a personalized vacation plan presented in Gemini's new dynamic UI.

Now based on my flight information, Gemini knows that I need a two and a half day itinerary. And you can see how Gemini uses spatial data to make decisions. Our flight lands in the late afternoon, so Gemini skips a big activity that day and finds a highly rated seafood restaurant close to our hotel.

Now on Sunday, we have a jam-packed day. I like these recommendations, but my family likes to sleep in, so I tap to change the start time, and just like that, Gemini adjusted my itinerary for the rest of the trip. It moved our walking tour to the next day and added lunch options near the Street Art Museum to make the most of our Sunday afternoon. This looks great.

It would have taken me hours of work, checking multiple sources and figuring out schedules, and Gemini did this in a fraction of the time. This new trip planning experience will be rolling out to Gemini Advanced this summer, just in time to help you plan your own Labor Day weekend. Alright, we saved the best for last.

You heard Sundar say earlier that starting today, Gemini Advanced subscribers get access to Gemini 1.5 Pro with 1 million tokens. That is the longest context window of any chatbot in the world. It unlocks incredible new potential in AI, so you can tackle complex problems that were previously unimaginable.

You can upload a PDF up to 1,500 pages long, or multiple files to get insights across a project. And soon you can upload as much as 30,000 lines of code or even an hour-long video. Gemini Advanced is the only chatbot that lets you process this amount of information. Now just imagine how useful this will be for students. Let's say you've spent months on your thesis and you could really use a fresh perspective.

You can upload your entire thesis, your sources, your notes, your research. And soon interview audio recordings and videos too. So Gemini has all of this context to give you actionable advice.

It can dissect your main points, identify improvements, and even role play as your professor. So you can feel confident in your work. And check out what Gemini Advanced can do with your spreadsheets, with the new data analysis feature launching in the coming weeks. Maybe you have a side hustle selling handcrafted products. But you're a better artist than accountant, and it's really hard to understand which products are worth your time.

Simply upload all of your spreadsheets and ask Gemini to visualize your earnings and help you understand your profit. Gemini goes to work calculating your returns and pulling its analysis together into a single chart so you can easily understand which products are really paying off. Now, behind the scenes, Gemini writes custom Python code to crunch these numbers. And of course your files are not used to train our models. Oh, and just one more thing, later this year, we'll be doubling the long context window to 2 million tokens.
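The talk does not show the Python that Gemini writes behind the scenes. As a hedged illustration of the kind of calculation described, here is a minimal sketch that computes profit per product and ranks the products. The product names, prices, costs, and column names are invented for this example and are not taken from the demo.

```python
# Hypothetical sales rows for a handcrafted-goods side hustle. Products and
# figures are made-up assumptions for illustration.
sales = [
    {"product": "Candles", "units": 40, "price": 12.0, "unit_cost": 5.0},
    {"product": "Mugs", "units": 15, "price": 20.0, "unit_cost": 14.0},
    {"product": "Prints", "units": 25, "price": 8.0, "unit_cost": 2.0},
]

def profit_by_product(rows):
    """Return (product, profit) pairs sorted from most to least profitable."""
    profits = [
        (r["product"], r["units"] * (r["price"] - r["unit_cost"])) for r in rows
    ]
    return sorted(profits, key=lambda pair: pair[1], reverse=True)

ranking = profit_by_product(sales)
for product, profit in ranking:
    print(f"{product}: {profit:.2f}")
```

With these assumed figures, candles come out ahead even though mugs have the highest price, which is the sort of insight a single chart of profit by product would surface.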

We absolutely can't wait for you to try all of this for yourself. Gemini is continuing to evolve and improve at a breakthrough pace. We're making Gemini more multimodal, more agentic, and more intelligent, with the capacity to process the most information of any chatbot in the world. And as you heard earlier, we're also expanding Gemini Advanced to over 35 supported languages available today. But of course, what makes Gemini so compelling is how easy it is to do just about anything you want with a simple prompt. Let's take a look.

Help me think of titles for my tell-all memoir.

What's something smart I can say about Renoir? Generate another image of a cat playing guitar. If a girl calls me a snack, how do I reply?

Yeah, that's how it works, you're doing AI. Make this email sound more professional before I hit send. What's a good excuse to cancel dinner with my friends? We're literally sitting right here. There's no wrong way to prompt. Yeah, you're doing AI. There's no wrong way to prompt. Just prompt your prompt in the prompt bar. Or just generate an image of a cat playing guitar. You know I can do other stuff, right?

Hi everyone! It's great to be back at Google I/O. Today, you've seen how AI is transforming our products across Gemini, Search, Workspace, and more. We're bringing all these innovations right onto your Android phone, and we're going even further to make Android the best place to experience Google AI. This new era of AI is a profound opportunity to make smartphones truly smart.

Our phones have come a long way in a short time, but if you think about it, it's been years since the user experience has fundamentally transformed. This is a once-in-a-generation moment to reinvent what phones can do. So we've embarked on a multi-year journey to reimagine Android with AI at the core.

And it starts with three breakthroughs you'll see this year. First, we're putting AI-powered search right at your fingertips, creating entirely new ways to get the answers you need. Second, Gemini is becoming your new AI assistant on Android, there to help you anytime. And third, we're harnessing on-device AI to unlock new experiences that work as fast as you do while keeping your sensitive data private. Let's start with AI-powered search.

Earlier this year, we took an important first step at Samsung Unpacked by introducing Circle to Search. It brings the best of Search directly into the user experience, so you can go deeper on anything you see on your phone without switching apps. Fashionistas are finding the perfect shoes.

Home chefs are discovering new ingredients. And with our latest update, it's never been easier to translate whatever's on your screen, like a social post in another language. And there are even more ways Circle to Search can help. One thing we've heard from students is that they're doing more of their schoolwork directly on their phones and tablets.

So we thought, could Circle to Search be your perfect study buddy? Let's say my son needs help with a tricky physics word problem, like this one. My first thought is, oh boy, it's been a while since I've thought about kinematics. If he's stumped on this question, instead of putting me on the spot, he can circle the exact part he's stuck on and get step-by-step instructions, right where he's already doing the work.

Ah. Of course, final velocity equals initial velocity plus acceleration times elapsed time. Right. I was just about to say that. Seriously though, I love how it shows how to solve the problem, not just the answer.
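The relation quoted here is the standard constant-acceleration equation, v = v0 + a·t. A small worked sketch follows; the numbers are assumed for illustration and are not the values from the problem shown on stage.

```python
def final_velocity(initial_velocity, acceleration, elapsed_time):
    """First kinematic equation for constant acceleration: v = v0 + a * t."""
    return initial_velocity + acceleration * elapsed_time

# Assumed example values (not from the demo): an object moving at 2 m/s
# accelerates at 3 m/s^2 for 4 s.
v = final_velocity(2.0, 3.0, 4.0)
print(v)  # 14.0 (m/s)
```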

This new capability is available today. And later this year, Circle to Search will be able to tackle more complex problems involving symbolic formulas, diagrams, graphs, and more. Circle to Search is only on Android.

It's available on more than 100 million devices today, and we're on track to double that by the end of the year. You've already heard from Sissie about the incredible updates coming to the Gemini app. On Android, Gemini is so much more. It's becoming a foundational part of the Android experience.

Here's Dave to share more.

Hey, everyone. A couple of months ago, we launched Gemini on Android. And like Circle to Search, Gemini works at the system level.

So instead of going to a separate app, I can bring Gemini right to what I'm doing. Now we're making Gemini context aware, so it can anticipate what you're trying to do and provide more helpful suggestions in the moment. In other words, to be a more helpful assistant. So let me show you how this works.

And I have my shiny new Pixel 8a here to help me. So my friend Pete is asking if I want to play pickleball this weekend. And I know how to play tennis, sort of. I had to say that for the demo.

But I'm new to this pickleball thing. So I'm gonna reply and try to be funny, and I'll say, is that like tennis but with pickles? This would be actually a lot funnier with a meme, so let me bring up Gemini to help with that. And I'll say, create image of tennis with pickles.

Now, one thing you'll notice is that the Gemini window now hovers in place above the app so that I stay in the flow. Okay, so that generated some pretty good images. What's nice is I can then drag and drop any of these directly into the Messages app below.

So, like so. Cool, let me send that. Alright, so Pete was typing and he says, he sent me a video on how to play pickleball.

Alright, thanks Pete, let's tap on that. That launches YouTube. But you know, I only have one or two burning questions about the game.

And I can bring up Gemini to help with that. And because it's context aware, Gemini knows I'm looking at a video, so it proactively shows me an Ask This Video chip. So let me tap on that, and now I can ask specific questions about the video. So for example, what is, can't type, the two-bounce rule?

Because that's something that I've heard about but don't quite understand in the game. By the way, this uses signals like YouTube's captions, which means you can use it on billions of videos. So give it a moment and there. I get a nice, succinct answer. The ball must bounce once on each side of the court after a serve.

Okay, cool. Let me go back to messages. And Pete's followed up.

And he says, you're an engineer, so here's the official rule book for pickleball. Okay, thanks, Pete. Pete's very helpful, by the way.

Okay, so we tap on that, launches a PDF. That's an 84-page PDF. I don't know how much time Pete thinks I have.

Anyway, us engineers, as you all know, like to work smarter, not harder. So instead of trawling through this entire document, I can pull up Gemini to help. And again, Gemini anticipates what I need and offers me an Ask This PDF option.

So if I tap on that, Gemini now ingests all of the rules to become a pickleball expert. And that means I can ask very esoteric questions like, for example, are spin serves allowed?

And let's hit that, because I've heard that rule may be changing. Now, because I'm a Gemini Advanced user, this works on any PDF and takes full advantage of the long context window. And there's just lots of times when that's useful. For example, let's say you're looking for a quick answer in an appliance user manual.

And there you have it. It turns out, nope, spin serves are not allowed. So Gemini not only gives me a clear answer to my question, it also shows me exactly where in the PDF to learn more. Awesome. OK, so that's a few of the ways that we're enhancing Gemini to be more context-aware and helpful in the moment.

And what you've seen here are the first of really many new ways that Gemini will unlock new experiences at the system level. And they're only available on Android. You'll see these and more coming to hundreds of millions of devices over the next couple of months. Now, building Google AI directly into the OS elevates the entire smartphone experience. And Android is the first mobile operating system to include a built-in, on-device foundation model.

This lets us bring Gemini goodness from the data center right into your pocket. So the experience is faster while also protecting your privacy. Starting with Pixel later this year, we'll be expanding what's possible with our latest model, Gemini Nano with multimodality.

This means your phone can understand the world the way you understand it. So not just through text input, but also through sights, sounds, and spoken language. Let me give you an example. 2.2 billion people experience blindness or low vision. So several years ago, we developed TalkBack, an accessibility feature that helps people navigate their phone through touch and spoken feedback.

Helping with images is especially important. In fact, my colleague Caro, who uses TalkBack, will typically come across 90 unlabeled images per day. Thankfully, TalkBack makes them accessible. And now we're taking that to the next level with the multimodal capabilities of Gemini Nano. So when someone sends Caro a photo, she'll get a richer and clearer description of what's happening.

Or let's say Caro is shopping online for an outfit. Now she can get a crystal clear description of the style and cut to find the perfect look. Running Gemini Nano on-device helps minimize the latency, and the model even works when there's no network connection.

These improvements to TalkBack are coming later this year. Let me show you another example of what on-device AI can unlock. People lost more than $1 trillion to fraud last year.

And as scams continue to evolve across texts, phone calls, and even videos, Android can help protect you from the bad guys, no matter how they try to reach you. So let's say I get rudely interrupted by an unknown caller right in the middle of my presentation. Hello?

Hi, I'm calling from SafeMore Bank's security department. Am I speaking to Dave? Yeah, this is Dave.

Kind of in the middle of something. We've detected some suspicious activity on your account. It appears someone is trying to make unauthorized charges.

Oh yeah, what kind of charges? We can't give you specifics over the phone, but to protect your account... I'm going to help you transfer your money to a secure account we've set up for you.

And look at this. My phone gives me a warning that this call might be a scam. Gemini Nano alerts me the second it detects suspicious activity, like a bank asking me to move my money to keep it safe.

And everything happens right on my phone. So the audio processing stays completely private to me and on my device. We're currently testing this feature and we'll have more updates to share later this summer.

And we're really just scratching the surface of the kinds of fast, private experiences that on-device AI unlocks. Later this year, Gemini will be able to more deeply understand the content of the screen without any information leaving your phone, thanks to the on-device model. So remember that pickleball example earlier? Gemini on Android will be able to automatically understand the conversation and provide relevant suggestions, like where to find pickleball clubs near me. And this is a powerful concept that will work across many apps on your phone.

In fact, later today at the developer keynote, you'll hear about how we're empowering our developer community with our latest AI models and tools like Gemini Nano and Gemini in Android Studio. Also, stay tuned tomorrow for our upcoming Android 15 updates, which we can't wait to share with you. As we said at the outset, we're reimagining Android with Gemini at the core.

From your favorite apps to the OS itself, we're bringing the power of AI to every aspect of the smartphone experience. And with that, let me hand over to Josh to share more on our news for developers. Thank you.

Thanks, Dave. It's amazing to see Gemini Nano do all of that directly on an Android phone.

That was our plan all along, to create a natively multimodal Gemini in a range of sizes, so you all, as developers, can choose the one that works best for you. Throughout the morning, you've heard a lot about our Gemini 1.5 series. And I want to talk about the two models you can access today: 1.5 Pro, which is getting a series of quality improvements that go out right about now, and the brand new 1.5 Flash.

Both are available today globally in over 200 countries and territories. You can go over to AI Studio or Vertex AI if you're a Google Cloud customer to give them a try. Now, both of these models are natively multimodal.

That means you can interleave text, images, audio, and video as inputs and pack that massive 1 million token context window. And if you go to ai.google.dev today, you can sign up to try the 2 million token context window for 1.5 Pro. And we're also adding a bunch of new developer features, starting with video frame extraction.

That's going to be in the Gemini API. Parallel function calling, so you can return more than one function call at a time. And my favorite, context caching.

So you can send all of your files to the model once and not have to resend them over and over again. That should make the long context even more useful and more affordable. It ships next month.
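The idea behind context caching, as described here, can be sketched with a toy in-memory cache. This is a conceptual illustration only and not the Gemini API: the `ContextCache` class, its methods, and the whitespace word count used as a stand-in for tokens are all hypothetical.

```python
import hashlib


class ContextCache:
    """Toy stand-in for a server-side context cache (hypothetical, not the Gemini API)."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}
        # Rough proxy for tokens sent over the wire: whitespace-separated words.
        self.tokens_sent = 0

    def put(self, document: str) -> str:
        """Upload a large document once and return a short handle for it."""
        handle = hashlib.sha256(document.encode("utf-8")).hexdigest()[:16]
        if handle not in self._store:
            self._store[handle] = document
            self.tokens_sent += len(document.split())
        return handle

    def ask(self, handle: str, question: str) -> str:
        """Send only the question; the cached document is looked up by handle."""
        document = self._store[handle]
        self.tokens_sent += len(question.split())
        return f"[answer to {question!r} using {len(document.split())} cached words]"


cache = ContextCache()
rulebook = "rule " * 1000  # placeholder for a long file
handle = cache.put(rulebook)
cache.ask(handle, "Are spin serves allowed?")
cache.ask(handle, "What is the two-bounce rule?")
print(cache.tokens_sent)  # 1009: the document once, plus two short questions
```

Without the cache, each question would resend the full document, so the count in this toy model would grow by roughly 1,000 words per request instead of a handful.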

Now we're using Google's infrastructure to serve these models, so developers like all of you can get great prices. 1.5 Pro is $7 per 1 million tokens. And I'm excited to share that for prompts up to 128K, it'll be 50% less, at $3.50. And 1.5 Flash will start at $0.35 per 1 million tokens.
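To make the quoted prices concrete, here is a rough cost estimate. It is a sketch under stated assumptions: "128K" is read as 128,000 tokens, the Flash "starting" price is applied as a flat rate, any difference between input and output pricing (which the talk does not spell out) is ignored, and the function and constant names are made up for the example.

```python
from decimal import Decimal

# Prices quoted in the talk, in US dollars per 1 million tokens.
PRO_PRICE = Decimal("7.00")
PRO_PRICE_SHORT_PROMPT = Decimal("3.50")
FLASH_PRICE = Decimal("0.35")

# Assumption: "128K" means 128,000 tokens.
SHORT_PROMPT_LIMIT = 128_000
TOKENS_PER_PRICE_UNIT = Decimal(1_000_000)


def estimate_prompt_cost(model: str, prompt_tokens: int) -> Decimal:
    """Estimate the dollar cost of one prompt for "pro" or "flash"."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    if model == "flash":
        rate = FLASH_PRICE
    elif model == "pro":
        # Prompts up to the limit get the discounted rate.
        rate = PRO_PRICE_SHORT_PROMPT if prompt_tokens <= SHORT_PROMPT_LIMIT else PRO_PRICE
    else:
        raise ValueError(f"unknown model: {model}")
    return rate * prompt_tokens / TOKENS_PER_PRICE_UNIT


# The 93,000-token feedback prompt shown later in the AI Studio demo:
print(estimate_prompt_cost("flash", 93_000))
print(estimate_prompt_cost("pro", 93_000))
```

Under these assumptions, a prompt of that size costs a few cents on Flash and roughly ten times that on Pro at the discounted rate.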

Now, one thing you might be wondering is which model is best for your use case. Here's how we've been thinking about it on the team. We use 1.5 Pro for complex tasks, where you really want the highest quality response. And it's okay if it takes a little bit longer to come back. We're using 1.5 Flash for quick tasks, where the speed of the model is what matters the most.

And as a developer, you can go try them both out today and see what works best for you. Now, I'm going to show you how it works here in AI Studio, the fastest way to build with Gemini.

We'll pull it up here, and you can see this is AI Studio. It's free to use. You don't have to configure anything to get going.

You just go to aistudio.google.com, log in with your Google account, and you can just pick the model here in the right that works best for you. So one of the ways we've been using 1.5 Flash is to actually learn from customer feedback about some of our Labs products. Flash makes this possible with its low latency. So what we did here is we just took a bunch of different feedback from our customer forums.

You can put it into Flash, load up a prompt, and hit run. Now in the background, what it's going to do is it's going to go through that 93,000 token pile of information. And you can see here, start streaming it back. Now this is really helpful because it pulls out the themes for us.

It gives us all the right places where we can start to look. And you can see this is from some of the benefits from NotebookLM, like we showed earlier. Now, what's great about this is that you can take something like this in AI Studio, prototyped here in 10 seconds, and with one click in the upper left, get an API key, or over here in the upper right, just tap Get Code.

And you've got all of the model configurations, the safety settings, ready to go straight into your IDE. Now, over time, if you find that you need more enterprise-grade features, you can use the same Gemini 1.5 models and the same configurations right in Vertex AI. That way, you can scale up with Google Cloud as your enterprise needs grow. So that's our newly updated Gemini 1.5 Pro and the new 1.5 Flash, both of which are available today globally. And you'll hear a lot more about them in the developer keynote later today.

Now let's shift gears and talk about Gemma, our family of open models which are crucial for driving AI innovation and responsibility. Gemma is built from the same research and technology as Gemini. It offers top performance and comes in lightweight 7B and 2B sizes. Now since it launched less than three months ago, it's been downloaded millions of times across all the major model hubs. Developers and researchers have been using it and customizing the base Gemma model and using some of our pre-trained variants like RecurrentGemma and CodeGemma.

And today's newest member is PaliGemma, our first vision language open model, and it's available right now. It's optimized for a range of image captioning, visual Q&A, and other image labeling tasks.

So go give it a try. I'm also excited to announce that we have Gemma 2 coming. It's the next generation of Gemma, and it will be available in June. One of the top requests we've heard from developers is for a bigger Gemma model. But it's still going to fit in a size that's easy for all of you to use.

So in a few weeks, we'll be adding a new 27 billion parameter model to Gemma 2. And here's what's great about it. This size is optimized by NVIDIA to run on next-gen GPUs and can run efficiently on a single TPU host in Vertex AI. So this quality-to-size ratio is amazing because it'll outperform models more than twice its size.

We can't wait to see what you're going to build with it. To wrap up, I want to share this inspiring story from India, where developers have been using Gemma and its unique tokenization to create Navarasa, a set of instruction-tuned models to expand access to 15 Indic languages. This builds on our efforts to make information accessible in more than 7,000 languages around the world. Take a look. Language is a very interesting problem to solve actually.

Given India has a huge variety of languages and it changes every five kilometers. When technology is developed for a particular culture, it won't be able to solve and understand the nuances of a country like India. One of Gemma's features is an incredibly powerful tokenizer, which enables the model to use hundreds of thousands of words, symbols, and characters across so many alphabets and language systems.
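Why a large vocabulary matters can be shown with a toy tokenizer. The sketch below uses simple greedy longest-match splitting with a single-character fallback, which is an illustrative stand-in and not Gemma's actual tokenization algorithm; the vocabularies and the `tokenize` helper are hypothetical.

```python
def tokenize(text: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match tokenizer; unknown text falls back to single characters."""
    tokens: list[str] = []
    max_len = max((len(entry) for entry in vocab), default=1)
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until something matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical vocabularies: the larger one also knows a Hindi word in two scripts.
small_vocab = {"na", "ma"}
large_vocab = small_vocab | {"namaste", "नमस्ते"}

print(tokenize("namaste", small_vocab))       # ['na', 'ma', 's', 't', 'e']
print(tokenize("namaste", large_vocab))       # ['namaste']
print(len(tokenize("नमस्ते", small_vocab)))    # one token per code point
print(len(tokenize("नमस्ते", large_vocab)))    # 1
```

With the small vocabulary, text in an unfamiliar script shatters into many tiny pieces; with broader coverage, the same word becomes a single token, which is the effect a large multilingual vocabulary is meant to provide.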

This large vocabulary is critical to adapting Gemma to power projects like Navarasa. Navarasa is a model that's trained for Indic languages. It's a fine-tuned model based on Google's Gemma.

We built Navarasa to make large language models culturally rooted, where people can talk in their native language and get the responses in their native language. Our biggest dream is to build a model to include everyone from all corners of India. We need a technology that can be used by everyone. Today, the language that you speak in could be the tool and the technology that you use for solving your real-world problems. And that's the power of generative AI that we want to bring to every corner of India and the entire world.

Listening to everything that's been announced today, it's clear that AI is already helping people from their everyday tasks to their most ambitious, productive and imaginative endeavors.

Our AI innovations, like multimodality, long context, and agents, are at the cutting edge of what this technology can do, taking to a whole new level its capacity to help people. Yet, as with any emerging technology, there are still risks and new questions that will arise as AI advances and its uses evolve. In navigating these complexities, we're guided by our AI principles, and we're learning from our users, partners, and our own research.

To us, building AI responsibly means both addressing the risks and maximizing the benefits for people and society. Let me begin with what we're doing to address the risks. Here I want to focus on how we are improving our models and protecting against their misuse.

Beyond what Demis shared earlier, we're improving our models with an industry standard practice called red teaming, in which we test our own models and try to break them to identify weaknesses. Adding to this work, we're developing a cutting-edge technique we call AI-assisted red teaming. This draws on Google DeepMind's gaming breakthroughs like AlphaGo, where we train AI agents to compete against each other and improve and expand the scope of their red teaming capabilities. We're developing AI models with these capabilities to help address adversarial prompting and limit problematic outputs.

We're also improving our models with feedback from two important groups, thousands of internal safety experts with a range of disciplines and a range of independent experts from academia to civil society. Both groups help us identify emerging risks from cybersecurity threats to potentially dangerous capabilities in areas like chem bio. Combining human insight with our safety testing methods will help make our models and products more accurate, reliable, and safer.

This is particularly important as technical advances like better intonation make interactions with AI feel and sound more human-like. We're doing a lot of research in this area, including the potential for harm and misuse. We're also developing new tools to help prevent the misuse of our models. For example, as Imagen 3 and Veo create more realistic imagery and videos, we must also consider how they might be misused to spread misinformation.

To help, last year we introduced SynthID, a tool that adds imperceptible watermarks to our AI-generated images and audio so that they're easier to identify. Today, we're expanding SynthID to two new modalities, text and video.

These launches build on our efforts to deploy state-of-the-art watermarking capabilities across modalities. Moving forward, we'll keep integrating advances like watermarking and other emerging techniques to secure our latest generations of Gemini, Imagen, Lyria, and Veo models.

We're also committed to working with the ecosystem, with all of you, to help others build on the advances we're making. And in the coming months, we'll be open sourcing SynthID text watermarking. This will be available in our updated Responsible Generative AI Toolkit, which we created to make it easier for developers to build AI responsibly. We also support C2PA, collaborating with Adobe, Microsoft, startups, and many others to build and implement standards that improve the transparency of digital media. Now, let's turn to the second and equally important part of our responsible AI approach, how we're building AI to benefit people and society.

Today, our AI advances are helping to solve real-world problems, like accelerating the work of 1.8 million scientists in 190 countries who are using AlphaFold to work on issues like neglected diseases, helping predict floods in more than 80 countries, and helping organizations like the United Nations track progress of the world's 17 Sustainable Development Goals with Data Commons. And now, generative AI is unlocking new ways for us to make the world's information and knowledge universally accessible and useful for learning.

Billions of people already use Google products to learn every day. And generative AI is opening up new possibilities, allowing us to ask questions like, what if everyone, everywhere, could have their own personal AI tutor on any topic? Or what if every educator could have their own assistant in the classroom? Today marks a new chapter for learning and education at Google.

I'm excited to introduce LearnLM, our new family of models based on Gemini and fine-tuned for learning. LearnLM is grounded in educational research, making learning experiences more personal and engaging. And it's coming to the products you use every day, like Search, Android, Gemini, and YouTube.

In fact, you've already seen LearnLM on stage today, when it helped Sameer with his son's homework on Android. Now let's see how this works in the Gemini app. Earlier, Sissie introduced Gems, custom versions of Gemini that can act as personal assistive experts on any topic. We're developing some pre-made Gems which will be available in the Gemini app and web experience, including one called Learning Coach.

With Learning Coach, you can get step-by-step study guidance, along with helpful practice and memory techniques designed to build understanding rather than just give you the answer. Let's say you're a college student studying for an upcoming biology exam. If you need a tip to remember the formula for photosynthesis, Learning Coach can help.

Learning Coach, along with other pre-made Gems, will launch in Gemini in the coming months. And you can imagine what features like Gemini Live can unlock for learning. Another example is a new feature in YouTube that uses LearnLM to make educational videos more interactive, allowing you to ask a clarifying question, get a helpful explanation, or take a quiz.

This even works for those long lectures or seminars, thanks to the Gemini model's long context capabilities. This feature in YouTube is already rolling out to select Android users. As we work to extend LearnLM beyond our own products, we're partnering with experts and institutions like Columbia Teachers College, Arizona State University, and Khan Academy to test and improve the new capabilities in our models for learning.

And we've collaborated with MIT RAISE to develop an online course to help educators better understand and use generative AI. We're also working directly with educators to build more helpful generative AI tools with LearnLM. For example, in Google Classroom, we're drawing on the advances we've heard about today to develop new ways to simplify and improve lesson planning, and enable teachers to tailor lessons and content to meet the individual needs of their students.

Standing here today makes me think back to my own time as an undergraduate. AI was considered speculative, far from any real world uses. Today we can see how much is already real, how much it is already helping people from their everyday tasks to their most ambitious, productive and imaginative endeavors, and how much more is still to come. This is what motivates us.

I'm excited about what's ahead and what we'll build with all of you. Back to you Sundar. Thanks, James. All of this shows the important progress we have made as we take a bold and responsible approach to making AI helpful for everyone.

Before we wrap, I have a feeling that someone out there might be counting how many times we have mentioned AI today. And since the big theme today has been letting Google do the work for you, we went ahead and counted so that you don't have to. That might be a record in how many times someone has said AI. I'm tempted to say it a few more times, but I won't. Anyhow, this tally is more than just a punchline.

It reflects something much deeper. We've been AI first in our approach for a long time. Our decades of research leadership have pioneered many of the modern breakthroughs that power AI progress for us and for the industry.

On top of that, we have world-leading infrastructure built for the AI era. Cutting-edge innovation in Search, now powered by Gemini. Products that help at extraordinary scale, including 15 products with over half a billion users, and platforms that enable everyone, partners, customers, creators, and all of you, to invent the future.

This progress is only possible because of our incredible developer community. You're making it real through the experiences you build every day. So to everyone here at Shoreline, and the millions more watching around the world, here's to the possibilities ahead and creating them together. Thank you.

This reminds you of... Schrödinger's cat.

Wow! Okay! When all of these tools come together, it's a powerful combination. It's amazing.

That's amazing. It's an entire suite of different kind of possibilities. Hi, I'm Gemini. What neighborhood do you think I'm in?

This appears to be the King's Cross area of London. Together we're creating a new era.
