Please welcome Andrew Ng. Thank you. It's such a good time to be a builder, so I'm excited to be back here at Snowflake Build.
What I'd like to do today is share with you what I think are some of AI's biggest opportunities. You may have heard me say that I think AI is the new electricity. That's because AI is a general purpose technology, like electricity.
If I ask you what is electricity good for, it's always hard to answer because it's good for so many different things. And new AI technology is creating a huge set of opportunities for us to build new applications that weren't possible before. People often ask me, hey Andrew, where are the biggest AI opportunities?
This is what I think of as the AI stack. At the lowest level are the semiconductors. On top of that is a lot of the cloud infrastructure and tooling, including, of course, Snowflake. And then on top of that are many of the foundation model trainers and model providers.
And it turns out that a lot of the media hype and excitement and social media buzz has been on these layers of the stack, the new technology layers. Whenever there's a new technology like generative AI, a lot of the buzz is on these technology layers, and there's nothing wrong with that. But I think that, almost by definition, there's another layer of the stack that has to work out even better, and that's the application layer. Because we need the applications to generate even more value and even more revenue in order to afford to pay the technology providers below.
So I spend a lot of my time thinking about AI applications. And I think that's where a lot of the best opportunities will be to build new things. One of the trends that has been growing for the last couple of years, in no small part because of generative AI, is faster and faster machine learning model development.
And in particular, generative AI is letting us build things faster than ever before. Take the problem of, say, building a sentiment classifier: taking text and deciding whether it expresses positive or negative sentiment, for reputation monitoring, say. A typical workflow using supervised learning might be that it takes a month to get some labeled data, then a few months to train an AI model, and then another few months to find a cloud service or something to deploy it on. And so for a long time, very valuable AI systems might take good AI teams six to twelve months to build.
And there's nothing wrong with that. I think many people have created very valuable AI systems this way. But with generative AI, there are certain classes of applications where you can write a prompt in days and then deploy it in, again, maybe days. What this means is there are a lot of applications that used to take me, and used to take very good AI teams, months to build that today you can build in maybe ten days or so. And this opens up the opportunity to experiment, build new prototypes, and ship new AI products.
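To make the sentiment example concrete, here is a minimal sketch of what that prompt-based prototype might look like, assuming an OpenAI-style chat client; the model name and prompt wording are illustrative rather than a specific recommendation.

```python
# Minimal sketch of a prompt-based sentiment classifier (illustrative only).
# Assumes an OpenAI-style chat client; the model name and prompt are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def classify_sentiment(text: str) -> str:
    """Return 'positive' or 'negative' for a piece of text."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any capable chat model would do
        messages=[
            {"role": "system",
             "content": "Classify the sentiment of the user's text. "
                        "Reply with exactly one word: positive or negative."},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content.strip().lower()

print(classify_sentiment("The support team resolved my issue in minutes."))
```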
That's certainly the prototyping aspect of it. And here are some of the consequences of this trend. One is that fast experimentation is becoming a more promising path to invention. Previously, if it took six months to build something, then we'd better study it, make sure there's user demand, have product managers really look at it, document it, and then spend all that effort to build it, and hopefully it turns out to be worthwhile.
But now, for fast-moving AI teams, I see a design pattern where you can say, you know what? It'll take us a weekend to throw together a prototype. Let's build 20 prototypes and see what sticks.
And if 18 of them don't work out, we'll just ditch them and stick with what works. So fast iteration and fast experimentation is becoming a new path to inventing new user experiences. One interesting implication is that evaluations, or evals for short, are becoming a bigger bottleneck for how we build things.
It turns out that back in the supervised learning world, if you're collecting 10,000 data points anyway to train a model, then collecting an extra 1,000 data points for testing was fine; it was just an extra 10% increase in cost. But for a lot of large language model based apps, where there's no need to create any training data, if you made me slow down to collect a thousand test examples, boy, that seems like a huge bottleneck.
And so the new development workflow often feels as if we're building and collecting data more in parallel rather than sequentially: we'll build a prototype, and then as it becomes more important and as robustness and reliability become more important, we gradually build up that test data in parallel. But I see exciting innovations still to be had in how we build evals. What I'm seeing as well is that the prototyping of the machine learning piece has become much faster, but building a software application has lots of other steps. Does the product work? Does the design work? Does the software integration work? Does the plumbing work? And then after deployment, DevOps and LLMOps. Some of those other pieces are becoming faster, but they haven't become faster at the same rate that the machine learning modeling part has.
So we take a process, and one piece of it becomes much faster. What I'm seeing is that prototyping is now really, really fast, but going from a prototype to robust, reliable production with guardrails and so on, those other steps still take some time.
But the interesting dynamic I'm seeing is that because the machine learning part is so fast, it's putting a lot of pressure on organizations to speed up all of those other parts as well. So that's been exciting progress for our field. And in terms of how machine learning development is speeding things up, I think the mantra "move fast and break things" got a bad rep because, you know, it broke things.
I think some people interpret this to mean we shouldn't move fast, but I disagree with that. I think the better mantra is "move fast and be responsible." I'm seeing a lot of teams able to prototype quickly and evaluate and test robustly, so that without shipping anything out to the wider world that could cause damage or meaningful harm, smart teams are able to build really quickly and move really fast, but do so in a very responsible way.
And I find this exhilarating: you can build and ship things in a responsible way much faster than ever before. Now, there's a lot going on in AI, and of all the things going on, in terms of technical trends, the one I'm most excited about is agentic AI workflows. If you were to ask what's the one most important AI technology to pay attention to, I would say it's agentic AI. When I started saying this near the beginning of this year, it was a bit of a controversial statement. But now the term AI agents has become so widely used by technical and non-technical people.
It's become a little bit of a hype-y term. So let me just share with you how I view AI agents and why I think they're important, approaching this from a technical perspective. The way that most of us use large language models today is with what's sometimes called zero-shot prompting.
And that roughly means we give it a prompt and ask it to write an essay, or write some other output for us. It's a bit like going to a person, or in this case going to an AI, and asking it to type out an essay by writing from the first word to the last word all in one go, without ever using backspace, just writing from start to finish like that. It turns out people don't do our best writing this way, but despite the difficulty of being forced to write this way, our large language models do, you know, pretty well. Here's what an agentic workflow is like.
To generate an essay, we ask an AI to first write an essay outline and ask it, do you need to do some web research? If so, let's download some web pages and put it into the context of the large language model. Then let's write the first draft, and then let's read the first draft and critique it, and revise the draft, and so on.
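As a rough sketch of how that loop might look in code, assuming a hypothetical llm() helper that wraps a chat-model call and a web_search() helper for the research step:

```python
# Sketch of the essay-writing agentic loop described above (illustrative).
# llm(prompt) and web_search(query) are hypothetical helpers standing in for
# a chat-model call and a search/download step.

def write_essay(topic: str, revisions: int = 2) -> str:
    outline = llm(f"Write an outline for an essay on: {topic}")
    research = ""
    if "yes" in llm(f"Would web research help with this outline?\n{outline}").lower():
        research = web_search(topic)      # download pages into the model's context
    draft = llm(f"Outline:\n{outline}\n\nResearch notes:\n{research}\n\nWrite a first draft.")
    for _ in range(revisions):            # critique-and-revise loop
        critique = llm(f"Critique this draft and list concrete improvements:\n{draft}")
        draft = llm(f"Revise the draft using this critique:\n{critique}\n\nDraft:\n{draft}")
    return draft
```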
And this workflow looks more like doing some thinking or some research and then some revision, and then going back to do more thinking and more research. And by going around this loop over and over, it takes longer, but this results in a much better work output. So in some teams I work with, we apply this agentic workflow to processing complex, tricky legal documents, or to do healthcare diagnosis assistance, or to do very complex compliance with government paperwork.
Many times I've seen this drive much better results than were possible before. One thing I'm going to focus on later in this presentation is the rise of visual AI, where agentic workflows are letting us process image and video data; I'll get back to that. It turns out there are benchmarks that seem to show agentic workflows deliver much better results. This is the HumanEval benchmark, a benchmark released by OpenAI that measures large language models' ability to solve coding puzzles like this one.
And my team collected some data. It turns out that on this benchmark, using the pass@1 metric, GPT-3.5 got 48% right. GPT-4: a huge improvement, 67%. But the improvement from GPT-3.5 to GPT-4 is dwarfed by the improvement from GPT-3.5 to GPT-3.5 wrapped in an agentic workflow, which gets up to about 95 percent, and GPT-4 with an agentic workflow also does much better.
And so it turns out that in the way builders build agentic reasoning, or agentic workflows, into their applications, there are, I want to say, four major design patterns: reflection, tool use, planning, and multi-agent collaboration. To demystify agentic workflows a little bit, let me quickly step through what these workflows mean. I find that agentic workflows sometimes seem a little mysterious until you actually read through the code for one or two of them and go: oh, that's it, that's really cool, but that's all it takes. So let me step through, for concreteness, what reflection with LLMs looks like.
I might start off prompting an LLM as a coder agent, with maybe a system message that tells it its role is to be a coder and write code. So you can tell it: please write code for a certain task, and the LLM may generate code. Then it turns out you can construct a prompt that takes the code that was just generated, copy-pastes that code back into the prompt, and asks it: here's some code intended for a task, examine this code and critique it.
And it turns out that if you prompt the same LLM this way, it may sometimes find some problems with the code or make some useful suggestions for how to improve it. Then you prompt the same LLM with that feedback and ask it to improve the code, and it comes up with a new version. And maybe, foreshadowing tool use:
if you can have the LLM run some unit tests and feed the results of those unit tests back to the LLM, that can be additional feedback to help it iterate further and improve the code. This type of reflection workflow is not magic and doesn't solve all problems, but it will often take the baseline level of performance and lift it to a better level. And it turns out that with this type of workflow, we're prompting an LLM to critique its own output and then use its own criticism to improve it.
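Here is a minimal sketch of that reflection loop, assuming a hypothetical llm() helper for the model call and a run_unit_tests() helper for executing the generated code; neither is a real library API.

```python
# Sketch of the reflection pattern: the same LLM is prompted first as a coder,
# then as a critic of its own output, then as a reviser. llm(prompt) and
# run_unit_tests(code) are hypothetical helpers, not a specific framework.

def reflect_and_improve(task: str, rounds: int = 2) -> str:
    code = llm(f"You are a careful Python programmer. Write code for this task:\n{task}")
    for _ in range(rounds):
        test_report = run_unit_tests(code)        # optional execution feedback
        critique = llm(
            "Here is some code intended for the task below. Examine it and "
            f"critique it.\n\nTask: {task}\n\nCode:\n{code}\n\nTest results:\n{test_report}"
        )
        code = llm(
            f"Improve the code using this feedback.\n\nFeedback:\n{critique}\n\nCode:\n{code}"
        )
    return code
```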
This maybe also foreshadows multi-agent workflows, where you can prompt an LLM to sometimes play the role of a coder and sometimes play the role of a critic that reviews the code. It's actually the same conversation, but by prompting the LLM differently, it sometimes works on the code and sometimes tries to make helpful suggestions, and this lifts the results and improves performance. So that's the reflection design pattern. The second major design pattern is tool use, in which a large language model can be prompted to generate a request for an API call, so it decides when it needs to search the web or execute code or take other actions like issuing a customer refund, sending an email, or pulling up a calendar entry. So tool use is a major design pattern that is letting large language models make function calls.
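As a sketch of what that looks like in practice, here is an OpenAI-style function-calling example; the refund tool and its schema are made up for illustration, not a real service.

```python
# Sketch of the tool-use pattern via OpenAI-style function calling.
# The issue_refund tool and its schema are illustrative, not a real API.
import json
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "issue_refund",
        "description": "Issue a refund to a customer for a given order.",
        "parameters": {
            "type": "object",
            "properties": {
                "order_id": {"type": "string"},
                "amount_usd": {"type": "number"},
            },
            "required": ["order_id", "amount_usd"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Please refund order A1234 for $25."}],
    tools=tools,
)

call = response.choices[0].message.tool_calls[0]   # the model chose to call the tool
args = json.loads(call.function.arguments)
# issue_refund(**args)   # your own implementation would run here
```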
And I think this is expanding what we can do with these agentic workflows. Real quick, here's the planning, or reasoning, design pattern. If you were to give a fairly complex request, say, generate an image where a girl is reading a book, and so on, then an LLM (this is an example adapted from the HuggingGPT paper) can decide to first use an OpenPose model to detect the pose, then after that generate the picture of the girl, then describe the image, and then use text-to-speech, TTS, to generate the audio.
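A rough sketch of that planning idea, assuming a hypothetical llm() helper and placeholder tool functions; the tool names and the JSON plan format are illustrative.

```python
# Sketch of the planning pattern: ask the model to produce an ordered plan of
# tool calls, then execute it. The tools and llm() helper are hypothetical,
# and error handling is omitted for brevity.
import json

TOOLS = {
    "pose_detection": lambda x: x,     # e.g. an OpenPose-style model
    "image_generation": lambda x: x,
    "image_captioning": lambda x: x,
    "text_to_speech": lambda x: x,
}

def plan_and_execute(request: str):
    plan = llm(
        "Break this request into an ordered list of steps, each using one of "
        f"these tools: {list(TOOLS)}. Reply as a JSON list of tool names.\n\n{request}"
    )
    result = request
    for step in json.loads(plan):       # run the chosen tools in order
        result = TOOLS[step](result)
    return result
```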
So in planning, you have an LLM look at a complex request and pick a sequence of actions to execute, in order, to deliver on the complex task. And then lastly, multi-agent collaboration is the design pattern I alluded to, where instead of prompting an LLM to just do one thing, you prompt the LLM to play different roles at different points in time, so that these different simulated agents interact with each other and come together to solve a task.
And I know some people may wonder: if you're using one LLM, why do you need to make this one LLM play the role of multiple agents? Many teams have demonstrated significantly improved performance on a variety of tasks using this design pattern: if you have an LLM specialize on different tasks, playing one role at a time, and have those roles interact, many teams seem to get much better results. I feel like maybe there's an analogy to running jobs on a processor, on a CPU: why do we need multiple processes? It's all the same processor at the end of the day, but we found that having multiple threads or processes is a useful abstraction for developers to take a task and break it down into subtasks. I think multi-agent collaboration is a bit like that too. If you have a big task, then thinking of it as hiring a bunch of agents to do different pieces of the task and then interact sometimes helps the developer build a complex system that delivers a good result. So I think these four major agentic design patterns, these agentic reasoning workflow design patterns, give us a huge space to play with, to build rich agents that do things that frankly were just not possible even a year ago. One aspect of this I'm particularly excited about is the rise of not just large language model based agents but large multimodal model based agents. So given an image like this, if you wanted to use an LMM, a large multimodal model, you could do zero-shot prompting, and that's a bit like telling it: take a glance at the image and just tell me the output.
And for simple image tasks, that's okay. You can have it look at the image and give you the numbers of the runners or something. But it turns out that, just as with large language model based agents, large multimodal model based agents can do better with an iterative workflow where you approach the problem step by step: detect the faces, detect the numbers, put it together. And with this more iterative workflow, you can get an agent to do some planning, write code, run tests, and come up with a more complex plan, expressed as code, to deliver on more complex tasks.
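As a rough sketch of the kind of step-by-step code an agent might produce for that runner example: the detect_objects() and read_bib_number() helpers here are hypothetical stand-ins for a detector and an OCR model, not any particular library.

```python
# Sketch of an iterative visual workflow: detect people step by step instead of
# asking a multimodal model for a one-shot answer. detect_objects() and
# read_bib_number() are hypothetical helpers wrapping a detector and OCR.

def count_runners(image):
    detections = detect_objects(image, label="person")   # step 1: find people
    runners = []
    for det in detections:
        crop = image.crop(det.box)                        # step 2: zoom in on each person
        bib = read_bib_number(crop)                       # step 3: try to read a race bib
        if bib is not None:                               # keep only actual racers
            runners.append({"bib": bib, "box": det.box})
    return len(runners), runners
```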
So what I'd like to do is show you a demo of some work that Dan Maloney and I and the whole Landing AI team have been working on, building agentic workflows for visual AI tasks. If we switch to my laptop, I have an image here of a soccer game, or football game. And I'm going to say, let's see: count the players on the field.
Oh, and just for fun, if you're not sure how to prompt it, after uploading an image this little light bulb here gives some suggested prompts you might try. But let me run this: count the players on the field. And what this kicks off is a process that actually runs for a couple of minutes to think through how to write code, in order to come up with a plan that gives an accurate count of the number of players on the field. This is actually a little bit complex, because you don't want the players in the background, just the players on the field.
I already ran this earlier, so let me just jump to the results. But it says the code has selected seven players on the field, and I think that's actually right. One, two, three, four, five, six, seven. And if I were to zoom in to the model output, now one, two, three, four, five, six, seven, I think that's actually right.
Part of the output is that it has also generated code that you can run over and over; it actually generated Python code that, if you want, you can run repeatedly on a large collection of images. And I think this is exciting, because there are a lot of companies and teams that actually have a lot of visual AI data, a lot of images,
a lot of videos, kind of stored somewhere. And until now, it's been really difficult to get value out of this data. So for a lot of small teams or large businesses with a lot of visual data, visual AI capabilities like the Vision Agent let you take all this data, previously shoved somewhere in blob storage, and get real value out of it. I think this is a big transformation for AI. Here's another example.
This one is given a video; it's another soccer game, or football game. So given the video: split the video into five-second clips, find the clip where the goal is being scored, and display a frame so that you have the output.
I ran this already, because it takes a little bit of time to run. It will generate code and evaluate the code for a while, and this is the output. It says true, 10, 15. So it thinks there's a goal scored around here, between about the 10- and 15-second mark. Right.
And there you go, that's the goal. And also, as instructed, it extracted some of the frames associated with this. So this is really useful for processing video data. And maybe here's one last example of the Vision Agent: you can also ask it to write a program to split the input video into small video chunks every six seconds, describe each chunk, and then store the information in a pandas DataFrame, along with the clip name and start and end time, and return the pandas DataFrame.
So this is a way to look at video data that you may have and generate metadata for it that you can then store in Snowflake or somewhere, to then build other applications on top of. Just to show you the output of this:
So, you know, clip name, start time, end time, and then it has actually written code here, right? It wrote code that you can then run elsewhere if you want, feeding in a stream of video or something, to then generate a lot of these text descriptions.
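A minimal sketch of what that kind of generated indexing code might look like; describe_clip() is a hypothetical helper wrapping a multimodal model, and the file name is illustrative.

```python
# Sketch of the video-metadata step: split a video into fixed-length chunks,
# describe each chunk, and collect the results in a pandas DataFrame.
# describe_clip() is a hypothetical multimodal-model helper; the path is illustrative.
import cv2
import pandas as pd

def index_video(path: str, chunk_seconds: int = 6) -> pd.DataFrame:
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    duration = int(cap.get(cv2.CAP_PROP_FRAME_COUNT)) / fps
    cap.release()

    rows, start = [], 0.0
    while start < duration:
        end = min(start + chunk_seconds, duration)
        rows.append({
            "clip_name": f"{path}:{start:.0f}-{end:.0f}",
            "start_time": start,
            "end_time": end,
            "description": describe_clip(path, start, end),  # multimodal model call
        })
        start = end
    return pd.DataFrame(rows)

df = index_video("soccer_game.mp4")
```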
And using this capability of the Vision Agent to help write code, my team at Landing AI built this little demo app that uses code from the Vision Agent. So instead of us needing to write the code, the Vision Agent wrote the code to build this metadata, and then we indexed a bunch of videos.
So let's see, I was actually browsing. Let's search for "skier airborne," right? I actually ran this earlier; hope it works.
So what this demo shows is: we already ran the code to take the videos, split them into chunks, and store the metadata, and then when I do a search for "skier airborne," it shows the clips that have high similarity, marked here in green. Well, this gets my heart rate up just watching it. Here's another one.
Whoa, all right. And the green parts of the timeline show where the skier is airborne. Let's see. Gray wolf at night. I actually find it pretty fun.
when you have a collection of videos, to index it and then just browse through, right? Oh, here's a gray wolf at night. And this timeline in green shows where the gray wolf at night is. And if I jump to a different part of the video, there's a bunch of other stuff as well that's not a gray wolf at night.
So I think that's pretty cool. Let's see. Just one last example.
So yeah. I've actually been on the road a lot. But if you're searching for your luggage, this black luggage, right? There.
But it turns out there's actually a lot of black luggage. So if you want your luggage, let's say black luggage with rainbow strap, because there's a lot of black luggage out there. Then, you know, there. Black luggage with rainbow strap.
So there are a lot of fun things to do. And I think the nice thing about this is that the work needed to build applications like this is lower than ever before.
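For the search piece, here is a minimal sketch of how that clip lookup might be wired up over the metadata DataFrame from earlier, assuming a hypothetical embed() helper that wraps any text embedding model:

```python
# Sketch of semantic search over clip descriptions: embed each description,
# embed the query, and rank clips by cosine similarity. embed() is a
# hypothetical embedding helper; df is the DataFrame built in the earlier sketch.
import numpy as np

def search_clips(df, query: str, top_k: int = 5):
    clip_vecs = np.stack([embed(d) for d in df["description"]])
    q = embed(query)
    sims = clip_vecs @ q / (np.linalg.norm(clip_vecs, axis=1) * np.linalg.norm(q))
    return df.assign(similarity=sims).nlargest(top_k, "similarity")

# e.g. search_clips(df, "skier airborne")
```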
So let's go back to the slides. In terms of AI opportunities, I spoke a bit about agentic workflows, and here's how that is changing the AI stack. It turns out that in addition to the stack I showed, there's a new emerging agentic orchestration layer. There are orchestration layers like LangChain that have been around for a while and that are also becoming increasingly agentic, through LangGraph, for example.
And this new agentic orchestration layer is also making it easier for developers to build applications on top, and I hope that Landing AI's Vision Agent is another contribution here that makes it easier for you to build visual AI applications to process all this image and video data that you may have had but that was really hard to get value out of until recently. Before I wrap, I want to share with you what I think are maybe four of the most important AI trends. There's a lot going on in AI; it's impossible to summarize everything in one slide. If you made me pick the one most important trend, I would say it's agentic AI. But here are four of the things I think are worth paying attention to.
First, it turns out agentic workflows need to read a lot of text or images and generate a lot of text, so we say they generate a lot of tokens. There are exciting efforts to speed up token generation, including semiconductor work by SambaNova, Cerebras, Groq, and others, and a lot of software and other types of hardware work as well. This would make agentic workflows work much better.
Second trend I'm excited about: today's large language models started off being optimized to answer human questions and human-generated instructions, things like "why did Shakespeare write Macbeth?" or "explain why Shakespeare wrote Macbeth." These are the types of questions that large language models are often asked to answer on the internet. But agentic workflows call for other operations, like tool use, and large language models are now often tuned explicitly to support tool use.
Or, just a couple of weeks ago, Anthropic released a model that can support computer use. I think these exciting developments create a lot of lift, a much higher ceiling for what we can now get agentic workflows to do with large language models that are tuned not just to answer human queries but explicitly to fit into these iterative agentic workflows.
Third, data engineering's importance is rising, particularly with unstructured data. It turns out that a lot of the value of machine learning used to come from structured data, tables of numbers. But with gen AI, we're much better than ever before at processing text and images and video and maybe audio.
And so the importance of data engineering is increasing, in terms of how you manage your unstructured data and its metadata, and the deployment to get the unstructured data where it needs to go to create value. That will be a major effort for a lot of large businesses.
And then lastly, I think we've all seen that the text processing revolution has already arrived. The image processing revolution is in a slightly early phase, but it is coming. And as it comes, many people, many businesses will be able to get a lot more value out of the visual data than was possible ever before. And I'm excited because I think that will significantly increase the space of applications we can build as well. So just to wrap up, this is a great time to be a builder.
Gen AI is letting us experiment faster than ever. Agentic AI is expanding the set of things that are now possible. And there are just so many new applications, in visual AI and beyond, that we can now build that just weren't possible before. If you're interested in checking out the visual AI demos that I ran, please go to va.landing.ai; the exact demos I ran, you can try out yourself online, and you can get the code and run it yourself in your own applications.
So with that, let me say thank you all very much. And please also join me in welcoming Elsa back onto the stage. Thank you.