There are a couple of big pieces of news in the world of AI today. The first one is this: ESM3, simulating 500 million years of evolution with a language model. We'll also talk about a company called Etched, which is releasing what they claim is the fastest AI chip of all time: over half a million tokens per second, much faster than anything by Nvidia or anyone else. If I'm doing my math right, that would be the equivalent of writing all of the Harry Potter books, all of them, in under 3 seconds (the whole series runs to roughly a million words, call it around 1.5 million tokens, so at 500,000 tokens per second that's about three seconds). Tons of big names are backers and supporters of this: Peter Thiel, Balaji, Stanley Druckenmiller. Bryan Johnson's on there too, the "don't die" guy, the one famously trying not to die. But we'll come back to that in a second.

This is getting very interesting. We've heard about AlphaFold out of Google DeepMind and its ability to predict the very complex 3D shapes of proteins. Proteins are the building blocks of life; they're the little factories that make life possible. As this post puts it, proteins are wondrous, dynamic molecules of incredible function: from molecular engines that power motion, to photosynthesis, to skeletons, to sensors like our eyes and ears, to the information-processing systems, AKA our brains. They're the programs that run the operating system of life, and these things are programmable. The ribosome takes the code for a protein in the form of RNA and builds the protein up from scratch: fabrication at the atomic scale. These are little factories programmed to do certain specific things that make everything in life possible. Every cell in every organism on Earth has thousands to millions of these molecular factories. By contrast, the machines, the tools, the factories we build barely scratch the surface; we are nowhere near that level of complexity.

And here's a sentence that I think summarizes the point of this whole thing: if we could learn to read and write in the code of life, it would make biology programmable. We would be able to program life itself. Our trial-and-error approaches would be replaced by logic, and the painstaking experiments by simulation. Right now, for example, how we create certain drugs is, I don't want to say we throw stuff at the wall and see what sticks, but that's not completely inaccurate. We're trying a lot of different things; there are painstaking experiments; there are unforeseen side effects. The process works, but there are a lot of ways in which it could be improved. Being able to write this code, to custom-create proteins and understand how they work, would be a game changer.

So they're introducing themselves and their ESM3. It's a frontier language model, similar to ChatGPT, Gemini, etc., except this one was created to program and create with the code of life. The point of it is to engineer biology from first principles, the same way we engineer structures, machines, microchips, and software. And here they talk about an example of what this thing can do. It's just an illustration, but as you can imagine, pretty much anything that life does, or even things it doesn't do yet, may be possible if we crack this code.

Here they're talking about GFP; remember that acronym, it's a green fluorescent protein. You often see these fluorescent proteins pop up in various research into DNA and gene editing, simply because it's easy to see if an experiment worked or not, right? You look at it: does it glow? If yes, then you've succeeded. They're saying that fluorescent proteins are responsible for the glowing colors of jellyfish, and they're important tools in modern biotechnology.
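As a quick aside, to make that "read and write in the code of life" line concrete: the ribosome really does work like a lookup from three-letter RNA codons to amino acids. Here's a toy Python sketch with just a few entries of the real genetic code table, purely for intuition; this is my own illustration, not anything from the ESM3 work itself.

```python
# Toy illustration (not from the ESM3 paper): the ribosome's "instruction set"
# is a fixed lookup from RNA codons (3 letters) to amino acids. This is a
# tiny slice of the real genetic code table, just for intuition.
CODON_TABLE = {
    "AUG": "M",   # methionine, the usual start codon
    "UUU": "F",   # phenylalanine
    "GGU": "G",   # glycine
    "UAA": None,  # stop codon
}

def translate(rna: str) -> str:
    """Read an RNA string three letters at a time and emit amino acids."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE.get(rna[i:i + 3])
        if amino_acid is None:  # stop codon (or unknown codon) ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("AUGUUUGGUUAA"))  # -> "MFG"
```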
So here they've created esmGFP. Again, GFP is the glowing protein, and their esmGFP, their new protein, has a sequence that's only 58% similar to the closest known fluorescent protein. So, this new glowing protein that they found: how long would something like it take to evolve in the wild, if you just let nature and evolution take their course? Well, they're estimating that the generation of this new fluorescent protein is equivalent to simulating over 500 million years of evolution.

Now, how were they able to create something like this? With most AI models we see today, we basically tokenize things. Large language models take text and turn it into tokens; Sora, OpenAI's text-to-video model, talked about tokenizing various aspects of video. And here, they've tokenized the biological properties of proteins: basically, they've broken them down into chunks that are easily read and understood by these AI neural nets. They've created a model that can reason over three fundamental biological properties of proteins: sequence, structure, and function.

Sequence is kind of the quote-unquote easy part: it's the sequence of amino acids that spells out the protein, kind of like letters in a word or words in a sentence. Then we have the structure, the 3D structure. This is where AlphaFold from Google DeepMind made a big breakthrough: they were able to predict the 3D structure of proteins based on the sequence, something that would be impossible with a brute-force approach, because the number of possible structures is massive, something like more than the number of atoms in the universe, or some insane number like that. And the third thing is function: what do these sequences and structures actually do?

ESM3's vocabulary bridges sequence, structure, and function, all within the same language model. When this process is scaled across billions of proteins and billions of parameters, ESM3 learns to simulate evolution. What this means is that the model can follow prompts to generate new proteins, and it can generate in all three modalities. So it's not just that it can output the structure and the sequence if you're trying to find a certain function; if you say "we want this glowing protein," yes, it can do that, but it can also generate in any of the three modalities. Why is this important? It gives scientists the ability to generate new proteins with an unprecedented degree of control. For example, the model can be prompted to combine a structure, a sequence, and a function to propose a potential scaffold for the active site of, I'm guessing this is PETase, an enzyme that degrades polyethylene terephthalate (I apologize if I'm mispronouncing that). Basically, there has been a long search for ways to create life forms that can break down plastic waste, and here's an example of the model generating the scaffold for the PETase active site through multimodal prompting.
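To make that "tokenizing proteins" idea a bit more concrete, here's a minimal Python sketch of the sequence track only: mapping amino-acid letters to integer IDs the same way an LLM maps text to tokens. To be clear, this is my own toy illustration, not ESM3's actual tokenizer; the real model also discretizes structure and function into their own token vocabularies.

```python
# Minimal sketch of the idea (not ESM3's actual tokenizer): map each of the
# 20 standard amino acids, plus a few special markers, to an integer ID so
# a neural net can consume a protein the way an LLM consumes text.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
SPECIALS = ["<pad>", "<bos>", "<eos>", "<mask>"]
VOCAB = {tok: i for i, tok in enumerate(SPECIALS + list(AMINO_ACIDS))}

def tokenize(sequence: str) -> list[int]:
    """Turn a protein sequence like 'MSKGE...' into model-ready token IDs."""
    return [VOCAB["<bos>"]] + [VOCAB[aa] for aa in sequence] + [VOCAB["<eos>"]]

print(tokenize("MSKGE"))  # a short GFP-like fragment -> a list of integer IDs
```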
They also talk about the emergence of capabilities with scale: ESM3's ability to solve challenging protein design tasks emerges as the model gets bigger. For example, one such task, atomic coordination, tries to design a protein where amino acids that are distant in the sequence end up close together in the structure. So if you had a string where A is here and X is way over there, with a bunch of letters in between, the folded structure might look like a curve that brings A and X right next to each other: close together in space, but divided by a long stretch of sequence. The model is able to achieve atomic-level accuracy in structure generation on this task. We see this with models generally, and it isn't fully understood, there's some debate about it, but it seems like at scale, certain new abilities emerge, or at least they seem to emerge rapidly; maybe there's a more gradual process underneath, but as models get bigger, they develop new skills that are not at all seen in the smaller models. This model can also provide feedback to itself to improve the quality of its own generations.

So, back to the protein they found, the one they're calling the result of simulating 500 million years of evolution, and back to our green fluorescent protein. The discovery of GFP led to a Nobel Prize, and it has become one of the most widely used tools in biotechnology, because it allows scientists to see proteins within cells: since it glows, it's easy to spot. We're not going to dive too deep into the scientific explanation, because that might not be interesting or appropriate for everyone, but I will leave this link, and I suggest that those of you who are interested read it. Specifically, what they're talking about is the fact that producing these fluorescent proteins is hard, even for nature. If 500 million years of evolution is how long it takes to create something like this, it's not easy, right? This mechanism is unique; there are no others like it. Scientists have discovered many variants of this protein in nature and created variants of those natural proteins in the lab. So we've done that before: we find a way that nature did it, and then we create our own copy of it, something similar. With certain mutations we were able to make them even more different from nature, increasing the brightness or changing the color, and more recently, with machine learning techniques, it became possible to extend the search to variants that differ from what nature came up with by up to 20% of the sequence. So they're 20% different; that's as far as we were able to get before this, something still 80% similar to what evolution developed.

But here, they give the model, ESM3, the structure of a few residues in the core of natural GFP, so they show it how nature did it, how it evolved these glowing proteins, and then the AI model is asked to reason in a chain of thought to generate candidates for new glowing proteins. That kind of jumped out at me, the fact that it's using chain of thought. It's the same approach we would use with ChatGPT or the other models, and it has been shown to improve reasoning ability: basically, the model is asked to think through something step by step, so instead of jumping to a conclusion, the problem is broken down into steps. In math class this would be called showing your work, right? Go step by step, show your work. Next, they mention that generating a working fluorescent protein by pure chance is astronomically unlikely; there are more possibilities than the number of atoms in the visible universe.

So the AI model comes up with some ideas. They test 96 generations that it came up with, and they found a number of proteins that fluoresce, including one that was far from any protein in nature. Now, this one was 50x less bright than natural proteins of that sort. They continued that chain of thought, starting from the sequence of that protein, B8 (B8, I think, is just how they label that protein within their batch of experimental generations), and generated another 96 proteins. So it sounds like the AI gave them 96 generations, they found one that was very promising, and they said, "give me 96 more like it." In that second batch, they found several proteins with brightness similar to natural glowing proteins, including the brightest one, in well C10. Again, I think that's just how they're annotating their internal batches of generations. And this is the one that's the big deal: they're calling it esmGFP, where ESM is their model, the evolutionary scale model, and GFP is that green fluorescent protein, the glowing protein.
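That generate, measure, continue-from-the-winner loop is simple enough to sketch. Here's a toy Python version; the "model" and the brightness "assay" below are random stand-ins I made up, not the real ESM3 API or a real lab measurement, so only the loop structure mirrors what they describe.

```python
import random

# Toy sketch of the loop described above: generate a batch of 96 candidates,
# "measure" each one, keep the brightest, and continue the chain of thought
# from the winner. The generator and the assay are random stand-ins, NOT
# the real ESM3 API or a real wet-lab measurement.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def generate_candidate(parent: str) -> str:
    """Stand-in for a model generation: mutate ~5% of positions at random."""
    return "".join(
        random.choice(AMINO_ACIDS) if random.random() < 0.05 else aa
        for aa in parent
    )

def measure_brightness(seq: str) -> float:
    """Stand-in for the wet-lab fluorescence assay."""
    return random.random()

def chain_of_thought_search(seed: str, rounds: int = 2, batch: int = 96) -> str:
    """Two rounds of 96 generations, mirroring the B8 -> C10 story above."""
    best = seed
    for _ in range(rounds):
        candidates = [generate_candidate(best) for _ in range(batch)]
        best = max(candidates, key=measure_brightness)  # keep the brightest
    return best

# Seed with a GFP-like fragment (illustrative only).
print(chain_of_thought_search("MSKGEELFTGVVPILVELDGDVNGHKFSVSG"))
```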
And this is where it gets even more interesting: this will be an open model. Since inception, the project has committed to open science with code and model releases. So basically, both the model weights (the trained neural net, the "brain") and the code that helps the model run, all the scaffolding and architecture around it, are being released openly. They believe that sharing research and code accelerates progress and contributes to understanding and reducing risk, ultimately maximizing positive impact for the world. They're releasing the weights and code for the ESM3 1.4-billion-parameter open model, and they're excited to see what you create.

Now, I know some of you are probably filled with dread at that statement: "yeah, let's see what you can create with this godlike ability to modify life." And I know some of us are very excited. Certainly a lot of pain, suffering, and ill health could be reduced and prevented with something like this, potentially even one day unlocking, if not immortality, then at least some way to slow down aging to where it's not even perceptible. This idea of negligible senescence has been thrown around: aging so slowly that you don't even notice, so you basically stay at your current biological age forever.

Can you guess where this link takes us if we click on it? It goes to GitHub. That's right, you can download and run this thing on your local computer. It's written in Python. It looks like you have to accept the non-commercial license, so this is for research and nonprofit use only. And of course, this is the smallest and fastest model in the family: they're releasing the 1.4-billion-parameter model, and it looks like the largest model they have is 98 billion parameters.
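For anyone who wants to poke at that 1.4B open model, the repo's README showed an example along these lines when it launched. I'm reproducing the gist from memory, so treat the exact class and argument names as assumptions that may have changed; check the GitHub repo for the current API.

```python
# Sketch based on the EvolutionaryScale esm repo (pip install esm) as it
# looked at release; exact names are an assumption and may have changed.
from esm.models.esm3 import ESM3
from esm.sdk.api import ESMProtein, GenerationConfig

# Downloads the 1.4B open weights and loads them (GPU recommended).
model = ESM3.from_pretrained("esm3_sm_open_v1").to("cuda")

# Prompt with a partial sequence; "_" marks positions for the model to fill in.
protein = ESMProtein(sequence="___________DQATSLRILNNGHAFNVEFDDSQ_______")
protein = model.generate(protein, GenerationConfig(track="sequence", num_steps=8))
print(protein.sequence)
```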
Mr. Yann LeCun himself gives kudos to the team behind EvolutionaryScale AI. He's saying: an AI for proteomics. Proteomics is kind of like genomics, but for proteins: the ability to understand the makeup of proteins and how to program them. So, an AI for proteomics, from a startup that just came out of stealth, introducing ESM3, the 98-billion-parameter generative LLM for programming biology. The company was formed by former members of the Meta FAIR protein group, so they have ties to Facebook/Meta. And I couldn't help but laugh at this, because as you know, Yann LeCun really doesn't like LLMs; he thinks LLMs are silly and foolish and will not lead to AI or AGI or any sort of understanding of the world. He even went as far as to tell all the new students and kids going into college to learn about AI that the best advice he could give them was: don't work on LLMs. There are a number of people jumping in here, saying, okay, so apparently that doesn't apply to this kind of research, since this model is indeed an LLM.

This is their website, evolutionaryscale.ai. Some of the use cases they're thinking about are, for example, creating proteins (or imagining how to create proteins) to capture carbon, enzymes that break down plastic, and various new medicines. And if you're interested, they have their paper, which you can get as a PDF. It's massive, with tons of stuff in it, including a huge appendix that's just a gold mine of material to dig into. I am incredibly excited to see where people take this.

In other news, this company claims to have developed the fastest AI chip of all time. The company is Etched, it's coming out of stealth mode, and it's claiming to be able to run, for example, Llama 70B (the 70-billion-parameter model) at over 500,000 tokens per second. That's like the entire Harry Potter collection in under 3 seconds, which they say would let you build products that are impossible on GPUs, for example Nvidia's GPUs. They're saying one 8x Sohu server (Sohu is the new chip they're introducing) replaces 160 Nvidia H100s.

How? Well, remember ASICs, from Bitcoin mining, for example: you might remember seeing images of those Bitcoin mining farms with the little rectangular units with fans attached. Basically, an ASIC is a processor specialized to do just one thing and nothing else, whereas something like an Nvidia GPU can do a lot of different things: it can be used as a graphics accelerator for computers, for mining Bitcoin, for training neural nets, for running inference on neural nets. It's very versatile. These ASICs are specialized, in this case, for running inference on Transformer models. All the ChatGPTs, the Geminis, everything like that runs on the Transformer architecture, and by specializing, they get way more performance. The tradeoff is that these chips can't run any other types of neural nets; they can't run convolutional neural nets or any other AI models. But the things that are very popular today, ChatGPT, Claude, Gemini, Sora, all of that is powered by Transformers.

They're saying their chips are faster and cheaper than even Nvidia's next-generation Blackwell, which was recently announced; apparently, they're more than 10 times faster than Blackwell. Now, these chips aren't out yet, so we're not able to get our hands on them. But it's interesting, because this is yet another potential competitor for Nvidia, along with Groq and now this brand-new ASIC for Transformer models. It'll be interesting to see how successful they are, and certainly it'll be interesting to see how this whole space develops.

With that said, my name is Wes Roth, and thank you for watching.