Do you remember where you were on the 30th of November 2020? Probably not, I'd imagine. But that date sticks in my mind because I got a text from a friend and colleague, geneticist Adam Rutherford. It read: "Have you seen the news? The protein folding problem, exclamation mark." Wow, I thought, they've only gone and done it.

[Music]

In season one of this podcast, I told the story of how the AI company DeepMind was using artificial intelligence to take on a 50-year-old grand challenge in biology. Now, only a year later, DeepMind had unveiled an AI system called AlphaFold that virtually solved the problem of predicting the 3D shapes of proteins. The scientific world was full of excitement. A headline in the journal Nature declared, "This will change everything." The president of the UK's Royal Society called it a stunning advance that had come decades before many in the field would ever have predicted. And Forbes dubbed it the most important achievement in AI ever.

[Music]

I'm Professor Hannah Fry. I'm a mathematician, and I have been following the developments in artificial intelligence for as long as I can remember. In season two of DeepMind: The Podcast, I've gone back inside the lab to speak to countless researchers, scientists and engineers about their work. And I must say, I've seen some pretty extraordinary things, from robots that have taught themselves to walk around...

Looks like it's a drunk robot.

...to AIs that can predict the weather...

It is important to know if there's going to be a buildup of a catastrophic storm.

...and generate human-sounding voices. But it seemed only right to kick off the first episode with what is probably DeepMind's biggest breakthrough so far. Coming up, we are gonna hear the full story of AlphaFold from the people who were in the room where it happened.

I remember sitting on the plane on the way home, writing out, well, what are all the ideas that might actually work? And just sitting there going, this is amazing and this is terrifying.

We'll find out how AlphaFold could rapidly
accelerate the drug discovery process...

We got hundreds and hundreds of emails from excited biologists wanting to try out their system.

...and paved the way for new discoveries.

I think it's woken up the scientific world to the possibility of what AI could do.

This is episode one: A Breakthrough Unfolds.

One of those scientists who is paying particular attention to the AlphaFold news is John McGeehan, professor of structural biology at Portsmouth University.

The first time I saw the story about AlphaFold, I thought, that can't be true, because it's about two decades too early. So I immediately started checking the details of this to make sure I was understanding it correctly. I literally emailed DeepMind and said, we could really use this technology.

To see how much of a game changer John and the scientific world think the AlphaFold breakthrough might be, I want you to come with me on a trip to the bottom of the ocean. Every year, somewhere between 8 and 12 million metric tons of plastic waste gets dumped into the oceans, and John has made it his scientific mission to do something about it.

[Music]

Some of that plastic is used once, sometimes just for minutes, in the case of maybe a fizzy drinks bottle. And if that enters the ocean, some plastics last hundreds of years in that sort of environment. What tends to happen is, through sunlight and waves, they break down, but only into smaller and smaller pieces, and they end up being microplastics. Wildlife consume this plastic, it ends up in our food chain, and I think we're only just starting to understand the potential impact.

There might just be a twinkling ray of hope on the horizon. In March 2016, a Japanese research group announced in the journal Science that they had discovered a species of bacteria that could break the molecular bonds of one of the world's most used plastics: polyethylene terephthalate, also known as PET or polyester. PET is found in products from fizzy drink bottles to textiles. John's collaborator in the US, Greg Beckham, instantly
spotted the potential of this discovery and gave him a call.

And he said, this Japanese group has literally found a bacterium living in a plastic waste dump that is digesting plastic. It's actually using the plastic as a food source. We had to figure out how these enzymes worked and if we could make them faster, because the potential here is, if you can break down plastic into its building blocks, then suddenly you can recycle it infinitely.

We'll find out exactly how John McGeehan and his team are using AlphaFold to help fight plastic pollution a bit later on. But for now, because AlphaFold predicts the 3D structures of proteins, it's helpful to find out a bit more about how proteins work in the human body. We'll make it lots of fun, I promise, because proteins are truly fascinating things.

Proteins are these incredibly small machines that are extraordinarily unusual for the way that we think of machines, but they do the kind of mechanical and chemical things that the cell needs to.

Let me introduce you to John Jumper, who leads the AlphaFold project.

The human cell has about 20,000 of these nanomachines, and people make entire careers explaining the function of one of these proteins.

Proteins are the workforce behind practically everything that happens within the human body. You'll find them in our immune system: antibodies are a type of protein we release that can fight bacteria and viruses. The protein insulin is what allows our bodies to absorb glucose. Proteins even work to shepherd other molecules around the body, as in the case of one particularly famous protein.

Hemoglobin is a protein whose job it is to carry oxygen within your red blood cells. Your lungs are extracting that from the air, but then it needs to be carried throughout the body, and so it is attached to this hemoglobin protein, which later releases it where it's needed.

And yet these powerful, naturally occurring machines are made from only a small list of building blocks.

All the machines that we have in kind of the normal
natural world are really, really complex, and the proteins, they have this lovely kind of modular system where there's 20 types of amino acids, and a protein in its simplest form is just a line of these amino acids.

These 20 amino acids, tiny molecules made up of a handful of atoms, are the building blocks of all human proteins. To make a protein, these amino acids are strung together in a sort of miniature cellular factory called the ribosome.

How about an analogy like one of those bracelet kits? My daughter has one, right, and she wants to spell her name, Sarah. So she grabs the S and she puts it on the line. She grabs the A, she puts it on the line. She'll surely grab a heart and put it on the line. In the same sort of way, you have beads in a line, or these amino acids that are attached one at a time, and when it's done, the protein leaves the ribosome.

Within milliseconds, the forces of physics gently tease the line of amino acids into intricate 3D shapes, which give function to the proteins and allow them to work as miniature machines. You can think of it like origami: a blank piece of paper might hold all manner of potential, but it only becomes meaningful once it's folded into shape.

The 1D shape, the line, isn't so good at telling us how these machines work, how we can build new ones. They really need to assemble into this intricate 3D shape that we can start to reason about. Seeing a structure is really the beginning of explaining why the protein is that way, how it misfunctions, how we might be able to develop drugs. But it's extraordinarily difficult to get this kind of information.

[Music]

Most drugs work by binding to a particular protein in a specific way. Beta blockers stop the protein receptors from noticing the adrenaline floating past. Antibiotics work by interfering with a bacterium's proteins. This is partly why understanding the shape of a folded protein is so important. So what do folded proteins look like? Well, these things are tiny. It's not like you could just look at them down a
microscope to see what shape they are. At that scale, they might as well be invisible. To work out the structure, you normally have to use something called X-ray crystallography.

First of all, you have to express your protein, so you need to coax a bacterium into producing enough of it for your experiments.

Here's Kathryn Tunyasuvunakool, a senior research scientist.

Then you have to be able to get it out of the cell in sufficient quantities. Then you need to get it to form crystals, which is often the most difficult part.

And when you say the most difficult part...

Yeah.

...are we talking weeks?

It's very variable. So if you happen to be working with a protein that's really well studied, it can be a very quick process. If it's something that's never been crystallized before, it can take years. So I've had emails from people who said, you know, we've worked on this for 10 years, we can't even make enough of the stuff to start our experiments. That's an extreme case, but it can be as bad as that.

[Music]

For the last 50 years, numerous scientists have dedicated their entire working lives to painstakingly uncovering the intricate structure of proteins, one at a time, slowly and surely building a collective body of knowledge. But finding a shortcut, getting from the string of amino acids straight to the folded protein, has long been considered one of the grandest challenges in biology.

[Music]

At some point in those intervening decades, in a pub in Cambridge, Demis Hassabis, the man who would go on to co-found DeepMind, first heard about the protein folding problem as an undergrad.

One of my friends was obsessed with this problem, and we used to go to the pub, and in almost any situation he would crowbar in the idea of, like, if we solve protein folding, that would unlock all the biology.

Back in those days, no one had quite managed to get computers to reliably predict how a general protein might fold. You could do it for tiny proteins where the physics equations were tractable, but any larger and there were too many
variables. But that didn't put Demis off from wanting to have a go.

There was this game called Foldit. People were playing a game to fold proteins and making some progress. They were not trained biologists, they were gamers. They were able to find two or three protein structures that were actually published in Nature. They were actually quite big breakthroughs. So then finally, when we did AlphaGo, I thought, well, if we're able to mimic the intuitive process of a Go player, then why couldn't we do what these Foldit players were doing for folding proteins?

In 2016, before the team had even caught the flight home after AlphaGo's victory, Demis was already eyeing up the next big challenge for AI to work on.

Protein folding. I'm sure we can do that.

His conversation with AlphaGo's lead researcher, David Silver, was captured by a film crew that followed them down a street in Seoul, where the match had taken place.

It's just amazing seeing how quickly a problem that is seen as being impossible can change to being continuously done. We can solve protein folding.

There were good reasons for thinking that an AI approach might work.

It's a very interesting problem, in the sense that it has a very large search space.

Pushmeet Kohli leads DeepMind's science team.

The space of possible structures is just infinite.

And yet it has definite patterns hiding within.

One of the things that makes this problem great is the process is extremely repeatable and specific. The sequence makes the structure. But it's also so complex that we don't know how to write it down specifically. But we know how to write down a machine that, if you show it enough data, will learn these.

Something else that made protein folding suitable for AI was the decades of work by experimental biologists to find the structures of proteins one by one. It meant an AI could peek at the back of the textbook to learn how the patterns worked. And much like in the game of Go, protein folding came with a ready-made way to measure how well the algorithm was
performing. For nearly three decades, there has been a regular global competition amongst biologists who use computers to fold proteins. It's called the Critical Assessment of Protein Structure Prediction, or CASP.

The way it works is you take a set of 100 proteins where you're challenged to say, I've got the sequence, what does the structure look like?

Professor John Moult co-founded the CASP competition in 1994.

And to all work on those hundred in parallel as a community, and now you can compare results.

The accuracy of a protein fold prediction in CASP is measured using the global distance test, or GDT. You take a visual sketch of the true folded protein and then superimpose the calculated competition entries on top of it. The more amino acids that end up roughly where they should be, the higher the score.

It's on a scale of 0 to 100, where 100 would be exact agreement with the experimental structure. So if you get to something like 90, then you're arguing with the experimentalists whether your model is better than his structure.

DeepMind set up a new team with the task of building an AI model that could predict how a protein will fold from its amino acid sequence. It was led by John Jumper. How far away did you think success was at the very beginning?

I wouldn't say that we were sure of success. There was always this feeling that it's complex, but there's no rule that it can't be understood. But I would say it was absolutely unclear that we were going to be able to build it.

I have always been one of the skeptics on the team. I sort of expected maybe 15 years from now we would have a system like this. And also, I think in biology there is quite a bit of skepticism, often, about computational methods. We think of experiment as being the last word, and it can be an uphill struggle to convince a biologist that a computational method is really going to help them.

The first version of AlphaFold is the one we talked about in season one of this podcast. The team repurposed an AI technique normally used
for image analysis so that it could work out which beads, or amino acids, should be positioned close to each other in the final protein. Then the model crunched through the different structures that could possibly fit. It seemed to work pretty well, although it was slow and it took a lot of computational power. In April 2018, DeepMind entered AlphaFold into the CASP competition for the first time. Before the results were released, one member of team AlphaFold tried to predict how well they had done.

He had run the numbers, and by his results we were 20th. It was one of the more depressing moments of my career, finding out that after all this effort we weren't even close to the top. We had a meeting where we were very seriously saying, okay, are we going to solve this problem, or should we be doing something else that we can solve well?

Give up? Was that really on the table?

That was, absolutely. I would say, as a scientist, my job is to find a place to be useful. You're always saying, do I want to spend my time in this direction, or do I want to go in a different direction?

But let's not get ahead of ourselves here. There was just a small issue with that prediction that AlphaFold had come in 20th place at CASP.

We found out that we'd just computed it wrong. We'd just done the math wrong.

At least you did the math right when it was important, on the actual algorithm.

We get the right answer eventually.

In actual fact, AlphaFold came top in CASP that year. It was a strong result, but it was still very far from giving the experimental methods a run for their money. As well as being painfully slow, the first version of AlphaFold achieved 58.9 as its average GDT score. Remember, you need to be getting over 90 to have a structure that would be considered true to the real folded protein. After the CASP win, DeepMind decided to redouble their efforts.

We tried to improve the CASP system, and it was giving us some minimal performance improvements, but we were nowhere close to making the jump that we wanted.

And then there
was a breakthrough. AlphaFold 2 started by throwing away everything that had come before. The data fed in was almost entirely unchanged, but the repurposed image recognition system was gone, replaced by an AI that had been redesigned from the ground up just to understand protein folding.

[Music]

But how much better was this new and improved version of AlphaFold, really? In 2020, the AlphaFold team prepared to enter the next CASP competition, keeping a close eye on its score.

GDT has gotten better. I think we're making incredibly fast progress, and I think we'll continue to. I think it's an important...

Within our meetings, we went through this tradition where we would play a song from the year that corresponded to our GDT. And so if we were at GDT 80, we would play a song from 1980 or something like that.

And everyone's just like, when we get to the Spice Girls, we know it's okay.

Absolutely. And I don't think we ever did get to the Spice Girls either, because that would be '97 or so.

What did you get to?

1990, MC Hammer, something like that.

We would have these running sweepstakes predicting what would be the accuracy of AlphaFold.

Who won the sweepstake?

I am not going to tell you who won the sweepstakes or what was my prediction.

But did you go live at some points?

I did follow. Whether the type of accuracy we are seeing today was possible was a completely open question. We would have these meetings with Demis, and everyone would think, well, Demis is going to push our target even higher, and they were not really sure if that was possible.

[Music]

I just felt that all the ingredients meant there should be a solution to this. And I also believed the original biologists who said 50 years ago that this should be possible.

In May 2020, the new CASP competition kicked off online. Nearly a hundred groups around the world submitted their predictions for 90 protein targets. Once team AlphaFold had submitted their predictions, they faced an agonizing wait for the results to be verified.

But after a month or so, I
got this email from John Moult, the founder of CASP, and it had a title, I think it was "Zoom request".

You're not going to turn down that Zoom, are you?

No. And so we dialed in to a pre-existing team social, and John just said, I'm going to read you an email.

It is from John Moult: "As I expect you know, your group has performed amazingly well in CASP14, both relative to other groups and in absolute model accuracy."

It didn't tell us our exact score, but it did tell us the method seemed to be an outstanding one.

How did the rest of the social go? Did you get to crack out the champagne?

Fantastically well. Yes, I was drinking some bourbon, and we were swapping lots of reminiscences and stories.

Oh, and the final score in CASP? An average of 92.4 across all targets. Here's CASP's founder, John Moult.

To be able to fold up single proteins to this atomic accuracy is a really amazing and satisfying thing. As we originally framed the problem, this is a solution. But in many ways it's just a start.

After CASP14, we got hundreds and hundreds of emails from excited biologists wanting to try out their system and use it for particular problems. We did pick out a few cases where we thought AlphaFold was really going to make an impact.

One of those early users of AlphaFold was John McGeehan, whose work on plastic pollution we heard about at the beginning of this episode. To understand how AlphaFold is helping John and his colleagues, it's important to appreciate that those plastic-eating enzymes that could help clean up our oceans are in fact proteins.

They're non-living proteins that break chemical bonds. And, you know, if you just had lunch or dinner, they'll be working in your stomach at the moment, digesting all those different carbohydrates into sugars. The enzymes we're interested in work in the same way, but they actually start breaking down plastics.

[Music]

When that group of Japanese scientists discovered an enzyme that could digest PET plastic, John and his team set about trying to understand how it worked on a
molecular level.

Currently we dig up oil and gas and we distill from those the chemical building blocks for plastics, and we stick those together in long chains. What the enzyme does is do the opposite of that. It comes along like a big pair of molecular scissors and cuts those bonds, and then gives you back those building blocks.

We know there is at least one enzyme that can break down one specific type of plastic. By looking through the database of sequenced proteins, the scientists found a bunch of others that look like they might work in a similar way for other plastics. But to make any progress, they really needed to see how they are folded and structured in three dimensions, and understand how they might work to eat chains of plastic polymers.

Solving all these structures takes an awful long time. We've got 100 enzymes sitting there, all needing structures, and each structure takes maybe six months, sometimes even a year, depending how tricky it is.

100 enzymes, any of which could have gigantic potential for our planet, sat in a queue waiting for months or years to have their structure worked out. Without knowing what they look like, there is little hope of designing synthetic versions that could work better and faster than the one discovered in Japan. And this is where AlphaFold comes in. John McGeehan sent DeepMind seven enzymes of interest, two of which his team had already solved the structure for.

Rather spectacularly, in a couple of days they came back, and the first thing I did was to compare the structures that we already knew to what they'd predicted, and they're almost perfect, actually. Not only did it get the three-dimensional fold right, it got all the positions of where all those key atoms are in the structure correct as well. AlphaFold is now in the process of resolving 100 enzymes that we've chosen, but it needn't stop there. We could look at thousands of enzymes, and that's incredibly exciting, because this will massively accelerate the development of enzymes for different
plastic recycling.

Beyond tackling plastic pollution, the AlphaFold team were well aware that there were other scientists, like those working in drug discovery, who would find this new tool useful. Another organization that was an early beneficiary of AlphaFold is the Drugs for Neglected Diseases initiative, an international non-profit dedicated to developing new treatments for neglected tropical diseases.

These are a group of preventable infectious diseases that really affect 1.7 billion people.

Dr. Monique Wasunna is DNDi's regional director, based in Nairobi. Over the years she has treated many people suffering from parasitic diseases, but one patient stands out in her mind from her time as a junior doctor in Kenya.

This particular middle-aged man had been unwell for many months, and he was just really getting weaker and weaker. So he heard about our team that was stationed many, many miles away, and he took a walk, day and night, for five good days.

The patient had been infected with leishmaniasis, a parasitic disease found mainly in eastern Africa, parts of the Americas and the Middle East. Symptoms start within a couple of weeks of being bitten by a female sandfly.

You start having a fever. Your glands might start swelling, your liver, your spleen, and in the end, if you're not treated, you die.

The most deadly form, visceral leishmaniasis, poses a risk to 600 million people worldwide and causes an estimated 20,000 to 40,000 deaths a year.

This patient survived, but how many patients like him will die, and nobody knows that they have died of leishmaniasis?

It is possible to treat leishmaniasis, but you need a long, painful course of drugs over 17 days, and that can come with severe side effects.

They are toxic. They damage your liver, your kidneys. We have to keep the faith, because we haven't gotten the ultimate treatment.

Typically, neglected tropical diseases don't get on the radar for pharmaceutical companies, because there's no market incentive for them.

Charles Mowbray leads DNDi's drug discovery
effort from Geneva.

The needs of these patients are huge. They deserve better treatments, the way people living in wealthy countries benefit from the advances in modern medicine.

Charles and his colleagues are looking for Monique's ultimate treatment: a drug that can kill Leishmania parasites, can be easily administered in the field and does not cause serious side effects. It's an expensive and time-consuming process, so DNDi have had to rely on a more pragmatic way of finding new drug treatments for neglected diseases.

Can we find drug molecules that will kill a parasite in a test tube? That initial starting point requires that we screen hundreds of thousands of molecules to find just one that has some of the right properties.

In short, it's a bit of a stab in the dark. There is also no guarantee that a molecule that can kill a parasite in a test tube will work inside a human. So instead, in recent decades, scientists have opted for a different approach.

Rather than testing for drug molecules that kill a bug, we can think about carefully designing molecules that interfere very specifically with a process in that bug but don't interfere with any related processes in the human.

Imagine you'd worked out that there was one particular protein that was vital to the parasite. If you knew the three-dimensional structure of that protein, you could explore how different drug molecules might disable it. The parasite's protein is like an intricately designed three-dimensional lock, covered in cavities and holes. The drug molecules need to act like a key, filling in the spaces and blocking the protein from functioning. Understanding how best to design that key is a lot easier if you can just see everything on a computer screen rather than have to infer it from laborious experiments.

You can see where the cavities are, and you could bring a molecule up and you could see how does it fit into that cavity.

And this is where AlphaFold comes in particularly useful. Rather than waiting months or years for an
experimental method to come up with a protein structure that may or may not be important in a disease, AlphaFold can offer a shortcut.

[Music]

Here's Kathryn Tunyasuvunakool again.

So it basically changes the situation from not using structural information, waiting several years for the experiments to be finished, and now we have this middle way of, well, I can get some actionable structural information within 10 to 15 minutes.

Oh wow. I mean, that's massive.

Yeah, it's a huge saving.

In the case of leishmaniasis, scientists had already identified a drug molecule that is effective at killing the parasite in a test tube. Unfortunately, they reached a dead end when it came to adapting this molecule into a usable drug treatment. So they worked backwards. First they tried to work out what part of the parasite this drug molecule appeared to be targeting, and in the process they were able to pinpoint a novel protein that seems critical to stopping the Leishmania parasite in its tracks.

So if that protein is shut down, the parasite can't continue. It will die. It's not like other proteins that we've looked at as targets for leishmaniasis, so that's exciting.

The other exciting thing about this particular protein is that it looks like it's not unique to Leishmania parasites. Similar proteins are found in the parasites that cause other infectious diseases. You can see how powerful this quickly becomes, how the structure of proteins can potentially open the door to drugs that are effective for multiple diseases.

We could be cutting years off the timeline. This is the power of knowing about the structure, and if, using AlphaFold predictions, we can get to that starting point quicker, it will help us pick the right projects and move them more quickly.

The hope is that if AlphaFold can do this for Charles, it will do the same for scientists working on all kinds of important problems around the world. And so in July 2021, DeepMind took the step of publicly releasing the structures of all 20,000 proteins found
in humans, as well as several other organisms commonly used in labs around the world, everything from worms and fruit flies to rats and tuberculosis. In time, the team aims to fold and release all 100 million of the proteins ever discovered, for free, for anyone to use, to accelerate the work of scientists. But before this major release, a lot of careful thinking went into reviewing the risks of doing so. What if, for example, AlphaFold's insights into protein structure were used by a bad actor to design a rogue protein, say?

We carried out a full review of this because it's something we were concerned about.

Sasha Brown from the ethics team was involved in the process of sharing AlphaFold with the world.

We were trying to understand, will the release of AlphaFold help the creation of new bioweapons or increase the potency of current known ones? And the view we got from experts was that the likelihood of protein folds being used in this manner was small, because non-specialists face several barriers to enhancing and synthesizing harmful proteins, and they're more likely to use readily available materials.

In other words, if someone is intent on committing an act of bioterrorism, there are currently much easier ways to do it than getting to grips with the biochemistry of protein folding.

It wouldn't remove any bottlenecks to using bioweapons at the level that AlphaFold is currently.

While it's hoped that the AlphaFold database release will now accelerate the work of scientists around the world, there is another important point worthy of mention here. AlphaFold doesn't provide a perfect dictionary that can translate between protein sequences and folded shape. It's only offering a prediction.

The results of CASP put it very well.

Here's John Jumper again.

Of the predictions we made, about two-thirds were at a quality bar they considered competitive with experiment. So that means for one-third, they're pretty sure that we have deficiencies, although typically small ones. And so it's very important to us,
how do we make sure we're doing more good than harm?

There are many reasons why AlphaFold's prediction might be uncertain. Some proteins have never been seen before, others might have strange disordered shapes, and so on. So the AlphaFold team have built in a way for the model to put a number on its confidence in its prediction of a given protein shape. AlphaFold predicts the score it would have got for that protein had it been part of the original CASP competition. Here's Pushmeet Kohli to explain.

It's very good at that uncertainty estimation. It's not shy in saying, I don't know, which is, I think, a good thing.

Install the humility package, essentially.

When AlphaFold solved one problem, it opened up a whole new world of questions and possibilities for science.

It's very important to note that it does not solve protein understanding writ large, in terms of making sense of how do proteins operate. So the community has appreciated the breakthrough that has been made, but also, at the same time, there is a lot more work that needs to be done.

In many ways, AlphaFold is just the beginning of demonstrating how AI can be used to advance scientific discovery. And in fact, it isn't just proteins that DeepMind is working on. Researchers are also successfully applying machine learning to other areas of science, including genomics, chemistry, nuclear fusion, mathematics and ecology. In episode six of this podcast, we'll be exploring some of those projects in more detail. Trust me, you won't want to miss it. But DeepMind say they have a bigger mission of solving intelligence, so I want to end this episode by asking how AlphaFold fits in with that. Does AlphaFold have any of the hallmarks of intelligence?

I mean, what is intelligence?

Oh, that's a whole can of worms.

Yeah, exactly. And as a former worm researcher, I know worms, right? So this is very much a can of worms. Here we're talking about intelligence in terms of being able to accomplish a specific task that humans can't do, so it's something that is a
superhuman ability. It is very good at learning from the existing structures that we have, but I wouldn't call that artificial intelligence necessarily, because that term generally refers to something that is more of a generalist, whereas I see AlphaFold as a specialist.

[Music]

I think it's really important to remember that these are really powerful techniques that we've developed that are still far short of a real artificial intelligence that you can talk about thinking and making decisions and everything else.

[Music]

If you think AlphaFold isn't intelligent, how does it fit in with that broader goal of DeepMind towards solving intelligence?

This is an example of how we work on these really hard problems that stretch us, and how do we develop new ideas and techniques, and then think about which of those will pull back into the core mission, which of those take us one step closer to general intelligence.

Over the next four episodes of this podcast, we'll be taking an in-depth look at that core mission, building artificial general intelligence, or AGI, starting with the big push to give machines the ability to communicate with humans.

Me: What's my worst feature?

Model: You're going to get upset if I tell you.

[Music]

DeepMind: The Podcast was presented by me, Hannah Fry. It was written by me and Dan Hardoon, who is the series producer. If you've enjoyed this episode, please subscribe to the podcast and leave us a rating or review if you can. Bye for now.
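
The GDT scoring described in the episode (superimpose the prediction on the true structure, then count how many amino acids land roughly where they should, on a scale of 0 to 100) can be illustrated with a rough sketch. This is not CASP's scoring code. It assumes the widely cited GDT_TS variant, which averages the fraction of residues within 1, 2, 4 and 8 angstroms of their true positions, and it assumes the two structures have already been superimposed, whereas the real test searches over many superpositions. The function name `gdt_ts` and the toy coordinates are hypothetical.

```python
import math

# Distance cutoffs in angstroms for the GDT_TS variant (assumed here).
CUTOFFS = (1.0, 2.0, 4.0, 8.0)


def gdt_ts(predicted, reference, cutoffs=CUTOFFS):
    """Simplified GDT_TS on a 0-100 scale.

    `predicted` and `reference` are equal-length lists of (x, y, z)
    positions, one per amino acid, assumed to be already superimposed.
    """
    if len(predicted) != len(reference) or not reference:
        raise ValueError("structures must be non-empty and the same length")
    # Per-residue distance between predicted and true positions.
    distances = [math.dist(p, r) for p, r in zip(predicted, reference)]
    # Fraction of residues within each cutoff, then the average over cutoffs.
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(cutoffs)


# Toy example: four residues, with prediction errors of 0.5, 1.5, 3 and 10 angstroms.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 10.0, 0.0)]
score = gdt_ts(predicted, reference)
print(score)
```

In this toy case one residue is within 1 angstrom, two within 2, and three within both 4 and 8, so the score lands in the mid-50s, while a prediction identical to the reference scores 100.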