Transcript for:
Exploring the Earth Microbiome Project

About 500 years ago, Leonardo da Vinci was pondering the unknowns of the world, and he wrote this quote. He said, "We know more about the movement of celestial bodies than about the soil underfoot." He was very observant, and I think what he was thinking about was this: it was around the time of the birth of modern astronomy, but it was before we had microscopes. It was before we understood ecology. It was before we had biochemistry. It was, in fact, before we understood the germ theory of disease. Soil was a mysterious material. It provided sustenance, it's what we grew crops in, it provided the basis for all of civilization, but we understood very little about it.
What I'm going to talk about for the next 20 minutes, actually 17 minutes, is a project called the Earth Microbiome Project. And this project is very ambitious. The idea is to systematically study the smallest life forms on Earth and to build a comprehensive database that would capture everything that we can learn about these organisms.
To put it in perspective, take probably the biggest other field in science, astronomy. I have a chart here, you can see it over there too. In one kilogram of soil, a couple of pounds of soil, there are more microbes than there are stars in our galaxy. If we look at the entire planet Earth, there are 10 to the 30th microbes. In the entire known universe, there are only about 10 to the 24th stars. There are over a million times more microbes on planet Earth than there are stars in the universe. This is a very, very large problem.
But microbes are responsible for almost everything that's alive on Earth. They're responsible for the carbon cycle. They control climate. They created the modern atmosphere. They put enough oxygen in the atmosphere that humans eventually could evolve. They're responsible for and control the cycling of all major nutrients on the planet.
Now, what is a microbe? What we're looking at over here on the wall are some micrographs, microphotography, of the earliest known fossils of microbial life on Earth. These were taken from Western Australia, and we think they're about 3.4 billion years old. So about 3.5 billion years ago, life was already flourishing enough on Earth that it could leave fossils. About 2 billion years ago, life evolved to form multicellular organisms, the kinds of things that eventually became people.
What we want to know in this project is how many kinds of life are out there, that is, microbial life. For large-scale organisms, we count species in the thousands or millions; there are on the order of a thousand kinds of bats, for example. But we don't know how many microbes there are. Well, we have some estimates, which I'll tell you about. We don't know exactly where they are. They seem to be everywhere. In fact, it's almost impossible to find some place where we don't find microbes. We don't know exactly where they come from. That is, we think they all evolved on Earth, but we don't know how they move around. We don't know where the microbes in a particular environment come from and where they go. And we really don't know what they're doing.
Here's a picture. This is an electron microscope picture of a microbe. This is an organism called Synechocystis. It's in the ocean. This kind of organism is called a cyanobacterium, or blue-green alga. It was responsible for putting the oxygen in the atmosphere a couple billion years ago. It has about 4,000 genes, and it's very small. It's only a couple of microns long. But it has enough technology in its genome that it can make thousands of chemical compounds. It's a miniature chemical factory. In fact, all microbes are miniature chemical factories.
Now, how are we going to do this project? How are we going to study the smallest life forms, the simplest life forms, the oldest life forms on Earth, the most abundant life forms on Earth? Well, for a couple of centuries, we've tried to study them by culturing them, growing them the same way you might grow something in a garden. You'd create a sample, you'd grow them in a petri dish, some of you probably remember this from your school days, and you would create a pure culture. When you had a pure culture that looked like it was only one type of organism, you could look at it under the microscope, and you could describe it and name it. For the last 200 years, we've done that, and we've named about 8,000 organisms. However, we believe that the real number of species of these organisms is somewhere between a million and a billion. Probably more like a billion species. And this process of culturing just doesn't work very well. In fact, we've cultured less than 1%, maybe less than one-tenth of 1%. Most of them do not grow on material that we can combine. So we have to find another way to do it.
The strategy we have is to go out into the environment and get many, many volunteers, citizen scientists, crowdsource-type volunteers, to go out and collect samples. Those samples come back to our laboratory, and we use robots to process those samples and extract the DNA. Then we sequence that DNA using the same DNA sequencing machines that are being used to sequence human genomes, that are being used to sequence animals and agricultural plants and so forth, and that give us the basis for modern genomics. But now we're sequencing the DNA that comes from this environmental sample without really knowing where it came from in terms of the organism. This produces millions and millions, billions, tens of billions, hundreds of billions, trillions of little DNA fragments. And now we have a puzzle to reassemble. We have to find a way to assemble these small fragments into larger fragments so we can understand what these organisms are and what they do. And that problem is very interesting, but it's very similar to another problem that computer scientists have been working on for quite a long time.
Imagine that all you have access to are Twitter feeds and Facebook posts, because this is basically true today, right? But what we want to do is reconstruct the news on the front page of the Wall Street Journal. Could we do that? Could we process and sift through billions and billions of small fragments of information and assemble coherent, true stories that make sense to us, that would give us some insight into what's happening? That's the same problem we have here. It's in fact almost identical.
How are we going to do this? Well, this is what comes off the machines. Looks like my Twitter feed, sort of. A's and C's and T's and G's. This is raw DNA sequence that comes off a sequencer. We get trillions of digital bytes like this every week off of our machines. One screen looks just like another screen. You could stare at this all day and you wouldn't make any headway. All of your DNA looks like this. If I looked at it, I couldn't tell your DNA from this DNA. So what we have to do is use computers to index this information and to sort through it. Now, it's interesting that if we had invented DNA sequencing before we had invented computers, we would have had to invent computers to actually understand the DNA. There's no way to do this without massive computers.
Now, I'm fortunate because I've spent the last 25 years of my professional life working on building the world's fastest computers, and we happen to have one at Argonne. I think some of you probably visited there today. It's called Mira. Mira is the Latin word for wonderful or thing of beauty. It's the root of miraculous, miracle, admirable. It's a very large computer. This picture makes it look small, but it's big. It fills up this room. It has nearly a million processors in it. It has nearly a petabyte of memory, right? Your laptop has a few gigabytes of memory. This has a million times that much memory. It has the equivalent of 40,000 one-terabyte disk drives in it.
This is a massively large computer. It's big enough that I can load the DNA that I've taken off these sequencing machines, even for the biggest sample, and sort through it. We take these little pieces, we index them, and we work out what's overlapping, just like lining up your Twitter posts by the hashtag or by a keyword. We then build networks, and we compute through these networks, through these graphs, looking for the most likely pathway that will result in a continuous consensus sequence, reconstructing the genome.
When we do that, we end up with something like this. This is a snapshot of a genome of a small organism that lives inside an aphid. An aphid is a tiny insect that some ant species treat as cattle. These aphids live on plants, and the ants go out and take care of the aphids. They milk the aphids, get the sap out of the aphids, and live on the sap. These aphids, however, have bacteria that live inside them, because if you're living off of a plant and all you're doing is sucking sap all day, you don't get all of your multivitamins, and you need to have some help in producing the chemical compounds that the plant doesn't produce. That aphid gets that help from a bacterial symbiont, sort of a friendly bacterium that lives in it.
This is the genome of that bacterium. It's about 700 genes long. Each of these little arrows is a gene. This is sort of how we gestalt it: look at one image. Each color represents a different kind of function. We're going to zoom in on the function in the upper left-hand corner here to sort of give you some sense as to how this works. So we zoom in, and we've got eight genes here in a row. And you can see genes are not all the same length. The average gene is about a thousand of these base pairs long, a thousand letters long. When it produces a protein, it's maybe 300 amino acids long. There are eight genes here in a row, and they code for a very interesting machine. This machine is called ATP synthase.
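To make the assembly step from this part of the talk concrete (index the fragments, work out what overlaps, then follow the overlaps to one continuous sequence), here is a minimal Python sketch. It is a toy under loud assumptions: error-free reads, a single strand, no repeats, and a simple greedy merge rather than the graph search that runs on Mira. The function names `overlap` and `greedy_assemble` and the example reads are hypothetical.

```python
from itertools import permutations


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two fragments that share the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_len, best_a, best_b = 0, None, None
        # Compare every ordered pair of fragments (toy-sized inputs only).
        for a, b in permutations(contigs, 2):
            size = overlap(a, b, min_len)
            if size > best_len:
                best_len, best_a, best_b = size, a, b
        if best_len == 0:
            break  # nothing overlaps any more; leave the contigs separate
        contigs.remove(best_a)
        contigs.remove(best_b)
        contigs.append(best_a + best_b[best_len:])  # merge on the overlap
    return contigs


# Three made-up reads cut from the made-up sequence ATGGCGTACGTTAGC.
reads = ["ACGTTAGC", "ATGGCGTA", "GCGTACGT"]
print(greedy_assemble(reads))  # ['ATGGCGTACGTTAGC']
```

Comparing every pair of fragments is quadratic in the number of reads, which is fine for three reads and hopeless for trillions. That gap is why real pipelines index the fragments first and need machines of the size described above.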
Now, you don't need to know what that means, but you might want to know that it's the smallest rotary motor that's known. And it evolved probably at least 3 billion years ago, because it's in every single genome that we see. Now what does that look like? Well, it looks like this, sort of an artist's rendering. It sits across a cell membrane, and the bottom part turns. Every time a proton goes past the membrane, it turns a little bit. The top part is where it processes the larger molecules, making energy molecules for the cell. The cell can run it in one direction to produce energy molecules using protons, or it can run it in the other direction to consume energy and move protons out of the cell. It's very, very cool. And it's in everything. You have a thousand trillion of these in your body right now, making ATP, powering your cells. It takes 55 billion of these ATP molecules to enable a small cell to reproduce. That's how much energy it takes.
Now, that's just eight genes. But a typical microorganism has 4,000 genes, which means it's got like 500 machines of that complexity. Here's sort of a network diagram on one of our big display walls. There are hundreds and hundreds of nodes in this network, each of which is encoded by one of these genes. Enormous amounts of biochemistry are in these microorganisms. They're little miniature chemical factories. We only know a small fraction of what they do, but almost every interesting chemical reaction that we need to support life or to make interesting technology, interesting materials, is encoded in these genes. Each unique species encodes 50 to 100 unique genes. If there are a billion species, we might have as many as a hundred billion unique proteins encoding a hundred billion unique functions. That's the complexity of life on Earth. It's very, very large.
Now there are some other interesting questions. Soil microbes are everywhere. Well, I mentioned, you know, one of the questions we're interested in is where do these microbes live?
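The round numbers in this passage can be checked with a few lines of Python. This is only a back-of-the-envelope sketch using the figures quoted in the talk; the variable names are made up for illustration, and the estimates themselves are the speaker's.

```python
# Figures as quoted in the talk (order-of-magnitude estimates).
genes_per_microbe = 4_000            # typical microorganism
genes_per_machine = 8                # ATP synthase is coded by eight genes
machines = genes_per_microbe // genes_per_machine

species = 10**9                      # upper estimate of microbial species
unique_genes_per_species = (50, 100)
unique_proteins = tuple(species * n for n in unique_genes_per_species)

microbes_on_earth = 10**30           # from the opening of the talk
stars_in_universe = 10**24
ratio = microbes_on_earth // stars_in_universe

print(machines)         # 500
print(unique_proteins)  # (50000000000, 100000000000)
print(ratio)            # 1000000
```

So "a hundred billion" is the top of the range: a billion species times 100 unique genes each.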
We can't seem to find any place that they don't live. In fact, there's a race right now to miniaturize DNA sequencing machines and put them on the next Mars rovers to see, in fact, if bacteria are still hanging out on Mars. One theory says that the bacteria on Earth actually originally came from Mars. We're not sure about that.
So one question we ask ourselves is how do these things get around? They're very small, right? They're millionths of a meter long. Well, when you're that small, everything is a transport vehicle, right? A dust grain, a single dust grain, can carry dozens of bacteria. A single dust grain can stay up in the atmosphere for days or weeks. When the wind blows across the Sahara, dust goes into the atmosphere, goes across the Atlantic Ocean, and lands in North America and South America. When the wind blows across the Gobi Desert, dust goes into the air, lands on the west coast of North America, and so forth. Microbes are constantly being transported between continents on Earth. In fact, we think that there's enough transport in the air that if you just waited in one place, maybe a few thousand years, every kind of microbe would eventually land on you. That's sort of interesting.
One of the questions we want to answer is, if we look in the same kind of places on different parts of the planet, do we see unique microbes or do we see the same kind of microbes? Here's a picture. If I didn't tell you where these were, the three pictures on the top you might think are all in the same place. It turns out that they are three different rainforests separated by 10,000 kilometers on the planet. Same thing on the bottom, three different grasslands separated by 10,000 kilometers. We go there and we start sequencing the soil there. We start looking at the microbes. We see very interesting things. First of all, about half of the microbes that we see are common. 10,000 miles apart, it's the same organism. But the other half are not.
Now, if you think about macroscopic organisms, you know, like humans or elephants or chimpanzees, you have a range, you have a place where that organism is endemic. It grew up there, the species is local to there, and if you find it outside that range, it would be very abnormal. What we don't know here is whether or not everything is everywhere, or at least in principle everything is everywhere. This has enormous ramifications, for example, for intellectual property. Let's say that I'm really interested in some novel chemical reactions that are in some microbe. Do I have to fly to the other side of the world and pay some government to let me sample their soil to sequence, or do I just go in my backyard and sequence enough, or go to your backyard? We don't know. It's still quite an open question.
Back in 2010, about 25 scientists got together in Snowbird, Utah, and we put together this project to systematically attack the question of microbial life on the planet. It's picking up a lot of steam. Last year, there was a meeting in China. There are hundreds and hundreds of people signing up to work on this. So far, we've collected over 60,000 samples. We probably need 100,000 samples to get maybe 1% more information, maybe as many as a million or maybe a billion samples. It's all going to have to be done with robots. It's all going to be done, I think, in the next five to ten years. At that point, microbiology will start to exceed astronomy in its complexity and interest. And I thank you. Thank you.