Hello everybody, welcome back. It's Presidents' Day, but today is all viruses day; every day is all viruses day. Today is a very interesting topic, one of my favorites: reverse transcription and integration. And you have to read the quote; I will read it to you. Every slide has a quote, right? You probably didn't even notice, but this one is so cool because, as you're going to see, it applies to this lecture. "One can't believe in impossible things," says Alice. "I dare say you haven't had much practice," said the Queen. "Why, sometimes I've believed as many as six impossible things before breakfast." I don't know who's read Alice in Wonderland. So that's what today is about: more than six impossible things. It's very cool.

So this starts with some history. In 1908 a chicken leukemia virus was discovered; in other words, a cell-free filtrate: you put it through a 0.2-micron filter, you inject it into chickens, and that causes leukemia. We'll get back to that when we talk about cancer and what's going on there. Three years later Rous, here in New York City, discovers a virus that causes a solid tumor, a sarcoma, for which he got the Nobel Prize 55 years later. People didn't think at the time that leukemia was a cancer; that's why those other individuals didn't get much credit. But these were eventually called tumor viruses, because when you inject them into animals they cause tumors, and then they were found to have RNA genomes, so they are RNA tumor viruses.

Now, Howard Temin was working on this in the, what year, late '50s. He made the logic that when you infect cells in culture with these RNA tumor viruses, their morphology changes; they get transformed permanently. And he said that somehow this RNA virus must be making a DNA copy, and that gets integrated into the cell genome, and he called this the provirus hypothesis. And then David Baltimore, who made the Baltimore scheme, you remember, came at it from a different logic. He said, OK, RNA viruses with plus-stranded genomes don't need a polymerase in the
particle, because they can be translated as soon as they get into cells; minus-strand RNA viruses have to have a polymerase in the particle. So he said, OK, if Temin is right, then there must be an enzyme in the particle to convert RNA to DNA, because they thought that such an enzyme did not otherwise exist. They were actually wrong, but their logic turned out to lead them in the right direction. They both published papers back to back in the early '70s; they independently discovered this enzyme in RNA tumor viruses, and it was called reverse transcriptase. The two separate papers are here: "RNA-dependent DNA polymerase in virions of Rous sarcoma virus," that was the virus discovered in 1911, and then Baltimore, "RNA-dependent DNA polymerase in virions of RNA tumor viruses"; he used a different virus.

Just parenthetically, at the 50th anniversary of this discovery of RT, I did a podcast at Cold Spring Harbor with David Baltimore and a few other people. Unfortunately Howard Temin died some time ago. And Baltimore said he thought of this idea, and in two weeks he did all the experiments and wrote up the paper, and Temin had been working on this for 10 years. So he felt sorry for him, so he called him up and said, "Howard, I'm going to publish this." And so Howard rushed to finish the experiments, and they got back-to-back publications. He said that was the right thing to do. So remember that: if you do some science, be nice. I know you need to get ahead, but maybe you can call your competitor and do something.

So the enzyme that plays a big part in today's talk is reverse transcriptase. We used to think genetic information goes DNA, RNA, protein, and this one obviously goes RNA to DNA, and then DNA, RNA, protein. So it kind of reverses it, so they called it reverse transcriptase. It has completely revolutionized molecular biology in ways you can't even imagine. Every assay uses it: whenever you have an RT-PCR for anything, including SARS-CoV-2, you first take RNA and convert it to DNA with reverse transcriptase, then you do
a PCR reaction. PCR also revolutionized molecular biology, but this did too. And you can buy it now; you can buy tubes of reverse transcriptase. You see, it's called SuperScript III. So if you want to go into marketing, you can make up clever names like that. You have to fool scientists into buying it, because I guess you can't call it reverse transcriptase, because then everybody would call it the same name, and the lawyers don't like that. So we're beholden to the lawyers; that's the way life is. So every diagnostic test uses it, but also in research we amplify genes using reverse transcriptase. We do so many things with it.

So here on the Baltimore scheme, fittingly, is where the retroviruses sit. They have RNA genomes of positive polarity, but as you'll see today, they are not translated when they infect cells. They are copied to DNA, forming first single-stranded DNA and then double-stranded DNA. I can't find my pox bottle. I think I left it here. Did you take it? Please return it; I need those poxviruses. Darn, I don't know what I did with it. And we're going to talk about that step here, the conversion to DNA. Then the DNA, of course, would give rise to mRNA, but that DNA first has to integrate into the host cell. It's an obligatory step for retroviruses: integration into the host cell.

So Rous sarcoma virus, one of those first RNA tumor viruses discovered, is shown here. It is an enveloped virus with glycoproteins in the envelope; we call these SU and TM, the surface and transmembrane parts of the envelope. Then there is an icosahedral capsid within it, which contains the viral plus-stranded RNA. There are two copies of the RNA, for reasons I'll explain in a moment, and they're coated with a protein called nucleocapsid protein. This is very unusual, for a plus-stranded RNA virus to have protein coating it, but that's because reverse transcription is going to happen when this infects the cell. And in this virus particle you can see there is reverse transcriptase; there's also a
molecule called integrase, and a protease, and some other things as well, which we'll explore today. And this at the top is the genome of the virus as DNA. So a term you're going to need to remember is provirus, or proviral DNA: that means the integrated DNA copy of the retroviral genome. So you go from RNA to DNA, and that double-stranded DNA, when it's integrated into the cell, is called the provirus, after Howard Temin's hypothesis. You can see the provirus is flanked by two sequences called LTRs; those are long terminal repeats. They will figure prominently today. All of you have thousands of LTRs in your genome; that's what you're looking at here. You have them all, no exception, and so does every animal on the planet. We'll talk about that later.

The simplest retroviruses include three proteins: Gag, Pol, and envelope. Gag is the structural proteins that make up the capsid; Pol is the polymerase, the reverse transcriptase, and the integrase; and then there is the envelope protein. And this DNA, of course, is transcribed to form an mRNA that's capped and polyadenylated, which is then translated to form viral proteins.

When these viruses infect the cell, here's the overview, and then we'll go into some detail. The virus binds to a receptor; many of them fuse at the plasma membrane and dump the capsid into the cell. The nucleocapsid, right, the icosahedral shell with the genome in it, goes into the cytosol, and the genome never leaves; kind of like reoviruses, the genome never leaves the capsid. There's of course reverse transcriptase in there, and that converts the RNA into a single double-stranded circular DNA molecule. Sorry, not circular; it's shown here as circular, but it's actually linear. So the RT converts it to a double-stranded linear molecule that goes into the nucleus and integrates into the host cell chromosome. Now we call it a provirus, and from there mRNAs are made, which give rise to proteins that lead to the production of new particles. We'll talk about the assembly of these
next time; it gets a lecture all on its own.

Now, the enzyme reverse transcriptase; here are a few properties. It's a primer-dependent enzyme; the primer can be DNA or RNA. The template can be DNA or RNA, and you say, why is that? Well, to go from RNA to single-stranded DNA, the template is RNA, but then you have to take that single-stranded DNA and make it double-stranded, and the reverse transcriptase does that as well. So the template can be RNA or DNA, but always dNTPs, not rNTPs, are incorporated: deoxynucleotide triphosphates, which are the precursors of DNA, are incorporated, never the precursors of RNA, the rNTPs. And the enzyme is very much like all of the other enzymes that we've talked about, which add material in a 5' to 3' direction and read the template in a 3' to 5' direction.

Now it turns out that, contrary to Baltimore and Temin, everything on the planet has RT: bacteria, archaea, eukaryotes, us. Our cells have reverse transcriptase activity; they just didn't know it. But it worked for them, because it turns out the virus has its own RT in the capsid and does not utilize cellular RT. And here's a tree of life, of course, showing you how the bacteria, the archaea, and the eukaryotes evolved from a common ancestor. Because all three domains of life have RT, it must have evolved in the last ancestor of all of them; that's simpler than having it evolve independently. And in fact the phylogenetic analysis, the comparison of RT sequences from bacteria, archaea, and eukaryotes, says that it came from a single precursor. It's very old, in other words.

And we think it might be the bridge between the RNA world and the DNA world. Remember, we think there was an RNA world existing before cells, and somehow that gave rise to a DNA world. How did that happen? Well, a lot of things had to happen, like protein synthesis, but cells require DNA genomes. There are no cells that we know of that have RNA genomes, because they can't get long enough. And so at some point an enzyme arose that can copy RNA to DNA. So there was probably
first an RNA-dependent RNA polymerase that copied those RNAs, and a single base change, a single amino acid change, would probably be enough to make it make DNA from an RNA template. So it's the bridge between the old RNA world and the modern DNA world.

And we find reverse transcriptase in other viruses besides retroviruses: hepatitis B viruses, which we'll see a little bit of today, and the Caulimoviridae; those are viruses that infect cauliflower and have reverse transcriptase activity. But even beyond that, cells have other activities that resemble reverse transcriptase. So here is a circular phylogenetic tree of all the reverse transcriptase-like sequences that we know of. There are viral and virus-like ones, there are bacterial ones, there are eukaryotic ones; bacteria and archaea are both here, here are the eukaryotic ones. And then in our cells there are our telomerases, at the ends of our chromosomes; they have RT associated with them. And so do LINE elements, long interspersed nuclear elements; these are highly repeated sequences in our genome that arise via reverse transcription. We'll take a look at that in a moment. So this is why I love this lecture, because this stuff is everywhere. It impacts you in incredible ways, which we'll see a little bit about today.

So the polymerases that we looked at earlier, the four classes, DNA-dependent DNA, DNA-dependent RNA, RNA-dependent DNA, which we're looking at today, and RNA-dependent RNA, are all shown here. They're all right hands, with a palm as the active domain, and fingers and thumb domains. Here's the reverse transcriptase of HIV-1. It has an active domain with two metals, two magnesiums, that are required for activity. They're coordinated by amino acids, and again you see here at the active site that two aspartates are coordinating those two magnesium ions.

There's another activity of the reverse transcriptase enzyme. It's a single enzyme; it can be a monomer or a multimer, as you'll see. But besides reverse transcriptase activity, which means copying RNA to DNA,
it has a second activity, which is called RNase H. This cleaves RNA, as the name says, RNase, but only when it's in duplex form, double-stranded; it can be DNA-RNA or RNA-RNA. When it's in duplex form, RNase H will chew it up; if it's single-stranded, it will not touch it. Very specific. And what it does is make endonucleolytic cleavages, right? That means it cuts internally, not from the ends. If you're chewing from the ends, you know what that's called: exonucleases, right? The endonucleases cut inside. As you can see here, the red arrows are where they would cut, between an oxygen and a phosphate joining any two bases, and it makes short oligos with a 5' phosphate, so you can see that red cut is between the O and the P, and a 3' hydroxyl. They're relatively short, but you'll see the purpose is to get rid of the RNA, because once we make a DNA copy we don't need it any longer.

Here's HIV reverse transcriptase, its three-dimensional structure. This is composed of two subunits; we have a p66 and a p51, and in the middle both have come together to form the active enzyme. You can see labeled the fingers and thumb and palm domains. There's the active site of the enzyme, and it's shown with a DNA-RNA hybrid coming out of the active site. So the active site is up here in the palm domain. The RNA comes in at one end, it's made double-stranded, and it continues through this groove, and then it encounters the RNase H domain, which is going to chew up the RNA part, so that only a single-stranded DNA comes out.

So here's a cartoon of the enzyme, showing you the palm domain here on the left and the active site; here are the two metals. There was some confusion on the exam about the use of the term metal. For non-RTs, magnesium is the metal for sure, but for RTs you can use magnesium or manganese. It doesn't work as well with manganese; the physiologically relevant metal is magnesium. But that's why this says metal. They're typically divalent cations, right? So there are two metal-binding sites here in the RT active site and here
in the RNase H active site. So the RNA is coming in; it's made double-stranded, it's made into DNA in the active site, so now you have a DNA-RNA hybrid, and then as it passes through the RNase H active site, the RNA is degraded.

This is a pretty slow process. It takes about 4 hours to copy the 9 kb genome of retroviruses. So 4 hours; it's a long time. And it's also error-prone: the enzyme makes one error per 10,000 to a million bases, and it has no correction activity, so those errors stay. So retroviruses have high mutation frequencies, which is exactly what HIV is doing. It diversifies incredibly in a host from the first infection; every few weeks it undergoes antigenic change, as we'll see later, because of this high error rate.

All right, so that's some fundamentals about RT. We'll look at the reaction in a minute, but first a question. Reverse transcriptase has revolutionized molecular biology. Which statement about the enzyme is not correct? It is unique to retroviruses; the RT is packaged in the particle; the RT has RNase H activity; the name derives from its ability to reverse the flow of genetic information; and it might have bridged the RNA world and the DNA world.

Do you know who coined this central dogma, that DNA goes to RNA to protein? Do you know who said that? No? Francis Crick. Does that ring a bell? Watson and Crick. And then when they discovered reverse transcriptase, a reporter went to Crick and said, "Well, what do you think about that?" And he said, "I never said that. Wasn't me." Just deny it, right? But what he should have said was, that's the way science works: you make a hypothesis, and then it may be proven wrong. And then he would at least not look slimy, right?

OK, look, we're almost at 100%, so let's see how we did. I'm not going to wait. 97%. Yeah, it's not unique to retroviruses, right? It's found in archaea, bacteria, and eukaryotes, and other viruses. All the other statements are correct. Someone picked the name of the enzyme; that one is correct. It comes from
its ability to reverse the flow of genetic information. That's actually on one of the slides, so that is correct.

All right, let's take a look at reverse transcription, the process. As I said, in the retrovirus particle there are always two copies of the genome, so we call it diploid. Not we; I'm not part of that, because I don't think it's such a good name, but it's called diploid for that reason. Usually we associate diploid with chromosomes, and these are not chromosomes, but that's what you'll hear. Two copies of the genome, and there are the two copies shown as a schematic on the left panel: the 5' end is capped, the 3' end is polyadenylated, and the coding regions for gag, pol, and envelope are there. And these two molecules are attached by what we call a kissing loop. The very 5' end region is highly structured, as you see here, and one of these loops, SL1, can base-pair with the same loop on the second molecule, so that keeps them together.

You can also see in this structure, in red, the PBS. Now, it's not Public Broadcasting Company; it's primer binding site, because bound to each molecule of RNA is a tRNA, which is going to be the primer for reverse transcription. This is a cellular molecule that is incorporated into the retroviral particle. So there are two molecules of tRNA that are complementary to a specific sequence in the viral genome. There are about 50 to 100 molecules of RT per virus particle. So I just want you to remember: I'm going to show you now a scheme where a single enzyme is copying the genome, but there are probably multiple on there.

Now you may ask, why are there two RNAs? We think it's to make the virus resistant to mutation, because this genome has a high mutation frequency, and if you just had one RNA in the particle, it might be that it had lethal mutations in it. But if you have two, your chance of rescuing that mutation is better, and we say that it allows copy choice during
reverse transcription to repair errors. So here we have two genomes, the top green one and the bottom green one, and say there's a mutation on the left side here, the little a that's red; there's a lethal mutation there. But during reverse transcription the enzyme actually flicks back and forth very, very quickly between the two strands, just randomly, and so there might be a chance that it would copy the wild-type A and not have that mutation. So that's the idea for having two RNAs in the particle: so that you can get around this high mutation frequency.

Here's the primer binding to the viral genome. At the top are the two genomes, full length, and then we expand the very 5' end. In the middle there you can see the 5' cap, and there is the primer binding site, and there's a tRNA bound. And then below here is a more detailed view of what's going on. Here is the RNA itself, the viral RNA, with the primer binding site; here's the tRNA; and on the right it shows you how the tRNA bases are complementary to the primer binding site. So this is all highly structured, as you can see, and the tRNA is fitting into that, but that 3' end is going to be the primer for reverse transcription. Again, that's tRNA from the cell.

So let's go through reverse transcription. This is an amazing process, in my opinion, and you learn a lot from it, so that's why I'd like you to understand it. The virus, remember, binds the receptor, and somehow the nucleocapsid gets into the cell, either at the surface or maybe from within an endosome, and then the RNA is converted to DNA. So what happens first is we have this RNA in the particle in the cell, and we're going to just look at one for simplicity. There's a tRNA bound to it, and the reverse transcriptase begins to make DNA using the 3' hydroxyl. And you can see there on the second panel you have blue DNA. It's actually the wrong color: the RNA is plus-stranded, so the DNA should be light blue,
because plus DNA is dark blue. I see this every year, but the book only gets revised every six years, and these figures are all from the textbook.

OK, so you make that little bit of DNA and then you stop, because that's the end. Look, it's the end of the genome, the 5' end, right there; we can't go any further. So this DNA is made, and at the same time the RNA is chewed up. That's what all these little blocks mean: the RNA is chewed up and those pieces come off, so you end up with a single piece of DNA.

Now what happens? The polymerase is going to jump from here to here, from the very 5' end of the template to the 3' end. The reason it can do so is because there's complementarity. At either end of the RNA there are some repeated sequences: we have little u3 and little r, and then at the other end you have little r and little u5. Now when you copy them, you get big R and big U5; the capital signifies it's the complementary base. All right, so now the big R can base-pair with the little r and you form a duplex, and so the enzyme simply keeps copying around that RNA. So it's jumping from one strand to another; we call that a template jump, because it goes from the 5' end of the RNA to the 3' end. Why this happens you'll see in a bit, once we get through this.

So here's that step, the big R and little r annealing. That's the first template exchange. The enzyme continues to copy the viral RNA; you can see that in the second panel, and in the third panel it goes around. Now what also happens is that there's a piece of RNA left behind. RNase H doesn't cut it; it's probably structured in some way. It's called PPT, the polypurine tract, and that's going to be the primer for the second strand of DNA. So before this first strand is even done, the enzyme starts to make the second strand of DNA. Obviously it's got to be a different enzyme; it can't be the same one, because that enzyme is working down here, right? So now we have a little bit of the other strand and
almost the complete copy of the first strand. Eventually the enzyme copies all the way around the RNA, and then it can't go any further. The other strand, primed by PPT, goes to the tRNA. The tRNA is removed endonucleolytically by RNase H, so that's no longer needed, and now there is complementarity between the light blue and the dark blue strands, because both have a PBS, a primer binding site, right, which is what the tRNA was bound to. So the first strand copied the PBS from the RNA, and the second strand is copying it from the tRNA. So here you see the first strand copies the PBS from the green, that's the RNA, and the next strand, the complementary strand, is copying it from the tRNA. Remember that: it copies it once from the genome, once from the tRNA. If you can remember that, write it down: "Racaniello said to remember this." OK, it will come back to please you.

All right, so now we have the two strands annealed here, and as you can guess, the polymerase is then going to use that complementary strand to finish off the duplex. So here we are where we just left off: we have the annealing, and the second strand is completed. So now we have a fully double-stranded copy of the viral RNA at the bottom there, and we have made at each end an LTR. The RNA doesn't have an LTR in it, but now we've made one; I think I show you that on the next slide.

So an LTR consists of the repeated sequences big U3, big R, big U5; big U3, big R, big U5. That's a long terminal repeat: it's at the end, it's long, and it's repeated; it's the same sequence. But look at the RNA: it doesn't have all three segments. It's got u3 and r, or r and u5. So what the jumping, the two template exchanges, do is make an LTR. If you go back and look at it, you will see that, because as you copy from here you get U5, R, and then U3 as DNA; you've now made one LTR, and you do the same thing at the other end. So that's why we need to have these jumps, to make an LTR, because when you do transcription, the LTR is
lost, because transcription initiates within the LTR. Anyway, there's an animation that I made that puts this in an animated fashion, if you want to look at it; that may help you visualize it.

OK, second question. Which of the following steps occur during reverse transcription of retroviral genomic RNA? Priming of minus DNA synthesis by tRNA; two template exchanges; degradation of the viral RNA by RNase H; generation of two LTRs; all of the above. OK, how did we do here? Is this 100? This could be 100. Number two; keep score, we'll see how many you get.

So here is what we ended up with: a double-stranded DNA with LTRs at either end. Now that has to integrate into the host cell DNA, and that's what we're going to talk about now. What happens is that DNA integrates into host DNA, which is shown in purple here, and then you end up with a provirus, which is the viral DNA, which you can see in blue; the LTRs are in green, and then on either side is host DNA. And from there the DNA is transcribed by host cell Pol II, DNA-dependent RNA polymerase II, and you get a message that is capped and polyadenylated. This full-length transcript is also going to be the viral genome; that's what's packaged into particles. But it also undergoes some splicing, as we'll see later.

But what happens here during integration? There are two things that are hallmarks of integration by the retroviral enzyme integrase. First, you lose a couple of bases at the end of the LTR. So here you see we have AATG; now we've lost two bases upon integration, and also two bases at the other end. And we have duplicated host DNA sequence at either end; that's shown in the red-tinted areas. So you can look at a sequence in a eukaryotic genome, and if you see these features you can pretty much be sure that it's an integrated retroviral genome. It's a provirus.

So let's see how that happens. In general the overview is, remember, the nucleocapsid enters the cytoplasm. It is permeabilized, and you need to permeabilize it so the
dNTPs can get in; otherwise, if you have a solid capsid, dNTPs will not get into that. And then reverse transcriptase makes a DNA copy, a double-stranded DNA copy, which you see there. There's also a molecule of integrase bound to the DNA, and that's shown here. It is a tetramer, so we have one, two, three, four subunits there. Now this complex enters the nucleus. It's not drawn to scale here; it's relatively small with respect to the nuclear pore, so it can get into the nucleus, and then it integrates into chromatin. So here's our DNA, chromatinized, wrapped around nucleosomes, and we think there are some interactions between integrase and some cell proteins. Here's one called BRD, bromodomain-containing protein, which itself binds acetylated lysines, which are on the histones, and that somehow directs the integrase to the chromatin for integration. So again, this tan molecule here is integrase, and it's a tetramer, four copies of the same enzyme.

And then this is a description of the integration reaction. Here is integrase in tan, again a tetramer. You have retroviral DNA with its intact ends so far, and then target DNA also binds the integrase; that's our cellular DNA in purple. The integrase removes two bases from the ends of each viral DNA strand. You need to do that in order to expose the hydroxyl, which then attacks the cellular DNA and cuts it and is ligated to it, all driven by the integrase. So now you have the ends of the viral DNA joined to host DNA; the host DNA is also nicked and ligated to the viral DNA. Then this molecule, it's the same one in these two pictures, we've just removed the integrase, is repaired, because now we have some gaps here, and that's repaired by host proteins. And that's why you get a repeat at each end: initially, when you chop each single strand of the host DNA, you now have this single strand here and a single strand on the other end, and those have to be repaired, so they're filled in; those gaps are
filled in by repair proteins, and that's why you have a duplication of host DNA at either end. Now you have an integrated viral DNA, or a provirus.

On this table on the left here is a quantification of integration sites in cells for different viruses, shown as percent integration. These are three different retroviruses: ASLV, MLV, and HIV. If you integrated randomly, taking into account the size of the viral genome and the size of the chromosome, you would expect 26% integration into genes and 5% into transcription start sites. But you see these viruses don't approach the random number; they're all higher. HIV is very high within genes, and two of these are also higher than random at transcription start sites. So the integration is not random. It can happen everywhere, on every chromosome, but it happens to be at specific sites, in genes or at transcription start sites.

So here's a summary of what we have. We made a double-stranded DNA; it's produced from two RNAs. So from two RNAs you make one double-stranded DNA; you're going from two to one. You make a strong promoter in the LTR. We're going to see that; actually, did I point it out to you? The promoter for mRNA synthesis is within the LTR, and the terminator is in the other LTR, and you can see that transcription initiates about halfway through the LTR and ends halfway through the other LTR. So that's why the RNA doesn't have the whole LTR, because it's not transcribed; that's why it has to be rebuilt during reverse transcription. So you have a strong promoter, the LTR, built during reverse transcription. It's a very important fact that the LTRs both have a strong promoter and a strong terminator, right? So for example, if this provirus integrates next to some crucial gene, it could turn it on with that strong promoter on the right. And that happened when we used retroviruses for
gene therapy: initially we caused cancer, because they were integrating next to an oncogene. We'll talk about that later.

The proviral DNA then is transcribed to make viral mRNAs. They go out into the cytoplasm and are translated to make proteins, and then some of those mRNAs end up in new virus particles, because they're the genome. So if you think about this: we've talked about viral RNA replication by RNA polymerases, and we've talked about viral DNA replication by either cell or viral DNA polymerases, where you take one genome and you copy it many times with an enzyme. It's not happening here, right? There's really no DNA replication happening. Well, when the cell divides, the provirus will be copied, so it will go to the next cell, but you're not making a lot of copies of DNA. And there's no viral RNA replication; there's transcription. So it's a very unique replication cycle, in that it's all dependent on cell processes, not viral enzymes, to make more of its genome, with the exception of RT, but that's not really replication. So that's all summarized here: the viral RNA is copied to DNA in the cytosol, it comes into the nucleus, it integrates into the host cell DNA, and it is transcribed to make mRNAs, which then go on to make new particles.

Now, this step of integrating into the host DNA has enormous consequences. All of you have tens of thousands; 11% of your genome is integrated proviruses. Every one of you, I guarantee, every one of you, and you're all very similar in that sense. Me too, OK? I'm not making an exception for you. We all have integrated retroviral DNAs, and it has very big consequences, which are just beginning to be sorted out. The reason is that once that provirus is there, it's never going to go; it never leaves. And so if it's an epithelial cell, you know, eventually epithelial cells die, so it's gone. CD4-positive T lymphocytes, which are where the HIV genome integrates, eventually die as well, although if they become memory cells
they could live forever. But if the genome integrates into germ cells, then you pass it on to your offspring, and when you integrate into the germ line we call those endogenous retroviruses. So endogenous is a special term that means the viral DNA is now part of the germ line, so you pass it on to your offspring, and as a consequence the genomes of every living thing are full of proviruses from previous infections. Not every virus infects the germ line. HIV, for example, does not infect germ cells; it only infects CD4-positive lymphocytes. It's a good thing it doesn't endogenize; it could be even worse of a problem. Can you imagine if you have HIV and you have kids, and then they all have HIV as well, without any risk factors? So fortunately HIV does not integrate into the germ line, but many viruses do, and we have them in us.

And so what happens is that many, many hundreds of thousands of years ago one person may be endogenized by a retroviral infection, and then over thousands of years that person has offspring. That's the red person here, and there are more and more of those as well. Sometimes these proviruses make infectious virus, but typically, if they're integrated into an organism and they don't have a function, mutations arise because they're not needed, and that inactivates the virus-producing part of the provirus. And so many of these have become fixed in the human population. They're in all of us; every human on the planet has certain kinds of endogenous retroviruses. Sometimes they're lost as well, so they're gained and lost. And as far as I know, we haven't had any new retroviral endogenizations in Homo sapiens for many, many years, probably many tens of thousands of years, that we know of. Now we sequence human genomes and we look at them, and we don't see anything except the ones that everyone has.

Now, this whole process of endogenization, which is illustrated here, where you have a founder who gets the provirus and passes it on to offspring, obviously takes many, many
years to propagate, right, tens of thousands of years. That process is happening in real time in koalas right now; you can study it in Australia. So the koalas are being endogenized by a retrovirus that they got from rodents. Koalas are native to Australia, and some thousands of years ago a retrovirus from a rodent infected a koala and became endogenized, and that koala passed it on to its offspring, and they passed it on and on and on.

We can see this happening in this figure. These are pie charts of different koala populations in different parts of Australia. So you can see the populations in the north are all endogenized with this virus; you can see 250 out of 250 animals that were sampled, all endogenized. But then as you move further south, less and less. So the implication is the virus is somehow moving north to south, perhaps by movement of koalas. And then of course there are some island populations, like Snake Island, French Island, Phillip Island, Kangaroo Island, where it's slower, because I don't know if koalas are good swimmers, right? Maybe they float on a piece of wood. But it's getting there, and some of these islands, since I made this picture, have become 100% endogenized.

Now this virus immunosuppresses the koalas and it makes them get chlamydial infections, so Chlamydia bacterial infections that are problematic, and it's causing a loss in the population of koalas. So people are thinking about immunizing the koalas against chlamydial infection. The only koalas elsewhere are in zoos, and there are some populations that are free of this KoRV, that's koala retrovirus, and those are bred preferentially, obviously, because those can stay retrovirus-free. But it's an example of endogenization in real time.

Now, these endogenous retroviruses are part of a family of elements that we call retroelements. They're sequences that move around the genome of the host by reverse transcription. So here's an example on the
right here. You have one of these retroelements, it's shown here in red. It's transcribed from our DNA to make mRNA, then it's reverse transcribed, because our cells have RT in them, mostly from retroelements, and that copies the RNA into DNA, which then integrates somewhere else. So these are called retrotransposons, because they move around the genome by reverse transcription. So some of these are endogenous retroviruses, but others are not endogenous retroviruses; they're elements that just move around the genome by reverse transcription, so they're retroelements. And we think retroelements predated retroviruses; they were in the genome long before retroviruses. They don't have an envelope gene, and as soon as they acquired an envelope, they became retroviruses. So probably retroviruses originated from cells that had retroelements.

And 42% of our DNA encodes mobile elements, including these retroelements. So here are the retroelements in the human genome: 42% of the genome, about two million copies of these elements. They are divided into non-LTR elements, which don't have LTRs, in different classes, LINEs and SINEs, and then LTR-containing elements, which include the endogenous retroviruses and others like retrotransposons. Those are not derived from a retrovirus infection; they've always been in our genome. I want you to remember that: the retroviruses came from retrotransposons, and the retrotransposons have been there for eons, and when they acquired an envelope they became a retrovirus.

So here are the different classes. On the left we have endogenous retroviruses; those originated by an infection with a retrovirus. So we have LTR, the retroviral genome, LTR: a duplicated sequence at either end. So these were from an infection. These are abbreviated HERVs, human endogenous retroviruses. They don't produce infectious particles. They make particles; if you take some cells and look by EM you can see retroviral particles, but they're not infectious, because they've all been mutated so
that they're no longer infectious. And we have retrotransposons, where we have again LTRs, gag, pol, but no envelope, so they can't make infectious particles. They can make a particle, because gag itself will make the nucleocapsid, but they will not make particles that can go from cell to cell and infect another. I don't know if any of you are interested in neuroscience, but there's a protein called Arc in the CNS which is derived from a retrotransposon. It makes a particle, and it's thought to do neurotransmission via these particles from synapse to synapse. So it's really interesting, it's amazing. So those are retrotransposons.

Then we have LINEs, which are the non-LTRs. Now there are no LTRs in them, but there is duplicated host DNA, so it's some kind of integration process. These encode reverse transcriptase, so they move by reverse transcription. But then there are two other elements: SINEs, short interspersed nuclear elements, and processed pseudogenes. These are just mRNAs which, from the previous slide here, have been copied to DNA and integrated. So they don't have LTRs; they're made from an mRNA copied by reverse transcriptase and integrated. Some other element is integrating them, and that's why you have the duplicated sequence at either end, but they do not encode reverse transcriptase. So again, the retrotransposons are the ancestors of retroviruses, and less than 0.05% of these elements are active in our genome, which means they're being transcribed. So we're full of these things; it's quite amazing.

There is an amazing story about one of the genes in these endogenous retroviruses that's used by humans on a daily basis, and that is a protein called syncytin. Syncytin is an essential protein for making the outer layer of cells in the placenta. It's formed by a single layer of fused cells. These are all syncytia, right, fused cells with many nuclei and one long tube of cytoplasm, and that makes the maternal-fetal barrier, right? There's fetal blood there, there's maternal blood,
and the syncytiotrophoblasts are an essential part of that barrier. The reason they fuse is because they have a protein called syncytin, which is derived from the envelope protein of an endogenous retrovirus. So remember, when retroviruses bind to cells, they have an envelope protein that binds a receptor, and then the membrane of the virus fuses with that of the cell. It's catalyzed by the glycoprotein; it's a fusion protein. So the gene encoding that, it's called envelope, has been taken from the endogenous retrovirus and put somewhere else in the genome, and there it gave rise to the placenta.

This happened millions of years ago. It happened three different times in mammalian evolution. Mammals have placentas, right, but there are some older mammals that do not have placentas, like marsupials, and before them. Capture of syncytin three times has led to the development of a placenta in all of these species. So you can see syncytin-2 here gave rise to placentas in all of these species. I just noticed this, "Old Word monkeys"; they use ancient English. And then another integration gave rise to it in gibbons, orangutans, gorillas, chimps and humans. So it's a gene taken from an endogenous retrovirus. This probably happens a lot, and this is the best studied example.

So this is why we give live birth as humans: because we have a placenta, the baby can develop in utero for months and then come out quite well developed, as opposed to laying an egg, right? Without viruses you'd all be laying eggs, just think of it. And not only that, they would be white eggs. As you know, chickens can have different colored eggs, and if you like blue eggs you can find them at markets. You don't find them at Whole Foods or Trader Joe's, but you can find them in markets. They're blue because of a retrovirus that's integrated next to a pigment gene that makes the eggs red, uh, green, blue, sorry. So a retrovirus makes chicken egg shells blue. So without viruses humans would be laying eggs, and they'd be white. God, you
don't think that's funny at all? Nothing.

There are also other uses for these retroelements. So HERV-K is one of these endogenous retroviruses in all of us that infected our ancestors 200,000 years ago. So here is a kind of map of human evolution; here's Homo sapiens here, and this is 200,000 years ago, and so here are some of our ancestors. So about 200,000 years ago our ancestors were infected with HERV-K, human endogenous retrovirus K, and that is now in all of us. We all have HERV-Ks in us. They don't make infectious virus, but during human embryogenesis HERV-K mRNAs are made, and they persist through the epiblast stage. So we have oocytes, zygotes, 2-, 4-, 8-cell embryos, morulas, and then they stop after the blastocyst stage. And these RNAs, these endogenous retroviral RNAs, seem to induce immune responses in the developing embryo, so they may be part of an antiviral defense system. This is just one example of many different uses for endogenous retroviruses that people are discovering. So, you know, I put these lectures on YouTube and I get emails every year: is it okay to eat the blue eggs, or am I going to get a retrovirus infection?

All right, the last question. Which of the following statements about retroelements is not correct? There are many copies in eukaryotic genomes; they are currently entering the koala germ line; those in the human genome produce infectious viruses; they can be beneficial; none of the above. What do we have here? Most of you got the right answer. "Those in the human genome produce infectious viruses" is not correct; they're all inactivated. And I think probably one person wrote none of the above. So that's the one that is incorrect; they're all inactivated.

So I want to talk just briefly about hepadnaviruses. These are DNA viruses that encode reverse transcriptase. So the retroviruses have RNA genomes, right, they're RNA viruses, but hepadnaviruses have DNA genomes, and yet they encode reverse transcriptase. So what's going on there? Let's have a look. So these
are enveloped viruses, a big cause of hepatitis and liver cancer globally, very very serious. If you're working in health care you should absolutely be vaccinated against hep B. Most kids now are vaccinated shortly after birth, because it's a preventable infection. The virus is enveloped, with glycoproteins in the envelope, and inside is an icosahedral capsid that contains partially double-stranded DNA. Remember, this is group seven of Baltimore. It's got a big gap of single-stranded DNA, it's got a piece of RNA, it's got a protein attached to it. We're going to learn today why it looks like this, and I'll give you a hint: the protein is reverse transcriptase. It's stuck there because it didn't finish its job; basically, it didn't finish making the second strand.

So these viruses infect cells by binding to receptors. The fusion occurs at the plasma membrane; the capsid, which contains the gapped DNA, gets in the cell and docks onto the nuclear pore, and the DNA then ends up in the nucleus. So the nuclear machinery repairs it, because it has all these features that are not liked by the cell nucleus. So the DNA is made double-stranded, the protein is taken off, and the RNA is taken off. Now it is a covalently closed circular DNA, because it was originally not covalently closed, right? It was circular but it had a gap; now it's fully double-stranded, covalently closed circular. And then it becomes chromatinized by the host cell, it's coated with histones, and then it can be transcribed to make mRNAs that encode proteins to make new virus particles.

And so what I want to point out here, you don't need to worry about all of this procedure, but early on the mRNA comes into the cytosol and makes structural proteins that give rise to a new capsid. And so initially a capsid is made containing the mRNA for the whole genome; there's a single mRNA that covers the whole genome that is encapsidated into this icosahedral shell. Then reverse transcription begins, because there is a molecule of RT attached
to the RNA. It begins to reverse transcribe right in the cell, before leaving. So the retroviruses are different: they reverse transcribe when they get into a new cell. This one can't wait; it starts the RT before it leaves. But because the capsid is closed, it runs out of dNTPs, so it leaves a partially double-stranded molecule. That's why it's like that, because it's so excited to get going it can't wait for the next cell. If it were in the next cell, it would come apart and the dNTPs could get in and make a double-stranded molecule, but it apparently doesn't matter, because the cell fixes it on the next cycle. So that's the key here: RT commences in the cytoplasm, and that's why you have a gap in the DNA. The RNA, as you'll see, is a leftover primer, kind of like with the retroviruses, and the protein is the reverse transcriptase. So the reverse transcription part of the reproduction cycle of this virus only happens in the infected cell, before the particle leaves.

So here's a diagram of the reverse transcription process. I'm not going to go into it in as much detail as we did for RNA tumor viruses, but I just want to show you what the product is at the end. So top left, and there are some elements that are very similar, as you will see. So here's the viral RNA; that is what's packaged in the newly assembled particles in the cytoplasm of an infected cell. So we have a cap, we have some DRs, those are direct repeat sequences, you can see both at the 3′ end, and we have a poly(A) at the 3′ end. And then here there's a stem-loop that is bound to a molecule of reverse transcriptase. It's actually bound to a TP, a terminal protein, and there's RT in tan behind it. The RT, when this RNA is packaged into the capsid in the cell, begins copying this stem-loop, and then it jumps to the 3′ end of the RNA. So there's a template switch, very much like happens in retroviruses, and then the RT continues to make the DNA on that strand, and eventually you make an entire strand of DNA, and the reverse transcriptase remains stuck on
one end, and this little piece of RNA near the cap, that remains and acts as a primer for the second strand. But the RNA primer actually jumps from DR1 to the other strand and begins the second strand, in the dark blue, and then it continues around using the 3′ end as a template, and that's about as far as it gets. So there are a couple of template jumps here, as you can see: there's one at the very beginning, and then there's one here, and a second one here.

So this is where we are: we have a full-length minus strand from the 5′ end to the 3′ end, and then the second strand has begun using this RNA primer that was not removed by RNase H, right? Everything else is removed by RNase H except this, kind of like the PPT, and that primed the second strand, but it ran out of triphosphates about right here. And that's what's incorporated into the particle: partially double-stranded, a lot of single-stranded here, and there's a molecule of RT attached, and the primer is still there, and that primer actually has a 5′ cap, which we haven't shown anywhere. So that funny genome structure is a consequence of reverse transcription happening before the virus leaves the cell, where it's a closed capsid and it runs out of triphosphates. But as I said, it doesn't matter, because this is all repaired in the next cell.

So here's an overview of what we've talked a bit about today, and a little more. So we talked a lot about retroviruses, where the viruses have an RNA genome. The yellow boxes are what's in the virus particle, in other words, what's the viral genome. So for retroviruses the genome is RNA. Most retroviruses; as you'll see, there's always an exception. There are some retroviruses called foamy retroviruses which package DNA in their particles, for a similar reason as the hepatitis B viruses do: they package RNA initially, and then it is reverse transcribed in the particle before it leaves the cell. So some retroviruses, one class called foamy retroviruses, package DNA. Then we have
the hepatitis B viruses, which have a DNA genome, although they initially package RNA; it's reverse transcribed right in the particle to make that DNA. And then we also have caulimoviruses. These are reverse transcriptase-encoding viruses that infect cauliflower. So probably next time you eat cauliflower you're eating some; you can't really avoid them. But don't worry, it's just going to pass through. Just passing through, nothing to see. And these viruses have a DNA genome, although they make an RNA in the cell that is reverse transcribed; you can see there's a tRNA primer hybridizing to it. It's reverse transcribed in the particle, and that ends up as DNA with some breaks. So it has breaks in it, as you can see, and it has pieces of RNA attached. So this also has to be repaired when it gets into the next cell, and so that's repaired in the cell and goes on. So obviously there are some evolutionary things going on here, where reverse transcriptase-encoding viruses can have either RNA or DNA in their particle, and either one is evolutionarily successful.

Now the retroviruses, of course, there are a couple that infect humans, one of which is HIV, 1 and 2, and we'll talk about that later on, because those obviously are serious infections. We're going to start ending the first half of this course: two more lectures until we get to the second half, and we're going to talk about how to put virus particles together. [Music]