Transcript for:
DNA Barcoding Lecture Insights

Hi, good afternoon. I'm Dr John-James Wilson, the curator of vertebrate zoology at World Museum, part of National Museums Liverpool. Like most people I'm currently working from home, but this picture here is where I'm normally based, at World Museum, and my office and the collection are down here in the basement of the building. As I said, I manage the use and the care of the vertebrate zoology collection, which totals around 80 specimens of birds, mammals, fish, reptiles and amphibians. Our bird collection is particularly important, being the second largest in the UK and being rich in specimens that are over 150 years old, including a large number of type specimens.

So obviously I'm very interested in taxonomy and biodiversity, and the message really is that there are so many species and so little time. There are estimated to be at least 10 to 15 million animal species on our planet, of which only around one and a half million have been scientifically described by taxonomists. After several years of training, a typical taxonomist can perhaps become an expert on around 1,000 species, which means we need at least 15,000 active taxonomists, with their experience evenly distributed across the animal kingdom, to have an effective identification capacity for animal species. And that's talking on a global scale, assuming that everyone works together. It's called the taxonomic impediment: a real problem in terms of identification of species is the lack of taxonomic expertise.

So I'm here to talk today about DNA barcoding. The story goes that Canadian professor Paul Hebert was walking around his local supermarket, thinking about the moths in his back garden, when an idea struck him. The supermarket cashier can accurately identify thousands of consumer products and their prices through the universal product codes, the barcodes that are present on each item. So Paul was thinking: can we use barcodes to identify those moths in his back garden, and in particular, can we use DNA
like barcodes? So here's Paul; this is what he really looks like. He was actually my PhD supervisor at the University of Guelph in Ontario. Paul wasn't the first to think about using DNA for species identification, but he was the one who popularized the terms DNA barcodes and DNA barcoding and brought them to the public's attention.

So the idea is basically that species show small, consistent differences in their morphology that enable us to tell them apart. As you can see here from these graciela butterflies, there are small, consistent differences in the markings on the underside of the wing. And just as they differ in these ways, they also differ in their DNA sequences.

So let's think about how we might use DNA as species barcodes. As you might remember from school, DNA is a double helix made up of chains of paired nucleotide bases: adenine, cytosine, guanine and thymine. Now you might be thinking, how much DNA is there in the cell? It's got to be a lot, right? And wouldn't it actually be making the problem more complex to use DNA for species identification? Well, yes, there is a lot of DNA in a cell. For example, the human genome is made up of 46 chromosomes, which include around 20,000 to 25,000 genes in about 3.2 billion bases. So the idea of barcoding is that we can focus on just one of these genes, and that gene must have the required properties of a barcode: the sequence of that gene is identical or very similar in all members of a species, but relatively very different between individuals of different species. That's what we need from a DNA barcode.

Based on previous findings, Paul knew that mitochondria would be a good place to search for a gene that could function as a DNA barcode. The mitochondrion is an organelle within the cell, known as the powerhouse of the cell, and due to its evolutionary history this organelle has its own small amount of DNA; it's got a small genome of its own. There are several
mitochondria within each cell, and the mitochondrial DNA itself is also found in several copies within each mitochondrion. It has a relatively simple structure, being a short circular piece of DNA with only around 37 genes. So we've already simplified the problem quite significantly, from 20,000 to 25,000 genes down to 37 genes. Within the mitochondrial DNA itself, it was decided that the cytochrome c oxidase I gene, known as the CO1 gene, would be the best candidate for a DNA barcode, and in fact this is the DNA barcode region that we use for animals. The DNA barcode itself is actually just about half of the CO1 gene, approximately 658 base pairs only. So there's the DNA barcode and its location within the circular mitochondrial DNA.

So in effect a DNA barcode is just 658 base pairs of mitochondrial DNA, and it can be shown simply as a string of letters with only four possible letters: A, C, G or T. This here is a representation of the DNA barcode of the very toxic, venomous brown widow spider, which is actually spreading around the world.

So how can we use these letters to identify species? How does it work? What happens is that you compare the DNA barcode from your unknown specimen with a reference library. Say you've got the leg of an unknown insect: you get a DNA barcode for this, and you compare it against a reference library of DNA barcodes from specimens with confirmed species names, and you look for a match. You can compare the similarity between this unknown barcode, this query, and each of the reference library barcodes, get a percentage similarity, and look for a match. If the unknown barcode matches a barcode in the reference library, you can be confident that your unknown specimen is a member of that species. So there you go, that's a 100 percent match to this particular one.

So now we know how it works, the question is how do we get those letters in the first place? As any of you who have joined the DNA barcoding
workshop will know, we get that through lots and lots of pipetting. These are the basic steps on the way to producing a DNA barcode, and I'm going to go through each of these briefly in turn.

The first step is DNA extraction, which involves cell lysis, or cell digestion, and then purification of the DNA. Cell lysis is breaking open the cell, as you can see from the animation here, releasing the cell contents, including the DNA, into a solution. Purification then involves binding that DNA to a filter so that all the other cell components, all the parts you don't need, are washed away. The DNA is isolated using the filter and then released from that filter into a pure solution of purified DNA. That's what this diagram shows at the bottom here: the purification process using a filter. There are commercially produced kits available for this DNA extraction step, and this is the one that we're currently using at the museum.

Next we need to isolate and copy the CO1 gene fragment that is our DNA barcode, and for this we use PCR, the polymerase chain reaction. PCR involves pipetting the purified DNA, primers, nucleotides and the polymerase enzyme into a microtube, and then cycling this tube and its contents through a set of different temperatures in a PCR machine, which is also known as a thermocycler. So you pipette these into a tube and then put it into the thermocycler, which cycles through a range of different temperatures. In the first step in PCR the double helix of DNA separates into single strands. The primers are specifically designed to target the DNA sequence that we're interested in, our CO1 DNA barcode, and they will attach to that specific region of DNA. Then the polymerase enzymes will join onto the strands and add nucleotides to the target sequence, effectively creating a new copy of that sequence. And then this is repeated again, so in the next cycle more copies of the target DNA are created, due to the primer
binding and the polymerase adding those nucleotides, copying the sequence. This is repeated 30 to 40 times, 30 to 40 cycles, so millions of copies of the target DNA are created. Again, commercial kits are available for PCR. These contain all of the ingredients that are needed except for the purified DNA and the specific primers, and this is the particular one that we've been using at the museum for our DNA barcoding.

The next step is DNA sequencing, where our target DNA is read. We generally use Sanger sequencing for the DNA barcoding that we're doing at the museum. In Sanger sequencing the PCR copies are terminated at different lengths with fluorescent nucleotides, so this results in every base in the target sequence being tagged with its corresponding color, the color corresponding to which particular nucleotide base, A, C, G or T, is at that position in the sequence. This is then passed down a capillary where there's a camera that captures the fluorescence. As the sequence passes down, the camera captures the fluorescence at that particular point in the sequence, and the result is a chromatogram, a graph indicating the nucleotide base at each position. So we'll get a chromatogram 600 to 700 base pairs long like this, a graph with different peaks of color indicating the different nucleotide bases at each position.

We don't have a DNA sequencer at the museum, so we send our PCR products to the facility at the University of Sheffield for Sanger sequencing. The chromatograms are sent back to us by email, and then a computer program can be used to convert those peaks, the graph, into letters, basically into the nucleotides. Here the black color is representing G, the green color A, the red color T, and so on. So letters can be assigned to each of those different peaks to get the nucleotide sequence in the form of letters, which is what we need for our DNA barcoding. It's always worth checking that the computer is making the correct
calls for each of those different peaks. This step is known as sequence editing. It simply involves checking the base calls and making a judgment call yourself, based on the height and the cleanness of each of the chromatogram peaks, in case the computer is making any errors, but this is usually a very straightforward step.

After the sequence editing process we'll end up with our 650 or so letters and can proceed to identify the sequence and our unknown specimen. As I explained earlier, this can be done by making comparisons with the reference library, and in reality this is an automated process. We simply paste our 650 letters into the Barcode of Life Data Systems (BOLD) identification engine, which is freely available online. We paste our sequence in there, and BOLD will compare our sequence with the nearly four million sequences that are present in the library and then suggest an identification. So for example, this sequence here has a hundred percent match to several sequences of a species of geometrid moth.

Right, so some of you might have been paying really close attention, and you'll have noticed that BOLD currently has about 4 million sequences, but I told you earlier there are around 10 to 15 million species of animals on earth. So obviously not every species is yet included in BOLD, and it may be that you don't get a close match to anything in the library; it could be a result like this. In this case BOLD will give the sequence a barcode index number, known as a BIN. These numbers can function in the same way as traditional species names, in that they're stable, persistent and interoperable as identifiers, and you can use this until a name can be assigned to that barcode and that specimen. That might be when the specimen is morphologically identified; it could be described as new if it's a new species; or, as new sequences are added to BOLD, it could be matched with named sequences later. Then your sequence will eventually get a
taxonomic name.

Okay, so that covers the laboratory and online steps required to do DNA barcoding. These are the standard methods, and as you can imagine they're constantly being refined and developed to become more practical. One major push has been moving this work out of the laboratory and into the field, to exactly where it's needed, so you can do the pipetting, the DNA barcoding, in the field or in the forest, as this picture here shows. Companies are developing portable laboratory equipment to take with you into the field, such as this Bento Lab, which is a laptop-sized mini lab that contains a small PCR machine, a centrifuge and all the other things you need for DNA barcoding in the field. There are even portable DNA sequencers being developed, though it's important to know these use very different technology compared to the Sanger sequencing that I explained earlier. So there are some very significant developments going on in moving DNA barcoding from the lab directly into the field.

So that's all well and good, but you might be wondering about DNA barcoding in practice. I'm very briefly going to introduce a few DNA barcoding projects that I've been involved with in Malaysia.

There's been a lot of interest recently in using environmental DNA for species monitoring. This is where trace DNA present in the environment can be used to detect a species that's present in that environment without a need to capture or even see it. We've done a pilot study using environmental DNA to detect this species. Any ideas what this is? Well, it's actually a southern river terrapin. As this southern river terrapin is swimming along, like all organisms it will be shedding DNA into its environment through feces, pieces of skin and other secretions. So there's DNA from this organism present in its environment, which in this case is the river, and if we want to know if that species is present we can take a sample of the river water, extract DNA from
the water, do PCR that's specific for our terrapin DNA barcode, and sequence that terrapin DNA barcode to determine its presence in the river. These kinds of methods are especially useful for species like the southern river terrapin, which is very rare and elusive and very hard to monitor visually in the river. We tested this method using a short DNA barcode, actually 110 base pairs of mitochondrial DNA. We tested it in vitro, taking water from the turtle hatchery where there are certainly turtles present, and we also tested it in situ in the Kemaman River, and we managed to detect terrapin DNA from this river water.

Another similar but slightly different idea is using invertebrate-derived DNA, also known as iDNA. Using this approach we can detect mammal DNA from the guts of blow flies. Mammals are also very hard to detect visually, to find in the rainforest, and the flies, which of course are relying on mammals for their nutrition, feeding on mammal carcasses, wounds or feces, are much better at finding mammals in the forest than we are. We discovered that mammal DNA will persist in blow fly guts for about 72 hours, and the flies travel about one kilometer a day, which can help us pinpoint the locality of those mammal species. We designed a PCR target that will preferentially copy the mammal DNA rather than the fly's DNA, which of course helps with the DNA sequencing. We tested our fly trapping and sequencing protocol against the common methods for surveying mammals in forests, which include camera traps, mist nets and harp traps for bats, cage traps and hair traps, and we found more mammal species with the flies than with these other methods. It's quite a simple procedure. You set these baited traps with some rotting mammal flesh in here to attract the flies; they fly in and get trapped in this area; and then we can take the flies, extract the gut contents, sequence the gut contents and detect mammal DNA. So then
we can determine which mammals are present within that forest area.

Another area of advanced DNA barcoding is known as DNA metabarcoding. This is where large mixed-species samples, such as those collected by Malaise traps, are sequenced simultaneously using high-throughput sequencing. We used this method in Peninsular Malaysia to compare seasonal species turnover in different forest types, such as the oil palm plantation shown here, which of course are expanding in Southeast Asia and replacing more natural forest types. DNA metabarcoding means that this large collection of insects, such as you get from a Malaise trap, doesn't have to be individually sorted. The DNA can be extracted in bulk, you can do the PCR in bulk based on that mixed-species DNA extraction, and then you can sequence this PCR product using next-generation technology such as the Illumina MiSeq. This should theoretically produce sequences, barcodes, for all the species in the sample, which can then be sorted bioinformatically rather than physically.

We also used a similar approach to sequence another type of complex sample, bat guano. This is a famous site in Malaysia called the Batu Caves, and there's a colony of bats roosting in this cave here. The guano from these bats contains DNA from many species of plants, so we did a plant barcoding project to see which species of plants these bats are feeding on, and also which they're potentially pollinating. That's interesting because if you know the distribution of the plants, you know how far the bats are traveling from their roosting colonies in this cave in order to feed. So that's another interesting project.

But let's move a little closer to home and see how you could use DNA barcoding in your recording activities. I think it is fair to say that the UK is slightly behind other equivalent developed nations in terms of building up its DNA barcode library, and local recorders and enthusiasts can be the drivers of DNA barcoding, DNA
barcode library building here in the UK. Together with the Tanyptera Project, World Museum has started running DNA barcoding workshops where you can use DNA barcodes to confirm the presence of new or rare species within your area. A good example was this blood bee, a new record for Cheshire, identified by Chloe at our first DNA barcoding workshop, and I'm sure there are many examples where barcoding could be very helpful to you to confirm species.

The NBN Atlas is a record of species occurrences in the UK, which I'm sure you're all contributing to through iRecord. The Darwin Tree of Life project, which is a very ambitious project aiming to sequence the species of multicellular life found in the UK, is using the NBN Atlas to direct its sequencing, to decide which taxa should be sequenced as part of the project to build up the library of DNA sequences for the species in the UK. So if you're contributing to iRecord you're also indirectly contributing to this project, the Darwin Tree of Life project, run by the Wellcome Trust and the Natural History Museum.

Okay, so now you know how you can get involved, and hopefully barcoding is no longer an alien concept and it's clear how simple, actually, and how effective barcoding can be. I hope now you're full of ideas of what kinds of specimens you'd like to DNA barcode yourself. Before I finish I'd like to acknowledge that there are of course lots of other people who have been involved in the projects that I've briefly covered in this talk, including my co-authors on my publications, which you can find through my ResearchGate profile, where the funding sources for those projects are also acknowledged. So yeah, I'm happy to take any questions, and of course if you prefer you can always get in touch with me through Twitter or through email. Thank you very much.

Thank you, John, that was really interesting. So if anyone
does have a question, you can either put your hand up virtually or you can ask a question in the chat.

We've got one question from Mike Ashworth: does the material have to be fresh, or will barcoding work on insect specimens which have been dried for days, months or years?

Yeah, good question. Obviously fresh material is better, but it can work with material that's been dried for a hundred years, and you can get DNA barcodes from that kind of material as well. It depends on how they've been preserved, but it's possible.

Okay. Does anybody else have a question? Everyone seems quite quiet today; maybe it's because we're in the evening. Glenn, you've got your hand up. Do you want to unmute yourself and ask a question?

Yeah, just a quick question, John. What are the problems of cross contamination of some of these DNA sequences, and how do you prevent it occurring?

Yeah, it can occur. Say, for example, you're taking a leg from an insect in order to sequence it, and maybe you've touched another insect previously and then you've touched that leg. Is that the kind of contamination you're thinking about? Yeah. So there should be much more of the DNA of the specimen that you want to sequence in that DNA extract. There may be a small amount of DNA from another species, but the PCR will pick up the DNA that's most abundant, so if your target has more DNA there, that's what will get PCR'd; it's just the biggest percentage of the DNA. It could be that the contaminant shows up in those chromatograms that I showed you: you might see a faint bit of background noise that could represent the species that caused the contamination. So it's better to try to avoid that in the lab, in terms of taking precautions to prevent it.

Unless the cross contamination is a species that's not been recorded before! Yeah. Okay, thanks very much, John.

Okay, thank you. We've got another question in the chat: what particular primers do
you use?

Yeah, so there are good primers that have been developed for insects. They've been around since, I think, about the 90s actually, and they've been found to be very effective for most animals, in a way, but there may be some species that they don't work for, so you might need to modify those primers slightly to be able to target different groups of species.

Okay. I've got a question from Diana: is it possible to get estimates of abundance from metabarcoding studies?

A good question, and there's been a lot of work looking into that, because of course it would be very useful to have that kind of information on abundances. Some of the papers that I've been reading lately have shown that they can estimate abundances from metabarcoding data, but it requires a more complicated experimental design in order to achieve that, things like putting in known amounts of different species to be able to calibrate it. Things like that need to be done in order to get reliable abundance information from metabarcoding data.

Okay, thank you. Here we have another question, from Viet Techie; sorry if I'm pronouncing your name wrong there. She says: in terms of storage of specimens or samples to be used for barcoding, have you found any differences in using specimens in ethanol versus other preservatives, in terms of the success of amplification?

Yeah, so for insects it seems to be that dried specimens actually are the best source of DNA, and for the types of animals that I've been working on, ethanol is better. At the museum we have a lot of specimens stored in IMS, or denatured ethanol, and it does seem to have negative effects in terms of the DNA quality, so maybe the DNA is degraded and not so easy to amplify with PCR. But that's just anecdotal; I think there are some studies that have looked into which is the best way to preserve them. A lot of my projects have been working with very fresh material, such as Malaise samples that are, you know, just weeks old,
and so there haven't been any issues with that. But now that I'm looking more into the historical material at the museum, it does seem to be quite an issue, the way that they've been stored and the different types of fluids they've been stored in.

We have a question from Jeff Garnis. He says: do you primarily or exclusively use COI for metabarcoding of insects, what do you do about COI pseudogenes in the nuclear genome, and is there anything that can be done for groups where COI doesn't resolve?

Quite a few questions in one. Some people have been using 16S, which is another mitochondrial gene, for metabarcoding. The disadvantage with that is that several species may share the same 16S sequence, whereas with CO1 each species tends to have a distinctive sequence, so it actually functions better as a barcode. But 16S has other advantages; maybe it's easier to amplify, those kinds of things. So generally I would suggest using CO1 as the barcode.

There are a few other questions in there as well. Oh, the pseudogenes. Yeah, it is more difficult with metabarcoding to detect those pseudogenes. As you might know, there are different ways of detecting them based on translating the sequence into protein: if there are errors in the translation, that suggests that the sequence could be a pseudogene.

And there was a third part as well, wasn't there? Is there anything that can be done for groups where CO1 doesn't resolve? Yeah, looking into other genes, really. It would be an interesting case if CO1 doesn't separate the species. What is the reason for that? Are they relatively young species? Those kinds of questions then come up, which are interesting in themselves as well, if you have that situation.

Okay. From Rod Hill: why use only one region of the gene? I.e. can you use a second short region to validate the screening, or is the specificity of the selected region sufficient to offer the precision and selectivity required?
Yeah, so it's a good question as well. As I pointed out in my talk, the whole idea of DNA barcoding is to try to make it as simple as possible, so that's the reason to focus on a single gene region. Why make it more complicated if this single gene region can work? And it tends to be that CO1 is very effective as a DNA barcode for most species of animals. Of course, if you have further resources and more funding, and some reason why you're suspicious of the result of the CO1 barcoding, you could always look at other genes as well. There are other genes that could also work well as a DNA barcode, but the primers might not be as well developed. Particularly with nuclear genes there are other issues: mitochondrial DNA is present in much higher abundance within a cell, because there are several mitochondria and several copies of the mitochondrial genome within each cell, whereas there's only a single copy of the nuclear DNA. So it has advantages in terms of that as well. But yeah, you could definitely look at other gene regions, and if you get different results from different genes, that also raises some interesting questions. Is it a hybrid? That kind of thing.

Thanks, John. Katie Woodcock is asking: have you stored Malaise catches for periods of time to extract at a later date, and if so, what did you find to be the best in terms of preservatives and temperature? And also, can I ask what preservative you used in your Malaise trap catch bottle, and what you found to be the best? Thank you.

Okay, so we used ethanol in the bottle for the catch, and that's also how we preserved it. Then we put it into the freezer at -20, but we didn't store them for that long, less than a year in all cases. In terms of longer periods of storage than that, I think -20 in the freezer in ethanol would probably be suitable. I mean, if you do have access to a freezer that goes to lower temperatures, then the lower the better,
really. But that, I think, would last for several years at that kind of temperature in ethanol.

Okay. Mark Sheridan is asking: as a moth trapper, could this provide an alternative to gen det for difficult species, by using just the scales?

I'm not sure I totally understand the question. Is it as an alternative to, like, using genitalia determination? I think, yeah. Okay, yeah, I think so, yes, definitely.

Okay. We've got a question from Chris Workman: if the barcodes are based on the single mitochondrial gene cytochrome oxidase, with 650 nucleotides and possible variation, how much overlap, or how many variations, can show up between the species?

So generally about two percent difference is used as a very crude threshold. If the sequence of your unknown is within about two percent of one of those reference library sequences, you could assume that it's the same species; if it's over two percent, that suggests it's a different species. So that's what is used as a crude cut-off point. What we actually see generally is that there's a barcoding gap: usually the variation within a species is within one percent, and then it's probably above five percent to the nearest neighbor species. So to the next species, you're going to be looking at those kinds of values for similarity.

Thank you. A question from Laura Weldon: what software do you use to process and sort your data, can you recommend such, and do you have a preference for metabarcoding data or for barcoding data?

Even for metabarcoding data, I find BOLD, the Barcode of Life Data Systems, to be the best place for storing data. It's online, so you can access it from anywhere, and you don't need to have a large amount of storage on your own personal computer. So that's what I would use, even for metabarcoding data. Obviously, if you're going to publish the data, ultimately you'll need to submit your reads to GenBank, which can be useful for metabarcoding data as well.

Okay, she says thank you. Um,
any more questions, if anyone's got their hands up? No. So yeah, if nobody has any more questions, I'd just like to say thank you again to John, and thank you everyone for joining us today. This has been our first evening webinar and it's gone quite well. If you'd like to watch the presentation again, we'll be uploading this to YouTube within the next week, and I'll send out a link to that and some various other links as well. Our next webinar is this Saturday, on water beetles, so please do join us for that if you can; that one will be at lunchtime, at one o'clock. So yeah, thank you, and good night everybody.
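
A rough sketch of the matching step described in the talk: an unknown barcode is compared with each barcode in a reference library, a percentage similarity is calculated, and the closest match is accepted only if it falls within the crude two percent threshold mentioned in the Q&A. This is not how BOLD itself works. The function names, the toy 100-base sequences and the species labels are all hypothetical, and the sequences are assumed to be already aligned and of equal length.

```python
def percent_identity(query: str, reference: str) -> float:
    """Percentage of positions at which two pre-aligned, equal-length sequences agree."""
    if not query or len(query) != len(reference):
        raise ValueError("sequences must be non-empty and the same length")
    matches = sum(q == r for q, r in zip(query.upper(), reference.upper()))
    return 100.0 * matches / len(query)


def identify(query: str, library: dict, max_divergence: float = 2.0):
    """Return (name, identity) for the closest reference barcode.

    The name is None when even the closest reference differs from the query
    by more than max_divergence percent (the crude 2% cut-off from the talk).
    """
    best_name, best_identity = None, -1.0
    for name, reference in library.items():
        identity = percent_identity(query, reference)
        if identity > best_identity:
            best_name, best_identity = name, identity
    if 100.0 - best_identity <= max_divergence:
        return best_name, best_identity
    return None, best_identity


# Hypothetical toy reference library (real barcodes are about 658 bases long).
ref_a = "ACGT" * 25
ref_b = "TGCA" * 25
library = {"species_a": ref_a, "species_b": ref_b}

query_close = "T" + ref_a[1:]     # 1 mismatch in 100 bases against species_a
query_far = "TTTTT" + ref_a[5:]   # 4 mismatches in 100 bases against species_a

print(identify(query_close, library))  # closest reference is within the 2% cut-off
print(identify(query_far, library))    # closest reference is beyond the 2% cut-off
```

With real data the sequences would first need to be aligned, and the two percent cut-off is only a rule of thumb; as the answer above notes, what is usually seen is a barcoding gap, with variation within a species under about one percent and the nearest neighbor species above about five percent.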