Transcript for:
Exploring DNA Metabarcoding for Stream Health

Freshwater macroinvertebrates are important indicators of stream health. They can be used to assess the impact humans have on our stream ecosystems and hopefully lead to better stream management, securing our freshwater resources. Macroinvertebrate specimens collected in routine monitoring are usually identified using morphological characteristics. This is problematic, however, as some taxa lack good morphological characteristics for identifying specimens to species level. This can make identification quite time-consuming, or taxa are only identified to a coarse taxonomic level, as is the case for chironomids, for example. Adult specimens, however, can often be identified reliably to species level by taxonomic experts. Using a small genetic marker, the CO1 barcode, we can then sequence those well-identified specimens and deposit the sequences in a reference database. Like a product in a supermarket, we can scan this DNA barcode and reliably identify each specimen in the sample. While this technique enables species-level identification, it has to be applied to each individual specimen, making it too expensive for routine monitoring of complete kick samples.
However, recent developments in sequencing technology now enable us to identify all specimens present in a sample simultaneously. To do this, the organisms of the complete sample, often hundreds of them, are homogenized, DNA is extracted, and the barcoding gene is amplified using the polymerase chain reaction. Those fragments are then sequenced with high-throughput sequencing, generating millions of sequences. These have to be bioinformatically processed and can then be compared to the CO1 reference database to identify the taxa present in the sample. The generated taxa lists can then be used to assess the ecosystem.
This new DNA-based technique is called DNA metabarcoding. It's a bit like the products we had in the supermarket before, but now, instead of scanning each individual product, we can scan the complete shopping cart at once.
Hello, my name is Vasco Elbrecht from the Aquatic Ecosystem Research Group at the University of Duisburg-Essen. Together with my co-authors Edith Vamos and Florian Leese, as well as Kristian Meissner and Jukka Aroviita from the Finnish Environment Institute, we looked at the efficiency of DNA metabarcoding for routine monitoring of macroinvertebrates. Our colleagues from Finland supplied us with 18 samples from the regular Finnish monitoring program of macroinvertebrates for stream quality. We applied our DNA metabarcoding methods to those samples and compared whether we get the same results and whether DNA metabarcoding can be used for routine assessment of ecosystems. In the next few minutes, I will detail our metabarcoding pipeline and sample processing, as well as the results and the challenges we are still facing with this relatively new method.
In this study, we investigated 18 macrozoobenthos kick samples from Finnish streams. The samples were collected as part of the official Finnish stream monitoring program, and all specimens were identified based on morphology by an experienced taxonomist. We took those monitoring samples and also tried to identify the taxa present in them using DNA metabarcoding. The goal of the study is to identify potential weaknesses and advantages of the DNA metabarcoding method in the context of routine monitoring of macrozoobenthos for water quality assessment.
Each sample was dried overnight to remove ethanol. All specimens were then homogenized and DNA was extracted. We used four different primer combinations to amplify a short region of the CO1 barcode. The BF/BR primer sets were specifically developed to target freshwater macroinvertebrates.
While they show good detection rates with mock communities, this study is the first time these primers have been tested with complete macrozoobenthos kick samples. For DNA metabarcoding we use the fusion primer system. Here the primer amplifies the target region but at the same time includes inline barcodes for sample tagging and pooling, as well as the Illumina tails for sequencing. We amplified and sequenced two replicates per sample, each with four different primer combinations. So with 18 samples, two replicates and four primer combinations, this gives us 18 x 2 x 4 = 144 samples. The ready-to-load amplicons were quantified and pooled in equal proportions for sequencing on an Illumina HiSeq system. Using paired-end sequencing, 250 base pairs were sequenced from the forward and the reverse direction. We obtained 260 million read pairs in total, while each sample had an average of 1.53 million reads. Overall, the sequencing depth was quite equally distributed across the samples, as can be seen in figure S3.
After sequencing and a quick quality control, reads were demultiplexed and preprocessed for clustering. This included paired-end merging, removal of primers, and trimming of all amplicons to the same length. Trimming the sequence data to the same 217 base pair region made sure that the results are more comparable between the different primers. Additionally, the reads were quality filtered using a maximum expected error score of 0.5, as implemented in USEARCH. Reads longer or shorter than the expected length were also discarded. For clustering, all preprocessed reads were pooled into one single file, which was then dereplicated, and all reads with an abundance of two or less were discarded. Clustering was then applied with a 97% similarity threshold using the cluster_otus command in USEARCH. The obtained OTUs were then further quality filtered. Sequences from each sample, including singletons, were then mapped against those OTUs.
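As a small aside, the expected-error quality filter mentioned above can be sketched in a few lines of Python. This is only an illustration and not USEARCH itself: it uses the standard relationship between a Phred quality score Q and the error probability, p = 10^(-Q/10), and sums these probabilities over the read. The function names are hypothetical, and applying the length check to reads already trimmed to 217 base pairs is an assumption made here for simplicity.

```python
def expected_errors(phred_scores):
    """Sum of per-base error probabilities, p = 10 ** (-Q / 10)."""
    return sum(10 ** (-q / 10) for q in phred_scores)


def passes_filter(phred_scores, max_ee=0.5, expected_length=217):
    """Keep a read only if it has the expected length and its
    expected number of errors does not exceed max_ee.

    max_ee=0.5 and expected_length=217 are the values named in the talk;
    combining both checks in one function is an assumption of this sketch.
    """
    if len(phred_scores) != expected_length:
        return False
    return expected_errors(phred_scores) <= max_ee


# Hypothetical reads: uniform quality 40 versus uniform quality 20.
high_quality_read = [40] * 217
low_quality_read = [20] * 217

print(passes_filter(high_quality_read))
print(passes_filter(low_quality_read))
```

A read of 217 bases at quality 40 has about 0.02 expected errors and is kept, while the same read at quality 20 has about 2.2 expected errors and is discarded.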
Only OTUs that had at least 0.003% abundance in at least one sample were retained, to reduce the number of OTUs generated by PCR and sequencing errors and chimeras. All reads were then remapped against this filtered OTU list. Taxonomy was next assigned using the BOLD and NCBI reference databases, and the OTU table, which is available as supporting information, was used to compare the performance of DNA metabarcoding with morphology-based identifications. It should be pointed out that for statistical analysis we only retained OTUs that had above 0.003% abundance in both replicates. If the abundance was below that, we set those values to zero. For individual OTUs where this was the case, we highlighted those cells in the table in orange.
Throughout this study, all four tested primer sets performed very similarly. As can be seen in this bar plot, most OTUs are shared and detected with all primer sets. The BF2-BR2 primer set detected the most OTUs, but all other primer sets show quite similar values. Also, in a principal component analysis, where we separated the samples by the three stream types, the primers always cluster together with the same stream type, so they perform very similarly.
In this figure we can see the number of taxa detected with morphology or with the different primer combinations for all 18 sample sites. For the DNA-based identifications, we don't count the number of OTUs, but the number of taxa identified with those sequences using the reference databases. As the specimens in the reference databases are identified with morphology, we talk about morphotaxa here. The samples identified based on morphology are indicated with black dots. DNA-based identifications have other symbols. We can see that across all samples and with all primer combinations, the DNA-based methods always identified more taxa than were identified with morphology. This can also be seen in the box plots on the right.
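The replicate-based abundance filter described earlier (keep an OTU in a sample only if it reaches 0.003% in both replicates, otherwise set it to zero) can be illustrated with a short Python sketch. The function names and the toy read counts are made up for illustration, and treating the threshold as inclusive (`>=`) is an assumption; the actual analysis may differ in such details.

```python
MIN_REL_ABUNDANCE = 0.003 / 100  # 0.003 % expressed as a fraction


def relative_abundance(counts):
    """Convert raw read counts per OTU into fractions of the sample total."""
    total = sum(counts.values())
    return {otu: (n / total if total else 0.0) for otu, n in counts.items()}


def filter_replicates(rep_a, rep_b, threshold=MIN_REL_ABUNDANCE):
    """Return {otu: (rel_a, rel_b)} for one sample.

    An OTU keeps its relative abundances only if it reaches the threshold
    in BOTH replicates; otherwise both values are set to zero.
    """
    rel_a = relative_abundance(rep_a)
    rel_b = relative_abundance(rep_b)
    filtered = {}
    for otu in sorted(set(rep_a) | set(rep_b)):
        a = rel_a.get(otu, 0.0)
        b = rel_b.get(otu, 0.0)
        if a >= threshold and b >= threshold:
            filtered[otu] = (a, b)
        else:
            filtered[otu] = (0.0, 0.0)
    return filtered


# Hypothetical read counts for two replicates of one sample.
replicate_a = {"OTU_1": 98_990, "OTU_2": 1_000, "OTU_3": 10}
replicate_b = {"OTU_1": 97_999, "OTU_2": 2_000, "OTU_3": 1}

for otu, values in filter_replicates(replicate_a, replicate_b).items():
    print(otu, values)
```

In this toy example OTU_3 passes the threshold in the first replicate but not in the second, so it is set to zero in both, which corresponds to the cells highlighted in orange in the OTU table.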
With DNA-based methods, we identified roughly 60% more taxa per sample than with morphology-based methods. While this sounds very promising, DNA metabarcoding is not perfect. The next plot shows the number of morphologically identified taxa that were also detected with DNA metabarcoding. You can see that DNA metabarcoding failed to detect around 30% of the taxa that were identified with morphology. While this could be explained by morphological misidentification, it could also be that those taxa are not represented in reference databases or are not amplified well because of primer bias. Nevertheless, DNA metabarcoding identified most of the taxa, and identified most of them to species level, which is often difficult to do based on morphology, as many taxa are in larval stages and very difficult to identify morphologically.
We used the metabarcoding data as well as the morphology-based taxa lists to calculate the assessment indices used in the Finnish stream monitoring program. Here on the y-axis, the DNA metabarcoding-based indices are plotted for the primer combination BF2 and BR2, and on the x-axis, the morphology-based indices are plotted. We can see some variation in the indices, but overall the assessment results of the DNA-based and the morphology-based methods are very similar. This means that the DNA metabarcoding method would be suited for ecosystem assessment of freshwater macrozoobenthos and potentially integrates very well with current assessment approaches. Thus, DNA metabarcoding should be further tested on complete macrozoobenthos kick samples from routine monitoring programs to further validate this method.
In a previous publication we have shown, using mock samples each containing 52 different freshwater specimens, that metabarcoding is severely affected by primer bias. This means that not all taxa are amplified equally well and some might even remain undetected.
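To make the comparison of the two taxa lists a bit more concrete (which taxa both methods find, and which remain undetected by one of them), here is a minimal Python sketch. The function name and the example taxa lists are invented for illustration and are not data from the study. Matching taxa by exact name is also a simplification, since in practice identifications can sit at different taxonomic levels.

```python
def detection_summary(morph_taxa, dna_taxa):
    """Compare a morphology-based taxa list with a metabarcoding taxa list."""
    morph = set(morph_taxa)
    dna = set(dna_taxa)
    shared = morph & dna
    return {
        "shared": sorted(shared),
        "morphology_only": sorted(morph - dna),
        "dna_only": sorted(dna - morph),
        # Fraction of morphologically identified taxa also found with DNA.
        "share_of_morphotaxa_detected": len(shared) / len(morph) if morph else 0.0,
    }


# Hypothetical taxa lists for a single sample.
morphology = ["Baetis rhodani", "Asellus aquaticus", "Chironomidae"]
metabarcoding = [
    "Baetis rhodani",
    "Asellus aquaticus",
    "Leuctra nigra",
    "Simulium ornatum",
]

summary = detection_summary(morphology, metabarcoding)
for key, value in summary.items():
    print(key, value)
```

Taxa that appear under "morphology_only" are the ones that would need a closer look: they might be misidentified, missing from the reference database, or lost through primer bias.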
Primer bias makes it extremely difficult to get reliable estimates of biomass or abundance. However, most current monitoring protocols actually use taxa abundance to calculate their assessment metrics. Thus we investigated how well the number of morphologically identified specimens correlates with the number of reads we get for each taxon with DNA metabarcoding. If there is a significant linear relationship between the number of specimens identified with morphology and the number of sequences obtained for each taxon with metabarcoding, a regression line is shown for the respective primer pair. The BF2-BR2 primer set led to the most significant correlations, but we have to keep in mind that the scatter in the data is quite large, so the adjusted R-squared values are not very high. Thus, these estimates have to be taken with quite some caution. Nevertheless, with highly optimized degenerate primer pairs, a rough estimation of biomass might eventually be possible.
So in conclusion we can say that DNA metabarcoding works quite well. With the right set of primers we can detect most of the taxa present in the ecosystem, and many of them to species level. However, there are also some shortcomings. While DNA metabarcoding compares very well to morphology-based assessments in this study, we have to admit that DNA metabarcoding is of course not perfect, that some more improvements should be made, and that more things have to be investigated. So in the following I'm going to discuss some open challenges that metabarcoding is currently facing.
Among the factors currently limiting the potential of DNA metabarcoding, we can distinguish between problems related to morphology and reference databases, and problems with the laboratory protocols for metabarcoding, which also includes bioinformatics. Let's first look at the morphology side of things. When collecting samples, small taxa and well-hidden taxa can often be overlooked.
One additional problem is the misidentification of taxa. Larvae are very difficult to identify, and this can be an issue if misidentified specimens are introduced into reference databases, where they can lead to false positive or false negative detections. Additionally, taxa can often be missing from reference databases, which means that a sequence we find in the dataset, an OTU, cannot be identified down to, for example, species level, because the species-level barcode is not deposited in the database. Filling the databases can be quite challenging because there is quite a loss of taxonomic expertise: experts who can identify imagines based on morphological characteristics are declining in number, because many of them are retiring and there are not a lot of new taxonomists around.
A potential solution for those issues is of course to invest more time in database curation and really update records if new taxonomic information becomes available or if someone realizes there is a misidentified species. But for that we of course need more funding, first of all for taxonomy work but also for barcoding work, to keep those reference databases alive. And we molecular biologists really have to work together with classical taxonomists to confirm potential cryptic species and resolve the conflicting taxonomies we sometimes have in the reference databases.
On the metabarcoding side of things, we first of all have the issue that small and large taxa are mixed in a sample. If you extract DNA from a bulk sample, small and rare taxa that are not very rich in biomass might get lost in the metabarcoding dataset because it is not sequenced deeply enough. We also have the big issue of primer bias, which not only can cause some taxa to fail to amplify and thus remain undetected, but also makes estimation of biomass quite difficult, because not all taxa are amplified equally well with universal primer sets.
Additionally, samples can be affected by PCR inhibition. If you imagine you want to use DNA metabarcoding for routine monitoring, it of course should work on all streams and at all sites. Sometimes there are quite a few inhibitors affecting those samples, and then DNA metabarcoding might not work that well. This is especially the case if we extract DNA from complete kick samples without first sorting the organisms from all the leaves, debris and gravel. But ideally, of course, we are able to deal with PCR inhibition and can extract complete samples, because then we would get rid of the issue of overlooking specimens when we have to sort the samples. Finally, because it is quite a new technique, there are currently many DNA metabarcoding protocols around, and we really have to properly validate those and explore potential biases to find the best solution for monitoring macroinvertebrates for stream quality assessment.
Potential solutions to these problems can be, for example, size sorting with a sieve to reduce the influence of large specimens in the bulk sample, and also the use of ecosystem-specific primers, which reduce primer bias quite a lot. We really need more protocol testing with mock communities, but also with real, complete kick samples from monitoring. And there has to be more standardization and cross-validation of new methods, and of existing methods, to really set a quality standard for laboratories that want to commercialize DNA metabarcoding for stream quality assessment.
All right. This concludes our manuscript on macroinvertebrate monitoring using DNA metabarcoding, and I hope you were convinced by us a little bit that DNA metabarcoding really is useful for routine monitoring of macroinvertebrates and for assessing stream health. I hope you take a look at the manuscript, because there all the results and methods are provided in much more detail.
And if you have any questions, feel free to ask me or my co-authors about this manuscript or anything metabarcoding-related. So I hope you enjoyed this video, and thanks a lot for watching. Bye bye!