Overview of CARD and AMR

Okay, good afternoon everyone. I'm Brian, and I'm going to be speaking today about the Comprehensive Antibiotic Resistance Database and the associated ontology and software that we use and have developed.

For those in the room who might be unaware, antimicrobial resistance is an ongoing global health crisis, and in order to combat this crisis we need global collaborations. Especially in the past decade, we've seen the emergence of a lot of multi- and totally drug-resistant pathogens, and this has prompted a response to analyze both the human and economic impact of resistance. There's actually a little bit of worse news, which is that it appears that COVID has set us back a bit on our progress because of poor antimicrobial stewardship, overloaded hospitals, all those usual things that happened during COVID. So that's all the bad news.

The bulk of this talk is going to be focused on this database, which we call CARD for short. It's primarily a knowledge base on the molecular and genetic basis for antimicrobial resistance. The database is integrated directly with our ontology, which is the Antibiotic Resistance Ontology, or ARO, which is in the OBO Foundry. It's curator reviewed, it undergoes periodic public release, and it has various quality checks. As of this month it has just under 7,000 ontology terms, and these are accompanied and annotated by over 5,000 reference sequences sourced from 3,000 publications.

Throughout this talk I'm going to talk about the different aspects of CARD: a little bit more about the Antibiotic Resistance Ontology; the AMR detection models that we actually use in software to detect these things; a little bit about our curation practices; our software, the Resistance Gene Identifier, which is the workhorse for the prediction; and this ongoing project, CARD
Resistomes & Variants, which I'll come back to in a bit.

So I thought I would just show a reductive version of a subset of the ontology. Basically this is what the ontology looks like for the metallo-beta-lactamases, which are a type of antibiotic resistance gene. This is a little bit reductive, but all I wanted to show was that the ARO is split up into seven branches, four of which we consider core; the three that are in dotted outlines are a little bit secondary. The four key branches are the AMR determinants (you can think of these as the genes themselves), the mechanisms (so the biochemical mechanisms of action), and of course the antibiotic molecules and their targets. This is all connected in the usual ontological way using semantic relationships.

The other thing that CARD does is it annotates the actual AMR determinants with additional information that we retrieve from publications and from sequencing databases like GenBank. So when a novel beta-lactamase is described (in this case I've just shown an NDM beta-lactamase, which is a type of metallo-beta-lactamase), we can go back to our reference databases, which are usually GenBank and PubMed, although they don't have to be, and we create this AMR detection model. This is a model that our software then uses down the line. It includes the DNA sequence and the protein sequence as well as other parameters. These are typically going to be resistance-associated mutations, since some genes require a particular mutation for the resistance mechanism to actually kick in. And then we also have this bit score cutoff. This is a little bit strange and unique to CARD, I think, but when our software attempts to detect resistance in a user-submitted isolate, the bit score cutoff essentially
tells CARD how strict it has to be about what it considers to be that gene. We don't actually need to have identical genes, like 100% identity to a canonical resistance gene; we can be a little bit off, and how far off we can be depends on the gene itself, so the cutoff is unique to that model.

For curation, our golden rule has always been that to be included in CARD as an AMR determinant, you have to appear in a peer-reviewed scientific publication, you have to have your sequence publicly available, and you have to have clear experimental evidence of an elevated minimum inhibitory concentration over your controls. That all needs to be provided in the publication.

Where do we get these curation topics? How do we find these novel resistance genes as they get published? Honestly, these days a lot of our curation prompts come from community feedback, so we rely a lot on our users to either email us or post on our GitHub to let us know where we've missed things and where things need to be corrected. I always tell people that while CARD is an expert in this ontological space with regard to antibiotic resistance, none of the people who work on CARD are really experts on a particular subset of resistance. So we rely on people who are experts in beta-lactamases or other particular AMR gene families to give us that feedback and let us know where we've potentially missed things.

Sometimes we also do targeted literature reviews. These might be ongoing projects; for example, if we're approached by somebody who wants to do a deep dive on a particular pathogen like Shigella or Enterococcus, we'll do a deep dive into the literature, review our database, and make sure that we're up to date before we proceed with that. Most of our new month-to-month curation is actually assisted by software that a master's student in our lab developed, this CARD*Shark
software, which is currently on version three. I would have loved to talk a little bit more about this, but it's probably more appropriate for the people upstairs. Essentially it's a text-mining machine learning algorithm that ranks literature for us and guides curators on where to look first. It reviews all the literature and says, this is most likely to be relevant to CARD. I just included the citation because it actually appeared online just this week, so if anybody is interested I encourage you to go check that out.

So this is the software that we actually use, the Resistance Gene Identifier, or RGI, which is the brainchild of this man, our lead developer, Amos Raphenya. I mentioned earlier those detection models. RGI is essentially a way for users to submit their own isolates, and RGI will tell them what resistance genes are present in their isolates and then annotate that using the information in the ontology and in the database. That's available online, but it's also available through a command-line distribution.

I'm just going to walk through an example that we use pretty often: this plasmid that was sequenced in this paper from Laura Villa et al. in 2012. It's a little bit old now, but it has multiple resistance genes present in a single plasmid, so it's useful for illustrative purposes. I've just gone ahead and grabbed the accession from NCBI and put it into our web interface. When you do that and run it through RGI, RGI will print a tabular list of the results indicating what resistance genes were found. I've highlighted this one at the bottom because you'll see that it indicates that it meets the Strict criterion. This goes back to our idea of a bit score cutoff. When we say Perfect, we're talking about something that is 100% identical to a canonical resistance gene, and when we say Strict, it's something that's not 100% identical but is above
that bit score cutoff that we've manually curated, so it's basically something that's worth taking a look at in more detail.

It also produces visuals, and these are annotated essentially by the Antibiotic Resistance Ontology. Oh, I have a laser pointer. So this is a gene-by-gene view, where you can see the Perfect hits and the Strict hits. But with the ontology and our annotations on the ontology, you can also break this down by AMR gene family instead, which looks pretty similar, except you can see that this Strict hit belongs to multiple gene families, because in this case it actually confers resistance to multiple drug classes. And then you can also view that by drug class, and now you see that multiple hits can actually be binned together. So OXA-1 and the NDM-1 beta-lactamase are both types of carbapenemases, so they get binned together in the drug class view, and it breaks it down for each drug class that's associated with this sample.

The next thing that we've started trying to do, and this was basically when I came along, goodness, seven years ago now, so this is something that I started doing, was the idea of using RGI to scale up our analysis: instead of predicting resistance, predicting resistomes. That means looking at a pathogen as a whole and using multiple isolates to imagine a species-wide resistome, though it is essentially generated in silico. When this started, we started with a list of, I think it was 18 pathogens, and now in our most recent release we're at, I think it's 377, I want to say, so it's grown quite a bit. For each of those pathogens we go to RefSeq and the assembly databases and download basically all the publicly available isolates, and we analyze them using this Resistance Gene Identifier software. We split them up
by assembly type, whether they're a plasmid or a chromosome, and we've also incorporated genomic islands over time from IslandViewer. All of these get put in our pipeline, and of course we have a couple of data integrity checks to make sure that we're getting the species we actually want and the complete assemblies that we're expecting. All of these get fed into the Resistance Gene Identifier, which generates a prediction for every single isolate that it's examining. So we end up with a list of putative resistance genes associated with an isolate, which we can then analyze in whatever way we want to get a feel for what genes show up, what genes are mobile, what genes show up across multiple pathogens of interest or concern, and what particular types of resistance are consistently associated within a family.

This all gets fed into our database schema, so it's attached to the CARD database and it's again associated directly with the ontology, so it can be annotated with the ontology. We call that CARD Resistomes & Variants, essentially because it documents everything that we can find using our software, and this is also all available online or for download. To date, as I said, we've done 377 pathogens, which is an analysis of about 200,000 assemblies, giving us a list of over 300,000 putative resistance-associated alleles. Again, this is a mix of Perfect and Strict hits, so it's not to say that everything is specifically implicated in resistance; it's an in silico prediction.

If you go to the website and look up a pathogen like Klebsiella, or an accession, or even a gene, you would end up with a list. This is just a snapshot of essentially one isolate that we've analyzed from NCBI: where it comes from, that it was a plasmid, and these are the associated Perfect hits for resistance, these are the
associated Strict hits for resistance according to CARD's sequences, and this would be all the drug classes that are associated. Again, some of these are Strict, and you'd have to actually do a phenotypic test to verify any of these results, but you can see that the list is quite long, and that's just a single isolate.

With my last minute I want to touch on some ongoing challenges. We've talked a lot recently about how to make RGI smarter about incoming data. We really want to annotate our database with a lot more descriptive terms so that RGI is more knowledgeable about what types of data are being entered. We know, for example, that RGI sometimes predicts things that aren't realistic because it is naive about the type of pathogen that's being submitted, so we're trying to bolster that. We've also talked about how that could eventually lead to machine learning methods, because not all resistance is genotypic: something like a biofilm or innate membrane permeability influences resistance and isn't predictable strictly from a genotype.

That comes back to this idea of CARD Pathogens. We get asked all the time, how do I look up everything related to a particular pathogen in CARD? It's not trivial, because CARD is gene oriented. So we've talked a lot about how to switch that and be pathogen oriented, and basically combine what's in canonical CARD with the information in CARD Resistomes.

The last thing I want to mention is this nomenclature project that we've just started. This is actually a first for us: we're going to start driving some consensus on these aminoglycoside-modifying enzymes, which I won't get into but which have a history of poor nomenclature. I wanted to mention this specifically because this is our main collaborator, Emily Bordeleau, who is down here in the audience today. If anybody has any insight into this, I
definitely encourage you to seek one of us out, because this is a project that we're just starting and we would love to chat more about it. With that, I'd like to give my acknowledgments and thank everybody in my lab and all our funding agencies and support. So thank you. [Applause]

Okay, so, time for questions.

I'm here. This is a great talk. I have a question: do you have a tool that users can use? Let's say there's an outbreak, they've sequenced the genome, and the assembly is there. Can they use a tool at your resource to see if, or what types of, antibiotic resistance mutations are in that particular genome, or are the tools all internal at this point?

No, they're not all internal. You can use RGI for that. You could download it, and if you have your own isolates that have been taken from patients, you can absolutely do analysis on that. We even have a slightly modified version of RGI that we call RGI bwt, which is a little bit more optimized for metagenomic samples, because we find that when we use native RGI without bwt it has a bit of a hard time with metagenomic samples. So we combine it essentially with the data in CARD Resistomes to get more sequences and get stronger associations. But yeah, RGI is available for anyone for any academic or research-based use, and if you're interested I definitely encourage you to check it out.

Thank you for the nice talk, and I really like the work. Unfortunately, some parts of it are really nicely licensed with CC0, but other parts make it very difficult for people to reuse. Could you consider licensing more of your work under a very permissive license?

Yeah, so it's definitely something we do get feedback about, and maybe I'm not the most equipped person to talk about it, because of course I'm not the owner of CARD. But basically the ARO is of course publicly available as an
ontology that's in the OBO Foundry. RGI is available for any academic or government use, to the best of my knowledge. But you're correct that when it comes to working with industry partners who want to use RGI, we do have some license guidelines there. I know that that's a little bit controversial in this community, but essentially we use any license fees to put back into the lab and hire more staff, and that's how we've kept it going. And especially as CARD grows, we need more expertise, because when CARD started it was essentially started by a group of undergrads and grad students, and so over time it's really needed a bit more of a professional touch, I suppose. So, yeah.

Okay, any more questions? No? Thanks, everyone. Thank you very much, Brian. Thank you.
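
The Perfect and Strict criteria described in the talk, together with the per-model bit score cutoff, can be sketched roughly as below. This is only an illustration of the idea as the speaker explains it, not RGI's actual code: the `DetectionModel` class, the `classify_hit` function, the example cutoff of 500, and the choice to treat a score exactly equal to the cutoff as passing are all assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DetectionModel:
    """Hypothetical stand-in for a curated AMR detection model."""
    name: str
    bitscore_cutoff: float  # manually curated; unique to each model


def classify_hit(model: DetectionModel,
                 percent_identity: float,
                 bitscore: float) -> Optional[str]:
    """Label one alignment hit against one detection model.

    Perfect: 100% identical to the canonical reference sequence.
    Strict:  not identical, but the bit score clears the model's cutoff
             (treating "equal to the cutoff" as passing is an assumption).
    None:    the hit does not meet either criterion.
    """
    if percent_identity == 100.0:
        return "Perfect"
    if bitscore >= model.bitscore_cutoff:
        return "Strict"
    return None


# Illustrative values only; the cutoff of 500 is made up for this sketch.
ndm = DetectionModel(name="NDM-1", bitscore_cutoff=500.0)
print(classify_hit(ndm, 100.0, 550.0))  # identical to the reference
print(classify_hit(ndm, 98.5, 520.0))   # divergent but above the cutoff
print(classify_hit(ndm, 60.0, 120.0))   # below the cutoff
```

Because the cutoff lives on the model rather than in the classifier, each gene can tolerate a different amount of divergence, which is the point made in the talk about the cutoff being unique to each model.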
