Exploring Multiomics in Healthcare

Welcome back, everyone, to Revolutionizing Healthcare. After our last session, which was our first Specialty Spotlight session, today we will have a different type of session. This year we're running two types of sessions: Specialty Spotlight sessions, where we go deep into one specialty (last time, for example, cardiology), and Core Concept sessions, and today is the first ever Core Concept session. These sessions are exciting because we'll cover ideas and methods from machine learning and medicine that are fundamental and transcend specific specialties. Today we'll start off with multiomics. Multiomics is a really interesting concept because the idea is to use high-throughput techniques to provide a very comprehensive understanding of the human body, and this vast, multidimensional data landscape is, of course, ripe for machine learning as well. We're really excited to have a great panel here to discuss these developments with us. Hosting these sessions are again Professor Mihaela van der Schaar, who is the John Humphrey Plummer Professor of Machine Learning, AI and Medicine at the University of Cambridge, and, as you'll hear in just a second, Tim Olink, who's a medical student at KU Leuven in Belgium.

Hello everyone. You just heard Tim Schubert, a medical student at Heidelberg University and also a co-host of this session. To start off, let's introduce our panelists. Professor Andres Floto holds the position of Professor of Respiratory Biology at the University of Cambridge and serves as the research director at the Cambridge Centre for Lung Infection at Royal Papworth Hospital. His work, supported by the Wellcome Trust, for which he is a Senior Investigator, dives deep into the complexities of immune responses to bacterial infections. Next, I have the honor of introducing Professor Kun-Hsing Yu. Professor Yu serves as an assistant professor at Harvard Medical School's Department of Biomedical Informatics. He has a rich background that includes a PhD in biomedical informatics with a PhD minor in computer science from Stanford University, as well as an MD from National Taiwan University. Professor Yu is at the cutting edge of integrating medical sciences and technology; thank you for joining us today. Moving forward, I'm thrilled to introduce Professor Julio Saez-Rodriguez. He's a visionary in the realm of medical bioinformatics and data analysis. He holds a key position at the Heidelberg University Faculty of Medicine and directs the Institute for Computational Biomedicine, and his leadership extends to his role as a group leader at the EMBL-Heidelberg University Molecular Medicine Partnership Unit. Next up, we also have some in-house expertise: Dr Fergus Imrie, who is a postdoc at UCLA working in the van der Schaar lab.

So how will the session work? Let's take a moment to outline our session to ensure that we all make the most of our time together. Our agenda is structured into five distinct sections. We are just going through the introduction, and following that we'll delve into the intriguing world of what multiomics actually is. This segment will be presented by Professor Floto, who is so kind as to get us started in a few minutes. Next, we'll move on to the core of our session with the panelist presentations: our experts will share their insights and groundbreaking work, shedding light on new methodologies and cutting-edge technologies. After that, we'll have a dedicated segment on the relevant work that has been done in the van der Schaar lab, showcasing some of its cutting-edge research. Lastly, we'll open the floor to you for a Q&A session. This is your opportunity to engage directly with our panelists, so please feel free to submit your questions in the Q&A chat throughout the session and upvote those you find particularly compelling. Finally, I would also like to bring to everyone's attention that, in order to ensure a balanced and good dialogue, we've implemented a timer so that everybody gets time to share their views. You
will see it popping up on the screen, and that will be your cue to finish your segment so we can move on to the next.

Okay, thanks, Tim. Before we really get into it, we also wanted to make these sessions a bit more interactive, so here is a poll for all of you: we wanted to know where you have encountered multiomics before, or whether you have encountered multiomics at all. Not surprisingly, I think most of you have encountered multiomics in research settings, about 75%, and some of you have never encountered it before, which is perfect, because Professor Floto will now give us a brief introduction to multiomics, especially for those of you who are not as familiar with the concept.

Well, welcome, everyone, and thank you to both Tims, to Mihaela, Fergus and everyone involved in these terrific sessions. I've been asked to give you a brief introduction to multiomics for those of you who are coming from a non-biology background. First of all, just to refresh in everyone's mind the concept of information flow in biology, originally articulated by Francis Crick in the 1950s: effectively, information flows from DNA, which is transcribed into RNA, which is then translated into proteins, and the proteins interact in order to deliver outcomes. This fundamental process in biology is one of the starting points, if you like, for why we need multiple ways of looking at omics information. Let me give you one example, which is the impact of DNA mutations. You might imagine that they all have effects, but the reality is that the vast majority have no effect at all. You can see in these cartoons the DNA, which is transcribed into the messenger RNA here, and these are the proteins. As I say, the majority of mutations have no effect at all. Some of them, however, affect protein quantity, that's to say they reduce the amount of RNA and therefore the amount of protein, while others affect protein quality, or the function of the
proteins, and you can imagine how deconvoluting this becomes relatively straightforward once you also have transcriptomic information about the impact of the DNA mutation. Extending this further, you can see that there's a pathway of various steps along the road to a disease that you can recognize in a patient. There may be mutations in DNA; there may be alterations in the quality or quantity of RNA, and in the quality and quantity of protein; in how proteins interact together in networks to give a cellular phenotype; and in how those cells interact within an organ to cause organ dysfunction, which, together with other factors, can lead to disease presentation in patients. It's worth reflecting that, in addition to the genetic mutations that can drive these processes, there are obviously environmental stimuli that can influence any one of these steps as well. It's really as a response to this series of processes leading to disease that the multiomics approach has become very popular: it allows you to interrogate each of these aspects of the disease process and, by integrating them, deliver insights into causal factors or mechanisms of disease. For example, from a DNA perspective, you obviously have genomics, that's to say sequencing the whole genome, in many cases at population level. There is epigenomics, which allows you to identify DNA modifications or modifications to the histones that bind the DNA, regulating its accessibility for transcription. Then there are a number of methods used to analyze RNA, both in bulk and at single-cell level, and nowadays, increasingly popular, spatial methods, where you can look at transcriptional profiles in tissues. Then there are proteomic methods, either bulk or in some cases at single-cell level, where you use mass spectrometry to characterize the quantity and
modifications of proteins within cells or tissues. Then there are a number of other omics modalities that look at cellular or organ outputs, for example metabolomics, which measures small molecules, lipidomics, and cytometry, which can now be done at scale to look at cell differentiation and the numbers of relevant cell types. The challenge, I guess, is how these various modalities can best be integrated, sometimes with missing data or missing data types, in order to provide insights into organ-level or organism-level biology. Equally, there's the challenge of how you integrate functional assessments (I'm just showing an ECG here, but it could be anything), electronic health records, imaging and sensor information to provide useful, actionable intelligence about disease processes. So there are a number of potential tasks you can apply these multiomics approaches to, and principally they would include the following. First, causal discovery: is this DNA mutation relevant, and how does it cause disease? Second, mechanisms of disease: can we build an explainable, understandable model of how disease is caused? Third, biomarker discovery, or using the data in a biology-agnostic way to provide prognostic or predictive information. Fourth, synthetic biology: if this is a set of transcriptional signatures associated with a particular cell fate, and I want a different cell fate, what would I need to change in those transcriptional profiles to deliver that type of cell? That's a really exciting, up-and-coming area. And finally, the idea that you could use these approaches to provide information around drug discovery and drug targets. For example, if you understand how certain diseases cause certain cellular responses, and you also, orthogonally, know how certain drugs affect cellular responses, and
then you know how certain cellular responses relate to outcomes, you could in principle design a new drug, or repurpose an existing one, to produce the cellular responses you want and deliver the therapeutic benefit you desire. I hope that's been helpful as a grounding for the discussions over the next hour or so, and I'll hand back to Tim.

Thank you so much for sharing that; I think it has been a great introduction to the field. I'd now like to hand over to Professor Saez-Rodriguez. Would you be able to summarize the work you do in the multiomics sphere, and maybe also point us to the most promising recent advancements you see in the field?

Yeah, thanks, Tim, I'm happy to do this. Actually, Andres's introduction was fantastic, and I will basically build on it. As he said, there are all these technologies that allow us to measure different molecular layers across space and samples, in particular with single-cell and spatial resolution. We recently reviewed a body of work where we try to extract the mechanisms that Andres was alluding to by using prior knowledge. There are different ways we can take omics data and extract key mechanisms that serve as hypotheses for validation and provide us with insights, and, as Andres said, also use them for biomarkers and therapies. The basic idea behind these approaches is that there is a lot of knowledge out there about pathways, the targets of transcription factors, ligand-receptor complexes and many other molecular processes. There is also a lot of public data we can leverage across repositories like the Human Cell Atlas or ArrayExpress at the EBI; there is a lot of single-cell and spatial data. This gives us the opportunity to combine the bespoke data we may generate for a particular disease, say of the heart, with existing public data, maybe reference data sets, as well as not
only human but also mouse data, and to integrate them with computational methods to generate hypotheses for validation and close this cycle. From the point of view of the methods, and thinking of AI, until now most of the methodologies used have been computationally relatively simple. The computational methods we have used so far are not necessarily very advanced, in part because the data sets were limited in scope: they were standard enrichment analyses or linear models. But now we can bring more advanced methodologies into these computational methods, ranging from variational autoencoders to, of course, Transformers and so on. The focus of our group in this realm is to bring these different resources and databases to computational methods. We build meta-databases that bring together existing knowledge databases, more recently also using knowledge graphs as a more modern way to bring knowledge together. We focus on particular molecular processes such as pathways, as I said before, transcription factors, and ligand-receptor interactions, and we provide all these tools as free, open-source software that can then be used with different computational methods. Just to pick one example where we applied this recently: we had a study where we combined single-nucleus RNA-seq; single-nucleus ATAC-seq, which measures chromatin accessibility, telling you whether the DNA is open or not and therefore whether it can be transcribed; and one of these spatial technologies. Without getting into the details of what we learned in terms of the biology, we were able to characterize a cohort of patients who had suffered a myocardial infarction, either acutely, right after the attack happened, or later on. By having these three different molecular layers, we were able to identify a lot of the mechanisms, and just to give you a feeling of
what type of insights you get, this is what you see when you look at a sample from a patient with chronic heart failure, so not the acute but the longer-term effect. Using different computational methods, we were able to identify different cell types; in this case we're very interested in different types of fibroblasts, types one and two. Then we connect them to specific disease mechanisms, such as the activation of TGF-beta, which is a major pathway involved in fibrosis, so something you would expect, and to specific transcription factors, as you can see here. Again, this is published work, so you can look it up, but it's an example of how, by combining omics with computational methods, you can better understand diseases and ultimately use this as a way to identify new targets and potential biomarkers. What I think is really very exciting, and again this follows what Andres also mentioned, is the opportunity, and the challenges, that come from spatial multiomics data. I mentioned one example in the previous slide with spatial transcriptomics, but all of the omics can be measured at the spatial level, and this raises all sorts of questions, from how you develop bespoke analytical methods to leverage spatial data, to interpretation, to more upstream questions such as image analysis. Ultimately, if we are able to develop powerful methods, and of course many people in our community are working on this, we can better understand tissue architecture in health but also in disease contexts, understand the regulation and dysregulation of cellular programs, and understand the disease, which will help us improve disease prediction and treatment. I'd like to finish here, and of course I'm happy to take any questions.

Wonderful, thank you so much for that. I think we will definitely have many, many questions, especially during the Q&A. For now, I would like to hand over to Professor Yu, and
I think this is perfect because, as far as I know, you've worked on the first fully automated AI to extract features from whole-slide histopathology, so I guess you're the perfect person to continue after we've talked about spatially resolved omics.

Wonderful, thank you so much for the invitation, Mihaela and team, and thank you so much for setting the stage, Dr Floto and Dr Saez-Rodriguez. As Tim briefly introduced, my work is largely about integrating pathomics, which is a different form of omics related to tissue architecture and tissue imaging, with multiomics using artificial intelligence methods. I would like to introduce a clinical example that is perhaps familiar to some of you. As many of you know, real-time pathology diagnosis, especially for brain cancer, is indispensable for identifying the optimal treatments for patients. Brain cancer accounts for more than 200,000 deaths per year, and surgical excision remains the backbone of treating this deadly disease. During surgery, surgeons often send pieces of the tumor to pathologists for real-time evaluation. In this evaluation, pathologists don't have time to go through the standard formalin-fixed procedure; instead, they use liquid nitrogen to quickly freeze the sample, provide a frozen-section diagnosis under the microscope, and report their diagnostic results back to the operating team in real time, within 10 to 15 minutes. This process is quite costly, because the surgical team has to wait for the results, and during that 10-to-15-minute period they essentially cannot proceed with important clinical decisions. The time pressure imposed on the pathology team also accounts for substantial human error compared with standard histopathology diagnosis using formalin-fixed slides, and the pressure on both ends contributes to physician burnout. Our solution is to develop an AI-based technique called the Cryosection Histopathology Assessment and Review Machine, or CHARM for short, and we can perhaps dig
into some of the technical details during the Q&A. In short, this is an AI-based approach that connects histopathology images with multiomics data, including IDH mutation status from sequencing panels as well as RNA expression and protein expression levels. We built several independent tasks using multitask machine learning, trained and validated Transformer-based models, and aggregated our tile-based results into a patient-level prediction. In short, our approach correctly identified malignant cells and was able to report back to the surgical team in near real time on whether they had been operating on the right region. We were also able to predict the multiomics status of the samples with pretty good accuracy, with an area under the curve of more than 0.9 for the prediction of IDH mutation, as well as for many other molecular subtypes that define patient prognosis. As we can see from the visualization here, samples with IDH-mutant status usually have highly [inaudible] regions with low cellularity, whereas IDH-wildtype cancers have greater cellularity and atypia. We further extended this publication to build a multiomics, multi-cohort assessment platform for molecular and prognostic prediction, and here's a brief overview of this generalized platform. In short, we start with histopathology images collected from different sources. Some use whole-slide imaging to scan the tissue at very high resolution, with billions of pixels per sample, while other study cohorts use a more conventional tissue microarray technique, which is lower cost and provides representative views of the samples. We use our previously devised weakly supervised machine learning methods to connect these image patterns with multiomics profiles, including microsatellite instability and the mutation status of BRAF, EGFR and other treatment-related genes. On another arm, we also
build survival prediction models to directly provide decision support regarding overall survival and disease-free survival under standard treatment. This platform has helped us identify many multiomics profiles, including microsatellite instability, and we can also predict survival outcomes. One interesting angle here is that, because we can now look into the tissue microenvironments related to these multiomics profiles, we're able to draw correlations between multiomics aberrations and their impact on tissue microarchitecture. For example, here we show that in the prediction of BRAF mutation, the AI-based model pays a substantial amount of attention to regions with infiltration of [inaudible] tissue, and this can further help us form biological insights and hypotheses across many different cancer types. In summary, we have presented a few AI-based approaches connecting multiomics with cancer diagnosis and pathomics, but multidisciplinary research is still required to address the implementation challenges and make sure we can bring these cutting-edge AI and multiomics approaches into real-world settings. Because there are so many ongoing challenges, with apologies to JFK, I think we should ask not what multiomics AI can do for us, but what we can do together to build a better future in medicine with multiomics AI. Thank you.

Thank you so much, Professor Yu. And finally, I would like to give the floor to our in-house expert, Dr Fergus Imrie.

Well, thank you so much for inviting me to be here and present some of the work we've done in the lab, in response to Professor Yu's challenge about what we can do for multiomics AI. Our other panelists have also highlighted some of the challenges, but I'd like to begin by briefly reiterating some of them, in particular from a computational standpoint. The first thing I want to highlight is the dimensionality of
omics data sets. Different omics layers, especially compared to many other data modalities more frequently seen both in medicine and in other AI applications, can range from tens of thousands of variables to hundreds of thousands and even millions. Another couple of challenges that are particularly common in multiomics data sets, and where computational methods can really struggle, are limited labeled data, where we're more likely to pick up on spurious relationships between our variables and our outcomes of interest, and inter-feature correlation, which can confound the learning process. To briefly explain what I mean by this, let's say we have a gene X1 that, when highly expressed, might cause some phenotype Y but also causes gene X2 to be highly expressed. The feature we want our predictive model to identify and use is gene X1, since it's causing our phenotype Y, but the presence of variable X2 makes this much more challenging. I'm going to begin by talking about a method that has been applied to polygenic risk scoring. This first approach, called VIME, was developed in the lab by former PhD student Jinsung Yoon, together with others in the lab and Professor van der Schaar. It introduces a novel self-supervised approach for tabular data such as omics, which allows us to learn effectively from unlabeled data and then perform supervised tasks downstream. To very briefly explain how VIME does this, consider the left-hand side of the figure on this slide. We first take an unlabeled sample and corrupt it by masking out some of the entries at random and replacing them with other values that each variable could take but that might not make sense in the context of the sample's other values. We then train an encoder network to be
able both to predict which of these variables were masked and what the true underlying feature values should be. This allows the encoder network to understand the raw data on a deeper level, and then, with whatever limited labeled data we do have, we can train a predictive model on top in either a supervised or a semi-supervised manner. As an illustrative application, VIME was applied to genome-wide polygenic risk scoring using data from the UK Biobank. The goal here was to predict six different blood cell traits from single nucleotide polymorphisms, and VIME outperformed all of the benchmark approaches tested across all six traits. A very different challenge, which Andres, Professor Floto in particular, identified, is biomarker discovery. In standard prediction problems, we often have a set of features or variables and want to train a predictor to maximize its predictive power. In contrast, in biomarker discovery we want to identify just a small subset of the variables or features that are most predictive. This becomes a joint optimization problem of selecting as few features as possible while still maximizing the predictive power of our model. There are several reasons we might want to do this. I think the most pertinent for multiomics data is to gain deeper insight into our data and understand exactly which biomarkers are driving the phenotypes we're interested in, but in other domains there are also issues around cost, as well as generalization: we want more robustness to nuisance features, to overcome the challenge of inter-feature correlation I mentioned earlier. To address this, we proposed a novel feature selection method called SEFS, which is joint work between myself, a former PhD
student from the lab, Changhee Lee, and Professor van der Schaar. To enable more accurate discovery of new biomarkers, our method uses a two-stage approach. In the first stage, we follow a procedure similar to the one I described for VIME and pre-train an encoder network in a self-supervised manner to learn representations that are favorable for feature selection. We then use this pre-trained encoder to select the features that are most predictive of our target outcome. The key technical innovation, which really helped us handle highly correlated features in particular, is how we select the features: we incorporate the correlation structure of the underlying data much more effectively than previous approaches, which chose to ignore it. I'm happy to take more questions about this in the Q&A later if it's of interest. We validated our approach on both transcriptomics data from peripheral blood mononuclear cells and proteomics data from the Cancer Cell Line Encyclopedia, and found that SEFS outperformed a large number of other feature selection approaches in both cases. Additionally, we were frequently able to identify and validate the discovered features, with many of the top-ranked features for the proteomics data set, shown on the right in blue, discovered only by our self-supervised approach. In the work I just described, we obtain a subset of selected features, but a question we can ask is whether there is further structure to discover. Often it's not just one mutation that drives a disease but, say, the joint occurrence of two mutations, and previous feature selection approaches won't be able to identify this; they'll just identify a set of relevant features. So a natural way of thinking about this additional structure is, instead of just one big list, discovering what we're
going to call composite features: groups of variables whose members have some form of joint importance, with each group distinct from the others. There are many scenarios where this plays out, both within multiomics data sets and beyond. To address this, we proposed a new machine learning model called CompFS, for composite feature selection, which is composed of an ensemble of feature selection models similar to the ones I've just described, together with an aggregate predictor, and this enables us to identify these different groups of predictive variables. Finally, I want to turn our attention to multiomics integration, which is one of the most significant challenges in this area, and one that Professor Floto identified in his presentation: how to integrate and learn from multiomics data, in particular when we have missing omics observations and various missingness patterns across samples. Conventional solutions often discard samples with missing omics, which loses samples; alternatively, they discard omics layers that have missing samples, which loses omics-specific information, or they are forced to impute missing omics observations, which may distort the data. To address this, former PhD student Changhee Lee, together with Professor van der Schaar, proposed a novel approach called DeepIMV, which is able to integrate data from multiple omics layers with arbitrary missingness patterns, both during training and at prediction time. It achieves this via a set of view-specific encoders and an information bottleneck approach. This approach was validated using data from The Cancer Genome Atlas, predicting one-year mortality from multiomics observations, and it outperformed all other approaches, including the popular MOFA approach, across different numbers of views,
and in particular when learning from incomplete data, which is more representative of real-world scenarios. As one final topic, Andres mentioned in his talk that one key development is using omics to guide treatment. To move in this much more causal direction, we need to understand both the conditional dependencies and the independencies between variables, including in the presence of treatment, but this becomes particularly challenging for high-dimensional data. For this, former student Alexis Bellot and Professor van der Schaar proposed a new approach based on generative adversarial networks for performing conditional independence tests, in particular on high-dimensional data. Their approach was validated on the Cancer Cell Line Encyclopedia to identify which genetic mutations influence drug responses. For many mutations, as shown in this table, the method, GCIT, agreed with a lot of the existing approaches and with the literature, but in several cases there was a disagreement between the newly proposed method and the existing computational approaches, and they found that GCIT's decisions about conditional independence agreed with the literature, which I think really demonstrates the promise of this approach. With that, I'd like to thank you for listening; I'm really excited to hear more from our panelists, and I'll pass back over to the two Tims.

Thank you, Fergus, for the excellent presentation, which gives us a lot to discuss. We had a great question from the chat while you were presenting. Mato Maggi, could you please unmute yourself and ask your question to Professor Yu?

So I was just wondering: when it comes to collecting data, especially with regard to pathology, images tend to be very diverse from one another, so they may exhibit some variability, and this is probably even more pronounced when it comes
to working with multiple centers. So I was wondering whether, both in data collection and in model development, there was some work needed to level out all these differences, or whether it just isn't a problem. And if it is a problem, to what extent?

Yes, that's a good question. To address the variability of images across different samples, across different slides, and across slides from different sites, our current solution is to implement data normalization and image augmentation methods within our preprocessing pipeline. This helps us identify any potential data distribution shifts across sites, and it also lets us develop a model that can recognize the different patterns in data collected from different sources. For example, the sectioning methods may be slightly different, with the section thickness varying across sites, or the chemicals used to stain these pathology samples may have different concentrations. I think Dr. Saez-Rodriguez may also have some thoughts on this, but in general our current solution relies largely on image normalization and image augmentation, which is a very important preprocessing step we have to take to ensure the generalizability of our models.

Thank you. Julio, do you have anything to add, since this was also addressed earlier?

Yes, indeed. I think Kun gave a great answer. We struggle with this as well at the level of omics data, and I guess there are two broad ways to think about it. It boils down a bit to the level of detail, or granularity, at which you look at the system. What we have found is that if you analyze or integrate data at a more downstream level, basically after you have processed the data and run your models, there is more
consistency, and it's more feasible to combine disparate studies. I'm thinking of work where we meta-analyzed different single-cell atlases: that worked much better than trying to bring the data together earlier on, if that makes sense. But it was a very specific type of analysis, looking at multicellular programs at the single-cell level, so I think there is a lot still to be done. Other people instead try to reprocess everything from scratch, using common infrastructure and common workflows. Indeed, this is one of the major challenges we have. It may not be the most exciting thing in terms of modeling, but it's really the foundation, because whatever you do later will be biased by these steps.

Thank you. And again, excellent question, thank you for that. We have another excellent question from the chat, from Sok Chin, if I pronounce it correctly. Would you be able to unmute yourself and address your question to Professor Floto?

Yes, and thank you very much. My apologies, I don't have a camera right now. My name is Sok Chin, from the Joslin Diabetes Center at Harvard Medical School, and my question is about how we can apply multiomics AI to decipher complex disease conditions. For example, we often see cardiovascular, kidney, and metabolic disease, which is highly heterogeneous and multifactorial in origin, and, very importantly, the timely initiation of a drug is critical. So how can we use multiomics AI to support decisions about which drug to give a patient with certain phenotypic features? That's my question, thank you.

That's a very good question. It's really hard, isn't it? I think there are a number of issues here: diseases are heterogeneous across patients and also over time, and disease trajectories are also influenced by the actions of physicians and
treatment. And then, as you quite rightly say, they are in many cases intrinsically multi-system disorders. So I guess there are issues around mapping specific omics data sets to temporal trajectories, and the second challenge is inferring causality. Both of those things are extremely difficult, and I think that without prior knowledge and experimental interventions, at least to provide a framework within which you can probe the data sets, it's very difficult to unravel this a priori. I'm going to hand over to everyone else, because that's a really hard question.

Thank you, Dr. Floto, I agree with your comments. Professor van der Schaar, you're quite familiar with complex problems; do you have a machine learning insight that could help us?

No, but I'm very excited about this question. I'm going to second Andres here: I think this question of causality is especially interesting. The machine learning community currently focuses a lot on causal discovery from static data, but I think that is a big mistake and it needs to change. We have actually written quite a few position papers to try to orient the machine learning community toward the type of problems our questioner is describing: we would like to understand how a particular type of genetic or omic information triggers what type of phenotypic expression, and when, and how different types of drugs lead to different types of outcomes, and when. There is quite an established literature on causality and causal effects over time, but there we are only estimating the effects of treatments or interventions over time. What we are not able to do as a community is to really discover true causal links. One thing is, say, to look at trajectories and at how different events may be triggering other
types of events; quite a lot of literature exists on building such trajectories over time. Quite a lot of literature also exists on establishing the effects of treatments. But true causality, proving that this is a true causal relationship, is yet another frontier, and I really hope the machine learning community will address it. For that, though, we need very high-quality data sets, as Andres mentioned.

Thank you so much for that. We'll have more questions in just a second. One question that we came up with before the session, and that we are really interested in, is maybe first for Julio: how can we take existing biological knowledge, for example about certain pathways or genetic programs, and integrate it with what we discover or learn from multiomics data?

Yes, sorry, I was trying to unmute. Indeed, we are big believers in the value of prior knowledge to help us, as Andres also mentioned before, to get a bit closer to causality. We know that prior knowledge is limited, incomplete, and biased toward cancer and the other things we have studied the most, but we still think it can be helpful. On one hand, it can reduce the dimensionality of multiomic data by extracting a smaller number of features that can then be input to AI algorithms; on the other, it can increase the interpretability of the data, because for many genes we may not know their individual function, but we can abstract them at the level of these pathways. There are even ways to do things in between, which I alluded to a bit in my presentation. As we get more data, from single-cell methods and so on, we can try to find new things and use prior knowledge as, if you like, a Bayesian prior: a way to guide us in what to look for, without limiting our findings to things we already know, because we know prior knowledge is incomplete. If you have enough data, you can build such models, and there are nice efforts from several groups in this direction.
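The first use of prior knowledge described here, compressing many genes into a smaller number of pathway-level features, can be sketched as follows. This is a minimal illustration assuming hypothetical gene sets and a simple mean-of-z-scores summary per pathway; it is not the method of any particular group, and dedicated pathway and footprint tools weight genes more carefully.

```python
from statistics import mean, pstdev
from typing import Dict, List


def zscore_genes(expr: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Standardize each gene across samples so that genes measured on different
    scales contribute comparably to a pathway score."""
    out = {}
    for gene, values in expr.items():
        mu, sd = mean(values), pstdev(values)
        out[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return out


def pathway_scores(
    expr: Dict[str, List[float]],
    gene_sets: Dict[str, List[str]],
    min_genes: int = 2,
) -> Dict[str, List[float]]:
    """Collapse gene-level data into one score per pathway per sample by
    averaging the z-scores of the pathway's measured member genes."""
    z = zscore_genes(expr)
    n_samples = len(next(iter(expr.values())))
    scores = {}
    for pathway, genes in gene_sets.items():
        present = [g for g in genes if g in z]
        if len(present) < min_genes:
            continue  # too few measured members to summarize the pathway
        scores[pathway] = [mean(z[g][i] for g in present) for i in range(n_samples)]
    return scores


# Hypothetical expression matrix (4 genes x 3 samples) and hypothetical gene sets.
expr = {
    "GENE_A": [1.0, 2.0, 3.0],
    "GENE_B": [2.0, 4.0, 6.0],
    "GENE_C": [3.0, 2.0, 1.0],
    "GENE_D": [5.0, 5.0, 5.0],
}
gene_sets = {
    "PATHWAY_UP": ["GENE_A", "GENE_B"],
    "PATHWAY_MIXED": ["GENE_A", "GENE_C"],
    "PATHWAY_SPARSE": ["GENE_D", "GENE_NOT_MEASURED"],
}
# PATHWAY_SPARSE is skipped: only one of its genes was measured.
print(pathway_scores(expr, gene_sets))
```

Four genes become two pathway features here. The sparse pathway is dropped, which echoes the caveat about incomplete prior knowledge: anything not covered by the gene sets is simply not represented.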
So, in summary, we think prior knowledge is helpful for getting closer to true causal mechanisms, but because it is limited we always have to keep that in mind and ideally not use it as a very strong prior, in a Bayesian sense.

Thank you. And maybe just a quick follow-up, not only for you but also for the other panelists: if we are to use multiomics outside of research, in clinical practice, interpretability will be an important topic, right? So the question is: what does interpretable really mean to you in the context of multiomic data, which is very complex and hard to understand? Maybe Professor Floto or Professor Yu, if you would like to go first.

So I think the first question is how applicable multiomics approaches are to clinical practice. That's a big question. Are we missing a step, that is to say, extracting from the omics data a smaller set of features that you can test in a more robust way? The question of interpretability is important more generally in ML, but I think that in the clinic, the confidence clinicians have in a model's predictions is directly related to how understandable its outputs are. So I'll leave it there and hand over to Dr. Yu.

Yes, I agree. These are all great questions, and also big questions that we as a field don't have a definite answer to. The first one is about applicability. There are still a few roadblocks, largely related to cost: implementing many of these multiomic techniques in the clinic may be cost-prohibitive, and some medical centers may not have the infrastructure, such as the equipment or the staff, to handle such profiling. Interpretability is another big question, because it means different things to different people: for different conditions, people have different levels of understanding of the AI
model, and with that, different levels of interpretability make different levels of sense to various kinds of clinicians and clinical staff. But I'm pretty optimistic about this. If we can provide biologically or clinically relevant information to clinicians, further validate that such signals are directly related to the patient outcomes of interest, and connect them to biological mechanisms, as in the previous question, for example by incorporating known pathways and the known functions of these genes, proteins, or other entities, we might be able to overcome this interpretability hurdle by providing some insight into our model without necessarily opening up the black box completely.

Yes, excellent, thank you. We have another question in the chat; thank you, everybody, for your excellent questions. I think it's from Aina, who asked a very good question. Could you maybe unmute yourself and address your question?

Can you hear me? Yes. So my question is whether you are also applying multiomic approaches to rare diseases, and, if you have publications or more information on these approaches, that would be really interesting to me. Thanks.

Professor Yu, I saw you nodding along.

Yes, this is a great point, because, as we know, machine learning approaches work better when we have a lot of data, and rare diseases pose a substantial challenge to many existing machine learning approaches. We have a paper currently under review on detecting out-of-distribution samples. Although we don't have sufficient information in the training set, because we may not have encountered this rare disease in our training data, by comparing data representations we would at least be able to flag this particular sample as something quite different from what we have seen in our training set, and we can refer it to pathologists or clinicians for further review.
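Professor Yu's strategy of comparing a new sample's representation with the training data, and referring unusual samples to a human, can be sketched with a k-nearest-neighbour distance. The paper under review is not described in detail, so this is a generic stand-in rather than that method: the embeddings, the value of k, and the 95th-percentile cut-off are assumptions, and a real system would use embeddings produced by a trained pathology model.

```python
import math
from typing import List, Sequence


def knn_distance(x: Sequence[float], reference: List[Sequence[float]], k: int) -> float:
    """Mean Euclidean distance from x to its k nearest neighbours in `reference`."""
    dists = sorted(math.dist(x, r) for r in reference)
    return sum(dists[:k]) / k


def fit_threshold(train: List[Sequence[float]], k: int = 2, quantile: float = 0.95) -> float:
    """Calibrate a cut-off from the training set itself: score each training
    embedding against the others (leave-one-out) and take a high quantile."""
    scores = sorted(
        knn_distance(t, train[:i] + train[i + 1:], k) for i, t in enumerate(train)
    )
    idx = min(len(scores) - 1, int(quantile * len(scores)))
    return scores[idx]


def flag_for_review(
    x: Sequence[float], train: List[Sequence[float]], threshold: float, k: int = 2
) -> bool:
    """True if x lies farther from the training data than the calibrated cut-off,
    i.e. it looks unlike anything seen in training and should go to a human."""
    return knn_distance(x, train, k) > threshold


# Hypothetical 2-D embeddings of training slides.
train = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5)]
threshold = fit_threshold(train)
print(flag_for_review((0.5, 0.4), train, threshold))  # False: close to the training data
print(flag_for_review((5.0, 5.0), train, threshold))  # True: far away, so refer for review
```

The model is not asked to name the rare disease; it only has to recognize that the sample falls outside what it has seen, which is a much easier task when training examples are scarce.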
That's our current strategy, and I'm sure other techniques in development could pinpoint the right diagnosis for patients with rare diseases even better.

Thanks. Could you perhaps share the publication in the chat? Thanks.

Sure. We have a recent conference presentation, and the paper is on its way.

Okay, thanks a lot.

Thank you so much. One more question for the panel. At the beginning, Professor Floto showed us how multiomics data are collected, which was very interesting, but I think many of you probably noticed that data are collected at different levels. For example, while you can do single-cell RNA sequencing to get all the RNA from a single cell, that's not really possible for protein, at least for now. So the question is: can we use, for example, single-cell sequencing data and bulk proteomics data to learn something from both, since the ground truth is sort of the same but the samples we look at are different? Maybe this is a good question for Julio again.

It's a good question indeed. Along the lines of what you were suggesting, I think from both data types we may be able to get insights about specific mechanisms, such as pathways. Or let's say you have single-cell proteomics and single-cell transcriptomics: at the protein level you may be able to measure the transcription factor, or the phosphorylated transcription factor, which tells you about its activity, and from gene expression you may be able to look at the targets of the transcription factor. So you may be able to arrive at the same biology, or even get two complementary streams of data about a specific biological process. But again, I think you need to leverage the understanding of the biology as much as possible, as Andres was showing with the central dogma. So I don't think you can simply say: for a given gene, I look at its RNA and I look at the protein of the same gene.
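The idea that a transcription factor's activity may be visible in the RNA of its targets rather than in its own transcript can be sketched as follows. The regulon, the measurements, and the sign-weighted mean of target z-scores are all hypothetical choices for illustration; dedicated footprint-based methods use curated regulons and proper statistics.

```python
from statistics import mean, pstdev
from typing import Dict, List


def zscore(values: List[float]) -> List[float]:
    """Standardize one gene across samples."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]


def tf_activity_from_targets(expr: Dict[str, List[float]], regulon: Dict[str, int]) -> List[float]:
    """Estimate TF activity per sample from its targets' RNA: z-score each target,
    flip the sign of targets the TF represses (-1), and average."""
    targets = [g for g in regulon if g in expr]
    z = {g: zscore(expr[g]) for g in targets}
    n_samples = len(next(iter(expr.values())))
    return [mean(regulon[g] * z[g][i] for g in targets) for i in range(n_samples)]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


# Hypothetical data for 4 samples.
phospho_tf = [0.1, 0.5, 0.9, 1.3]       # protein level: phosphorylated TF
expr = {
    "TF_MRNA": [5.0, 5.1, 4.9, 5.0],    # the TF's own transcript barely changes
    "TARGET_1": [1.0, 2.0, 3.0, 4.0],   # activated by the TF
    "TARGET_2": [8.0, 6.0, 4.0, 2.0],   # repressed by the TF
}
regulon = {"TARGET_1": +1, "TARGET_2": -1}

activity = tf_activity_from_targets(expr, regulon)
print(round(pearson(activity, phospho_tf), 3))         # target footprint tracks the protein
print(round(pearson(expr["TF_MRNA"], phospho_tf), 3))  # the TF's own RNA does not
```

In this toy example the activity inferred from target genes tracks the phospho-TF signal perfectly, while the TF's own mRNA is nearly flat and only weakly (negatively) correlated with it, so pairing each protein with its own transcript would miss the link.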
Rather, we should bring into this approach what we know about the biology, namely that a particular protein may be correlated not with its own RNA but with the RNA of other genes that it controls, if that makes sense.

If you allow me, I have a question for the panelists as well. Is that possible?

Yes, please go ahead.

So I wanted to ask all of you, Andres, Julio: what types of machine learning methods do you think need to be built, and for what? If each of you had a wish list for the machine learning community (with these sessions we are also trying to build a bridge to that community), what would be your brief challenge for it, and what is the context?

From what I can say, I think we alluded to it in different ways: we need algorithms that really get to the causal mechanism. It's an obvious thing to say, but, as you were also saying, Mihaela, can they really get us there? In the end, what matters is pinpointing the true molecular drivers; that is what will be critical, and that's difficult.

Yes, I agree. At the top of my list is definitely causal machine learning, to further understand the mechanisms; then robust machine learning methods that we can apply across clinical settings, for example to samples collected from different centers; and also fairness, that is, whether we can ensure our models work for diverse populations, especially when they have different prevalences of the diseases in question. Those are probably the top three on my wish list.

Well, I agree: definitely causality at the top. Perhaps I can frame it in a more prosaic way, as batch correction, that is to say, uncovering the biological information above and beyond the noise related to differences in experimental approaches or methodologies across data sets. I think that will be really important.
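The final wish-list item, batch correction, means separating biological signal from differences introduced by experimental procedures. A minimal location/scale adjustment for a single feature is sketched below, loosely in the spirit of ComBat but without its empirical-Bayes shrinkage or covariate handling; the data and lab labels are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def correct_batches(values: List[float], batches: List[str]) -> List[float]:
    """Location/scale batch adjustment for one feature: standardize within each
    batch, then map every batch onto the grand mean and the average within-batch
    SD. Simplified: no empirical-Bayes shrinkage and no biological covariates."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for v, b in zip(values, batches):
        groups[b].append(v)
    stats: Dict[str, Tuple[float, float]] = {
        b: (mean(vs), pstdev(vs)) for b, vs in groups.items()
    }
    grand_mean = mean(values)
    target_sd = mean(sd for _, sd in stats.values())  # common scale for all batches
    corrected = []
    for v, b in zip(values, batches):
        mu, sd = stats[b]
        z = (v - mu) / sd if sd > 0 else 0.0
        corrected.append(grand_mean + z * target_sd)
    return corrected


# Hypothetical single feature measured in two labs: lab B reads about 10 units
# higher and with twice the spread, although the underlying samples are comparable.
values = [1.0, 2.0, 3.0, 11.0, 13.0, 15.0]
batches = ["lab_A"] * 3 + ["lab_B"] * 3
print([round(v, 2) for v in correct_batches(values, batches)])
# [6.0, 7.5, 9.0, 6.0, 7.5, 9.0]
```

The adjustment assumes that batches are not confounded with biology: if all cases were processed in one lab and all controls in another, aligning the batches would also erase the real difference, which is why established tools let users protect known biological covariates.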
Thank you so much, wonderful. I see that we have reached the end of our time for this amazing discussion. Thank you again to the panel; I think this was a really exceptional panel, and it was amazing to hear from you all. Coming up next is again a specialty spotlight session, where we look at intensive care medicine, on the 24th of April, and our next core concept session will be on treatment effect estimation. We've already had a few hints of that here and there today, so I hope you'll be back for that as well. Last but not least, we want to share again that this summer we are holding an AI in medicine summer school, the first in the world specifically targeted at clinicians and medical students who want to learn how to use machine learning in healthcare from the ground up. We'll give you all the fundamentals, and you need no prior knowledge; you don't even need to know how to code. We will build all these skills from the ground up and cover everything from LLMs to treatment effect estimation, and of course also multiomics. So thank you again, everyone, for joining us today, thank you for the great questions, and thank you for the amazing answers and presentations. Hope to see you again soon. [Music]
