[Music] [Applause] Welcome to the MOOC on Introduction to Proteogenomics. Today's invited speaker is Professor Fredrik Pontén, who is currently a professor at Uppsala University in Sweden. Dr. Pontén will talk to us about the Human Protein Atlas, or HPA, a Swedish-based program that started in 2003 with the aim of mapping all the human proteins in cells, tissues and organs by integrating various omics technologies such as antibody-based imaging, mass spectrometry-based proteomics, transcriptomics and systems biology. He will tell us about this mega project and how it succeeded despite multiple challenges. He will also tell us how Indian pathologists and research collaborators have played a great role in the success of this project. In today's lecture he will mainly focus on the Tissue Atlas of the Human Protein Atlas. Further, he will tell us how RNA and protein expression across different tissues follows a trend, and how this correlation needs to be considered in research if we want to obtain the bigger picture. Dr.
Pontén will also talk to us about the subproteomes, such as the organ-based proteomes and the secretome, present in HPA, which will give you an idea of how to use this useful resource for your own research. So let us welcome Professor Fredrik Pontén.

What I'll talk about today is the Human Protein Atlas. I'll first give you just a brief background about the project, then a little bit of our results and data and where we are right now, and in the end I'll give you some perspectives on where we're heading in the next couple of years.

This project started 15 years ago. We received very generous funding from a private nonprofit research foundation, the Wallenberg Foundation, and that has kept us alive for these 15 years. We had the goal then to have a first draft of a Human Protein Atlas in 2015, and we fulfilled that goal; I'll come back to that. The project is a joint effort between the Royal Institute of Technology and Uppsala University. The head and director of the whole project is Professor Mathias Uhlén, a very old friend of mine, and I'm heading the Uppsala efforts of the project.

So, our vision. This was timely: if you think back, the project started to be planned during 2002, and if you remember, in 2001 the human genetic code was published in Science and Nature by the HUGO initiative and by Craig Venter. Having the blueprint, all the A's, C's, T's and G's, a very logical next step would be to try to add an information layer about what all the proteins that our genes encode actually do. That was our vision. The goals then came down to: let's try to make affinity probes, antibodies; let's use these antibodies to characterize the human proteome; and then, at last, emerging after a couple of years: if we have all the data and the reagents, let's try to put this into some clinical perspective and make some use of it in discovery medicine, and also trying to make some
biomarkers, diagnostics, future treatments, et cetera.

So we set up a multidisciplinary team, a kind of Ford factory-like research project, where we defined the different modules; each module had its own monthly goals and deliveries to the next module, and so on. We started with an upstream bioinformatics part where we had the code for all the protein-coding genes, and we selected a sequence that we blasted against... I won't go into any details, by the way; I think you all know this and have heard about it. Anyway, this is where we started to make our recombinant proteins, and the idea behind it all is that we blasted the different amino acid sequences against all the rest of the proteome to get as unique sequences as possible, to get as unique protein fragments as possible, to get as unique antibodies as possible in the end. We outsourced the antibody production, and then we had the immunotechnology part, where we ran everything on protein arrays. All the antibodies that bound specifically to the right protein fragment were then tested further in immunohistochemistry, immunofluorescence and western blots.

What was very nice about this whole project was that all the data we produced was put out in the open for the scientific community to use. That was a requirement from the Wallenberg Foundation from the beginning, and it has felt very good that there were no restrictions: all the data we produce is out in the open.

What we do, and what I'll focus on, is gene expression profiling. For gene expression profiling we use immunofluorescence for looking at cells and organelles, immunohistochemistry for looking at cells, tissues and organs, and then RNA sequencing to get quantitative data for our gene expression profiles. I will now briefly give you the background for this. I'm sure Dr.
Navani has told you all about this before, but what we use for protein profiling are affinity-purified antibodies against all the different unique proteins that our genome encodes. We then look at how proteins are distributed in all our different organs and tissues. The way we can get a comprehensive look at that without wasting too much tissue and too many reagents is by using tissue microarrays. We have them focused on normal tissues, cancer tissues and also cell lines. For normal tissues we have 46 different normal tissue types in triplicate, from three different individuals.

We make tissue microarrays by selecting representative pieces of tissue: you look in the microscope, find representative areas, drill out a core and put it in a recipient block. From one of these blocks we can make about 300 to 350 sequential sections, which can be used for about 300 to 350 different antibodies, so we are able to protein-profile a large part of the human body by using tissue microarrays. This was also very timely, because it was at the end of the 90s that Olli Kallioniemi coined the term tissue microarrays and the first instruments were made for this. The possibility of using tissue microarrays was something that made this whole project possible. I think this slide tells you everything about tissue microarrays: handling 700 of these blocks for each antibody would just have been impossible, while handling four blocks here is absolutely possible.

Immunohistochemistry is our basic method when it comes to tissues and to getting protein expression profiles. As you know, immunohistochemistry is a great method when it comes to spatial data, but it is a poor method for getting quantitative data. Still, there is nothing like immunohistochemistry for showing you which structures and which subtypes of
cells express a certain protein. It also gives you a little bit of a feeling for quantity, in the sense that if you have a complex tissue with one population that is strongly positive and another that is weakly positive, at least you know that the first population expresses a higher level of the protein than the other one. Besides that, it doesn't give you any quantification at all.

Of course, to do this project we also had to transform the glass slides into digital images, and at the time when we started in 2003 that was absolutely a challenge: to handle the enormous amounts of image data, to store the data, to be able to retrieve the data, and so on. And of course the magic of the whole project at this time was not just putting out images in a big library, but also making some data from those images. That is where our collaboration with India and with Dr. Navani started. We realized that the scientific community would not have been helped by just having images stained with immunohistochemistry. The people who can interpret immunohistochemistry and evaluate tissues (is it a cancer cell or a normal cell, is it strongly expressed here or weakly?) are the pathologists. And after meeting up with Dr.
Navani and his team of pathologists back in 2006, we started, and the first site was set up at the Indian Cancer Society in 2007 by all these talented pathologists, who started looking at images. We had to solve all the internet and IT infrastructure challenges and so on, but everything worked out very well, so we continued to collaborate. We were down here; many from my team were here for months and worked together with our Indian colleagues. We changed the site to another venue, and we have had just great collaborations with Indian pathologists in this project. They have produced all the data, which I'll show you on the next slide. I've summarized that as ten certified pathologists sitting looking at these images, evaluating them and putting out annotations: is it weakly expressed, is it strongly expressed, is it in 25 percent of the cell population or more? This is not the full figure, it only goes to the beginning of 2012, but you can see that they go through two million images per year, which I think is extremely impressive. Altogether, over 12 million images have been annotated by Indian pathologists.

But it is not only the workflow and the volume that are impressive; it has also been an impressive time for the research collaborations. I checked this just this morning: Dr.
Navani and I are co-authors on those papers, and they are highly cited papers in Science and many other good journals. So it has not only been production of data, it has also been a very fruitful scientific collaboration, which I'm very grateful for.

So that's the protein part of the Tissue Atlas and also of the Pathology Atlas; I'll come back to the Pathology Atlas in a while. What we realized a couple of years ago was that spatial data is great, but you need quantification. I know that all of you know this, since you work with proteomics, which is to a large extent a quantitative method. So we went back to the Uppsala Biobank and looked for frozen tissue samples. We went through these by microscope to see that we had normal tissue, we selected cases that were representative and where we had high-quality RNA, we extracted RNA, and we did RNA sequencing to get transcriptomics data from normal tissues. We had at least three different individuals for each tissue type, and now we have 37 normal tissue types in over 200 individuals, for which all the transcriptomics data has been imported into the Human Protein Atlas database.

So now we started to learn a little bit more about the human proteome and how our genes are actually expressed on the protein level. I won't come back to this more specifically, but there has been a debate, and it depends a little bit on definitions: what about the correlation between RNA and protein? I say that for almost all genes there is an extremely high correlation between RNA and protein. When I say that, I mean across tissues or cell lines: if you have a high level of RNA in one cell line or tissue type and a low level of RNA in another cell line or tissue type, the protein levels will follow the RNA levels. However, each gene has a different RTP, RNA-to-protein ratio, and that can
differ by many orders of magnitude. But if you go across tissues, the correlation between RNA and protein is very high, and that means that you can use quantitative RNA sequencing data as a proxy for protein levels.

What we learned here was that about half of our protein-coding genes encode housekeeping proteins: 44% are expressed in all tissues. They are the proteins that build structure and take care of cell division, cellular integrity and everything. Then there is a mixed bag, and then we have the proteins that are the most interesting, the tissue type-specific proteins: proteins that are expressed in only one tissue or in very few tissues, or much more highly expressed in a certain tissue type than in other types. These are of course the ones that are responsible for the special functions of different tissues, and the ones that will be interesting when it comes to diseases and disease biomarkers. For about 9%, at the time, we couldn't find any RNA in our 37 different tissues. These could of course be pseudogenes, they could be genes that are permanently turned off after development, or they could be genes expressed in tissues that we didn't have, like the inner ear or the olfactory plate or other more remote types of tissues.

With this data at hand, we started to define the different human subproteomes, the different organ proteomes, and we put this out on the Protein Atlas. This is a part of the Protein Atlas where we built the knowledge-based chapters, and I'll show you just one example after this. What was nice now was that we had the quantitative data from RNA sequencing and could combine it with the spatial data from our antibodies. So we could look at where the adipose tissue-specific proteins are and how they are expressed. What about the adrenal gland: are they expressed in the adrenal medulla or in the cortex, are they in special subtypes of cells, et cetera? And of course the spatial information together with this
quantitative information doesn't give you function per se, but it gives you a very good hint of function when you see a protein expressed in a certain cell type in a certain organ. These are just examples of such cell type- or tissue type-specific proteins, expressed here in either the exocrine pancreas or the endocrine pancreas, et cetera. We spent a couple of years writing papers, so if any of you are interested in a specific type of tissue or tissue proteome, we have probably published a paper about it, because we thought it was very interesting to go a little more into depth: what makes up the brain, or what makes up the pancreas, or whatever.

Another way of dissecting the proteome is to do it not by organ but by expression mode. I talked about the tissue-specific proteome, and of course there is a housekeeping proteome. What about the regulatory proteins, all the transcription factors: where are they expressed, and are there differences between tissue types, cell types, et cetera? The secretome and membrane proteome are extremely important for the communication between cells, and of course also as biomarkers. There is the isoform proteome, the very complex isoform proteome, which empowers the whole biology with a lot of complexity. The cancer proteome is obvious, and the druggable proteome is of course very interesting for the drug industry. All these knowledge-based pages are in place in the Protein Atlas, so you can go there, and I'll show you one example from the organ proteomes in just a second.

So in 2015 we said: okay, now we have a first draft of the human proteome. We were very successful in publishing a paper in Science, which has been very highly cited, and we had a poster in Science. We rebuilt the whole Protein Atlas web portal to integrate the transcriptomics data and the proteomics data. So today the Human Protein Atlas has three pillars. It has the Tissue Atlas, the normal tissue atlas, which shows you
in which organs and cell types our genes are expressed. It has the Cell Atlas, which shows you in which organelles our proteins are expressed in the cell. And then we have the Pathology Atlas, which I'll come back to, and which shows you how gene expression correlates with survival for patients who have cancer.

I'll show you just a couple of slides from the web portal, and I'll start with the human Tissue Atlas. Here you can go in and look through the organ proteomes or the other subproteomes, and then you can just click on any of these tissue types. Say I click on colon: that brings me to a couple of pages that summarize the gene expression profile in colon. If I'm interested in the colon-specific proteins, I can click on that, and that brings me into the hit list of the Protein Atlas. Here I get the 165 proteins that are specifically expressed in the colon. I can choose one of these and click on it, and then I get to the summary page. In this case it is a gene called SATB2, encoding a protein that is more or less specifically expressed in the colon, in the epithelial cells of the colon and rectum; it is also expressed in the brain. We give a little summary for every gene, all 20,000 genes, and then the expression levels on the RNA level, given in FPKM, and on the protein level, given as the Indian pathologists have evaluated the protein expression levels.

Then one can look at the data in more detail. The protein data is shown as bar diagrams, as is our RNA sequencing data, but we have also imported, for all genes, the data from the Broad Institute GTEx project and from RIKEN, the FANTOM5 project. As you can see, there is very good consistency across the different platforms and the different specimens that have been used, and
I think this gives a lot of validity to the expression data that we show on the Protein Atlas. Then of course one can go and look at the primary data, the protein data, where we have three individuals for each antibody, and for SATB2 we had very many antibodies. At the deepest level you can go into the high-resolution image and look for yourself: where is the SATB2 protein expressed? Well, it is expressed in the nucleus of the glandular cells in colon, et cetera.

Just as a little parenthesis: since this was a very highly specific colon protein, we thought maybe this could be a biomarker for colon cancer patients. So we looked in colon cancer, and you can see it is highly expressed in colon cancer on the protein level; the only tissue in which you can see high expression of SATB2 was colon cancer. You can look at the full high-resolution images for cancers as well, of course. Here we then extended the study and did a clinical study including over 2,500 patients, and could actually establish that this is a good cancer biomarker for colorectal cancer.

In today's lecture you have learnt about HPA and found that the Human Protein Atlas can be divided into the Tissue Atlas, the Cell Atlas and the Pathology Atlas. Dr. Pontén demonstrated the expression levels of different genes in 37 different types of tissue, and how this information is important for understanding diseases and identifying candidate biomarkers. He also talked to us about how the Protein Atlas can provide you with the status of RNA and protein expression in different cancers, together with patient follow-up data. We highly recommend that you visit the HPA website and explore it further; it will definitely be a helpful resource for your own research. In the next lecture, Dr. Pontén will talk about the Cell Atlas and the Pathology Atlas in more detail. Thank you.

[Music] [Applause]
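
The lecture's point about RNA and protein levels (for a given gene the two track each other closely across tissues, while the RNA-to-protein ratio can differ enormously from gene to gene) can be made concrete with a small sketch. Everything below is an assumption for illustration: the gene names, tissue list and numbers are made up and are not Human Protein Atlas data, and the Pearson coefficient is only one possible way to measure the across-tissue correlation the speaker describes.

```python
from math import sqrt
from statistics import median


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy values for illustration only; NOT Human Protein Atlas data.
TISSUES = ["colon", "liver", "brain", "kidney"]
RNA_LEVELS = {
    "GENE_A": [50.0, 5.0, 20.0, 1.0],
    "GENE_B": [50.0, 5.0, 20.0, 1.0],
}
PROTEIN_LEVELS = {
    "GENE_A": [480.0, 60.0, 210.0, 12.0],   # much protein per unit of RNA
    "GENE_B": [0.48, 0.06, 0.21, 0.012],    # little protein per unit of RNA
}


def summarize(gene):
    """Return (across-tissue correlation, median RNA-to-protein ratio) for one gene."""
    rna = RNA_LEVELS[gene]
    protein = PROTEIN_LEVELS[gene]
    correlation = pearson(rna, protein)
    ratios = [r / p for r, p in zip(rna, protein)]
    return correlation, median(ratios)


for gene in RNA_LEVELS:
    correlation, ratio = summarize(gene)
    print(f"{gene}: r = {correlation:.3f}, median RNA-to-protein ratio = {ratio:.3g}")
```

In this toy data both genes show a high across-tissue correlation even though their RNA-to-protein ratios differ by about three orders of magnitude, which is the sense in which RNA sequencing data can serve as a proxy for relative protein levels within a gene but not for comparing absolute protein amounts between genes.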