Machine Learning in Computational Drug Discovery

Jul 1, 2024

Introduction

  • Presenter: Head of the Center of Data Mining and Biomedical Informatics
  • Background: 127 research articles, 17 review articles, 5 book chapters
  • Roles: Associate professor, YouTuber, blogger
    • YouTube Channels: Data Professor, Coding Professor
    • Blog: Medium (Data Science and Bioinformatics)
    • Focus: Simplifying Data Science and Bioinformatics for beginners

Challenges in Bioinformatics and Data Science

  • Awareness Issue: Practitioners are often unfamiliar with literature databases such as Scopus and NCBI
  • Technical Jargon: Papers often use complex terms, making practical application difficult
    • Example: A paper applying random forest to bioactivity prediction may go unnoticed by practitioners
  • Outreach: Blogs and YouTube reach a wider audience more effectively than journals
    • Blog example: An article on mastering scikit-learn received 10,000 views in a month, versus journal articles with a few hundred reads per year
  • Impact: Need for research to reach general public and practitioners for practical application

Disease and Drug Discovery Basics

  • Definition of Disease: Illness caused by malfunction or infection affecting health
  • Role of Drugs: Biological or chemical entities (proteins, peptides, small molecules)
    • Types: Biological (e.g., antibodies) vs. Chemical (small molecules/compounds)
  • Drug Discovery Process: Interaction between target protein and drug
    • Goal: Modulate protein function (inhibit or activate); for disease treatment the aim is typically inhibition

Drug Target Networks

  • Bioinformatics: Studies protein-protein interactions and biological networks
    • Node Analysis: Key proteins identified by their number of connections
    • Pathway Analysis: Understanding side-effects and rate-limiting steps
    • Example: Aromatase inhibition to reduce estrogen levels in breast cancer
  • Drug Networks: Analyze how drug interactions affect target and off-target proteins
    • Off-Target Binding: Can lead to side-effects or serendipitous therapeutic benefits
    • Polypharmacology: Drugs may affect multiple proteins causing various effects
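
The node analysis idea above can be sketched in a few lines: count how many interaction partners each protein has and rank the most connected ones as candidate hubs. This is a minimal illustration, not the presenter's method. The protein names (P1 to P5) and the edge list are hypothetical placeholders, and degree is only one of several possible centrality measures.

```python
from collections import defaultdict

# Hypothetical protein-protein interaction edges (placeholder names, not real data).
edges = [
    ("P1", "P2"),
    ("P1", "P3"),
    ("P1", "P4"),
    ("P2", "P3"),
    ("P4", "P5"),
]


def degree_by_node(edge_list):
    """Count how many distinct interaction partners each protein has."""
    neighbors = defaultdict(set)
    for a, b in edge_list:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {node: len(partners) for node, partners in neighbors.items()}


def top_hubs(edge_list, n=1):
    """Return the n most connected proteins (ties broken alphabetically)."""
    degrees = degree_by_node(edge_list)
    return sorted(degrees, key=lambda node: (-degrees[node], node))[:n]


degrees = degree_by_node(edges)
hubs = top_hubs(edges, n=2)
print(degrees)
print(hubs)
```

A highly connected protein is not automatically a good drug target; the ranking only narrows down where to look first.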

Drug Discovery Process

  • Timeline and Cost: Takes 10-15 years, with a high failure rate (~90%) and a cost of ~$2 billion
  • Steps: Target Discovery, Screening, Lead Optimization
    • High Throughput Screening: Identify potential hits
    • Hit to Lead: Optimize hits by modifying functional groups
  • Example: Lead optimization through structure modification (analog generation)
  • Key Considerations: Ki and IC50 values for determining compound effectiveness
  • Databases: ChEMBL and PubChem are examples of bioactivity data sources
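
IC50 values are concentrations, so lower means more potent, and they span several orders of magnitude. A common convention is to convert them to pIC50, the negative base-10 logarithm of the molar concentration. The sketch below assumes IC50 is given in nanomolar. The activity cutoffs (pIC50 of 6 and 5, i.e. 1,000 nM and 10,000 nM) are illustrative assumptions, not thresholds stated in the talk.

```python
import math


def pic50_from_ic50_nm(ic50_nm):
    """Convert an IC50 in nanomolar to pIC50 = -log10(IC50 in molar)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be a positive concentration")
    # 1 nM = 1e-9 M, so -log10(ic50_nm * 1e-9) = 9 - log10(ic50_nm)
    return 9.0 - math.log10(ic50_nm)


def label_activity(pic50, active_cutoff=6.0, inactive_cutoff=5.0):
    """Bin a compound by potency; the cutoffs are assumed, adjustable defaults."""
    if pic50 >= active_cutoff:
        return "active"
    if pic50 <= inactive_cutoff:
        return "inactive"
    return "intermediate"


# Hypothetical IC50 values in nM, for illustration only.
for ic50 in (50, 3000, 50000):
    p = pic50_from_ic50_nm(ic50)
    print(f"IC50 = {ic50} nM, pIC50 = {p:.2f}, class = {label_activity(p)}")
```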

Computational Approaches

  • Techniques: High Throughput Screening, Virtual Screening, Molecular Docking
  • Molecular Descriptor Calculation: Assess physical and chemical properties
  • Machine Learning: Build prediction models (QSAR: quantitative structure-activity relationship; PCM: proteochemometric modeling)
    • Conceptual Workflow: From data collection, descriptor calculation, model building, to evaluation
  • Bayesian Models: Focused on predicting bioactivity and toxicity from molecular descriptors
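
The conceptual workflow above (data, descriptors, model, evaluation) can be sketched end to end with the standard library alone. This is a rough illustration under stated assumptions: the descriptor vectors and pIC50 values are made up, the descriptors are assumed to be precomputed by an external tool, and a k-nearest-neighbor regressor stands in for the random forest or Bayesian models mentioned in the talk to keep the example dependency-free.

```python
import math

# Step 1-2: data collection and descriptor calculation.
# Hypothetical (descriptor vector, pIC50) pairs; all values are made up.
dataset = [
    ([1.0, 0.0], 5.0),
    ([1.2, 0.1], 5.2),
    ([3.0, 1.0], 7.0),
    ([3.1, 0.9], 7.1),
    ([2.0, 0.5], 6.0),
    ([2.1, 0.6], 6.2),
]


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Step 3: model building (k-NN regression as a stand-in model).
def knn_predict(train, query, k=2):
    """Predict activity as the mean of the k closest training compounds."""
    nearest = sorted(train, key=lambda row: euclidean(row[0], query))[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Step 4: evaluation by leave-one-out cross-validation.
def leave_one_out_rmse(data, k=2):
    """Hold out each compound in turn, predict it, and report the RMSE."""
    squared_errors = []
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        squared_errors.append((knn_predict(train, x, k) - y) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


rmse = leave_one_out_rmse(dataset)
print(f"Leave-one-out RMSE: {rmse:.2f}")
```

In practice the same four steps would typically use a descriptor calculator and a library such as scikit-learn, with a held-out test set that the model never sees during tuning.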

Approaches in Chemical Space

  • Nature-Inspired Drug Design: Utilizing natural compounds as initial hits
  • Compound Enumeration: Generating new compounds by modifying functional groups
  • Chemical Space: Represents all possible compounds (e.g., 166 billion hypothetical compounds)
  • Lipinski Rule of Five: Criteria for assessing drug-likeness
  • Lead-like Rule of Three: Criteria for identifying promising lead compounds
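
Both rules are simple threshold checks on a handful of descriptors, which makes them easy to express in code. The sketch below uses the commonly cited cutoffs: molecular weight ≤ 500 Da, logP ≤ 5, hydrogen-bond donors ≤ 5, and acceptors ≤ 10 for the Rule of Five, and 300/3/3/3 for the Rule of Three. Exact thresholds and the number of tolerated violations vary between sources, so treat them as assumptions. The descriptor values are hypothetical and assumed to be precomputed.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    mol_weight: float  # molecular weight in Da
    logp: float        # octanol-water partition coefficient
    h_donors: int      # hydrogen-bond donors
    h_acceptors: int   # hydrogen-bond acceptors


def lipinski_violations(d):
    """Count how many Rule of Five criteria the compound breaks."""
    return sum([
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])


def is_drug_like(d, max_violations=1):
    """Treat a compound as drug-like if it breaks at most one criterion."""
    return lipinski_violations(d) <= max_violations


def is_lead_like(d):
    """Rule of Three check: every criterion must hold."""
    return (d.mol_weight <= 300 and d.logp <= 3
            and d.h_donors <= 3 and d.h_acceptors <= 3)


# Hypothetical compounds, for illustration only.
small = Descriptors(mol_weight=250.0, logp=2.0, h_donors=1, h_acceptors=3)
large = Descriptors(mol_weight=650.0, logp=6.2, h_donors=2, h_acceptors=8)

for name, d in (("small", small), ("large", large)):
    print(name, lipinski_violations(d), is_drug_like(d), is_lead_like(d))
```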

Key Considerations in QSAR and PCM

  • QSAR: Correlates molecular features with biological activity
    • Workflow: Selecting activity, generating descriptors, applying ML models, performance evaluation
  • PCM: Expansion of QSAR to multiple proteins
    • Applications: Drug repositioning, off-target effect analysis
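
What distinguishes PCM from QSAR is the input: each training row describes a compound-protein pair rather than a compound alone. One common way to build such a row, sketched below, is to concatenate compound descriptors with protein descriptors and optionally add pairwise cross-terms. The names (`cmpd_A`, `prot_X`) and all descriptor values are hypothetical, and the cross-term scheme is one possible choice rather than the approach described in the talk.

```python
def pcm_features(compound_desc, protein_desc, cross_terms=True):
    """Build one PCM feature row for a compound-protein pair."""
    features = list(compound_desc) + list(protein_desc)
    if cross_terms:
        # Pairwise products let a model capture compound-protein interplay.
        features += [c * p for c in compound_desc for p in protein_desc]
    return features


# Hypothetical descriptor vectors (placeholder names and values).
compounds = {"cmpd_A": [0.5, 1.0], "cmpd_B": [2.0, 0.0]}
proteins = {"prot_X": [1.0, 0.0, 2.0], "prot_Y": [0.0, 1.0, 1.0]}

# One row per compound-protein pair; a QSAR model would only see `compounds`.
rows = {
    (c_name, p_name): pcm_features(c_desc, p_desc)
    for c_name, c_desc in compounds.items()
    for p_name, p_desc in proteins.items()
}

for pair, row in rows.items():
    print(pair, row)
```

Because the protein is part of the input, a single model trained on such rows can be queried for a known drug against a new target, which is the basis for the repositioning and off-target applications listed above.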

Utilization of Computational Drug Discovery

  • Prediction Models: Assess structure-activity relationships, predict adverse effects
  • Practical Tools: Databases and software (e.g., PDB for protein structures, PyMOL for visualization, scikit-learn for modeling)
  • Open Resources: Utilization of free tools like Google Colab, R, Python for computational tasks

Summary and Conclusion

  • Data Integration: Combining various omics data for enriched model development
  • Educational Resources: Accessibility via blogs, YouTube channels
  • Impact of Research: Reaching wider audiences through open access platforms and simplified learning aids

Q&A Highlights

  • Scientists from both academia and pharmaceutical companies contribute to drug discovery, with patents playing a critical role in commercial viability.
  • Computational approaches enable drug repurposing by leveraging existing data for new therapeutic uses.
  • Building predictive models demands a mix of domain knowledge and computational skills, making step-by-step learning essential.