Machine Learning in Computational Drug Discovery 💊

Introduction

Presenter: Head of the Center of Data Mining and Biomedical Informatics
Background: 127 research articles, 17 review articles, 5 book chapters
Roles: Associate professor, YouTuber, blogger
- YouTube Channels: Data Professor, Coding Professor
- Blog: Medium (Data Science and Bioinformatics)
- Focus: Simplifying Data Science and Bioinformatics for beginners

Awareness Issue: Lack of familiarity with Scopus, LCBC databases
Technical Jargon: Papers often use complex terms, making practical application difficult
- Example: Published work on random forest models for bioactivity prediction often goes unnoticed by practitioners
Outreach: Blogs and YouTube reach wider audience more effectively than journals
- Blog example: An article on mastering scikit-learn received 10,000 views in a month, versus a few hundred reads per year for typical journal articles
Impact: Research needs to reach the general public and practitioners so that it can be put into practice

Definition of Disease: Illness caused by malfunction or infection affecting health
Role of Drugs: Biological or chemical entities (proteins, peptides, small molecules)
- Types: Biological (e.g., antibodies) vs. Chemical (small molecules/compounds)
Drug Discovery Process: Centers on the interaction between a target protein and a drug
- Goal: Inhibit or activate protein function; for disease treatment, inhibition is the more common aim

Bioinformatics: Studies protein-protein interactions and biological networks
- Node Analysis: Key (hub) proteins identified by their number of connections
- Pathway Analysis: Understanding side-effects and rate-limiting steps
- Example: Aromatase inhibition to reduce estrogen levels in breast cancer
Drug Networks: Analyze how drug interactions affect target and off-target proteins
- Off-Target Binding: Can lead to side-effects or serendipitous therapeutic benefits
- Polypharmacology: Drugs may affect multiple proteins causing various effects
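The node analysis above can be sketched as a degree count over an interaction edge list, where the most connected proteins are candidate hubs. This is a minimal sketch: the protein labels and edges are hypothetical placeholders rather than real interaction data, and degree is only one of several centrality measures one might use.

```python
from collections import Counter

# Hypothetical, undirected protein-protein interaction edges (illustration only).
EDGES = [
    ("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P1", "P5"),
    ("P2", "P3"), ("P4", "P6"), ("P5", "P6"),
]


def degree(edges):
    """Count how many interactions each protein takes part in."""
    counts = Counter()
    for a, b in edges:
        counts[a] += 1
        counts[b] += 1
    return counts


def hubs(edges, min_degree):
    """Return proteins with at least min_degree connections, most connected first."""
    counts = degree(edges)
    return [protein for protein, d in counts.most_common() if d >= min_degree]


print(degree(EDGES))
print(hubs(EDGES, min_degree=3))
```

Here P1 interacts with four partners and stands out as the hub; removing or inhibiting such a node would affect more of the network than removing a peripheral one.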

Phases: Takes 10-15 years, high failure rate (~90%), and costly (~$2 billion)
Steps: Target Discovery, Screening, Lead Optimization
- High Throughput Screening: Identify potential hits
- Hit to Lead: Optimize hits by modifying functional groups
Example: Lead optimization through structure modification (analog generation)
Key Considerations: Ki and IC50 values for determining compound potency
Databases: e.g., ChEMBL and PubChem for bioactivity data
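Ki and IC50 values span several orders of magnitude, so they are commonly log-transformed before modeling: pIC50 = -log10(IC50 in mol/L), and pKi is defined the same way, so a higher value means a more potent compound. The sketch below assumes values are reported in nanomolar. The 1,000 nM "active" and 10,000 nM "inactive" cut-offs are an assumed convention used in some tutorials, not a universal standard.

```python
import math


def p_activity(value_nm: float) -> float:
    """Convert an IC50 (or Ki) in nanomolar to pIC50 (or pKi)."""
    if value_nm <= 0:
        raise ValueError("concentration must be positive")
    # -log10(value_nm * 1e-9 M) simplifies to 9 - log10(value_nm)
    return 9.0 - math.log10(value_nm)


def activity_class(ic50_nm: float, active_max: float = 1_000.0,
                   inactive_min: float = 10_000.0) -> str:
    """Bin a compound by IC50 using assumed cut-offs (1 uM active, 10 uM inactive)."""
    if ic50_nm <= active_max:
        return "active"
    if ic50_nm >= inactive_min:
        return "inactive"
    return "intermediate"


for ic50 in (50.0, 1_000.0, 5_000.0, 20_000.0):
    print(ic50, round(p_activity(ic50), 2), activity_class(ic50))
```

The log scale also turns a 10-fold change in potency into a constant step of 1 unit, which suits regression models better than raw concentrations.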

Approaches: High Throughput Screening, Virtual Screening, Molecular Docking
Molecular Descriptor Calculation: Assess physical and chemical properties
Machine Learning: Build prediction models (QSAR, PCM)
- Conceptual Workflow: From data collection, descriptor calculation, model building, to evaluation
Bayesian Models: Used to predict bioactivity and toxicity from molecular descriptors
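The notes do not say which Bayesian model is meant. As an assumption, the sketch below uses Gaussian naive Bayes, one common choice, written in plain Python so each step of the workflow (descriptor table, model fitting, prediction) is visible. The two descriptor columns and the labels are hypothetical toy values; in practice a library implementation such as scikit-learn's `GaussianNB` would normally be used.

```python
import math
from collections import defaultdict
from statistics import mean, pvariance


def fit_gaussian_nb(X, y, eps=1e-9):
    """Estimate class priors and per-feature Gaussian mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        columns = list(zip(*rows))
        model[label] = {
            "prior": len(rows) / len(X),
            "mean": [mean(c) for c in columns],
            # eps keeps the variance positive if a feature is constant within a class
            "var": [pvariance(c) + eps for c in columns],
        }
    return model


def log_gaussian(x, mu, var):
    """Log density of a normal distribution N(mu, var) at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)


def predict(model, row):
    """Pick the class with the highest log posterior, treating features as independent."""
    def score(label):
        params = model[label]
        return math.log(params["prior"]) + sum(
            log_gaussian(x, mu, var)
            for x, mu, var in zip(row, params["mean"], params["var"])
        )
    return max(model, key=score)


# Hypothetical descriptor rows (two descriptors per compound) and activity labels.
X = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [3.0, 4.0], [3.2, 4.1], [2.9, 3.8]]
y = ["inactive", "inactive", "inactive", "active", "active", "active"]

model = fit_gaussian_nb(X, y)
print(predict(model, [1.1, 2.0]), predict(model, [3.1, 4.0]))
```

The "naive" independence assumption lets the model sum per-descriptor log likelihoods, which keeps it fast and usable on small datasets, at the cost of ignoring correlations between descriptors.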

Nature-Inspired Drug Design: Utilizing natural compounds as initial hits
Compound Enumeration: Generating new compounds by modifying functional groups
Chemical Space: Represents all possible compounds (e.g., 166 billion hypothetical compounds)
Lipinski Rule of Five: Criteria for assessing drug-likeness
Lead-like Rule of Three: Criteria for identifying promising lead compounds
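Both rules are simple threshold checks on a few descriptors. The sketch below uses the commonly quoted thresholds (Rule of Five: MW ≤ 500 Da, logP ≤ 5, H-bond donors ≤ 5, H-bond acceptors ≤ 10, with one violation usually tolerated; Rule of Three: MW ≤ 300, logP ≤ 3, donors ≤ 3, acceptors ≤ 3). It assumes the descriptor values were already computed, for example with RDKit's `Descriptors.MolWt`, `Crippen.MolLogP`, `Lipinski.NumHDonors` and `Lipinski.NumHAcceptors`; the `MolProps` class and the three compounds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MolProps:
    """Precomputed descriptors for one compound (hypothetical container)."""
    mw: float    # molecular weight (Da)
    logp: float  # calculated octanol-water partition coefficient
    hbd: int     # hydrogen-bond donors
    hba: int     # hydrogen-bond acceptors


def ro5_violations(p: MolProps) -> int:
    """Count Lipinski Rule of Five violations."""
    return sum([p.mw > 500, p.logp > 5, p.hbd > 5, p.hba > 10])


def is_drug_like(p: MolProps, max_violations: int = 1) -> bool:
    """Common convention: a compound passes with at most one Ro5 violation."""
    return ro5_violations(p) <= max_violations


def is_lead_like_ro3(p: MolProps) -> bool:
    """Rule of Three with commonly quoted thresholds; all criteria must hold."""
    return p.mw <= 300 and p.logp <= 3 and p.hbd <= 3 and p.hba <= 3


compounds = {
    "cmpd_A": MolProps(mw=250.0, logp=2.1, hbd=1, hba=3),
    "cmpd_B": MolProps(mw=480.0, logp=4.2, hbd=2, hba=7),
    "cmpd_C": MolProps(mw=720.0, logp=6.3, hbd=6, hba=12),
}
for name, props in compounds.items():
    print(name, ro5_violations(props), is_drug_like(props), is_lead_like_ro3(props))
```

Applied to an enumerated library, filters like these quickly prune the chemical space before any costlier modeling or docking step.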

QSAR: Correlates molecular features with biological activity
- Workflow: Select the bioactivity endpoint, calculate descriptors, train ML models, evaluate performance
PCM (proteochemometric modeling): Extends QSAR to multiple proteins by combining compound and protein descriptors
- Applications: Drug repositioning, off-target effect analysis

Prediction Models: Assess structure-activity relationships, predict adverse effects
Practical Tools: Databases and software (e.g., the PDB, PyMOL, scikit-learn)
Open Resources: Free tools such as Google Colab, R, and Python for computational tasks

Data Integration: Combining various omics data for enriched model development
Educational Resources: Accessibility via blogs, YouTube channels
Impact of Research: Reaching wider audiences through open access platforms and simplified learning aids

Q&A Highlights:

Scientists from both academia and pharmaceutical companies contribute to drug discovery, with patents playing a critical role in commercial viability.
Computational approaches enable drug repurposing by leveraging existing data for new therapeutic uses.
Building predictive models demands a mix of domain knowledge and computational skills, making step-by-step learning essential.