Lecture on Deep Learning and Nanobody Optimization via Machine Learning 🧬
Introduction and Speaker Background
- Speaker: Anirud Venkatramin, Senior at Homestead High School.
- Origin of Interest: Began self-studying deep learning applications in sophomore year, particularly in drug discovery.
- Current Work: Collaborating with the UCLA Molecular Screening Shared Resource, focusing on nanobody optimization.
The ACM and Event Details
- Host Organization: San Francisco Bay Area ACM, established in 1957.
- Mission: Promote knowledge of modern computing and create a community for support, networking, and hiring.
- Upcoming Events: Workshops and talks on topics such as computer vision, VR training, and securing data.
Overview of Nanobodies
- Definition: Small antibody fragments, derived from antibodies naturally produced by camelids (such as camels) and sharks, that bind targets much as human antibodies do.
- Advantages: Rarely cause toxic side effects, can cross the blood-brain barrier, and show promise for treating diseases such as Alzheimer’s, Parkinson’s, and Huntington’s.
- Challenge: Identifying optimal nanobodies, traditionally a time-consuming lab process.
- Solution: A machine learning pipeline for computational prediction and optimization of nanobodies.
Methodology
Algorithms and Models Used
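- Graph Neural Networks (GNN): Employed for sequence-structure co-design to generate biologically plausible sequences for CDR3 (the third complementarity-determining region, a major determinant of binding).
- Neural Network Architectures: Included artificial neural networks (ANNs), convolutional neural networks (CNNs), and long short-term memory (LSTM) recurrent neural networks.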
- Stacked Ensemble Model: Combined predictions from multiple neural network models to improve binding affinity predictions.
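The stacking idea above can be illustrated with a minimal sketch. The summary does not say which meta-learner combined the base networks, so this example assumes a simple linear combination fitted by least squares on held-out predictions from two base models. The names `cnn_preds`, `lstm_preds`, and `affinities`, and all numbers, are hypothetical.

```python
def fit_stack_weights(preds_a, preds_b, targets):
    """Least-squares weights (w_a, w_b) minimizing sum((w_a*a + w_b*b - y)^2)."""
    aa = sum(a * a for a in preds_a)
    bb = sum(b * b for b in preds_b)
    ab = sum(a * b for a, b in zip(preds_a, preds_b))
    ay = sum(a * y for a, y in zip(preds_a, targets))
    by = sum(b * y for b, y in zip(preds_b, targets))
    det = aa * bb - ab * ab
    if abs(det) < 1e-12:
        raise ValueError("base predictions are collinear; weights are not unique")
    # Solve the 2x2 normal equations with Cramer's rule.
    w_a = (ay * bb - by * ab) / det
    w_b = (by * aa - ay * ab) / det
    return w_a, w_b


def stack_predict(weights, pred_a, pred_b):
    """Combine two base-model predictions with the fitted weights."""
    return weights[0] * pred_a + weights[1] * pred_b


# Hypothetical held-out predictions from two base models and reference affinities.
cnn_preds = [1.0, 2.0, 3.0, 4.0]
lstm_preds = [2.0, 1.0, 4.0, 3.0]
affinities = [1.5, 1.5, 3.5, 3.5]

weights = fit_stack_weights(cnn_preds, lstm_preds, affinities)
combined = stack_predict(weights, 2.0, 3.0)
```

Fitting the weights on held-out predictions, rather than on the data the base models were trained on, is the usual way to keep the meta-learner from simply rewarding whichever base model overfits most.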
Data Preparation and Encoding
- Training Data: 60,000 CDR3 sequences and their binding affinities used for model training.
- One-Hot Encoding: CDR3 sequences represented in a uniform 20x20 matrix format suitable for neural network input.
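A minimal sketch of the one-hot encoding described above. It assumes the 20x20 matrix means 20 sequence positions by the 20 standard amino acids, with shorter sequences zero-padded. The summary does not state the padding scheme or the column order, so both are assumptions, and `one_hot_encode` is a hypothetical helper name.

```python
# Assumption: rows are sequence positions (zero-padded up to 20),
# columns are the 20 standard amino acids in one-letter alphabetical order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MAX_LEN = 20
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(cdr3, max_len=MAX_LEN):
    """Return a max_len x 20 matrix (list of lists) for one CDR3 sequence."""
    if len(cdr3) > max_len:
        raise ValueError(f"sequence longer than {max_len} residues")
    matrix = [[0] * len(AMINO_ACIDS) for _ in range(max_len)]
    for pos, aa in enumerate(cdr3):
        # Raises KeyError for non-standard residue letters.
        matrix[pos][AA_INDEX[aa]] = 1
    return matrix


encoded = one_hot_encode("ARDYW")  # illustrative sequence, not from the dataset
```

Each occupied row contains exactly one 1, and padded rows are all zeros, so every sequence maps to an input of the same shape.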
Optimization and Pipeline
- Generational Approach: Generated new CDR3 sequences, filtered by binding affinity, and used high-affinity sequences to re-parameterize the model iteratively.
- Ensemble Algorithm (Ensemble Stack): Predicted binding affinities of generated sequences and refined the generational model.
- Physical and Binding Affinity Prediction: Used dihedral angles, distances, and orientation features to predict structural and binding properties.
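The generate, filter, and refit loop above can be sketched as follows. This is not the pipeline from the talk: a per-position amino acid frequency table stands in for the GNN generator, and `toy_affinity` is a hypothetical scoring function standing in for the ensemble predictor. All parameter values are illustrative.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_affinity(seq):
    """Hypothetical stand-in for the ensemble predictor: fraction of tryptophans."""
    return seq.count("W") / len(seq)


def sample_sequence(position_weights, rng):
    """Draw one residue per position from that position's weights."""
    return "".join(rng.choices(AMINO_ACIDS, weights=w)[0] for w in position_weights)


def refit(sequences, length, pseudocount=1.0):
    """Re-estimate per-position weights from the retained sequences."""
    weights = []
    for pos in range(length):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in sequences:
            counts[seq[pos]] += 1
        weights.append([counts[aa] for aa in AMINO_ACIDS])
    return weights


def optimize(length=10, rounds=5, n_samples=200, keep=20, seed=0):
    rng = random.Random(seed)
    weights = [[1.0] * len(AMINO_ACIDS) for _ in range(length)]
    history = []
    for _ in range(rounds):
        # Generate candidates, rank by predicted affinity, keep the best.
        batch = [sample_sequence(weights, rng) for _ in range(n_samples)]
        batch.sort(key=toy_affinity, reverse=True)
        elite = batch[:keep]
        history.append(sum(toy_affinity(s) for s in elite) / keep)
        # Re-parameterize the generator from the high-scoring sequences.
        weights = refit(elite, length)
    return elite, history


elite, history = optimize()
```

The `history` list records the mean score of the retained sequences in each round, which gives a quick check that the generator is drifting toward higher predicted affinity.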
Results and Evaluation
Predictive Performance
- Metrics: Pearson correlation of 0.88 and R² of 0.78, indicating strong agreement between predicted and reference binding affinities.
- Comparison: Outperformed previous non-stacked models in prediction accuracy for binding affinities.
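For reference, the two reported metrics can be computed as in the sketch below, which uses the standard definitions of the Pearson correlation and the coefficient of determination. The input values are made up for illustration and are not the study's data. As a side note, 0.88 squared is about 0.77, close to the reported R² of 0.78, but the two metrics coincide exactly only in special cases such as a least-squares linear fit.

```python
import math


def pearson_r(y_true, y_pred):
    """Pearson correlation between reference and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Illustrative values only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
r = pearson_r(y_true, y_pred)
r2 = r_squared(y_true, y_pred)
```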
Novelty and Effectiveness
- Novelty Measurement: Generated sequences had a relatively high average Levenshtein (edit) distance of 13 from the training data, indicating that they were novel rather than near-copies of training sequences.
- Biological Testing: Synthesized a high-affinity candidate nanobody and confirmed binding experimentally, although it did not surpass the specially designed controls.
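The Levenshtein distance used as the novelty measure counts the minimum number of single-residue insertions, deletions, and substitutions needed to turn one sequence into another. A minimal sketch follows. The summary does not say how the distance was aggregated over the training set, so the `novelty` helper assumes the distance to the nearest training sequence, and the example sequences are made up.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def novelty(generated, training):
    """Assumed aggregation: distance to the closest training sequence."""
    return min(levenshtein(generated, t) for t in training)


# Illustrative sequences only.
training_set = ["ARDYW", "GGGGG"]
score = novelty("ARDFW", training_set)
```

Averaging `novelty` over all generated sequences would give a single summary number comparable in spirit to the reported average.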
Future Directions and Improvements
- Focus Areas: Enhancing physical modeling of protein structures for better binding affinity predictions.
- Data Augmentation: Addressing false negatives through more comprehensive data augmentation and by acquiring additional data.
Conclusion
- Impact: Developed an end-to-end pipeline for nanobody generation and optimization that combines accurate binding affinity prediction with the generation of novel sequences.
- Acknowledgments: Recognized the support from ACM and the UCLA Molecular Screening Shared Resource, and highlighted the importance of community and collaborative efforts in advancing research.
References
- Literature and Tools: List of literature and tools referenced during the research and presentation.
- GitHub Repository: Provided link to the project’s GitHub repository for further exploration and replication.
Q&A Highlights
- Training and Testing Splits: Followed pre-defined splits in literature for consistent comparison.
- Metrics of Success: Discussed methodologies for evaluating binding affinity predictions and biological validation.
- Audience Interaction: Addressed questions on methodology intricacies and future research directions.