Lecture on Deep Learning and Nanobody Optimization via Machine Learning 🧬

Introduction and Speaker Background

Speaker: Anirud Venkatramin, Senior at Homestead High School.
Origin of Interest: Began self-studying deep learning applications, particularly in drug discovery, during sophomore year.
Current Work: Collaborating with the UCLA Molecular Screening Shared Resource, focusing on nanobody optimization.

Host Organization: San Francisco Bay Area ACM, established in 1957.
Mission: Promote knowledge of modern computing and create a community for support, networking, and hiring.
Upcoming Events: Workshops and talks on topics such as computer vision, VR training, and securing data.

Definition: Nanobodies are small, single-domain antibody fragments that bind targets much as conventional human antibodies do; they are derived from antibodies produced by camelids (such as camels and llamas) and, in a similar form, by sharks.
Advantages: Rarely cause toxic side effects, can cross the blood-brain barrier, and show promise for treating diseases such as Alzheimer’s, Parkinson’s, and Huntington’s.
Challenge: Identifying optimal nanobodies is traditionally a time-consuming laboratory process.
Solution: A computational prediction and optimization pipeline built on machine learning.

Graph Neural Networks (GNN): Employed for sequence-structure co-design, so that the generated CDR3 sequences remain biologically plausible.
Neural Network Architectures: Included artificial neural networks, convolutional neural networks, and LSTM RNNs.
Stacked Ensemble Model: Combined predictions from multiple neural network models to improve binding affinity predictions.
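
Illustration (stacking): The idea of combining several models' predictions can be sketched as follows. This is a minimal sketch, not the speaker's implementation: the talk does not say which meta-learner was used, so a linear least-squares meta-learner is assumed here, and the function names and all numbers are hypothetical. Each row of base_preds stands for the outputs of the base networks (for example ANN and CNN) on one held-out sequence.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_meta_learner(base_preds, targets):
    """Least-squares weights (one per base model, plus a bias term)."""
    rows = [list(p) + [1.0] for p in base_preds]  # append bias column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def stacked_predict(weights, preds):
    """Combine one sequence's base-model predictions into a final prediction."""
    return sum(w * p for w, p in zip(weights, list(preds) + [1.0]))


# Hypothetical held-out predictions from two base models, with known affinities.
base_preds = [(1.0, 2.0), (2.0, 1.0), (3.0, 5.0), (4.0, 3.0)]
affinities = [1.8, 1.4, 4.5, 3.4]
weights = fit_meta_learner(base_preds, affinities)
print(stacked_predict(weights, (2.5, 2.5)))
```

Fitting the meta-learner on held-out predictions, rather than on the data the base models were trained on, is the usual way to keep a stacked model from simply rewarding the most overfit base model.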

Training Data: 60,000 CDR3 sequences paired with their binding affinities.
One-Hot Encoding: Each CDR3 sequence was represented as a uniform 20x20 one-hot matrix suitable for neural network input.
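
Illustration (one-hot encoding): A 20x20 one-hot matrix could be built along these lines. The summary does not say how the two axes were laid out, so this sketch assumes rows are sequence positions (zero-padded to a length of 20) and columns are the 20 standard amino acids; the alphabet ordering, MAX_LEN, and the function name are assumptions.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (assumed ordering)
MAX_LEN = 20  # assumed padding length, giving a 20x20 matrix


def one_hot_cdr3(seq, max_len=MAX_LEN):
    """Return a max_len x 20 matrix (list of lists) for one CDR3 sequence."""
    if len(seq) > max_len:
        raise ValueError(f"sequence longer than {max_len} residues")
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    matrix = [[0] * len(AMINO_ACIDS) for _ in range(max_len)]
    for pos, aa in enumerate(seq.upper()):
        # A KeyError here means the sequence contains a non-standard residue.
        matrix[pos][index[aa]] = 1
    return matrix  # rows past len(seq) stay all-zero (padding)


encoded = one_hot_cdr3("ARDY")
print(len(encoded), len(encoded[0]))
```

Padding every sequence to the same shape is what lets sequences of different lengths share one fixed-size network input.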

Generative Approach: Generated new CDR3 sequences, filtered them by predicted binding affinity, and iteratively re-parameterized the generative model using the high-affinity sequences.
Ensemble Algorithm (Ensemble Stack): Predicted the binding affinities of the generated sequences; these predictions were used to refine the generative model.
Physical and Binding Affinity Prediction: Used dihedral angles, distances, and orientation features to predict structural and binding properties.
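
Illustration (generate, filter, refine): The loop described above can be sketched in a few lines. Everything here is a stand-in rather than the project's code: propose is a hypothetical point-mutation generator in place of the GNN, toy_affinity is a made-up score in place of the stacked ensemble, and "re-parameterizing the model" is approximated by letting the survivors of each round seed the next one.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def propose(parents, n, rng):
    """Hypothetical generator: point-mutate a randomly chosen parent."""
    children = []
    for _ in range(n):
        seq = list(rng.choice(parents))
        seq[rng.randrange(len(seq))] = rng.choice(AMINO_ACIDS)
        children.append("".join(seq))
    return children


def toy_affinity(seq):
    """Hypothetical scorer: fraction of aromatic residues (not a real model)."""
    return sum(seq.count(aa) for aa in "FWY") / len(seq)


def optimize(seeds, score, rounds=5, batch=50, keep=10, seed=0):
    """Repeatedly generate candidates, score them, and keep the best."""
    rng = random.Random(seed)
    pool = list(seeds)
    for _ in range(rounds):
        candidates = set(pool) | set(propose(pool, batch, rng))
        # Sort by score, breaking ties alphabetically so runs are repeatable.
        ranked = sorted(candidates, key=lambda s: (-score(s), s))
        pool = ranked[:keep]  # survivors seed the next round
    return pool


best = optimize(["ARDGSTLV", "CAKDRGYS"], toy_affinity)
print(best[0], toy_affinity(best[0]))
```

Because each round keeps the previous survivors in the candidate set, the best score in the pool can never decrease from one round to the next.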

Metrics: Pearson correlation of 0.88 and R² of 0.78 between predicted and actual binding affinities, indicating strong predictive accuracy.
Comparison: Outperformed previous non-stacked models in prediction accuracy for binding affinities.
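
Illustration (metrics): The two reported metrics can be computed as below. This sketch assumes R² is the coefficient of determination (1 minus residual sum of squares over total sum of squares); the summary does not state which definition was used, and the sample values are invented.

```python
import math


def pearson_r(y_true, y_pred):
    """Pearson correlation between actual and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Invented affinities, only to show the calls.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.7]
print(pearson_r(actual, predicted), r_squared(actual, predicted))
```

The two metrics answer different questions: Pearson correlation ignores scale and offset, while R² penalizes predictions that are systematically too high or too low. For reference, 0.88 squared is about 0.77, close to the reported R² of 0.78.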

Novelty Measurement: Generated sequences had an average Levenshtein (edit) distance of 13 from the training data, indicating that the model produced novel sequences rather than copies of its training set.
Biological Testing: A predicted high-affinity nanobody was synthesized and its binding was confirmed, although it did not surpass the specially designed controls.
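
Illustration (novelty): Levenshtein distance counts the minimum number of single-residue insertions, deletions, and substitutions needed to turn one sequence into another. The sketch below uses the standard dynamic-programming algorithm; the novelty helper, which measures distance to the nearest training sequence, is an assumption, since the summary does not say how the average of 13 was aggregated.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def novelty(seq, training_set):
    """Assumed novelty score: distance to the closest training sequence."""
    return min(levenshtein(seq, t) for t in training_set)


print(levenshtein("kitten", "sitting"))  # classic example; the distance is 3
```

A score of 0 from novelty means the generated sequence already appears in the training set; larger values mean it sits further from anything the model has seen.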

Focus Areas: Enhancing physical modeling of protein structures for better binding affinity predictions.
Data Augmentation: Addressing false negatives through more comprehensive data augmentation and acquiring additional data.

Impact: Developed a comprehensive pipeline for nanobody generation and optimization, achieving high accuracy and novelty in predictions.
Acknowledgments: Recognized the support from ACM and the UCLA Molecular Screening Shared Resource, and highlighted the importance of community and collaborative efforts in advancing research.

List of literature and tools referenced during the research and presentation.

GitHub Repository: Provided link to the project’s GitHub repository for further exploration and replication.

Training and Testing Splits: Followed pre-defined splits in literature for consistent comparison.
Metrics of Success: Discussed methodologies for evaluating binding affinity predictions and biological validation.
Audience Interaction: Addressed questions on methodology intricacies and future research directions.