Advances in AI and Protein Synthesis

Jun 26, 2024

1. ESM3: Simulating 500 Million Years of Evolution

Overview

  • ESM3, a frontier language model for proteins, is presented as simulating 500 million years of evolution.
  • Works like GPT-style models, but is trained to program with the code of life: protein sequences.
  • Potentially revolutionary for biology, drug discovery, and protein engineering.

Background

  • Proteins are vital for life, acting as molecular engines, sensors, and processing systems.
  • Understanding the protein code could make biology programmable, reducing reliance on trial and error.

Key Concept: Green Fluorescent Protein (GFP)

  • GFP is widely used in research as a visual marker for genetic edits, because cells that express it glow.
  • esmGFP is a new fluorescent protein created by ESM3, with only 58% sequence identity to the closest known fluorescent protein.
  • The authors estimate that distance as equivalent to roughly 500 million years of natural evolution.

Methodology

  • Tokenization of biological properties: sequence, structure, and function.
  • Sequence: Arrangement of amino acids.
  • Structure: the 3D configuration of the folded chain (predicting it was the breakthrough of DeepMind's AlphaFold).
  • Function: What proteins do based on their sequence and structure.
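The tokenization idea above can be sketched with a toy example: each residue in a protein sequence maps to an integer ID, just as words map to tokens in a text model. The vocabulary, the IDs, and the `<mask>` token below are illustrative stand-ins, not ESM3's actual scheme:

```python
# Toy sketch: tokenizing an amino-acid sequence the way a protein language
# model might. Vocabulary and special tokens here are illustrative only.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"           # the 20 standard residues
VOCAB = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
VOCAB["<mask>"] = len(VOCAB)                    # placeholder for unknown positions

def tokenize(sequence):
    """Convert a protein sequence string into a list of token IDs."""
    return [VOCAB[res] for res in sequence]

# Example: a short GFP-like fragment becomes one integer per residue
ids = tokenize("MSKGEELFTG")
print(ids)
```

Structure and function can be discretized into tokens in the same spirit, which is what lets one transformer handle all three modalities.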

Capabilities of ESM3

  • Bridges sequence, structure, and function into a cohesive model.
  • Generates new proteins from prompts, with control over all three modalities.
  • Capable of reasoning and improving generation quality through feedback.
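A hypothetical sketch of what multimodal prompting means: the user supplies partial constraints on sequence, structure, and function, and the model fills in the rest. The prompt format, field names, and the random fill-in below are illustrative stand-ins, not ESM3's real API or generation procedure:

```python
import random

# Hypothetical multimodal prompt: '_' marks sequence positions to generate;
# structure and function are given here as coarse labels for illustration.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

prompt = {
    "sequence": "MSK___LFTG",         # partial sequence with masked positions
    "structure": "beta-barrel",       # desired fold
    "function": "green fluorescence", # desired functional annotation
}

def fill_sequence(prompt, rng=random.Random(0)):
    """Stand-in for generation: replace masked positions with sampled residues."""
    return "".join(rng.choice(AMINO_ACIDS) if c == "_" else c
                   for c in prompt["sequence"])

completed = fill_sequence(prompt)
print(completed)  # same length as the prompt, masks replaced
```

The point of the sketch is the interface, not the sampler: fixed positions are preserved while masked ones are generated under the stated structural and functional constraints.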

Illustration and Experiment

  • ESM3 generated a new fluorescent protein, esmGFP, which glowed but was initially dimmer than natural GFPs.
  • Chain-of-thought generation: multiple rounds of candidates were generated and tested, each round seeding further optimization.
  • The process ultimately produced proteins with brightness comparable to natural GFPs.
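The generate-and-test loop can be sketched as follows. The `mutate` and `brightness` functions are toy stand-ins: in the real experiment ESM3 proposed the candidates and fluorescence was measured in the lab, not computed by a formula.

```python
import random

# Toy generate-and-select loop: propose many variants, score each with a
# stand-in "brightness" function, and keep the best to seed the next round.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
rng = random.Random(0)

def mutate(seq, n_mutations=2):
    """Propose a variant by randomly substituting a few residues."""
    seq = list(seq)
    for _ in range(n_mutations):
        pos = rng.randrange(len(seq))
        seq[pos] = rng.choice(AMINO_ACIDS)
    return "".join(seq)

def brightness(seq):
    """Toy fitness score standing in for measured fluorescence."""
    return sum(1 for res in seq if res in "WYF")  # reward aromatic residues

best = "MSKGEELFTGVVPILVELDG"  # illustrative starting sequence
for generation in range(5):
    candidates = [mutate(best) for _ in range(50)]
    best = max(candidates + [best], key=brightness)
    print(f"round {generation}: best score = {brightness(best)}")
```

Because the incumbent is carried into each round, the best score can only stay flat or improve, which mirrors the "less bright initially, comparable eventually" trajectory described above.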

Model Details

  • Open model: weights and code released on GitHub for research and nonprofit use.
  • Potential for wide-reaching impacts including new medicines and environmental solutions.

2. Etched: Fastest AI Chip

Overview

  • Claims the fastest AI chip for transformer inference, outpacing Nvidia and others.
  • Processes over 500,000 tokens per second.
  • Equivalent to processing the entire Harry Potter series in under 3 seconds.
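A rough arithmetic check of that claim, assuming the seven-book series runs about 1.08 million words at roughly 1.3 tokens per word (both figures are assumptions here, not from the announcement):

```python
# Back-of-envelope check of the throughput claim against the Harry Potter
# comparison. Word count and tokens-per-word ratio are rough assumptions.

tokens_per_second = 500_000
series_words = 1_080_000
tokens_per_word = 1.3

total_tokens = series_words * tokens_per_word
seconds = total_tokens / tokens_per_second
print(f"{total_tokens:.0f} tokens -> {seconds:.1f} s")  # roughly 2.8 s
```

Under those assumptions the series is about 1.4 million tokens, which at 500,000 tokens/s does come in under 3 seconds, so the two figures are at least mutually consistent.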

Key Figures

  • Notable backers: Peter Thiel, Balaji Srinivasan, Stanley Druckenmiller, and Bryan Johnson.

Technical Details

  • An ASIC for Transformers: specialized silicon that only runs inference on Transformer models.
  • Claimed to outperform Nvidia's GPUs (e.g., the H100) by a significant margin.
  • One server is claimed to match the performance of 160 H100s.
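Dividing the claimed throughput by the claimed GPU equivalence gives the implied per-H100 rate. Both inputs are the claims above, not independent measurements:

```python
# Implied per-GPU throughput if one chip matches 160 H100s while sustaining
# 500,000 tokens/s (both figures taken from the claims, not measured).

chip_tokens_per_second = 500_000
equivalent_h100s = 160

per_h100 = chip_tokens_per_second / equivalent_h100s
print(f"{per_h100:.0f} tokens/s per H100")  # 3125 tokens/s per H100
```

That implied ~3,000 tokens/s per H100 is the baseline the "significant margin" is measured against.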

Implications

  • Faster, more efficient processing could revolutionize AI applications and products.
  • Competing technologies emerging, promising further advancements and competition with Nvidia.