Lecture Notes on RDKit

Jul 30, 2024

Introduction to RDKit

Overview of RDKit

  • Open-source cheminformatics library, used here from Python, for working with chemical data.
  • Installation in Google Colab: RDKit is not pre-installed, so it must be installed with dedicated commands before use.
    • Follow the installation guide linked in the video description.

Getting Started

  • Make a copy of the provided notebook for your own use.
  • Rename the notebook (e.g., "rdkit intro").
  • Run the installation cell; it takes about 2-3 minutes.
    • Wait for the installation to finish before running the remaining cells.
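A quick way to confirm the installation worked. This is a minimal sketch that assumes RDKit was installed from the PyPI package named rdkit (for example with "!pip install rdkit" in a Colab cell). The guide linked in the video may use different commands. It prints the version and parses a trivial molecule.

```python
# Assumption: RDKit was installed from PyPI, e.g. by running
#   !pip install rdkit
# in a Colab cell (the "!" prefix runs a shell command from the notebook).
import rdkit
from rdkit import Chem

# Print the installed version to confirm the import succeeded.
print("RDKit version:", rdkit.__version__)

# Smoke test: parse methane; MolFromSmiles returns None on failure.
smoke_test = Chem.MolFromSmiles("C")
print("Smoke test passed:", smoke_test is not None)
```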

Key Features of RDKit

  • RDKit supports many tasks on molecular data; the ones covered in this lecture are:

SMILES Conversion

  • SMILES (Simplified Molecular Input Line Entry System) strings are a text-based representation of molecules; RDKit converts them into molecule objects and back.
  • Example SMILES:
    • Methane: C
    • Ethane: CC
    • Propane: CCC
    • n-Butane (normal butane): CCCC
  • Use cheminfo.org to draw molecules and generate SMILES strings interactively.
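The alkane examples above can be parsed directly. This sketch (the dictionary layout is just for illustration) shows that hydrogens are implicit in SMILES: the parsed molecule contains only the carbons until explicit hydrogens are added with Chem.AddHs.

```python
from rdkit import Chem

# The alkane examples from the notes, written as SMILES strings.
alkanes = {
    "methane": "C",
    "ethane": "CC",
    "propane": "CCC",
    "n-butane": "CCCC",
}

heavy_atoms = {}
total_atoms = {}
for name, smiles in alkanes.items():
    mol = Chem.MolFromSmiles(smiles)  # returns None for an invalid SMILES
    # Hydrogens are implicit in SMILES, so GetNumAtoms() counts only the carbons here.
    heavy_atoms[name] = mol.GetNumAtoms()
    # AddHs returns a copy with explicit hydrogens, so this counts C and H together.
    total_atoms[name] = Chem.AddHs(mol).GetNumAtoms()
    print(f"{name:<9} {smiles:<5} C atoms: {heavy_atoms[name]}  all atoms: {total_atoms[name]}")
```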

Molecular Representation

  • Convert a SMILES string to a molecule object with Chem.MolFromSmiles.
  • Get a SMILES string back from a molecule object with Chem.MolToSmiles.
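A round trip through both functions is sketched below. The variant spellings of n-butane are illustrative choices. Chem.MolToSmiles writes canonical SMILES by default, so different spellings of the same molecule collapse to one string. An unparseable SMILES gives None instead of raising an exception.

```python
from rdkit import Chem

# The same n-butane written three different ways.
variants = ["CCCC", "C(C)CC", "C(CC)C"]

# SMILES -> molecule -> canonical SMILES; collect the results in a set.
canonical = {Chem.MolToSmiles(Chem.MolFromSmiles(s)) for s in variants}
print(canonical)  # {'CCCC'}

# Invalid input: ring bond "1" is opened but never closed.
bad = Chem.MolFromSmiles("C1CC")
print(bad is None)  # True (RDKit also logs a parse error)
```

Checking for None after every MolFromSmiles call is a good habit when reading SMILES from files or user input.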

Calculating Properties

  • Compute molecular properties, such as the molecular weight of n-butane.
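One way to compute the molecular weight of n-butane uses the Descriptors module. This is a sketch; the lecture may have used a different call. Descriptors.MolWt uses average atomic weights and gives about 58.12 g/mol for C4H10. Descriptors.ExactMolWt gives the monoisotopic mass instead.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors, rdMolDescriptors

butane = Chem.MolFromSmiles("CCCC")

# Average molecular weight from standard atomic weights (g/mol).
mw = Descriptors.MolWt(butane)
# Monoisotopic mass, using the most abundant isotope of each element.
exact = Descriptors.ExactMolWt(butane)
# Molecular formula in Hill order (implicit hydrogens are included).
formula = rdMolDescriptors.CalcMolFormula(butane)

print(f"{formula}: MolWt = {mw:.3f}, ExactMolWt = {exact:.4f}")
```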

Substructure Searches

  • RDKit can search molecules for specific structural patterns (substructures):
    • Example set of amino acids:
      • Glycine, phenylalanine, histidine, and cysteine, each created from its SMILES string.
  • Displaying molecules:
    • Create a list of molecule objects and display them as structure drawings.
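A sketch of building and displaying the amino acid set follows. The SMILES strings are this example's own choice, not taken from the lecture. They are neutral (non-zwitterionic) forms, and stereochemistry is omitted for brevity. Draw.MolsToGridImage arranges the structures in a grid with name labels.

```python
from rdkit import Chem
from rdkit.Chem import Draw

# Neutral SMILES for the four amino acids (stereochemistry omitted).
amino_acids = {
    "glycine": "NCC(=O)O",
    "phenylalanine": "NC(Cc1ccccc1)C(=O)O",
    "histidine": "NC(Cc1c[nH]cn1)C(=O)O",
    "cysteine": "NC(CS)C(=O)O",
}

names = list(amino_acids)
mols = [Chem.MolFromSmiles(smi) for smi in amino_acids.values()]

# Draw all four structures in one row, each labelled with its name.
img = Draw.MolsToGridImage(mols, molsPerRow=4, subImgSize=(200, 200), legends=names)

# In a notebook, leaving the image as the last expression of a cell displays it.
img
```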

Substructure Search Examples

  • Search for specific substructures, such as sulfur atoms and carboxyl groups:
    • Check which molecules contain a given element or functional group.
  • Substructure patterns:
    • Use SMILES notation to define the search pattern.
    • Example: use the SMILES string for sulfur (S) as the pattern to find the molecules that contain sulfur.
  • More general substructure searches with SMARTS:
    • SMARTS is a pattern language that extends SMILES with query features such as ring membership, aromaticity, and lists of allowed atoms.
    • Example: detecting whether a molecule contains a ring at all, a generic query that a plain SMILES pattern cannot express.
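These searches can be sketched on the same amino acids. The SMILES strings and the helper function matches are illustrative choices, not the lecture's code. Sulfur is searched with a plain SMILES pattern. The carboxylic acid group and "any ring atom" use SMARTS, via Chem.MolFromSmarts. Each molecule is then tested with HasSubstructMatch.

```python
from rdkit import Chem

# Same illustrative amino acid SMILES as before (neutral forms, no stereochemistry).
amino_acids = {
    "glycine": "NCC(=O)O",
    "phenylalanine": "NC(Cc1ccccc1)C(=O)O",
    "histidine": "NC(Cc1c[nH]cn1)C(=O)O",
    "cysteine": "NC(CS)C(=O)O",
}
mols = {name: Chem.MolFromSmiles(smi) for name, smi in amino_acids.items()}

sulfur = Chem.MolFromSmiles("S")              # plain SMILES used as a pattern
carboxyl = Chem.MolFromSmarts("C(=O)[OH]")    # SMARTS: C double-bonded to O and bonded to an O-H
ring_atom = Chem.MolFromSmarts("[R]")         # SMARTS: any atom that belongs to a ring

def matches(pattern):
    """Hypothetical helper: names of the molecules containing the pattern."""
    return [name for name, mol in mols.items() if mol.HasSubstructMatch(pattern)]

print("sulfur:  ", matches(sulfur))     # ['cysteine']
print("carboxyl:", matches(carboxyl))   # all four amino acids
print("ring:    ", matches(ring_atom))  # ['phenylalanine', 'histidine']
```

The ring query is where SMARTS earns its keep. The single pattern [R] matches the benzene ring of phenylalanine and the imidazole ring of histidine alike, with no need to spell out each ring as its own SMILES.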

Conclusion

  • RDKit is a powerful tool for chemists and researchers dealing with molecular data.
  • To go further, look for resources on getting started with RDKit, such as the "Getting Started with the RDKit in Python" guide in the official documentation.