Insights into the Human Genome Project

Sep 9, 2024

Lecture Notes on the Human Genome Project

Overview of the Human Genome Project

  • Completion Date: April 2003
  • Duration: 13 years
  • Objective: To sequence and map all genes of Homo sapiens
  • Initiated in 1990 under James Watson at the U.S. National Institutes of Health
    • Watson resigned in 1992 due to disagreements over patenting genes
    • Francis Collins took over in 1993
  • Milestones:
    • Working draft released in 2000
    • Complete genome released in 2003
    • Last chromosome sequence published in 2006 in Nature

Human Genome Sequencing

  • Representative Sequence: Composite from several anonymous donors
  • Unfinished Regions: Approximately 1% of the genome remained unfinished at completion, mainly highly repetitive regions at centromeres and telomeres

Sequencing Methods

  • Hierarchical Shotgun Sequencing

    • Used by publicly funded HGP
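    • DNA cut into large pieces and inserted into bacterial artificial chromosome (BAC) vectors
    • A minimal set of overlapping, mapped BAC clones (the "Golden Tiling Path") is selected, and each clone is then shotgun sequenced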
    • Advantage: Reduced error in assembly
    • Disadvantage: Time-consuming and expensive
  • Whole Genome Shotgun Sequencing

    • Used by Celera Genomics
    • Random shearing of the whole genome into small pieces, which are sequenced and then assembled computationally from their overlaps
    • Advantage: Faster and cheaper
    • Disadvantage: More prone to assembly errors
  • Hybrid Approach

    • Combines the strengths of both methods
    • Allowed both the HGP and Celera to reach their goals more efficiently
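
Both methods ultimately rebuild a long sequence from short overlapping reads. The sketch below illustrates that idea with a toy greedy overlap assembler. It is not the assembler used by the HGP or Celera; the function names, the 4-base minimum overlap, and the 20-base "genome" are assumptions chosen for illustration, and reads fully contained in other reads are not handled.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Returns 0 when the best overlap is shorter than `min_len`.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 4) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest overlap.

    Returns the remaining contigs; a single contig means all reads
    were joined into one sequence.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            break  # nothing overlaps any more: gaps stay as separate contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Toy "genome" (hypothetical sequence) cut into 8-base reads that overlap by 4.
genome = "ATGCGTACCTGAAGTCTTAG"
reads = [genome[i:i + 8] for i in range(0, len(genome) - 7, 4)]
reads.reverse()  # read order carries no positional information
print(greedy_assemble(reads))  # ['ATGCGTACCTGAAGTCTTAG']
```

The toy sequence has no repeated 4-base stretch, so every detected overlap is a true one. Repetitive DNA breaks that assumption, which is why whole genome shotgun assembly is more error-prone and why restricting assembly to one mapped BAC clone at a time reduces errors.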

Details of the Human Genome

  • Total Size: 3.2 gigabases

  • Coding vs. Non-Coding DNA:

    • Non-coding: The large majority of the genome; includes introns, intergenic regions, and repetitive sequences
    • Coding: About 1.25% of the genome codes for proteins
  • Genes: Around 25,000 genes

    • Comparison: Nematode worm (about 18,000 genes); mouse (a similar number of genes to humans)
    • Biological complexity is not directly related to gene count or genome size
  • Repetitive Sequences (share of the genome): 45% transposons, 3% microsatellite repeats, 5% large segmental duplications
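
To make the percentages concrete, the short calculation below converts them into base counts using the figures from these notes (3.2 gigabases, about 25,000 genes). The names `FRACTIONS`, `bases_for`, and `coding_bases_per_gene` are illustrative. The categories are not summed, because the notes do not say whether they overlap.

```python
GENOME_SIZE_BP = 3_200_000_000  # 3.2 gigabases, as quoted in the notes
GENE_COUNT = 25_000             # approximate gene count from the notes

# Fractions of the genome as quoted in the notes.
FRACTIONS = {
    "protein-coding": 0.0125,
    "transposons": 0.45,
    "microsatellite repeats": 0.03,
    "large segmental duplications": 0.05,
}


def bases_for(fraction: float, genome_bp: int = GENOME_SIZE_BP) -> int:
    """Number of bases that a given fraction of the genome represents."""
    return round(fraction * genome_bp)


def coding_bases_per_gene() -> float:
    """Average protein-coding bases per gene under the figures above."""
    return bases_for(FRACTIONS["protein-coding"]) / GENE_COUNT


for name, fraction in FRACTIONS.items():
    print(f"{name}: {bases_for(fraction) / 1e6:,.0f} Mb")
print(f"average coding sequence per gene: {coding_bases_per_gene():,.0f} bp")
```

Under these figures, protein-coding sequence amounts to about 40 Mb, or roughly 1,600 coding bases per gene on average, while transposons alone account for about 1,440 Mb.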

Biological Implications

  • Gene Function and Complexity
    • Alternative splicing and post-translational modifications contribute to diversity
    • Protein-protein interactions are crucial for cellular processes
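
A small sketch shows why alternative splicing lets a modest gene count yield many more proteins. It assumes a simplified model in which each optional (cassette) exon is independently either kept or skipped, so a gene with n optional exons has up to 2^n transcript isoforms. The exon names and the `isoforms` function are hypothetical, and real splicing is more constrained than this.

```python
from itertools import product


def isoforms(exons: list[tuple[str, bool]]) -> list[tuple[str, ...]]:
    """Enumerate transcripts for a gene given (exon_name, is_constitutive) pairs.

    Constitutive exons are always kept; every other exon is either
    included or skipped, independently of the rest (simplifying assumption).
    """
    optional = [name for name, constitutive in exons if not constitutive]
    transcripts = []
    for choice in product([True, False], repeat=len(optional)):
        included = dict(zip(optional, choice))
        transcripts.append(
            tuple(name for name, constitutive in exons if constitutive or included[name])
        )
    return transcripts


# Hypothetical gene: E1 and E4 are always present, E2 and E3 are optional.
gene = [("E1", True), ("E2", False), ("E3", False), ("E4", True)]
for transcript in isoforms(gene):
    print("-".join(transcript))
```

With two optional exons the hypothetical gene yields four isoforms, and with ten it would yield up to 1,024, before post-translational modifications add further variety.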

Challenges and Future Directions

  • Unsequenced Areas: The remaining 1%, largely heterochromatin
  • Regulatory Signals: The roles of regulatory sequences and epigenetic modifications are still not fully understood
  • Single Nucleotide Polymorphisms (SNPs): Over 1.4 million identified
  • Gene Function Exploration: Ongoing efforts to understand gene products
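
As a rough illustration of the SNP figure, the sketch below reports single-base differences between two aligned sequences and estimates average SNP spacing from the numbers in these notes. The function names and example sequences are assumptions for illustration. A difference between two sequences is only a candidate SNP, since a true SNP is defined by variation across a population.

```python
def snp_positions(reference: str, sample: str) -> list[int]:
    """0-based positions where two aligned, equal-length sequences differ."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, (r, s) in enumerate(zip(reference, sample)) if r != s]


def mean_spacing_bp(genome_bp: int, snp_count: int) -> float:
    """Average number of bases between SNPs if they were spread evenly."""
    return genome_bp / snp_count


print(snp_positions("ACGTACGT", "ACGAACGT"))              # [3]
print(round(mean_spacing_bp(3_200_000_000, 1_400_000)))   # 2286
```

With 1.4 million SNPs across 3.2 gigabases, that works out to about one identified SNP every 2.3 kilobases on average, assuming an even spread that real genomes do not have.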

Impact on Medicine

  • Genetic Research: Facilitates discovery of the genetic components of complex disorders such as diabetes, asthma, and cancer
  • Gene Expression Techniques: Enable investigation of gene expression related to disease and drug responses

These notes provide a comprehensive overview of the Human Genome Project, its methods, challenges, and implications for biology and medicine.