Understanding RPKM, FPKM, and TPM in RNA-seq

Aug 20, 2024

StatQuest: RPKM vs. FPKM vs. TPM

Welcome to StatQuest—presented by the Genetics Department at UNC Chapel Hill!

Overview of Topics

  • Focus: High-throughput sequencing of RNA
  • Comparison of three metrics: RPKM, FPKM, and TPM

Definitions

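  1. RPKM: Reads Per Kilobase Million (reads per kilobase of transcript, per million mapped reads)
    • Normalizes read counts for sequencing depth and gene length.
  2. FPKM: Fragments Per Kilobase Million
    • The same calculation as RPKM, but counts fragments instead of reads, for paired-end RNA-seq.
  3. TPM: Transcripts Per Million
    • A newer metric that applies the same two normalizations in the opposite order, so every sample's values add up to the same total.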

Why Normalize?

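  • Sequencing Depth: A sample sequenced more deeply yields more reads for every gene, so raw counts are not directly comparable between samples.
  • Gene Length: Longer genes tend to accumulate more reads than shorter genes expressed at the same level.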

Example Dataset

  • Consider an RNA-seq dataset with 3 replicates (REP1, REP2, REP3) and 4 genes (A, B, C, D).
  • Notable observation: REP3 has significantly more reads than the others.
  • Gene B is twice as long as Gene A, so it tends to collect more reads at the same expression level.

Normalizing with RPKM

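  1. Step 1: Normalize for Read Depth
    • Sum the reads in each replicate and divide the total by 1,000,000 to get a per-million scaling factor (the toy example divides by 10 to keep the numbers readable).
    • Divide each read count by its replicate's scaling factor to get reads per million (RPM).
  2. Step 2: Normalize for Gene Length
    • Divide each RPM value by the gene's length in kilobases to get RPKM.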
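
The two steps above can be sketched in Python. The counts and gene lengths below are made-up toy values chosen for illustration (they are not taken from these notes), and the function name `rpkm` and its `per` parameter are hypothetical.

```python
# Illustrative toy data (assumed): read counts per gene for three replicates,
# and gene lengths in kilobases. Gene B is twice as long as Gene A.
counts = {
    "A": [10, 12, 30],
    "B": [20, 25, 60],
    "C": [5, 8, 15],
    "D": [0, 0, 1],
}
lengths_kb = {"A": 2.0, "B": 4.0, "C": 1.0, "D": 10.0}


def rpkm(counts, lengths_kb, per=1_000_000):
    """RPKM-style normalization: sequencing depth first, then gene length."""
    n_reps = len(next(iter(counts.values())))
    # Step 1: total reads per replicate, turned into a scaling factor.
    totals = [sum(counts[g][i] for g in counts) for i in range(n_reps)]
    factors = [t / per for t in totals]
    result = {}
    for gene, row in counts.items():
        # Reads per `per` reads (RPM when per is one million).
        rpm = [c / f for c, f in zip(row, factors)]
        # Step 2: divide by gene length in kilobases.
        result[gene] = [x / lengths_kb[gene] for x in rpm]
    return result


# per=10 mirrors the toy scaling in the walkthrough; keep the default in practice.
toy_rpkm = rpkm(counts, lengths_kb, per=10)
```

With these toy numbers, genes A and B end up with the same value in REP1, because B's extra reads are explained by its extra length.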

RPKM Summary

  • RPKM values are adjusted for sequencing depth and gene length.

RPKM vs. FPKM

  • RPKM is for single-end RNA-seq, while FPKM is for paired-end.
  • Why it matters: In paired-end data, two reads can come from the same fragment. FPKM counts the fragment once, so it is not double-counted.
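
A minimal sketch of the difference between counting reads and counting fragments. The alignment records, fragment IDs, and function names are all hypothetical; real aligner output looks different.

```python
from collections import Counter

# Hypothetical records: (fragment_id, gene). In paired-end data, both mates
# of a pair share one fragment ID; sometimes only one mate maps.
alignments = [
    ("frag1", "A"), ("frag1", "A"),  # both mates mapped to gene A
    ("frag2", "A"),                  # only one mate mapped
    ("frag3", "B"), ("frag3", "B"),  # both mates mapped to gene B
]


def count_reads(alignments):
    """Count every mapped read, as a reads-based metric would."""
    return Counter(gene for _, gene in alignments)


def count_fragments(alignments):
    """Count each fragment once per gene, as a fragments-based metric would."""
    unique = set(alignments)  # collapses the two mates of one fragment
    return Counter(gene for _, gene in unique)


reads = count_reads(alignments)
fragments = count_fragments(alignments)
```

Here gene A has three mapped reads but only two fragments. The fragment counts are what would feed into the FPKM calculation.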

Normalizing with TPM

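  1. Step 1: Normalize for Gene Length
    • Divide each read count by the gene's length in kilobases to get reads per kilobase (RPK).
  2. Step 2: Normalize for Sequencing Depth
    • Sum the RPK values in each replicate and divide the total by 1,000,000 to get a per-million scaling factor.
    • Final step: Divide each RPK value (not the raw read count) by its replicate's scaling factor to get TPM.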
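
The TPM steps above can be sketched the same way. Again, the counts and lengths are made-up toy values for illustration, and the function name `tpm` and its `per` parameter are hypothetical.

```python
# Illustrative toy data (assumed): read counts per gene for three replicates,
# and gene lengths in kilobases.
counts = {
    "A": [10, 12, 30],
    "B": [20, 25, 60],
    "C": [5, 8, 15],
    "D": [0, 0, 1],
}
lengths_kb = {"A": 2.0, "B": 4.0, "C": 1.0, "D": 10.0}


def tpm(counts, lengths_kb, per=1_000_000):
    """TPM-style normalization: gene length first, then sequencing depth."""
    n_reps = len(next(iter(counts.values())))
    # Step 1: reads per kilobase (RPK).
    rpk = {g: [c / lengths_kb[g] for c in row] for g, row in counts.items()}
    # Step 2: sum the RPK values per replicate and build scaling factors.
    totals = [sum(rpk[g][i] for g in rpk) for i in range(n_reps)]
    factors = [t / per for t in totals]
    # Final step: divide each RPK value by its replicate's scaling factor.
    return {g: [x / f for x, f in zip(row, factors)] for g, row in rpk.items()}


# per=10 keeps the toy numbers small; keep the default in practice.
toy_tpm = tpm(counts, lengths_kb, per=10)
column_sums = [sum(toy_tpm[g][i] for g in toy_tpm) for i in range(3)]
```

Every entry of `column_sums` equals the value of `per` (up to floating-point rounding), which is the property that sets TPM apart from RPKM.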

Key Differences in Results

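  • Both RPKM and TPM correct for sequencing depth and gene length, but the normalized values sum differently within each replicate.
  • RPKM: The total differs from replicate to replicate.
  • TPM: The total is the same in every replicate (1,000,000 when scaling per million).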
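
The reason for the different totals can be written out as a short derivation. The notation here is an assumption introduced for illustration: c_{ij} is the read count for gene i in replicate j, L_i is the length of gene i in kilobases, and N_j is the total read count of replicate j.

```latex
% Assumed notation: c_{ij} = reads for gene i in replicate j,
% L_i = length of gene i in kilobases, N_j = \sum_i c_{ij}.
\begin{align*}
\mathrm{RPKM}_{ij} &= \frac{c_{ij}}{L_i} \cdot \frac{10^6}{N_j},
&
\sum_i \mathrm{RPKM}_{ij} &= \frac{10^6}{N_j} \sum_i \frac{c_{ij}}{L_i} \\
\mathrm{TPM}_{ij} &= \frac{c_{ij}/L_i}{\sum_k c_{kj}/L_k} \cdot 10^6,
&
\sum_i \mathrm{TPM}_{ij} &= 10^6
\end{align*}
```

The RPKM total still depends on the replicate through the ratio of its summed length-normalized counts to N_j, while the TPM total is fixed at one million by construction.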

Visualizing Differences with Pies

  • TPM: Standardized "pie" sizes allow easy comparison of proportions among genes.
    • Example: Gene A proportions in different replicates can easily be compared.
  • RPKM: Each "pie" is a different size, so the same value can represent a different share of the reads in each replicate, which complicates proportion comparison.

Conclusion

  • Why Use TPM?
    • TPM values show directly what proportion of a sample's length-normalized reads mapped to each gene.
    • Because every sample sums to the same total, those proportions can be compared across samples.

Thank you for tuning in to StatQuest! Stay tuned for the next episode.