Understanding RPKM, FPKM, and TPM in RNA-seq

Aug 20, 2024

StatQuest: RPKM vs. FPKM vs. TPM

Welcome to StatQuest—presented by the Genetics Department at UNC Chapel Hill!

Overview of Topics

Focus: High-throughput sequencing of RNA
Comparison of three metrics: RPKM, FPKM, and TPM

Definitions

RPKM: Reads Per Kilobase Million (reads per kilobase of transcript, per million mapped reads)
- Normalizes for sequencing depth and gene length.
FPKM: Fragments Per Kilobase Million
- The same calculation as RPKM, but counting fragments instead of reads, for paired-end RNA-seq.
TPM: Transcripts Per Million
- A newer metric that addresses some limitations of RPKM and FPKM.

Why Normalize?

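Sequencing Depth: A sample sequenced more deeply yields more reads for every gene, so raw counts are not comparable across samples until depth is accounted for.
Gene Length: Longer genes tend to accumulate more reads than shorter genes at the same expression level.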

Example Dataset

Consider an RNA-seq dataset with 3 replicates (REP1, REP2, REP3) and 4 genes (A, B, C, D).
Notable observation: REP3 has significantly more reads than the others.
Gene B is twice as long as Gene A, so it tends to collect more reads even at the same expression level.

Normalizing with RPKM

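Step 1: Normalize for Read Depth
- Sum the reads in each replicate and divide that total by 10 (by 1,000,000 in practice) to get a per-replicate scaling factor.
- Divide each gene's read count by its replicate's scaling factor. With the factor of 1,000,000 this gives reads per million (RPM).
Step 2: Normalize for Gene Length
- Divide each depth-normalized value by the gene's length in kilobases to get RPKM.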
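The two RPKM steps can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the read counts and gene lengths are made-up values chosen only to match the description above (REP3 has more reads, Gene B is twice as long as Gene A), and the function name `rpkm` and its `per` parameter are assumptions introduced here.

```python
# Hypothetical toy data: raw read counts per gene in each replicate.
counts = {
    "REP1": {"A": 10, "B": 20, "C": 5, "D": 0},
    "REP2": {"A": 12, "B": 25, "C": 8, "D": 0},
    "REP3": {"A": 30, "B": 60, "C": 15, "D": 1},
}
# Hypothetical gene lengths in kilobases (B is twice as long as A).
lengths_kb = {"A": 2.0, "B": 4.0, "C": 1.0, "D": 10.0}


def rpkm(sample_counts, lengths_kb, per=1_000_000):
    """Normalize for depth first, then for gene length."""
    # Step 1: per-replicate scaling factor = total reads / `per`.
    scale = sum(sample_counts.values()) / per
    # Step 2: divide the depth-normalized value by gene length in kb.
    return {
        gene: (count / scale) / lengths_kb[gene]
        for gene, count in sample_counts.items()
    }


# per=10 mirrors the toy scaling factor; use the default 1_000_000 on real data.
rpkm_values = {rep: rpkm(c, lengths_kb, per=10) for rep, c in counts.items()}

for rep, values in rpkm_values.items():
    print(rep, {gene: round(v, 3) for gene, v in values.items()})
```

With these toy numbers, genes A, B, and C in REP1 all come out at about 1.43, even though their raw counts are 10, 20, and 5, because the raw differences were entirely explained by gene length.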

RPKM Summary

RPKM values are adjusted for sequencing depth and gene length.

RPKM vs. FPKM

RPKM is for single-end RNA-seq, while FPKM is for paired-end.
Importance: In paired-end sequencing, two reads can come from the same fragment. FPKM counts the fragment once, so those two reads are not double-counted.
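The difference between counting reads and counting fragments can be shown with a small sketch. The alignment records below are hypothetical (the fragment IDs, the tuple layout, and both function names are assumptions for illustration); real pipelines get this information from an aligner's output.

```python
from collections import defaultdict

# Hypothetical paired-end alignments: (fragment_id, mate_number, gene).
alignments = [
    ("frag1", 1, "A"), ("frag1", 2, "A"),  # both mates of frag1 mapped
    ("frag2", 1, "A"),                     # only one mate of frag2 mapped
    ("frag3", 1, "B"), ("frag3", 2, "B"),  # both mates of frag3 mapped
]


def count_reads(alignments):
    """Count every mapped read, so a fragment with two mates counts twice."""
    totals = defaultdict(int)
    for _frag_id, _mate, gene in alignments:
        totals[gene] += 1
    return dict(totals)


def count_fragments(alignments):
    """Count each fragment once, however many of its mates mapped."""
    fragments = defaultdict(set)
    for frag_id, _mate, gene in alignments:
        fragments[gene].add(frag_id)
    return {gene: len(ids) for gene, ids in fragments.items()}


read_counts = count_reads(alignments)
fragment_counts = count_fragments(alignments)
print("reads:    ", read_counts)
print("fragments:", fragment_counts)
```

Here Gene A has 3 mapped reads but only 2 fragments. FPKM would use the fragment counts as input to the same depth and length normalization used for RPKM.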

Normalizing with TPM

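Step 1: Normalize for Gene Length
- Divide each gene's read count by its length in kilobases to get reads per kilobase (RPK).
Step 2: Normalize for Sequencing Depth
- Sum the RPK values in each replicate and divide that total by 1,000,000 to get a per-replicate scaling factor.
- Final step: Divide each RPK value (not the raw read count) by its replicate's scaling factor to get TPM.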
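The TPM steps are the RPKM steps in the opposite order, which a short Python sketch makes concrete. As before, the counts and lengths are made-up illustrative values, and the function name `tpm` and its `per` parameter are assumptions introduced here.

```python
# Hypothetical toy data: raw read counts per gene in each replicate.
counts = {
    "REP1": {"A": 10, "B": 20, "C": 5, "D": 0},
    "REP2": {"A": 12, "B": 25, "C": 8, "D": 0},
    "REP3": {"A": 30, "B": 60, "C": 15, "D": 1},
}
# Hypothetical gene lengths in kilobases (B is twice as long as A).
lengths_kb = {"A": 2.0, "B": 4.0, "C": 1.0, "D": 10.0}


def tpm(sample_counts, lengths_kb, per=1_000_000):
    """Normalize for gene length first, then for depth."""
    # Step 1: reads per kilobase (RPK).
    rpk = {gene: count / lengths_kb[gene] for gene, count in sample_counts.items()}
    # Step 2: scaling factor = sum of RPK values / `per`.
    scale = sum(rpk.values()) / per
    # Final step: divide the RPK values (not the raw counts) by the factor.
    return {gene: value / scale for gene, value in rpk.items()}


# per=10 keeps the toy numbers small; use the default 1_000_000 on real data.
tpm_values = {rep: tpm(c, lengths_kb, per=10) for rep, c in counts.items()}

for rep, values in tpm_values.items():
    print(rep, {gene: round(v, 3) for gene, v in values.items()})
```

Because the depth scaling is applied last, the values in each replicate always add up to `per` (10 in this toy setup, one million in practice).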

Key Differences in Results

Both RPKM and TPM correct for sequencing depth and gene length, but the sums of the normalized values differ.
RPKM: The total differs from replicate to replicate.
TPM: The total is the same in every replicate.
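The difference in totals is easy to check numerically. The sketch below recomputes both metrics on made-up toy data (counts, lengths, and the helper names `rpkm` and `tpm` are all assumptions for illustration) and sums each replicate's column.

```python
# Hypothetical toy data: raw read counts and gene lengths in kilobases.
counts = {
    "REP1": {"A": 10, "B": 20, "C": 5, "D": 0},
    "REP2": {"A": 12, "B": 25, "C": 8, "D": 0},
    "REP3": {"A": 30, "B": 60, "C": 15, "D": 1},
}
lengths_kb = {"A": 2.0, "B": 4.0, "C": 1.0, "D": 10.0}
PER = 10  # toy scaling unit; 1_000_000 on real data


def rpkm(sample_counts):
    # Depth first, then length.
    scale = sum(sample_counts.values()) / PER
    return {g: (c / scale) / lengths_kb[g] for g, c in sample_counts.items()}


def tpm(sample_counts):
    # Length first, then depth.
    rpk = {g: c / lengths_kb[g] for g, c in sample_counts.items()}
    scale = sum(rpk.values()) / PER
    return {g: v / scale for g, v in rpk.items()}


rpkm_totals = {rep: sum(rpkm(c).values()) for rep, c in counts.items()}
tpm_totals = {rep: sum(tpm(c).values()) for rep, c in counts.items()}

for rep in counts:
    print(f"{rep}: RPKM total = {rpkm_totals[rep]:.2f}, TPM total = {tpm_totals[rep]:.2f}")
```

With these toy values the RPKM totals are roughly 4.29, 4.50, and 4.25, while the TPM total is 10 in every replicate.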

Visualizing Differences with Pies

TPM: Standardized "pie" sizes allow easy comparison of proportions among genes.
- Example: Gene A proportions in different replicates can easily be compared.
RPKM: Each "pie" varies in size, complicating proportion comparison.
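The "pie" picture can be expressed as each gene's share of its replicate's total. In the sketch below (toy data and helper names are assumptions for illustration), a TPM value divided by the fixed scaling unit is already the gene's share, whereas an RPKM value has to be divided by that replicate's own total before shares can be compared.

```python
# Hypothetical toy data: raw read counts and gene lengths in kilobases.
counts = {
    "REP1": {"A": 10, "B": 20, "C": 5, "D": 0},
    "REP2": {"A": 12, "B": 25, "C": 8, "D": 0},
    "REP3": {"A": 30, "B": 60, "C": 15, "D": 1},
}
lengths_kb = {"A": 2.0, "B": 4.0, "C": 1.0, "D": 10.0}
PER = 10  # toy scaling unit; 1_000_000 on real data


def rpkm(sample_counts):
    scale = sum(sample_counts.values()) / PER
    return {g: (c / scale) / lengths_kb[g] for g, c in sample_counts.items()}


def tpm(sample_counts):
    rpk = {g: c / lengths_kb[g] for g, c in sample_counts.items()}
    scale = sum(rpk.values()) / PER
    return {g: v / scale for g, v in rpk.items()}


share_from_tpm = {}
share_from_rpkm = {}
for rep, sample in counts.items():
    # TPM pies all have size PER, so the share is just value / PER.
    share_from_tpm[rep] = tpm(sample)["A"] / PER
    # RPKM pies differ in size, so the share needs the replicate's own total.
    r = rpkm(sample)
    share_from_rpkm[rep] = r["A"] / sum(r.values())

for rep in counts:
    print(f"{rep}: Gene A share = {share_from_tpm[rep]:.3f} (TPM), "
          f"{share_from_rpkm[rep]:.3f} (RPKM / replicate total)")
```

The two shares agree, which shows that rescaling each RPKM pie to a common size gives TPM. TPM simply does that rescaling for you.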

Conclusion

Why Use TPM?
- Clearer insights into relative gene expression and proportions.
- Better suited to RNA-seq analyses that compare a gene's share of expression across samples.

Thank you for tuning in to StatQuest! Stay tuned for the next episode.
