StatQuest: RPKM vs. FPKM vs. TPM
Welcome to StatQuest, presented by the Genetics Department at UNC Chapel Hill!
Overview of Topics
- Focus: High-throughput sequencing of RNA
- Comparison of three metrics: RPKM, FPKM, and TPM
Definitions
- RPKM: Reads Per Kilobase Million (reads per kilobase of transcript, per million mapped reads). Normalizes read counts for sequencing depth and gene length.
- FPKM: Fragments Per Kilobase Million. The same idea as RPKM, but it counts fragments instead of reads, for paired-end RNA-seq.
- TPM: Transcripts Per Million. A newer metric that addresses some limitations of RPKM and FPKM.
Why Normalize?
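- Sequencing depth: A replicate that is sequenced more deeply yields more reads for every gene, so raw counts cannot be compared across replicates until depth is accounted for.
- Gene length: At the same expression level, a longer gene yields more reads than a shorter one, so raw counts cannot be compared across genes until length is accounted for.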
Example Dataset
- Consider an RNA-seq dataset with 3 replicates (REP1, REP2, REP3) and 4 genes (A, B, C, D).
- Notable observation: REP3 has many more reads than the other replicates, meaning it was sequenced more deeply.
- Gene B is twice as long as Gene A, which can account for its higher read counts.
Normalizing with RPKM
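- Step 1: Normalize for read depth. Total the reads in each replicate and divide that total by 1,000,000 to get a "per million" scaling factor (the example divides by 10 to keep the numbers small). Then divide each read count by its replicate's scaling factor. This gives reads per million (RPM).
- Step 2: Normalize for gene length. Divide each RPM value by the gene's length in kilobases to get RPKM.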
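The two RPKM steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the read counts, the gene lengths, and the names `counts`, `lengths_kb`, and `rpkm` are all assumptions made up for the example, and `per=10` mirrors the small scaling constant used in the walkthrough (use 1,000,000 in practice).

```python
# Illustrative inputs only (assumed; not taken from the source).
counts = {
    "REP1": {"A": 10, "B": 20, "C": 5, "D": 0},
    "REP2": {"A": 12, "B": 25, "C": 8, "D": 0},
    "REP3": {"A": 30, "B": 60, "C": 15, "D": 1},
}
lengths_kb = {"A": 2.0, "B": 4.0, "C": 1.0, "D": 10.0}  # gene lengths in kilobases


def rpkm(sample_counts, lengths_kb, per=1_000_000):
    """Depth first, then gene length."""
    # Step 1: scaling factor from the replicate's total read count.
    scale = sum(sample_counts.values()) / per
    rpm = {gene: c / scale for gene, c in sample_counts.items()}
    # Step 2: divide by gene length in kilobases.
    return {gene: rpm[gene] / lengths_kb[gene] for gene in rpm}


# per=10 keeps the toy numbers readable.
rpkm_table = {rep: rpkm(c, lengths_kb, per=10) for rep, c in counts.items()}

for rep, values in rpkm_table.items():
    print(rep, {gene: round(v, 2) for gene, v in values.items()})
```

With these assumed inputs, Gene A in REP1 comes out at about 1.43: 10 reads divided by a scaling factor of 3.5, then by a length of 2 kb.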
RPKM Summary
- RPKM values are adjusted for sequencing depth first, then for gene length.
RPKM vs. FPKM
- RPKM is for single-end RNA-seq, while FPKM is for paired-end.
- Why it matters: In paired-end RNA-seq, two reads can come from the same fragment. FPKM counts fragments rather than reads, so a fragment with both reads mapped is counted once, not twice.
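To make the double-counting point concrete, here is a small sketch that counts the same toy alignments two ways. The `(fragment_id, mate, gene)` tuple layout and the function names are hypothetical, chosen only for illustration; real pipelines work from alignment files and dedicated counting tools.

```python
# Hypothetical alignment records: (fragment_id, mate_number, gene).
alignments = [
    ("frag1", 1, "A"), ("frag1", 2, "A"),  # both mates mapped
    ("frag2", 1, "A"),                     # only one mate mapped
    ("frag3", 1, "B"), ("frag3", 2, "B"),  # both mates mapped
]


def count_reads(alignments):
    """Count every mapped read, as a read-based metric would."""
    totals = {}
    for _fragment_id, _mate, gene in alignments:
        totals[gene] = totals.get(gene, 0) + 1
    return totals


def count_fragments(alignments):
    """Count each fragment once per gene, however many of its mates mapped."""
    fragments = {}
    for fragment_id, _mate, gene in alignments:
        fragments.setdefault(gene, set()).add(fragment_id)
    return {gene: len(ids) for gene, ids in fragments.items()}


print(count_reads(alignments))      # Gene A: 3 reads, Gene B: 2 reads
print(count_fragments(alignments))  # Gene A: 2 fragments, Gene B: 1 fragment
```

The fragment counts are what would then be normalized for depth and gene length to give FPKM.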
Normalizing with TPM
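- Step 1: Normalize for gene length. Divide each read count by the gene's length in kilobases to get reads per kilobase (RPK).
- Step 2: Normalize for sequencing depth. Total the RPK values in each replicate and divide that total by the scaling constant (usually 1,000,000) to get a scaling factor.
- Final step: Divide each RPK value (not the raw read count) by its replicate's scaling factor to get TPM.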
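The TPM steps above can be sketched the same way. As before, this is only an illustration under stated assumptions: the counts, the gene lengths, and the names `counts`, `lengths_kb`, and `tpm` are invented for the example, and `per=10` stands in for the usual 1,000,000 to keep the toy numbers small.

```python
# Illustrative inputs only (assumed; not taken from the source).
counts = {
    "REP1": {"A": 10, "B": 20, "C": 5, "D": 0},
    "REP2": {"A": 12, "B": 25, "C": 8, "D": 0},
    "REP3": {"A": 30, "B": 60, "C": 15, "D": 1},
}
lengths_kb = {"A": 2.0, "B": 4.0, "C": 1.0, "D": 10.0}  # gene lengths in kilobases


def tpm(sample_counts, lengths_kb, per=1_000_000):
    """Gene length first, then depth."""
    # Step 1: reads per kilobase (RPK).
    rpk = {gene: c / lengths_kb[gene] for gene, c in sample_counts.items()}
    # Step 2: scaling factor from the total of the RPK values, not the raw counts.
    scale = sum(rpk.values()) / per
    # Final step: divide each RPK value by the scaling factor.
    return {gene: value / scale for gene, value in rpk.items()}


tpm_table = {rep: tpm(c, lengths_kb, per=10) for rep, c in counts.items()}

for rep, values in tpm_table.items():
    print(rep, {gene: round(v, 2) for gene, v in values.items()},
          "total:", round(sum(values.values()), 2))
```

With these assumed inputs, Gene A in REP1 comes out at about 3.33, and every replicate's values add up to the scaling constant (10 here).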
Key Differences in Results
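- Both RPKM and TPM correct for sequencing depth and gene length, but the sum of the normalized values in each replicate behaves differently.
- RPKM: The total differs from replicate to replicate.
- TPM: The total is the same in every replicate (1 million when scaling per million).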
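The difference in totals follows from the order of the two normalization steps. As a short derivation, write $c_i$ for the read count of gene $i$ in one replicate, $\ell_i$ for its length in kilobases, and $N$ for the replicate's total read count; these symbols are introduced here for illustration and do not appear in the original.

```latex
\begin{align*}
\mathrm{RPKM}_i &= \frac{c_i}{\ell_i \cdot N / 10^{6}}, \qquad N = \sum_j c_j \\
\sum_i \mathrm{RPKM}_i &= \frac{10^{6}}{N} \sum_i \frac{c_i}{\ell_i} \\[6pt]
\mathrm{TPM}_i &= \frac{c_i / \ell_i}{\sum_j c_j / \ell_j} \cdot 10^{6} \\
\sum_i \mathrm{TPM}_i &= 10^{6}
\end{align*}
```

The RPKM total still depends on each replicate's counts and gene lengths, so it changes from replicate to replicate. The TPM total cancels to the scaling constant by construction.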
Visualizing Differences with Pies
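- TPM: Every "pie" is the same size, so a gene's slice can be compared directly across replicates. For example, Gene A's TPM in one replicate can be set side by side with its TPM in another, because both are shares of the same total.
- RPKM: Each "pie" is a different size, so the same RPKM value can represent a different share of the total in different replicates.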
Conclusion
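- Why use TPM?
- TPM values sum to the same total in every replicate, so each value reads directly as a gene's share of that total.
- That makes TPM better suited than RPKM or FPKM for comparing the proportion of reads mapped to a gene across samples.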
Thank you for tuning in to StatQuest! Stay tuned for the next episode.