8.2 Normalization of the read counts

Factors to consider for normalization:

  • Library size
  • Gene length
  • Transcriptome size
  • GC content

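Common normalized units:

  • CPM: counts per million (corrects for library size only)
  • RPKM/FPKM: reads/fragments per kilobase of transcript per million mapped reads (normalizes by library size first, then by gene length)
  • TPM: transcripts per million (normalizes by gene length first, then by library size)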
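The three units can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: `counts` is a genes x samples matrix of raw read counts, `lengths_bp` holds one gene length in base pairs per row, and the function names and toy numbers are made up for this example.

```python
import numpy as np

def cpm(counts):
    """Counts per million: scale each sample (column) by its library size."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=0) * 1e6

def rpkm(counts, lengths_bp):
    """RPKM/FPKM: library size first (CPM), then divide by gene length in kb."""
    lengths_kb = np.asarray(lengths_bp, dtype=float)[:, None] / 1e3
    return cpm(counts) / lengths_kb

def tpm(counts, lengths_bp):
    """TPM: gene length first (reads per kb), then scale each sample to 1e6."""
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(lengths_bp, dtype=float)[:, None] / 1e3
    rate = counts / lengths_kb
    return rate / rate.sum(axis=0) * 1e6

# Toy data (hypothetical): 3 genes, 2 samples; sample 2 is sequenced twice as deep.
counts = np.array([[10, 20],
                   [20, 40],
                   [70, 140]])
lengths_bp = np.array([1000, 2000, 7000])

cpm_values = cpm(counts)
rpkm_values = rpkm(counts, lengths_bp)
tpm_values = tpm(counts, lengths_bp)
```

In this toy case both samples get identical CPM values, because they differ only in sequencing depth. TPM columns always sum to one million, whereas RPKM columns generally do not, which is why TPM values are easier to compare across samples.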

None of the techniques above accounts for library composition. For instance, when comparing transcriptomes of different tissues, a set of genes may consume a large share of the reads in one tissue while not being expressed at all in the others. The remaining genes then appear down-regulated in that tissue even if their actual expression is unchanged.

–> DESeq2 & edgeR use different algorithms to tackle this (median of ratios and TMM, the trimmed mean of M-values, respectively).
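To make the composition problem concrete, here is a simplified sketch of the median-of-ratios idea that DESeq2's size factors are based on. It is not the DESeq2 implementation: the function name and the toy counts are assumptions for illustration, and the real package handles many more details.

```python
import numpy as np

def median_of_ratios_size_factors(counts):
    """Per-sample size factors from a genes x samples count matrix (sketch)."""
    counts = np.asarray(counts, dtype=float)
    # Use only genes with nonzero counts in every sample, so the log is defined.
    expressed = (counts > 0).all(axis=1)
    log_counts = np.log(counts[expressed])
    # Log of the per-gene geometric mean across samples (pseudo-reference).
    log_geo_mean = log_counts.mean(axis=1, keepdims=True)
    # Size factor = median over genes of the count / reference ratio.
    return np.exp(np.median(log_counts - log_geo_mean, axis=0))

# Toy data (hypothetical): both libraries have 1000 reads in total.
# Genes 1-4 are equally expressed in both tissues, but gene 5 is expressed
# only in tissue B and takes up half of that library.
counts = np.array([[100,  50],
                   [200, 100],
                   [300, 150],
                   [400, 200],
                   [  0, 500]])

size_factors = median_of_ratios_size_factors(counts)
normalized = counts / size_factors

# Library-size scaling alone (CPM) for comparison.
cpm_values = counts / counts.sum(axis=0) * 1e6
```

With CPM, genes 1-4 look half as abundant in tissue B because gene 5 absorbs half of the reads. The median-based size factors are driven by the bulk of the genes rather than by the one dominant gene, so the normalized counts of genes 1-4 match across the two samples.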