Raw Data Processing

Bulk RNA-seq data processing usually includes the following steps:

  • mapping reads onto a genome
  • gene and/or transcript quantification
  • expression matrix creation
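The last step, expression matrix creation, means combining the per-sample quantification tables into one genes × samples table. Below is a minimal sketch, assuming RSEM-style `*.genes.results` files. These are tab-separated, with a header containing `gene_id`, `transcript_id(s)`, `length`, `effective_length`, `expected_count`, `TPM` and `FPKM`. The function names and the toy sample names and values are hypothetical and only illustrate the idea.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple


def read_rsem_column(handle: TextIO, column: str = "expected_count") -> Dict[str, float]:
    """Return {gene_id: value} for one column of an RSEM *.genes.results table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["gene_id"]: float(row[column]) for row in reader}


def build_expression_matrix(
    samples: Dict[str, TextIO], column: str = "expected_count"
) -> Tuple[List[str], List[str], List[List[float]]]:
    """Merge per-sample tables into (gene_ids, sample_names, genes x samples matrix)."""
    sample_names = list(samples)
    per_sample = {name: read_rsem_column(fh, column) for name, fh in samples.items()}
    # All samples should be quantified against the same annotation,
    # so their gene sets are expected to be identical.
    gene_ids = sorted(per_sample[sample_names[0]])
    for name in sample_names[1:]:
        if set(per_sample[name]) != set(gene_ids):
            raise ValueError(f"gene set of {name} differs from {sample_names[0]}")
    matrix = [[per_sample[name][g] for name in sample_names] for g in gene_ids]
    return gene_ids, sample_names, matrix


if __name__ == "__main__":
    # Made-up toy tables standing in for two real RSEM output files.
    header = "gene_id\ttranscript_id(s)\tlength\teffective_length\texpected_count\tTPM\tFPKM\n"
    ctrl = header + "geneA\ttxA\t1000\t850\t10.00\t5.0\t4.0\ngeneB\ttxB\t2000\t1850\t3.00\t1.0\t0.8\n"
    treat = header + "geneA\ttxA\t1000\t850\t7.00\t4.0\t3.0\ngeneB\ttxB\t2000\t1850\t0.00\t0.0\t0.0\n"
    genes, names, counts = build_expression_matrix(
        {"ctrl_1": io.StringIO(ctrl), "treat_1": io.StringIO(treat)}
    )
    print("gene_id", *names, sep="\t")
    for gene, row in zip(genes, counts):
        print(gene, *row, sep="\t")
```

With real data the file handles would come from `open(path)`. Note that RSEM's expected counts are not necessarily integers, which matters for downstream tools that expect integer counts.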

In this course we will focus on differential gene expression analysis and the interpretation of its results. The raw data processing has therefore already been performed: STAR was used for read mapping and RSEM for gene quantification.

In general, running each processing step manually on the command line is not good practice, even for small projects, because it is error-prone and makes the results hard to reproduce. The better approach is to automate the processing: write it as a pipeline and execute it with a workflow management system.

An example RNA-seq data processing pipeline was built as a showcase for this course. It uses Nextflow as the workflow manager and implements the STAR-RSEM pipeline.
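The STAR-RSEM chaining can be illustrated without a workflow manager. Below is a minimal Python sketch that only *builds* the per-sample commands in execution order, so they can be printed as a dry run. It is not the course's Nextflow pipeline. It assumes paired-end gzipped FASTQ files, a prebuilt STAR index and RSEM reference, and output directories that already exist. The `Sample` class, the helper names and all paths are hypothetical. STAR is run with `--quantMode TranscriptomeSAM` so that it writes `Aligned.toTranscriptome.out.bam`, which `rsem-calculate-expression --alignments` then quantifies without re-aligning.

```python
import shlex
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import List


@dataclass
class Sample:
    name: str   # hypothetical sample identifier
    read1: str  # path to R1 FASTQ (assumed gzipped)
    read2: str  # path to R2 FASTQ (assumed gzipped)


def star_command(s: Sample, star_index: str, outdir: PurePosixPath, threads: int = 4) -> List[str]:
    """STAR alignment that also writes a transcriptome-coordinate BAM for RSEM."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", star_index,
        "--readFilesIn", s.read1, s.read2,
        "--readFilesCommand", "zcat",           # decompress gzipped reads on the fly
        "--outFileNamePrefix", f"{outdir / s.name}/",  # per-sample directory, assumed to exist
        "--outSAMtype", "BAM", "Unsorted",
        "--quantMode", "TranscriptomeSAM",      # -> Aligned.toTranscriptome.out.bam
    ]


def rsem_command(s: Sample, rsem_ref: str, outdir: PurePosixPath, threads: int = 4) -> List[str]:
    """RSEM quantification from STAR's transcriptome BAM."""
    bam = outdir / s.name / "Aligned.toTranscriptome.out.bam"
    return [
        "rsem-calculate-expression",
        "-p", str(threads),
        "--paired-end",
        "--alignments", str(bam),         # input is already aligned
        rsem_ref,                         # RSEM reference name
        str(outdir / s.name / s.name),    # output prefix -> <prefix>.genes.results
    ]


def plan(samples: List[Sample], star_index: str, rsem_ref: str, outdir: PurePosixPath) -> List[List[str]]:
    """Return all commands in execution order: STAR, then RSEM, for each sample."""
    steps: List[List[str]] = []
    for s in samples:
        steps.append(star_command(s, star_index, outdir))
        steps.append(rsem_command(s, rsem_ref, outdir))
    return steps


if __name__ == "__main__":
    samples = [Sample("ctrl_1", "fastq/ctrl_1_R1.fastq.gz", "fastq/ctrl_1_R2.fastq.gz")]
    for cmd in plan(samples, "ref/star_index", "ref/rsem/genome", PurePosixPath("results")):
        print(shlex.join(cmd))  # dry run: print instead of executing
```

Separating command construction from execution makes the steps easy to inspect and rerun identically. A workflow manager such as Nextflow goes further, handling execution, parallelism across samples and resumption of interrupted runs.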

Check out the pipeline on GitHub.