Checking the data

We can look at the first ten lines (the header plus nine genes) of the quantification matrix, which holds the raw gene counts:

head quantification/raw_counts.tsv
gene_id ENCFF233CZT     ENCFF285BTU     ENCFF360GXF     ENCFF370HFZ     ENCFF443WJB     ENCFF650KXK     ENCFF673HCO     ENCFF681YHC     ENCFF682AFV   ENCFF748SCJ     ENCFF838IYE     ENCFF904PCS
ENSG00000000003.14      14216.00        4473.00 4141.00 3684.00 4855.00 4284.00 5051.00 4243.00 4401.00 4594.00 4202.00 5224.00
ENSG00000000005.5       0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
ENSG00000000419.12      3287.00 953.00  863.00  836.00  1389.00 1227.00 1327.00 908.00  1011.00 981.00  1052.00 1079.00
ENSG00000000457.13      640.00  265.00  251.00  212.00  248.00  223.00  250.00  256.00  243.00  243.00  227.00  309.00
ENSG00000000460.16      3346.00 1029.00 966.00  649.00  1058.00 896.00  1164.00 899.00  883.00  791.00  924.00  1204.00
ENSG00000000938.12      0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
ENSG00000000971.15      1137.48 533.64  799.63  173.88  638.05  212.97  638.51  526.20  520.16  596.35  705.82  803.69
ENSG00000001036.13      5455.00 2354.00 2015.00 1949.00 2416.00 2466.00 2265.00 1959.00 1961.00 1898.00 2143.00 2446.00
ENSG00000001084.12      29869.00        8408.00 7970.00 4457.00 8834.00 6959.00 11237.00        6309.00 8513.00 9017.00 7171.00 10551.00
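
Since head only shows the top of the file, it can help to check how large the whole matrix is. The following is a minimal sketch, assuming the file is tab-separated with a single header line and gene IDs in the first column; the function name matrix_dims is a hypothetical helper, not part of any tool used here.

```shell
# Hypothetical helper: print the number of genes and samples in a count matrix.
# Assumes a tab-separated file with one header line and gene IDs in column 1.
matrix_dims() {
    awk -F'\t' '
        NR == 1 { samples = NF - 1 }   # header: every column except gene_id is a sample
        END     { printf "genes: %d\nsamples: %d\n", NR - 1, samples }
    ' "$1"
}

# Usage: matrix_dims quantification/raw_counts.tsv
```

For the header shown above, the sample count should be 12; the gene count depends on the annotation.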

The matrix contains a quantification value for every gene in our annotation, across all of our samples. The first line is a header with one ID per sample. To better understand the dataset, we can check which biological sample each ID corresponds to. This information is included in the metadata file:

cat quantification/metadata.tsv
SampleID        ExperimentID    Library_Material        Biosample_Name  Treatment       Treatment_Amount        Treatment_Duration      Library_Size  Bioreplicate    Technical_Replicate     Assembly        Annotation
ENCFF233CZT     ENCSR632DQP     polyadenylated mRNA     A549    control 0nM     0hr     >200    4       4_1     GRCh38  V29
ENCFF285BTU     ENCSR632DQP     polyadenylated mRNA     A549    control 0nM     0hr     >200    1       1_1     GRCh38  V29
ENCFF904PCS     ENCSR632DQP     polyadenylated mRNA     A549    control 0nM     0hr     >200    2       2_1     GRCh38  V29
ENCFF360GXF     ENCSR632DQP     polyadenylated mRNA     A549    control 0nM     0hr     >200    3       3_1     GRCh38  V29
ENCFF673HCO     ENCSR924BHF     polyadenylated mRNA     A549    dexamethasone   100nM   2hr     >200    4       4_1     GRCh38  V29
ENCFF370HFZ     ENCSR924BHF     polyadenylated mRNA     A549    dexamethasone   100nM   2hr     >200    1       1_1     GRCh38  V29
ENCFF748SCJ     ENCSR924BHF     polyadenylated mRNA     A549    dexamethasone   100nM   2hr     >200    3       3_1     GRCh38  V29
ENCFF682AFV     ENCSR924BHF     polyadenylated mRNA     A549    dexamethasone   100nM   2hr     >200    2       2_1     GRCh38  V29
ENCFF838IYE     ENCSR326PTW     polyadenylated mRNA     A549    dexamethasone   100nM   4hr     >200    3       3_1     GRCh38  V29
ENCFF650KXK     ENCSR326PTW     polyadenylated mRNA     A549    dexamethasone   100nM   4hr     >200    1       1_1     GRCh38  V29
ENCFF681YHC     ENCSR326PTW     polyadenylated mRNA     A549    dexamethasone   100nM   4hr     >200    2       2_1     GRCh38  V29
ENCFF443WJB     ENCSR326PTW     polyadenylated mRNA     A549    dexamethasone   100nM   4hr     >200    4       4_1     GRCh38  V29
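
The rows of the metadata are not sorted by condition, so a quick tally can make the experimental design easier to see. The sketch below assumes the file is tab-separated, with Treatment in column 5 and Treatment_Duration in column 7 as in the listing above; design_summary is a hypothetical helper name.

```shell
# Hypothetical helper: count samples per treatment and treatment duration.
# Assumes a tab-separated metadata file with a header line, Treatment in
# column 5 and Treatment_Duration in column 7.
design_summary() {
    awk -F'\t' '
        NR > 1 { n[$5 "\t" $7]++ }                  # skip the header, tally each group
        END    { for (g in n) print g "\t" n[g] }   # one line per group: treatment, duration, count
    ' "$1" | sort
}

# Usage: design_summary quantification/metadata.tsv
```

For the metadata shown above, this should report four samples for each of the three conditions: control at 0hr, dexamethasone at 2hr, and dexamethasone at 4hr.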

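To view TSV files with their columns aligned, you can use the following command. Here column -t formats the input as a table, -s $'\t' sets the tab character as the field separator, and less -S stops long lines from wrapping so that you can scroll sideways: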

column -ts $'\t' quantification/metadata.tsv | less -S
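
As a final sanity check, one could confirm that every sample ID in the header of the count matrix also appears in the metadata. This is a minimal sketch, assuming both files are tab-separated and that the sample IDs sit in the first column of the metadata; unmatched_samples is a hypothetical helper name.

```shell
# Hypothetical helper: list sample IDs from the matrix header that are
# missing from the metadata. Prints nothing when every ID is found.
# $1: count matrix, $2: metadata (both tab-separated, each with a header line)
unmatched_samples() {
    awk -F'\t' '
        NR == FNR { if (FNR > 1) known[$1] = 1; next }   # first file read: metadata IDs
        FNR == 1  {                                      # second file read: matrix header
            for (i = 2; i <= NF; i++) if (!($i in known)) print $i
            exit
        }
    ' "$2" "$1"
}

# Usage: unmatched_samples quantification/raw_counts.tsv quantification/metadata.tsv
```

For the two files shown in this section, the command should print nothing, since all twelve IDs in the matrix header are listed in the metadata.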