Checking the data
We can look at the first ten lines (the header plus nine genes) of the quantification matrix, which holds the raw gene counts:
head quantification/raw_counts.tsv
gene_id ENCFF233CZT ENCFF285BTU ENCFF360GXF ENCFF370HFZ ENCFF443WJB ENCFF650KXK ENCFF673HCO ENCFF681YHC ENCFF682AFV ENCFF748SCJ ENCFF838IYE ENCFF904PCS
ENSG00000000003.14 14216.00 4473.00 4141.00 3684.00 4855.00 4284.00 5051.00 4243.00 4401.00 4594.00 4202.00 5224.00
ENSG00000000005.5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
ENSG00000000419.12 3287.00 953.00 863.00 836.00 1389.00 1227.00 1327.00 908.00 1011.00 981.00 1052.00 1079.00
ENSG00000000457.13 640.00 265.00 251.00 212.00 248.00 223.00 250.00 256.00 243.00 243.00 227.00 309.00
ENSG00000000460.16 3346.00 1029.00 966.00 649.00 1058.00 896.00 1164.00 899.00 883.00 791.00 924.00 1204.00
ENSG00000000938.12 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
ENSG00000000971.15 1137.48 533.64 799.63 173.88 638.05 212.97 638.51 526.20 520.16 596.35 705.82 803.69
ENSG00000001036.13 5455.00 2354.00 2015.00 1949.00 2416.00 2466.00 2265.00 1959.00 1961.00 1898.00 2143.00 2446.00
ENSG00000001084.12 29869.00 8408.00 7970.00 4457.00 8834.00 6959.00 11237.00 6309.00 8513.00 9017.00 7171.00 10551.00
This table contains one quantification value per sample for every gene in our annotation. The first line is a header with one ID per sample. To better understand the dataset, we can check which biological sample each ID corresponds to. This information is in the metadata file:
cat quantification/metadata.tsv
SampleID ExperimentID Library_Material Biosample_Name Treatment Treatment_Amount Treatment_Duration Library_Size Bioreplicate Technical_Replicate Assembly Annotation
ENCFF233CZT ENCSR632DQP polyadenylated mRNA A549 control 0nM 0hr >200 4 4_1 GRCh38 V29
ENCFF285BTU ENCSR632DQP polyadenylated mRNA A549 control 0nM 0hr >200 1 1_1 GRCh38 V29
ENCFF904PCS ENCSR632DQP polyadenylated mRNA A549 control 0nM 0hr >200 2 2_1 GRCh38 V29
ENCFF360GXF ENCSR632DQP polyadenylated mRNA A549 control 0nM 0hr >200 3 3_1 GRCh38 V29
ENCFF673HCO ENCSR924BHF polyadenylated mRNA A549 dexamethasone 100nM 2hr >200 4 4_1 GRCh38 V29
ENCFF370HFZ ENCSR924BHF polyadenylated mRNA A549 dexamethasone 100nM 2hr >200 1 1_1 GRCh38 V29
ENCFF748SCJ ENCSR924BHF polyadenylated mRNA A549 dexamethasone 100nM 2hr >200 3 3_1 GRCh38 V29
ENCFF682AFV ENCSR924BHF polyadenylated mRNA A549 dexamethasone 100nM 2hr >200 2 2_1 GRCh38 V29
ENCFF838IYE ENCSR326PTW polyadenylated mRNA A549 dexamethasone 100nM 4hr >200 3 3_1 GRCh38 V29
ENCFF650KXK ENCSR326PTW polyadenylated mRNA A549 dexamethasone 100nM 4hr >200 1 1_1 GRCh38 V29
ENCFF681YHC ENCSR326PTW polyadenylated mRNA A549 dexamethasone 100nM 4hr >200 2 2_1 GRCh38 V29
ENCFF443WJB ENCSR326PTW polyadenylated mRNA A549 dexamethasone 100nM 4hr >200 4 4_1 GRCh38 V29
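The lookup described above (matching each sample ID in the matrix header to its biological sample) can also be scripted rather than done by eye. The following is a minimal sketch, not part of the original workflow: annotate_samples is a hypothetical helper name, and it assumes both files are tab-separated, that SampleID is the first metadata column, and that the metadata header contains the Treatment, Treatment_Duration and Bioreplicate columns shown above.

```shell
# Hypothetical helper (not part of the tutorial's pipeline):
# for every sample column in the count matrix header, print the
# treatment, treatment duration and bioreplicate found in the metadata.
# Usage: annotate_samples <counts.tsv> <metadata.tsv>
annotate_samples() {
    awk -F'\t' '
        # First file (metadata): remember the condition of each sample.
        NR == FNR {
            if (FNR == 1) {
                # Locate columns by header name rather than by position.
                for (i = 1; i <= NF; i++) col[$i] = i
            } else {
                info[$1] = $(col["Treatment"]) "\t" \
                           $(col["Treatment_Duration"]) "\t" \
                           $(col["Bioreplicate"])
            }
            next
        }
        # Second file (count matrix): only its header line is needed.
        FNR == 1 {
            for (i = 2; i <= NF; i++)
                print $i "\t" (($i in info) ? info[$i] : "NOT_IN_METADATA")
            exit
        }
    ' "$2" "$1"
}

# Run only if the tutorial files are present in the current directory.
if [ -f quantification/raw_counts.tsv ] && [ -f quantification/metadata.tsv ]; then
    annotate_samples quantification/raw_counts.tsv quantification/metadata.tsv
fi
```

Each output line lists a sample ID in the order it appears in the matrix, followed by its condition. Splitting on tabs only (-F'\t') matters here because the Library_Material values contain a space.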
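To view TSV files with their columns aligned, you can use the following command. Here column -t formats the input as a table, -s $'\t' sets the tab character as the field separator (the $'\t' quoting is Bash syntax), and less -S truncates long lines instead of wrapping them, so you can scroll sideways: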
column -ts $'\t' quantification/metadata.tsv | less -S
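As a non-interactive complement to paging through the table, the experimental design can be summarised by counting samples per condition. This is only a sketch under stated assumptions: samples_per_condition is a hypothetical helper name, and it assumes the file is tab-separated and that its header contains columns named Treatment and Treatment_Duration, as in the metadata shown above.

```shell
# Hypothetical helper (not part of the tutorial's pipeline):
# count how many samples fall into each treatment / duration group.
# Usage: samples_per_condition <metadata.tsv>
samples_per_condition() {
    awk -F'\t' '
        NR == 1 {
            # Find the two columns of interest by their header names.
            for (i = 1; i <= NF; i++) {
                if ($i == "Treatment") t = i
                else if ($i == "Treatment_Duration") d = i
            }
            next
        }
        { n[$t "\t" $d]++ }                      # tally one sample per row
        END { for (k in n) print k "\t" n[k] }   # group, then count
    ' "$1" | sort
}

# Run only if the tutorial file is present in the current directory.
if [ -f quantification/metadata.tsv ]; then
    samples_per_condition quantification/metadata.tsv
fi
```

For the metadata listed in this section, this should report four samples for each of the three conditions (control at 0hr, dexamethasone at 2hr, and dexamethasone at 4hr).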