RNA expression visualization in ENCODE

Input

Gene expression matrix

JSON file generated by CRG:

    {
      "ensembl_id" : "ENSG00000000003.10",
      "expression_values" : [
        {
          "rep2_tpm" : 1.33,
          "rep1_fpkm" : 6.86,
          "rep1_tpm" : 2.78,
          "dataset" : "ENCSR000AAA",
          "rep2_fpkm" : 6.02
        },
        {
          "rep2_tpm" : 2.09,
          "rep1_fpkm" : 32.02,
          "rep1_tpm" : 13.7,
          "dataset" : "ENCSR000AAB",
          "rep2_fpkm" : 14.48
        },

...

DCC collections

Extensive metadata associated with each dataset (aka experiment).

Output

One expression TSV table per gene

Example (ENSG00000000003.10.expression.tsv):

accession	biosample_term_name	biosample_type	developmental_slims	fpkm *	library.nucleic_acid_term_name	organ_slims	tpm *	...
ENCSR000AAA	aortic smooth muscle cell	primary cell	mesoderm	6.44	RNA	blood vessel	2.055	...
ENCSR000AAB	bladder microvascular endothelial cell	primary cell	mesoderm	23.25	RNA	blood vessel,urinary bladder	7.895	...
ENCSR000AAC	smooth muscle cell of bladder	primary cell	mesoderm	10.675	RNA	urinary bladder	2.57	...
ENCSR000AAD	bronchial epithelial cell	primary cell	endoderm	6.355	RNA	bronchus	1.04	...
ENCSR000AAE	bronchial smooth muscle cell	primary cell	endoderm,mesoderm	12.72	RNA	bronchus	2.995	...
ENCSR000AAF	endothelial cell of coronary artery	primary cell	mesoderm	18.115	RNA	blood vessel,heart	4.965	...
ENCSR000AAG	smooth muscle cell of the coronary artery	primary cell	mesoderm	10.775	RNA	blood vessel,heart	3.11	...
ENCSR000AAH	regular cardiac myocyte	primary cell	mesoderm	16.65	RNA	NA	1.88	...
ENCSR000AAI	dermis blood vessel endothelial cell	primary cell	ectoderm,mesoderm	28.08	RNA	blood vessel,skin of body	8.415	...
...	...	...	...	...	...	...	...	...

* averaged across replicates
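The step from the JSON matrix to the per-gene TSV is not shown here, so the following is only a minimal Python sketch of how it might look. It assumes exactly two replicates per dataset, rounds averages to three decimals, and uses a hypothetical `METADATA` dictionary as a stand-in for whatever lookup is actually performed against the DCC collections. The function names are illustrative, not part of the pipeline.

```python
import csv
import io

# Two entries copied from the JSON example above.
GENE_RECORD = {
    "ensembl_id": "ENSG00000000003.10",
    "expression_values": [
        {"dataset": "ENCSR000AAA", "rep1_tpm": 2.78, "rep2_tpm": 1.33,
         "rep1_fpkm": 6.86, "rep2_fpkm": 6.02},
        {"dataset": "ENCSR000AAB", "rep1_tpm": 13.7, "rep2_tpm": 2.09,
         "rep1_fpkm": 32.02, "rep2_fpkm": 14.48},
    ],
}

# Hypothetical stand-in for the DCC metadata lookup, keyed by accession.
METADATA = {
    "ENCSR000AAA": {"biosample_term_name": "aortic smooth muscle cell",
                    "organ_slims": "blood vessel"},
    "ENCSR000AAB": {"biosample_term_name": "bladder microvascular endothelial cell",
                    "organ_slims": "blood vessel,urinary bladder"},
}


def average_replicates(entry, metrics=("fpkm", "tpm")):
    """Return the mean of rep1 and rep2 for each metric (two replicates assumed)."""
    return {m: round((entry[f"rep1_{m}"] + entry[f"rep2_{m}"]) / 2, 3)
            for m in metrics}


def expression_rows(gene_record, metadata):
    """Build one row per dataset: accession, metadata fields, averaged metrics."""
    rows = []
    for entry in gene_record["expression_values"]:
        row = {"accession": entry["dataset"]}
        row.update(metadata.get(entry["dataset"], {}))
        row.update(average_replicates(entry))
        rows.append(row)
    return rows


def to_tsv(rows, columns):
    """Serialise rows as tab-separated text, writing NA for missing fields."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, delimiter="\t",
                            restval="NA", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


ROWS = expression_rows(GENE_RECORD, METADATA)
COLUMNS = ["accession", "biosample_term_name", "fpkm", "organ_slims", "tpm"]
print(to_tsv(ROWS, COLUMNS))
```

With the two example datasets, the averaged values agree with the first two rows of the table above (for instance, the mean of 6.86 and 6.02 FPKM for ENCSR000AAA).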

One set of interactive box-plots per gene

Interactive box-plots are generated by feeding the expression TSV to R's plotly library. The R script takes the following parameters:

Parameter	Possible values
$geneId	Any ensembl_id
$metrics	'tpm' or 'fpkm'
$colorBy	Any column name of the input TSV
$groupBy	Any column name of the input TSV

Generic R code:

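library(plotly)

# Load the per-gene expression table ($-prefixed names are placeholders).
expr <- read.table("$geneId.expression.tsv", header=TRUE, as.is=TRUE, sep="\t")
# One box per $groupBy value, coloured by $colorBy; the 0.01 pseudocount avoids log10(0).
# Columns are referenced with the ~ formula syntax expected by plotly >= 4.0.
figure <- plot_ly(expr, x=~$groupBy, y=~log10($metrics+0.01), color=~$colorBy,
                  type="box", boxpoints="all", jitter=0.3, pointpos=-1.8) %>%
  layout(
    title="$geneId / $groupBy vs $colorBy",
    margin=list(b=300)
  )
# Save as an HTML widget; selfcontained=FALSE keeps JS/CSS dependencies in a side directory.
htmlwidgets::saveWidget(as_widget(figure), "./$groupBy.vs.$colorBy.$geneId.html", selfcontained=FALSE)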
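How the `$`-placeholders get filled in is not described here. As one possible approach (an assumption, not necessarily what is actually used), Python's standard `string.Template` happens to use the same `$name` syntax, so a small wrapper can validate the parameters against the table above and substitute them. The function name `fill_template` and the column list are hypothetical; the two template strings are taken from the R code.

```python
from string import Template

ALLOWED_METRICS = ("tpm", "fpkm")


def fill_template(template_text, params, tsv_columns):
    """Validate the four parameters, then substitute them into the template."""
    if params["metrics"] not in ALLOWED_METRICS:
        raise ValueError(f"metrics must be one of {ALLOWED_METRICS}")
    for key in ("colorBy", "groupBy"):
        if params[key] not in tsv_columns:
            raise ValueError(f"{key}={params[key]!r} is not a column of the TSV")
    # substitute() raises KeyError if the template names a missing parameter.
    return Template(template_text).substitute(params)


# Column names taken from the example TSV header (subset, for illustration).
TSV_COLUMNS = ["accession", "biosample_term_name", "biosample_type",
               "developmental_slims", "library.nucleic_acid_term_name",
               "organ_slims"]

PARAMS = {
    "geneId": "ENSG00000000003.10",
    "metrics": "tpm",
    "colorBy": "library.nucleic_acid_term_name",
    "groupBy": "organ_slims",
}

INPUT_FILE = fill_template("$geneId.expression.tsv", PARAMS, TSV_COLUMNS)
OUTPUT_FILE = fill_template("./$groupBy.vs.$colorBy.$geneId.html", PARAMS, TSV_COLUMNS)
print(INPUT_FILE)
print(OUTPUT_FILE)
```

Because a placeholder name ends at the first character that cannot be part of an identifier, `$geneId.expression.tsv` is parsed as the placeholder `$geneId` followed by the literal `.expression.tsv`. The whole R script could be passed as `template_text` in the same way.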

Examples of parameter combinations

Input:

Parameter	Assigned value
$geneId	ENSG00000000003.10
$metrics	tpm
$colorBy	library.nucleic_acid_term_name
$groupBy	organ_slims

Output:

Input:

Parameter	Assigned value
$geneId	ENSG00000000003.10
$metrics	tpm
$colorBy	system_slims
$groupBy	biosample_term_name

Output:

Input:

Parameter	Assigned value
$geneId	ENSG00000000003.10
$metrics	tpm
$colorBy	organ_slims
$groupBy	system_slims

Output:

Input:

Parameter	Assigned value
$geneId	ENSG00000000003.10
$metrics	tpm
$colorBy	biosample_type
$groupBy	organ_slims

Output:

Going further...

Given a $geneId, is it possible to let the user choose $metrics, $groupBy and $colorBy dynamically (via some JavaScript magic)?

Plotly interactive plots embed only part of the data (i.e., they keep the columns used in the plot and discard everything else), so, a priori, no.

(There might be a workaround I'm not aware of, though.)
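One way to stay with static files, offered only as a sketch and not something tested here, would be to pre-render one HTML file per parameter combination and let a page pick among them. The Python snippet below merely enumerates the combinations for one gene. The function name is hypothetical, and prefixing the file name with the metric is an added assumption: the naming scheme in the R code does not include `$metrics`, so `tpm` and `fpkm` plots would otherwise overwrite each other.

```python
from itertools import permutations

METRICS = ("tpm", "fpkm")


def plot_combinations(gene_id, columns, metrics=METRICS):
    """Yield one parameter set per (metric, groupBy, colorBy) combination."""
    for metric in metrics:
        # Ordered pairs of distinct columns: groupBy and colorBy are not interchangeable.
        for group_by, color_by in permutations(columns, 2):
            yield {
                "geneId": gene_id,
                "metrics": metric,
                "groupBy": group_by,
                "colorBy": color_by,
                # Hypothetical file name: the metric prefix is not in the original scheme.
                "output": f"./{metric}.{group_by}.vs.{color_by}.{gene_id}.html",
            }


COMBOS = list(plot_combinations(
    "ENSG00000000003.10",
    ["organ_slims", "biosample_type", "biosample_term_name"],
))
print(len(COMBOS))
```

The number of files grows as 2 × n × (n − 1) for n candidate columns, so this is practical only for a small, curated set of columns.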

I have run some tests with Shiny, another R library compatible with plotly (see screenshot below).

It seems powerful enough, but unfortunately its options for sharing apps are not great.

Contact: Julien Lagarde (julien.lagarde AT crg.eu)
