This directory contains merged transcript models produced using
the "standard" (non-anchored) merging procedure on PacBio HCGMs (High-Confidence Genome Mappings)
within the GENCODE Capture Long-Seq project.
The "compmerge" software (https://github.com/sdjebali/Compmerge) was used for the merging.

All files correspond to genome assembly hg38 (human, "hs") or mm10 (mouse, "mm").

# File naming scheme:

<species>All_Cap1_<tissue>_<merging_method>.compmerge.<subset>.gtf.gz

where:
	species:
		"mm": mouse
		"hs": human

	tissue: self-explanatory, except:
		"all": transcript models merged across all available tissues

	merging_method:
		"anchor": files produced with the "anchored" merging procedure (see Methods section of the paper)
		"noAnchor": files produced with the standard merging procedure (see Methods section of the paper)

	subset:
		"all": all merged transcript models, regardless of their end support
		"cageSupported": merged transcript models whose 5' end is supported by a FANTOM5 CAGE TSS
		"polyASupported": merged transcript models whose 3' end is supported by a captured polyA site
				(i.e., composed of poly-adenylated PacBio reads)
		"cage+polyASupported": full-length merged transcript models whose:
				- 5' end is supported by a FANTOM5 CAGE TSS
				and
				- 3' end is supported by a captured polyA site
				  (i.e., composed of poly-adenylated PacBio reads)
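
The naming scheme above can be parsed mechanically, for example to group files by species or subset. The sketch below is a minimal, unofficial helper that assumes filenames follow the template exactly; the function and class names are hypothetical, and the tissue "Brain" in the usage line is a made-up example rather than a file known to exist here.

```python
import re
from typing import NamedTuple, Optional

# Genome assembly per species prefix, as stated at the top of this README.
ASSEMBLY = {"hs": "hg38", "mm": "mm10"}

# <species>All_Cap1_<tissue>_<merging_method>.compmerge.<subset>.gtf.gz
FILENAME_RE = re.compile(
    r"^(?P<species>hs|mm)All_Cap1_"
    r"(?P<tissue>.+)_"
    r"(?P<merging_method>anchor|noAnchor)"
    r"\.compmerge\."
    r"(?P<subset>all|cageSupported|polyASupported|cage\+polyASupported)"
    r"\.gtf\.gz$"
)


class MergedFile(NamedTuple):
    species: str
    assembly: str
    tissue: str
    merging_method: str
    subset: str


def parse_filename(name: str) -> Optional[MergedFile]:
    """Split a merged-model filename into its fields (hypothetical helper).

    Returns None when the name does not follow the naming scheme.
    """
    m = FILENAME_RE.match(name)
    if m is None:
        return None
    species = m.group("species")
    return MergedFile(
        species=species,
        assembly=ASSEMBLY[species],
        tissue=m.group("tissue"),
        merging_method=m.group("merging_method"),
        subset=m.group("subset"),
    )
```

For instance, parse_filename("hsAll_Cap1_Brain_noAnchor.compmerge.cage+polyASupported.gtf.gz") yields species "hs", assembly "hg38", tissue "Brain", merging method "noAnchor" and subset "cage+polyASupported". Because "all" can appear both as a tissue and as a subset, the regex relies on field position rather than on the value alone.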


# File format:

GTF, with the internal merged transcript model identifier as the "transcript_id" attribute value
and the comma-separated list of contributing PacBio reads as the "gene_id" attribute value.
Note that when the <tissue> value of the filename is "all",
the gene_id attribute value is the list of contributing models merged *within* each tissue,
and not the list of contributing PacBio reads.
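
Because the contributors are packed into the gene_id attribute, recovering them means splitting that value on commas. The sketch below is a minimal, unofficial reader using only the Python standard library; it assumes standard GTF attribute syntax (key "value";) in the ninth tab-separated column, and the function names as well as the IDs in the test line ("tm_0001", "read1", "read2") are hypothetical, since the exact identifier formats are not documented here.

```python
import gzip
import re
from typing import Dict, List, Tuple

# Matches GTF attribute pairs of the form: key "value";
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def parse_attributes(field: str) -> Dict[str, str]:
    """Parse the 9th GTF column into a key -> value dict."""
    return {key: value for key, value in ATTR_RE.findall(field)}


def contributors(gtf_line: str) -> Tuple[str, List[str]]:
    """Return (transcript_id, contributors) for one GTF record.

    Contributors are PacBio read IDs for per-tissue files, or the
    per-tissue merged model IDs when <tissue> is "all".
    """
    columns = gtf_line.rstrip("\n").split("\t")
    attrs = parse_attributes(columns[8])
    return attrs["transcript_id"], attrs["gene_id"].split(",")


def transcript_contributors(path: str) -> Dict[str, List[str]]:
    """Map each merged transcript_id in a .gtf.gz file to its contributors."""
    result: Dict[str, List[str]] = {}
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            tid, members = contributors(line)
            # Every record of a transcript carries the same IDs; keep the first.
            result.setdefault(tid, members)
    return result
```

The number of contributors per merged model is then simply len(members), which works the same way for per-tissue and "all" files; only the meaning of each element differs.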