This directory contains gene and biotype assignments of all HCGM PacBio reads in human and mouse. They were obtained by comparing the genomic coordinates of HCGMs with GENCODE 20 (human) or GENCODE M3 (mouse) annotations, plus extra, non-GENCODE probed features.

# File naming scheme:

<species>.pacBioReadId.To.GENCODEgene_id.To.biotype.tsv.gz

where:
species: "mm": mouse
         "hs": human

# File format (tab-separated):

There is one line per HCGM read.

column 1: PacBio read identifier
column 2: comma-separated list of the annotated gene / annotated biotype pairs that the corresponding read overlaps, in the form:
<gene_id>:<biotype>,<gene_id>:<biotype>,[...]

# Note about "biotype" values:

- The following GENCODE gene types were tagged "lncRNA": "antisense", "lincRNA", "processed_transcript", "sense_intronic" and "sense_overlapping".
- In our analysis, we consider any read overlapping multiple genes of distinct biotypes as "multi-biotype".
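The file format and biotype rules above can be applied with a short script. What follows is a minimal sketch, not the authors' own code. It makes several assumptions. Column 2 is taken to follow the `gene:biotype,gene:biotype` form described above. A read with no overlapping feature is assumed to have an empty or missing column 2. The helper names (`parse_line`, `read_assignments`, `read_biotype`) are hypothetical, as is the `"none"` label for reads with no overlap. The lncRNA collapsing is harmless if the files already contain the "lncRNA" tag.

```python
import gzip
from typing import Iterator, List, Tuple

# GENCODE gene types that this README says were grouped under "lncRNA".
LNCRNA_TYPES = {
    "antisense",
    "lincRNA",
    "processed_transcript",
    "sense_intronic",
    "sense_overlapping",
}

Pair = Tuple[str, str]  # (gene identifier, biotype)


def parse_line(line: str) -> Tuple[str, List[Pair]]:
    """Split one TSV line into the read ID and its gene:biotype pairs."""
    fields = line.rstrip("\n").split("\t")
    read_id = fields[0]
    pairs: List[Pair] = []
    # Assumption: column 2 is empty or absent when a read overlaps nothing.
    if len(fields) > 1 and fields[1]:
        for item in fields[1].split(","):
            # rsplit keeps a gene identifier containing ":" intact
            # (assumption: biotype values themselves never contain ":").
            gene, biotype = item.rsplit(":", 1)
            pairs.append((gene, biotype))
    return read_id, pairs


def read_assignments(path: str) -> Iterator[Tuple[str, List[Pair]]]:
    """Yield (read_id, pairs) for every non-blank line of a gzipped file."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.strip():
                yield parse_line(line)


def normalize_biotype(biotype: str) -> str:
    """Collapse the GENCODE lncRNA subtypes into a single "lncRNA" label."""
    return "lncRNA" if biotype in LNCRNA_TYPES else biotype


def read_biotype(pairs: List[Pair]) -> str:
    """Return the read's single biotype, or "multi-biotype" if they differ."""
    biotypes = {normalize_biotype(b) for _, b in pairs}
    if not biotypes:
        return "none"  # hypothetical label for reads without any overlap
    if len(biotypes) > 1:
        return "multi-biotype"
    return biotypes.pop()
```

For example, `collections.Counter(read_biotype(p) for _, p in read_assignments("hs.pacBioReadId.To.GENCODEgene_id.To.biotype.tsv.gz"))` would tally human reads per biotype. The file name is built from the naming scheme above. Biotypes are normalized before they are compared, so a read overlapping one "antisense" gene and one "lincRNA" gene counts as "lncRNA" rather than "multi-biotype".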