This directory contains BED files of merged polyA sites generated within the GENCODE Capture Long-Seq project using PacBio reads.

These were produced by merging the "*polyAsitesNoErcc.bed" files listed in https://public_docs.crg.es/rguigo/Papers/2017_lagarde-uszczynska_CLS/data/polyA/raw/ using the following command:

$ cat $RAW_BEDfile | bedtools merge -s -n -d 5 -nms -i stdin | awk '$5>1' | perl -F"\t" -lane 'if($F[5] eq "+"){$F[1]=$F[2]-1}elsif($F[5] eq "-"){$F[2]=$F[1]+1}else{die} print join("\t",@F);'|sortbed > All_Cap1__.clusters.bed

All files correspond to genome assemblies hg38 and mm10, and contain PacBio reads unless otherwise stated.

# File naming scheme:

All_Cap1__.bed

where:

species:
  "mm": mouse
  "hs": human

tissue: self-explanatory, except:
  "all": all polyA sites merged across all tissues.

subset:
  "polyAsitesNoErcc": all polyA sites, excluding those called on ERCC spike-in sequences.

# BED file format (BED6):

There is one merged polyA site per BED record.

column 1: chromosome
column 2: chromosome start of polyA site
column 3: chromosome end of polyA site
column 4: comma-separated list of read identifiers contributing to the site
column 5: number of reads contributing to the site
column 6: genomic strand of the site
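
# Example: reading a record (illustrative sketch)

The following Python sketch is not part of the original pipeline. It only illustrates two points described above: how a BED6 record with the listed columns could be parsed, and what the perl step of the merging command does to each merged cluster (on the "+" strand the start is set to end-1, on the "-" strand the end is set to start+1, so each site spans a single nucleotide). The names PolyASite, collapse_to_site and parse_record, as well as the read identifiers in the example, are hypothetical and chosen for illustration only.

```python
from typing import List, NamedTuple, Tuple


class PolyASite(NamedTuple):
    """One BED6 record, following the column description in this README."""
    chrom: str            # column 1
    start: int            # column 2 (0-based start, as in BED)
    end: int              # column 3
    read_ids: List[str]   # column 4, split on commas
    read_count: int       # column 5
    strand: str           # column 6


def collapse_to_site(start: int, end: int, strand: str) -> Tuple[int, int]:
    """Mirror the perl step: reduce a merged cluster to a 1-nt interval."""
    if strand == "+":
        return end - 1, end      # keep the last position of the cluster
    if strand == "-":
        return start, start + 1  # keep the first position of the cluster
    raise ValueError(f"unexpected strand: {strand!r}")  # perl: die


def parse_record(line: str) -> PolyASite:
    """Parse one tab-separated BED6 line into a PolyASite."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 6:
        raise ValueError(f"expected 6 columns, got {len(fields)}")
    chrom, start, end, names, count, strand = fields
    return PolyASite(chrom, int(start), int(end),
                     names.split(","), int(count), strand)


if __name__ == "__main__":
    # Hypothetical record; the read identifiers are made up.
    site = parse_record("chr1\t107\t108\tread_a,read_b\t2\t+\n")
    print(site.chrom, site.start, site.end, site.read_count, site.strand)
```

Because the command also filters with awk '$5>1', every record in these files should have a value of at least 2 in column 5.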