Recent studies have shown that endogenous circRNAs can be translated into proteins, and that circRNA-derived proteins play important roles in myogenesis and in cellular responses to environmental stress, adding to the complexity of the transcriptome and proteome. Comprehensive identification of circRNAs with protein-coding potential is the first step toward understanding this new layer of gene activity. To help biologists take that step, we present CircPro, an integrated tool. It is an automated high-throughput data analysis pipeline that detects circRNAs, predicts their protein-coding potential, and identifies junction reads in Ribo-Seq data.
Using total or poly(A)- RNA-Seq data together with Ribo-Seq data, CircPro enables users to discover circRNAs with protein-coding potential. CircPro consists of three modules:
• Module 1: circRNA detection
The total/poly(A)- RNA sequencing reads are mapped to the reference genome using BWA-MEM. CIRI2 then uses the resulting SAM alignment for de novo detection of circRNAs. A gene annotation in GTF format is used to classify the identified circRNAs based on exon boundaries.
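The idea of classifying circRNAs by exon boundaries can be illustrated with a short sketch. This is not CIRI2's or CircPro's actual logic; the function name `classify_circrna`, the four labels, and the single-gene, 1-based inclusive coordinate model are assumptions made for illustration only.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # hypothetical model: 1-based inclusive (start, end), as in GTF


def classify_circrna(start: int, end: int, exons: List[Exon]) -> str:
    """Label a back-splice interval by how its ends relate to one gene's exons.

    Illustrative rules (assumed, not taken from CIRI2):
      exonic     - both ends coincide with annotated exon boundaries
      intronic   - inside the gene span but overlapping no exon
      intergenic - entirely outside the gene span
      other      - anything else (e.g. ends falling inside exons)
    """
    if not exons:
        return "intergenic"

    exon_starts = {s for s, _ in exons}
    exon_ends = {e for _, e in exons}
    if start in exon_starts and end in exon_ends:
        return "exonic"

    gene_start = min(s for s, _ in exons)
    gene_end = max(e for _, e in exons)
    if end < gene_start or start > gene_end:
        return "intergenic"

    overlaps_exon = any(s <= end and start <= e for s, e in exons)
    if not overlaps_exon and gene_start <= start and end <= gene_end:
        return "intronic"
    return "other"


if __name__ == "__main__":
    toy_exons = [(100, 200), (300, 400), (500, 600)]
    for interval in [(300, 600), (210, 290), (700, 800), (150, 350)]:
        print(interval, classify_circrna(*interval, toy_exons))
```

A real annotation contains many genes and transcripts on both strands, so a production classifier would need an interval index and strand handling that this sketch omits.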
• Module 2: protein-coding potential score
To calculate the coding potential score, CircPro first extracts the circRNA sequences; for exonic circRNAs, the introns are removed. CPC is then used to assess the protein-coding potential of each circRNA. This step produces the coding/noncoding classification, the coding potential score, and open reading frame (ORF) information.
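The sequence-extraction step can be sketched as follows, assuming a plus-strand circRNA and 1-based inclusive exon coordinates. The helper names `splice_exons` and `longest_orf` are hypothetical, and the simple ORF scan is only a stand-in to show what "ORF information" refers to; it is not how CPC scores coding potential.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def splice_exons(genome_seq: str, exons: List[Tuple[int, int]]) -> str:
    """Concatenate exon sequences in genomic order, dropping the introns.

    Assumes plus-strand, 1-based inclusive coordinates (illustrative only).
    """
    return "".join(genome_seq[s - 1:e] for s, e in sorted(exons))


def longest_orf(seq: str) -> str:
    """Return the longest ATG...stop ORF in the three forward frames.

    The sequence is treated as linear, so ORFs that cross the
    back-splice junction are not considered in this sketch.
    """
    best = ""
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                j = i
                # Walk codon by codon until a stop codon or the sequence end.
                while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                    j += 3
                if j + 3 <= len(seq):  # a stop codon was found in frame
                    orf = seq[i:j + 3]
                    if len(orf) > len(best):
                        best = orf
                i = j + 3
            else:
                i += 3
    return best


if __name__ == "__main__":
    # Toy locus: exon 1 (1-8), intron (9-16), exon 2 (17-24).
    genome = "CCATGAAA" + "GTTTTTAG" + "CCCTAAGG"
    circ = splice_exons(genome, [(1, 8), (17, 24)])
    print(circ)
    print(longest_orf(circ))
```

Because a circRNA has no free ends, a translated ORF may run across the back-splice junction; handling that case (for example, by scanning a duplicated copy of the sequence) is left out here.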
• Module 3: junction reads from Ribo-Seq
First, adaptors are removed from the Ribo-Seq reads using FASTX-Toolkit. The reads are then mapped to an rRNA library (for example, rRNA sequences from Ensembl and Rfam) to remove rRNA reads. The cleaned reads are mapped to the reference genome using Bowtie2. Reads that fail to map are extracted from the SAM alignment with SAMtools and mapped to a library of circRNA junction sites, which is constructed by extracting N nucleotides from each side of every junction site (N is the Ribo-Seq read length). Finally, CircPro produces a circRNA list that reports genomic position, coding status, ORF length, and Ribo-Seq junction reads.
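The junction-library construction described above can be sketched in a few lines. The sketch assumes the circRNA sequence is already spliced and written 5' to 3', so that the back-splice junction joins its last base to its first. The names `junction_sequence` and `spans_junction` and the `min_overhang` parameter are hypothetical, and the exact-match check merely stands in for the real Bowtie2 alignment.

```python
def junction_sequence(circ_seq: str, n: int) -> str:
    """Return n nt upstream of the back-splice junction plus n nt downstream.

    The junction joins the 3' end of the spliced sequence to its 5' end,
    so the flanks are the last n and the first n bases. For circRNAs
    shorter than n, the flank is capped at the sequence length.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    flank = min(n, len(circ_seq))
    return circ_seq[-flank:] + circ_seq[:flank]


def spans_junction(read: str, junction: str, min_overhang: int = 3) -> bool:
    """Check whether an exact match of `read` crosses the junction point.

    `min_overhang` (an assumed parameter) is the minimum number of read
    bases required on each side of the junction.
    """
    flank = len(junction) // 2  # junction point sits in the middle
    pos = junction.find(read)
    while pos != -1:
        left = flank - pos                # read bases before the junction
        right = pos + len(read) - flank   # read bases after the junction
        if left >= min_overhang and right >= min_overhang:
            return True
        pos = junction.find(read, pos + 1)
    return False


if __name__ == "__main__":
    circ = "AAACCCGGGTTT"
    junction = junction_sequence(circ, 4)
    print(junction)
    print(spans_junction("TTAA", junction, min_overhang=2))
    print(spans_junction("GTTT", junction, min_overhang=2))
```

Requiring a minimum overhang on both sides is a common way to avoid counting reads that match only one flank, which could equally have come from the linear transcript.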
If you use CircPro in your scientific research, please cite us:
Xianwen Meng, Qi Chen, Peijing Zhang, Ming Chen*. CircPro: an integrated tool for the identification of circRNAs with protein-coding potential. Bioinformatics 2017; 33(20):3314-3316. doi:10.1093/bioinformatics/btx446.