About PncPEPDB
1. Introduction
Most non-coding RNAs (ncRNAs) are thought to be expressed at low levels and to have a limited phylogenetic distribution in the cytoplasm, suggesting that they may be involved only in specific biological processes. However, recent studies have shown that some ncRNAs have protein-coding potential, suggesting that they may be a source of novel proteins. Although a growing number of ncRNAs have been found to encode proteins, it remains challenging to distinguish coding RNAs from previously annotated ncRNAs and to detect the proteins they produce. In this article, we identified ncRNAs with coding potential in Arabidopsis thaliana from three NCBI GEO datasets and predicted their translation products. In total, 31,311 ncRNAs were predicted to be translated into peptides, and these peptides showed a lower conservation rate than canonical proteins. We also built an interaction network between these peptides and annotated Arabidopsis proteins, which included 69 ncRNA-derived peptides. Peptides in the network showed characteristics different from those of other ncRNA-derived peptides and were involved in several crucial biological processes, such as photorespiration and stress responses. These results suggest that ncRNA-derived peptides may play important roles in ncRNA-mediated regulation, and support the hypothesis that ncRNAs may regulate metabolism through their translation products.
2. Workflow of PncPEPDB
3. ncPEP Prediction
Ribo-Seq and RNA-Seq data from Arabidopsis thaliana leaf, root, shoot and flower bud were obtained from NCBI GEO Datasets (GSE40209, GSE69802, GSE81332). After adapter removal, reads were aligned with TopHat and assembled with Cufflinks using the TAIR10 genome. The assembled transcripts were then classified as coding or putative non-coding RNAs using CuffCompare, and the putative non-coding RNAs were subsequently aligned to TAIR10 to filter out coding sequences. The remaining transcripts were processed with TransDecoder and CIPHER to obtain their RNA sequences, peptide sequences and coding scores.
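TransDecoder's first step is to find candidate open reading frames (ORFs) in each transcript. The core idea, taking the longest stretch from an ATG start codon to an in-frame stop codon, can be sketched as below. This is a simplified illustration only and does not reproduce TransDecoder's behaviour. It ignores partial ORFs that lack a stop codon, coding-likelihood scoring and homology evidence. The function names and the `min_aa` threshold are hypothetical.

```python
# Simplified longest-ORF finder (illustration only, not TransDecoder's algorithm).
BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string (non-ACGT characters kept as-is)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate codon by codon; codons with unknown bases (e.g. N) become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def longest_orf(rna_seq, min_aa=10, both_strands=True):
    """Return the longest complete ATG..stop peptide (stop excluded), or '' if shorter than min_aa.

    min_aa is a hypothetical threshold; ORFs running off the transcript end are ignored.
    """
    seq = rna_seq.upper().replace("U", "T")  # accept RNA or DNA input
    strands = [seq, reverse_complement(seq)] if both_strands else [seq]
    best = ""
    for strand in strands:
        for frame in range(3):
            start = None
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # open an ORF at the first in-frame ATG
                elif start is not None and CODON_TABLE.get(codon) == "*":
                    protein = translate(strand[start:i])  # excludes the stop codon
                    if len(protein) > len(best):
                        best = protein
                    start = None  # look for the next ATG in this frame
    return best if len(best) >= min_aa else ""
```

For example, `longest_orf("CCAUGAAAUUUUAAGG", min_aa=1)` finds the ATG-AAA-TTT-TAA frame and returns the peptide `"MKF"`. With the default `min_aa=10` it returns an empty string, because that peptide is too short.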
4. Network Prediction
BIPS, a web server that predicts protein-protein interactions (PPIs) from homologs found in the PPI databases integrated in BIANA, was used to predict putative interactions between our predicted non-coding peptides (ncPEPs) and proteins in UniProt. The predicted target proteins were extracted from the network and submitted to UniProt to retrieve their Gene Ontology (GO) annotations. The predicted ncPPI network was then integrated with information on the peptides involved in the network and the GO annotations of the predicted UniProt proteins, and finally visualized in Cytoscape.
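The integration step can be sketched as follows. Predicted interactions are deduplicated and written as a Cytoscape Simple Interaction Format (SIF) edge list. A separate tab-separated node table carries each node's type and GO terms, and Cytoscape can import it as node attributes. Several details are assumptions for illustration: the input shapes (a list of `(peptide_id, uniprot_acc)` pairs and a dict of GO terms per accession), the `pp` interaction label, the identifiers and the file names. BIPS results and UniProt downloads would first need to be parsed into this form.

```python
import csv


def build_network(interactions, go_terms):
    """Build SIF edge lines, a node attribute table and a count of distinct peptides.

    interactions: iterable of (peptide_id, uniprot_acc) pairs (hypothetical input format)
    go_terms: dict mapping uniprot_acc -> list of GO term IDs
    """
    edges = sorted(set(interactions))  # drop duplicate predictions, stable order
    sif_lines = [f"{pep}\tpp\t{prot}" for pep, prot in edges]
    node_attrs = {}
    for pep, prot in edges:
        node_attrs[pep] = {"type": "ncPEP", "go": ""}
        node_attrs[prot] = {"type": "protein", "go": ";".join(go_terms.get(prot, []))}
    n_peptides = len({pep for pep, _ in edges})
    return sif_lines, node_attrs, n_peptides


def write_outputs(sif_lines, node_attrs, sif_path="ncppi.sif", attr_path="ncppi_nodes.tsv"):
    """Write the edge list and node table as files that Cytoscape can import (hypothetical paths)."""
    with open(sif_path, "w") as fh:
        fh.write("\n".join(sif_lines) + "\n")
    with open(attr_path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t")
        writer.writerow(["node", "type", "go_terms"])
        for node, attrs in sorted(node_attrs.items()):
            writer.writerow([node, attrs["type"], attrs["go"]])
```

With hypothetical input `[("ncPEP_1", "P12345"), ("ncPEP_1", "Q67890"), ("ncPEP_2", "P12345"), ("ncPEP_1", "P12345")]`, the duplicate pair is dropped, three SIF edges remain and two distinct peptides are counted.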