
Long noncoding RNAs (lncRNAs) are endogenous noncoding RNAs, arbitrarily defined as longer than 200 nucleotides, that play critical roles in diverse biological processes. LncRNAs are found in genomes ranging from animals to plants. PlncRNADB is a searchable database of lncRNA sequences and annotations in plants. We built a pipeline for lncRNA prediction in plants, providing a convenient utility for users to quickly distinguish potential noncoding RNAs from protein-coding transcripts. More than five thousand lncRNAs were collected from four plant species (Arabidopsis thaliana, Arabidopsis lyrata, Populus trichocarpa and Zea mays), complementing the A. thaliana lncRNAs in another database, PLncDB. Moreover, our database provides the relationships between lncRNAs and various RNA-binding proteins, which can be displayed through a user-friendly web interface. PlncRNADB can serve as a reference database for investigating the regulatory functions of lncRNAs in plants.

  1. The workflow of PlncRNADB
  2. Data collection
  3. RNA-seq datasets for Arabidopsis were downloaded from GEO (A. thaliana: GSM764077, GSM764078, GSM764079 and GSM701934; A. lyrata: GSM605684, GSM605685, GSM605686 and GSM605687). All single-end, strand-specific reads were aligned independently to the reference genome using the spliced read aligner TopHat v1.3.3. The transcriptome of each sample was then assembled separately from the mapped reads with the ab initio transcriptome assembler Cufflinks. Several groups have recently reported many lncRNAs in P. trichocarpa and Z. mays; these lncRNAs were retrieved from the literature (Shuai P et al. 2014, J. Exp. Bot. and Li L et al. 2014).
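The per-sample alignment and assembly described above might be scripted roughly as follows. This is only a sketch: the directory layout, FASTQ file names, Bowtie index path, and the `build_commands` helper are hypothetical, and the library type is left as a required parameter because the text states only that the reads are strand-specific. The `-o` and `--library-type` options and the `accepted_hits.bam` output name are standard for TopHat and Cufflinks.

```python
import os

# Sample accessions listed in the text, grouped by species.
SAMPLES = {
    "A. thaliana": ["GSM764077", "GSM764078", "GSM764079", "GSM701934"],
    "A. lyrata": ["GSM605684", "GSM605685", "GSM605686", "GSM605687"],
}


def build_commands(sample, bowtie_index, reads_fastq, library_type, work_dir="runs"):
    """Return (tophat_cmd, cufflinks_cmd) as argument lists for one sample.

    All paths are hypothetical; library_type must be chosen to match the
    actual strand-specific protocol (e.g. "fr-firststrand").
    """
    out_dir = os.path.join(work_dir, sample)
    # TopHat: spliced alignment of single-end reads to the reference genome.
    tophat_cmd = [
        "tophat", "-o", out_dir,
        "--library-type", library_type,
        bowtie_index, reads_fastq,
    ]
    # Cufflinks: ab initio assembly from this sample's alignments only.
    cufflinks_cmd = [
        "cufflinks", "-o", os.path.join(out_dir, "cufflinks"),
        "--library-type", library_type,
        os.path.join(out_dir, "accepted_hits.bam"),
    ]
    return tophat_cmd, cufflinks_cmd


if __name__ == "__main__":
    for species, samples in SAMPLES.items():
        for sample in samples:
            # Index and FASTQ names below are placeholders, not real paths.
            for cmd in build_commands(sample, "index/genome", sample + ".fastq",
                                      "fr-firststrand"):
                print(species, ":", " ".join(cmd))
```

The commands are only printed here; each list could be passed to `subprocess.run(cmd, check=True)` once the paths point to real files.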

  4. LncRNA prediction
  5. A step-wise pipeline was built to identify lncRNAs. Three filters were applied: (1) any transcript shorter than 200 nt was discarded; (2) transcripts were filtered by ORF length with a cutoff of 120 amino acids (AA); (3) the protein-coding potential of the remaining transcripts was assessed with CPAT, trained on A. thaliana lncRNA data (Liu J et al. 2012, Plant Cell).
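The first two filters can be illustrated with a minimal sketch. It rests on assumptions not stated in the text: that transcripts whose longest ORF exceeds 120 AA are the ones discarded, that ORFs run from ATG to the first in-frame stop codon, and that only the forward strand is scanned (the reads are strand-specific). The function names are hypothetical, and the CPAT step is not reproduced; transcripts passing these two filters would still need to be scored by CPAT.

```python
MIN_TRANSCRIPT_NT = 200  # filter (1): minimum transcript length from the text
MAX_ORF_AA = 120         # filter (2): cutoff from the text; direction assumed

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_aa(seq):
    """Longest ATG-to-stop ORF, in amino acids, over the three forward frames.

    ORFs without an in-frame stop codon are ignored (an assumption).
    """
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i
            elif codon in STOP_CODONS:
                # Count codons from ATG up to, but not including, the stop.
                best = max(best, (i - start) // 3)
                start = None
    return best


def passes_prefilters(seq):
    """True if the transcript survives the length and ORF-length filters."""
    if len(seq) < MIN_TRANSCRIPT_NT:
        return False
    return longest_orf_aa(seq) <= MAX_ORF_AA


if __name__ == "__main__":
    candidate = "ATG" + "GCT" * 4 + "TAA" + "C" * 200  # short ORF, long transcript
    coding = "ATG" + "GCT" * 150 + "TAA"               # 151-AA ORF
    print(passes_prefilters(candidate), passes_prefilters(coding))
```

Running the cheap length and ORF checks first means that only the surviving transcripts need a coding-potential score.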

  6. LncRNA-protein interaction
  7. RNA-binding proteins were downloaded from the RNA binding protein database. Interactions between lncRNAs and RNA-binding proteins were predicted with the catRAPID web server (Agostini F et al. 2013, Bioinformatics).

  8. Summary of the interaction
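  9. | Species | Source | Total RBP No. | Library RBP No. | Total lncRNA No. | LncRNA No. | Interaction No. |
     |---|---|---|---|---|---|---|
     | Arabidopsis thaliana | RiceRBP | 163 | 35 | 390 | 109 | 948 |
     | Zea mays | RiceRBP | 190 | 42 | 1704 | 1107 | 22133 |
     | Arabidopsis lyrata | BLAST to Phytozome | 10337 | 299 | 472 | 133 | 6420 |
     | Populus trichocarpa | BLAST to Phytozome | 10469 | 423 | 2542 | 1016 | 44301 |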

References:

  1. Shuai P, Liang D, Tang S et al. Genome-wide identification and functional prediction of novel and drought-responsive lincRNAs in Populus trichocarpa, J Exp Bot 2014.
  2. Li L, Eichten SR, Shimizu R et al. Genome-wide discovery and characterization of maize long non-coding RNAs, Genome Biol 2014;15:R40.
  3. Zhang W, Han Z, Guo Q et al. Identification of maize long non-coding RNAs responsive to drought stress, PLoS One 2014;9:e98958.
  4. Jin J, Liu J, Wang H et al. PLncDB: plant long non-coding RNA database, Bioinformatics 2013;29:1068-71.
  5. Liu J, Jung C, Xu J et al. Genome-Wide Analysis Uncovers Regulation of Long Intergenic Noncoding RNAs in Arabidopsis, Plant Cell 2012;24:4333-45.
  6. Bai Y, Dai X, Harrison A, Chen M. RNA regulatory networks in animals and plants: a long noncoding RNA perspective, Brief Funct Genomics 2014; doi: 10.1093/bfgp/elu017.