Long non-coding RNAs (lncRNAs) are recognized as a new class of regulatory molecules associated with organismal complexity, although very little is known about their functions in cellular processes. Because of their overall low expression level and tissue specificity, the identification and annotation of lncRNA genes remain challenging. LncRNAs show a low level of sequence conservation, but evolutionary constraint on lncRNA sequences is localized at splicing regulatory elements, suggesting that recognition of the intron boundaries and their splicing is a crucial step required for lncRNA function. We used recent annotations from the GENCODE compendium to characterize, with bioinformatics approaches, the splicing features of long non-coding genes in comparison with protein-coding genes in the human and mouse genomes. A significant difference in splice site usage was observed between the two gene classes. While non-canonical GC-AG splice junctions represent about 0.8% of total splice sites in protein-coding genes, we identified a remarkable enrichment of GC-AG splice sites in long non-coding genes, both in human (3.0%) and mouse (1.9%). In addition, GC-AG introns showed peculiar characteristics in terms of intron length, a positional bias toward the first intron, donor and acceptor splice site strength, polypyrimidine tract, and alternative polyadenylation signaling. Genes containing GC-AG introns were found to be conserved in many species across large evolutionary distances, and a functional analysis pointed toward their enrichment in specific biological processes such as DNA repair. Moreover, GC-AG introns appeared more prone to alternative splicing and were enriched in a special alternative splicing mechanism termed wobble-splicing. Wobble-splicing appeared to be a rare mechanism, subject to tissue-specific regulation and involved in inducing subtle changes in the expressed isoforms, with a putative regulatory role. Taken together, our data suggest that GC-AG introns represent new regulatory elements mainly associated with lncRNAs, which could contribute to the evolution of complexity by adding a new layer to the regulation of gene expression.
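As an illustration only, the splice junction classification that underlies the GC-AG frequencies reported above can be sketched as follows. This is a minimal sketch, not the pipeline used in the thesis: it assumes intron sequences are already available as sense-strand strings, the function names `splice_site_class` and `junction_frequencies` are hypothetical, and the toy sequences are invented for the example.

```python
from collections import Counter


def splice_site_class(intron_seq: str) -> str:
    """Return the donor-acceptor dinucleotide pair of an intron, e.g. 'GT-AG' or 'GC-AG'.

    Assumes the sequence is given 5'->3' on the sense strand of the transcript.
    """
    seq = intron_seq.upper()
    if len(seq) < 4:
        raise ValueError("intron too short to contain both splice sites")
    # First two bases: donor (5') site; last two bases: acceptor (3') site.
    return f"{seq[:2]}-{seq[-2:]}"


def junction_frequencies(introns):
    """Return the fraction of introns falling in each splice junction class."""
    counts = Counter(splice_site_class(s) for s in introns)
    total = sum(counts.values())
    return {junction: n / total for junction, n in counts.items()}


# Toy, invented intron sequences (not real data).
toy_introns = [
    "GTAAGTCTTTTCAG",
    "GTGAGTTTTCCTAG",
    "GCAAGTTTCTTTAG",
    "GTAAGACCTTTCAG",
]
freqs = junction_frequencies(toy_introns)
```

In a real analysis the intron sequences would be extracted from a genome assembly using annotated exon coordinates, reverse-complementing introns on the minus strand, and the frequencies would be computed separately for protein-coding and long non-coding genes.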
Title: | Genome-wide Characterization of the Genomic and Splicing Features of Long Non-coding RNAs Using Bioinformatics Approaches |
Authors: | |
Publication date: | 15-Dec-2020 |
Handle: | http://hdl.handle.net/11571/1370054 |
Appears in types: | 8.01 Doctoral thesis |
Files in this product:
File | Description | Type | License |
---|---|---|---|
abou_alezz_thesis_final.pdf | Genome-wide Characterization of the Genomic and Splicing Features of Long Non-coding RNAs Using Bioinformatics Approaches | Doctoral thesis | Open Access |