Genome-wide Characterization of the Genomic and Splicing Features of Long Non-coding RNAs Using Bioinformatics Approaches

ABOU ALEZZ, MONAH
2020-12-15

Abstract

Long non-coding RNAs (lncRNAs) are recognized as a new class of regulatory molecules associated with organismal complexity, although very little is known about their functions in cellular processes. Because of their overall low expression level and tissue specificity, the identification and annotation of lncRNA genes remains challenging. LncRNAs show a low level of sequence conservation, but the evolutionary constraint on lncRNA sequences is localized at splicing regulatory elements, suggesting that the recognition of intron boundaries and their splicing is a crucial step required for lncRNA function. We exploited recent annotations from the GENCODE compendium to characterize, using bioinformatics approaches, the splicing features of long non-coding genes in comparison to protein-coding genes in the human and mouse genomes. A significant difference in splice site usage was observed between the two gene classes. Whereas non-canonical GC-AG splice junctions account for about 0.8% of total splice sites in protein-coding genes, we identified a remarkable enrichment of GC-AG splice sites in long non-coding genes, in both human (3.0%) and mouse (1.9%). In addition, GC-AG introns showed peculiar characteristics in terms of intron length, a positional bias toward the first intron, donor and acceptor splice site strength, the poly-pyrimidine tract, and alternative polyadenylation signaling. Genes containing GC-AG introns were found to be conserved in many species across large evolutionary distances, and a functional analysis pointed toward their enrichment in specific biological processes such as DNA repair. Moreover, GC-AG introns appeared more prone to alternative splicing and were enriched in a special alternative splicing mechanism termed wobble-splicing. Wobble-splicing appeared to be a rare mechanism, subject to tissue-specific regulation and involved in inducing subtle changes in the expressed isoforms, with a putative regulatory role. Taken together, our data suggest that GC-AG introns represent new regulatory elements mainly associated with lncRNAs, which could contribute to the evolution of complexity by adding a new layer to the regulation of gene expression.
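
The splice site usage comparison summarized above can be illustrated with a minimal sketch. It is not the pipeline used in the thesis: the function names and the toy intron sequences are hypothetical, and it assumes that intron sequences have already been extracted from an annotation in the orientation of the transcribed strand. Each intron is labeled by its first two (donor) and last two (acceptor) nucleotides, and the share of each class, such as GC-AG, is then computed.

```python
from collections import Counter


def splice_site_class(intron_seq: str) -> str:
    """Label an intron by its donor and acceptor dinucleotides, e.g. 'GT-AG'."""
    seq = intron_seq.upper()
    if len(seq) < 4:
        raise ValueError("intron too short to contain both splice sites")
    return f"{seq[:2]}-{seq[-2:]}"


def splice_site_frequencies(introns):
    """Return the fraction of introns falling into each dinucleotide class."""
    counts = Counter(splice_site_class(seq) for seq in introns)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


# Toy sequences invented for illustration only; they are not data from the thesis.
toy_introns = [
    "GTAAGTCTTTCAG",   # GT-AG
    "GCAAGTTTTTTCAG",  # GC-AG
    "GTGAGTCCCTTAG",   # GT-AG
    "GTAAGACCTTCAG",   # GT-AG
]

freqs = splice_site_frequencies(toy_introns)
print(freqs)
```

Running the same function separately on the introns of long non-coding and protein-coding genes would give the per-class percentages that such a comparison relies on.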
Files in this item:
File: abou_alezz_thesis_final.pdf
Access: open access
Description: Genome-wide Characterization of the Genomic and Splicing Features of Long Non-coding RNAs Using Bioinformatics Approaches
Type: Doctoral thesis
Size: 14.06 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11571/1370054