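FY2003 Taikichiro Mori Memorial Research Grant: Results Report

Analysis of sense-antisense transcriptional regulation mechanisms in the nematode
Koji Numata
Keio University, Graduate School of Media and Governance, Doctoral Program, 1st year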


1. Abstract 2. Motivations 3. Results and Discussions 4. Conclusions 5. Acknowledgements 6. References
1. ABSTRACT

Naturally occurring sense-antisense transcription, in which a pair of transcripts contains reciprocally complementary sequences, is often thought to be a critical phenomenon in RNA-mediated regulation, such as developmental control, gene silencing, and translational repression. In this study, we focused on sense-antisense transcription in the nematode Caenorhabditis elegans. Hundreds of trans- sense-antisense pairs, e.g. let-7 and lin-4, have been reported in the worm, whereas the existence of cis- sense-antisense pairs and their regulation are poorly understood. Using the C. elegans genome annotation, we implemented a computational genome-wide screen to identify cis- sense-antisense pairs, which overlap at the same genomic locus but in opposite orientations. The initial computational screen showed that 5% of the genes in C. elegans are bi-directionally overlapped (518 pairs). In particular, 252 of these pairs overlap within their exonic regions, such as 3’ external exons. On the other hand, genes located within introns of genes on the opposite strand are also frequent (243 pairs). Several of these pairs are conserved in the genome of C. briggsae, a related soil nematode. Additionally, retrieval of microarray expression profiles revealed that the expression of several pairs is correlated, either positively or negatively, across developmental stages. Our results indicate that sense-antisense overlapped genes are common in the C. elegans genome, which implies that general regulatory mechanisms may exist in C. elegans.


2. MOTIVATIONS

Following the sequencing and annotation of genomes and transcriptomes in several eukaryotes, the importance of RNA-mediated gene regulation has become more evident (Eddy 2002; Szymanski and Barciszewski 2002; Numata, Kanai et al. 2003). At the same time, a growing number of endogenously transcribed antisense RNAs have been observed in various eukaryotic organisms. Although the primary effects of antisense transcripts in cells are not well documented, a body of experimental evidence suggests that they are involved in critical cellular phenomena, such as transcriptional/translational regulation, mRNA processing and chromosomal inactivation (Vanhee-Brossollet and Vaquero 1998).

There are two models of sense-antisense transcription. One is trans- encoded antisense transcription, in which the two transcripts are expressed from different genomic loci. In this case the transcripts have complementary sequences at the mature level. The other is cis- encoded antisense transcription, in which the transcripts are transcribed from the same genomic locus but in the opposite orientation.

In this study, we focused in particular on cis- encoded antisense transcripts. Whereas trans- encoded RNAs and their regulatory mechanisms have received wide attention, the existence and effects of cis- antisense transcripts are not clearly understood. In mammals especially, although a large number of cis- antisense transcripts have been collected from fully sequenced transcriptome resources, the relationship between these antisense transcripts and the corresponding sense transcripts has not been established.

For further functional analysis and validation of cis- antisense transcripts, we used Caenorhabditis elegans. C. elegans is a good model organism because an enormous amount of information is available on each of the nearly 20,000 genes in its complete genome, and mutant strains can be readily obtained. Likewise, other genome-wide experimental data, such as RNAi phenotypes and DNA microarray expression profiles, are publicly available. We therefore performed a genome-wide analysis of bi-directionally overlapped gene pairs in C. elegans, classifying their overlap patterns and examining their conservation in the genome of C. briggsae, a related soil nematode that diverged roughly 100 million years ago (Stein, Bao et al. 2003). The sense-antisense pairs listed in this study are candidates for further experimental analysis.


3. RESULTS AND DISCUSSIONS

3.1 Genome-wide extraction of bi-directionally overlapped gene pairs
We first listed pairs of genes that are located at the same genomic locus but on opposite strands. Because the number of publicly available full-length transcript (cDNA) sequences is too small, we initially used all protein-coding mRNAs (19,917 sequences), including those annotated as hypothetical proteins, according to the Caenorhabditis elegans genome annotation. As a result, 518 bi-directionally overlapped gene pairs (comprising 986 genes) were extracted. A similar number of bi-directionally overlapped gene pairs was predicted in the D. melanogaster genome using the same approach.
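
The extraction step described above can be sketched as follows. This is only a minimal illustration, not the pipeline used in this study: the `Gene` record, the function name `find_antisense_pairs`, and the 1-based inclusive coordinate convention are all assumptions made for the example, and annotated genes are reduced to simple start/end spans.

```python
from collections import defaultdict
from typing import NamedTuple


class Gene(NamedTuple):
    """Hypothetical minimal gene record (not the annotation format of the study)."""
    name: str
    chrom: str
    start: int   # 1-based, inclusive
    end: int     # inclusive
    strand: str  # "+" or "-"


def find_antisense_pairs(genes):
    """Return (plus-strand name, minus-strand name) for overlapping opposite-strand genes."""
    by_chrom = defaultdict(list)
    for g in genes:
        by_chrom[g.chrom].append(g)

    pairs = []
    for chrom_genes in by_chrom.values():
        chrom_genes.sort(key=lambda g: g.start)
        active = []  # genes that may still overlap genes starting later
        for g in chrom_genes:
            # Drop genes that end before the current gene starts.
            active = [a for a in active if a.end >= g.start]
            for a in active:
                if a.strand != g.strand:
                    plus, minus = (a, g) if a.strand == "+" else (g, a)
                    pairs.append((plus.name, minus.name))
            active.append(g)
    return pairs
```

Sorting by start coordinate and keeping a short list of still-open genes avoids comparing every gene against every other gene on the chromosome.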

We then categorized the overlap patterns into six groups according to exon/intron structure. In particular, in 243 pairs (46.9%) one gene is located completely within an intron of the opposite gene. Although the genes in this category do not share complementary regions in their mature transcripts, regulation at the transcription and/or processing level is likely to occur. In mice, however, the two transcripts of most pairs with this pattern have not been cloned from the same library, even though several hundred such pairs have been identified (Kiyosawa, Yamanaka et al. 2003). This implies that these bi-directionally overlapped genes may have a remnant origin.

On the other hand, whereas only 15 pairs overlap in their 5’ regions, approximately 40% of the listed pairs (215 pairs) overlap in their 3’ regions. Of these 215 pairs, 118 (55%) overlap in their external exons. The same tendency is seen in mammals: the majority of overlapping pairs in human and mouse have a 3’-to-3’ arrangement (Lehner, Williams et al. 2002; Shendure and Church 2002), and a large number of overlaps within 3’ or 5’ external exons have been reported in mouse and human (Kiyosawa, Yamanaka et al. 2003; Yelin, Dahary et al. 2003). Likewise, several genes in other eukaryotes, such as A. thaliana and D. melanogaster, overlap in their 3’ boundary regions (Quesada, Ponce et al. 1999; Peters, Rohrbach et al. 2003). Although the origin of these 3’-to-3’ overlaps remains unclear, our results indicate that overlaps between 3’ boundaries are frequently observed not only in mammals but also in C. elegans. This overlap pattern might be involved in critical cellular processes, e.g. RNA-mediated regulation. Further experimental validation will be required, especially for pairs conserved across species.
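
As an illustration of how such patterns can be told apart from exon coordinates, the sketch below distinguishes three of the arrangements discussed above (intronic, 3’-to-3’, 5’-to-5’) and lumps everything else into "other". It does not reproduce the six groups of this study, whose exact definitions are not given here; the function names and the representation of a gene as a list of (start, end) exons are assumptions.

```python
def introns(exons):
    """Introns as the gaps between sorted (start, end) exons, inclusive coordinates."""
    exons = sorted(exons)
    return [(e1 + 1, s2 - 1)
            for (_, e1), (s2, _) in zip(exons, exons[1:])
            if s2 - e1 > 1]


def within_intron(inner_span, host_exons):
    """True if the whole span lies inside a single intron of the host gene."""
    s, e = inner_span
    return any(i_s <= s and e <= i_e for i_s, i_e in introns(host_exons))


def classify_pair(plus_exons, minus_exons):
    """Classify an overlapping pair given exons of the plus- and minus-strand gene."""
    ps, pe = min(s for s, _ in plus_exons), max(e for _, e in plus_exons)
    ms, me = min(s for s, _ in minus_exons), max(e for _, e in minus_exons)
    if within_intron((ms, me), plus_exons) or within_intron((ps, pe), minus_exons):
        return "intronic"
    # Plus-strand 3' end is its highest coordinate; minus-strand 3' end is its lowest.
    if ps < ms <= pe < me:
        return "3'-to-3'"
    if ms < ps <= me < pe:
        return "5'-to-5'"
    return "other"
```

For example, a minus-strand gene spanning 250-350 that sits between plus-strand exons at 100-200 and 400-500 is reported as "intronic".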

3.2 Expression profiles in developmental stages
To validate the actual expression of the genes in each pair, we used a publicly available microarray expression profile across developmental stages (Jiang, Ryu et al. 2001). In this dataset, expression profiles for both genes were available for 366 of the listed pairs. For these pairs, we calculated the correlation coefficient between the expression profiles of the two genes. As a result, 38 pairs (10.4%) showed positively correlated expression (examples are shown in Fig. 1). The first example is a pair of genes annotated as an annexin-family protein and a hypothetical protein (Fig. 1A). Depending on the isoform of the hypothetical protein, the two genes are either overlapping or non-overlapping. Both genes show reduced expression at the L3 stage. In the other example (Fig. 1B), both genes are annotated as hypothetical proteins, and their expression increases slightly through the developmental stages. These and other correlated pairs might reciprocally regulate each other's expression through mechanisms such as translational repression and ADAR-dependent RNA editing (Peters, Rohrbach et al. 2003).
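
A correlation of this kind can be computed as in the sketch below. It assumes a Pearson correlation coefficient over per-stage expression values; the report does not state which coefficient or which cutoff was used, so the `cutoff` value of 0.8 and the function names are purely hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2 stages)")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")  # a flat profile has no defined correlation
    return cov / sqrt(vx * vy)


def label_pair(r, cutoff=0.8):
    """Label a pair by its coefficient; the cutoff is a hypothetical choice."""
    if r >= cutoff:
        return "positive"
    if r <= -cutoff:
        return "negative"
    return "uncorrelated"
```

Each profile here is a list of expression values ordered by developmental stage, one value per stage, with both genes of a pair measured at the same stages.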


FIGURE 1. Examples of positively correlated gene pairs
Two examples of pairs with positively correlated expression are shown with their exon/intron structures (left) and their expression changes through the developmental stages (right). The arrows indicate the direction of transcription. Black boxes and bars indicate exons and introns, respectively.

On the other hand, 22 pairs (6.0%) showed negatively correlated expression (examples are shown in Fig. 2). Indeed, cases in which sense-antisense overlapped transcripts are inversely expressed have previously been reported in N. crassa and D. melanogaster (Kramer, Loros et al. 2003; Peters, Rohrbach et al. 2003). The examples shown here involve mig-13, which encodes a trans-membrane protein expressed in the anterior and central body regions, and pho-1, which encodes an intestinal acid phosphatase. In both cases, because the overlapped genes are transcribed at different developmental times, regulation is inferred to occur at the transcriptional level rather than between the two mature transcripts.


FIGURE 2. Examples of negatively correlated gene pairs
Two examples of pairs with negatively correlated expression are shown with their exon/intron structures (left) and their expression changes through the developmental stages (right). The arrows indicate the direction of transcription. Black boxes and bars indicate exons and introns, respectively.


4. CONCLUSIONS

In this study, we focused on cis- encoded sense-antisense overlapped genes. The genome-wide extraction indicated that 5% of all genes in C. elegans are bi-directionally overlapped with other genes. Categorization of the overlap patterns showed that two patterns are prominent in C. elegans. One is a gene located completely within an intron of the opposite gene. Several of these cases involve transposase genes, which implies that this type of overlap arose through evolutionary genomic rearrangement. The other frequent pattern is reciprocal 3’-to-3’ overlap, especially within external exons. Because this pattern is also abundant in several other eukaryotes, RNA-mediated gene regulation may occur in the cell by means of 3’-to-3’ overlap. Some cases of both patterns are conserved in the genome of the related nematode C. briggsae. It may be necessary to apply a similar analysis not only to C. briggsae but also to other species. Retrieval of publicly available microarray expression profiles showed both positive and negative correlations in expression changes across developmental stages. At present, the number of extracted candidates is too large, and further enrichment for reliable pairs is required before experimental validation.


5. ACKNOWLEDGEMENTS

We thank Dr. Akio Kanai and Dr. Rintaro Saito for helpful discussions and technical support, and Dr. Ben Lehner, Mr. Nozomu Yachie and Ms. Hiromi Kochiwa for stimulating discussions. We also thank Prof. Masaru Tomita for all of his support.


6. REFERENCES

Eddy, S. R. (2002). "Computational genomics of noncoding RNA genes." Cell 109(2): 137-40.

Jiang, M., J. Ryu, et al. (2001). "Genome-wide analysis of developmental and sex-regulated gene expression profiles in Caenorhabditis elegans." Proc Natl Acad Sci U S A 98(1): 218-23.

Kiyosawa, H., I. Yamanaka, et al. (2003). "Antisense transcripts with FANTOM2 clone set and their implications for gene regulation." Genome Res 13(6B): 1324-34.

Kramer, C., J. J. Loros, et al. (2003). "Role for antisense RNA in regulating circadian clock function in Neurospora crassa." Nature 421(6926): 948-52.

Lehner, B., G. Williams, et al. (2002). "Antisense transcripts in the human genome." Trends Genet 18(2): 63-5.

Numata, K., A. Kanai, et al. (2003). "Identification of putative noncoding RNAs among the RIKEN mouse full-length cDNA collection." Genome Res 13(6B): 1301-6.

Peters, N. T., J. A. Rohrbach, et al. (2003). "RNA editing and regulation of Drosophila 4f-rnp expression by sas-10 antisense readthrough mRNA transcripts." RNA 9(6): 698-710.

Quesada, V., M. R. Ponce, et al. (1999). "OTC and AUL1, two convergent and overlapping genes in the nuclear genome of Arabidopsis thaliana." FEBS Lett 461(1-2): 101-6.

Shendure, J. and G. M. Church (2002). "Computational discovery of sense-antisense transcription in the human and mouse genomes." Genome Biol 3(9): RESEARCH0044.

Stein, L. D., Z. Bao, et al. (2003). "The Genome Sequence of Caenorhabditis briggsae: A Platform for Comparative Genomics." PLoS Biol 1(2): E45.

Szymanski, M. and J. Barciszewski (2002). "Beyond the proteome: non-coding regulatory RNAs." Genome Biol 3(5): REVIEWS0005.

Vanhee-Brossollet, C. and C. Vaquero (1998). "Do natural antisense transcripts make sense in eukaryotes?" Gene 211(1): 1-9.

Yelin, R., D. Dahary, et al. (2003). "Widespread occurrence of antisense transcription in the human genome." Nat Biotechnol 21(4): 379-86.