Mori Fund Research Results Report, Academic Year 2005

 

Large-Scale Analysis of Expressed Pseudogenes in Higher Eucaryotes

 

Hayataro Kochi

Graduate School of Media and Governance

 

 

Abstract

 

Pseudogenes are genomic DNA sequences that are homologous to functional genes.  They do not encode proteins because their coding regions carry many disabling mutations, such as frameshifts and in-frame stop codons.  However, a processed pseudogene (Pψg) with an important function, Makorin1-p1, was recently found in mouse.  Additionally, an expressed pseudogene is known to act as a natural antisense regulator of its homologous functional gene in snail.  These facts suggest that some expressed pseudogenes may have functions.

 To identify expressed processed pseudogenes (EPψgs), infer their regulatory roles, and investigate how they are transcribed, we predicted EPψgs while taking strand information into account and analyzed CpG sequence patterns in their promoter regions.  As a result, 16.1% and 9.0% of Pψgs in human and mouse, respectively, were identified as EPψgs, some showing expression from both the sense and antisense strands.  There was no significant increase in the CpG O/E value around the transcriptional start site.  We suggest that Pψgs may have functions and that the transcriptional regulation of EPψgs may differ from the general transcriptional regulation of functional genes.

 

 

 

 

 

1  Introduction

 

 Pseudogenes are genomic DNA sequences that are homologous to functional genes.  They do not encode proteins because their coding regions carry many disabling mutations, such as frameshifts and in-frame stop codons.  Pseudogenes have therefore been regarded simply as molecular fossils.  However, a processed pseudogene with an important function, Makorin1-p1, was recently found in mouse [2].  Additionally, an expressed pseudogene is known to act as a natural antisense regulator of its homologous functional gene in snail [3].  These facts suggest that pseudogenes may have functions.  To identify expressed processed pseudogenes (EPψgs), infer their regulatory roles, and investigate how they are transcribed, we predicted EPψgs while taking strand information into account and analyzed CpG sequence patterns in their promoter regions.

 

2  Method and Results

 

 Human and mouse processed pseudogene (Pψg) data were downloaded from Pseudogene.org [5].  Exon information was checked to confirm that the entries were indeed Pψgs.  We obtained 4900 and 2511 Pψgs for human and mouse, respectively.  Sequences in dbEST, RefSeq, and UniGene were downloaded from the NCBI FTP server.  We also used cDNA sequences from H-Invitational and FANTOM.  The outline of our method for extracting EPψg candidates is shown in Fig. 1.  Pseudogenes were predicted to be EPψgs if cDNA sequences contained sequence patterns specific to the pseudogenes.  cDNA sequences homologous to pseudogene sequences were checked against the following strict criteria: (1) the cDNA maps to the same genomic region as the pseudogene, (2) the cDNA has higher identity to the pseudogene sequence than to the functional gene sequence, (3) the cDNA contains more Pψg-specific regions than functional-gene-specific regions, and (4) the cDNA sequence contains frameshifts or stop codons.  The strand encoding each transcript was also analyzed using sim4.
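 The four criteria can be read as a conjunctive filter over each cDNA alignment.  The following is a minimal sketch of that reading, not the pipeline used in this work: the `CdnaHit` record, its field names, and the `supports_expression` function are hypothetical, and the alignment steps that would fill in the fields are assumed to have been run beforehand.

```python
from dataclasses import dataclass


@dataclass
class CdnaHit:
    """Hypothetical summary of one cDNA compared with a pseudogene and its functional gene."""
    maps_to_pseudogene_locus: bool        # criterion (1)
    identity_to_pseudogene: float         # criterion (2), fraction between 0 and 1
    identity_to_functional_gene: float    # criterion (2), fraction between 0 and 1
    pseudogene_specific_sites: int        # criterion (3)
    functional_gene_specific_sites: int   # criterion (3)
    has_frameshift_or_stop: bool          # criterion (4)


def supports_expression(hit: CdnaHit) -> bool:
    """Return True only if the cDNA satisfies all four criteria at once."""
    return (
        hit.maps_to_pseudogene_locus
        and hit.identity_to_pseudogene > hit.identity_to_functional_gene
        and hit.pseudogene_specific_sites > hit.functional_gene_specific_sites
        and hit.has_frameshift_or_stop
    )
```

 Under this reading, a pseudogene would be called an EPψg candidate when at least one cDNA passes the filter; that aggregation rule is also an assumption of the sketch.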

 CpG observed/expected (O/E) values were calculated for the promoter regions of the EPψg candidates predicted from full-length cDNA data and compared with those of DBTSS data, functional genes, and intronic regions (negative control).
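 The exact O/E formula is not stated in this report.  As an illustration, the sketch below assumes the commonly used definition, in which the observed number of CpG dinucleotides is divided by the number expected from the C and G counts of the same sequence: O/E = (CpG count × length) / (C count × G count).  The function name `cpg_observed_expected` is hypothetical.

```python
def cpg_observed_expected(seq: str) -> float:
    """CpG O/E under the assumed definition: (#CpG * N) / (#C * #G)."""
    seq = seq.upper()
    n = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    if c_count == 0 or g_count == 0:
        # No CpG is possible; report 0.0 rather than dividing by zero.
        return 0.0
    return cpg_count * n / (c_count * g_count)
```

 A value near 1 means CpG occurs about as often as the base composition predicts, while values well below 1 indicate CpG depletion.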

 

 

Figure 1: Outline of our method to extract EPψg candidates

 

 

 

Using the above method, 788 and 227 EPψg candidates (16.1% and 9.0% of Pψgs) were found in human and mouse, respectively (Table 1).  Strand analysis showed that at least 13.8% and 16.3% of EPψgs appeared to be expressed from the antisense strand of the Pψgs in human and mouse, respectively (Table 2).

 No increase in the CpG O/E value was observed around the transcriptional start site (TSS) on either strand of EPψgs in either human or mouse (Fig. 2).  The same pattern was observed for intronic regions.  In contrast, a significant increase was observed around the TSS in functional genes and DBTSS.
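 An averaged O/E profile of this kind can be computed by sliding a window across promoter sequences aligned on the TSS and averaging the per-window values over all sequences.  The sketch below is only an illustration: the window and step sizes, the function names, and the O/E definition, (CpG count × length) / (C count × G count), are assumptions and are not taken from this report.

```python
from typing import List


def cpg_oe(seq: str) -> float:
    """CpG O/E under the assumed definition: (#CpG * N) / (#C * #G)."""
    seq = seq.upper()
    c_count, g_count = seq.count("C"), seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c_count * g_count)


def average_oe_profile(promoters: List[str], window: int = 100, step: int = 50) -> List[float]:
    """Average windowed CpG O/E over equal-length sequences aligned on the TSS."""
    if not promoters:
        raise ValueError("at least one promoter sequence is required")
    length = len(promoters[0])
    if any(len(p) != length for p in promoters):
        raise ValueError("all promoter sequences must have the same length")
    profile = []
    for start in range(0, length - window + 1, step):
        values = [cpg_oe(p[start:start + window]) for p in promoters]
        profile.append(sum(values) / len(values))
    return profile
```

 With such a profile, a peak in the windows covering the TSS would correspond to the increase seen for functional genes, and a flat profile to the pattern seen for EPψgs and intronic regions.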

 

 

 

Table 1: Number of EPψg candidates found in human (above) and mouse (below)

 

 

 

 

 

Table 2: Number of EPψgs whose strand information was confirmed by exon-intron consensus sequences.

cDNAs were mapped to the genome using sim4, which takes exon-intron consensus sequences such as GT-AG into account, to validate the strand information.  2426 and 702 cDNAs were used for human and mouse, respectively.  18% (85/476) and 11% (13/120) of cDNAs were mapped to the strand opposite to the one recorded in the database.
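The idea behind using the GT-AG consensus to validate strand can be sketched as follows.  This is not how sim4 is implemented; it is a simplified illustration in which the function name `infer_strand`, the coordinate convention, and the majority-vote rule are all assumptions.  An intron transcribed from the forward strand reads GT...AG on the genome, whereas one transcribed from the reverse strand appears as CT...AC, the reverse complement.

```python
from typing import List, Optional, Tuple


def infer_strand(genome: str, introns: List[Tuple[int, int]]) -> Optional[str]:
    """Infer transcript strand from intron boundary dinucleotides.

    introns holds (start, end) pairs, 0-based and half-open, on the forward
    genomic strand. Returns "+", "-", or None when the evidence is tied or absent.
    """
    plus = minus = 0
    for start, end in introns:
        first_two = genome[start:start + 2].upper()
        last_two = genome[end - 2:end].upper()
        if first_two == "GT" and last_two == "AG":
            plus += 1    # canonical intron on the forward strand
        elif first_two == "CT" and last_two == "AC":
            minus += 1   # canonical intron on the reverse strand
    if plus > minus:
        return "+"
    if minus > plus:
        return "-"
    return None
```

A cDNA whose inferred strand disagrees with its database annotation would then be counted as mapped to the opposite strand.  Unspliced cDNAs carry no such evidence, which is why the function returns None when no canonical intron is found.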

 

 

 

 

 

  

Figure 2: Average CpG O/E ratio around the TSS (right: human, left: mouse)

 

 

 

3  Discussion

 

 In this work, we found that 16.1% and 9.0% of human and mouse Pψgs, respectively, are expressed.  These percentages are much higher than those reported by Harrison et al. [1] (4 to 6% in human) and Yano et al. [4] (2 to 3% in human and 0.5 to 1% in mouse), which we attribute to our use of multiple databases and to a prediction algorithm that focuses on pseudogene-specific regions.  We also found that many EPψgs could act as antisense regulators of their homologous genes.  In the CpG analysis, there was no increase in the O/E ratio around the TSS on either the sense or the antisense strand of EPψgs, suggesting that the transcriptional regulation of EPψgs could differ from the general transcriptional regulation of functional genes.

 

4  Acknowledgements

 

 We are grateful to Dr. Rintaro Saito for scientific discussions and advice.  We also thank Prof. Masaru Tomita for his comprehensive support.

 

 

Publications

 

“Large-Scale Analysis of Expressed Pseudogenes in Higher Eucaryotes”

Kochi H, Saito R, Tomita M, Genome Informatics 2005; 16; P098

 

Oral Contributions

 

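“Computational Analysis of Transcribed Pseudogenes in Higher Organisms” (in Japanese)

Saito R, Kochi H, Tomita M, Meeting on Discovering New RNA/RNP (新しいRNA/RNPを見つける会) in Tsuruoka, 2005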

 

Poster Presentations

 

“Large-Scale Analysis of Expressed Pseudogenes in Higher Eucaryotes”

Kochi H, Saito R, Tomita M, the Sixteenth International Conference on Genome Informatics GIW 2005, 2005

 

References

[1] Harrison, PM., Zheng, D., Zhang, Z., et al., Transcribed processed pseudogenes in the human genome: an intermediate form of expressed retrosequence lacking protein-coding ability, Nucleic Acids Res., 33(8):2374-2383, 2005.

[2] Hirotsune, S., Yoshida, N., Chen, A., et al., An expressed pseudogene regulates the messenger-RNA stability of its homologous coding gene, Nature, 423(6935):91-96, 2003.

[3] Korneev, SA., Park, JH., O’Shea, M., Neuronal expression of neural nitric oxide synthase (nNOS) protein is suppressed by an antisense RNA transcribed from an NOS pseudogene, J Neurosci., 19(18):7711-7720, 1999.

[4] Yano, Y., Saito, R., Yoshida, N., et al., A new role for expressed pseudogenes as ncRNA: regulation of mRNA stability of its homologous coding gene, J Mol Med., 82(7):414-422, 2004.

[5] Zhang, Z., Carriero, N., Gerstein, M., Comparative analysis of processed pseudogenes in the mouse and human genomes, Trends Genet., 20(2):62-67, 2004.