Hayataro
Kochi
Graduate
Abstract
Pseudogenes are genomic DNA sequences that are homologous to functional genes. They do not encode proteins because their coding regions carry many mutations, i.e., frameshifts and/or in-frame stop codons. However, a processed pseudogene (Pψg) with an important function, Makorin1-p1, was recently found in mouse. In addition, an expressed pseudogene is known to act as a natural antisense regulator of its homologous functional gene in snail. These facts suggest that some expressed pseudogenes may have functions.
To predict expressed processed pseudogenes (EPψgs), infer their regulatory roles, and investigate how they are transcribed, we predicted EPψgs taking strand information into account and analyzed CpG sequence patterns in their promoter regions.
As a result, 16.1% and 9.0% of Pψgs in human and mouse, respectively, were identified as EPψgs, some of which were expressed from both the sense and antisense strands. There was no significant increase in the CpG O/E value around the transcription start site (TSS).
We suggest that Pψgs may have some functions and that the transcriptional regulation of EPψgs may differ from the general transcriptional regulation of functional genes.
Pseudogenes are genomic DNA sequences that are homologous to functional genes. They do not encode proteins because their coding regions carry many mutations, i.e., frameshifts and/or in-frame stop codons. Pseudogenes have therefore been regarded simply as molecular fossils. However, a processed pseudogene with an important function, Makorin1-p1, was recently found in mouse [2]. In addition, an expressed pseudogene is known to act as a natural antisense regulator of its homologous functional gene in snail [3]. These facts suggest that pseudogenes may have functions. To predict expressed processed pseudogenes (EPψgs), infer their regulatory roles, and investigate how they are transcribed, we predicted EPψgs taking strand information into account and analyzed CpG sequence patterns in their promoter regions.
Human and mouse processed pseudogene (Pψg) data were downloaded from Pseudogene.org [5]. Exon information was checked to confirm that they were indeed Pψgs. We obtained 4900 and 2511 Pψgs for human and mouse, respectively. Sequences in dbEST, RefSeq, and UniGene were downloaded from the NCBI FTP server. We also used cDNA sequences from H-Invitational and FANTOM. The outline of our method to extract EPψg candidates is shown in Fig. 1. Pseudogenes were predicted as EPψgs if cDNA sequences contained sequence patterns specific to the pseudogenes. cDNA sequences homologous to pseudogene sequences were checked against the following strict criteria: (1) the cDNA can be mapped to the same genomic region as the pseudogene, (2) the cDNA has a higher identity to the pseudogene sequence than to the functional gene sequence, (3) the cDNA contains more Pψg-specific regions than functional-gene-specific regions, and (4) the cDNA sequence contains frameshifts or stop codons. The strand from which each transcript is encoded was also analyzed using sim4.
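The four criteria above can be read as a conjunctive filter over each cDNA alignment. The following is only a minimal sketch under stated assumptions: the `CdnaHit` record, its field names, and the function `supports_expression` are hypothetical and are not part of the original pipeline, which is outlined only in Fig. 1.

```python
from dataclasses import dataclass


@dataclass
class CdnaHit:
    """Hypothetical summary of one cDNA compared with a Pψg and its functional gene."""
    maps_to_pseudogene_locus: bool   # criterion (1): same genomic region as the Pψg
    identity_to_pseudogene: float    # criterion (2): alignment identity, 0.0 to 1.0
    identity_to_gene: float
    pseudogene_specific_sites: int   # criterion (3): sites matching only the Pψg
    gene_specific_sites: int         #                sites matching only the gene
    has_frameshift_or_stop: bool     # criterion (4): disabling mutation present


def supports_expression(hit: CdnaHit) -> bool:
    """Return True only if the cDNA satisfies all four criteria."""
    return (
        hit.maps_to_pseudogene_locus
        and hit.identity_to_pseudogene > hit.identity_to_gene
        and hit.pseudogene_specific_sites > hit.gene_specific_sites
        and hit.has_frameshift_or_stop
    )


# Illustrative values only, not data from the study.
candidate = CdnaHit(True, 0.99, 0.95, 12, 1, True)
ambiguous = CdnaHit(True, 0.97, 0.97, 3, 3, True)
print(supports_expression(candidate), supports_expression(ambiguous))
```

Requiring every criterion at once trades sensitivity for specificity, which matches the "strict criteria" wording: a cDNA that is equally similar to the pseudogene and the functional gene is rejected rather than counted.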
CpG observed/expected (O/E) values were calculated for the promoter regions of the EPψg candidates predicted from full-length cDNA data and compared with those of DBTSS data, functional genes, and intronic regions (negative control).
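The text does not state which O/E formula or window size was used. As an illustration, the sketch below assumes the commonly used definition, O/E = (number of CpG × sequence length) / (number of C × number of G), and a hypothetical sliding window; the function names `cpg_oe` and `oe_profile` and the default window and step sizes are assumptions, not values from the study.

```python
def cpg_oe(seq: str) -> float:
    """CpG observed/expected ratio of a DNA sequence (assumed definition)."""
    seq = seq.upper()
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    if c == 0 or g == 0:
        return 0.0         # no CpG is possible; avoid division by zero
    return cpg * len(seq) / (c * g)


def oe_profile(seq: str, window: int = 100, step: int = 50) -> list:
    """O/E value of each sliding window along a promoter region."""
    return [
        cpg_oe(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]


# Toy sequence: a CpG-rich half followed by a CpG-free half.
print(oe_profile("CGCGAATT", window=4, step=4))  # [2.0, 0.0]
```

Averaging such profiles over all promoters aligned at the TSS would give a curve of the kind summarized in Fig. 2, where a CpG island appears as a local peak.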
Figure 1: Outline of our method to extract EPψg candidates
Using the above method, 788 and 227 EPψg candidates (16.1% and 9.0% of Pψgs) were found in human and mouse, respectively (Table 1). Strand analysis indicated that at least 13.8% and 16.3% of EPψgs are expressed from the antisense strand of the Pψgs in human and mouse, respectively (Table 2).
No increase in the CpG O/E value was observed around the transcription start site (TSS) on either strand of the EPψgs in either human or mouse (Fig. 2). The same pattern was observed for intronic regions. In contrast, a significant increase was observed around the TSS in functional genes and DBTSS data.
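As a quick consistency check, the reported EPψg fractions follow directly from the candidate counts and the Pψg totals given in the text. The helper name `percent` is only illustrative.

```python
def percent(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


human = percent(788, 4900)   # EPψg candidates / Pψgs in human
mouse = percent(227, 2511)   # EPψg candidates / Pψgs in mouse
print(human, mouse)          # 16.1 9.0
```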
Table 1: Number of EPψg candidates found in human (above) and mouse (below)
Table 2: Number of EPψgs whose strand information was confirmed by exon-intron consensus sequences.
cDNAs were mapped to the genome using sim4, which takes exon-intron consensus sequences such as GT-AG into account, to validate the strand information. In total, 2426 and 702 cDNAs were used for human and mouse, respectively. 18% (85/476) and 11% (13/120) of cDNAs, respectively, mapped to a strand different from the one recorded in the database.
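The idea of validating strand from exon-intron consensus sequences can be illustrated as follows. This is a simplified sketch, not the algorithm implemented in sim4: it assumes intron sequences have already been extracted on the genomic plus strand, and the function name `infer_strand` is hypothetical. An intron reading GT...AG indicates a plus-strand transcript, while its reverse complement, CT...AC, indicates a minus-strand transcript.

```python
from typing import Iterable


def infer_strand(introns: Iterable[str]) -> str:
    """Vote on transcript strand from intron boundary dinucleotides.

    `introns` are intron sequences read on the genomic plus strand.
    Returns "+", "-", or "?" when the evidence is absent or tied.
    """
    plus = minus = 0
    for intron in introns:
        s = intron.upper()
        if s.startswith("GT") and s.endswith("AG"):
            plus += 1    # canonical GT-AG intron on the plus strand
        elif s.startswith("CT") and s.endswith("AC"):
            minus += 1   # reverse complement of GT-AG: minus strand
    if plus > minus:
        return "+"
    if minus > plus:
        return "-"
    return "?"


print(infer_strand(["GTAAGTTTTCAG"]))   # +
print(infer_strand(["CTGAAAACTTAC"]))   # -
```

An unspliced cDNA yields no introns and therefore "?". That would be one reason a strand might be confirmed for only part of the cDNAs, although the source does not state why the counts differ.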
Figure 2: Average CpG O/E ratio around the TSS (right: human, left: mouse)
In this work, we found that 16.1% and 9.0% of human and mouse Pψgs, respectively, are expressed. These
percentages are much higher than those in
We are grateful to Dr. Rintaro Saito for scientific discussions and advice. We also acknowledge Prof. Masaru Tomita for his comprehensive support.
“Large-Scale Analysis of Expressed Pseudogenes in Higher Eucaryotes”
Kochi H, Saito R,
Tomita M, Genome Informatics 2005; 16; P098
“Computational Analysis of Transcribed Pseudogenes in Higher Organisms” (in Japanese)
Saito R, Kochi H, Tomita M, Meeting for Discovering New RNA/RNP in Tsuruoka, 2005
“Large-Scale Analysis of Expressed Pseudogenes in Higher Eucaryotes”
Kochi H, Saito R, Tomita M, the Sixteenth
International Conference on Genome Informatics GIW 2005, 2005
[1]
[2] Hirotsune, S., Yoshida, N., Chen, A., et al., An expressed pseudogene regulates the messenger-RNA stability of its homologous coding gene, Nature, 423(6935):91-96, 2003.
[3] Korneev, S.A., Park, J.H., O’Shea, M., Neuronal expression of neural nitric oxide synthase (nNOS) protein is suppressed by an antisense RNA transcribed from an NOS pseudogene, J Neurosci., 19(18):7711-7720, 1999.
[4] Yano, Y., Saito, R., Yoshida, N., et al., A new role for expressed pseudogenes as ncRNA: regulation of mRNA stability of its homologous coding gene, J Mol Med., 82(7):414-422, 2004.
[5] Zhang, Z., Carriero, N., Gerstein, M., Comparative analysis of processed pseudogenes in the mouse and human genomes, Trends Genet., 20(2):62-67, 2004.