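FY2006
Mori Fund Report
Research project: Analysis of the effect of gene expression on protein-protein interactions
Keio University, Graduate School of Media and Governance, Doctoral Program, first year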
北川統之
1. Introduction
As the phenotypes of organisms cannot be fully described by only a few sets of genes, we should take biochemical networks such as Protein-Protein Interaction (PPI) and Protein-DNA Interaction (PDI) networks into consideration. Recent progress in high-throughput PPI detection methods is producing vast amounts of PPI data in humans (Rual et al., 2005) as well as in other model organisms. With this increase in data, various tools and methods have been developed to analyze it. However, most of those methods are effective only for unicellular organisms. Higher eukaryotes have highly differentiated cells, and gene expression patterns vary greatly among cell types. In addition, higher eukaryotes have a wide variety of post-transcriptional modifications, such as alternative splicing (AS), polyadenylation and RNA editing. Among those modifications, AS is likely to have the largest impact on PPI networks (Figure 1).
Figure 1. A model of a protein interaction map in higher eukaryotes. A PPI network reconstructed from high-throughput PPI data cannot by itself reveal biological meaning, because highly differentiated cells differ in gene expression and post-transcriptional modification.
More than half of human genes are thought to be alternatively spliced. This means that many PPI motifs might be used alternatively, as shown in Figure 2. One study demonstrated that alternative splicing is not statistically associated with the borders of protein motifs (Offman et al., 2004). This result might seem to suggest that AS is not important for PPI networks. However, the same study also demonstrated that AS events do not avoid PPI motifs, suggesting that AS could change PPI networks greatly if different AS variants are expressed in different cell types. As much gene expression data supports this assumption, I decided to investigate the impact of AS on PPI networks in concrete terms.
Figure 2. A model of a gene with an alternatively deleted PPI motif. When variant 2 is dominantly expressed, the gene product would not interact with other proteins through the PPI motif.
2. Materials and Methods
I used the MySQL data of Ensembl Release 40 (Birney et al., 2006), in particular human gene expression (EST) data and protein features (signal peptide, transmembrane, low complexity, coil, InterPro) available through BioMart. These data were downloaded from ftp.ensembl.org in September 2006.
Table 1. Materials (Ensembl Release 40 coding gene and transcript data)
Although Ensembl contains more genes than those counted in Table 1 (e.g., 31,718 genes and 57,048 transcripts in the human data), I used coding genes as the parent population in this study, because the objective is to investigate alternative protein motifs.
I also used InterPro protein descriptions (Mulder et al., 2007), KEGG pathway data (Kanehisa et al., 2006) obtained through the KEGG API, and AEdb, the manually curated alternative splicing data in ASD (Stamm et al., 2006).
In the Ensembl protein features, coiled-coil regions are annotated with the ncoils program (Lupas et al., 1991), low-complexity regions with the SEG program (Wootton and Federhen, 1996), signal sequence regions with SignalP (Nielsen et al., 1997; Nielsen and Krogh, 1998; Bendtsen et al., 2004), and transmembrane regions with TMHMM (Sonnhammer et al., 1998; Krogh et al., 2001).
3. Results and Discussion
Little is known about experimentally demonstrated alternative splicing events that would change biochemical reactions
To investigate how many experimentally demonstrated alternative motifs exist, I searched the “Regulatory feature” field in the sequence data file of the manually curated AS database AEdb. Of all 2,244 entries, 55.4% were unclear, while “Stop codon” and “Frameshift” accounted for 11.6% and 6.7%, respectively. Among the remaining 26.3%, only 64 entries (2.9% of all entries) included “bind”. This means either that binding-regulatory AS accounts for a small proportion of AS events, as Offman et al. suggest, or that the binding function of an AS event is difficult to detect experimentally. Of the 64 entries, 23, 20 and 9 were from human, mouse and Norway rat, respectively. This result suggests that it is important to analyze genome-wide transcript data from the perspective of alternative motifs.
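The tally above amounts to a keyword classification of AEdb's free-text regulatory-feature field. The following is a minimal sketch, assuming the field text of every entry has already been extracted into a list of Python strings (an empty string standing for an unclear entry). The keywords, their precedence and the function names are illustrative assumptions, not the exact procedure used in this study.

```python
from collections import Counter


def classify_feature(text: str) -> str:
    """Assign one regulatory-feature string to a coarse category (assumed rules)."""
    t = text.strip().lower()
    if not t:
        return "unclear"  # assumption: an empty field means the effect is not described
    if "stop codon" in t:
        return "stop codon"
    if "frameshift" in t:
        return "frameshift"
    return "other"


def summarize(features: list[str]) -> dict[str, float]:
    """Percentage of all entries per category, plus 'bind' entries within 'other'."""
    total = len(features)
    categories = [classify_feature(f) for f in features]
    counts = Counter(categories)
    # As in the text, binding-related entries are a subset of the 'other' entries.
    counts["bind"] = sum(
        1 for f, c in zip(features, categories) if c == "other" and "bind" in f.lower()
    )
    return {k: round(100 * v / total, 1) for k, v in counts.items()}


print(summarize(["", "", "premature stop codon", "Frameshift",
                 "binding site for X", "enhancer"]))
# {'unclear': 33.3, 'stop codon': 16.7, 'frameshift': 16.7, 'other': 33.3, 'bind': 16.7}
```

Applied to the full set, the same arithmetic gives 64 / 2,244 ≈ 2.9% for the binding-related entries.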
Higher eukaryotes tend to have more
alternative motif genes
To investigate genome-wide alternative protein motifs and the differences among model eukaryotes (H. sapiens, M. musculus, D. melanogaster and C. elegans), I used the Ensembl protein features common to all four species (coiled coil, signal peptide, transmembrane, low complexity). The results are shown in Table 2.
Table 2. The number of genes with variable motifs. “Variable genes” are genes that have the motif only when expressed as specific alternative splicing variants.
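The definition of a variable gene in Table 2 can be made operational as follows. This is a sketch under stated assumptions: each gene maps to its transcripts, each transcript maps to the set of protein-feature types annotated on its translation (as Ensembl protein features provide), and a gene counts as variable for a feature type when some, but not all, of its transcripts carry that type. Comparing feature types rather than positioned motif instances is a simplification, and all names below are hypothetical.

```python
from collections import defaultdict

# Hypothetical layout: gene ID -> {transcript ID -> set of feature types on its protein}
GeneModel = dict[str, dict[str, set[str]]]


def variable_features(transcripts: dict[str, set[str]]) -> set[str]:
    """Feature types present in some, but not all, transcripts of one gene."""
    feature_sets = list(transcripts.values())
    if len(feature_sets) < 2:
        return set()  # a single-transcript gene cannot have a variable motif
    return set().union(*feature_sets) - set.intersection(*feature_sets)


def variable_gene_rates(genes: GeneModel, features: list[str]) -> dict[str, float]:
    """Fraction of all genes that are variable genes for each feature type."""
    counts: dict[str, int] = defaultdict(int)
    for transcripts in genes.values():
        for feature in variable_features(transcripts):
            counts[feature] += 1
    return {f: counts[f] / len(genes) for f in features}


toy = {
    "G1": {"T1": {"signal_peptide", "transmembrane"}, "T2": {"transmembrane"}},
    "G2": {"T3": {"coiled_coil"}},
    "G3": {"T4": {"low_complexity"}, "T5": set()},
    "G4": {"T6": {"signal_peptide"}, "T7": {"signal_peptide"}},
}
print(variable_gene_rates(toy, ["signal_peptide", "coiled_coil",
                                "transmembrane", "low_complexity"]))
# {'signal_peptide': 0.25, 'coiled_coil': 0.0, 'transmembrane': 0.0, 'low_complexity': 0.25}
```

Running the same function once per species is one way to obtain ratios of the kind plotted in Figure 3.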
Figure 3 shows the ratio of the number of variable genes to the total number of genes. This result suggests that higher eukaryotes tend to have more alternative motif genes. Considering that the number of human genes is smaller than previously predicted, it also suggests that the phenotypic variety of higher eukaryotes is caused by alternative motifs. As shown in Figure 3, low-complexity regions are likely to be less important protein motifs than the other protein features. In addition, because of their weak consensus, low-complexity regions can be found in many genes, suggesting that their number might be overestimated.
Figure 3. The rate of genes with variable motifs among the transcripts.
To confirm that this result was not driven by the amount of transcript data for each species, I calculated the number of transcripts per coding gene registered in Ensembl (Figure 4). As shown in Figure 4, M. musculus has fewer transcripts per gene than D. melanogaster, and C. elegans has nearly the same number as M. musculus. These results suggest that the result in Figure 3 is not strongly influenced by the number of transcripts.
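The control in Figure 4 is a plain ratio of transcripts to coding genes. A minimal sketch, assuming (gene ID, transcript ID) pairs for coding genes have been exported per species; the function name and the toy IDs are hypothetical:

```python
def transcripts_per_gene(pairs: list[tuple[str, str]]) -> float:
    """Mean number of distinct transcripts per gene from (gene_id, transcript_id) pairs."""
    unique_pairs = set(pairs)  # drop duplicated rows, e.g. produced by a table join
    genes = {gene for gene, _ in unique_pairs}
    return len(unique_pairs) / len(genes)


by_species = {
    "toy_species_a": [("G1", "T1"), ("G1", "T2"), ("G2", "T3"), ("G1", "T1")],
    "toy_species_b": [("G9", "T9")],
}
print({sp: transcripts_per_gene(p) for sp, p in by_species.items()})
# {'toy_species_a': 1.5, 'toy_species_b': 1.0}
```

A species with more transcripts per gene has more chances to show a variable motif, which is why this ratio works as a check on Figure 3.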
Figure 4. The number of transcripts per coding gene registered in Ensembl.
Types of protein motifs most influenced by alternative splicing in human
The analyses of common protein motifs suggested that the phenotypes of higher eukaryotes could reflect the wide variety of alternative motifs. I therefore focused on human alternative motifs to investigate which types of motifs are deleted in alternative splicing events.
I identified the most variable InterPro motifs, i.e., those deleted in specific alternative splicing variants (Table 3), and, to examine how much PPI networks can change, the most variable binding-related InterPro motifs (Table 4).
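Tables 3 and 4 rank InterPro motifs by how often they are variable. One way to compute such a ranking is sketched below, assuming per-transcript sets of InterPro IDs grouped by gene, an ID-to-description mapping, and a description keyword (here "binding") as the filter for binding-related motifs. The keyword filter and the toy IDs are illustrative assumptions, not necessarily the criterion used for Table 4.

```python
from collections import Counter


def rank_variable_motifs(genes, descriptions, keyword=None, top=10):
    """Rank InterPro IDs by the number of genes in which they are variable.

    genes: gene ID -> {transcript ID -> set of InterPro IDs}
    descriptions: InterPro ID -> description text
    keyword: optional case-insensitive substring required in the description
    """
    counts = Counter()
    for transcripts in genes.values():
        sets = list(transcripts.values())
        if len(sets) < 2:
            continue
        # Motifs present in some, but not all, transcripts of this gene.
        counts.update(set().union(*sets) - set.intersection(*sets))
    ranked = []
    # Sort by descending count, then by ID, so ties are reported deterministically.
    for ipr, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        desc = descriptions.get(ipr, "")
        if keyword is None or keyword.lower() in desc.lower():
            ranked.append((ipr, desc, n))
    return ranked[:top]


toy_genes = {
    "G1": {"T1": {"IPR_TOY1", "IPR_TOY2"}, "T2": {"IPR_TOY2"}},
    "G2": {"T3": {"IPR_TOY1"}, "T4": set()},
    "G3": {"T5": {"IPR_TOY3"}, "T6": {"IPR_TOY1"}},
}
toy_desc = {
    "IPR_TOY1": "Zinc finger, C2H2-type (toy entry)",
    "IPR_TOY3": "Calcium-binding EF-hand (toy entry)",
}
print(rank_variable_motifs(toy_genes, toy_desc))
print(rank_variable_motifs(toy_genes, toy_desc, keyword="binding"))
```

A purely count-based ranking favors broad, low-specificity entries, which is consistent with the caveats the discussion raises for C2H2 zinc fingers and proline-rich regions.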
Table 3. Variable InterPro motifs
Table 4. Variable InterPro binding motifs
The results in Table 3 suggest that variable alternative motifs are not restricted to PPI motifs and that various types of biochemical reactions could be changed through AS events. Most zinc fingers work as DNA-binding motifs, which supports IVV’s result on c-Fos variants. The C2H2 type is probably highly counted because it is a well-explored class. Other types of zinc finger are also found (not shown in Table 3). Many types of kinase region (e.g., Wall-associated kinase) are found, which suggests that AS can change signaling cascades without PPI changes. “Tropomyosin” variants can alter the cytoskeleton through actin binding, which is also supported by the result in Table 4. The pleckstrin homology domain (PH domain) is also thought to participate in signaling cascades. The function of the “EGF-like region” is not yet clear. However, the motif is found in extracellular domains, and one study reports that an “Immunoglobulin-like” domain is probably essential for efficient interaction of an EGF-like domain with ErbB receptors (Eto et al., 2006). The high count of “Proline-rich region” possibly reflects its low specificity. These results suggest that alternative motifs could regulate signaling cascades via various reactions such as PPI, PDI, receptor-ligand interaction, kinase activity and cytoskeletal changes.
Alternative Ca2+ binding motifs and their expression patterns
Table 4 shows the results for binding-related alternative motifs, which are consistent with those in Table 3. As these results suggest the importance of Ca2+ binding, I further investigated alternative motifs that can change Ca2+ binding.
To investigate the importance of each alternative Ca2+ binding gene in signaling cascades, I mapped the genes whose Ca2+ binding could change onto KEGG pathways using the KEGG API. An example of the results is shown in Figure 5. The path connecting these genes and Ca2+ would be cut when the alternative variants are dominantly expressed.
Figure 5. An example of a pathway including genes with variable calcium-binding motifs. Ras signaling and/or NFAT-regulated gene expression in the MAPK signaling cascade would change when the dominantly expressed AS variants lack the motifs.
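The reasoning behind Figure 5, that losing a Ca2+-binding motif cuts the path from Ca2+ to downstream events, can be expressed as a reachability check on a directed graph. The sketch below uses a hand-written toy graph loosely following the Ca2+ branch described in the caption; its node names and edges are illustrative assumptions, not the actual KEGG MAPK map. In practice the gene-to-pathway links would come from KEGG (this study used the KEGG API available at the time; the current KEGG REST service offers a `link` operation for the same purpose).

```python
from collections import deque

# Toy directed graph loosely following the Ca2+ branch in Figure 5 (illustrative only,
# not the actual KEGG MAPK map): Ca2+ -> calcineurin -> NFAT, and Ca2+ -> RasGRF -> Ras.
TOY_PATHWAY: dict[str, list[str]] = {
    "Ca2+": ["CaN", "RasGRF"],
    "CaN": ["NFAT"],
    "NFAT": ["NFAT-regulated genes"],
    "RasGRF": ["Ras"],
    "Ras": ["Raf"],
}


def reachable(graph: dict[str, list[str]], source: str,
              cut: frozenset[tuple[str, str]] = frozenset()) -> set[str]:
    """Nodes reachable from source by breadth-first search, ignoring edges in `cut`."""
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if (node, nxt) in cut or nxt in seen:
                continue
            seen.add(nxt)
            queue.append(nxt)
    return seen


def lost_downstream(graph: dict[str, list[str]], source: str,
                    motif_lost: set[str]) -> set[str]:
    """Nodes no longer reachable once the source -> gene edges of motif-lacking genes are cut."""
    cut = frozenset((source, gene) for gene in motif_lost)
    return reachable(graph, source) - reachable(graph, source, cut)


print(sorted(lost_downstream(TOY_PATHWAY, "Ca2+", {"CaN"})))
# ['CaN', 'NFAT', 'NFAT-regulated genes']
```

Cutting only the edge into the motif-lacking gene, rather than deleting the gene node, keeps other inputs to that gene (if a real pathway had any) intact.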
AS variants lacking the calcium-binding motif could be aberrant variants. Therefore, I examined the tissues in which AS variants with and without the calcium-binding motif are expressed. As a result, only the Calcineurin subunit B gene (Ensembl gene ID: ENSG00000115953) showed a different tissue expression pattern. In the pathological data, AS variants lacking the calcium-binding motif are expressed in 'retinoblastoma', 'leukemia', 'carcinoid', 'ascites' and 'lymphoblastic'. These associations are not described in OMIM (601302); they suggest that the AS variants may cause those diseases by changing signaling cascades.
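The tissue comparison for the calcium-binding variants can be sketched as a set comparison per gene. The sketch assumes that each variant is reduced to a flag saying whether it keeps the Ca2+-binding motif, plus the set of tissue or pathology labels from its EST evidence. A gene is reported when the two groups of variants are expressed in different sets of tissues, together with the labels seen only for motif-lacking variants. Gene IDs and tissue labels below are toy values.

```python
def tissue_split(variants: list[tuple[bool, set[str]]]) -> tuple[set[str], set[str]]:
    """Union of tissues for variants with the motif and for variants without it."""
    with_motif: set[str] = set()
    without_motif: set[str] = set()
    for has_motif, tissues in variants:
        (with_motif if has_motif else without_motif).update(tissues)
    return with_motif, without_motif


def genes_with_different_patterns(
    genes: dict[str, list[tuple[bool, set[str]]]]
) -> dict[str, list[str]]:
    """Genes whose motif-keeping and motif-lacking variants differ in tissue sets.

    Each reported gene maps to the tissues seen only for motif-lacking variants.
    """
    result: dict[str, list[str]] = {}
    for gene_id, variants in genes.items():
        with_m, without_m = tissue_split(variants)
        # Both groups must be observed before their patterns can be compared.
        if with_m and without_m and with_m != without_m:
            result[gene_id] = sorted(without_m - with_m)
    return result


toy = {
    "GENE_A": [(True, {"brain", "liver"}), (False, {"brain", "retinoblastoma"})],
    "GENE_B": [(True, {"heart"}), (False, {"heart"})],
    "GENE_C": [(True, {"lung"})],
}
print(genes_with_different_patterns(toy))  # {'GENE_A': ['retinoblastoma']}
```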
Motif-changing AS might be a cause of neural diversity in humans
To investigate whether motif-changing AS events are important for the diversity of the human phenotype, I examined, for each tissue, the ratio of genes with variable motifs (signal peptide, coiled coil, transmembrane, low complexity) to genes without variable motifs. The result is shown in Table 5. Many neural cell-related tissues are ranked in the top 10, especially for signal peptides. This is unlikely to reflect a bias in the original dataset. Low-complexity motifs are thought to be less important because of their weak consensus, so they act as a control: if the ranking merely reflected such a bias, neural cell-related tissues would also rank highly for low complexity, but they do not.
Table 5. The rate of genes with alternative motifs expressed in each tissue (anatomy). The rate was calculated as the ratio of alternative motif genes to non-alternative motif genes. Anatomies colored red are neural cell-related.
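The Table 5 rate (alternative-motif genes divided by non-alternative-motif genes among those expressed in a tissue) can be computed per tissue and per feature type. A minimal sketch, assuming a tissue-to-expressed-genes mapping (e.g., derived from EST anatomy labels) and the set of genes with a variable motif for one feature type. Tissues with no non-alternative-motif genes are skipped because the ratio is undefined there; that choice, the names and the data are hypothetical.

```python
def tissue_rates(expression: dict[str, set[str]], alt_genes: set[str]) -> dict[str, float]:
    """Per tissue: (# alternative-motif genes) / (# non-alternative-motif genes)."""
    rates: dict[str, float] = {}
    for tissue, genes in expression.items():
        alt = len(genes & alt_genes)
        non_alt = len(genes - alt_genes)
        if non_alt:  # ratio undefined when every expressed gene has an alternative motif
            rates[tissue] = alt / non_alt
    return rates


def top_tissues(expression: dict[str, set[str]], alt_genes: set[str],
                n: int = 10) -> list[str]:
    """Tissues sorted by descending rate (ties broken by name), truncated to n."""
    rates = tissue_rates(expression, alt_genes)
    return sorted(rates, key=lambda t: (-rates[t], t))[:n]


expression = {
    "brain": {"g1", "g2", "g3"},
    "liver": {"g1", "g4", "g5", "g6"},
    "retina": {"g2", "g3"},
}
signal_peptide_alt = {"g2", "g3"}  # toy variable-motif genes for one feature type
print(tissue_rates(expression, signal_peptide_alt))  # {'brain': 2.0, 'liver': 0.0}
print(top_tissues(expression, signal_peptide_alt, n=2))  # ['brain', 'liver']
```

Calling `top_tissues` once per feature type, including low complexity as the weak-consensus control, yields rankings of the kind compared in Table 5.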
Some studies have tried to link the diversity of the human phenotype to post-transcriptional modification, especially AS; however, they did not provide strong evidence. This study suggests that motif-changing AS, rather than AS events per se, is what matters, which also implies that not all AS variants are biologically essential.
References
Bendtsen, J.D., Nielsen, H., von Heijne, G. and Brunak, S. (2004) J Mol Biol, 340, 783-95.
Birney, E., Andrews, D., Caccamo, M., Chen, Y., Clarke, L., Coates, G., Cox, T., Cunningham, F., Curwen, V., Cutts, T., et al. (2006) Nucleic Acids Res, 34, D556-61.
Eto, K., Eda, K., Kanemoto, S. and Abe, S. (2006) Biochem Biophys Res Commun, 350, 263-71.
Kanehisa, M., Goto, S., Hattori, M., Aoki-Kinoshita, K.F., Itoh, M., Kawashima, S., Katayama, T., Araki, M. and Hirakawa, M. (2006) Nucleic Acids Res, 34, D354-7.
Krogh, A., Larsson, B., von Heijne, G. and Sonnhammer, E.L. (2001) J Mol Biol, 305, 567-80.
Lupas, A., Van Dyke, M. and Stock, J. (1991) Science, 252, 1162-4.
Mulder, N.J., Apweiler, R., Attwood, T.K., Bairoch, A., Bateman, A., Binns, D., Bork, P., Buillard, V., Cerutti, L., Copley, R., et al. (2007) Nucleic Acids Res, 35, D224-8.
Nielsen, H., Engelbrecht, J., Brunak, S. and von Heijne, G. (1997) Protein Eng, 10, 1-6.
Nielsen, H. and Krogh, A. (1998) Proc Int Conf Intell Syst Mol Biol, 6, 122-30.
Offman, M.N., Nurtdinov, R.N., Gelfand, M.S. and Frishman, D. (2004) BMC Bioinformatics, 5, 41.
Rual, J.F., Venkatesan, K., Hao, T., Hirozane-Kishikawa, T., Dricot, A., Li, N., Berriz, G.F., Gibbons, F.D., Dreze, M., Ayivi-Guedehoussou, N., et al. (2005) Nature, 437, 1173-8.
Sonnhammer, E.L., von Heijne, G. and Krogh, A. (1998) Proc Int Conf Intell Syst Mol Biol, 6, 175-82.
Stamm, S., Riethoven, J.J., Le Texier, V., Gopalakrishnan, C., Kumanduri, V., Tang, Y., Barbosa-Morais, N.L. and Thanaraj, T.A. (2006) Nucleic Acids Res, 34, D46-55.
Wootton, J.C. and Federhen, S. (1996) Methods Enzymol, 266, 554-71.