Achievements report for Graduate Student Researcher Development Grant by Taikichiro Mori Memorial Research Fund for the academic year 2009

Student name: Mohamed Helmy Mahmoud Attia Shehata

Student number: 80725860

Student grade: Ph.D., 1st grade.

Affiliation: Graduate School for Media and Governance

Research project: Rice Proteogenomics project

Introduction

Using the Graduate Student Researcher Development Grant from the Taikichiro Mori Memorial Research Fund, I was able to improve my research environment and, as a result, achieve better results in my research. This led to contributions to three international conferences; the details are given below. The grant helped me upgrade my workspace with more powerful hardware (e.g. a monitor and memory) and set up a mobile workstation by purchasing a new MacBook Pro and some peripherals. Moreover, I used the grant to cover part of my travel expenses for conference attendance.

International Conferences Contributions

1- The 10th International Conference on Systems Biology (ICSB2009). Stanford University, CA, USA, 2009.

Date and Venue: August 2009, Stanford University, USA.

Title: Novel genomic features of the rice genome revealed by proteogenomic analysis

Abstract: Rice is considered one of the world’s most important plants. Almost half of the world’s population relies on rice either totally or partially. Moreover, it is one of the plant model organisms because of its relatively small genome (12 chromosomes and 430 Mbp). Thus, several research projects concerning the rice plant are currently running, such as genome sequencing and genome annotation projects. Although the whole rice genome sequence and annotation have been published and updated several times (6 builds to date), there has been no attempt to include whole proteome data in the genome annotation. Here, we followed a systems biology approach, integrating experimental data and bioinformatics tools, to improve the rice genome annotation using whole proteome data. We performed mass spectrometry shotgun analysis (LC-MS/MS) on digested peptides extracted from undifferentiated cultured cells in our lab. The peptides were extracted in-gel and in-solution, and fractionated using different fractionation methods (SCX and IEF) to broaden our coverage range. Next, a Mascot search against the database of The Institute for Genomic Research (TIGR) was used to perform protein/peptide identification. Through these steps, 5,989 proteins comprising 69,876 peptides were identified. The raw data files were then compared against the identification results, and the identified MS/MS spectra were excluded. Mascot was used to search the remaining spectra against the gene and transcript databases available from the TIGR database. This search identified 577 and 1,347 peptides from the gene and transcript databases, respectively. These numbers were reduced by filtering on peptide score, identification confidence and minimum length. Thus, we finally obtained 177 and 697 peptides from the gene and transcript (mRNA) databases, respectively, indicating the existence of misannotated regions. The peptides identified from the gene database were aligned with BLAST against the corresponding transcripts. The BLAST search resulted in 80 peptides aligned to regions previously considered non-coding. Similarly, the peptides identified from the transcript database will be aligned to the gene database in the hope of finding more significant genomic features.
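
The filtering step described in the abstract (score, identification confidence and minimum length) can be illustrated with a minimal sketch. The `PeptideHit` fields and all three thresholds below are hypothetical; the abstract does not state the cut-offs that were actually used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeptideHit:
    sequence: str   # identified peptide sequence
    score: float    # search-engine score (hypothetical scale)
    expect: float   # expectation value; smaller means more confident


def filter_hits(hits, min_score=30.0, max_expect=0.05, min_length=7):
    """Keep hits that pass all three criteria (thresholds are assumptions)."""
    return [
        h for h in hits
        if h.score >= min_score
        and h.expect <= max_expect
        and len(h.sequence) >= min_length
    ]


hits = [
    PeptideHit("AVLDEFGK", 45.2, 0.001),   # passes all criteria
    PeptideHit("GK", 50.0, 0.001),         # rejected: too short
    PeptideHit("LLSDVEEAR", 12.0, 0.2),    # rejected: low score and confidence
]
kept = filter_hits(hits)
print([h.sequence for h in kept])
```

Requiring all three criteria at once is the strictest reading of the sentence above; a real pipeline might apply them in separate stages.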

Poster (PDF)

2- The 32nd Annual Meeting of the Molecular Biology Society of Japan (MBSJ2009). Yokohama, Japan.

Date and Venue: December 2009, Yokohama, Japan.

Title: Novel proteogenomic approaches to maximize the utilization of MS/MS data for genome reannotation

Abstract: LC-MS/MS-based proteomics offers an unmatched approach to studying both the whole proteome and the phosphoproteome. The MS/MS data are usually compared with a reference database to identify the peptide sequences that correspond to the MS/MS spectra. However, comparing MS/MS data with a nucleotide sequence database remains a major challenge because of the enormous size of both the database and the MS/MS data. Thus, several methods have been developed to reduce the effort and time of searching the nucleotide sequence database by restricting the search space to certain genomic features only. Despite these efforts, searching the whole database with all its features remains a major challenge. Here, we present MSRI (MS Spectra Reduction after Identification), a novel method to compare MS/MS data against protein and nucleotide sequence databases to find novel peptides and genomic features. The main principle of MSRI is to search the MS/MS data against the protein database, then remove the MS/MS spectra corresponding to all identified peptides, and search the remaining MS/MS spectra against the nucleotide sequence database. In this way, we reduce the query size instead of the database size. Using this method to analyze 27 LC-MS/MS samples of rice cultured cells, prepared in our lab, against protein, cDNA and transcript databases, we were able to identify 1,924 novel peptides. The identified peptides were used to point to new genomic features, e.g. new coding regions and splicing isoforms, in 210 gene models. Next, we will apply our method to genome database searching to prove its utility in searching 6-frame translated databases.
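
The MSRI principle described above can be sketched in a few lines. This is only a toy illustration under strong assumptions: each spectrum is represented directly by a peptide string, `toy_search` is a hypothetical substring match standing in for a real search engine, and the nucleotide database is translated in six frames with the standard genetic code. None of the function names come from the original work.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Three forward and three reverse-strand translations."""
    strands = (dna, reverse_complement(dna))
    return [translate(s[frame:]) for s in strands for frame in range(3)]


def toy_search(spectra, sequences):
    """Hypothetical stand-in for a search engine: substring match."""
    return {sid: pep for sid, pep in spectra.items()
            if any(pep in seq for seq in sequences)}


def msri(spectra, protein_db, nucleotide_db):
    # Step 1: search all spectra against the protein database.
    identified = toy_search(spectra, protein_db)
    # Step 2: drop identified spectra, shrinking the query set.
    remaining = {sid: pep for sid, pep in spectra.items()
                 if sid not in identified}
    # Step 3: search only the remaining spectra against translated DNA.
    translated = [t for dna in nucleotide_db for t in six_frames(dna)]
    novel = toy_search(remaining, translated)
    return identified, remaining, novel


spectra = {"s1": "TAYIAK", "s2": "AGW", "s3": "QQQQ"}
identified, remaining, novel = msri(
    spectra,
    protein_db=["MKTAYIAK"],
    nucleotide_db=["ATGGCCGGCTGG"],  # forward frame 0 encodes MAGW
)
print(sorted(identified), sorted(remaining), sorted(novel))
```

In this toy run, `s1` is identified in the protein database and removed, so only `s2` and `s3` are searched against the six-frame translation, where `s2` is found as a candidate novel peptide.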

Poster (PDF)

3- The 20th International Conference on Genome Informatics (GIW2009). Yokohama, Japan.

Date and Venue: December 2009, Yokohama, Japan.

Title: PGFeval: Software tool and web server for evaluation and visualization of proteogenomic features

Abstract: Using proteome information to improve genome annotation, in so-called proteogenomic studies, has become a well-appreciated approach. LC-MS/MS proteomics provides translation-level expression evidence for genes annotated using conventional genomic methods. For the evaluation and visualization of the genomic features revealed by proteogenomics, generic genome browsers and genome annotation tools are usually used. For experimentalists, installing, configuring and using such tools remains challenging and time-consuming. Moreover, the generic visualization tools cannot tell whether the identified peptides represent novel genomic features or not [Ansong et al., 2008].

Here we present PGFeval (ProteoGenomic Features evaluation and visualization), a software tool and web server specialized for proteogenomic analysis. PGFeval is an easy-to-use tool that evaluates and visualizes genome annotations obtained from different sources and methods and stored in standard GFF3 files. In addition, PGFeval analyzes the features obtained from proteome experiments and shows a graphical annotation that indicates the novelty of these features, e.g. peptides from intronic regions, or peptides spanning an exon acceptor or exon donor site.
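
To make the feature categories above concrete, the following sketch classifies a peptide's genomic interval against exon records read from GFF3-formatted lines. It is not PGFeval's implementation: the category rules, the function names and the example coordinates are assumptions, and only plus-strand features with 1-based inclusive coordinates are handled.

```python
# Two hypothetical exon records in GFF3's nine tab-separated columns.
GFF3_LINES = [
    "\t".join(["chr1", "demo", "exon", "100", "200", ".", "+", ".", "Parent=mRNA1"]),
    "\t".join(["chr1", "demo", "exon", "300", "400", ".", "+", ".", "Parent=mRNA1"]),
]


def parse_exons(lines):
    """Return sorted (start, end) pairs for every exon record."""
    exons = []
    for line in lines:
        if not line or line.startswith("#"):
            continue  # skip blank lines and GFF3 directives/comments
        cols = line.split("\t")
        if cols[2] == "exon":
            exons.append((int(cols[3]), int(cols[4])))
    return sorted(exons)


def classify(start, end, exons):
    """Classify a peptide interval relative to plus-strand exons."""
    for ex_start, ex_end in exons:
        if ex_start <= start and end <= ex_end:
            return "within annotated exon"
        if start < ex_start <= end <= ex_end:
            return "exon acceptor spanning"   # crosses the exon's 5' edge
        if ex_start <= start <= ex_end < end:
            return "exon donor spanning"      # crosses the exon's 3' edge
        if start < ex_start and ex_end < end:
            return "covers whole exon"
    if exons and exons[0][0] < start and end < exons[-1][1]:
        return "intronic"
    return "outside gene model"


exons = parse_exons(GFF3_LINES)
for interval in [(150, 180), (290, 320), (190, 220), (220, 280)]:
    print(interval, classify(*interval, exons))
```

A fuller version would also need to treat the minus strand, where the acceptor and donor edges swap, and the first exon, whose 5' edge is not a splice acceptor.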

Analyzing our LC-MS/MS-based proteome data from undifferentiated rice cultured cells with PGFeval reveals new features and misannotations in about 100 gene models in the rice genome.

Poster (PDF)