Mori Scholarship Program Achievement Report
for the Year 2001

Research Topic

文脈依存型意味的ドキュメントマイニング方式
(A Context Dependent Semantic Document Mining Method for Document Data)

Assigned Project and Group:		Status
Novel Computing Project		M2 in Graduate School of Media and Governance
Kiyoki Seminar Group		Dai Sakai
		t95620ds@mdbl.sfc.keio.ac.jp

Introduction

In pursuing the research topic, "A Context Dependent Semantic Document Mining Method for Document Data", I achieved several results in this academic season. Ideas that emerged while constructing systems and writing a conference paper and a master's thesis have improved the semantic document mining method. The research has now realized a metadata reformation method and applied it to a semantic clustering method. This report first summarizes the research topic and then describes this season's work in detail.

A Context Dependent Semantic Document Mining Method for Document Data

The main theme of my master's thesis is the proposal of a semantic document mining method with a context recognition mechanism. It is realized by combining the following algorithms.

A dynamic reformation method of metadata according to a given context
A context dependent information filtering method of document data
A semantic clustering method with context recognition for document data

Each of these methods contributes to efficient document mining. Combining them according to the situation realizes the context dependent semantic document mining method. A dynamic clustering method with context recognition can be carried out after metadata restructuring, semantic filtering, or both. This semantic document mining method enables efficient knowledge extraction according to the situation or the analyzer's viewpoint.
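The way the three methods combine can be pictured as a small pipeline in which the two preprocessing stages are optional and clustering always runs last. The following is only a minimal sketch under stated assumptions: the function names, the representation of metadata as sets of words, and the three toy stages are hypothetical placeholders for illustration, not the algorithms of the actual system.

```python
from typing import Callable, Dict, List, Optional, Set

# Assumption: each document id maps to the set of words in its metadata.
Metadata = Dict[str, Set[str]]
Stage = Callable[[Metadata, Set[str]], Metadata]
Clusterer = Callable[[Metadata, Set[str]], List[Set[str]]]


def mine_documents(
    metadata: Metadata,
    context: Set[str],
    cluster: Clusterer,
    reform: Optional[Stage] = None,
    filter_docs: Optional[Stage] = None,
) -> List[Set[str]]:
    """Apply the optional preprocessing stages, then cluster under the context."""
    if reform is not None:
        metadata = reform(metadata, context)
    if filter_docs is not None:
        metadata = filter_docs(metadata, context)
    return cluster(metadata, context)


# Toy stand-ins (hypothetical) so that the sketch can be executed.
def toy_reform(metadata: Metadata, context: Set[str]) -> Metadata:
    # Placeholder: keep only the metadata words that appear in the context.
    return {doc: words & context for doc, words in metadata.items()}


def toy_filter(metadata: Metadata, context: Set[str]) -> Metadata:
    # Placeholder: drop documents that share no word with the context.
    return {doc: words for doc, words in metadata.items() if words & context}


def toy_cluster(metadata: Metadata, context: Set[str]) -> List[Set[str]]:
    # Placeholder: group documents that share the same context words.
    groups: Dict[frozenset, Set[str]] = {}
    for doc, words in metadata.items():
        groups.setdefault(frozenset(words & context), set()).add(doc)
    return list(groups.values())
```

Passing `reform` and `filter_docs` as optional arguments mirrors the "and/or" in the description: clustering can be run alone, after either preprocessing stage, or after both.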

In this academic season, the main accomplishments have been realizing the metadata reformation method as a system and conceptualizing the semantic document mining method in the process of writing the thesis.

Overview of Achievements in this Academic Season

Two papers were accepted by two conferences, one of which is international. I presented the research results at both conferences.
Realizing the metadata reformation method as a system.
Conducting experiments to show the feasibility of the reformation method and of its application to a semantic clustering method.
Writing a paper that explains the metadata reformation algorithms and the experimental results in detail.
Writing a master's thesis.

Details of each Achievement

Presentation at Conferences: Two papers concerning a semantic information filtering method and its application to a semantic clustering method were accepted in February by the "Database Engineering Work Shop (IEICE)" and the "11th European-Japanese Conference on Information Modelling and Knowledge Bases". I presented at the former in early March 2001 in Atami, Izu, and at the latter in early June 2001 in Maribor, Slovenia. Both papers have been published through those two conferences.
[3] D. Sakai, Y. Kiyoki, N. Yoshida, T. Kitagawa, ``A Context Dependent Information Filtering Method of Document Data and its Application to Document Mining'', 12th Database Engineering Work Shop Proceedings, IEICE (The Institute of Electronics, Information and Communication Engineers), 2001.
[4] D. Sakai, Y. Kiyoki, N. Yoshida, ``A Semantic Information Filtering and Clustering Method for Document Data with a Context Recognition Mechanism'', The 11th European-Japanese Conference on Information Modelling and Knowledge Bases, 2001.
Realizing Metadata Reformation Method: I proposed a semantic clustering method [2] and a semantic information filtering method [3][4] in the previous academic season, and papers about them have been published. In this academic season, a metadata reformation method has been realized as a system. The metadata reformation method was studied previously in [1], and I applied the idea presented there to the semantic clustering method [2]. I designed a system that implements the metadata reformation algorithms and wrote a 1000-line program in C, Perl, and PHP3. It now works as a preprocessing step for the semantic clustering method within the context dependent semantic document mining method.
I also designed and coded a program that connects the system implementing the metadata reformation algorithms with the system implementing the semantic clustering method.
Experiment: After constructing the system that implements the metadata reformation algorithms, I assessed the effectiveness of applying metadata reformation to semantic clustering in an experiment. The experiment used 240 English medical documents. The efficiency of semantic filtering was evaluated by comparing the classification ability of the semantic clustering method before and after semantic filtering was applied. The experimental results are presented in the thesis.
Writing Paper: The details of the metadata reformation method, its application to the semantic clustering method, and the experimental results have been written up in a paper, which was prepared for a conference submission. (This paper is also included in the thesis.)
Writing a Thesis Paper: In this academic season, I wrote a thesis that brings together the concepts and algorithms of the semantic document mining method (semantic clustering, metadata reformation, and semantic information filtering). This process was important for putting all the concepts and algorithms into a single framework. The thesis also includes details of 5 different experiments and their results.

In this section, the abstracts of the paper on the metadata reformation method and of the thesis are reproduced in order to introduce the basic ideas and systems realized in this academic season. Here, "we" refers to myself and my co-authors (Dr. Kiyoki, Naofumi Yasushi, Dr. Kitagawa).

1. A Metadata Reformation Method and its Application to a Semantic Clustering Method

In this paper we present a metadata reformation method and its
application to a dynamic clustering method. The main feature of the
method is semantically analyzing metadata given to document data and
reforming them by removing words from them that are only weakly
related to a given context. This method is applied to a semantic
document clustering according to a related context. By applying
metadata reformation to document data, we can obtain a set of clusters
each containing semantically similar documents. The aim of this
algorithm is to realize an efficient data mining system for documents. 
We clarify the feasibility of the method by showing several
experimental results.
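The core idea of the abstract above, removing metadata words that are only weakly related to a given context, can be illustrated with a short sketch. It is not the algorithm of the paper: the word vectors, the use of cosine similarity as the relatedness measure, the summed context vector, and the `threshold` parameter are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def reform_metadata(
    metadata: Dict[str, List[str]],
    word_vectors: Dict[str, Vector],
    context_words: List[str],
    threshold: float = 0.5,
) -> Dict[str, List[str]]:
    """Keep only metadata words whose similarity to the context reaches threshold.

    Assumption: the context is represented by the sum of the vectors of its
    words, and every context word has an entry in word_vectors.
    """
    dims = len(next(iter(word_vectors.values())))
    context_vec = [0.0] * dims
    for word in context_words:
        for i, x in enumerate(word_vectors[word]):
            context_vec[i] += x

    reformed: Dict[str, List[str]] = {}
    for doc_id, words in metadata.items():
        reformed[doc_id] = [
            w for w in words
            if w in word_vectors and cosine(word_vectors[w], context_vec) >= threshold
        ]
    return reformed
```

With hypothetical two-dimensional vectors in which "heart" and "artery" point in nearly the same direction and "weather" is orthogonal to them, a context of "heart" keeps the first two words and drops the third.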

2. Thesis: A Dynamic Document Mining Method with a Context Recognition Mechanism using the Semantic Associative Search Method

In this thesis, a dynamic document mining method with a context
recognition mechanism by the Semantic Associative Search Method is
introduced. The main feature of this method is to enable efficient
knowledge extraction from document data by analyzing meanings of the
data according to a given context (the analyzer's viewpoint). This
method works by first reducing irrelevancy of target document data by
a metadata reformation method and a semantic information filtering
method, and then semantically clustering the data by a semantic
clustering method according to the given context. The dynamic document
mining method is realized by the Semantic Associative Search Method.
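To make "clustering according to the given context" concrete, the sketch below groups the same documents differently depending on which features a context emphasizes. It is an illustration under stated assumptions, not the Semantic Associative Search Method itself: the per-feature context weights, the weighted cosine similarity, the greedy single-pass grouping, and the `threshold` parameter are all hypothetical.

```python
import math
from typing import Dict, List, Sequence


def weighted_cosine(u: Sequence[float], v: Sequence[float],
                    weights: Sequence[float]) -> float:
    """Cosine similarity in which each feature is scaled by a non-negative weight."""
    dot = sum(w * a * b for a, b, w in zip(u, v, weights))
    norm_u = math.sqrt(sum(w * a * a for a, w in zip(u, weights)))
    norm_v = math.sqrt(sum(w * b * b for b, w in zip(v, weights)))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cluster_in_context(
    doc_vectors: Dict[str, Sequence[float]],
    context_weights: Sequence[float],
    threshold: float = 0.8,
) -> List[List[str]]:
    """Greedy grouping: a document joins the first cluster whose first member
    is similar enough under the context weights; otherwise it starts a new one."""
    clusters: List[List[str]] = []
    for doc_id, vec in doc_vectors.items():
        for members in clusters:
            representative = doc_vectors[members[0]]
            if weighted_cosine(vec, representative, context_weights) >= threshold:
                members.append(doc_id)
                break
        else:
            clusters.append([doc_id])
    return clusters


# Hypothetical features: (heart, surgery, drug).
docs = {"a": (1.0, 1.0, 0.0), "b": (1.0, 0.0, 1.0), "c": (0.0, 1.0, 0.0)}
by_organ = cluster_in_context(docs, (1.0, 0.0, 0.0))      # only "heart" matters
by_treatment = cluster_in_context(docs, (0.0, 1.0, 1.0))  # only treatments matter
```

Under the organ-oriented weights, documents "a" and "b" fall into one group, while under the treatment-oriented weights "a" groups with "c" instead, which is the kind of viewpoint dependence the abstract describes.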

This report has described the research accomplishments of this academic year. The funding provided by the Mori Scholarship Program has been a great help.

Dai Sakai

References


[1] Yukinobu Kitagawa, ``Research about Context-Dependent Metadata
Formation using the Semantic Associative Search Method'', graduation
thesis, Tsukuba University.

[2] D. Sakai, Y. Kiyoki, N. Yoshida, ``The Dynamic Clustering Algorithm
for Documents using the Semantic Associative Processing Method'', 11th
Database Engineering Work Shop Proceedings, IEICE, 2000.
