Mori Scholarship Program Achievement Report
for the Year 2000

Research Topic

ドキュメントデータ群を対象とした意味的連想処理機構による
動的クラスタリング方式の実現
(A Context Dependent Semantic Clustering Method for Document Data
using the Semantic Associative Processing Method)

Assigned Project and Group:		Status
Novel Computing Project		M2 in Graduate School of Media and Governance
Kiyoki Seminar Group		Dai Sakai
		t95620ds@mdbl.sfc.keio.ac.jp

Introduction

In pursuing the research topic, "A Context Dependent Semantic Clustering Method for Document Data using the Semantic Associative Processing Method", I made several fruitful accomplishments this academic season. Ideas proposed while constructing systems and writing conference papers have improved the dynamic clustering method. The research now also includes the study of a semantic filtering method and of metadata restructuring for document data, both of which can be used to improve the efficiency of the dynamic clustering method. This report first summarizes my research topic and then describes this season's work in detail.

A Context Dependent Semantic Clustering Method for Document Data using the Semantic Associative Processing Method

The proposal of a semantic document mining method (dynamic clustering method) with context recognition is the overall theme of my master's thesis (graduation expected in September 2001). Its contents will mainly include:

A dynamic re-structuring method of metadata according to contexts
A context dependent information filtering method of document data
A semantic clustering method with context recognition for document data

Each of these methods contributes to efficient document mining. Combining them according to the situation makes various applications possible, such as dynamic clustering with context recognition performed after irrelevant document data have been removed by metadata re-structuring and/or semantic filtering. In this academic season, a framework for the theme, "a semantic document mining method (dynamic clustering method) with a semantic filtering method", has been built through several achievements. They are listed in the following section.

Achievements in this Academic Season

Writing and submitting a paper, "A Context Dependent Semantic Clustering Method for Document Data". The following work carried out for this paper helped to improve the semantic document mining method with a semantic filtering method.
1. Creating medical document data in English and a semantic space specialized for those data.
2. Defining a method for evaluating the precision of semantic clustering.
3. Creating an experimental system.
4. Performing experiments and obtaining result data.
Writing and submitting two papers concerning A Context Dependent Information Filtering Method of Document Data and its Application to Document Mining (both papers have been submitted and are currently under review by their respective conferences). These papers include the following proposals, which play important roles in the system equipped with the dynamic document mining method with semantic information filtering.
1. Proposing a method to equalize (by 2-normalizing) the amount of information included in the metadata of each document datum for semantic analysis.
2. Proposing a method to evaluate the relevance of document data with respect to the user's viewpoint.
3. Defining a method for removing irrelevant data.
4. Choosing an evaluation method for filtering ability.
5. Defining a way of applying the semantic clustering method together with the semantic information filtering method.
6. Defining an evaluation method for clustering after semantic information filtering is performed.
7. Creating an experimental system.
8. Performing experiments and obtaining result data.
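
The equalization step in item 1 of the second list can be illustrated with a minimal sketch. It assumes that "2-normalizing" means scaling each document's metadata vector to unit Euclidean (2-)norm; the exact formula is not stated in this report, and the function name and sample vectors are hypothetical.

```python
import math

def two_normalize(vector):
    """Scale a metadata feature vector to unit Euclidean (2-)norm.

    Assumption: this is what "2-normalizing" refers to; the report
    does not give the formula.
    """
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        # An all-zero vector has nothing to rescale; return it unchanged.
        return list(vector)
    return [x / norm for x in vector]

# Hypothetical documents: one with many metadata words, one with few.
long_doc = [3.0, 4.0, 0.0]
short_doc = [0.3, 0.4, 0.0]

# After normalization both vectors have the same length (norm 1),
# so neither dominates the semantic analysis because of its size.
print(two_normalize(long_doc))
print(two_normalize(short_doc))
```

With this kind of scaling, a document with many metadata words and a document with few are compared by direction rather than by magnitude.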

Brief Summary of Studies Promoted in this Academic Year

In this section, the abstracts of the papers are reproduced in order to introduce the basic ideas proposed in this academic season. Here, "we" refers to myself and my co-authors (Dr. Kiyoki, Naofumi Yasushi, and Dr. Kitagawa).

1. "A Context Dependent Semantic Clustering Method
for Document Data"

In this paper we propose a context dependent semantic clustering
method for document data. The main feature of the method is
semantically and dynamically clustering document data according to a
given context. We use a given context, or a viewpoint of a searcher,
to clarify semantic relationships among document data whose meaning
varies according to the context. By using this method, we can obtain a
set of semantic clusters each containing semantically similar
documents from a set of raw document data. The dynamic interpretation
of meaning of document data with respect to a given context or a view
is realized by a semantic projection operation to select a semantic
subspace of the orthogonal image space where document data are mapped.
We propose three algorithms to stress and clarify characteristics of
each document datum in the semantic subspace. By using these
algorithms, we can clarify and suitably obtain semantic clusters. The
method is applied to document data containing a set of words as
metadata, which are extracted automatically or semi-automatically from
the document data. The semantic correlation of metadata of each
document datum with a given context is evaluated by creating a
characterized vector from the metadata by the use of the semantic
associative processing method. The method enables efficient knowledge
acquisition from document data according to a given context or a
user's viewpoint. We clarify the feasibility and effectiveness of the
method by showing several experimental results.
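
The semantic projection described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the algorithm of the paper: the subspace is chosen by thresholding the components of a context vector, and the correlation of a document with the context is taken to be the norm of its vector inside that subspace. The function names, the threshold, and the sample vectors are all hypothetical.

```python
import math

def select_subspace(context_vector, threshold):
    """Return the indices of the axes whose context weight exceeds the threshold."""
    return [i for i, w in enumerate(context_vector) if abs(w) > threshold]

def project(vector, axes):
    """Keep only the components of a vector that lie on the selected axes."""
    return [vector[i] for i in axes]

def correlation(doc_vector, axes):
    """Norm of the document vector inside the subspace (assumed correlation score)."""
    return math.sqrt(sum(x * x for x in project(doc_vector, axes)))

# Hypothetical 4-axis orthogonal space; the context stresses axes 0 and 2.
context = [0.9, 0.1, 0.7, 0.0]
axes = select_subspace(context, threshold=0.5)

docs = {
    "doc_a": [0.8, 0.1, 0.6, 0.0],
    "doc_b": [0.0, 0.9, 0.1, 0.4],
}
for name, vec in docs.items():
    print(name, round(correlation(vec, axes), 3))
```

A different context vector selects different axes, so the same document can score high under one viewpoint and low under another.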

2. "A Semantic Information Filtering and Clustering Method
for Document Data with a Context Recognition Mechanism"

(Two papers on this topic have been submitted and are currently under review by two conferences.)

In this paper we propose a semantic information filtering method
with context recognition and its application to document mining. The
filtering method is able to remove irrelevant data that are less
correlated in meaning with given context words.

Our method is based on the idea that the meaning of document data varies
with contexts or viewpoints. By filtering out the target data items
with low semantic correlation with given context words, information
retrieval and data mining become effective because analysis of data is
only performed on data items with high correlation with the given
context words.

This filtering is realized by the use of the Semantic Associative
Processing Method. Dynamic semantic analysis of document data
according to the given context words is made possible by a semantic
projection operation to select a semantic subspace of an orthogonal
multidimensional space where document data are mapped.
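
The filtering step can be pictured with a minimal sketch. It assumes that the context words have already selected a set of subspace axes, that the correlation of a document is the norm of its vector on those axes, and that a document is removed when this value falls below a cutoff. The names, the cutoff, and the sample data are hypothetical and are not taken from the papers.

```python
import math

def subspace_norm(vector, axes):
    """Norm of a vector restricted to the selected subspace axes."""
    return math.sqrt(sum(vector[i] ** 2 for i in axes))

def filter_documents(docs, axes, min_correlation):
    """Keep only the documents whose subspace norm reaches min_correlation."""
    return {
        name: vec
        for name, vec in docs.items()
        if subspace_norm(vec, axes) >= min_correlation
    }

# Hypothetical axes selected by the given context words.
axes = [0, 2]

docs = {
    "doc_a": [0.8, 0.1, 0.6, 0.0],
    "doc_b": [0.0, 0.9, 0.1, 0.4],  # strong only outside the subspace
    "doc_c": [0.5, 0.0, 0.5, 0.2],
}

kept = filter_documents(docs, axes, min_correlation=0.5)
print(sorted(kept))
```

Only the documents that survive this step need to be passed on to later analysis, which is where the efficiency gain comes from.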

In this proposal, we apply this filtering method to a semantic
clustering method. The meaning of the document data obtained after the
information filtering is analyzed according to the given context
words. In the clustering process, the semantic correlation of document
data with respect to each other is calculated in the subspace to form
a distance matrix. A semantic distance calculation formula designed
based on the machinery of the semantic subspace is applied to stress
the characteristics of document data as the distances are computed. By
this process, relevant clusters can be obtained.
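
The clustering process can be sketched as follows. Two substitutions are made for illustration and should not be read as the method of the papers: plain Euclidean distance on the subspace axes stands in for the semantic distance formula, which is not given in this report, and a simple single-link threshold grouping stands in for the clustering algorithm. All names and values are hypothetical.

```python
import math

def subspace_distance(u, v, axes):
    """Euclidean distance between two vectors, measured only on the subspace axes."""
    return math.sqrt(sum((u[i] - v[i]) ** 2 for i in axes))

def distance_matrix(vectors, axes):
    """Pairwise subspace distances between all document vectors."""
    n = len(vectors)
    return [[subspace_distance(vectors[i], vectors[j], axes) for j in range(n)]
            for i in range(n)]

def threshold_clusters(matrix, max_distance):
    """Single-link grouping: items joined by a chain of close pairs share a cluster."""
    n = len(matrix)
    parent = list(range(n))

    def find(i):
        # Follow parent links up to the representative of the cluster.
        while parent[i] != i:
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if matrix[i][j] <= max_distance:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Hypothetical documents remaining after filtering; subspace axes are 0 and 2.
vectors = [
    [0.8, 0.1, 0.6, 0.0],
    [0.7, 0.9, 0.5, 0.3],  # differs from the first mainly on axis 1
    [0.1, 0.0, 0.1, 0.0],
]
axes = [0, 2]
matrix = distance_matrix(vectors, axes)
clusters = threshold_clusters(matrix, max_distance=0.3)
print(clusters)
```

In this example the first two documents fall into one cluster even though they differ strongly on axis 1, because that axis lies outside the subspace selected by the context.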

The feasibility of our information filtering method and its
application to semantic clustering is shown in two experiments. The
application of the context dependent semantic filtering of data to a
document mining method enables efficient knowledge acquisition from
document data according to given context words.

Future Work

The specific goals of research activity for the upcoming academic season are as follows:

Finish building a system for the metadata restructuring method.
Prepare and rehearse for the interim presentation of the master's research, and prepare conference presentations for the two papers under review in case they are accepted.
Create an integrated system that implements the three methods introduced in the master's thesis: a semantic clustering method with context recognition, a semantic information filtering method with context recognition, and a dynamic re-structuring method of metadata. This requires a substantial amount of programming.
Write the master's thesis. This requires careful planning and outlining.

This report has described my research accomplishments for this academic year. The funding provided by the Mori Scholarship Program has been a great help.

Dai Sakai 2/6/01
