<<Taikichiro Mori Memorial Research Fund>>
Graduate Student Researcher Development Grant Report
Research Project: Automatic translation system for multilingual compound words
Project Researcher: Vo Ho Bao Khanh
Research Contents
Compound words are highly frequent and productive in many languages,
especially in Japanese. Translating Japanese compound words into other
languages is therefore indispensable for multilingual processing applications
such as machine translation and multilingual information retrieval, which
require exact translations of compound words. However, compound word
translation is a difficult task because of variable compound structures,
multiple relations between the constituents of a compound word, and several
possible translations of each constituent. Machine translation of
Japanese compound words into widely used languages has addressed these
difficulties, thanks to adequate multilingual data and excellent
morphological processing tools for those languages, but the same has not yet
been done for less popular languages. In this thesis, we aim to automatically
translate Japanese compound words into a less popular language, in the hope of
proposing a general translation framework for such languages. We chose
Vietnamese as the target language because demand for understanding Japanese
has grown greatly in Vietnam since the two countries established full-fledged
relations.
Research Activity Results
Our proposed approach consists of two phases: generation and selection. In
the first phase, we applied a morphology-based compositional translation
method that uses the inflectional features, grammatical features, and semantic
links of Japanese constituents to generate translation candidates even when
the dictionary is sparse. In the second phase, we selected the most likely
translation candidates by evaluating the term frequencies of the generated
Vietnamese candidates with the help of available search engines. This
selection method is appropriate for less popular languages, which have few
language processing tools and few good-quality corpora. We implemented our
approach and evaluated it on various data sets. The results show that the
approach is effective and adaptable not only to Vietnamese but also to other
languages. It may also be applicable to other research areas, such as
multiword expression compilation and multilingual information retrieval.
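
The two phases can be illustrated with a minimal Python sketch. It is not the
implementation used in this project. The dictionary entries, the hit counts,
and the function names (generate_candidates, select_best, toy_hit_count) are
all hypothetical. The sketch leaves out the morphological and semantic
features described above, returns no candidates when a constituent is missing
from the dictionary, and uses a small in-memory table in place of real search
engine queries. Trying both word orders is also an assumption made for
illustration only.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Sequence

# Toy bilingual dictionary (hypothetical entries, for illustration only).
TOY_DICTIONARY: Dict[str, List[str]] = {
    "経済": ["kinh tế"],
    "発展": ["phát triển", "tiến triển"],
}

# Made-up frequencies standing in for search engine hit counts.
TOY_HIT_COUNTS: Dict[str, int] = {
    "phát triển kinh tế": 900,
    "kinh tế phát triển": 400,
    "tiến triển kinh tế": 5,
}


def generate_candidates(
    constituents: Sequence[str], dictionary: Dict[str, List[str]]
) -> List[str]:
    """Generation phase: combine constituent translations in both orders."""
    options = [dictionary.get(c, []) for c in constituents]
    if any(not opts for opts in options):
        # This sketch gives up when a constituent has no dictionary entry.
        return []
    candidates: List[str] = []
    for combo in product(*options):
        for ordered in (combo, tuple(reversed(combo))):
            phrase = " ".join(ordered)
            if phrase not in candidates:
                candidates.append(phrase)
    return candidates


def select_best(
    candidates: Sequence[str], hit_count: Callable[[str], int]
) -> Optional[str]:
    """Selection phase: keep the candidate with the highest frequency."""
    if not candidates:
        return None
    return max(candidates, key=hit_count)


def toy_hit_count(phrase: str) -> int:
    """Stand-in for a search engine query; unseen phrases count as 0."""
    return TOY_HIT_COUNTS.get(phrase, 0)


candidates = generate_candidates(["経済", "発展"], TOY_DICTIONARY)
best = select_best(candidates, toy_hit_count)
print(candidates)
print(best)
```

Passing the frequency lookup as a function keeps the selection step
independent of any particular search engine, so a real query function could
replace toy_hit_count without changing the rest of the sketch.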
Both phases have been completed, and the results are very good: about 80% of
the translated Japanese compound words are correct.
Conclusion
Overall, I have finished the main tasks that I proposed for the Mori Grant
about one year ago. I have completed the translation of Japanese compound
words into Vietnamese, with fairly high accuracy in the translated results.