<<Taikichiro Mori Memorial Research Fund>>

Graduate Student Researcher Development Grant Report


Research Project: Automatic translation system for multilingual compound words

Project Researcher: Vo Ho Bao Khanh


Research Contents

Compound words are highly frequent and productive in many languages, especially Japanese. Translating Japanese compound words into other languages is therefore indispensable in multilingual processing applications such as machine translation and multilingual information retrieval, which require exact translations of compound words. However, compound word translation is a challenging task because of variable compound structures, multiple relations among the constituents within a compound, and several possible translations for each constituent. Machine translation of Japanese compound words has so far addressed these difficulties for popular target languages, but not for less popular ones, because it relies on the adequate multilingual data and excellent morphological processing tools available for popular languages. In this thesis, we aim to automatically translate Japanese compound words into a less popular language, in the hope of proposing a general translation framework for such languages. We chose Vietnamese as the target language because the demand for understanding Japanese in Vietnam has grown greatly since the two countries established full-fledged relations.


Research Activity Results

Our proposed approach consists of two phases: generation and selection. In the first phase, we applied a morphologically based compositional translation method that uses the inflectional features, grammatical features, and semantic links of the Japanese constituents to generate translation candidates even when the dictionary is sparse. In the second phase, we selected the most likely translation candidates by evaluating the term frequencies of the generated Vietnamese candidates with the help of available search engines. This selection method is appropriate for less popular languages, which have limited language processing tools and few good-quality corpora. We implemented our approach and evaluated it on various data sets. The results show that the approach is effective and adaptable not only to Vietnamese but also to other languages. It may also be applicable to other research areas such as multiword expression compilation and multilingual information retrieval.
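The two phases can be illustrated with a minimal Python sketch. It is not the implementation evaluated in this report: the dictionary entries, the frequency values, and the function names (generate_candidates, select_best, toy_hit_count) are hypothetical placeholders, the generation step simply tries every constituent order instead of using morphological features, and a small lookup table stands in for search-engine hit counts.

```python
from itertools import permutations, product

# Toy bilingual dictionary: Japanese constituent -> Vietnamese translations.
# Hypothetical entries, for illustration only.
TOY_DICTIONARY = {
    "情報": ["thông tin"],
    "検索": ["tìm kiếm", "tra cứu"],
}

# Placeholder frequencies standing in for search-engine hit counts.
# These numbers are invented for the example, not measured.
TOY_HIT_COUNTS = {
    "tìm kiếm thông tin": 1200,
    "tra cứu thông tin": 900,
    "thông tin tìm kiếm": 300,
}


def generate_candidates(constituents, dictionary):
    """Generation phase: combine constituent translations compositionally.

    Every choice of translation per constituent is tried in every order,
    because the word order of the Vietnamese compound may differ from the
    Japanese one. Returns an empty list if any constituent is unknown.
    """
    options = [dictionary.get(c, []) for c in constituents]
    candidates = set()
    for choice in product(*options):
        for ordering in permutations(choice):
            candidates.add(" ".join(ordering))
    return sorted(candidates)


def select_best(candidates, hit_count):
    """Selection phase: keep the candidate with the highest frequency."""
    if not candidates:
        return None
    return max(candidates, key=hit_count)


def toy_hit_count(phrase):
    """Stand-in for a search-engine query; unknown phrases count as 0."""
    return TOY_HIT_COUNTS.get(phrase, 0)


candidates = generate_candidates(["情報", "検索"], TOY_DICTIONARY)
best = select_best(candidates, toy_hit_count)
print(candidates)
print(best)
```

In a real setting, toy_hit_count would be replaced by a function that queries a search engine or a corpus index, and the generation step would prune orderings using the grammatical features of the constituents.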

All of these phases have been completed, and the results are very good: about 80% of the translated Japanese compound words are correct.



In general, I have finished the main tasks that I proposed for the Mori Grant about one year ago. I have translated Japanese compound words into Vietnamese with fairly high accuracy.