A Semantic Image Search System for Cultural Japanese Images

Ali Ridho Barakbah

Graduate School of Media and Governance, Keio University, Japan


ABSTRACT

This research presents a semantic image search system with an emotion-oriented context recognition function. Our motivation for implementing semantic search with an emotional context is to let users express their impressions during the retrieval process in the image search system. This emotional context identifies the most important features by connecting the user’s impressions to the system. The key issues in this research encompass three ideas: (1) to present emotion-oriented context recognition for an image search system, (2) to provide an aggregation function that generates representative query colors for processing multi-query images, and (3) to realize analytical functions of cluster-based image feature extraction with our Pillar algorithm. This research develops an easy-to-use, highly precise image search system for Japanese cultural image collections that helps users find and explore Japanese cultural images through our image search engine. For experimental purposes, retrieval is performed on the Ukiyo-e image dataset from the Tokyo Metropolitan Library, representing Japanese cultural image collections.

Keywords: Image search, emotional context, multi-query images, subspace feature selection, cluster-based similarity.


1. INTRODUCTION

Image search systems based on image content are an attractive and challenging research area. Many content-based image retrieval (CBIR) systems have been proposed and widely applied for both commercial purposes and research. Such a system analyzes the content of an image by extracting primitive features such as color, shape, and texture. Most approaches explore the content of an image to identify its primary and dominant features.

Several studies have addressed emotion recognition problems for semantic image search. Such a search system commonly constructs an emotion model driven by user interaction with the system [1]. Park and Lee [2] introduced a user-driven, emotion-based image retrieval system that performed emotion recognition by analyzing consistency feedback from users. Solli and Lenz [3] developed an image retrieval system based on bags of emotions, using color emotion models (activity, weight, and heat) derived from psychophysical experiments; however, it did not directly connect queries of emotional expressions to the models. Wang and He [4] presented a survey on emotional semantic image retrieval, in which supervised learning techniques are usually used to bridge the semantic gap between image features and emotional semantics.


2. OBJECTIVE

This research presents a semantic image search system with an emotion-oriented context recognition mechanism that connects a series of emotion expressions to color-based impressions. The presented search system addresses dynamic manipulation of unsupervised emotion recognition. The motivation for implementing an emotion context in the image search system is to let users express their impressions during the retrieval process. This emotion context identifies the most important features by connecting the user’s impressions to the image queries. In this system, the Mathematical Model of Meaning (MMM: [5], [6] and [7]) is applied and transformed to the color features with a color impression metric for subspace feature selection. Barakbah and Kiyoki [8] presented how to connect the user’s impressions to the queries by involving a series of emotion contexts (such as “happy”, “calm”, “beautiful”, “luxurious”, etc.) and recognizing the most important features for the image dataset and the image query. This research continues the work of Barakbah and Kiyoki [8] by expanding the MMM vector space ([5], [6] and [7]) with the list of impressions in the Color Image Scale. It also introduces a multi-query image search system that applies an aggregation mechanism to generate representative query colors for processing multi-query images.

This research implements a cluster-based similarity measurement in order to tie similar colors of the subspace color features into the same group during similarity measurement. The system applies our Pillar Algorithm for the cluster-based similarity measurement, together with a semantic filtering mechanism that filters out irrelevant data. Applying the Pillar Algorithm for cluster-based similarity measurement is important for reaching high precision in the clustering results as well as for speeding up the clustering computation. Figure 1 shows the system architecture of our semantic image search system.


Figure 1. System architecture of our semantic image search system

This research develops an easy-to-use, highly precise image search system for Japanese cultural image collections that helps users find and explore Japanese cultural images through our image search engine. The system uses the Ukiyo-e image datasets from the Tokyo Metropolitan Library to represent Japanese cultural image collections. These datasets contain typical images and artworks by famous painters of the Edo and Meiji eras, including Hiroshige, Toyokuni, Kunisada, Yoshitoshi, Kunichika, Sadahige, Kuniteru, etc.


3. SYSTEM DESIGN

The idea in this research for recognizing an emotion context in the image search system is to provide a function with which users can express their impressions (such as “happy”, “calm”, “beautiful”, “luxurious”, etc.) for image search. This function finds the most essential features related to an emotion context, given as the user’s impressions of the image query. The Mathematical Model of Meaning (MMM) is applied to recognize a series of emotion contexts and retrieve the impressions most highly correlated with each context.

3.1. An Overview of the Mathematical Model of Meaning

In the Mathematical Model of Meaning [5][6][7], an orthogonal semantic space is created for semantic associative search. Retrieval candidates and queries are mapped onto this semantic space. The dynamic interpretation of the meaning of data according to the given context words is realized by selecting a semantic subspace from the entire semantic space, which consists of approximately 2,000 orthogonal vectors. A subspace is extracted by the semantic projection operator when context words, that is, the user’s impressions, are given. The vectors of document data in the semantic subspace thus have norms adjusted according to the given context words. Semantic interpretation is performed as dynamic projections of the semantic space according to the given contexts, and the information resources most correlated with the given context are extracted in the selected subspace by applying the metric defined on the semantic space. Figure 2 shows the semantic association according to the given contexts in MMM. In this research, the MMM vector space of approximately 2,000 Longman words used in [8] is expanded with the 180 impression words of the Color Image Scale. The words most highly correlated with the context become the representative impressions of the Color Image Scale and are used to select the subspace color features.
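As a rough illustration, the subspace selection and norm adjustment can be sketched as follows; the matrix shapes, the summing of context vectors, and the cutoff k are illustrative assumptions, not the exact MMM formulation:

```python
import numpy as np

def semantic_projection(items, context_vectors, k=30):
    """Minimal sketch of MMM-style subspace selection (assumptions noted).

    items           : (n_items, n_features) matrix of retrieval candidates
                      mapped onto the orthogonal semantic space.
    context_vectors : (n_context, n_features) vectors of the given context
                      (impression) words.
    k               : number of subspace axes to keep (illustrative cutoff).
    """
    # Combine the context vectors and rank semantic axes by how strongly
    # the given context weights them (a simplifying assumption).
    context = np.abs(context_vectors.sum(axis=0))
    subspace = np.argsort(context)[::-1][:k]          # selected axes

    # Project every item onto the selected subspace; the norm of the
    # projected vector measures its correlation with the context.
    projected = items[:, subspace]
    norms = np.linalg.norm(projected, axis=1)
    return subspace, norms
```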


Figure 2. Semantic interpretation according to contexts in MMM

3.2. Color Feature Extraction

The system extracts color features using the 130 basic color features of the Color Image Scale [9]. These features are based on a non-uniform quantization of the RGB color space grounded in human impressions. They contain 120 chromatic colors and 10 achromatic colors, covering 10 hues and 12 tones. Each hue may be bright or dull, showy or sober, across a number of tones. The tone of a color [9] is the result of the interaction of two factors: brightness (value) and color saturation (chroma). Colors of the same tone are arranged in order of hue, starting from red at the left of the scale, and the lines linking colors of the same tone show the range of images that tone can convey [9]. These 130 basic color features are projected onto the lists of impressions. Figure 3 shows the 130 basic color features mapped on the RGB color space and used for expressing relations between colors and impressions.
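A minimal sketch of such a feature extraction is given below, assuming the 130 basic colors are available as RGB coordinates and using a simple nearest-color assignment; the actual Color Image Scale quantization is non-uniform and impression-based, so this is only an approximation:

```python
import numpy as np

def color_histogram(pixels_rgb, palette_rgb):
    """Build a normalized 130-bin color histogram for one image (sketch).

    pixels_rgb  : (n_pixels, 3) array of RGB values.
    palette_rgb : (130, 3) array with the basic color coordinates
                  (assumed to be available from the Color Image Scale).
    """
    # Euclidean distance in RGB space from every pixel to every palette
    # color (the paper's exact assignment rule is not specified).
    d = np.linalg.norm(
        pixels_rgb[:, None, :].astype(float) - palette_rgb[None, :, :],
        axis=2)
    nearest = d.argmin(axis=1)                 # index of closest basic color
    hist = np.bincount(nearest, minlength=len(palette_rgb))
    return hist / hist.sum()                   # normalized color features
```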


Figure 3. The 130 basic color features mapped on RGB color space and used for expressing relations between colors and impressions

3.3. Representative Query Color Generation

In this research, our semantic image search system provides multi-query input that allows users to assign more than one image to the image query. With this multi-query input, users have more freedom and flexibility to express what they want to search for in the image dataset. To realize this, the system constructs an aggregation mechanism of representative query color generation for processing multi-query images. This mechanism can identify non-representative color features and remove them from the selection, as shown in Figure 4.
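The aggregation could be sketched as follows, assuming each query image is summarized by a 130-bin color histogram; the support-based pruning rule and its threshold are illustrative assumptions, since the exact aggregation formula is not specified here:

```python
import numpy as np

def representative_query_colors(query_hists, min_support=0.5):
    """Sketch of generating representative query colors (assumptions noted).

    A color is kept as representative only if it occurs with non-negligible
    weight in at least `min_support` of the query images; the remaining
    colors are treated as non-representative and removed.
    """
    H = np.asarray(query_hists)            # (n_queries, 130) histograms
    present = H > 1e-3                     # color occurs in a query image
    support = present.mean(axis=0)         # fraction of queries per color
    keep = support >= min_support          # representative colors only
    rep = H.mean(axis=0) * keep            # averaged, pruned color vector
    return rep / rep.sum() if rep.sum() > 0 else rep
```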


Figure 4. The identified non-representative colors (indicated by red color) will be removed from query feature extraction

3.4. Subspace Feature Selection

The impressions most highly correlated by MMM are projected onto the Color Impression Metric defined by the Color Image Scale [9]. The Color Impression Metric relates the 130 basic color features to 180 key impression words. The projection calculates the relationships between the representative impressions from MMM and the key impression words in the Color Image Scale. The most significant colors, i.e., those with the highest projection values, are obtained and then used to select the color features, among the 130 color features, of both the image dataset and the representative query colors.
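A minimal sketch of this projection, assuming the Color Impression Metric is represented as a 130 x 180 color-impression relation matrix and that a fixed number of top-ranked colors is kept (the cutoff is an assumption):

```python
import numpy as np

def select_subspace_colors(impression_scores, color_metric, n_colors=20):
    """Sketch of subspace color feature selection (assumptions noted).

    impression_scores : (180,) correlations of the key impression words
                        with the given context, obtained from MMM.
    color_metric      : (130, 180) color-impression relation matrix from
                        the Color Image Scale.
    """
    # Each color's significance is its weighted agreement with the
    # context-correlated impressions.
    significance = color_metric @ impression_scores
    selected = np.argsort(significance)[::-1][:n_colors]
    return selected   # indices of the subspace color features
```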

3.5. Semantic Filtering Mechanism

Before clustering the selected subspace color features for similarity calculation, it is important to filter out the irrelevant data items that have low correlation to the emotion contexts. Semantic information filtering was introduced in [10]. It provides a mechanism with which users can express their impressions: when users give contexts expressing their impressions to the system, each data item has either low or high correlation to those contexts. By filtering out retrieval candidates with low semantic correlation to the given contexts, the retrieval process becomes effective because analysis is performed only on data items highly correlated with the contexts; this also reduces the number of data items and speeds up computation. Semantically irrelevant data lie close to the zero point in the vector space of the subspace color features. A case-dependent threshold th is used for the semantic filtering: vectors with norms less than th are considered unnecessary and are filtered out from the subspace, as shown in Figure 5. Users can set a high threshold if they want to filter out a relatively large amount of data and retrieve only the limited data highly related to their impressions, or set the threshold lower so that most data are kept for thorough analysis. Here, the system sets th to the average distance of the color vectors to the zero point.
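A sketch of this filtering step, following the description above (vectors with norms below th are dropped; by default th is the average distance to the zero point):

```python
import numpy as np

def semantic_filter(subspace_vectors, th=None):
    """Filter out semantically irrelevant items (sketch).

    Items whose vectors in the subspace lie close to the zero point
    (norm below th) are considered irrelevant to the given contexts.
    """
    norms = np.linalg.norm(subspace_vectors, axis=1)
    if th is None:
        th = norms.mean()          # default: average distance to zero
    keep = norms >= th
    return np.flatnonzero(keep)    # indices of the retained items
```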


Figure 5. Semantic filtering mechanism for filtering out irrelevant data

3.6. Pillar Algorithm

After applying subspace feature selection, the system clusters the subspace color features of the image dataset using our Pillar Algorithm [11]. The Pillar Algorithm [11] optimizes the initial centroids for K-means clustering. It is robust and effective for initial centroid optimization because it positions all centroids far apart from one another within the data distribution. Figure 6 illustrates the idea of locating a set of pillars (white points) withstanding different pressure distributions of roofs. The centroids of the clustering results from the Pillar Algorithm are used to calculate similarity against the representative query color features of the image queries; the cosine distance metric is used for this similarity calculation.
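The following sketch shows a simplified, farthest-point flavor of Pillar-style seeding together with the cosine distance used for the similarity calculation; the full algorithm in [11] additionally accumulates distance information and handles outliers, so this is not the complete method:

```python
import numpy as np

def pillar_init(X, k):
    """Simplified sketch of Pillar-style seeding: like pillars placed
    under a roof, initial centroids are positioned as far apart as
    possible within the data distribution."""
    # Start from the point farthest from the grand mean.
    mean = X.mean(axis=0)
    centroids = [X[np.linalg.norm(X - mean, axis=1).argmax()]]
    for _ in range(k - 1):
        # Next pillar: the point farthest from all chosen centroids.
        d = np.min(
            [np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    return np.array(centroids)   # feed these into standard K-means

def cosine_distance(a, b):
    """Cosine distance between a cluster centroid and the representative
    query color vector."""
    return 1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
```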


Figure 6. Illustrations of locating a set of pillars (white points) withstanding against different pressure distribution of roofs


4. SEMANTIC SEARCH

To apply our semantic image search, the system implements it on a cultural image dataset. For the experimental study, retrieval is performed on the Ukiyo-e image dataset from the Tokyo Metropolitan Library, representing Japanese cultural image collections. It contains 8,743 typical images and artworks by famous painters of the Edo and Meiji eras, including Hiroshige, Toyokuni, Kunisada, Yoshitoshi, Kunichika, Sadahige, Kuniteru, etc.

Experiment 1

Four images are given as multiple queries, as shown in Figure 7. The experiment sets two emotion contexts, "calm" and "quiet", to express the impressions of what the user wants to retrieve with the queries. Figure 8 shows the top 15 retrieval results of our semantic image search system.


Figure 7. Multiple queries given to the search system with "calm quiet" emotion contexts



Figure 8. The top 15 retrieved image results of "calm quiet" emotion contexts

Figure 9 shows the precision of the retrieval results as a function of the number i of image results. In that figure, PR1 indicates the precision of image results whose impressions are exactly the same as the contexts, PR2 indicates the precision of image results whose impressions are very close to those of the contexts (in other words, semantically the same as the contexts), and MaxPR is the maximum bound of the precision. Although PR1 reached only 53.33% precision over the top i image results, PR2 achieved entirely correct retrieval results.
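The precision curves can be computed straightforwardly; the sketch below assumes a boolean relevance judgment per ranked result (PR1 and PR2 differ only in how relevance is judged):

```python
import numpy as np

def precision_at(relevant_flags):
    """Precision at every rank i, given per-result relevance judgments.

    Example: precision_at([1, 1, 0, 1]) -> [1.0, 1.0, 0.667, 0.75]
    """
    rel = np.asarray(relevant_flags, dtype=float)
    ranks = np.arange(1, len(rel) + 1)
    return np.cumsum(rel) / ranks   # running fraction of relevant results
```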


Figure 9. The precision of the retrieval results for "calm quiet" emotion contexts as a function of the number of image results

Experiment 2

Figure 10 shows eight images given as queries. The experiment sets two emotion contexts, "luxurious" and "elegant", to express the impressions of what the user wants to retrieve with the queries. Figure 11 shows the top 15 retrieval results of our semantic image search system.


Figure 10. Multiple queries given to the search system with "luxurious elegant" emotion contexts



Figure 11. The top 15 retrieved image results of "luxurious elegant" emotion contexts

Figure 12 shows the precision of the retrieval results as a function of the number i of image results. As before, PR1 indicates the precision of image results whose impressions are exactly the same as the contexts, PR2 indicates the precision of image results whose impressions are very close to those of the contexts, and MaxPR is the maximum bound of the precision. Figure 12 shows that PR1 reached 73.33% correct retrieval results, and PR2 achieved entirely correct results over the top i image results.


Figure 12. The precision of the retrieval results for "luxurious elegant" emotion contexts as a function of the number of image results

PUBLICATIONS

During this research on the semantic image search system from April 2010 to February 2011, we published several papers. The results of our research are as follows:

  1. Ali Ridho Barakbah, Yasushi Kiyoki, "Cluster Oriented Image Retrieval System with Context Based Color Feature Subspace Selection", Industrial Electronics Seminar (IES) 2009, October 21, 2009, Surabaya, Indonesia.
  2. Ali Ridho Barakbah, Yasushi Kiyoki, "A New Approach for Image Segmentation using Pillar-Kmeans Algorithm", International Journal of Information and Communication Engineering, Vol. 6, No. 2, pp. 83-88, WASET, 2010.
  3. Ali Ridho Barakbah, Yasushi Kiyoki, "An Emotion-Oriented Image Search System with Cluster based Similarity Measurement using Pillar-Kmeans Algorithm", Accepted and to appear in International Journal of Information Modelling and Knowledge Bases, Vol. XXII, IOS PRESS, March, 2011.
  4. Ali Ridho Barakbah, Yasushi Kiyoki, "A Fast Algorithm for K-Means Optimization using Pillar Algorithm", The 2nd International Workshop with Mentors on Databases, Web and Information Management for Young Researchers, August 2-4, 2010, Tokyo, Japan. (Distinguished Young Researcher Award)
  5. Ali Ridho Barakbah, Yasushi Kiyoki, "Image Search System with Automatic Weighting Mechanism for Selecting Features", The 6th International Conference on Information and Communication Technology and Systems, September 28, 2010, Surabaya, Indonesia.

REFERENCES

[1] S. Wang, X. Wang, Emotion Semantics Image Retrieval: An Brief Overview, ACII 2005, LNCS 3784, pp. 490–497, Springer-Verlag Berlin Heidelberg, 2005.
[2] E.J. Park, J.W. Lee, Emotion-Based Image Retrieval Using Multiple-Queries and Consistency Feedback, The 6th IEEE International Conference on Industrial Informatics (INDIN) 2008, pp. 1654-1659, 2008.
[3] M. Solli, R. Lenz, Color Based Bags-of-Emotions, CAIP 2009, LNCS 5702, pp. 573–580, Springer-Verlag Berlin Heidelberg, 2009.
[4] W. Wang, Q. He, A Survey On Emotional Semantic Image Retrieval, The 15th IEEE International Conference on Image Processing (ICIP) 2008, San Diego, USA, 2008.
[5] T. Kitagawa, Y. Kiyoki, A mathematical model of meaning and its application to multidatabase systems, Proc. 3rd IEEE International Workshop on Research Issues on Data Engineering: Interoperability in Multidatabase Systems, pp. 130-135, 1993.
[6] Y. Kiyoki, T. Kitagawa, T. Hayama, A metadatabase system for semantic image search by a mathematical model of meaning, ACM SIGMOD Record, Vol. 23, No. 4, pp. 34-41, 1994.
[7] Y. Kiyoki, T. Kitagawa, Y. Hitomi, A fundamental framework for realizing semantic interoperability in a multidatabase environment, International Journal of Integrated Computer-Aided Engineering, Vol. 2, No. 1 (Special Issue on Multidatabase and Interoperable Systems), pp. 3-20, John Wiley & Sons, 1995.
[8] A.R. Barakbah, Y. Kiyoki, Cluster Oriented Image Retrieval System with Context Based Color Feature Subspace Selection, Proc. Industrial Electronics Seminar (IES) 2009, pp. C101-C106, Surabaya, Indonesia, 2009.
[9] S. Kobayashi, Color Image Scale, 1st edition, Kodansha International, 1992.
[10] D. Sakai, Y. Kiyoki, N. Yoshida, T. Kitagawa, A Semantic Information Filtering and Clustering Method for Document Data with a Context Recognition Mechanism, Journal of Information Modelling and Knowledge Bases, Vol. XIII, pp. 325-343, 2002.
[11] A.R. Barakbah, Y. Kiyoki, A Pillar Algorithm for K-Means Optimization by Distance Maximization for Initial Centroid Designation, Proc. IEEE Symposium on Computational Intelligence and Data Mining (CIDM), Nashville, Tennessee, 2009.