Image Search Engine with Automatic Weighting Mechanism for Selecting Features

Ali Ridho Barakbah

Graduate School of Media and Governance, Keio University, Japan

ABSTRACT

In this research, we develop an easy-to-use and highly precise image search engine. The key technology of the system is an automatic weighting mechanism for selecting features, based on a combination of color, shape, and structure features. In conventional systems, users must consider and set the feature weights themselves to express their preferences. This is difficult for novice users because it requires technical knowledge of the retrieval system. We implement an automatic weighting mechanism that selects features by analyzing the distribution of color information to determine the representative features. We extract the color moments of an image, then calculate the color distances for the color weight and the texture density for the structure weight. The shape property is measured to extract the shape area, and the shape adjustment is then calculated from the texture density to determine the shape weight. We evaluate our system using 1000 JPEG images from the COREL image collection. The experimental results demonstrate the effectiveness of the proposed system in improving image retrieval accuracy.

Keywords: Image retrieval, CBIR, feature extraction, automatic weighting.

1. INTRODUCTION

The rapid growth of internet technology accelerates inter-media exchange, including the exchange of image data. According to a recent study, there are 180 million images on the publicly indexable Web, amounting to about 3 terabytes of image data, and an astounding one million or more digital images are produced every day [1]. Efficient image searching, browsing, and retrieval systems have been widely developed to support these activities. Image search engines based on image content, known as content-based image retrieval (CBIR) systems, are an attractive and challenging research area in image searching. Many CBIR systems have been proposed and widely applied in both commercial and research settings. Such a system analyzes the content of an image by extracting primitive features such as color, shape, and texture. Most approaches explore the content of an image and identify its primary and dominant features. QBIC [2] introduced an image retrieval system based on the color information inside an image. VisualSEEk [3] presented a system that diagrams spatial arrangements based on a representation of color regions. NETRA [4] developed a CBIR system by extracting color and texture features. Virage [5] utilized color, texture, and shape features for its image retrieval engine. CoIRS [6] introduced a cluster-oriented image retrieval system based on color, shape, and texture features. Veltkamp and Tanase [7] and Liu et al. [8] presented surveys of many image retrieval systems that use diverse features.

In conventional systems, users must consider and set the feature weights themselves to express their preferences. This is difficult for novice users because it requires technical knowledge of the retrieval system. We implement an automatic weighting mechanism that selects features by analyzing the distribution of color information to determine the representative features. Figure 1 illustrates our novel system in comparison with a conventional image search engine.


Figure 1. Illustration of image retrieval system

2. OBJECTIVE

In this research, we develop an online image database retrieval system with an automatic weighting mechanism for selecting features. The system realizes a retrieval engine based on a combination of color, shape, and structure features. It pre-processes the images with noise removal and image segmentation. A new hybrid color system using HSL and CIELAB is introduced in this paper to improve the quality of image segmentation. We also introduce a new mechanism for clustering large images in order to improve precision and computation time. We extract the shape feature by measuring image properties such as eccentricity, area, equivalent diameter, and convex area. The system also identifies the structures of image contents by applying the 2D forward mirror-extended Curvelet transform, and introduces a new approach to color feature extraction that applies a histogram of 3-dimensional color vector quantization. After feature extraction, the metadata of shape, structure, and color are created. The metadata of the query image are used to measure similarity against the metadata repository of the image database. The retrieval engine analyzes the distribution of color information to determine the dominant features and sets the feature weights automatically. For the matching process between the query image and the image database, we introduce a distinctive idea that represents a semantic closeness of distances. The retrieved results are ranked by their similarity to the query image, highest first. Figure 2 shows the system architecture of the proposed image retrieval system.


Figure 2. System architecture of proposed research

3. SYSTEM DESIGN

In this section, we describe the functions that extract the three kinds of image features used in our system: shape, structure, and color. For shape feature extraction, we use eccentricity, area, equivalent diameter, and convex area. The structure feature is extracted using the Curvelet transform. We apply 3D-Color Vector Quantization for color feature extraction. We also discuss our proposed approach for the automatic weighting mechanism.

3.1. Shape feature

First, image segmentation is applied as a pre-processing step for shape feature extraction. We use our Pillar algorithm [9] for image segmentation. Our system then converts the segmented RGB image into gray-scale. To improve the precision of the shape metadata, we partition the image into 4x4 blocks. Finally, we apply Canny edge detection and measure the image properties: eccentricity, area, equivalent diameter, and convex area.
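The partitioning and two of the shape properties can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the segmented image has already been reduced to a binary mask stored as a list of rows, the function names are hypothetical, and eccentricity and convex area are omitted for brevity. Equivalent diameter is taken in its usual sense, the diameter of a circle with the same area as the region.

```python
import math

def partition_4x4(mask):
    """Split a 2D mask (list of rows) into 16 blocks, row by row."""
    rows, cols = len(mask), len(mask[0])
    blocks = []
    for i in range(4):
        r0, r1 = i * rows // 4, (i + 1) * rows // 4
        for j in range(4):
            c0, c1 = j * cols // 4, (j + 1) * cols // 4
            blocks.append([row[c0:c1] for row in mask[r0:r1]])
    return blocks

def area(block):
    """Number of foreground (non-zero) pixels in a block."""
    return sum(1 for row in block for v in row if v)

def equivalent_diameter(block):
    """Diameter of the circle whose area equals the block's foreground area."""
    return math.sqrt(4.0 * area(block) / math.pi)

def shape_metadata(mask):
    """One (area, equivalent diameter) pair per block of the 4x4 partition."""
    return [(area(b), equivalent_diameter(b)) for b in partition_4x4(mask)]

# Usage: an 8x8 mask that is entirely foreground gives 16 blocks of 2x2 pixels.
mask = [[1] * 8 for _ in range(8)]
metadata = shape_metadata(mask)
```

Computing the properties per block rather than for the whole image keeps some spatial information in the metadata, which is the stated purpose of the 4x4 partitioning.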

3.2. Structure feature

First, image segmentation based on the Pillar algorithm is applied as a pre-processing step for structure feature extraction. We then convert the segmented RGB image into gray-scale. After partitioning the image into 4x4 blocks, the structure feature is extracted. We utilize the Curvelet transform to extract the structure of an image. The Curvelet transform is a multiscale geometric transform that has proved quite useful over the past few years in diverse fields. In image processing, it is used for multi-scale image representation, where the image can be represented at different layers of image transformation. However, naively applying the Curvelet transform to a mirror-extended image results in (at least) a fourfold increase in computational complexity and redundancy. To address this, we use the 2D forward mirror-extended Curvelet transform [10] to identify the structure of an image. The mirror-extended Curvelet transform cuts down redundant computations where possible.

3.3. Color feature

Noise removal and 4x4 image partitioning are applied before extracting the color features. Then, for each block, we extract the color information using 3D-Color Vector Quantization from our previous work [11]. In this paper, we use a quantization size of 64x64x64 in the RGB color space, so that the space is represented by 125 positions, as shown in Figure 3.
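The count of 125 positions follows from a step of 64 on each 8-bit channel, which gives five levels per channel (0, 64, 128, 192, and the upper bound 255) and 5 x 5 x 5 = 125 combinations. The sketch below builds a normalized histogram over those positions. It is an illustration under assumptions rather than the method of [11]: it assumes each pixel is assigned to its nearest grid position, and the function names are hypothetical.

```python
# Five levels per channel at a step of 64; the last level is clamped to 255.
LEVELS = (0, 64, 128, 192, 255)

def level_index(value):
    """Index (0..4) of the level nearest to an 8-bit channel value (assumed rule)."""
    return min(4, (value + 32) // 64)

def position(rgb):
    """Map an (R, G, B) pixel to one of the 125 positions (0..124)."""
    r, g, b = (level_index(c) for c in rgb)
    return r * 25 + g * 5 + b

def representative_color(pos):
    """Inverse mapping: the grid color that a position stands for."""
    r, rem = divmod(pos, 25)
    g, b = divmod(rem, 5)
    return (LEVELS[r], LEVELS[g], LEVELS[b])

def color_histogram(pixels):
    """Normalized 125-bin histogram of quantized colors for one image block."""
    hist = [0.0] * 125
    for p in pixels:
        hist[position(p)] += 1.0
    n = len(pixels)
    return [h / n for h in hist] if n else hist

# Usage: a block containing one black and one white pixel.
hist = color_histogram([(0, 0, 0), (255, 255, 255)])
```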


Figure 3. Illustration of 3D-Color Vector Quantization of RGB color space

3.4. Automatic Weighting Mechanism

We implement an automatic weighting mechanism that selects features by analyzing the distribution of color information to determine the representative features. First, we apply image segmentation using our Pillar algorithm. We then extract the color moments of the image, and calculate the color distances for the color weight and the texture density for the structure weight. Color moments have been successfully used in many retrieval systems and have proved to be efficient and effective in representing the color distributions of images [12]. The color distances are calculated from the first-order color moment by applying shape-independent clustering and obtaining the distance of the color hierarchies. The texture density is calculated from the third-order color moment, so that it is more sensitive to the scene structures of images. The shape property is measured to extract the shape area, and the shape adjustment is then calculated from the texture density to determine the shape weight.
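For reference, the color moments named above can be computed as in the following sketch. It uses the common definitions (mean, standard deviation, and the cube root of the third central moment, per channel) and covers only the moments themselves. How the color distances, texture density, and final weights are derived from them is not specified here, so that part is not shown. The function names are hypothetical.

```python
import math

def channel_moments(values):
    """First-, second-, and third-order color moments of one channel."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    third = sum((v - mean) ** 3 for v in values) / n
    std = math.sqrt(variance)
    # Signed cube root, so negative skew stays negative.
    skew = math.copysign(abs(third) ** (1.0 / 3.0), third)
    return mean, std, skew

def color_moments(pixels):
    """Moments for each of the R, G, B channels of a list of (R, G, B) pixels."""
    return [channel_moments([p[c] for p in pixels]) for c in range(3)]

# Usage: four pixels whose red channel is skewed toward a single bright value.
moments = color_moments([(0, 10, 10), (0, 10, 10), (0, 10, 10), (4, 10, 10)])
```

The third-order moment is zero for a symmetric channel distribution and grows as the distribution becomes lopsided.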

4. IMAGE RETRIEVAL SYSTEMS

After feature extraction, the metadata of shape, structure, and color are created for each image. These metadata are matched against the metadata of the query image. The retrieved images are those whose metadata have the highest similarity to the metadata of the query image. For the query itself, we use Query By Example (QBE). For similarity, the cosine distance metric is applied only to the color feature. For shape and structure, we use our own distance metric, which represents a semantic closeness of distance between the two data points that correspond to the query image and a database image.
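The color matching step can be sketched as below, assuming each image's color metadata is a plain numeric vector such as a histogram. Only the cosine measure for color is shown. The authors' own metric for shape and structure is not specified in this text, so it is not reproduced. The function names and the dictionary layout of the database are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector has no direction; treat as dissimilar
    return dot / (norm_a * norm_b)

def rank_by_color(query_vector, database, top_k=15):
    """Return the names of the top_k database images, most similar first."""
    scored = [(cosine_similarity(query_vector, vec), name)
              for name, vec in database.items()]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [name for _, name in scored[:top_k]]

# Usage: three toy images described by two-bin color vectors.
database = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
ranking = rank_by_color([1.0, 0.0], database)
```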

For our experimental study, we use a general-purpose image database containing 1000 JPEG images from the COREL image collection. These images are manually divided into 10 categories: African people, beach, historical building, bus, dinosaur, elephant, rose, horse, mountain, and food. We use one image from each category as the query image, and we examine the top 15 retrieved images for each query. The online version of our image search engine is available at http://www.mdbl.sfc.keio.ac.jp/~ridho/autocbir/. Figure 4 and Figure 5 show snapshots of our image search engine with the automatic weighting mechanism.


Figure 4. Snapshot of our image search engine with automatic weighting.


Figure 5. Snapshot of retrieved results of our image search engine with automatic weighting.

Figure 6 shows the error comparison between the automatic weighting and the manual weighting with the best weight selection. Compared with the manual weighting, our image search engine with automatic weighting increased the average error per experiment by only 1.5.
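One way to tally such errors is sketched below. It rests on an assumption that the text does not confirm: an error is counted as a retrieved image among the top 15 whose category differs from the category of the query image. The function names are hypothetical.

```python
def count_errors(retrieved_categories, query_category, top_k=15):
    """Number of wrong-category images among the top_k results (assumed error definition)."""
    top = retrieved_categories[:top_k]
    return sum(1 for category in top if category != query_category)

def average_errors(errors_per_query):
    """Mean error count over all query images."""
    return sum(errors_per_query) / len(errors_per_query)

# Usage: a "bus" query whose top 15 results contain two images from other categories.
retrieved = ["bus"] * 13 + ["rose", "horse"]
errors = count_errors(retrieved, "bus")
```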


Figure 6. Error comparison between the automatic weighting and the manual weighting.

PUBLICATIONS

During the research period from April 2008 to February 2009, we published several papers on our image search engine research. Our research results are as follows:

  1. Ali Ridho Barakbah, Yasushi Kiyoki, "3D-Color Vector Quantization for Image Retrieval Systems", International Database Forum (iDB) 2008, September 21-23, 2008, Iizaka, Japan.
  2. Ali Ridho Barakbah, Yasushi Kiyoki, "Image Retrieval Systems with 3D-Color Vector Quantization and Cluster based Shape and Structure Feature Extraction", Poster, Open Research Forum (ORF) 2008, November 20-22, 2008, Keio University (Shonan Fujisawa Campus), Tokyo, Japan.
  3. Ali Ridho Barakbah, Yasushi Kiyoki, "A Pillar Algorithm for K-Means Optimization by Distance Maximization for Initial Centroid Designation", IEEE Symposium on Computational Intelligence and Data Mining (CIDM) 2009, March 30-April 2, 2009, Nashville-Tennessee, USA. (Accepted)
  4. Ali Ridho Barakbah, Yasushi Kiyoki, "An Image Database Retrieval System with 3D Color Vector Quantization and Cluster-based Shape and Structure Features", The 19th European-Japanese Conference on Information Modelling and Knowledge Bases, Finland, 2009. (Submitted)

REFERENCES

[1] A.A. Goodrum: Image information retrieval: an overview of current research, Special Issue on Information Science Research 3 (2), 2000.
[2] C. Faloutsos, R. Barber, M. Flickner, J. Hafner, W. Niblack, D. Petkovic, W. Equitz: Efficient and effective querying by image content, Journal of Intelligent Information Systems 3 (3-4), pp. 231-262, 1994.
[3] J.R. Smith, S.F. Chang: VisualSEEk: a fully automated content-based image query system, Proc. The Fourth ACM International Conference on Multimedia, Boston, MA, pp. 87-98, 1996.
[4] W.Y. Ma, B.S. Manjunath: Netra: A toolbox for navigating large image databases, Multimedia Systems 7 (3), pp. 184-198, 1999.
[5] J. Bach, C. Fuller, A. Gupta, A. Hampapur, B. Gorowitz, R. Humphrey, R. Jain, C. Shu: Virage image search engine: an open framework for image management, Proc. The SPIE, Storage and Retrieval for Image and Video Databases IV, San Jose, CA, pp. 76-87, 1996.
[6] H.M. Lotfy, A.S. Elmaghraby: CoIRS: Cluster-oriented Image Retrieval System, Proc. 16th IEEE International Conference on Tools with Artificial Intelligence (ICTAI) 2004, pp. 224-231, 2004.
[7] R.C. Veltkamp, M. Tanase: Content-Based Image Retrieval Systems: A survey, Technical Report UU-CS-2000-34, 2000.
[8] Y. Liu, D. Zhang, G. Lu, W.Y. Ma: A survey of content-based image retrieval with high-level semantics, Pattern Recognition 40, pp. 262-282, 2007.
[9] A.R. Barakbah, Y. Kiyoki: A Pillar Algorithm for K-Means Optimization by Distance Maximization for Initial Centroid Designation, The IEEE Symposium on Computational Intelligence and Data Mining (CIDM), Nashville-Tennessee, USA, 2009.
[10] L. Demanet, L. Ying: Curvelet and wave atoms for mirror-extended images, Proc. Wavelets XII conf, San Diego, 2007.
[11] A.R. Barakbah, Y. Kiyoki: 3D-Color Vector Quantization for Image Retrieval Systems, The International Database Forum (iDB), Iizaka, Japan, 2008.
[12] F. Long, H. Zhang, D.D. Feng: Fundamentals of Content-based Image retrieval, The Multimedia Information Retrieval and Management - Technological Fundamentals and Applications, 2002.