Chapter 25: From Low Level Features to High Level Semantics | Handbook of Video Databases: Design and Applications (Internet and Communications)

Cha Zhang and Tsuhan Chen
Department of Electrical and Computer Engineering
Carnegie Mellon University
Pittsburgh, Pennsylvania, USA
{czhang,tsuhan}@andrew.cmu.edu

1. Introduction

A typical content-based information retrieval (CBIR) system, e.g., an image or video retrieval system, involves three major aspects: feature extraction, high dimensional indexing and system design [1]. Of the three, high dimensional indexing determines retrieval speed; system design determines how the system is presented to the user; and feature extraction is the key to retrieval accuracy. In this chapter, we discuss various ways people have tried to increase the accuracy of retrieval systems.

If we consider what "accuracy" means for a retrieval system, we find that it is highly subjective and user-dependent. The similarity between objects can be very high-level, or semantic. This requires the system to measure similarity in the way human beings would perceive or recognize it. Moreover, even given exactly the same inputs, different users are likely to judge their similarity differently. Therefore, a retrieval system also needs to adapt to different users quickly through on-line user interaction and learning.

However, the features we can extract from objects are often low-level features. We call them low-level because most of them are extracted directly from the digital representations of objects in the database and have little or nothing to do with human perception. Although many features have been designed for general or specific CBIR systems with high-level concepts in mind, and some of them have shown good retrieval performance, the gap between low-level features and the high-level semantic meanings of the objects has been the major obstacle to better retrieval performance.

Various approaches have been proposed to improve the accuracy of CBIR systems. Essentially, these approaches fall into two main categories: improving the features and improving the similarity measures. Researchers have tried many features that are believed to be related to human perception, and they are still working on finding more. On the other hand, when the feature set is fixed, many algorithms have been proposed to measure similarity in the way human beings might. These include off-line learning based on training data, and on-line learning based on the user's feedback.
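To make the second category concrete, the following is a minimal sketch of adapting a similarity measure over a fixed feature set using user feedback. It is an illustration only, not a method taken from this chapter: the function names (`weighted_distance`, `rank`, `reweight`), the toy two-dimensional feature vectors, and the inverse-variance update are all assumptions chosen for brevity. The update rests on a simple heuristic: a feature that varies little across the items a user marks as relevant is probably one the user cares about, so it receives a larger weight.

```python
import math

def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two feature vectors."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))

def rank(query, database, weights):
    """Return database indices sorted from most to least similar to the query."""
    return sorted(range(len(database)),
                  key=lambda i: weighted_distance(query, database[i], weights))

def reweight(relevant, eps=1e-6):
    """Hypothetical on-line update: weight each feature by the inverse of its
    variance over the items marked relevant, then normalize to sum to 1."""
    n = len(relevant)
    dims = len(relevant[0])
    weights = []
    for d in range(dims):
        values = [item[d] for item in relevant]
        mean = sum(values) / n
        var = sum((v - mean) ** 2 for v in values) / n
        weights.append(1.0 / (var + eps))  # eps avoids division by zero
    total = sum(weights)
    return [w / total for w in weights]

# Toy database of low-level feature vectors (made-up values).
database = [
    [0.9, 0.1],
    [0.5, 0.5],
    [0.8, 0.9],
    [0.1, 0.2],
]
query = [0.85, 0.5]

# Before any feedback, both features count equally.
initial_order = rank(query, database, [0.5, 0.5])

# Suppose the user marks items 0 and 2 as relevant: they agree on
# feature 0 but differ widely on feature 1.
learned = reweight([database[0], database[2]])
adapted_order = rank(query, database, learned)

print(initial_order, adapted_order, learned)
```

In this toy run, item 1 is ranked first under equal weights because it matches the query on the second feature, but it falls behind items 0 and 2 once the feedback shifts most of the weight onto the first feature. The features themselves never change; only the measure applied to them does.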

The chapter is organized as follows. Section 2 surveys some feature extraction algorithms that emphasize high-level semantics. Section 3 discusses similarity measures. Section 4 presents some off-line learning methods for finding better similarity measures. Section 5 examines algorithms that learn on-line from the user's feedback. Section 6 concludes the chapter.