Chabot: A System for Retrieval from a Relational Database of Images

Virginia E. Ogle
Michael Stonebraker
University of California, Berkeley
{ginger, mike}@cs.berkeley.edu

Abstract

Chabot is a picture retrieval system for a database that will eventually include over 500,000 digitized multi-resolution images. We describe the design and construction of this system which uses the relational database management system POSTGRES for storing and managing the images and their associated textual data. For retrieval, Chabot uses tools provided by POSTGRES, such as representation of complex data types, a rich query language, and extensible types and functions. To implement retrieval from the current collection of 11,643 images, Chabot integrates the use of stored text and other data types with content-based analysis of the images to perform "concept queries".

1. Introduction

The Chabot project was begun at UC Berkeley to assist the State of California Department of Water Resources (DWR) in its goal of providing on-line access to its collection of images. DWR is one of the governmental sponsors of the global change research project Sequoia 2000 [22], in which Berkeley database researchers have participated. In its role as the state agency that oversees the system of reservoirs, aqueducts, and water pumping stations throughout California known as the State Water Project, DWR maintains a growing collection of over 500,000 photographs, negatives, and slides, primarily images of State Water Project facilities, but also many images of California natural resources. Some examples of these images are shown below.


Over the years, as DWR has made its collection available to the public, it has found itself devoting increasing resources toward filling requests for prints and slides. The agency receives 100-150 requests a month from a variety of sources: other government agencies, regional magazines, encyclopedias, university libraries, wildlife organizations, and individuals. Requests vary from those where the ID number of the desired picture is already known, to very general requests for "scenic pictures" of lakes and waterways. DWR keeps the slides that are requested most often in lighted display boxes for browsing; the rest of the collection is housed in archival containers and slide drawers.

To facilitate retrieval, DWR began a project last year to digitize all its images using Photo-CD technology; several years ago the agency began entering descriptive data about each image into a single-user, PC-based database. To process a request, the staff uses keyword look-up on the text descriptions stored in the database to find an ID number for the requested images. The ID is used to locate the container or drawer where the print or slide is stored.

While an attempt is made to annotate each image with as much descriptive information as possible, keyword indexing for images has limitations, especially for non-specific requests such as "a scenic image of a lake". Other problems arise as well: misspellings ("azalia" for "azalea"), inaccurate descriptions (a photo of a bright red anemone in full flower stored with the description "Close-up of a pansy"), and incomplete descriptions (some images in the DWR collection are old and cannot be identified, so they are being digitized and loaded into the database with minimal descriptive data). As a result, most retrievals currently rely on having a staff member who is familiar enough with the collection to know where to find the desired prints.

Keyword searches using only the metadata are not effective, and the current system cannot support complex data types such as time, geographical location, and images. In addition, DWR would like to load and edit its database remotely, and to enable browsing of its images by offsite users who are interested in ordering prints or slides. The current database does not meet the agency's needs for on-line, multi-user access, and it will not scale to accommodate the 500,000 images in the collection. The Chabot project was initiated to replace the existing system and to meet these requirements.

In Section 2 we describe the motivation and goals for the project. Section 3 discusses current research in the field. Section 4 contains a description of the Chabot project, and in Section 5 we summarize the project and give our plans for further enhancements.

2. System Motivation and Goals

The design of Chabot is influenced by DWR's existing system for storing its metadata, by the types of requests it receives, and by the methods now in use for queries and updates.

Integration of Data Types: Each image is accompanied by a sizeable amount of metadata. Below is a sample entry for one image from DWR's existing database:

0162 A-9-98 6/1/69 SWP Lake Davis Lahontan Region (6) Grizzly Dam, spillway and Lake Davis, a scenic image. DWR 35 mm slide Aerial 2013 0556 18
In this example, "0162" is the first four digits of the CD number, "A-9-98" is the DWR ID, followed by the date the photo was taken (6/1/69), the category ("SWP"), the subject ("Lake Davis"), the location ("Lahontan Region (6)"), a description of the image, the organization ("DWR"), the type of film used, the perspective of the photo, the last eight digits of the Photo-CD number, and finally, the number of the image on the Photo-CD.

DWR needs a DBMS that can support a variety of complex data types including text, numerical data, relative and absolute time, and geographical location. Retrievals should be possible on any combination of the complex data types that are associated with the images, as well as with the content of the images themselves.

Scalability and Storage Concerns: Since each of the multi-resolution Photo-CD images is from 4 to 6 MB in size, the entire database of 500,000 images and their associated text data will require in excess of 2.5 terabytes of storage. The desire for fast access for browsing images must be balanced against the need to minimize the cost of storing them. Therefore a multilevel storage plan is needed, including a tertiary memory device for the images.

Simplicity of Use, Simplicity of Design: The browser needs to be simple enough for non-technical staff to use, while also protecting against accidental modification of data already contained in the database. The user interface should be similar in structure to the existing system and as intuitive and self-documenting as possible. The design of the system should also be simple; it should use existing functions and established models, both for ease of implementation and to simplify future modifications.

Flexible Query Methods: The retrieval system must be flexible enough to quickly fetch images by ID number, but it must also be able to handle more complex queries that combine several of the attributes of the image. To process a query such as "Find a picture of a sunset taken near San Francisco during 1994", the retrieval system must be able to search on multiple data types such as geographical location ("San Francisco"), time ("after 12/31/93 and before 1/1/95"), and content ("a sunset").

Querying by Image Content: Because of the size of the DWR collection, queries that are too general might return a result set of unmanageable size, so we must take steps to increase the precision of retrievals, thereby reducing the set of images that a user must browse to find the images of interest. More importantly, since the primary data type of this database is the image, standard querying by stored descriptive data will not always yield satisfactory results. Therefore, the system must integrate stored textual information with image content information. Ideally, the user should be able to register a conceptual description like "sunset" with the retrieval system, which should respond by initiating the appropriate functions to analyze image content and find the stored images that meet the user's expectation of what constitutes a "sunset". In Section 4 we describe our implementation of such "concept queries".

3. Current Research

Much work is underway in the area of image feature indexing, especially color indexing [24,25]. Given the unpredictable nature of the DWR queries, however, it is not likely that an index can be made for our database that will be relevant for more than a small number of requests. Moreover, indexing often presupposes similarity matching for retrievals ("Find other pictures that look like this one") and pre-identification of interesting features. DWR would like to "fish" from the database rather than present a sample image for matching, and image-by-image review to delineate content features is not feasible because of the large number of images in the collection.

The Photobook project [16] at the M.I.T. Media Lab seeks to circumvent the issue of pre-determined search criteria by storing enough information about each image so that run-time computations are possible. Images are classified at load time as having "face", "shape", or "texture" properties; some techniques have been developed to automate this process, such as foreground extraction. Once classified, the image is compressed by encoding salient semantic information according to its category, and these smaller encoded versions are used at query time both to reconstruct the image and also to compute any additional search criteria such as a color histogram. Photobook has been used to match faces in a collection of photographic portraits and to identify hand tools in a small collection of images. However, this project does not use an underlying relational database; moreover, static pre-analysis is not practical for our application.

One of the projects most closely related to Chabot is the QBIC project [6,14] at IBM Almaden, which uses image analysis to process queries for an image database. This project uses color, shape, and texture to match images in the database to a user's query, which has the form "find more pictures like this one". The user can make a sketch of a shape, select colors and color distributions from a color wheel, and select textures from a predetermined range. The system returns a ranked list of best matches to the user's query. However, the DWR application needs a coarser granularity for color distinctions, and we would like to allow the user to define functions like "find a sunset" that query by concept rather than by the colors or shapes in individual images. Support is also needed for integration of image features with text and other data types.

In the DBMS community, image database research focuses on storage methods for large objects: spatial data such as geographical maps may be stored in structures like R-trees [10] and Quad-trees [7]. Current work includes Digital Equipment Corporation's multimedia object support for Rdb [19], DEC's relational database. Multimedia object files are physically stored in segments on a WORM device and within the database as BLOBs, guaranteeing transactions and concurrency for these objects. However, this project is not investigating ways to incorporate content-based queries to retrieve images.

The particular needs of our application require the services of a powerful relational database model as the foremost consideration. We also require the flexibility to implement "concept" queries that use image content in conjunction with the text-based queries supported by the DBMS. In these ways Chabot differs noticeably from the systems described above.

4. Description of Chabot

Chabot includes a top-level user interface that handles both queries and updates to the database. Our querying mechanism retrieves images on the basis of stored textual data as well as on more complex relations among the stored data, and we have implemented a method for color analysis of the images as a first step toward integrating content analysis into the retrieval system. In this section we give the implementation details of Chabot.

POSTGRES

To store the images and textual data, we are using POSTGRES, an object-relational DBMS developed at the University of California, Berkeley. POSTGRES is particularly attractive for use with a database like Chabot; in addition to the standard relational database features, it provides features not found in traditional relational DBMSs, such as:

Object-oriented properties: Classes can be defined for objects in a POSTGRES database and attributes can be inherited among classes. The "Schema" section below explains this in more detail.

Complex types: POSTGRES provides a flexible assortment of data types and operators that are useful for a database like Chabot such as time (absolute and relative), variable-length arrays, and images. In addition, users can define new data types for a database, along with operators that are particular to the type. For example, a type "PhotoCD" can be defined that includes operators to manipulate the image at runtime.

User-defined indices: A secondary index can be defined using access methods specified by the user. The index can be implemented either as a B-tree or as an R-tree. Partial indices that include a qualifying operator can be extended incrementally.

User-defined functions: Functions written in C can be registered with a POSTGRES database. The first time the function is invoked, POSTGRES dynamically loads it into its address space; repeated execution incurs negligible additional overhead since the function remains in main memory. For the Chabot database, we wrote a function that analyzes, at retrieval time, color histograms that were previously computed and stored in the database.

Storage

Each of our images is received in Photo-CD format in five different resolutions, ranging from a "thumbnail" (128x192 pixels) to the highest resolution of 2048x3072 pixels. The size of each image is from 4 to 6 MB. Since DWR's goal is to allow on-line access to both images and data, Chabot must provide reasonably fast browsing of the stored images over a network. A random-access medium such as magnetic disk, fast enough for remote browsing, is too expensive for storing so many images; cheaper alternatives such as tape may be so slow that on-line browsing is virtually impossible. Our solution is a two-level storage scheme: we use magnetic disk for the thumbnail images and the text needed for browsing the database, and we archive the large multi-resolution image files on a tertiary device, a Metrum VHS-tape jukebox. The Metrum holds 600 VHS tapes, each tape having a 14.5 GB capacity. With a total capacity of 10.8 TB, the Metrum is more than adequate as a repository for the DWR image library. The average time for the Metrum to find a tape, load it, and locate the required file is about 2 minutes: too slow for browsing a set of images, but fast enough for filling a request from a DWR client once the desired image has been identified.

The Schema

The schema for the Chabot project was designed to fit with those of other research projects in progress at Berkeley -- a collection of technical reports and a video library. The image class in our database is called PHOTOCD_BIB, for "Photo-CD Bibliography"; it inherits the attributes "title" and "abstract" from the DOC_REFERENCE class, which is shared by the technical report and video object classes. As shown below, the PHOTOCD_BIB class contains "bibliographical" information about the image object, such as the ID number, the name of the photographer, the film format, the date the photo was taken, and so on. A complete list of attributes for the PHOTOCD_BIB class is given in Table 1 below.


Schema for technical report, video, and photo-cd classes

Most of the attributes for the image class are stored as text strings. Two fields have type abstime: the "shoot_date" of the photo and the "entry_date" on which the information was entered into the database. These allow us to perform time-relative searches, for example, "Find all shots of Lake Tahoe that were taken after January 1, 1994".


The User Interface

We have implemented a graphical point-and-click Motif interface for Chabot using Tcl/Tk [15] and based on pgbrowse [4], a general interface written for POSTGRES databases. The interface is designed to prevent accidental corruption of data while browsing the database; the main screen gives the user three options: find, edit, and load. The database can be modified only via the edit and load screens and user authorization for these screens is required. The find screen is for running queries and for browsing the database. An example of the current implementation for the find window appears below.


The user can build queries by clicking on the appropriate buttons and typing text into the entry fields next to the search criteria. Pull-down menus, indicated by a downward pointing arrow next to the entry field, are provided for some search criteria, those that have limited options such as "Region", "Film Type", "Category", "Colors", and "Concept". The user selects one or more of these fields and then clicks on the button labelled "Look Up" to initiate the query, and a Postquel query is constructed and issued to the database. For example, using the search criteria from the find screen shown, the Postquel query would be:

retrieve (q.all) from q in PHOTOCD_BIB where
q.shoot_date>"Jan 1 1994" and
q.location~"2" and
MeetsCriteria("SomeOrange",q.histogram)
As we will describe in the next section, "MeetsCriteria" is a function we have written and registered with the database to do content analysis. Figure 2 shows some of the images that were returned from the above query. When the query is processed, the result set is displayed in a pop-up "Query Result" window; the user can then print the data, save it to a file, or click on a "Show Image" button, which calls xv [2] to display the selected images. Up to 20 images can be displayed at once. In the example above, eight of the images were selected from the "Query Result" window; the resulting xv display is shown in Figure 2.

MeetsCriteria

To implement concept queries, we use two capabilities that POSTGRES provides: storage of pre-computed content information about each image (a color histogram) as one of the attributes in the database, and the ability to define functions that can be called at run-time as part of the regular querying mechanism to analyze this stored information. The function "MeetsCriteria" is the underlying mechanism that is used to perform concept queries. It takes two arguments: a color criterion such as "Mostly Red" and a color histogram. The user selects a color criterion from a menu on the find screen, and a call to MeetsCriteria is incorporated into the query using the selected color. Colors implemented so far are shown in the example below:


For the histograms, we have experimented with quantizing the colors in our images to a very small number so that run-time analysis is sped up. Our tests were conducted on histograms containing 20 elements, computed using djpeg's [11] Floyd-Steinberg quantization.
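The idea of a coarsely quantized histogram can be illustrated with a short sketch (Python here for brevity; the actual system used djpeg with Floyd-Steinberg quantization, whereas the simple fixed bucketing below is only an illustration of the principle):

```python
# Illustrative sketch only: bucket each RGB channel into a small number
# of levels and count the fraction of an image's pixels in each bucket.
# This stands in for the 20-element djpeg histograms used by Chabot.
from collections import Counter

def quantize(rgb, levels=3):
    """Map each 0-255 channel value to one of `levels` buckets."""
    step = 256 // levels
    return tuple(min(c // step, levels - 1) for c in rgb)

def histogram(pixels, levels=3):
    """Return {quantized_color: fraction_of_pixels} for a pixel list."""
    counts = Counter(quantize(p, levels) for p in pixels)
    total = len(pixels)
    return {color: n / total for color, n in counts.items()}

# A toy 4-pixel "image": three red pixels and one blue pixel.
pixels = [(255, 0, 0), (255, 0, 0), (255, 0, 0), (0, 0, 255)]
h = histogram(pixels)
```

A coarse palette like this keeps the stored histograms tiny, which is what makes run-time analysis by a database function affordable.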

The POSTGRES query executor calls the function MeetsCriteria for each histogram in the database, checking to see whether it meets the criterion that is presented. POSTGRES's query optimization facility is used to minimize the search set of histograms. The function returns true if the histogram meets the criterion, false if it does not. Although the method for finding histograms that meet the criterion varies according to which color is being checked, in general the algorithm employs two metrics: compliance and count.

Compliance: Each of the colors in the histogram is checked to see whether it complies with the values that have been pre-defined for the requested color. For example, in the RGB model the color white is represented by (255,255,255) for (red, green, blue); a color whose RGB values are all above 241 qualifies as "white" in our approach.

Count: As we check each color in the histogram for compliance, we keep two counts: the number of colors in the current histogram that have matched the criterion, and the number of pixels contained in the matching colors as a fraction of the total pixels in the image. The former count is used when we are looking for "Some" colors; in the "Some Yellow" example, we get a true result if just one or two of the twenty colors in the histogram qualify as yellow. We use the total pixel count for the "Mostly" matches: more than 50% of the total pixels of an image must be "red" in order for the image to meet the "Mostly Red" criterion.
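The compliance-and-count algorithm can be sketched as follows. This is a Python illustration, not the actual function (which was written in C); the histogram layout and all RGB thresholds except the "white" cutoff of 241 mentioned above are assumptions made for the example.

```python
# Sketch of MeetsCriteria's compliance/count algorithm. A histogram is
# a list of (r, g, b, fraction_of_pixels) entries. The red and yellow
# ranges below are assumed for illustration; only the white threshold
# (all channels above 241) comes from the paper.

def complies(rgb, color):
    """Compliance: does one histogram color qualify as the named color?"""
    r, g, b = rgb
    if color == "white":
        return r > 241 and g > 241 and b > 241
    if color == "red":
        return r > 160 and g < 100 and b < 100
    if color == "yellow":
        return r > 160 and g > 160 and b < 100
    raise ValueError("unknown color: " + color)

def meets_criteria(criterion, histogram):
    """Criterion is e.g. 'SomeYellow' or 'MostlyRed'."""
    if criterion.startswith("Some"):
        qualifier, color = "Some", criterion[4:].lower()
    else:
        qualifier, color = "Mostly", criterion[6:].lower()
    matching_colors = 0      # count of qualifying histogram entries
    matching_pixels = 0.0    # fraction of the image's pixels they cover
    for r, g, b, frac in histogram:
        if complies((r, g, b), color):
            matching_colors += 1
            matching_pixels += frac
    if qualifier == "Some":            # one qualifying color suffices
        return matching_colors >= 1
    return matching_pixels > 0.5       # "Mostly": over half the pixels

# An illustrative 3-entry histogram: mostly red, a little yellow.
example = [(250, 40, 30, 0.6), (200, 200, 40, 0.1), (20, 20, 20, 0.3)]
```

With this example histogram, "MostlyRed" and "SomeYellow" succeed while "MostlyYellow" fails, mirroring the distinction between the two metrics described above.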

Concept Queries

In addition to using color directly for content analysis, users can compose higher level content-based queries to the database that embody contextual information such as "sunset" and "snow". These queries are called concept queries. The Concepts selection on the find screen of the interface lists the concept queries that are available, each of which has been previously defined by the user:


Selecting a concept from the pull-down menu generates a Postquel query that incorporates a combination of search criteria that satisfy the concept. Typically MeetsCriteria is used in these queries for color analysis in combination with some other textual criteria. For example, when "sunset" is chosen from the Concepts menu, the following query is sent to the database:

retrieve (q.all) from q in PHOTOCD_BIB where
q.description ~ "sunset" or
MeetsCriteria("MostlyRed",q.histogram) or
MeetsCriteria("MostlyOrange",q.histogram)
In this case, the user has defined the concept "sunset" as including images that have the stored keyword "sunset" associated with them, or images that have red or orange as their predominant color. Concept queries can be used in conjunction with other criteria. The query "Find pictures of Lake Tahoe at sunset" would be generated by choosing "sunset" from the Concept menu and setting the Location to "Lake Tahoe". Users interactively define new concepts and add them to the Concepts menu by selecting criteria from the find screen that should be included. Below is an example of the definition of a new concept called "purple flowers".
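The expansion of a concept into a disjunction of criteria can itself be sketched in a few lines. The following is a hypothetical Python rendering, not the actual implementation; the concept table and the query-string formatting are assumptions, though the generated text mirrors the "sunset" query shown above.

```python
# Hypothetical sketch of concept expansion: each concept maps to a list
# of disjunctive criteria, either a keyword match on the description or
# a MeetsCriteria color check, from which Postquel text is generated.

CONCEPTS = {
    "sunset": [("keyword", "sunset"),
               ("color", "MostlyRed"),
               ("color", "MostlyOrange")],
}

def concept_to_postquel(concept):
    """Build the Postquel text for a user-defined concept."""
    clauses = []
    for kind, value in CONCEPTS[concept]:
        if kind == "keyword":
            clauses.append('q.description ~ "%s"' % value)
        else:
            clauses.append('MeetsCriteria("%s",q.histogram)' % value)
    return ("retrieve (q.all) from q in PHOTOCD_BIB where\n"
            + " or\n".join(clauses))
```

Because concepts expand to ordinary query clauses, they compose naturally with other criteria (location, date, and so on) chosen on the find screen.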


Testing

To test our content analysis, we measured the recall and precision [20] of the concept queries that are shown in the User Interface section. Recall is the proportion of relevant materials retrieved, while precision quantifies the proportion of retrieved materials that are relevant to the search. For each concept query, we identified by hand all the images in the collection that we thought should be included in the result set. We then tried various implementations of the concept using different combinations of content-based and stored textual data. We measured recall and precision for each implementation.

Table 2 shows the results from one of the test queries that is representative of our findings, the concept "yellow flowers". For this concept, we first identified 22 pictures in the collection that were relevant; we then implemented the "yellow flowers" function in seven different ways using different combinations of search criteria. As shown below, queries 1-3 used keyword search only, queries 4 and 5 used only content-based information, and queries 6 and 7 used a combination of keyword and content-based data.


In this test, two different methods for finding yellow were tried. SomeYellow (2) means there were at least two yellowish colors in a 20-element histogram. SomeYellow (1) means that only one yellow color is needed for the picture to be counted as having "some yellow". As shown for query 5, pictures can be retrieved with 100% recall if the color definition is broad enough, but the precision is too low: the 377 images retrieved by query 5 would require the user to browse nineteen screens of thumbnails (each xv screen displays 20 images) to find the pictures of yellow flowers. Using the coarse definition for yellow in conjunction with the keyword "flower" gives the best result: query 7 has a recall of 63.6% with a very high precision of 93%. Figure 3 shows the fifteen images retrieved by this query; only the image in the upper left corner of the group (a plant with pink stems and leaves but with only a small amount of yellow in its petals) was not considered relevant. Figure 4 shows the five images retrieved by query 3, where the keywords "flower" and "yellow" were used. The second picture in this group was not considered relevant.
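The reported figures for query 7 follow directly from the definitions of recall and precision [20]: 15 images were retrieved, 14 of them relevant, out of 22 relevant images in the collection.

```python
# Recall = relevant retrieved / all relevant images;
# precision = relevant retrieved / all retrieved images (Salton [20]).

def recall_precision(retrieved_relevant, total_relevant, total_retrieved):
    recall = retrieved_relevant / total_relevant
    precision = retrieved_relevant / total_retrieved
    return recall, precision

# Query 7: 15 images retrieved, 14 relevant, of 22 relevant overall.
recall, precision = recall_precision(14, 22, 15)
# recall is 14/22 = 63.6%; precision is 14/15 = 93.3%
```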

We found that retrieving images on keywords alone or on content alone produced unsatisfactory results, because recall and precision vary inversely: when we retrieve a high percentage of the relevant images, as when we retrieve all "Mostly Red" images in order to find sunsets, we also retrieve many images that are not sunsets. But if we restrict the search criteria more closely using carefully chosen keywords so that precision increases, fewer of the relevant images are retrieved. For our application, the best results are achieved when content is combined with some other search criterion, and this is the method we use to implement concept queries.

5. Conclusions and Future Work

We set out to integrate a relational database retrieval system with content analysis techniques that would give our querying system a better method for handling images. We have found that even the simple color analysis method we employ, if used in conjunction with other search criteria, improves our ability to retrieve images efficiently. We concluded that the best result is obtained when text-based search criteria are combined with content-based criteria and when a coarse granularity is used for content analysis. Our concept queries take advantage of this combination.

Future Work

We would like to experiment with content analysis techniques other than color. For example, we would like to handle concept queries such as "Find a picture of Lake Anza with people swimming in it" using texture information, and "Find a picture of Chabot Reservoir when the water is low" using edge detection. Work is also underway on techniques to improve our color analysis. We plan to divide each image into segments and to compute and store a histogram for each of these areas. Since most of our images are outdoor shots, we would then be able to distinguish between colors in the lower (ocean, ground) and upper (sky) halves of the picture. Storing a histogram of the center diamond of the picture is another consideration: since many of the photos were taken by a professional photographer, the "interesting" objects can often be found in the center of the picture. We would also like to experiment with adjusting our quantization factor according to the color distribution in individual images; images having a large number of different colors would be allotted a histogram with a larger than average number of elements.
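The proposed per-segment histograms could look something like the following sketch (Python, with a toy pixel representation; this is purely illustrative of the upper/lower split described above, not the planned implementation):

```python
# Sketch of per-region histograms: split an image (a list of pixel
# rows) into upper and lower halves and count colors separately, so
# sky colors can be told apart from ground or water colors.
from collections import Counter

def half_histograms(pixels):
    """pixels: list of rows, each row a list of quantized color indices."""
    mid = len(pixels) // 2
    upper = Counter(c for row in pixels[:mid] for c in row)
    lower = Counter(c for row in pixels[mid:] for c in row)
    return upper, lower

# A toy 4x4 "image": color 0 (sky) on top, color 1 (ground) below.
image = [[0, 0, 0, 0],
         [0, 0, 0, 0],
         [1, 1, 1, 1],
         [1, 1, 1, 1]]
upper, lower = half_histograms(image)
```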

References

[1] Manish Arya, William Cody, Christos Faloutsos, Joel Richardson, Arthur Toga, "QBISM: A Prototype 3-D Medical Image Database System", IBM Almaden Research Center.

[2] John Bradley, xv , by anonymous ftp from ftp://plaza.aarnet.edu.au/usenet/comp.sources.x/volume10

[3] Hadmut Danisch, hpcdtoppm, by anonymous ftp from ftp://usc.edu/archive/usenet/sources/comp.sources.misc/volume34

[4] Jim Davidson, University of California, Santa Barbara, pgbrowse, by anonymous ftp from ftp://crseo.ucsb.edu/pub/pgbrowse.

[5] Jeff Dozier, "Information Systems for the Study of Global Change", University of California, Berkeley and Santa Barbara, Spring 1994.

[6] Christos Faloutsos, Myron Flickner, Wayne Niblack, Dragutin Petkovic, Will Equitz, Ron Barber, "Efficient and Effective Querying by Image Content", IBM Research Report RJ 9453, August 3, 1993.

[7] R.A. Finkel and J.L. Bentley, "Quad-Trees - A Data Structure for Retrieval on Composite Keys", Acta Informatica 4, 1974.

[8] James D. Foley, Andries van Dam, Steven K. Feiner, John F. Hughes, and Richard L. Phillips, "Introduction to Computer Graphics", Addison-Wesley, 1994

[9] Arif Ghafoor, course notes for "Multimedia Database Systems", ACM Multimedia `94, October 15, 1994

[10] Antonin Guttman, "R-Trees: A Dynamic Index Structure for Spatial Searching", Proceedings of the 1984 ACM SIGMOD Conference on Management of Data, Boston, Mass. June 1984.

[11] Independent JPEG Group, cjpeg and djpeg, by anonymous ftp from ftp://princeton.edu/pub/tmp/AmyJpeg

[12] Ramesh Jain, editor, "NSF Workshop on Visual Information Management Systems", SPIE, Volume 1908 (1993)

[13] Kodak "Photo-CD Access Developer Toolkit: Programmer's Guide for Unix Systems", Eastman Kodak Company, 1992.

[14] Wayne Niblack, Ron Barber, Will Equitz, Myron Flickner, Eduardo Glasman, Dragutin Petkovic, Peter Yanker, Christos Faloutsos, "The QBIC Project: Querying Images by Content Using Color, Texture, and Shape", IBM Research Report, RJ 9203, February 1, 1993.

[15] John K. Ousterhout, "Tcl and the Tk Toolkit", Addison-Wesley Publishing Company, 1994.

[16] Alex Pentland, Rosalind Picard, and Stan Sclaroff, "Photobook: Tools for Content-Based Manipulation of Image Databases." SPIE PAPER 2185-05 Storage and Retrieval of Image and Video Databases II, San Jose, CA. February 6-10, 1994.

[17] The POSTGRES Group, "The POSTGRES Reference Manual", Computer Science Division, University of California, Berkeley, January 1993.

[18] Jef Poskanzer, pbmplus , available by anonymous ftp from ftp://usc.edu/archive/usenet/sources/comp.sources.misc/volume26.

[19] Mark F. Riley, James J. Feenan, Jr., John L. Janosik, Jr., T.K. Rengarajan, "The Design of Multimedia Object Support in DEC Rdb", Digital Technical Journal, Vol.5, No.2, Spring 1993

[20] Gerard Salton, "Automatic Text Processing", Addison-Wesley Publishing Company, 1989.

[21] Conference Proceedings of the SPIE - The International Society for Optical Engineering, "Storage and Retrieval for Image and Video Databases", Vol. 1908, San Jose, California, February 1993.

[22] Michael Stonebraker, "An Overview of the Sequoia 2000 Project", Sequoia 2000 Report 91/5, University of California, Berkeley, December 1991.

[23] Michael Stonebraker, et al., "The Implementation of POSTGRES," IEEE Transactions on Knowledge and Data Engineering, March 1990.

[24] Marcus A. Stricker, "Bounds for the Discrimination Power of Color Indexing Techniques", Storage and Retrieval of Image and Video Databases II, San Jose, CA, February 6-10, 1994.

[25] Michael J. Swain, "Interactive Indexing into Image Databases", IS&T/SPIE International Symposium on Electronic Imaging: Storage and Retrieval for Image and Video Databases, February 1993.

[26] Gregory K. Wallace, "The JPEG Still Picture Compression Standard", Communications of the ACM, Vol. 34, No. 4, April 1991.