Chapter 16. Database Management

The INGRES project was initiated in 1973 to design and implement a full-function relational database management system. When the INGRES system was first released in 1976, it contained over 100,000 lines of code in the C programming language and ran on PDP-11s under the UNIX operating system. INGRES, along with several other research prototypes, proved that the relational model was a suitable basis for an efficient, easy-to-use database management system. The system was distributed to over 150 sites around the world and provided the foundation for several commercial products.

In the late 1970s, the INGRES project shifted its focus to distributed database systems and application-development environments. A distributed database management system allows users to query and update data stored on different host computers in a computer network. The prototype distributed INGRES system, running on two VAX machines, was demonstrated in 1982. During this period, a group also worked on the problem of connecting heterogeneous database systems (e.g., hierarchical, network, and relational) that might be geographically dispersed. Also demonstrated in 1982 was a prototype forms-based programming environment called the Forms Application Development System (FADS) that allowed end users to code their applications by filling in forms displayed on a terminal screen. In the 1980s, all of this research was transferred to commercial products. Distributed databases, gateways to heterogeneous database systems, and forms-based application development tools are now available from many vendors.

In the mid-1980s, the focus of the INGRES project again shifted to a new topic: database support for complex, dynamic data. Early database systems, particularly those based on the relational model, could solve many of the problems facing data-processing organizations that dealt with rigidly structured, regular data (e.g., business data). However, they have not solved the problems posed by more dynamic data, such as documents, geographical data, CAD/CAM data, and programs.

In 1985, the INGRES project embarked on the design and implementation of a new database management system called POSTGRES (for "after INGRES") and a new application development environment called Picasso to provide support for these applications. In 1987 the focus of POSTGRES was expanded to include effective execution on a shared-memory multiprocessor. The resulting system, XPRS (eXtended POSTGRES on RAID and Sprite), attempts to provide both high performance and high availability. The major design objectives for these systems and for other research on database management systems are described in the research summaries below.


Tioga: Providing Data Management Support for Scientific Visualization Applications

Caroline Paxson, Jolly Chen, Nobuko Nathan, Alan Su, and Jiang Wu
(Professor M. R. Stonebraker)
(ARO) DAAL03-91-G-0183, (ARPA) DABT63-92-C-0007, Digital Equipment, and (NSF) IRI-91-07455

As part of the Sequoia 2000 project, researchers in global change would like to be able to write data manipulation and visualization programs in a graphical, intuitive way. We are defining a tool that will allow these researchers to describe the series of operations to be performed on their data as a dataflow diagram drawn on the screen. This boxes-and-arrows notation is then compiled into a series of queries on a database containing their data. The series of manipulations is called a recipe, and recipes may be stored in and retrieved from the database as well. Once the researcher has an image of the data on the screen, we will provide a joystick interface to move through the data.

In this way the researcher can request new data representing a different point of view in space or a different resolution (i.e., a zoom or pan). Thus we intend to provide a system where a researcher can define how data is to be visualized, move around the visualization space, and animate the data. This work will provide the impetus for advances in query compilation, query optimization, and database performance. To date we have defined a prototype system that has been used by the California Snow Survey to visualize their data. Continuing research focuses on the areas of transaction protection, a hybrid push/pull model for execution of a diagram, and the use of rules and triggers in this new context.
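
To make the recipe idea concrete, the following sketch (hypothetical class names and query syntax, not the actual Tioga implementation) shows how a boxes-and-arrows dataflow might be represented and flattened into a sequence of database queries:

    # Hypothetical sketch of a Tioga-style recipe: a dataflow graph whose boxes
    # are data operations and whose arrows carry intermediate results.  The
    # class names and the query syntax produced are illustrative only.

    class Box:
        def __init__(self, name, template, inputs=()):
            self.name = name            # label shown in the visual editor
            self.template = template    # query template; {inputs} is filled in
            self.inputs = list(inputs)  # upstream boxes (the incoming arrows)

    def compile_recipe(outputs):
        """Flatten the dataflow graph into an ordered list of queries by a
        topological walk from source boxes to the requested outputs."""
        queries, done = [], set()
        def visit(box):
            if box.name in done:
                return
            for parent in box.inputs:
                visit(parent)
            filled = box.template.format(inputs=", ".join(p.name for p in box.inputs))
            queries.append("CREATE TABLE %s AS %s" % (box.name, filled))
            done.add(box.name)
        for out in outputs:
            visit(out)
        return queries

    # Example: select one month of snow-depth readings, then average by station.
    raw = Box("snow_raw", "SELECT * FROM snow WHERE month = 3")
    avg = Box("snow_avg", "SELECT station, avg(depth) FROM {inputs} GROUP BY station", [raw])
    for q in compile_recipe([avg]):
        print(q)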


RASQL: A Graphical Query Language

Jolly Chen
(Professor M. R. Stonebraker)
California Department of Water Resources, Construction Engineering Research Laboratory, (ARPA) DABT63-92-C-0007, Digital Equipment, Hewlett-Packard, Hughes, Information Storage, Metrum, MCI, MICRO, (NSF) IRI-91-07455, PictureTel, Research Systems, Inc., Science Applications International Corporation, and United States Geological Survey

To support the programming needs of scientific users of databases, we propose a new user interface paradigm called recipe management. Recipes are visual programs that can be easily constructed and executed in an integrated data management and visualization environment. Once large collections of such recipes are constructed, users will want to query for recipes just as they currently query for other types of data. We introduce a graphical query language for recipes called RASQL (pronounced rascal). Because recipes are complex, graph-based data structures, it is difficult to specify queries for recipes in traditional, textual query languages. RASQL is a graphical query language that extends the graphical recipe syntax itself. A RASQL query is itself a recipe with special query components and additional execution semantics. In this way, a user who is familiar with creating recipes can easily use a similar language to query for recipes. While RASQL is tailored specifically toward querying recipes, techniques and experiences learned from designing and implementing RASQL may extend to query languages for nontraditional data types in general.
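
A rough sketch of this idea, using hypothetical structures rather than the actual RASQL implementation: a stored recipe is a graph of operator boxes, a query is a small recipe graph in which some boxes are wildcards, and evaluating the query amounts to matching that pattern against stored recipes.

    # Hypothetical sketch: stored recipes as sets of (producer, consumer) box
    # pairs, and a RASQL-style query as a pattern whose "*" boxes match any
    # operator.  This only illustrates the matching idea, not the real semantics.

    def matches(query_edges, recipe_edges):
        """True if every edge of the query pattern can be mapped onto some
        edge of the recipe, honoring '*' wildcards."""
        def edge_ok(q, r):
            return all(a == "*" or a == b for a, b in zip(q, r))
        return all(any(edge_ok(q, r) for r in recipe_edges) for q in query_edges)

    recipes = {
        "snow_plot":  {("select_snow", "average"), ("average", "render_map")},
        "temp_trend": {("select_temp", "smooth"), ("smooth", "plot_series")},
    }

    # Query: find recipes in which some box feeds a render_map box.
    query = {("*", "render_map")}
    print([name for name, edges in recipes.items() if matches(query, edges)])
    # -> ['snow_plot']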


Content-Based Retrieval from a Photo-CD Image Database

Virginia Ogle
(Professor M. R. Stonebraker)
California State Department of Water Resources

The California Department of Water Resources has a library of around 500,000 slides, negatives, and prints of natural resources around the state. These images are currently being digitized using Kodak's photo-CD technology and stored in the POSTGRES database as Large Objects. Each photo-CD image is 3 to 4 MB of data, so the resulting database will be quite large. We are exploring methods for retrieving images from this database using visual cues such as color content and simple patterns contained in the image. This allows us to find a particular photo without relying entirely on keywords that must be predetermined and stored in advance. It will also enable us to extract from a much larger set a smaller subset of images to browse. For example, a request for photos of the Tuolumne River at sunset will produce a handful of the desired photos from the hundreds of images that contain the phrase "Tuolumne River" in the textual description stored with the image. The techniques we are using include color analysis as well as information based on the context of the image library itself.
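
A minimal sketch of one form such color analysis could take (the bin count and similarity measure are illustrative choices, not the production retrieval code): each image is reduced to a coarse color histogram, and a query profile is compared against the stored histograms to rank candidates.

    # Sketch of color-content matching: reduce each image to a coarse RGB
    # histogram and rank stored images by histogram intersection with a
    # query profile.  Bin count and similarity measure are illustrative.

    def color_histogram(pixels, bins=4):
        """pixels: list of (r, g, b) values in 0..255.  Returns a normalized
        histogram over bins**3 coarse color cells."""
        hist = [0] * (bins ** 3)
        step = 256 // bins
        for r, g, b in pixels:
            hist[(r // step) * bins * bins + (g // step) * bins + (b // step)] += 1
        total = float(len(pixels))
        return [h / total for h in hist]

    def similarity(h1, h2):
        """Histogram intersection: 1.0 means identical color distributions."""
        return sum(min(a, b) for a, b in zip(h1, h2))

    # A warm "sunset" query profile scores high against a warm-toned photo and
    # low against a cool river scene, narrowing the set a user must browse.
    query = color_histogram([(220, 120, 40)] * 80 + [(40, 40, 90)] * 20)
    warm  = color_histogram([(210, 110, 50)] * 70 + [(30, 30, 80)] * 30)
    cool  = color_histogram([(70, 110, 150)] * 60 + [(90, 130, 80)] * 40)
    print(similarity(query, warm), similarity(query, cool))   # e.g. 0.9 vs 0.0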


Optimization of Large Multidimensional Arrays

Sunita Sarawagi
(Professor M. R. Stonebraker)
(ARO) DAAL03-91-G-0183, (ARPA) DABT63-92-C-0007, Digital Equipment, and (NSF) IRI-91-07455

Large multidimensional arrays are widely used in scientific and engineering database applications. Because of the large storage requirements for such arrays, they are usually allocated to tertiary storage devices. Achieving high performance in spite of the non-uniform access times and the high latency of such storage devices requires good allocation strategies. The traditional method of storing a multidimensional array is linear allocation whereby the array is laid out linearly by a nested traversal of the axes in some predetermined order. This strategy, which mimics the way FORTRAN stores arrays in main memory, can lead to disastrous results on a secondary or tertiary memory device.

We studied methods of organizing arrays to make their access on secondary and tertiary memory devices fast and efficient. We have developed four techniques for doing this: 1) storing the array in multidimensional "chunks" to minimize the number of blocks fetched, 2) reordering the chunked array to minimize seek distance between accessed blocks, 3) maintaining redundant copies of the array, each organized for a different chunk size and ordering, and 4) partitioning the array onto platters of a tertiary memory device so as to minimize the number of platter switches. Our measurements on real datasets obtained from global change scientists demonstrate that accesses on arrays organized using the above techniques are often an order of magnitude faster than on the original unoptimized data.
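
The benefit of chunking can be sketched with a small calculation (the array, block, and chunk sizes below are arbitrary illustrations): reading a single 2-D slice of a linearly allocated 3-D array touches nearly every storage block, while the same slice read from a chunked array touches only the chunks that overlap it.

    # Sketch contrasting linear allocation (nested traversal of the axes, last
    # axis varying fastest) with chunked allocation for a 3-D array.  Sizes
    # are arbitrary; the point is how many storage blocks one slice touches.

    BLOCK = 1024                       # elements per storage block
    DIMS  = (64, 64, 64)               # array extents (x, y, z)
    CHUNK = (16, 16, 4)                # chunk extents: 16*16*4 = 1024 elements

    def linear_blocks(slice_k):
        """Blocks touched when reading the plane z = slice_k from an array
        laid out linearly with z varying fastest."""
        blocks = set()
        for i in range(DIMS[0]):
            for j in range(DIMS[1]):
                offset = (i * DIMS[1] + j) * DIMS[2] + slice_k
                blocks.add(offset // BLOCK)
        return len(blocks)

    def chunked_blocks(slice_k):
        """Chunks touched reading the same plane when the array is stored in
        CHUNK-sized pieces: only the chunks whose z-range contains slice_k."""
        return (DIMS[0] // CHUNK[0]) * (DIMS[1] // CHUNK[1])

    print(linear_blocks(0), chunked_blocks(0))   # 256 vs 16 blocks for one slice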


Mariposa: A New Distributed Data Manager

Paul M. Aoki and Robert Devine
(Professor M. R. Stonebraker)
(ARO) DAAL03-91-G-0183, (ARPA) DABT63-92-C-0007, Digital Equipment, and (NSF) IRI-91-07455

We are in the process of designing Mariposa, a distributed data manager that supports location-transparent access to data across multiple hosts and storage levels and automatic (rule-driven) redistribution of data. Redistribution may include replication and fragmentation as well as simple movement, and the rules that control redistribution may be based on arbitrary criteria. Unlike existing systems, Mariposa does not assume that data placement is relatively static. One critical issue in any distributed data manager is the optimization and (efficient) control of query processing. Our current design assumes that a query execution plan (QEP) produced by a conventional single-site query optimizer can be post-processed to take advantage of the parallelism made possible by data replication/fragmentation. A runtime query execution system then distributes portions of the post-processed QEP (subqueries) onto hosts throughout the network. We are developing a distributed heuristic for the placement of subqueries.
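
As a sketch of this post-processing step (hypothetical catalog and plan structures, not the Mariposa code), a scan over a fragmented table can be split into one subquery per fragment, with each subquery placed on a lightly loaded host that holds a replica of that fragment:

    # Sketch: split a scan of a fragmented table into one subquery per
    # fragment and place each subquery on a lightly loaded replica host.
    # The catalog, query text, and greedy rule are illustrative assumptions.

    fragments = {
        "readings": {"part1": ["hostA", "hostC"],    # fragment -> replica hosts
                     "part2": ["hostB"],
                     "part3": ["hostC"]},
    }

    def parallelize_scan(table, predicate, load):
        """Return (host, subquery) pairs, greedily choosing the least loaded
        replica host for each fragment; `load` counts subqueries per host."""
        placement = []
        for frag, hosts in fragments[table].items():
            host = min(hosts, key=lambda h: load.get(h, 0))
            load[host] = load.get(host, 0) + 1
            placement.append((host, "SELECT * FROM %s WHERE %s" % (frag, predicate)))
        return placement

    load = {}
    for host, query in parallelize_scan("readings", "depth > 10", load):
        print(host, ":", query)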

An additional complication involves optimization of data layout. The Mariposa design assumes that each query execution engine and rule-driven storage management system generate data accesses that may result in automatic data redistribution. Hence, query execution engines and storage managers (as well as storage managers at different hosts) must arbitrate their data movement requests in order to prevent thrashing. We are developing an arbitration mechanism that uses "scoring functions" to determine the desirability of a given data movement.
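
The following is a minimal sketch of what such a scoring function might look like; the inputs, weights, and threshold are illustrative assumptions rather than the actual Mariposa policy.

    # Minimal sketch of a scoring function for arbitrating data movement
    # requests; inputs, weights, and threshold are illustrative assumptions.

    def move_score(remote_accesses, local_accesses, fragment_mb,
                   link_mb_per_s, recently_moved):
        """Estimated benefit of moving a fragment to the requesting site:
        accesses that become local, minus transfer cost, with a heavy
        penalty for fragments that were just moved (to prevent thrashing)."""
        benefit = remote_accesses - local_accesses
        cost = fragment_mb / link_mb_per_s
        penalty = 100.0 if recently_moved else 0.0
        return benefit - cost - penalty

    THRESHOLD = 50.0   # a move is performed only if its score clears this bar
    score = move_score(remote_accesses=400, local_accesses=120,
                       fragment_mb=64, link_mb_per_s=1.0, recently_moved=False)
    print(score, score > THRESHOLD)   # 216.0 True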

Additional details, including our initial design proposal, are contained in [1].

[1] M. R. Stonebraker, P. M. Aoki, R. Devine, W. Litwin, and M. Olson, Mariposa: A New Architecture for Distributed Data, UC Berkeley Computer Science Division, Report No. UCB/CSD 93/31, May 1993.

Concurrency Control Performance in Databases

Vigyan Singhal
(Professor A. J. Smith)
Digital Equipment, IBM, MICRO, and (NSF) CCR-91-17028

We are studying the performance aspects of concurrency control in database systems. Our first task is to characterize the transaction workloads in real database systems. Concurrency control has been a popular area of research, but few studies have used real data to drive their models, mainly because such data has been unavailable. We have obtained traces of locking activity at several large commercial database systems. Using these, we plan to characterize transaction behavior in present commercial systems. We are also using these traces to investigate how realistic the modeling assumptions made in the past have been and, more importantly, to what extent these assumptions affect simulation results. We have characterized most modeling assumptions and have performed experiments to check their validity against the traces. Some of the important modeling assumptions deviate from our measurements of the traces. We have provided measurements from the traces that may be used to model concurrency control more realistically.

We have also developed a contention model to estimate the contention in a transaction workload just by statically measuring the locking events from the traces. Besides indicating the amount of inherent contention in the workload and determining the presence of data bottlenecks, this model is also helpful in analyzing the sensitivity of the modeling assumptions. Our preliminary experiments indicate that some modeling assumptions and simplifications can seriously affect the performance results.
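
As an illustration of the kind of static measurement involved (the trace format here is hypothetical; the real traces carry more detail), contention can be estimated by counting pairs of lock requests on the same object, from different transactions, where at least one request is exclusive.

    # Sketch of estimating contention statically from a lock trace.  Two
    # requests contend if they touch the same object, come from different
    # transactions, and at least one of them is exclusive.

    from collections import defaultdict

    # (transaction id, object id, lock mode): "S" = shared, "X" = exclusive
    trace = [
        (1, "acct:17", "S"),
        (2, "acct:17", "X"),
        (3, "acct:42", "S"),
        (1, "acct:42", "S"),
        (2, "acct:17", "X"),
    ]

    by_object = defaultdict(list)          # object id -> list of (txn, mode)
    for txn, obj, mode in trace:
        by_object[obj].append((txn, mode))

    conflicts = total_pairs = 0
    for reqs in by_object.values():
        for i, (t1, m1) in enumerate(reqs):
            for t2, m2 in reqs[i + 1:]:
                total_pairs += 1
                if t1 != t2 and "X" in (m1, m2):
                    conflicts += 1

    print("conflicting request pairs:", conflicts, "of", total_pairs)   # 2 of 4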

For future research, we need to study what these transaction workloads imply for multiprocessor database systems. Data and transaction partitioning based on the workloads is an interesting problem, and, depending on the transaction workload, some multiprocessor system configurations may be better suited than others. Finally, the transaction data we provide can be used to simulate integrated buffer management and concurrency control schemes.