[Berkeley] [EECS Dept.] [CS Division] [Database Research] [Seminar]


U.C. Berkeley Database Group Seminar
Topics from Spring 1997


Index for other semesters

May 29, 1997

Title: Dynamic Itemset Counting and Implication Rules for Market Basket Data 
Authors: Sergey Brin, Rajeev Motwani, Jeffrey Ullman and Shalom Tsur 
Appeared in: SIGMOD'97 
Presenter: Megan Thomas, UCB 
Abstract: We consider the problem of analyzing market-basket data and present several important contributions. First, we present a new algorithm for finding large itemsets which uses fewer passes over the data than classic algorithms, and yet uses fewer candidate itemsets than methods based on sampling. We investigate the idea of item reordering, which can improve the low-level efficiency of the algorithm. Second, we present a new way of generating "implication rules," which are normalized based on both the antecedent and the consequent and are truly implications (not simply a measure of co-occurence), and we show how they produce more intuitive results than other methods. Finally, we show how different characteristics of real data, as opposed to synthetic data, can dramatically affect the performance of the system and the form of the results. 


May 22, 1997

Title: Partial Answers for Unavailable Data Sources 
Speaker: Dr. Anthony Tomasic, INRIA 
Abstract: Many heterogeneous database system products and prototypes exist today; they are deployed in a wide variety of environments. All existing systems (that we are aware of) suffer from an Achilles' heel: if some sources of data are unavailable when accessed by the heterogeneous database system, these systems either silently ignore the unavailable source of data or generate an error, i.e. they ungracefully fail. This behavior is improper in environments where there is a non-negligible probability that data sources cannot be accessed (e.g., Internet). In this talk, we propose a novel approach to this issue where, in presence of unavailable data sources, the answer to a query is a ``partial answer.'' A partial answer is itself a query that results from the partial evaluation of the original query; it is composed of the data that have been obtained and processed during the evaluation and of a representation of the unfinished work to be done. Partial answers can be resubmitted to the system in order to obtain the final answer to the original query, or another partial answer. Additionally, the application program can extract information from a partial answer through the use of a secondary query. This secondary query is called a ``parachute query.'' In this talk we give a taxonomy of types of partial answers and parachute queries. We present algorithms for the evaluation of queries in presence of unavailable data sources, and we describe an implementation. 

Joint work with Philippe Bonnet (BULL/Dyade, Grenoble, France) 

Note: the seminar will be held in 320 Soda this week. 


April 24, 1997

Title: Modeling and Identifying Bottlenecks in EOSDIS 
Speaker: Melody Y. Ivory, UC Berkeley
Abstract: Many parallel application areas that exploit massive parallelism, such climate modeling, require massive storage systems for the archival and retrieval of data sets. As such, advances in massively parallel computation must be coupled with advances in mass storage technology in order to satisfy I/O constraints of these applications. We demonstrate the effects of such I/O-computation disparity for a representative distributed information system, NASA's Earth Observing System Distributed Information System (EOSDIS). We use performance modeling to identify bottlenecks in EOSDIS for two representative user scenarios from climate change research. 

Joint work with Jim Demmel and Sharon Smith. 

Related URLs: [HTML] [PS


April 16, 1997

Title: Enhancing Complex Data in the Predator DBMS 
Speaker: Prof. Praveen Seshadri, Cornell University 
Abstract: Database systems need to support rich data types efficiently. Images, audio, video, time-series, matrices, and documents are examples of complex types supported by current object-relational DBMS technology. However, performance has lagged behind functionality. Queries involving such data are often processed very inefficiently. This is because the semantics of the new data types are ignored in query processing and optimization. 

I present Enhanced Abstract Data Types (E-ADTs) as a paradigm for supporting complex data types in database systems. While each data type remains encapsulated, it presents enhanced interfaces that expose query processing semantics to the database system. E-ADTs are the primary focus of Predator, a full-fledged DBMS being developed at Cornell. The support for E-ADTs in Predator leads to an efficient yet highly extensible system. I present many categories of query optimizations that can be performed using E-ADTs, and show the effects of some of them on query performance. This work opens up an exciting range of novel research opportunities -- I will touch briefly on some of them. 


April 10, 1997

Title: Datacube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals 
Authors: J. Gray, A. Bosworth, A. Layman and H. Pirahesh 
Appeared in: ICDE'96 
Title: An Array-Based Algorithm for Simultaneous Multidimensional Aggregates 
Authors: Zhao, Deshpande and Naughton 
Appeared in: SIGMOD'97 
Presenter: Megan Thomas, UC Berkeley
Related URLs: http://www.cs.wisc.edu/~zhao/sigmod97.ps


April 3, 1997

Title: Fundamental Techniques for Order Optimization 
Authors: D. Simmen, E. Shekita and T. Malkemus 
Appeared in: SIGMOD'96 
Presenter: Kirk Wylie, UC Berkeley 
Abstract: The paper describes techniques for extending the System R algorithm to treat "order by" and "group by" clauses as interesting orders in its optimization. A good time is guaranteed for all. 


March 20, 1997

Title: The Infomaster System 
Presenter: Keith Vanderveen, UC Berkeley 
Related URLs: http://logic.stanford.edu/papers/iiaw95-infomaster.ps


March 13, 1997

Title: The SR-Tree: An Index Structure for High-Dimensional Nearest-Neighbor Queries 
Speaker: Norio Katayama, Japan National Center for Science Information Systems 
Abstract: Recently, similarity queries on feature vectors have been widely used to perform content-based retrieval of images. To apply this technique to large databases, it is required to develop multidimensional index structures supporting nearest neighbor queries efficiently. The SS-tree had been proposed for this purpose and is known to outperform other index structures such as the R*-tree and the K-D-B-tree. One of its most important features is that it employs bounding spheres rather than bounding rectangles for the shape of regions. However, we demonstrate in this paper that bounding spheres occupy much larger volume than bounding rectangles with high-dimensional data and that this reduces search efficiency. To overcome this drawback, we propose a new index structure called the SR-tree (Sphere/Rectangle-tree) which integrates bounding spheres and bounding rectangles. A region of the SR-tree is specified by the intersection of a bounding sphere and a bounding rectangle. Incorporating bounding rectangles permits neighborhoods to be partitioned into smaller regions than the SS-tree and improves the disjointness among regions. This enhances the performance on nearest neighbor queries especially for high-dimensional and non-uniform data which can be practical in actual image/video similarity indexing. We include the performance test results that verify this advantage of the SR-tree and show that the SR-tree outperforms both the SS-tree and the R*-tree. 


February 20, 1997

Title: MMM: A Web-Based Method Management System for Using Software Modules Remotely 
Speaker: Prof. Oliver Gunther, Humboldt University and ICSI 
Abstract: The World Wide Web has been highly successful as a tool for the distributed publishing and sharing of online documents. This raises the question whether the distributed authoring and execution of software modules can be supported in a similar manner. In this talk we present a middleware for executing software modules over the Internet. Unlike the well-known Java paradigm, our MMM approach ist not based on porting. Instead, we provide software authors with a toolkit that enables them to wrap their modules in a generic manner and make them available on the Internet. The software modules can thus remain on the machine where they have been implemented originally. Once a module has been checked in, other users can execute it through a Web browser, sending the required parameters to the server where the module resides. 

The first MMM prototype was geared to applications in statistical computing, where software modules are typically written in scripting languages such as Mathematica or Matlab. Current work includes a CORBA-based reimplementation and customizations for other application areas, including environmental modeling, spatial database benchmarking, operations research (scheduling), and corporate finance. 

Joint work with R. Mueller, P. Schmidt, H. Bhargava and R. Krishnan 

NOTE: This talk will be held at 1:00 rather than the usual 12:30, in the Wozniak Lounge on the 4th floor of Soda. The talk is not going to be a brown-bag setting, so please eat beforehand. 


February 13, 1997

Title: Distributed Enforcement of Integrity Constraints 
Speaker: Prof. Opher Etzion, Technion - Israel Institute of Technology 
Abstract: Integrity constraint enforcement was identified as one of the most labor intensive tasks in database programming. A proposed way to reduce the amount of programming in this area has been to employ a collection of of high-level abstractions, called stabilizers, that are translated into active rules. This approach has been discussed before by several works. Such a solution can ease the difficult task of programming distributed databases, but its implementation should take into consideration some obstacles that arise in coordinating the different part of distributed database applications. 

The talk reports on a research that concentrates on the structure and execution of distributed transactions that contain stabilizers. Two issues are discussed: 

  • The issue of atomicity assurance of such transaction (atomic commitment protocol) which extends the 2PC protocol. 
  • The issue of coordination among different stabilizers 
Joint work with Boris Dahav. 


February 6, 1997

Title: The Common Object Request Broker Architecture: Goals and Achievements 
Presenter: Arno Jacobsen, UC Berkeley/ICSI 
Abstract: The Common Object Request Broker Architecture (CORBA) defined by the Object Management Group is an emerging open standard for interconnection software components across heterogeneous distributed computing environments. 

These components may be implemented in different programming languages and paradigms, as well as, run under different operating systems. Interoperability across system boundaries is achieved by specifying interfaces in an universal interface definition language (IDL) imposed by the standard. 

CORBA platforms automate common network programming tasks such as object registration, activation and object location, parameter marshalling, de-marshalling, and operation dispatching. 

I will briefly present an overview of the architecture and its goals. Emphasis will be put on 'non-standard' concepts such as language mappings and dynamic method invocation. I will summarize OMG's efforts in incorporating OODBMS access in the standard and compare it with the efforts undergoing in the Objet Database Management Group ( ODMG). 

Related URLs: Background for the discussion can be found in the following paper:  The web page contains further pointer to CORBA related material. 


January 30, 1997

Title: Efficient Optimistic Concurrency Control Using Loosely Synchronized Clocks 
Authors: Atul Adya, Barbara Liskov, Robert Gruber, and Umesh Maheshwari 
Appeared in: SIGMOD'95 
Presenter: Adam Sah, UC Berkeley 
Related URLs: http://www.pmg.lcs.mit.edu/~adya/


January 23, 1997

Title: Broadcast Disks 
Authors: Acharya, Alonso, Franklin and Zdonik 
Appeared in: various 
Presenter: Joe Hellerstein, UC Berkeley 
Abstract: I was reminded that we had set a goal of covering "outside" material in the seminar this semester -- i.e. papers from outside the db group at Berkeley. Hopefully we will also draw in some guest speakers as well. 

I will cover the main ideas of the "Broadcast Disks" work of Acharya, Franklin and Zdonik. I'll focus on the intial paper by Acharya, Alonso, Franklin and Zdonik, which appeared in SIGMOD '95, and also touch on some of the extensions in Acharya/Franklin/Zdonik's later papers from ICDE '96 and VLDB '96 on prefetching and updates. Apparently there will be a SIGMOD '97 paper as well, but it's not available yet. 

NOTE: New Time! The DBLunch seminar is held from 12:30 noon to 2:00 PM every Thursday, room 606 Soda Hall. 

Related URLs: http://www.cs.brown.edu/people/sa/pub.html


Ryan Huebsch, huebsch@cs.berkeley.edu, Last modified: 24-Sept-03