[Berkeley] [EECS Dept.] [CS Division] [Database Research] [Seminar]
![]() |
U.C. Berkeley Database Group Seminar |
Title: | Dynamic Itemset Counting and Implication Rules for Market Basket Data |
Authors: | Sergey Brin, Rajeev Motwani, Jeffrey Ullman and Shalom Tsur |
Appeared in: | SIGMOD'97 |
Presenter: | Megan Thomas, UCB |
Abstract: | We consider the problem of analyzing market-basket data and present several important contributions. First, we present a new algorithm for finding large itemsets which uses fewer passes over the data than classic algorithms, and yet uses fewer candidate itemsets than methods based on sampling. We investigate the idea of item reordering, which can improve the low-level efficiency of the algorithm. Second, we present a new way of generating "implication rules," which are normalized based on both the antecedent and the consequent and are truly implications (not simply a measure of co-occurence), and we show how they produce more intuitive results than other methods. Finally, we show how different characteristics of real data, as opposed to synthetic data, can dramatically affect the performance of the system and the form of the results. |
Title: | Partial Answers for Unavailable Data Sources |
Speaker: | Dr. Anthony Tomasic, INRIA |
Abstract: | Many heterogeneous database system products and prototypes exist
today; they are deployed in a wide variety of environments. All existing
systems (that we are aware of) suffer from an Achilles' heel: if some
sources of data are unavailable when accessed by the heterogeneous
database system, these systems either silently ignore the unavailable
source of data or generate an error, i.e. they ungracefully fail. This
behavior is improper in environments where there is a non-negligible
probability that data sources cannot be accessed (e.g., Internet). In this
talk, we propose a novel approach to this issue where, in presence of
unavailable data sources, the answer to a query is a ``partial answer.'' A
partial answer is itself a query that results from the partial evaluation
of the original query; it is composed of the data that have been obtained
and processed during the evaluation and of a representation of the
unfinished work to be done. Partial answers can be resubmitted to the
system in order to obtain the final answer to the original query, or
another partial answer. Additionally, the application program can extract
information from a partial answer through the use of a secondary query.
This secondary query is called a ``parachute query.'' In this talk we give
a taxonomy of types of partial answers and parachute queries. We present
algorithms for the evaluation of queries in presence of unavailable data
sources, and we describe an implementation.
Joint work with Philippe Bonnet (BULL/Dyade, Grenoble, France) Note: the seminar will be held in 320 Soda this week. |
Title: | Modeling and Identifying Bottlenecks in EOSDIS |
Speaker: | Melody Y. Ivory, UC Berkeley |
Abstract: | Many parallel application areas that exploit massive parallelism, such
climate modeling, require massive storage systems for the archival and
retrieval of data sets. As such, advances in massively parallel
computation must be coupled with advances in mass storage technology in
order to satisfy I/O constraints of these applications. We demonstrate the
effects of such I/O-computation disparity for a representative distributed
information system, NASA's Earth Observing System Distributed Information
System (EOSDIS). We use performance modeling to identify bottlenecks in
EOSDIS for two representative user scenarios from climate change
research.
Joint work with Jim Demmel and Sharon Smith. |
Related URLs: | [HTML] [PS] |
Title: | Enhancing Complex Data in the Predator DBMS |
Speaker: | Prof. Praveen Seshadri, Cornell University |
Abstract: | Database systems need to support rich data types efficiently. Images,
audio, video, time-series, matrices, and documents are examples of complex
types supported by current object-relational DBMS technology. However,
performance has lagged behind functionality. Queries involving such data
are often processed very inefficiently. This is because the
semantics of the new data types are ignored in query processing and
optimization.
I present Enhanced Abstract Data Types (E-ADTs) as a paradigm for supporting complex data types in database systems. While each data type remains encapsulated, it presents enhanced interfaces that expose query processing semantics to the database system. E-ADTs are the primary focus of Predator, a full-fledged DBMS being developed at Cornell. The support for E-ADTs in Predator leads to an efficient yet highly extensible system. I present many categories of query optimizations that can be performed using E-ADTs, and show the effects of some of them on query performance. This work opens up an exciting range of novel research opportunities -- I will touch briefly on some of them. |
Title: | Datacube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals |
Authors: | J. Gray, A. Bosworth, A. Layman and H. Pirahesh |
Appeared in: | ICDE'96 |
Title: | An Array-Based Algorithm for Simultaneous Multidimensional Aggregates |
Authors: | Zhao, Deshpande and Naughton |
Appeared in: | SIGMOD'97 |
Presenter: | Megan Thomas, UC Berkeley |
Related URLs: | http://www.cs.wisc.edu/~zhao/sigmod97.ps |
Title: | Fundamental Techniques for Order Optimization |
Authors: | D. Simmen, E. Shekita and T. Malkemus |
Appeared in: | SIGMOD'96 |
Presenter: | Kirk Wylie, UC Berkeley |
Abstract: | The paper describes techniques for extending the System R algorithm to treat "order by" and "group by" clauses as interesting orders in its optimization. A good time is guaranteed for all. |
Title: | The Infomaster System |
Presenter: | Keith Vanderveen, UC Berkeley |
Related URLs: | http://logic.stanford.edu/papers/iiaw95-infomaster.ps |
Title: | The SR-Tree: An Index Structure for High-Dimensional Nearest-Neighbor Queries |
Speaker: | Norio Katayama, Japan National Center for Science Information Systems |
Abstract: | Recently, similarity queries on feature vectors have been widely used to perform content-based retrieval of images. To apply this technique to large databases, it is required to develop multidimensional index structures supporting nearest neighbor queries efficiently. The SS-tree had been proposed for this purpose and is known to outperform other index structures such as the R*-tree and the K-D-B-tree. One of its most important features is that it employs bounding spheres rather than bounding rectangles for the shape of regions. However, we demonstrate in this paper that bounding spheres occupy much larger volume than bounding rectangles with high-dimensional data and that this reduces search efficiency. To overcome this drawback, we propose a new index structure called the SR-tree (Sphere/Rectangle-tree) which integrates bounding spheres and bounding rectangles. A region of the SR-tree is specified by the intersection of a bounding sphere and a bounding rectangle. Incorporating bounding rectangles permits neighborhoods to be partitioned into smaller regions than the SS-tree and improves the disjointness among regions. This enhances the performance on nearest neighbor queries especially for high-dimensional and non-uniform data which can be practical in actual image/video similarity indexing. We include the performance test results that verify this advantage of the SR-tree and show that the SR-tree outperforms both the SS-tree and the R*-tree. |
Title: | MMM: A Web-Based Method Management System for Using Software Modules Remotely |
Speaker: | Prof. Oliver Gunther, Humboldt University and ICSI |
Abstract: | The World Wide Web has been highly successful as a tool for the
distributed publishing and sharing of online documents. This raises the
question whether the distributed authoring and execution of software
modules can be supported in a similar manner. In this talk we present a
middleware for executing software modules over the Internet. Unlike the
well-known Java paradigm, our MMM approach ist not based on porting.
Instead, we provide software authors with a toolkit that enables them to
wrap their modules in a generic manner and make them available on the
Internet. The software modules can thus remain on the machine where they
have been implemented originally. Once a module has been checked in, other
users can execute it through a Web browser, sending the required
parameters to the server where the module resides.
The first MMM prototype was geared to applications in statistical computing, where software modules are typically written in scripting languages such as Mathematica or Matlab. Current work includes a CORBA-based reimplementation and customizations for other application areas, including environmental modeling, spatial database benchmarking, operations research (scheduling), and corporate finance. Joint work with R. Mueller, P. Schmidt, H. Bhargava and R. Krishnan NOTE: This talk will be held at 1:00 rather than the usual 12:30, in the Wozniak Lounge on the 4th floor of Soda. The talk is not going to be a brown-bag setting, so please eat beforehand. |
Title: | Distributed Enforcement of Integrity Constraints |
Speaker: | Prof. Opher Etzion, Technion - Israel Institute of Technology |
Abstract: | Integrity constraint enforcement was identified as one of the most
labor intensive tasks in database programming. A proposed way to reduce
the amount of programming in this area has been to employ a collection of
of high-level abstractions, called stabilizers, that are translated into
active rules. This approach has been discussed before by several works.
Such a solution can ease the difficult task of programming distributed
databases, but its implementation should take into consideration some
obstacles that arise in coordinating the different part of distributed
database applications.
The talk reports on a research that concentrates on the structure and execution of distributed transactions that contain stabilizers. Two issues are discussed:
|
Title: | The Common Object Request Broker Architecture: Goals and Achievements |
Presenter: | Arno Jacobsen, UC Berkeley/ICSI |
Abstract: | The Common Object Request Broker Architecture (CORBA) defined by the
Object Management Group is an emerging open standard for interconnection
software components across heterogeneous distributed computing
environments.
These components may be implemented in different programming languages and paradigms, as well as, run under different operating systems. Interoperability across system boundaries is achieved by specifying interfaces in an universal interface definition language (IDL) imposed by the standard. CORBA platforms automate common network programming tasks such as object registration, activation and object location, parameter marshalling, de-marshalling, and operation dispatching. I will briefly present an overview of the architecture and its goals. Emphasis will be put on 'non-standard' concepts such as language mappings and dynamic method invocation. I will summarize OMG's efforts in incorporating OODBMS access in the standard and compare it with the efforts undergoing in the Objet Database Management Group ( ODMG). |
Related URLs: | Background for the discussion can be found in the following paper: The web page contains further pointer to CORBA related material. |
Title: | Efficient Optimistic Concurrency Control Using Loosely Synchronized Clocks |
Authors: | Atul Adya, Barbara Liskov, Robert Gruber, and Umesh Maheshwari |
Appeared in: | SIGMOD'95 |
Presenter: | Adam Sah, UC Berkeley |
Related URLs: | http://www.pmg.lcs.mit.edu/~adya/ |
Title: | Broadcast Disks |
Authors: | Acharya, Alonso, Franklin and Zdonik |
Appeared in: | various |
Presenter: | Joe Hellerstein, UC Berkeley |
Abstract: | I was reminded that we had set a goal of covering "outside" material
in the seminar this semester -- i.e. papers from outside the db group at
Berkeley. Hopefully we will also draw in some guest speakers as
well.
I will cover the main ideas of the "Broadcast Disks" work of Acharya, Franklin and Zdonik. I'll focus on the intial paper by Acharya, Alonso, Franklin and Zdonik, which appeared in SIGMOD '95, and also touch on some of the extensions in Acharya/Franklin/Zdonik's later papers from ICDE '96 and VLDB '96 on prefetching and updates. Apparently there will be a SIGMOD '97 paper as well, but it's not available yet.
|
Related URLs: | http://www.cs.brown.edu/people/sa/pub.html |