Advanced Topics in Computer Systems  
Joe Hellerstein & Timothy Roscoe

Query Processing

Themes: Dataflow and Dataflow Operators


A quick primer on the relational algebra

Dataflow Architecture: Iterators

Basic question: how do you turn the algebraic operator abstraction into a programming construct?  Iterators.

An OO view of iterators:

An iterator is a superclass for all dataflow operators.  It has three methods: The output type of next() is well typed.  In a SQL system, this is a SQL record in some schema, e.g. (integer, numeric(10,2), varchar, float).

Iterators are strung together into a dataflow by parameterizing them to have other iterators as inputs, and then having their methods invoke the input iterators' methods.  Usually invokes and sometimes  Usually invokes or nothing.  Usually parent.close() invokes child.close().

An iterator also has a number of private data members, including all the "input" iterators it invokes (usually one or two of these), and any state it maintains.  The types of the records from input iterators need to match the operator semantics in generating the output type.  In a DBMS, this is all set up automatically at query parsing/optimization time.  Extensible "optimizer generators" have been developed that let you declare these rules in a high-level way (not as optimizer code).

A state-machine/event-handler view:

An iterator is an event handler for the events open/next/close/tuple-received.  The work in subclassing the iterator operator is to identify and maintain the states that it can be in across next() events.

An eventflow graph is constructed to string together the dataflow, and type matching of tuples needs to be managed as part of constructing that graph.

A note on Synchronous/Asynchronous (a.k.a. Pull/Push)

Typically, iterators in a single-site query processor make synchronous calls to their children.  This is a "pull" model, like sucking data through a straw.  For now, it's fine to think of iterators that way.

As we'll see later this semester, in some contexts it can be more efficient to "push" the data between iterators (e.g. if there's a network in between!).  This means that tuples arrive asynchronously.  Some kind of buffering and/or rate-matching has to happen to keep operators in sync.

Typical DB Dataflow Operators

Unary Operations

Binary Operations: Union

Without duplicate elimination, a very simple iterator (SQL's "UNION ALL").  Only issue is that both inputs have to have identical schemas.

With duplicate elimination (SQL's plain "UNION"), you need to eliminate duplicates too.  Simple implementation: do UNION ALL as one iterator, and dup-elim as another; make sure to use them in order if somebody wants UNION without dups.

Binary Operations: Join

Join algorithms apply to almost any form of combining multiple collections, except for UNION.

Some commonly used join variants (alternative logical algebra operators):

These logical algebra operators can be implemented as minor variations on the same join algorithms!

The "Guy Lohman Test" for join operators (a.k.a. pipelining of intermediate results):

The "Joe Hellerstein Test" for join operators (a.k.a. full pipelining):

Nested Loops Join

for each tuple r of R
    for each tuple s of S
       if rs satisfies join predicate
          output rs
R is the outer relation (left)

S is the inner relation (right)

Refinement 1: Block Nested Loops Join

for each block BR of R
    for each tuple s of S
        for each tuple r of BR such that rs satisfies join predicate
               output rs
Further refinements to nested loops:

Refinement 2: Index Nested Loops Join

for each tuple r of R
    probe index over S;
    output all s s.t. rs satisfies join predicate;
          FROM cities, forests
         WHERE cities.boundary overlaps forests.boundary;

(Sort-)Merge Join

Works for equijoin, "band" joins

we will assume here you know how to do a 2-pass sort [see Knuth or Shapiro]

idea: if R & S are sorted on the join column(s), we can "simply" merge them

But duplicates complicate things (as usual).
R join S 

sort R;
sort S;;;
r =;
s =;
while (r != NULL && s != NULL) {
    while (r.c < s.c)
        r =;
    if (r.c = s.c) {
        while (r.c = s.c) {
            output rs;
            r =;
        "rewind" r to first tuple of R where r.c = s.c;
        s =;
    while (r.c > s.c)
        s =;
Refinement: do merging of R & S while merging sort runs from each. Note: Sort-merge is symmetric, so "outer", "inner", "left", "right" are arbitrary

Classic Hash Join

Works for equijoin only

Let R be the smaller relation

Hash R into VM;
for each tuple of S
    probe hashtable over R and output all rs s.t. s.c = r.c

Simple Hash Join

Repeat steps 1 and 2 with R, S replaced by the passed over tuples.



Grace Hash Join

Phase 1 is repeated with S in place of R




Hybrid Hash

Original paper: DeWitt, Katz, Olken, Shapiro, Stonebraker, Wood, SIGMOD '84.

Phase 2 as in Grace Join

Hybrid Hash Advantages:

Disadvantages: Handling Partition Overflow: Additional Tricks: Filters

Idea: build a filter based on R so you stage less of S to disk