Advanced Topics in Computer Systems
Joe Hellerstein & Eric Brewer
System R & DBMS Overview
- late 60's: network (CODASYL) & hierarchical (IMS) data models dominate.
Charles Bachman: father of IDS, the CODASYL predecessor (at GE in the early
1960's). Turing Award #8 (1973, between Dijkstra and Knuth).
- IMS Example: Suppliers record type and Parts record
type. One is parent, one is child. Problems include redundancy
and the requirement of having a parent (deletion anomalies).
- Low-level ``record-at-a-time'' DML, i.e. physical data
reflected in DML (no data independence).
- 1970: Codd's paper. The most influential paper in DB research.
Set-at-a-time DML. Data independence. Allows schema and physical
storage structures to change under the covers. Papadimitriou: "as clear
a paradigm shift as we can hope to find in computer science".
Edgar F. Codd: Turing
Award #18 (1981, between Hoare and Cook).
- Data Independence, both logical and physical.
- "Hellerstein's Inequality":
- Need data independence when dapp/dt << denvironment/dt, i.e. when
applications change much more slowly than their environment
- Other scenarios where this holds?
- This is an early, powerful instance of two themes: levels of
indirection and adaptivity
- mid 70's: wholesale adoption of Codd's vision in 2 full-function
(sort of) prototypes. Ancestors of essentially all today's commercial
systems.
- Ingres: UCB 1974-77
- a ``pickup team'', including Stonebraker & Wong. Early and
pioneering. Begat Ingres Corp (CA), CA-Universe, Britton-Lee, Sybase,
MS SQL Server, Wang's
PACE, Tandem Non-Stop SQL.
- System R
: IBM San Jose (now Almaden)
- 15 PhDs. Begat IBM's SQL/DS & DB2, Oracle, HP's Allbase,
Non-Stop SQL. System R arguably got more stuff ``right'', though there
was lots of information passing between the two groups.
- Jim Gray: Turing Award #22 (1998, between Engelbart and Brooks).
- Lots of Berkeley folks on the System R team, including Gray (1st
CS PhD @ Berkeley), Bruce Lindsay, Irv Traiger, Paul McJones, Mike Blasgen,
Mario Schkolnick, Bob Selinger, Bob Yost. See the System R reunion notes below.
- Both were viable starting points, proved practicality of
approach. Direct example of theory -> practice!
- ACM Software Systems award #6 shared by both
- Stated goal of both systems was to take Codd's theory and turn
it into a workable system as fast as CODASYL but much easier to use and program.
- Interestingly, Stonebraker received ACM SIGMOD Innovations Award
#1 (1991), Gray #2 (1992), whereas Gray got the Turing Award first.
- early 80's: commercialization of relational systems
- Ellison's Oracle beats IBM to market by reading white papers.
- IBM releases multiple RDBMSs, settles down to DB2. Gray (System
R), Jerry Held (Ingres) and others join Tandem (Non-Stop SQL), Kapali
Eswaran starts EsVal, which begets HP Allbase and Cullinet.
- Relational Technology Inc (Ingres Corp), Britton-Lee/Sybase, Wang's
PACE grow out of the Ingres group.
- CA releases CA-Universe, a commercialization of Ingres
- Informix started by Cal alum Roger Sippl (no pedigree to
research groups).
- Teradata started by some Cal Tech alums, based on proprietary
networking technology (no pedigree to software research, though see
parallel DBMS discussion
later in semester!)
- mid 80's: SQL becomes the ``intergalactic standard''.
- DB2 becomes IBM's flagship product.
- IMS ``sunsetted''
- today: network & hierarchical are legacy systems (though not gone)
- IMS still widely used in banking, airline reservations, etc. A cash cow for IBM.
- Relational commoditized -- Microsoft, Oracle and IBM fighting
over bulk of market. NCR Teradata, Sybase, HP Nonstop and a few others vying to
survive on the fringes. OpenSource coming of age, including MySQL, PostgreSQL,
Ingres (reborn). BerkeleyDB is an embedded transactional store
that is widely used as well, but now owned by Oracle.
- XML and object-oriented features have pervaded the relational products as
both interfaces and data types, further complicating the "purity" of Codd's vision.
Database View of Applications
Big, complex record-keeping applications like SAP and PeopleSoft, which
sit over a DBMS. "Enterprise applications" to keep businesses running.
Typically client-server (a Sybase "invention") with a form-based API.
Focus on resource management secondary to focus on data management.
- ERP: Enterprise Resource Planning (SAP, Baan, PeopleSoft,
Oracle, etc.)
- CRM: Customer Relationship Management (E.piphany, Siebel,
Oracle, IBM, etc.)
- SCM: Supply Chain Management (Trilogy, i2, Oracle, IBM, etc.)
- Human Resources, Direct Marketing, Call Center, Sales Force
Automation, Help Desk, Catalog Management, etc.
- Many e-business versions of order-entry/procurement and the
like (i.e. web-serving packages for these).
A main job of a DBMS is to make these kinds of apps easy to write.
Notes on System R
See the System R
reunion notes for fun background and gossip.
Some "systems chestnuts" seen in this paper:
- Expect to throw out the 1st version of the system
- Expose internals via standard external interfaces whenever
possible (e.g. catalogs as tables, the /proc filesystem, etc.)
- Optimize the fast path
- Interpretation vs. compilation vs. intermediate "opcode"
- Component failure as a common case to consider
- Problems arising from interactions between replicated
functionality (in this case, scheduling)
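The "catalogs as tables" chestnut is visible in any modern DBMS. A quick
sketch using Python's built-in sqlite3 (SQLite's sqlite_master plays the role
of System R's catalog relations; the table names are invented for the example):

```python
import sqlite3

# "Expose internals via standard external interfaces": SQLite, like
# System R, stores its catalog as an ordinary relation you can query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE suppliers (sno INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE parts (pno INTEGER PRIMARY KEY, sno INTEGER)")

# The catalog (sqlite_master) is queried with plain SQL -- no special API.
rows = conn.execute(
    "SELECT name, type FROM sqlite_master WHERE type = 'table' ORDER BY name"
).fetchall()
print(rows)  # [('parts', 'table'), ('suppliers', 'table')]
```

The same idea shows up in /proc on UNIX: the system's own metadata exposed
through the system's standard data interface.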
Some important points of discussion:
- Flexibility of storage mechanisms: domains/inversions vs.
heap-files/indexes. Use of TID-lists common in modern DBMS.
Why be doctrinaire? What about Data Independence? One
answer: you have to get transactions right for each "access method".
- System R was often CPU bound (though that's a coarse-grained
assertion -- really means NOT disk-bound). This is common today
in well-provisioned DBMSs as well. Why?
- DBMSs are not monolithic designs, really. The RSS stuff
does intertwine locking and logging into disk access, indexing and
buffer management. But the RDS/RSS boundary is clean, and the RDS is
relatively modular.
- Access control via views: a deep application of data
independence.
- The transactional contribution of System R (both conceptual and
implementation) is as important as the relational model, and in fact should be
decoupled from the relational model.
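The view-based access control idea above takes only a few lines of SQL: define
a view that projects and restricts the base table, then grant access to the
view rather than the table. A sketch via Python's sqlite3 (table, view, and
data are invented; SQLite lacks GRANT/REVOKE, so only the view half is shown):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, dept TEXT, salary INT)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                 [("Gray", "research", 100), ("Selinger", "research", 90),
                  ("Smith", "sales", 80)])

# The view is itself a relation: a predicate (dept = 'research') and a
# projection (no salary column) define exactly what its users may see.
# In System R one would GRANT SELECT on the view, not the base table.
conn.execute("""CREATE VIEW research_public AS
                SELECT name, dept FROM employees WHERE dept = 'research'""")
rows = conn.execute("SELECT * FROM research_public ORDER BY name").fetchall()
print(rows)  # [('Gray', 'research'), ('Selinger', 'research')]
```

This is data independence at work: the security policy lives in the schema, so
the base table can change underneath without touching the applications.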
The "Convoy Problem":
- A classic cross-level scheduling interaction. We will see
more of these during the semester.
- Poorly explained in the paper.
I have always found this presentation confusing. A number of issues are
going on. The first two have to do with interactions between OS and DB
scheduling; the last is a policy internal to the DBMS:
- the OS can preempt a database "process" even when that
process is holding a high-traffic DB lock
- DB processes sitting in DB lock queues use up their OS
scheduling quanta while waiting (this is poorly explained in the text).
Once they use up all their quanta, they get removed from the
"multiprogramming set" and go to "sleep" -- and an expensive OS
dispatch is required to run them again.
- the DBMS uses a FCFS wait queue for the lock
For a high-traffic DB lock, DB processes will request it on average
every T timesteps. If the OS preempts a DB process holding that
high-traffic DB lock, the queue behind the lock grows to include
almost all DB processes. Moreover, the queue is too long to be drained
in T timesteps, so it's "stable" -- every DB process queues back up
before the queue drains, and they burn up their quanta pointlessly
waiting in line, after which they are sent to sleep. Hence each DB
process is awake for only one grant of the lock and the subsequent T
timesteps of useful work, after which they queue for the lock again,
waste their quanta in the queue, and are put back to sleep. The result
is that the useful work per OS waking period is about T timesteps,
which is shorter than the overhead of scheduling -- hence the system
is thrashing. Note that the solution attacks the only issue in the
comment that can be handled without interacting with the OS: (c) the
FCFS DB lock queue. The explanation here is confusing, I think. The
point is to always allow any one of the DB processes currently in the
"multiprogramming set" to immediately get the lock without burning a
quantum waiting on the lock -- hence no quanta are wasted on waiting,
so each process spends almost all of its allotted quanta on "real
work". Note that the proposed policy achieves this without needing to
know which processes are in the OS's multiprogramming set.
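The arithmetic above can be made concrete with a back-of-envelope model. All
the numbers and function names below are illustrative assumptions, not
measurements from the paper:

```python
# Back-of-envelope model of the convoy effect. T is the useful work
# (in timesteps) a process gets done per lock grant.

def useful_fraction_convoy(T, dispatch_cost, wasted_quanta):
    """Convoying under FCFS: each waking period yields only T timesteps
    of useful work, plus quanta burned waiting in the lock queue, plus
    one expensive OS dispatch to wake the process back up."""
    return T / (T + wasted_quanta + dispatch_cost)

def useful_fraction_fixed(T, lock_overhead):
    """With the System R fix (grant the lock to a currently runnable
    process instead of strict FCFS), no quanta are wasted waiting and
    no sleep/dispatch cycle occurs."""
    return T / (T + lock_overhead)

T = 10            # useful timesteps per lock grant (assumed)
dispatch = 1000   # cost of an OS dispatch after sleeping (assumed)
wasted = 100      # quanta burned waiting in the FCFS queue (assumed)
lock_ovh = 1      # cheap in-DBMS lock handoff (assumed)

print(useful_fraction_convoy(T, dispatch, wasted))  # ~0.009: thrashing
print(useful_fraction_fixed(T, lock_ovh))           # ~0.91: mostly useful work
```

The point of the model: once T is smaller than the scheduling overhead, almost
all machine time goes to queueing and dispatching, which is exactly the
thrashing regime described above.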
System R and INGRES are the prototypes that all current systems are
based on. Basic architecture is the same, and many of the ideas
remain in today's systems:
- optimizer remains, largely unchanged
- RSS/RDS divide remains in many systems
- SQL, cursors, duplicates, NULLs, etc.
- the pros and cons of duplicates. Alternatives?
- pros and cons of NULLs. Alternatives?
- grouping and aggregation
- updatable single-table views
- begin/end xact at user level
- savepoints and restore
- catalogs as relations
- flexible security (GRANT/REVOKE)
- integrity constraints
- triggers (!!)
- compiled queries
- Nest-loop & sort-merge join, all joins 2-way
- dual logs to support log failure
Stuff they got wrong:
- shadow paging
- predicate locking
- SQL language
- duplicate semantics
- subqueries vs. joins
- outer join
- rejected hashing
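The two join algorithms System R shipped with -- nested-loop and sort-merge --
are easy to sketch. A toy Python illustration (the Suppliers/Parts data and
function names are invented for the example; this is not System R's RSS code):

```python
# Both joins below are 2-way equijoins, as in System R.

def nested_loop_join(R, S, key_r, key_s):
    # For each tuple of the outer relation, scan the whole inner one.
    return [(r, s) for r in R for s in S if r[key_r] == s[key_s]]

def sort_merge_join(R, S, key_r, key_s):
    # Sort both inputs on the join key, then merge like mergesort.
    R = sorted(R, key=lambda t: t[key_r])
    S = sorted(S, key=lambda t: t[key_s])
    out, i = [], 0
    for r in R:
        # Advance past inner tuples with keys smaller than r's key.
        while i < len(S) and S[i][key_s] < r[key_r]:
            i += 1
        j = i
        while j < len(S) and S[j][key_s] == r[key_r]:
            out.append((r, S[j]))
            j += 1
    return out

suppliers = [{"sno": 1, "name": "Acme"}, {"sno": 2, "name": "Ajax"}]
parts     = [{"pno": 10, "sno": 1}, {"pno": 11, "sno": 2}, {"pno": 12, "sno": 1}]

nl = nested_loop_join(suppliers, parts, "sno", "sno")
sm = sort_merge_join(suppliers, parts, "sno", "sno")
assert sorted(map(str, nl)) == sorted(map(str, sm))  # same result, different cost
```

Nested-loop costs O(|R|*|S|) comparisons; sort-merge pays for two sorts but
merges in one pass, which is why the optimizer chooses between them.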
Relational System Architecture
Databases are BIG pieces of software. Typically somewhat hard to
modularize. Lots of system design decisions at the macro and micro
scale. We will focus mostly on micro decisions -- and hence ideas
reusable outside DBMSs -- in
subsequent lectures. Here we focus on macro design.
Disk management choices:
- file per relation
- big file in file system
- raw device
Process model choices:
- process per user
Major system modules:
- query rewrite
- query executor
- access methods
- buffer manager
- lock manager
- log/recovery manager
OS and DBMS: Philosophical Similarities & Differences
So, a main goal of this class is to work from both of these directions,
take the lessons from each, and ask how to use these lessons today both INSIDE
and OUTSIDE the context of these historically separate systems.
- UNIX paper: "The most important job of UNIX is to provide a file
system."
- UNIX and System R are both "information management" systems!
- both also provide programming APIs for code
- Difference in focus: Bottom-Up (elegance of system) vs. Top-Down
(elegance of semantics)
- main goal of UNIX was to provide a small, elegant set
of mechanisms, and have programmers (i.e. C programmers) build on top
of it. As an
example, they are proud that "No large 'access method' routines are
required to insulate the programmer from system calls". After all, the OS
views its role as presenting hardware to computer programmers.
- main goal of System R and Ingres was to provide a complete service
that insulated programmers (i.e. SQL + scripting) from the system, while
guaranteeing clearly defined semantics of data and queries.
After all, the DBMS views its role as managing data for application programmers.
- Affects where the complexity goes!
- to the system, or the end-programmer?
- question: which is better? in what environments?
- follow-on question: are internet systems more like enterprise
apps (traditionally built on DBMSs) or scientific/end-user apps
(traditionally built over OSes and files)? Why?
- Achilles' heel of RDBMSs: a closed box
- Cannot leverage new technology without going through the full SQL stack
- One solution: make the system extensible, convince the world to
download code into the DBMS
- Another solution: componentize the system (hard? RSS is hard
to bust up, due to transaction semantics)
- Achilles' heel of OSes: hard to decide on the "right" level of
abstraction
- As we'll read, many UNIX abstractions (e.g. virtual memory) hide
*too* much detail, messing up semantics. On the other hand, too low
a level can cause too much programmer burden, and messes up the
evolvability of the system
- One solution: make the system extensible, convince the fancy
programmers to download code into the OS
- Another solution: componentize the system (hard, for similar
reasons)
- Traditionally separate communities, despite subsequently clear
overlap in concerns. Compare:
- UNIX paper: "We take the view that locks are neither necessary
nor sufficient, in our environment, to prevent interference between users
of the same file. They are unnecessary because we are not faced with
large, single-file data bases maintained by independent processes."
- System R: "has illustrated the feasibility of compiling a very
high-level data sublanguage, SQL, into machine-level code".