Advanced Topics in Computer Systems
Lecture 3
Joe Hellerstein & Timothy Roscoe

System R & DBMS Overview
DBMS History
- late 60's: network (CODASYL) & hierarchical (IMS) DBMSs.
Charles Bachman: father of CODASYL predecessor IDS (at GE in the early
1960's). Turing award #8 (1973, between Dijkstra and Knuth).
- IMS Example: Suppliers record type and Parts record type. One is
parent, one is child. Each instance is identified by a Hierarchical
Sequence Key (HSK). Problems include redundancy and the requirement of
having a parent (deletion anomalies).
- Low-level "record-at-a-time" DML, i.e. physical data structures
reflected in the DML (no data independence).
- 1970: Codd's paper. The most influential paper in DB research.
Set-at-a-time DML. Data independence. Allows the schema and physical
storage structures to change under the covers. Papadimitriou: "as clear
a paradigm shift as we can hope to find in computer science".
Edgar F. Codd: Turing award #18 (1981, between Hoare and Cook).
- Focus on separation into 3 levels: physical storage, logical schema,
and multiple views.
- Results in two kinds of independence: physical data independence and
logical data independence (both sketched in code below).
- Physical allows you to change the storage layout without affecting
apps. "DBMS makes right", hence totally invisible to everyone except
in terms of performance.
- Logical insulates apps from changes in the logical schema.
"DBA makes right" (except in simple cases, when the DBMS makes right).
Additional flaws with this (can't update views in general). Hence
visible to everyone.
- Data independence is CRITICAL for database evolution -- and note
that databases live and evolve for a LONG time!
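A minimal sketch of logical data independence, using Python's built-in
sqlite3 (all table, view, and column names here are hypothetical,
invented for illustration). The app is coded against a view; the "DBA"
then reorganizes the logical schema underneath it, and the app keeps
working unchanged:

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary INT)")
    db.execute("INSERT INTO emp VALUES ('alice', 'db', 100)")
    # The application is written against a view, never the base tables.
    db.execute("CREATE VIEW emp_view AS SELECT name, dept FROM emp")

    def app_query():   # the app never mentions the base tables
        return db.execute(
            "SELECT name FROM emp_view WHERE dept = 'db'").fetchall()

    print(app_query())   # [('alice',)]

    # Logical schema change: name/dept move to their own table, and the
    # view is redefined over it. The app above runs unmodified.
    db.execute("DROP VIEW emp_view")
    db.execute("CREATE TABLE emp_names (name TEXT, dept TEXT)")
    db.execute("INSERT INTO emp_names SELECT name, dept FROM emp")
    db.execute("CREATE VIEW emp_view AS SELECT name, dept FROM emp_names")
    print(app_query())   # still [('alice',)]

(Physical data independence is the analogous guarantee one level down:
e.g. CREATE INDEX changes the storage layout without changing any query
text at all.)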
- Generalizing the lesson of data independence:
- Need data independence when d(app)/dt << d(environment)/dt, where the
environment is things like physical storage, machine speed, machine
workload, etc.
- Other scenarios where this holds?
- This is an early, powerful instance of two themes: levels of
indirection (this is a biggie) and adaptivity (done in DBMSs at a
coarse grain until recent research).
- mid 70's: wholesale adoption of Codd's vision in 2 full-function
(sort of) prototypes. Ancestors of essentially all of today's
commercial systems.
- Ingres: UCB 1974-77
- a "pickup team", including Stonebraker & Wong. Early and pioneering.
Begat Ingres Corp (CA), CA-Universe, Britton-Lee, Sybase, MS SQL
Server, Wang's PACE, Tandem Non-Stop SQL.
- System R: IBM San Jose (now Almaden)
- 15 PhDs. Begat IBM's SQL/DS & DB2, Oracle, HP's Allbase, Tandem
Non-Stop SQL. System R arguably got more stuff "right", though there
was lots of information passing between the two groups.
- Jim Gray: Turing Award #22 (1998, between Engelbart and Brooks)
- Lots of Berkeley folks on the System R team, including Gray (1st
CS PhD @ Berkeley), Bruce Lindsay, Irv Traiger, Paul McJones, Mike
Blasgen, Mario Schkolnick, Bob Selinger, Bob Yost. See
http://www.mcjones.org/System_R/SQL_Reunion_95/sqlr95-Prehisto.html#Index71.
- Both were viable starting points, proving the practicality of the
relational approach. A direct example of theory -> practice!
- ACM Software Systems award #6 shared by both
- Stated goal of both systems was to take Codd's theory and turn
it into a workable system as fast as CODASYL but much easier to use and
maintain
- Interestingly, Stonebraker received ACM SIGMOD Innovations Award #1
(1991) and Gray #2 (1992), whereas Gray got the Turing award first.
- early 80's: commercialization of relational systems
- Ellison's Oracle beats IBM to market by reading white papers.
- IBM releases multiple RDBMSs, settles down to DB2. Gray (System R),
Jerry Held (Ingres) and others join Tandem (Non-Stop SQL). Kapali
Eswaran starts EsVal, which begets HP Allbase and Cullinet.
- Relational Technology Inc (Ingres Corp), Britton-Lee/Sybase, and
Wang's PACE grow out of the Ingres group.
- CA releases CA-Universe, a commercialization of Ingres
- Informix started by Cal alum Roger Sippl (no pedigree to
research).
- Teradata started by some Caltech alums, based on proprietary
networking technology (no pedigree to software research, though see
the parallel DBMS discussion next semester!)
- mid 80's: SQL becomes the "intergalactic standard".
- DB2 becomes IBM's flagship product.
- IMS is "sunsetted".
- today: network & hierarchical are legacy systems (though commonly
still in use!)
- relational is commoditized -- Microsoft, Oracle and IBM fighting over
much of the market. IBM bought Informix to become a solid competitor
with Oracle for #1. MS a strong #3. NCR Teradata is a distant #4, and
a few others (e.g. Sybase) are vying to survive on the fringes. Open
source is coming of age, with MySQL doing well at the low end and
PostgreSQL maturing at the more functional end. BerkeleyDB is an
embedded transactional store that is widely used as well.
- Computer Associates bought Ingres and largely killed it
- Oracle bought DEC Rdb and largely killed it
- Informix bought Illustra (commercial Postgres), incorporated it.
- IBM bought Informix, will mostly sunset it.
- Microsoft bootstrapped by buying the code to Sybase SQL Server (hence
the Ingres pedigree)
- "object-relational" is mainstream (courtesy Postgres)
- extension of relational we may talk about as time permits
- think relational + extensible types and downloadable code + rule
systems + non-normalized data
- Informix, IBM, Oracle, Sybase all claim to sell it today; Microsoft
in "the next version".
- XML has pervaded the relational products as both an interface
and a data type, further complicating the "purity" of Codd's vision.
Relational System Architecture
Databases are BIG pieces of software, typically somewhat hard to
modularize, with lots of system design decisions at the macro and
micro scale. We will focus mostly on micro decisions -- and hence
ideas reusable outside DBMSs -- in subsequent lectures. Here we focus
on macro design.
Disk management choices:
- file per relation
- big file in file system
- raw device
Process Model:
- process per user
- server
- multi-server
Basic modules:
- parser
- query rewrite
- optimizer
- query executor
- access methods
- buffer manager
- lock manager
- log/recovery manager
Query Rewriter
- Flattens views (why? see the sketch below)
- may change query semantics (constraints, protection, etc.)
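A toy illustration of flattening (hypothetical view and table names; a
real rewriter operates on the parse tree, not on SQL strings as this
sketch does). The payoff: after substitution the optimizer sees only
base tables and can merge the view's predicate with the query's:

    import re

    # A one-entry "catalog" mapping a view name to its definition.
    views = {"rich_emp": "(SELECT name, salary FROM emp WHERE salary > 100)"}

    def flatten(query):
        # Replace each view reference by its definition, as a derived table.
        for name, body in views.items():
            query = re.sub(r"\b%s\b" % name, body + " AS " + name, query)
        return query

    print(flatten("SELECT name FROM rich_emp WHERE salary < 200"))
    # SELECT name FROM (SELECT name, salary FROM emp WHERE salary > 100)
    #   AS rich_emp WHERE salary < 200
    # ...which the optimizer can turn into a single scan of emp with the
    # combined predicate 100 < salary < 200.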
Optimizer
- large space of equivalent relational plans
- pick one that's going to be "optimal" (?) -- a toy version is
sketched below
- produces either an interpretable plan tree, or compiled code
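A toy version of cost-based plan choice (the cardinalities, selectivity,
and cost model are all made up; System R actually used dynamic
programming over left-deep plans with a far richer cost model):

    from itertools import permutations

    card = {"emp": 10000, "dept": 100, "proj": 1000}  # invented sizes
    SEL = 0.01   # assumed selectivity of every join predicate

    def cost(order):
        # Left-deep plan: charge for each intermediate result produced.
        size, total = card[order[0]], 0
        for rel in order[1:]:
            size = size * card[rel] * SEL
            total += size
        return total

    best = min(permutations(card), key=cost)
    print(best, cost(best))   # n relations give n! join orders to consider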
Executor
- modules to perform relational operations like joins, sorts,
aggregations, etc.
- calls Access Methods for operations on base and temporary relations
Access Methods
- uniform relational interface (open, get_next), a la INGRES's AMI and
System R's RSS -- sketched below
- multiple implementations: heap, B-tree, extensible hashing
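A minimal sketch of the uniform interface (names are illustrative, in
the spirit of AMI/RSS rather than copied from them). Any executor
operator can drive any access method through the same three calls:

    class HeapAM:
        """Unordered heap file: one of several interchangeable AMs."""
        def __init__(self, tuples):
            self.tuples = tuples

        def open(self, pred=lambda t: True):
            self.pred, self.pos = pred, 0

        def get_next(self):
            # Return the next qualifying tuple, or None at end of scan.
            while self.pos < len(self.tuples):
                t = self.tuples[self.pos]
                self.pos += 1
                if self.pred(t):
                    return t
            return None

        def close(self):
            self.pred = None

    am = HeapAM([("alice", 100), ("bob", 90)])
    am.open(lambda t: t[1] > 95)       # executor-side scan loop
    while (t := am.get_next()) is not None:
        print(t)                       # ('alice', 100)
    am.close()

A B-tree AM would implement the same open/get_next/close interface but
accept a key range in open() -- the calling operator doesn't change.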
Buffer Manager
- Intelligent user-level disk cache
- must interact with transaction manager & lock manager
- Virtual memory does not cut it! (we'll discuss this at length; a
pin/unpin sketch follows)
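A pin/unpin sketch (the structure and names are my own, not System R's).
The core of the "VM does not cut it" argument: the DBMS must control
when a dirty page reaches disk (the WAL rule below) and must keep pages
pinned while operators hold pointers into them; the VM pager promises
neither.

    def force_log_up_to(page_id):
        # Stub for the WAL rule: force the log records describing this
        # page's updates to disk before the page itself is written.
        pass

    class BufferPool:
        def __init__(self, capacity, disk):
            self.capacity, self.disk = capacity, disk
            self.frames = {}   # page_id -> [data, pin_count, dirty]

        def pin(self, pid):
            if pid not in self.frames:
                if len(self.frames) >= self.capacity:
                    self._evict()
                self.frames[pid] = [self.disk[pid], 0, False]
            self.frames[pid][1] += 1   # caller holds a pointer: no eviction
            return self.frames[pid][0]

        def unpin(self, pid, dirty=False):
            frame = self.frames[pid]
            frame[1] -= 1
            frame[2] = frame[2] or dirty

        def _evict(self):
            # Only unpinned pages are candidates; dirty ones must obey
            # the WAL rule before being written back.
            for pid, (data, pins, dirty) in list(self.frames.items()):
                if pins == 0:
                    if dirty:
                        force_log_up_to(pid)
                        self.disk[pid] = data
                    del self.frames[pid]
                    return
            raise RuntimeError("all pages pinned")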
Lock Manager
- must efficiently support lock table
- System R architecture influential:
- physical and logical locks treated uniformly
- multiple granularities of locks
- set intent locks at high levels (see the sketch after this list)
- we will study this in more detail later (Gray)
- deadlock handling: detection
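A sketch of the multi-granularity idea. The compatibility matrix below
is the standard one from Gray's hierarchical locking scheme; the code
around it is my own illustration. Intent locks (IS/IX) taken at the
table level let a whole-table request conflict-check in one probe, with
no scan of individual tuple locks:

    # True = the two modes may be held on the same object concurrently.
    COMPAT = {
        ("IS", "IS"): True,  ("IS", "IX"): True,  ("IS", "S"): True,
        ("IS", "X"): False,  ("IX", "IX"): True,  ("IX", "S"): False,
        ("IX", "X"): False,  ("S", "S"): True,    ("S", "X"): False,
        ("X", "X"): False,
    }

    def compatible(a, b):
        return COMPAT.get((a, b), COMPAT.get((b, a), False))

    # To X-lock one tuple: take IX on the table, then X on the tuple.
    # A later whole-table S request conflicts right at the table level:
    print(compatible("S", "IX"))   # False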
Log/Recovery Manager
- "before/after" log on values
- checkpoint/restore facility for quick recovery
- Redo/Undo on restore (a toy sketch follows this list)
- Support soft crashes off disk, hard crashes off tape.
- System R's shadowing is too slow. Use Write-Ahead Logging (WAL)!
More on this to come.
- Hard to get right!
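A drastically simplified before/after-image sketch (my own toy, not the
actual System R or ARIES protocol): each update appends its before- and
after-image to the log before touching the data; restart redoes
committed work forward and undoes uncommitted work backward.

    log = []   # append-only: (xid, key, before_image, after_image)

    def update(xid, store, key, new_value):
        log.append((xid, key, store.get(key), new_value))  # log FIRST
        store[key] = new_value                             # then the "page"

    def recover(store, committed):
        for xid, key, before, after in log:            # redo pass, forward
            if xid in committed:
                store[key] = after
        for xid, key, before, after in reversed(log):  # undo pass, backward
            if xid not in committed:
                store[key] = before

    store = {}
    update("T1", store, "x", 1)    # T1 committed before the crash
    update("T2", store, "x", 2)    # T2 did not
    recover(store, committed={"T1"})
    print(store)   # {'x': 1}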
Notes on System R
See the System R reunion notes for fun background and gossip.
Some "systems chestnuts" seen in this paper:
- Expect to throw out the 1st version of the system
- Expose internals via standard external interfaces whenever
possible (e.g. catalogs as tables, the /proc filesystem, etc.)
- Optimize the fast path
- Interpretation vs. compilation vs. intermediate "opcode"
representations
- Component failure as a common case to consider
- Problems arising from interactions between replicated
functionality (in this case, scheduling)
Some important points of discussion
- Flexibility of storage mechanisms: domains/inversions vs.
heap-files/indexes. Use of TID-lists common in modern DBMS.
Why be doctrinaire? What about Data Independence? One
answer: you have to get transactions right each time.
- System R was often CPU bound (though that's a coarse-grained
assertion -- really means NOT disk-bound). This is common today
in well-provisioned DBMSs as well. Why?
- DBMSs are not monolithic designs, really. The RSS stuff
does intertwine locking and logging into disk access, indexing and
buffer management. But RDS/RSS boundary is clean, and RDS is
decomposable.
- Access control via views: a deep application of data
independence?!
- Transactional contribution of System R (both conceptual and
implementation) as important as relational model, and in fact should be
decoupled from relational model.
The "Convoy Problem":
- A classic cross-level scheduling interaction. We will see
this again!
- Poorly explained in the paper.
I have always found this presentation confusing. A number of issues are
going on. The first two have to do with interactions between OS and DB
scheduling:
- the OS can preempt a database "process" even when that
process is holding a high-traffic DB lock
- DB processes sitting in DB lock queues use up their OS
scheduling quanta while waiting (this is poorly explained in the text).
Once they use up all their quanta, they get removed from the
"multiprogramming set" and go to "sleep" -- and an expensive OS
dispatch is required to run them again.
The last issue is that
- the DBMS uses an FCFS wait queue for the lock.
For a high-traffic DB lock, DB processes will request it on average
every T timesteps. If the OS preempts a DB process holding that
high-traffic DB lock, the queue behind the lock grows to include
almost all DB processes. Moreover, the queue is too long to be drained
in T timesteps, so it's "stable" -- every DB process queues back up
before the queue drains, and they burn up their quanta pointlessly
waiting in line, after which they are sent to sleep. Hence each DB
process is awake for only one grant of the lock and the subsequent T
timesteps of useful work, after which they queue for the lock again,
waste their quanta in the queue, and are put back to sleep. The result
is that the useful work per OS waking period is about T timesteps,
which is shorter than the overhead of scheduling -- hence the system
is thrashing.
- Note that the solution attacks the only issue in the previous
comment that can be handled without interacting with the OS: the FCFS
DB lock queue. The explanation here is confusing, I think. The point
is to always allow any one of the DB processes currently in the
"multiprogramming set" to immediately get the lock without burning a
quantum waiting on the lock -- hence no quanta are wasted on waiting,
so each process spends almost all of its allotted quanta on "real
work". Note that the proposed policy achieves this without needing to
know which processes are in the OS's multiprogramming set (sketched
below).
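A sketch of that policy in Python threading terms (my own illustration,
not System R's code): on release, don't hand the lock to the head of a
FIFO queue; just mark it free and wake all waiters, so whichever
process the OS happens to be running re-contends and wins immediately:

    import threading

    class ConvoyAvoidingLock:
        def __init__(self):
            self.cv = threading.Condition()
            self.held = False

        def acquire(self):
            with self.cv:
                while self.held:   # losers simply re-contend on wakeup
                    self.cv.wait()
                self.held = True

        def release(self):
            with self.cv:
                self.held = False
                self.cv.notify_all()   # free-for-all grant, no FCFS handoff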
System R and INGRES are the prototypes that all current systems are
based on. Basic architecture is the same, and many of the ideas
remain in today's systems:
- optimizer remains, largely unchanged
- RSS/RDS divide remains in many systems
- SQL, cursors, duplicates, NULLs, etc.
- the pros and cons of duplicates. Alternatives?
- pros and cons of NULLs. Alternatives?
- grouping and aggregation
- updatable single-table views
- begin/end xact at user level
- savepoints and restore
- catalogs as relations
- flexible security (GRANT/REVOKE)
- integrity constraints
- triggers (!!)
- clustering
- compiled queries
- B-trees
- Nested-loop & sort-merge join, all joins 2-way
- dual logs to support log failure
Stuff they got wrong:
- shadow paging
- predicate locking
- SQL language
- duplicate semantics
- subqueries vs. joins
- outer join
- rejected hashing
Database View of Applications
Big, complex record-keeping applications like SAP and PeopleSoft, which
run over a DBMS. "Enterprise applications" to keep businesses humming.
A smattering:
- ERP: Enterprise Resource Planning (SAP, Baan, PeopleSoft, Oracle,
IBM, etc.)
- CRM: Customer Relationship Management (E.piphany, Siebel, Oracle,
IBM, etc.)
- SCM: Supply Chain Management (Trilogy, i2, Oracle, IBM, etc.)
- Human Resources, Direct Marketing, Call Center, Sales Force
Automation, Help Desk, Catalog Management, etc.
- Many e-business versions of order-entry/procurement and the above
(i.e. web serving packages for this)
- Integrated B2B versions of all this rolling out as fast as possible
A main job of a DBMS is to make these kinds of apps easy to write.
OS and DBMS: Philosophical Similarities & Differences
- UNIX paper: "The most important job of UNIX is to provide a file
system".
- UNIX and System R are both "information management" systems!
- both also provide programming APIs for code
- Difference in focus: Bottom-Up (elegance of system) vs. Top-Down
(elegance of semantics)
- main goal of UNIX was to provide a small, elegant set of mechanisms,
and have programmers (i.e. C programmers) build on top of it. As an
example, they are proud that "No large 'access method' routines are
required to insulate the programmer from system calls". After all, the
OS viewed its role as presenting hardware to computer programmers.
- main goal of System R and Ingres was to provide a complete system
that insulated programmers (i.e. SQL + scripting) from the system,
while guaranteeing clearly defined semantics of data and queries.
After all, the DBMS views its role as managing data for application
programmers.
- Affects where the complexity goes!
- to the system, or the end-programmer?
- question: which is better? in what environments?
- follow-on question: are internet systems more like enterprise
apps (traditionally built on DBMSs) or scientific/end-user apps
(traditionally built over OSes and files)? Why?
- Achilles' heel of RDBMSs: a closed box
- Cannot leverage technology without going through the full SQL stack
- One solution: make the system extensible, convince the world to
download code into the DBMS
- Another solution: componentize the system (hard? RSS is hard
to bust up, due to transaction semantics)
- Achilles' heel of OSes: hard to decide on the "right" level of
abstraction
- As we'll read, many UNIX abstractions (e.g. virtual memory) hide
*too* much detail, messing up semantics. On the other hand, too low
a level can cause too much programmer burden, and messes up the
elegance of the system
- One solution: make the system extensible, convince the fancy apps
to download code into the OS
- Another solution: componentize the system (hard, due to protection
issues)
- Traditionally separate communities, despite a subsequently clear need
to integrate
- UNIX paper: "We take the view that locks are neither necessary nor
sufficient, in our environment, to prevent interference between users
of the same file. They are unnecessary because we are not faced with
large, single-file data bases maintained by independent processes."
- System R: "has illustrated the feasibility of compiling a very
high-level data sublanguage, SQL, into machine-level code".
So, a main goal of this class is to work from both of these directions,
cull the lessons from each, and ask how to use these lessons today both
within and OUTSIDE the context of these historically separate systems.