Advanced Topics in Computer Systems
Joe Hellerstein & Timothy Roscoe
Multi-Granularity Locking
Some implementation background (not in reading):
- Locks vs. Latches
- locks assure logical (i.e. xactional) consistency of data. They
are implemented via a lock table, held for a long time (e.g. 2PL), and part
of the deadlock detection mechanism.
- latches are like semaphores. They assure physical consistency
of data and resources, which are not visible at the transactional level (e.g.
latch a frame in a buffer pool). They are not subject to 2PL (usually
held for a very short time), and the deadlock detector doesn't know about
them (and therefore...?).
- Acquiring a latch is much cheaper than acquiring a lock (10s vs. 100s
of instructions in the no-conflict case)
- latch control info is always in VM in a fixed place
- lock tables are dynamically managed (can have a lock per tuple!),
so data structure mgmt on lock/release is more complex
- Lock table is a hashed main-mem structure
- Lock/Unlock must be atomic operations (protected by latches or critical
sections)
- typically costs several hundred instructions to lock/unlock an item
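The hashed main-memory lock table described above can be sketched in Python. This is a minimal illustration with hypothetical names (`LockTable`, `acquire`, `release`) and only S/X modes; a real lock manager would protect each operation with a latch and manage wait queues rather than simply refusing the request.

```python
from collections import defaultdict

# S/X compatibility: COMPAT[(held, requested)]
COMPAT = {
    ('S', 'S'): True,  ('S', 'X'): False,
    ('X', 'S'): False, ('X', 'X'): False,
}

class LockTable:
    def __init__(self):
        self.table = defaultdict(list)   # item -> [(xact, mode), ...]

    def acquire(self, xact, item, mode):
        """Grant the lock if compatible with all other holders.
        In a real system a latch would protect this critical section."""
        holders = self.table[item]
        if all(COMPAT[(m, mode)] for t, m in holders if t != xact):
            holders.append((xact, mode))
            return True
        return False                     # caller would block / enqueue

    def release(self, xact, item):
        self.table[item] = [(t, m) for t, m in self.table[item]
                            if t != xact]

lt = LockTable()
assert lt.acquire('T1', 'tuple42', 'S')
assert lt.acquire('T2', 'tuple42', 'S')      # S is compatible with S
assert not lt.acquire('T3', 'tuple42', 'X')  # X conflicts with held S
```

Note how a lock can exist per tuple: the table only materializes entries for items actually locked, which is why lock/unlock involves dynamic data structure management.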
- Lock Upgrades
- suppose T1 has an S lock on P, T2 is waiting to get X lock on P, and
now T3 wants S lock on P. Do we grant T3 an S lock?
No! (starvation, unfair, etc.) So...
- Manage FCFS queue for each locked object with outstanding requests
- all xacts that are adjacent and compatible are a compatible group
- The front group is the granted group
- group mode is most restrictive mode amongst group members
- Conversions: often want to convert (e.g. S to X for "test and modify"
actions). Should conversions go to back of queue?
- No! Instant deadlock. So put conversions right after granted group.
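The queueing discipline above can be sketched as follows (hypothetical helper names, S/X modes only, no actual blocking): the granted group is the maximal compatible prefix of the FCFS queue, and conversions are inserted right after it rather than at the back.

```python
# S/X compatibility: COMPAT[(held, requested)]
COMPAT = {('S', 'S'): True,  ('S', 'X'): False,
          ('X', 'S'): False, ('X', 'X'): False}

def granted_group(queue):
    """Longest prefix of pairwise-compatible requests: these are granted."""
    group = []
    for req in queue:
        if all(COMPAT[(m, req[1])] for _, m in group):
            group.append(req)
        else:
            break
    return group

def enqueue(queue, xact, mode, conversion=False):
    if conversion:
        # conversions go right after the granted group, not to the back,
        # to avoid the instant-deadlock problem described above
        queue.insert(len(granted_group(queue)), (xact, mode))
    else:
        queue.append((xact, mode))       # plain FCFS

q = []
enqueue(q, 'T1', 'S')                    # granted
enqueue(q, 'T2', 'X')                    # waits behind T1's S lock
enqueue(q, 'T3', 'S')                    # FCFS: waits behind T2 (no starvation of T2)
enqueue(q, 'T1', 'X', conversion=True)   # T1's upgrade jumps ahead of T2
assert q == [('T1', 'S'), ('T1', 'X'), ('T2', 'X'), ('T3', 'S')]
```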
- More notes for the curious...
- Most DBMSs have multiple lock request types, in part to help with
that. E.g. Recall from ARIES:
- conditional vs. unconditional lock requests (are you willing to wait?)
- instant vs. manual vs. commit duration lock requests (wait, but plan
to hold for different amounts of time)
- More below on non-2PL locking schemes, and the consistency guarantees they provide
Gray, et al.: Granularity of Locks
Theme: Correctness and performance
- Granularity tradeoff: small granularity (e.g. field of a tuple) means
high concurrency but high overhead. Large granularity (e.g. file) means low
overhead but low concurrency.
- Possible granularities:
- DB
- Areas
- Files
- Pages
- Tuples (records)
- fields of tuples
- Want hierarchical locking, to allow "large" xacts to set large locks,
"small" xacts to set small locks
- Problem: T1 S-locks a record in a file, then T2 X-locks the whole
file. How can T2 discover that T1 has locked the record?
- Solution: "Intention" locks
Compatibility matrix (row = mode held, column = mode requested;
Y = compatible, N = conflict):

        NL   IS   IX   S    SIX  X
  NL    Y    Y    Y    Y    Y    Y
  IS    Y    Y    Y    Y    Y    N
  IX    Y    Y    Y    N    N    N
  S     Y    Y    N    Y    N    N
  SIX   Y    Y    N    N    N    N
  X     Y    N    N    N    N    N
- IS and IX locks
- T1 obtains S lock on record in question, but first gets IS lock on
file.
- Now T2 cannot get X lock on file
- However, T3 can get IS or S lock on file (the reason for distinguishing
IS and IX: if there were only I, T3 couldn’t get an S lock on file)
- For higher concurrency, one more mode: SIX. Intuitively, you read
all of the object but only lock some subparts. Allows concurrent IS locks
(IX alone would not). Note: gives S access, so disallows IX to others.
- requires that xacts lock items root to leaf in the hierarchy, unlock
leaf to root
- generalization to DAG of resources: X locks all paths to a node, S
locks at least one.
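The hierarchical protocol can be sketched in Python: before taking S or X on a leaf, take IS or IX (respectively) on every ancestor, root to leaf, checking the compatibility matrix at each level. Names are hypothetical, and this sketch does no queuing and no unlock ordering.

```python
# Compatibility matrix from Gray et al. (True = compatible)
COMPAT = {
    'NL':  {'NL': True, 'IS': True,  'IX': True,  'S': True,  'SIX': True,  'X': True},
    'IS':  {'NL': True, 'IS': True,  'IX': True,  'S': True,  'SIX': True,  'X': False},
    'IX':  {'NL': True, 'IS': True,  'IX': True,  'S': False, 'SIX': False, 'X': False},
    'S':   {'NL': True, 'IS': True,  'IX': False, 'S': True,  'SIX': False, 'X': False},
    'SIX': {'NL': True, 'IS': True,  'IX': False, 'S': False, 'SIX': False, 'X': False},
    'X':   {'NL': True, 'IS': False, 'IX': False, 'S': False, 'SIX': False, 'X': False},
}

INTENT = {'S': 'IS', 'X': 'IX'}   # intention mode to set on ancestors

def lock_hierarchical(held, xact, path, mode):
    """path is [root, ..., leaf]; held maps node -> list of (xact, mode).
    Grant all-or-nothing: check the whole root-to-leaf plan first."""
    plan = [(node, INTENT[mode]) for node in path[:-1]] + [(path[-1], mode)]
    for node, m in plan:
        if not all(COMPAT[hm][m] for t, hm in held.get(node, []) if t != xact):
            return False
    for node, m in plan:
        held.setdefault(node, []).append((xact, m))
    return True

held = {}
assert lock_hierarchical(held, 'T1', ['db', 'file1', 'rec7'], 'S')
assert not lock_hierarchical(held, 'T2', ['db', 'file1'], 'X')  # T1's IS on file1 blocks X
assert lock_hierarchical(held, 'T3', ['db', 'file1'], 'S')      # IS and S are compatible
```

This is exactly the scenario from the motivating problem: T2 discovers T1's record lock by colliding with T1's IS lock on the file, without ever scanning record-level locks.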
Notes on Deadlock
- In OS world, deadlock usually due to errors or overloads
- In DB/xact world with 2PL, they’re inherent.
- Most common cause: lock upgrades (e.g. two xacts each hold an S lock
on the same item and both request an upgrade to X)
- Usual DB solution: deadlock detection (other option: deadlock avoidance)
- Use "waits-for" graph and look for cycles
- Abort/restart some transaction along the cycle
- Empirically, in actual systems the waits-for graph shows:
- cycles fairly rare
- cycle length usually 2, sometimes 3, virtually never >3
- use DFS to find cycles
- When to look for cycles?
- whenever a xact blocks
- periodically
- never (use timeouts)
Typically, in centralized systems, run deadlock detection
whenever blocking occurs. Arguably this is cheap: most recently
blocked transaction T must be the one that caused the deadlock, so just DFS
starting from T.
In distributed systems, even this can be expensive, so often do periodic
detection (more later)
- Who to restart ("victim selection")
- current blocker
- youngest XACT
- least resources used
- fewest locks held (common)
- fewest number of restarts
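The detection scheme above can be sketched as a DFS over the waits-for graph, started from the most recently blocked xact T, since any new cycle must pass through T. Names are hypothetical; victim selection and edge cleanup are left to the caller.

```python
def find_cycle_from(waits_for, start):
    """waits_for: dict mapping xact -> set of xacts it waits for.
    Returns a cycle through `start` as a list, or None if none exists."""
    path = [start]
    def dfs(node):
        for nxt in waits_for.get(node, ()):
            if nxt == start:
                return True               # closed a cycle through start
            if nxt not in path:
                path.append(nxt)
                if dfs(nxt):
                    return True
                path.pop()
        return False
    return path + [start] if dfs(start) else None

wf = {'T1': {'T2'}, 'T2': {'T3'}, 'T3': {'T1'}}
assert find_cycle_from(wf, 'T1') == ['T1', 'T2', 'T3', 'T1']
# Then pick a victim along the cycle (e.g. fewest locks held),
# abort it, and remove its edges from the graph.
```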
Degrees of Consistency (a/k/a Isolation Levels)
Despite all the discussion of ACID, sometimes it's nice to relax
semantic guarantees for the sake of performance. The goal is to let
individual transactions choose sensible ways of doing this without
messing up the database or the other transactions that do care. I.e.
preserve everybody's desired isolation.
Gray, et al.: Degrees of Consistency
First, a definition: A write is committed when the transaction is
finished; otherwise, the write is dirty.
A Locking-Based Description of Degrees of Consistency:
This is not actually a description of the degrees, but rather of how
to achieve them via locking. But it’s (operationally) well-defined.
- Degree 0: set short write locks on updated items ("short" = length
of action)
- Degree 1: set long write locks on updated items ("long" = until EOT)
- Degree 2: set long write locks on updated items, and short read
locks on items read
- Degree 3: set long write and read locks
A Dirty-Data Description of Degrees of Consistency
Transaction T "sees degree X consistency" if...
- Degree 0: All writes are atomic
- Degree 1:
- T sees degree 0 consistency, and
- T does not commit any writes before EOT
- In sum, no conflicts will ever occur involving T's writes
- In ANSI-SQL Standard this is called READ UNCOMMITTED for obvious
reasons.
- Degree 2:
- T sees degree 1 consistency, and
- T does not read dirty data of other transactions (we don't cause
W_S;R_T conflicts)
- In ANSI-SQL this is called READ COMMITTED for obvious reasons.
- Degree 3:
- T sees degree 2 consistency, and
- Other transactions do not dirty any data read by T before T completes
(no R_T;W_S conflicts)
- In sum, no conflicts allowed!
- In ANSI-SQL this is called REPEATABLE READ.
Examples of Inconsistencies prevented by Various Degrees
- Garbage data:
- Concurrent actions W_T1(X) || W_T2(X)
- Neither T1 nor T2 controls the final value of X; the writes could
in fact be interleaved or corrupted.
- Solution: force every transaction to set short
write locks (degree 0). Note that device-level enforcement is
insufficient for objects bigger than the device interface
(e.g. files, tables)
- Lost Updates:
- R_T1(X); W_T2(X); W_T1(X); Commit_T1; Commit_T2.
- Alternate Problem: suppose T2 aborts!
- Solution: force every transaction to set long
write locks (degree 1). Think of them as "advisory" for
reads (up to each transaction), but binding for writes.
- Dirty Reads:
- W_T1(X); R_T2(X); Abort_T1.
- Now T2's read is bogus.
- Solution: set long write locks and short read locks
(degree 2). Attractive for long-running queries
(esp. aggregates).
- Note that there are no dirty reads in the Lost Update
example above.
- Unrepeatable ("Fuzzy") reads:
- R_T1(X); W_T2(X); Commit_T2; R_T1(X)
- Now T1 has read 2 different values of X.
- Solution: long read locks (degree 3).
- Phantoms:
- R_T1(range [x-y]); Insert_T2(Z | x < Z < y); Commit_T2;
R_T1(range [x-y])
- Z is a "phantom" data item (eek!)
- Solution: ?? Note that Degree 3 row-level locking
does not prevent phantoms!
- In ANSI-SQL, if you prevent phantoms and provide repeatable
read, it's called a SERIALIZABLE consistency level.
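To make the Lost Update anomaly above concrete, here is a toy interleaving in Python (illustrative variable names; no locking is simulated). T1's write is based on a stale read, so T2's update vanishes.

```python
db = {'X': 100}

x_seen_by_t1 = db['X']       # R_T1(X): T1 reads X = 100
db['X'] = db['X'] + 50       # W_T2(X): T2 adds 50 -> 150
db['X'] = x_seen_by_t1 + 10  # W_T1(X): T1 adds 10 to its stale read

assert db['X'] == 110        # T2's +50 is lost; either serial order gives 160
```

With degree-1 long write locks, T2's write would block until T1 committed; T2's update would then be applied to T1's value instead of being overwritten.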
A note: ANSI-SQL standard talks about the phenomena of Dirty
Read, Non-Repeatable Read, and Phantoms.
To reiterate: if everybody sees at least Degree 1, then different
transactions can choose what degree they wish to "see" without
worry. I.e. we can have a mixture of levels of consistency.
Two more consistency levels used in practice
- Snapshot Isolation (called SERIALIZABLE by Oracle). Widely
used (and now implemented in MS SQL Server 2005 too.)
- Goal: never delay R/O transactions, and try to ensure
that they see consistent data if updating transactions are
serializable.
- Idea: Conceptually, transaction Ti reads data from the
committed state of the DB as of the time start(Ti), except for its
own writes, which it reads from its own versions. Make sure this
works with indexes, etc.
- Detail: Have to obey the "First Committer Wins" rule on
any WW conflicts (including index updates).
- Implementation: Oracle keeps a "rollback segment" for
getting old versions. First Committer rule enforced by write
locks at update time, and waiters on the lock abort (Oracle's
"serialization error") if the holder commits. This is really
"First Updater Wins", since locking means that update order
induces a commit order.
- Semantic problem: this is not really Serializable.
Why? Hint: the basic problem is that reads all happen at start
transaction, and writes at commit. With 2 interleaved
transactions, this can lead to violations of multi-object
consistency rules (e.g. think about the constraint
sum(checking,savings) > 0). Can even cause unserializable
behavior exhibited by read-only transactions! This is poorly
understood in the field. (IBM documents this problem whenever
they get the chance!)
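The checking/savings constraint violation above (write skew) can be made concrete with a toy simulation (hypothetical setup, no real SI engine): each xact reads both balances from its start-time snapshot and writes a different object, so First-Committer-Wins sees no WW conflict, yet the invariant sum(checking, savings) > 0 breaks.

```python
committed = {'checking': 60, 'savings': 60}   # invariant: sum > 0

snap1 = dict(committed)   # T1's snapshot at start(T1)
snap2 = dict(committed)   # T2's snapshot at start(T2)

# T1 withdraws 100 from checking; constraint check passes on ITS snapshot
assert snap1['checking'] + snap1['savings'] - 100 > 0
w1 = {'checking': snap1['checking'] - 100}

# T2 withdraws 100 from savings; check also passes on its own snapshot
assert snap2['checking'] + snap2['savings'] - 100 > 0
w2 = {'savings': snap2['savings'] - 100}

# Write sets are disjoint, so both commits pass First-Committer-Wins
committed.update(w1)
committed.update(w2)

assert committed['checking'] + committed['savings'] == -80  # invariant violated
```

No serial order of T1 and T2 could produce this state: whichever ran second would have seen the other's withdrawal and failed its constraint check.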
- Cursor Stability. Designed to prevent lost updates.
This is an operational definition that
takes into account the interactive API to transactions. SQL
provides a row-at-a-time API to in-flight queries, called
"cursors". Cursor stability sets a read-lock on a row as long as
the cursor is "pointing" to the row. Note that a user can open
multiple cursors in the same transaction, and can thereby do operational
concurrency control at application level (yuck!)
Adya, et al.: Generalized Isolation Levels
Gray et al's definitions (and the resulting ANSI standards) are not
implementation-independent, and the semantics are really only
well-defined in the operational locking definition. Note the
"dirty-data" descriptions above forbid conflicts entirely, rather
than describing bad interactions of conflicts. Seems too
restrictive, no?
Goal: find implementation-independent semantic isolation levels which
are as permissive as possible (most possible schedules allowed).
Key insight: many dependencies are multi-object.
Capture those, and you'll get the right semantics. To be as
permissive as we can, we talk about versions of objects
coexisting (don't count on the Haerder/Reuter DIRECT setting.)
Conflicts in Adya's Serialization Graphs:
- Read dependencies (WR):
- Def'n: Ti changes the matches of Tj's predicate-based reads if Ti
installs a new version that either adds to or deletes from one of
Tj's read predicates.
- Tj directly read-depends on Ti if Ti installs some version that Tj
subsequently reads (item-read-depends), or if Ti changes the matches
of Tj.
- A way to think about predicate-based reads or phantoms: imagine
that every object is versioned, and there are "ghost versions" of
objects before they're born and after they die. Predicate-based reads
look at the latest versions of all objects (including ghosts), and
what matters is the set of objects that do or do not match. See the
example on page 7 of the paper for Hpred-read.
- Anti-dependencies (RW):
- Def'n: Tj overwrites a predicate-based read by Ti if Tj installs a
new version of an object read by Ti that changes the matches of Ti.
- Tj directly anti-depends on Ti if Ti reads an object and Tj
installs the next version of that object ("first invalidator"), or
if Tj's install of a version changes the matches of a read by Ti.
- Write dependencies (WW):
- Tj directly write-depends on Ti if Ti installs a version of an
object, and Tj installs the next version ("first invalidator").
(Note there's no predicate-based version of write dependencies, since
database updates are read-predicate/write-tuple. What might it mean
to write a predicate?)
Direct Serialization Graph:
- nodes are committed xacts
- edges are directed by time, labeled WR, RW, or WW.
Now we can talk about isolation in terms of serialization graphs and
"histories" ("schedules"), NOT implementation.
Adya's Isolation Levels
Try to Generalize Gray's. PL-x = "Portable Level x".
- PL-1: try to serialize based on writes alone (ignore reads) --
ensure that updates are not interleaved.
- Specifically, no cycles containing only WW edges are allowed.
- Note: more permissive than Gray's Degree 1: allows concurrent
xacts to modify the same object...just ensure no cycles.
- But the obvious locking implementation of PL-1: long write locks.
- PL-2: avoid aborted reads
- Specifically, no aborted reads, no intermediate reads, and no
circular information flow (dependency-edge cycles in the
serialization graph)
- Note: cascaded aborts and/or commit delays prevent aborted reads
- Note: no intermediate reads means that committed xactions read
only committed data
- More permissive than Degree 2: allows reads from uncommitted
xacts (final versions of eventually-committed xacts!)
- Obvious locking implementation: long write locks, short read locks
- PL-3: prevent xactions from committing if they perform inconsistent
reads or writes
- Specifically, PL-2 AND no anti-dependency cycles
- More permissive than Degree 3, since a modifying xact can update
an object previously read by another uncommitted xact
- Obvious locking implementation: 2PL
- PL-2.99: generalize "REPEATABLE READ"
- REPEATABLE READ was long locks for everything except predicate
reads (phantoms can happen)
- PL-2.99 is PL-2 + no cycles with item-anti-dependency edges.
Cycles involving predicate-read overwrites are OK.
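The PL levels can be checked mechanically on a Direct Serialization Graph. Here is a sketch (hypothetical names) where nodes are committed xacts and edges carry WW, WR, or RW labels; each level corresponds to a cycle check restricted to some subset of edge labels.

```python
def has_cycle(edges, allowed):
    """edges: list of (src, dst, label) triples. Cycle check is
    restricted to edges whose label is in `allowed`."""
    graph = {}
    for s, d, lbl in edges:
        if lbl in allowed:
            graph.setdefault(s, set()).add(d)
    visiting, done = set(), set()
    def dfs(n):
        visiting.add(n)
        for m in graph.get(n, ()):
            if m in visiting or (m not in done and dfs(m)):
                return True
        visiting.discard(n)
        done.add(n)
        return False
    return any(n not in done and dfs(n) for n in list(graph))

# Classic non-serializable history: T1 -WR-> T2, T2 -RW-> T1
dsg = [('T1', 'T2', 'WR'), ('T2', 'RW-target', 'x')][:1] + [('T2', 'T1', 'RW')]
dsg = [('T1', 'T2', 'WR'), ('T2', 'T1', 'RW')]

assert not has_cycle(dsg, {'WW'})               # PL-1 holds: no WW-only cycle
assert not has_cycle(dsg, {'WW', 'WR'})         # PL-2 holds: no dependency-edge cycle
assert has_cycle(dsg, {'WW', 'WR', 'RW'})       # PL-3 violated: anti-dependency cycle
```

This mirrors the permissiveness ordering in the notes: the same history can satisfy PL-1 and PL-2 while failing PL-3, because only PL-3 brings anti-dependency (RW) edges into the cycle condition.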
Modeling Mixed-Mode Systems
- Mixed serialization graph: only contains dependencies relevant to a
transaction's level (or obligatory dependencies required by other
transactions' modes).