Advanced Topics in Computer Systems
Joe Hellerstein & Timothy Roscoe
Multi-Granularity Locking
Some implementation background (not in reading):
- Locks vs. Latches
- locks assure logical (i.e. xactional) consistency of data. They
are implemented via a lock table, held for a long time (e.g. 2PL), and part
of the deadlock detection mechanism.
- latches are like semaphores. They assure physical consistency
of data and resources, which are not visible at the transactional level (e.g.
latch a frame in a buffer pool). They are not subject to 2PL (usually
held for a very short time), and the deadlock detector doesn't know about
them (and therefore...?).
- Acquiring a latch is much cheaper than acquiring a lock (10s vs. 100s
of instructions in the no-conflict case)
- latch control info is always in VM in a fixed place
- lock tables are dynamically managed (can have a lock per tuple!),
so data structure mgmt on lock/release is more complex
- Lock table is a hashed main-mem structure
- Lock/Unlock must be atomic operations (protected by latches or critical
sections)
- typically costs several hundred instructions to lock/unlock an item
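The hashed main-memory lock table described above can be sketched in Python. This is a minimal illustration with hypothetical names (`LockTable`, `acquire`, `release`) and only S/X modes; a real lock manager would protect each operation with a latch and manage wait queues rather than simply refusing the request.

```python
from collections import defaultdict

# S/X compatibility: COMPAT[(held, requested)]
COMPAT = {
    ('S', 'S'): True,  ('S', 'X'): False,
    ('X', 'S'): False, ('X', 'X'): False,
}

class LockTable:
    def __init__(self):
        self.table = defaultdict(list)   # item -> [(xact, mode), ...]

    def acquire(self, xact, item, mode):
        """Grant the lock if compatible with all other holders.
        In a real system a latch would protect this critical section."""
        holders = self.table[item]
        if all(COMPAT[(m, mode)] for t, m in holders if t != xact):
            holders.append((xact, mode))
            return True
        return False                     # caller would block / enqueue

    def release(self, xact, item):
        self.table[item] = [(t, m) for t, m in self.table[item]
                            if t != xact]

lt = LockTable()
assert lt.acquire('T1', 'tuple42', 'S')
assert lt.acquire('T2', 'tuple42', 'S')      # S is compatible with S
assert not lt.acquire('T3', 'tuple42', 'X')  # X conflicts with held S
```

Note how a lock can exist per tuple: the table only materializes entries for items actually locked, which is why lock/unlock involves dynamic data structure management.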
- Lock Upgrades
- suppose T1 has an S lock on P, T2 is waiting to get X lock on P, and
now T3 wants S lock on P. Do we grant T3 an S lock?
No! (starvation, unfair, etc.) So...
- Manage FCFS queue for each locked object with outstanding requests
- all xacts that are adjacent and compatible are a compatible group
- The front group is the granted group
- group mode is most restrictive mode amongst group members
- Conversions: often want to convert (e.g. S to X for "test and modify"
actions). Should conversions go to back of queue?
- No! Instant deadlock. So put conversions right after granted group.
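The queueing discipline above can be sketched as follows (hypothetical helper names, S/X modes only, no actual blocking): the granted group is the maximal compatible prefix of the FCFS queue, and conversions are inserted right after it rather than at the back.

```python
# S/X compatibility: COMPAT[(held, requested)]
COMPAT = {('S', 'S'): True,  ('S', 'X'): False,
          ('X', 'S'): False, ('X', 'X'): False}

def granted_group(queue):
    """Longest prefix of pairwise-compatible requests: these are granted."""
    group = []
    for req in queue:
        if all(COMPAT[(m, req[1])] for _, m in group):
            group.append(req)
        else:
            break
    return group

def enqueue(queue, xact, mode, conversion=False):
    if conversion:
        # conversions go right after the granted group, not to the back,
        # to avoid the instant-deadlock problem described above
        queue.insert(len(granted_group(queue)), (xact, mode))
    else:
        queue.append((xact, mode))       # plain FCFS

q = []
enqueue(q, 'T1', 'S')                    # granted
enqueue(q, 'T2', 'X')                    # waits behind T1's S lock
enqueue(q, 'T3', 'S')                    # FCFS: waits behind T2 (no starvation of T2)
enqueue(q, 'T1', 'X', conversion=True)   # T1's upgrade jumps ahead of T2
assert q == [('T1', 'S'), ('T1', 'X'), ('T2', 'X'), ('T3', 'S')]
```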
- More notes for the curious...
- Most DBMSs have multiple lock request types, in part to help with
that. E.g. Recall from ARIES:
- conditional vs. unconditional lock requests (are you willing to wait?)
- instant vs. manual vs. commit duration lock requests (wait, but plan
to hold for different amounts of time)
- More below on non-2PL locking schemes, and the consistency guarantees they provide
Gray, et al.: Granularity of Locks
Theme: Correctness and performance
- Granularity tradeoff: small granularity (e.g. field of a tuple) means
high concurrency but high overhead. Large granularity (e.g. file) means low
overhead but low concurrency.
- Possible granularities:
- DB
- Areas
- Files
- Pages
- Tuples (records)
- fields of tuples
- Want hierarchical locking, to allow "large" xacts to set large locks,
"small" xacts to set small locks
- Problem: T1 S-locks a record in a file, then T2 X-locks the whole
file. How can T2 discover that T1 has locked the record?
- Solution: "Intention" locks
Compatibility matrix (row = mode held, column = mode requested;
Y = compatible, N = conflict):

        NL   IS   IX   S    SIX  X
  NL    Y    Y    Y    Y    Y    Y
  IS    Y    Y    Y    Y    Y    N
  IX    Y    Y    Y    N    N    N
  S     Y    Y    N    Y    N    N
  SIX   Y    Y    N    N    N    N
  X     Y    N    N    N    N    N
- IS and IX locks
- T1 obtains S lock on record in question, but first gets IS lock on
file.
- Now T2 cannot get X lock on file
- However, T3 can get IS or S lock on file (the reason for distinguishing
IS and IX: if there were only I, T3 couldn’t get an S lock on file)
- For higher concurrency, one more mode: SIX. Intuitively, you read
all of the object but only lock some subparts. Allows concurrent IS locks
(IX alone would not). Note: gives S access, so disallows IX to others.
- requires that xacts lock items root to leaf in the hierarchy, unlock
leaf to root
- generalization to DAG of resources: X locks all paths to a node, S
locks at least one.
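The hierarchical protocol can be sketched in Python: before taking S or X on a leaf, take IS or IX (respectively) on every ancestor, root to leaf, checking the compatibility matrix at each level. Names are hypothetical, and this sketch does no queuing and no unlock ordering.

```python
# Compatibility matrix from Gray et al. (True = compatible)
COMPAT = {
    'NL':  {'NL': True, 'IS': True,  'IX': True,  'S': True,  'SIX': True,  'X': True},
    'IS':  {'NL': True, 'IS': True,  'IX': True,  'S': True,  'SIX': True,  'X': False},
    'IX':  {'NL': True, 'IS': True,  'IX': True,  'S': False, 'SIX': False, 'X': False},
    'S':   {'NL': True, 'IS': True,  'IX': False, 'S': True,  'SIX': False, 'X': False},
    'SIX': {'NL': True, 'IS': True,  'IX': False, 'S': False, 'SIX': False, 'X': False},
    'X':   {'NL': True, 'IS': False, 'IX': False, 'S': False, 'SIX': False, 'X': False},
}

INTENT = {'S': 'IS', 'X': 'IX'}   # intention mode to set on ancestors

def lock_hierarchical(held, xact, path, mode):
    """path is [root, ..., leaf]; held maps node -> list of (xact, mode).
    Grant all-or-nothing: check the whole root-to-leaf plan first."""
    plan = [(node, INTENT[mode]) for node in path[:-1]] + [(path[-1], mode)]
    for node, m in plan:
        if not all(COMPAT[hm][m] for t, hm in held.get(node, []) if t != xact):
            return False
    for node, m in plan:
        held.setdefault(node, []).append((xact, m))
    return True

held = {}
assert lock_hierarchical(held, 'T1', ['db', 'file1', 'rec7'], 'S')
assert not lock_hierarchical(held, 'T2', ['db', 'file1'], 'X')  # T1's IS on file1 blocks X
assert lock_hierarchical(held, 'T3', ['db', 'file1'], 'S')      # IS and S are compatible
```

This is exactly the scenario from the motivating problem: T2 discovers T1's record lock by colliding with T1's IS lock on the file, without ever scanning record-level locks.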
Notes on Deadlock
- In OS world, deadlock usually due to errors or overloads
- In DB/xact world with 2PL, they’re inherent.
- Most common cause: lock upgrades (e.g. two xacts each hold an S lock
on the same item and both request an upgrade to X)
- Usual DB solution: deadlock detection (other option: deadlock avoidance)
- Use "waits-for" graph and look for cycles
- Abort/restart some transaction along the cycle
- Empirically, in actual systems the waits-for graph shows:
- cycles fairly rare
- cycle length usually 2, sometimes 3, virtually never >3
- use DFS to find cycles
- When to look for cycles?
- whenever a xact blocks
- periodically
- never (use timeouts)
Typically, in centralized systems, run deadlock detection
whenever blocking occurs. Arguably this is cheap: most recently
blocked transaction T must be the one that caused the deadlock, so just DFS
starting from T.
In distributed systems, even this can be expensive, so often do periodic
detection (more later)
- Who to restart ("victim selection")
- current blocker
- youngest XACT
- least resources used
- fewest locks held (common)
- fewest number of restarts
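The detection scheme above can be sketched as a DFS over the waits-for graph, started from the most recently blocked xact T, since any new cycle must pass through T. Names are hypothetical; victim selection and edge cleanup are left to the caller.

```python
def find_cycle_from(waits_for, start):
    """waits_for: dict mapping xact -> set of xacts it waits for.
    Returns a cycle through `start` as a list, or None if none exists."""
    path = [start]
    def dfs(node):
        for nxt in waits_for.get(node, ()):
            if nxt == start:
                return True               # closed a cycle through start
            if nxt not in path:
                path.append(nxt)
                if dfs(nxt):
                    return True
                path.pop()
        return False
    return path + [start] if dfs(start) else None

wf = {'T1': {'T2'}, 'T2': {'T3'}, 'T3': {'T1'}}
assert find_cycle_from(wf, 'T1') == ['T1', 'T2', 'T3', 'T1']
# Then pick a victim along the cycle (e.g. fewest locks held),
# abort it, and remove its edges from the graph.
```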
Degrees of Consistency (a/k/a Isolation Levels)
Despite all the discussion of ACID, sometimes it's nice to relax
semantic guarantees for the sake of performance. The goal is to let
individual transactions choose sensible ways of doing this without
messing up the database or the other transactions that do care. I.e.
preserve everybody's desired isolation.
Gray, et al.: Degrees of Consistency
First, a definition: A write is committed when the transaction is
finished; otherwise, the write is dirty.
A Locking-Based Description of Degrees of Consistency:
This is not actually a description of the degrees, but rather of how
to achieve them via locking. But it’s (operationally) well-defined.
- Degree 0: set short write locks on updated items ("short" = length
of action)
- Degree 1: set long write locks on updated items ("long" = until EOT)
- Degree 2: set long write locks on updated items, and short read
locks on items read
- Degree 3: set long write and read locks
A Dirty-Data Description of Degrees of Consistency
Transaction T "sees degree X consistency" if...
- Degree 0: All writes are atomic
- Degree 1:
- T sees degree 0 consistency, and
- T does not commit any writes before EOT
- In sum, no conflicts will ever occur involving T's writes
- In ANSI-SQL Standard this is called READ UNCOMMITTED for obvious
reasons.
- Degree 2:
- T sees degree 1 consistency, and
- T does not read dirty data of other transactions (we don't cause
W_S;R_T conflicts)
- In ANSI-SQL this is called READ COMMITTED for obvious reasons.
- Degree 3:
- T sees degree 2 consistency, and
- Other transactions do not dirty any data read by T before T completes
(no R_T;W_S conflicts)
- In sum, no conflicts allowed!
- In ANSI-SQL this is called REPEATABLE READ.
Examples of Inconsistencies prevented by Various Degrees
- Garbage data:
- Concurrent actions W_T1(X) || W_T2(X)
- Neither T1 nor T2 controls the final value of X; the writes could
in fact be interleaved or corrupted.
- Solution: force every transaction to set short
write locks (degree 0). Note that device-level enforcement is
insufficient for objects bigger than the device interface
(e.g. files, tables)
- Lost Updates:
- R_T1(X); W_T2(X); W_T1(X); Commit_T1; Commit_T2.
- Alternate Problem: suppose T2 aborts!
- Solution: force every transaction to set long
write locks (degree 1). Think of them as "advisory" for
reads (up to each transaction), but binding for writes.
- Dirty Reads:
- W_T1(X); R_T2(X); Abort_T1.
- Now T2's read is bogus.
- Solution: set long write locks and short read locks
(degree 2). Attractive for long-running queries
(esp. aggregates).
- Note that there are no dirty reads in the Lost Update
example above.
- Unrepeatable ("Fuzzy") reads:
- R_T1(X); W_T2(X); Commit_T2; R_T1(X)
- Now T1 has read 2 different values of X.
- Solution: long read locks (degree 3).
- Phantoms:
- R_T1(range [x-y]); Insert_T2(Z | x < Z < y); Commit_T2;
R_T1(range [x-y])
- Z is a "phantom" data item (eek!)
- Solution: ?? Note that Degree 3 row-level locking
does not prevent phantoms!
- In ANSI-SQL, if you prevent phantoms and provide repeatable
read, it's called a SERIALIZABLE consistency level.
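To make the Lost Update anomaly above concrete, here is a toy interleaving in Python (illustrative variable names; no locking is simulated). T1's write is based on a stale read, so T2's update vanishes.

```python
db = {'X': 100}

x_seen_by_t1 = db['X']       # R_T1(X): T1 reads X = 100
db['X'] = db['X'] + 50       # W_T2(X): T2 adds 50 -> 150
db['X'] = x_seen_by_t1 + 10  # W_T1(X): T1 adds 10 to its stale read

assert db['X'] == 110        # T2's +50 is lost; either serial order gives 160
```

With degree-1 long write locks, T2's write would block until T1 committed; T2's update would then be applied to T1's value instead of being overwritten.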
A note: ANSI-SQL standard talks about the phenomena of Dirty
Read, Non-Repeatable Read, and Phantoms.
To reiterate: if everybody sees at least Degree 1, then different
transactions can choose what degree they wish to "see" without
worry. I.e. we can have a mixture of levels of consistency.
Two more consistency levels used in practice
- Snapshot Isolation (called SERIALIZABLE by Oracle). Widely
used (and now implemented in MS SQL Server 2005 too.)
- Goal: never delay R/O transactions, and try to ensure
that they see consistent data if updating transactions are
serializable.
- Idea: Conceptually, transaction Ti reads data from the
committed state of the DB as of the time start(Ti), except for its
own writes, which it reads from its own versions. Make sure this
works with indexes, etc.
- Detail: Have to obey the "First Committer Wins" rule on
any WW conflicts (including index updates).
- Implementation: Oracle keeps a "rollback segment" for
getting old versions. First Committer rule enforced by write
locks at update time, and waiters on the lock abort (Oracle's
"serialization error") if the holder commits. This is really
"First Updater Wins", since locking means that update order
induces a commit order.
- Semantic problem: this is not really Serializable.
Why? Hint: the basic problem is that reads all happen at start
transaction, and writes at commit. With 2 interleaved
transactions, this can lead to violations of multi-object
consistency rules (e.g. think about the constraint
sum(checking,savings) > 0). Can even cause unserializable
behavior exhibited by read-only transactions! This is poorly
understood in the field. (IBM documents this problem whenever
they get the chance!)
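The checking/savings constraint violation above (write skew) can be made concrete with a toy simulation (hypothetical setup, no real SI engine): each xact reads both balances from its start-time snapshot and writes a different object, so First-Committer-Wins sees no WW conflict, yet the invariant sum(checking, savings) > 0 breaks.

```python
committed = {'checking': 60, 'savings': 60}   # invariant: sum > 0

snap1 = dict(committed)   # T1's snapshot at start(T1)
snap2 = dict(committed)   # T2's snapshot at start(T2)

# T1 withdraws 100 from checking; constraint check passes on ITS snapshot
assert snap1['checking'] + snap1['savings'] - 100 > 0
w1 = {'checking': snap1['checking'] - 100}

# T2 withdraws 100 from savings; check also passes on its own snapshot
assert snap2['checking'] + snap2['savings'] - 100 > 0
w2 = {'savings': snap2['savings'] - 100}

# Write sets are disjoint, so both commits pass First-Committer-Wins
committed.update(w1)
committed.update(w2)

assert committed['checking'] + committed['savings'] == -80  # invariant violated
```

No serial order of T1 and T2 could produce this state: whichever ran second would have seen the other's withdrawal and failed its constraint check.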
- Cursor Stability. Designed to prevent lost updates.
This is an operational definition that
takes into account the interactive API to transactions. SQL
provides a row-at-a-time API to in-flight queries, called
"cursors". Cursor stability sets a read-lock on a row as long as
the cursor is "pointing" to the row. Note that a user can open
multiple cursors in the same transaction, and can thereby do operational
concurrency control at application level (yuck!)
Adya, et al.: Generalized Isolation Levels
Gray et al's definitions (and the resulting ANSI standards) are not
implementation-independent, and the semantics are really only
well-defined in the operational locking definition. Note the
"dirty-data" descriptions above forbid conflicts entirely, rather
than describing bad interactions of conflicts. Seems too
restrictive, no?
Goal: find implementation-independent semantic isolation levels which
are as permissive as possible (most possible schedules allowed).
Key insight: many dependencies are multi-object.
Capture those, and you'll get the right semantics. To be as
permissive as we can, we talk about versions of objects
coexisting (don't count on the Haerder/Reuter DIRECT setting.)
Conflicts in Adya's Serialization Graphs:
- Read dependencies (WR):
- Def'n: Ti changes the matches of Tj's predicate-based reads if Ti
installs a new version that either adds to or deletes from one of
Tj's read predicates.
- Tj directly read-depends on Ti if Ti installs some version that Tj
subsequently reads (item-read-depends), or if Ti changes the matches
of Tj.
- A way to think about predicate-based reads or phantoms: imagine
that every object is versioned, and there are "ghost versions" of
objects before they're born and after they die. Predicate-based reads
look at the latest versions of all objects (including ghosts), and
what matters is the set of objects that do or do not match. See the
example on page 7 of the paper for Hpred-read.
- Anti-dependencies (RW):
- Def'n: Tj overwrites a predicate-based read by Ti if Tj installs a
new version of an object read by Ti that changes the matches of Ti.
- Tj directly anti-depends on Ti if Ti reads an object and Tj
installs the next version of that object ("first invalidator"), or
if Tj's install of a version changes the matches of a read by Ti.
- Write dependencies (WW):
- Tj directly write-depends on Ti if Ti installs a version of an
object, and Tj installs the next version ("first invalidator").
(Note there's no predicate-based version of write dependencies, since
database updates are read-predicate/write-tuple. What might it mean
to write a predicate?)
Direct Serialization Graph:
- nodes are committed xacts
- edges are directed by time, labeled WR, RW, or WW.
Now we can talk about isolation in terms of serialization graphs and
"histories" ("schedules"), NOT implementation.
Adya's Isolation Levels
Try to Generalize Gray's. PL-x = "Portable Level x".
- PL-1: try to serialize based on writes alone (ignore reads) --
ensure that updates are not interleaved.
- Specifically, no cycles containing only WW edges are allowed.
- Note: more permissive than Gray's Degree 1: allows concurrent
xacts to modify the same object...just ensure no cycles.
- But the obvious locking implementation of PL-1: long write locks.
- PL-2: avoid aborted reads
- Specifically, no aborted reads, no intermediate reads, and no
circular information flow (dependency-edge cycles in the
serialization graph)
- Note: cascaded aborts and/or commit delays prevent aborted reads
- Note: no intermediate reads means that committed xactions read
only committed data
- More permissive than Degree 2: allows reads from uncommitted
xacts (final versions of eventually-committed xacts!)
- Obvious locking implementation: long write locks, short read locks
- PL-3: prevent xactions from committing if they perform inconsistent
reads or writes
- Specifically, PL-2 AND no anti-dependency cycles
- More permissive than Degree 3, since a modifying xact can update
an object previously read by another uncommitted xact
- Obvious locking implementation: 2PL
- PL-2.99: generalize "REPEATABLE READ"
- REPEATABLE READ was long locks for everything except predicate
reads (phantoms can happen)
- PL-2.99 is PL-2 + no cycles with item-anti-dependency edges.
Cycles involving predicate-read overwrites are OK.
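The PL levels can be checked mechanically on a Direct Serialization Graph. Here is a sketch (hypothetical names) where nodes are committed xacts and edges carry WW, WR, or RW labels; each level corresponds to a cycle check restricted to some subset of edge labels.

```python
def has_cycle(edges, allowed):
    """edges: list of (src, dst, label) triples. Cycle check is
    restricted to edges whose label is in `allowed`."""
    graph = {}
    for s, d, lbl in edges:
        if lbl in allowed:
            graph.setdefault(s, set()).add(d)
    visiting, done = set(), set()
    def dfs(n):
        visiting.add(n)
        for m in graph.get(n, ()):
            if m in visiting or (m not in done and dfs(m)):
                return True
        visiting.discard(n)
        done.add(n)
        return False
    return any(n not in done and dfs(n) for n in list(graph))

# Classic non-serializable history: T1 -WR-> T2, T2 -RW-> T1
dsg = [('T1', 'T2', 'WR'), ('T2', 'RW-target', 'x')][:1] + [('T2', 'T1', 'RW')]
dsg = [('T1', 'T2', 'WR'), ('T2', 'T1', 'RW')]

assert not has_cycle(dsg, {'WW'})               # PL-1 holds: no WW-only cycle
assert not has_cycle(dsg, {'WW', 'WR'})         # PL-2 holds: no dependency-edge cycle
assert has_cycle(dsg, {'WW', 'WR', 'RW'})       # PL-3 violated: anti-dependency cycle
```

This mirrors the permissiveness ordering in the notes: the same history can satisfy PL-1 and PL-2 while failing PL-3, because only PL-3 brings anti-dependency (RW) edges into the cycle condition.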
Modeling Mixed-Mode Systems
- Mixed serialization graph: only contains dependencies relevant to a
transaction's level (or obligatory dependencies required by other
transactions' modes).