Posts

OLTP Through the Looking Glass, and What We Found There

Image
This paper appeared in Sigmod 2008.  The goal of the paper is to rethink the Online Transaction Processing (OLTP) database architecture (which remained largely unchanged from late 1970s) and to test out a streamlined/stripped-out in-memory database architecture. To this end, the paper dissects where time is spent inside of a single node OLTP database system, and carefully measures the performance of various stripped down variants of that system. Motivation OLTP databases were optimized for the computer technology of the 1970s-80s.  Multi-threaded execution was adopted to allow other transactions to make progress while one transaction waits for data to be fetched from the slow, rotating disk drive. Locking-based concurrency control was adopted since conflicts were inevitable in these disk-bound transactions, and since the small dataset sizes and the workloads of the time resulted in extremely "hot" keys. The computing world now is quite a different environment. Fast SSDs and a

Chardonnay: Fast and General Datacenter Transactions for On-Disk Databases (OSDI'23)

Image
This paper appeared in OSDI'23. I haven't read the paper, and I am going by the two presentations I have seen of the paper, and by a quick skim of the paper. The paper has a cool idea, and I decided I should not procrastinate more waiting to read the paper in detail, and instead just capture and communicate this idea. Datacenter networking got very fast, much faster than SSD/disk I/O. That means, rather than two-phase commit (2PC), cold reads from SSD/disk is the dominant latency. Chardonnay banks on this trend in an interesting manner. It executes read/write transactions in "dry run" mode first, just to discover the readset and writeset and pin them to main memory. Then in "definitive" mode, it executes these transactions using two-phase-locking (2PL) and 2PC swiftly because everything (mostly) is in-memory. Having learned the readset and writeset during dry run also allows definitive execution achieve deadlock-freedom (and prevent aborts due to deadlock a

A critique of ANSI SQL Isolation layers (Transaction Processing Book followup)

Image
This paper is from Sigmod 1995. As the title says, it critiques the ANSI SQL 92 standard (book) with respect to problems in isolation layer definitions. This is also a good follow up to t he Transaction Book chapter 7 where we reviewed the isolation layers . In fact, here Gray (with co-authors) gets a chance to amend the content in the Transaction Book with recent developments and better understanding/mapping of isolation layers to the degrees 0-4. Snapshot isolation (SI) finally makes its debut here in Gray's writing. Why so late? As we discussed earlier in the 80s the database keys were hot, and people rightly ignored optimistic concurrency control as it was not suitable for those workloads. Moreover, memory space was very scarce commodity, and multi-version (as in snapshot storage) was also off the table. With its increased adoption (with MySQL's use of SI around the corner) we see coverage of snapshot isolation in Gray's writing.  The paper is very well written, and is

Implementation of Cluster-wide Logical Clock and Causal Consistency in MongoDB

Image
This paper (SIGMOD 2019) discusses the design of causal consistency and cluster-wide logical clock for MongoDB.  Causal consistency preserves Lamport's happened-before (transitive and partial) order of events in a distributed system. If an event A causes another event B, causal consistency ensures that every process in the system observes A before observing B. Causal consistency guarantees read-your-writes, write-follows-reads, monotonic reads, and monotonic writes. Causal consistency is the strictest consistency level where the system can still make progress (always-available one-way convergence) in the presence of partitions as discussed in  the "Consistency, Availability, and Convergence" paper . This makes causal-consistency a very attractive model for modern distributed applications. https://jepsen.io/consistency The paper provides a simple and balanced design for causal-consistency in MongoDB. The ideas are simple, yet the implementation and operationalizing (backw

Isolation Concepts (Chp. 7, part 2, Transaction processing book)

Image
Continuing with our Transaction Processing book reading , the second part of Chapter 7 covers degrees of isolation (and them co-existing), phantoms and predicate locks, how to implement predicate locking efficiently (via intent locks, and more exotic techniques), scheduling of locks, and dealing with deadlocks. I really liked these sections, as they contain many nice algorithmic ideas. The DB guys back then seemeed to have fun designing and then implementing these locking algebra/techniques. I miss the days where we could make up stuff/theory from first principles (because that is the "right" elegant thing to do)  and implement them with no fuss. Nowadays everything is too complicated, and we deal pieces in systems of systems where each things touches so many other that we are often zugzwanged out of innovation.  Below I borrow several paragraphs from the book to give a brief summary of the second part of Chapter 7.  Degrees of isolation Locking helps to provide the isolation

Isolation Concepts (Chp 7: Transaction processing book)

Image
Continuing with our Transaction book reading , this week we look at the first part of the Isolation concepts. The book gets conceptual to provide the theory behind isolation of transactions. Finally we are getting some protocol action.  The first part of the chapter covers the following: Introduction to Isolation, The Dependency Model of Isolation (Three Bad Dependencies), Isolation: The Application Programmer’s View, and Isolation Theorems. A couple things immediately pop out. Chapter 7 equates Serializability (SER) with Isolation, whereas today snapshot isolation (SI) is the most popular isolation level, and we have a bunch of other weaker isolation levels .   Secondly, Chapter 7 equates two-phase-locking (2PL) with the isolation implementation techniques, whereas today MVCC and OCC are also popular implementation techniques. Although optimistic concurrency control (OCC) was established by 1980, the book did not consider OCC mechanisms at all.  This is normal, when we consider the

Popular posts from this blog

Learning about distributed systems: where to start?

Hints for Distributed Systems Design

Foundational distributed systems papers

Metastable failures in the wild

The demise of coding is greatly exaggerated

Scalable OLTP in the Cloud: What’s the BIG DEAL?

The end of a myth: Distributed transactions can scale

SIGMOD panel: Future of Database System Architectures

Why I blog

There is plenty of room at the bottom