Extra 7: CockroachDB, Spanner, MongoDB

  • timestamp 18:44
  • Amazon
    • fork of MySQL or PostgreSQL
    • custom storage on EBS
      • transaction level on EBS
  • Redis
    • In memory
    • single thread engine

CockroachDB

  • Database Don't die
  • Open Source like Spanner
  • Distributed Database
    • Decentralized Shared Nothing
    • Log Structured Storage Architecture at individual nodes (RocksDB)
    • Concurrency Control Model: Multi-Version OCC
    • Serializable Isolation

  • multi-layer
    • RocksDB storage manager (low level)
    • Raft - replication and consensus
    • key-value api
      • for pages to fetch data on

  • hybrid clock
    • order transaction globally
  • transaction stage intent check conflict commits (?)

  • global keyspace
    • key -> data
  • leader
    • raft replication
  • instead of buffer manager -> disk (your classic single node database)
    • get from distributed key value store system
  • FoundationDB does the same thing (distributed database)
  • engineering! done well, fault tolerant and all the edge cases!

Spanner

  • Cloud Spanner
  • Google wrote BigTable in 2006
    • give up SQL
    • give up joins
    • column-based database
  • Adwords ran on sharded MySQL
    • needed transactions
    • Megastore

  • 2011
  • geo-replicated
  • schematized, semi-relational ?
  • log structured on disk
  • strict 2PC MVCC Multi-Paxos 2PC
    • Paxos groups
    • External Consistency
      • global write_transaction synchronous replication
    • lock-free read only transactions

  • joins are slow :'(
    • have to go to another node/table
  • interleave
    • single page efficient physical denormalization

  • wound-wait deadlock prevention
    • don't need deadlock detection
    • ordering throuhg unique timestamps from atomic clocks and GPS devices
  • tablets (shards)
    • paxos - elect leader in tablet group
    • 2PC for txn spanning tablets

  • completely physical wall-clock time
    • necessary for linearizability
    • paxos group decide order transaction commit

  • internal Google API
    • time range

  • wait long enough
    • then can commit
      • at commit + release locks

  • F1 has OCC
    • now SQL
    • Built for the ad system
  • Spanner SQL
    • for everyone else

  • good benchmarks but expensive!

MongoDB

  • shared nothing document database (2007)
    • json document
    • now can do multi-action transactions

  • don't join

  • instead embed single document
  • pre join

  • the json example

  • no query optimizer
    • just heuristics
  • instead generate a bunch of query plans
    • execute all of them!
      • whichever is fastest return

  • shared nothing architecture
  • master slave replication
  • auto-sharding
    • partition by hash or range
    • automaticly split if a shard is too big
  • startups liked this, oh if I grow, it'll shard automaticly

  • mmap - OS storage manage
    • changed
    • storage backend is WiredTiger but could replace with RocksDB

  • avoid premature optimizations!
    • based on months in the future
  • MySQL and Postgres should be fine for most cases
  • If need to ACTUALLY scale, you have money - instead find and pay smart people to help