![](https://i.imgur.com/0RMdEGA.png)
- timestamp 18:44
- Amazon
- fork of MySQL or PostgreSQL
- custom storage on EBS
- Redis
- In memory
- single thread engine
![](https://i.imgur.com/irppMeM.png)
- Database that doesn't die
- Open Source like Spanner
- Distributed Database
- Decentralized Shared Nothing
- Log Structured Storage Architecture at individual nodes (RocksDB)
- Concurrency Control Model: Multi-Version OCC
- Serializable Isolation
![](https://i.imgur.com/vos3h5b.png)
- multi-layer
- RocksDB storage manager (low level)
- Raft - replication and consensus
- key-value api
- for pages to fetch data on
![](https://i.imgur.com/0WkNbaH.png)
- hybrid clock
- order transaction globally
- transaction stages: write intents, check for conflicts, commit (?)
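
A minimal sketch of a hybrid clock, to show how a physical time plus a logical counter can order events across nodes whose wall clocks disagree. It follows the general hybrid logical clock idea, not any particular system's implementation; `Timestamp`, `HybridClock`, and the `physical_now` callable are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(order=True, frozen=True)
class Timestamp:
    wall: int      # physical component (e.g. milliseconds)
    logical: int   # counter that breaks ties within the same wall time


class HybridClock:
    def __init__(self, physical_now):
        self._physical_now = physical_now   # callable returning an int
        self._last = Timestamp(0, 0)

    def now(self) -> Timestamp:
        """Timestamp for a local event or an outgoing message."""
        wall = max(self._physical_now(), self._last.wall)
        logical = self._last.logical + 1 if wall == self._last.wall else 0
        self._last = Timestamp(wall, logical)
        return self._last

    def update(self, remote: Timestamp) -> Timestamp:
        """Merge a timestamp received from another node."""
        wall = max(self._physical_now(), self._last.wall, remote.wall)
        if wall == self._last.wall and wall == remote.wall:
            logical = max(self._last.logical, remote.logical) + 1
        elif wall == self._last.wall:
            logical = self._last.logical + 1
        elif wall == remote.wall:
            logical = remote.logical + 1
        else:
            logical = 0
        self._last = Timestamp(wall, logical)
        return self._last
```

If node A (wall clock 100) sends `a.now()` to node B (wall clock 90), `b.update(...)` returns a timestamp that sorts after A's, even though B's physical clock is behind.
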
![](https://i.imgur.com/ArGVHYQ.png)
- global keyspace
- leader
- instead of buffer manager -> disk (your classic single node database)
- get from distributed key value store system
- FoundationDB does the same thing (distributed database)
- engineering! done well, fault tolerant and all the edge cases!
![](https://i.imgur.com/Mgmhiw2.png)
- Cloud Spanner
- Google wrote BigTable in 2006
- give up SQL
- give up joins
- column-based database
- Adwords ran on sharded MySQL
- needed transactions
- Megastore
![](https://i.imgur.com/8BM7lRy.png)
- 2011
- geo-replicated
- schematized, semi-relational ?
- log structured on disk
- strict 2PL + MVCC + Multi-Paxos + 2PC
- Paxos groups
- External Consistency
- global write_transaction synchronous replication
- lock-free read only transactions
![](https://i.imgur.com/C58TR9Q.png)
- joins are slow :'(
- have to go to another node/table
- interleave
- single page efficient physical denormalization
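
A rough sketch of why interleaving helps, assuming rows are stored sorted by key and child keys are prefixed with the parent key. The `customers`/`orders` schema and the function name are made up for illustration.

```python
import bisect

# Hypothetical schema: customers(cust_id) with orders(cust_id, order_id)
# interleaved under their parent customer row.
rows = [
    ((1,), {"name": "Ada"}),
    ((2,), {"name": "Bob"}),
    ((1, 10), {"item": "disk"}),
    ((2, 20), {"item": "cpu"}),
    ((1, 11), {"item": "ram"}),
]

# Sorting by key places each customer's orders directly after the customer.
store = sorted(rows, key=lambda kv: kv[0])
keys = [k for k, _ in store]


def customer_with_orders(cust_id):
    """One contiguous range scan returns the parent and all its children."""
    lo = bisect.bisect_left(keys, (cust_id,))
    hi = bisect.bisect_left(keys, (cust_id + 1,))
    return store[lo:hi]
```

The "join" between a customer and its orders becomes a single range read on one node instead of a lookup in another table.
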
![](https://i.imgur.com/DzRstdP.png)
- wound-wait deadlock prevention
- don't need deadlock detection
- ordering through unique timestamps from atomic clocks and GPS devices
- tablets (shards)
- Paxos: elect a leader within each tablet group
- 2PC for txn spanning tablets
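
The wound-wait rule itself fits in a few lines. This is a sketch of the generic policy, assuming every transaction has a unique timestamp and a smaller timestamp means an older transaction; the function name is hypothetical.

```python
def on_lock_conflict(requester_ts: int, holder_ts: int) -> str:
    """Decide what happens when `requester` wants a lock that `holder` owns.

    Smaller timestamp means older transaction.
    """
    if requester_ts < holder_ts:
        return "wound"   # older requester aborts the younger holder
    return "wait"        # younger requester waits for the older holder
```

Because a transaction only ever waits for an older one, a cycle of waiters cannot form, which is why no deadlock detector is needed.
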
![](https://i.imgur.com/SX93b3e.png)
![](https://i.imgur.com/kLZSk2V.png)
- completely physical wall-clock time
- necessary for linearizability
- Paxos group decides the order in which transactions commit
![](https://i.imgur.com/9EZ7SLp.png)
![](https://i.imgur.com/gBcHHp7.png)
![](https://i.imgur.com/qkTED0U.png)
- wait long enough
- then can commit
- at commit + release locks
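
A toy simulation of the "wait long enough" step, assuming the clock returns an interval `[earliest, latest]` guaranteed to contain the true time. `FakeTrueTime`, its `epsilon` bound, and the manual `advance` are stand-ins so the example runs without real clocks or sleeping.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    earliest: float
    latest: float


class FakeTrueTime:
    """Simulated clock with a fixed uncertainty bound `epsilon` (an assumption)."""

    def __init__(self, epsilon: float):
        self.epsilon = epsilon
        self.real_time = 0.0   # advanced manually instead of sleeping

    def now(self) -> Interval:
        return Interval(self.real_time - self.epsilon, self.real_time + self.epsilon)

    def advance(self, dt: float) -> None:
        self.real_time += dt


def commit(tt: FakeTrueTime, step: float = 1.0):
    """Pick a commit timestamp, then wait until it is definitely in the past."""
    commit_ts = tt.now().latest
    waited = 0.0
    while tt.now().earliest <= commit_ts:   # commit wait
        tt.advance(step)
        waited += step
    # Only now is it safe to make the commit visible and release locks.
    return commit_ts, waited
```

With this model the wait is roughly twice the uncertainty bound, so tighter clocks mean shorter commit waits.
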
![](https://i.imgur.com/GGSJxuz.png)
- F1 has OCC
- now SQL
- Built for the ad system
- Spanner SQL
![](https://i.imgur.com/doKKL86.png)
- good benchmarks but expensive!
![](https://i.imgur.com/T198LV5.png)
![](https://i.imgur.com/22mgBsg.png)
- shared nothing document database (2007)
- json document
- now can do multi-action transactions
![](https://i.imgur.com/LbdE4yT.png)
![](https://i.imgur.com/F3mnHa1.png)
- instead, embed related data in a single document
- pre-join
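
To make "pre-join" concrete, here is a small sketch that turns two normalized collections into one embedded document. The `customers`/`orders` data and the `embed` helper are invented for illustration.

```python
# Normalized: two "tables" that would be joined at read time.
customers = {1: {"name": "Ada"}}
orders = [
    {"order_id": 10, "cust_id": 1, "item": "disk"},
    {"order_id": 11, "cust_id": 1, "item": "ram"},
]


def embed(cust_id):
    """Build the pre-joined document that would be stored as a single unit."""
    doc = dict(customers[cust_id], _id=cust_id)
    doc["orders"] = [
        {"order_id": o["order_id"], "item": o["item"]}
        for o in orders
        if o["cust_id"] == cust_id
    ]
    return doc
```

Reading the customer now returns its orders too, at the cost of rewriting the whole document when an order changes.
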
![](https://i.imgur.com/DVscSti.png)
![](https://i.imgur.com/LzJMife.png)
- no query optimizer
- instead generate a bunch of query plans
- execute all of them!
- return the result of whichever plan finishes first
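
The "run every plan and keep the fastest" idea can be sketched deterministically by stepping candidate plans in round-robin. This is a simplification for illustration, not the database's actual executor; the plan functions, the `city` field, and `race` are hypothetical.

```python
def collection_scan(docs, wanted_city):
    """Candidate plan 1: examine every document."""
    out = []
    for doc in docs:
        if doc["city"] == wanted_city:
            out.append(doc)
        yield                      # one unit of work
    return out


def index_lookup(index, wanted_city):
    """Candidate plan 2: use a (hypothetical) index on city."""
    out = []
    for doc in index.get(wanted_city, []):
        out.append(doc)
        yield                      # one unit of work
    return out


def race(plans):
    """Advance every plan one step at a time; the first to finish wins."""
    if not plans:
        raise ValueError("no plans given")
    while True:
        for name, plan in plans.items():
            try:
                next(plan)
            except StopIteration as done:
                return name, done.value
```

Each plan is a generator, so `race` can interleave them fairly and stop as soon as one returns its result.
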
![](https://i.imgur.com/OMDVka0.png)
- shared nothing architecture
- master slave replication
- auto-sharding
- partition by hash or range
- automatically split if a shard is too big
- startups liked this: "if I grow, it'll shard automatically"
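
A small sketch of range partitioning with automatic splits, assuming unique string keys and a tiny size threshold chosen only for the demo. `RangeShardedStore` and `MAX_KEYS_PER_SHARD` are hypothetical names; a real system would also move the new shard to another node.

```python
import bisect

MAX_KEYS_PER_SHARD = 4   # tiny threshold, chosen only for illustration


class RangeShardedStore:
    def __init__(self):
        self.lower_bounds = [""]      # shard i covers keys >= lower_bounds[i]
        self.shards = [[]]            # each shard is a sorted list of keys

    def _shard_index(self, key: str) -> int:
        return bisect.bisect_right(self.lower_bounds, key) - 1

    def insert(self, key: str) -> None:
        i = self._shard_index(key)
        bisect.insort(self.shards[i], key)
        if len(self.shards[i]) > MAX_KEYS_PER_SHARD:
            self._split(i)

    def _split(self, i: int) -> None:
        """Cut an oversized shard in half at its median key."""
        keys = self.shards[i]
        mid = len(keys) // 2
        self.shards[i] = keys[:mid]
        self.shards.insert(i + 1, keys[mid:])
        self.lower_bounds.insert(i + 1, keys[mid])
```

Inserting the keys `"a"` through `"j"` in order grows the store from one shard to four without the application doing anything.
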
![](https://i.imgur.com/IME2hTN.png)
![](https://i.imgur.com/1yWouuR.png)
- mmap
- OS acts as the storage manager
- this later changed
- storage backend is now WiredTiger, but it could be replaced with RocksDB
![](https://i.imgur.com/HuiRB7e.png)
- avoid premature optimizations!
- based on months in the future
- MySQL and Postgres should be fine for most cases
- If you ACTUALLY need to scale, you will have money; find and pay smart people to help