- problem: reads scale with read replicates but writes too much overhead
- another problem was it doesn't have auto-scaling
- wanted to do data modeling of shopping cart
- app at top
- key value operations
- shared (A-F, G-K, etc)
- problem: wanted partitioning
- solution/technique: consistent hashing
- prob: high availability for writes
- sol: vector clocks with conflict resolution on read
- prob: handling failures (temporary)
- prob: recovering from failures (permanent)
- sol: anti-entropy with merkle-trees
- prob: membership and failure dectection
- sol: gossip-bsaed protocol
- partitioning with consistent hashing
- instead of just choosing a server from N server with hash(k) mod N
- have a ring [0, L]
- hash(k) mod L
- distribute N servers randomly among the L ring and handle all key calls from ring section of previous node to current node
- need a quorum to consider a read or write successful
- vector clocks
- conflict if servers think have different items
- consistenty by sending messages as vectors
- there is a property to timely order operations and get servers to be consistent
- handling failures
- temporary: hinted handoff, gives it to the next node in the consistent hash
- permanent: sync using merkle tree
- periodicly exchange key ranges
- just pass root of tree instead of whole ring
- if inconsistent, pass the children and repeat until find when inconsistent
- pass only a logorithmic size of data and only the root in best case
- membership and failure decection
- could send heart beats
- could have single node to handle listening like zookeeper
- instead just gossip it and in logorithmic time, realize can identify node is dead