Extra 3: Dynamo Amazon’s Highly Available Key-value Store

  • problem: reads scale with read replicates but writes too much overhead
    • another problem was it doesn't have auto-scaling
    • wanted to do data modeling of shopping cart
      • as a key-value operation

  • app at top
    • key value operations
    • shared (A-F, G-K, etc)

  • problem: wanted partitioning
    • solution/technique: consistent hashing
  • prob: high availability for writes
    • sol: vector clocks with conflict resolution on read
  • prob: handling failures (temporary)
    • sol: hinted handoff
  • prob: recovering from failures (permanent)
    • sol: anti-entropy with merkle-trees
  • prob: membership and failure dectection
    • sol: gossip-bsaed protocol

  • partitioning with consistent hashing
    • instead of just choosing a server from N server with hash(k) mod N
      • have a ring [0, L]
      • hash(k) mod L
      • distribute N servers randomly among the L ring and handle all key calls from ring section of previous node to current node

  • need a quorum to consider a read or write successful

  • vector clocks
    • conflict if servers think have different items
    • consistenty by sending messages as vectors
      • there is a property to timely order operations and get servers to be consistent

  • handling failures
    • temporary: hinted handoff, gives it to the next node in the consistent hash
    • permanent: sync using merkle tree
      • periodicly exchange key ranges
      • just pass root of tree instead of whole ring
        • if inconsistent, pass the children and repeat until find when inconsistent
        • pass only a logorithmic size of data and only the root in best case

  • membership and failure decection
    • could send heart beats
      • could have single node to handle listening like zookeeper
        • single point of failure
      • instead just gossip it and in logorithmic time, realize can identify node is dead