Extra 12: Introduction to Apache Cassandra

  • Cassandra blends the distributed design of Dynamo with the data model of Bigtable

  • Shared-nothing architecture: nodes can be added or removed as needed

  • Data written to commit log

  • Data written to memtable

  • Server acknowledges to client

  • Memtable flushed to disk
    • to SSTable
    • as a sequential write
    • Really good for time-series
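
The write path above can be sketched as a toy in-memory model. This is only an illustration under stated assumptions: the class ToyNode and the parameter flush_threshold are hypothetical names, not Cassandra APIs, and the flush here runs inline rather than in the background.

```python
# Toy model of the write path: commit log -> memtable -> ack -> flush.
# ToyNode and flush_threshold are hypothetical names for illustration.

class ToyNode:
    def __init__(self, flush_threshold=3):
        self.commit_log = []   # append-only log, written first for durability
        self.memtable = {}     # in-memory table: key -> (timestamp, value)
        self.sstables = []     # immutable, key-sorted runs standing in for disk files
        self.flush_threshold = flush_threshold

    def write(self, key, value, timestamp):
        self.commit_log.append((key, value, timestamp))  # 1. sequential append
        self.memtable[key] = (timestamp, value)          # 2. update the memtable
        if len(self.memtable) >= self.flush_threshold:
            self.flush()  # simplification: done inline in this sketch
        return "ack"                                     # 3. acknowledge to client

    def flush(self):
        # 4. write the memtable out as one sorted, immutable SSTable
        sstable = sorted((k, ts, v) for k, (ts, v) in self.memtable.items())
        self.sstables.append(sstable)
        self.memtable = {}
        self.commit_log = []  # flushed data no longer needs to be replayed


node = ToyNode(flush_threshold=2)
node.write("sensor-1", 20.5, timestamp=1)
node.write("sensor-2", 21.0, timestamp=2)  # reaches the threshold, triggers a flush
```

Because every step is an append or an in-memory update, no random disk seek is needed on the write path, which is why this design suits append-heavy workloads such as time series.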

  • How to deal with duplicates?

  • Compaction of sequential, append-only SSTables
    • with a merge sort

  • Compacted file written

  • Clean up old files
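
A minimal sketch of the compaction step, assuming each SSTable is a list of (key, timestamp, value) tuples sorted by key and that the newest timestamp wins for duplicate keys. The function name compact is hypothetical.

```python
import heapq

def compact(sstables):
    """Merge key-sorted SSTables into one run, keeping the newest value per key."""
    # k-way merge of already sorted runs, ordered by (key, timestamp)
    merged = heapq.merge(*sstables, key=lambda entry: (entry[0], entry[1]))
    compacted = []
    for key, ts, value in merged:
        if compacted and compacted[-1][0] == key:
            # Duplicate key: entries arrive in ascending timestamp order,
            # so the later entry is the newer one and replaces the older.
            compacted[-1] = (key, ts, value)
        else:
            compacted.append((key, ts, value))
    return compacted


old = [("a", 1, "x"), ("b", 1, "y")]
new = [("a", 2, "z"), ("c", 2, "w")]
result = compact([old, new])
# result == [("a", 2, "z"), ("b", 1, "y"), ("c", 2, "w")]
```

Since the inputs are already sorted, the merge reads each file once from start to end, so compaction is sequential I/O as well. After the compacted run is written, the input files can be deleted.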

  • Partitioning by primary key

  • with key hashing

  • Nodes arranged in a token ring

  • Replication factor = 3?

  • Virtual nodes (vnodes) are supported
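
The partitioning ideas above (key hashing, ring, replication factor, virtual nodes) can be sketched as a small consistent-hashing ring. This is a simplified illustration, not Cassandra's implementation: ToyRing, token, and the vnode naming scheme are hypothetical, and MD5 stands in for the Murmur3-based default partitioner.

```python
import bisect
import hashlib

def token(value):
    # Hash a string to an integer token (MD5 used only for illustration).
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class ToyRing:
    def __init__(self, nodes, vnodes=8):
        # Each physical node owns several tokens (virtual nodes) on the ring.
        self.ring = sorted(
            (token(f"{node}-{i}"), node) for node in nodes for i in range(vnodes)
        )
        self.tokens = [t for t, _ in self.ring]
        self.node_count = len(set(nodes))

    def replicas(self, partition_key, rf=3):
        # Walk clockwise from the key's token, collecting distinct nodes.
        rf = min(rf, self.node_count)
        i = bisect.bisect_right(self.tokens, token(partition_key))
        owners = []
        while len(owners) < rf:
            node = self.ring[i % len(self.ring)][1]
            if node not in owners:
                owners.append(node)
            i += 1
        return owners


ring = ToyRing(["node-a", "node-b", "node-c", "node-d"])
owners = ring.replicas("user:42", rf=3)  # three distinct nodes holding this key
```

With several tokens per node, adding or removing a node moves many small ranges instead of one large one, which spreads the rebalancing work across the cluster.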

  • Coordinated reads

  • Consistency Level
    • QUORUM - a majority of replicas (more than half, i.e. floor(RF / 2) + 1) acknowledge the read or write
    • ALL - all replicas acknowledge - full consistency - usually not necessary

  • Quorum and Availability
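
The quorum arithmetic can be written out as a short sketch. The function names are hypothetical; the formulas are the usual majority rule and the overlap condition R + W > RF.

```python
def quorum(replication_factor):
    # A quorum is a strict majority of replicas: floor(RF / 2) + 1.
    return replication_factor // 2 + 1

def tolerated_failures(replication_factor):
    # Replicas that can be down while QUORUM requests still succeed.
    return replication_factor - quorum(replication_factor)

def read_sees_latest_write(read_replicas, write_replicas, replication_factor):
    # The read set and write set share at least one replica when R + W > RF.
    return read_replicas + write_replicas > replication_factor


rf = 3
q = quorum(rf)                 # 2 of 3 replicas must respond
down = tolerated_failures(rf)  # 1 replica may be unavailable
overlap = read_sees_latest_write(q, q, rf)  # True: QUORUM reads + QUORUM writes
```

With a replication factor of 3, QUORUM keeps the cluster available with one replica down, whereas ALL fails as soon as any replica is unreachable.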

  • Rapid Read Protection

  • CQL: Cassandra Query Language

  • Inserts always overwrite: an INSERT on an existing primary key acts as an upsert
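
The overwrite behaviour can be modelled as last-write-wins per column. This is a rough Python sketch rather than CQL: apply_insert and the table layout are hypothetical, and timestamp ties are resolved in favour of the later call, which is a simplification.

```python
def apply_insert(table, primary_key, columns, timestamp):
    # table maps primary key -> {column name: (timestamp, value)}.
    # No existence check is made: the insert simply writes its columns,
    # and the newest timestamp wins for each column.
    row = table.setdefault(primary_key, {})
    for name, value in columns.items():
        current = row.get(name)
        if current is None or timestamp >= current[0]:
            row[name] = (timestamp, value)
    return table


users = {}
apply_insert(users, 1, {"name": "Ada", "city": "London"}, timestamp=1)
apply_insert(users, 1, {"name": "Ada L."}, timestamp=2)  # same key: overwrites name
# users[1] == {"name": (2, "Ada L."), "city": (1, "London")}
```

The second insert does not fail on the existing key and does not erase the untouched column; it only replaces the columns it writes.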