![](https://i.imgur.com/FBFORS5.png)
- GFS published in 2003, MapReduce in 2004, Hadoop/HDFS in 2006
- performance - sharding
- sharding - faults
- faults - tolerance
- tolerance - replication
- replication - inconsistency
- consistency - low performance
- strong vs weak consistency
![](https://i.imgur.com/K14XRIH.png)
![](https://i.imgur.com/2kxfZ2a.png)
![](https://i.imgur.com/Rr85JpU.png)
- big, fast
- global
- sharding
- automatic recovery
- single data center (really :O)
- internal use
- big sequential access (not random)
- single master!
- MapReduce has a single master too, but failure is so unlikely it's fine to rerun all operations
![](https://i.imgur.com/eRHLlmG.png)
- master data
- file name
- chunk handles
- list of chunk servers (cs)
- primary, version number (v)
- lease expiration
- LOG, CHECKPOINT, DISK
- append to log efficiently
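
The master data above can be sketched as two in-memory tables plus an append-only log. This is a minimal sketch, not GFS code: the class and method names (`Master`, `ChunkInfo`, `add_chunk`, `report_replica`, `recover`) are hypothetical, the log is a Python list standing in for a file on disk, and the choice to log only the file-to-handle mapping (with replica locations reported separately by chunk servers) is an assumption of this sketch.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ChunkInfo:
    servers: list = field(default_factory=list)  # chunk servers (cs) holding a replica
    version: int = 0                             # version number (v)
    primary: Optional[str] = None                # current primary, if any
    lease_expiration: float = 0.0                # when the primary's lease ends


class Master:
    def __init__(self):
        self.files = {}   # file name -> list of chunk handles
        self.chunks = {}  # chunk handle -> ChunkInfo
        self.log = []     # stand-in for the append-only log on disk

    def _apply(self, record):
        # Update the in-memory tables from one log record.
        if record["op"] == "add_chunk":
            self.files.setdefault(record["file"], []).append(record["handle"])
            self.chunks[record["handle"]] = ChunkInfo()

    def add_chunk(self, filename, handle):
        record = {"op": "add_chunk", "file": filename, "handle": handle}
        self.log.append(json.dumps(record))  # append to the log first
        self._apply(record)                  # then change the in-memory state

    def report_replica(self, handle, server):
        # Not logged in this sketch: locations are learned from chunk servers.
        self.chunks[handle].servers.append(server)

    @classmethod
    def recover(cls, log):
        # Rebuild state after a crash by replaying the log from the start.
        # A checkpoint would let replay begin from a later point instead.
        master = cls()
        for line in log:
            master._apply(json.loads(line))
        master.log = list(log)
        return master
```

Appending one small record per change is cheap compared with rewriting a full table on disk, which is the point of the "append to log efficiently" note.
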
![](https://i.imgur.com/VoCdfqS.png)
- READ
- client sends file name (and offset) to master
- master returns chunk handle and list of chunk servers
- client asks one of the chunk servers, which sends the data back
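
The READ steps above can be sketched as follows. This is a simplified sketch under stated assumptions: the names `Master.lookup`, `ChunkServer.read`, and `client_read` are hypothetical, the chunk size is a constructor parameter (tiny in the usage below so the arithmetic is easy to follow), and a read is assumed to stay within a single chunk.

```python
class ChunkServer:
    def __init__(self):
        self.chunks = {}  # chunk handle -> bytes stored for that chunk

    def read(self, handle, offset, length):
        # Return `length` bytes starting at `offset` inside the chunk.
        return self.chunks[handle][offset:offset + length]


class Master:
    def __init__(self, chunk_size):
        self.chunk_size = chunk_size
        self.files = {}      # file name -> list of chunk handles
        self.locations = {}  # chunk handle -> list of ChunkServer objects

    def lookup(self, filename, offset):
        # Map a byte offset in the file to the chunk that contains it.
        index = offset // self.chunk_size
        handle = self.files[filename][index]
        return handle, self.locations[handle]


def client_read(master, filename, offset, length):
    # Step 1: ask the master which chunk and which servers.
    handle, servers = master.lookup(filename, offset)
    # Step 2: fetch the bytes directly from a chunk server, not via the master.
    chunk_offset = offset % master.chunk_size
    return servers[0].read(handle, chunk_offset, length)


# Usage: a two-chunk file with a 4-byte chunk size.
cs = ChunkServer()
cs.chunks = {10: b"abcd", 11: b"efgh"}
master = Master(chunk_size=4)
master.files["/f"] = [10, 11]
master.locations = {10: [cs], 11: [cs]}
data = client_read(master, "/f", offset=5, length=2)
```

The master only answers the metadata question; the data itself never passes through it.
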
![](https://i.imgur.com/e7lPjHq.png)
- WRITE
- no primary - on master
- find up to date replicas
- pick primary (p) and secondaries (s)
- increment version #
- problem of split brain
- network partition
- give a primary a lease (has a timer)
- master knows who has the lease and can wait for it to expire before picking a new primary
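
A rough sketch of the "no primary" case and the lease rule above, from the master's side. The names `Chunk` and `choose_primary` are hypothetical, time is passed in as a plain number so the behaviour is easy to trace, and the 60-second lease length is an illustrative value chosen for this sketch.

```python
from dataclasses import dataclass, field
from typing import Optional

LEASE_SECONDS = 60.0  # illustrative lease length (assumption of this sketch)


@dataclass
class Chunk:
    version: int = 0
    replicas: dict = field(default_factory=dict)  # server name -> version it reports
    primary: Optional[str] = None
    lease_expiration: float = 0.0


def choose_primary(chunk, now):
    # While a lease is still live, the old primary may be serving writes
    # behind a network partition. Picking another one now risks split brain,
    # so the master keeps the current primary until the lease runs out.
    if chunk.primary is not None and now < chunk.lease_expiration:
        return chunk.primary

    # Find up-to-date replicas: those at the master's current version.
    up_to_date = sorted(s for s, v in chunk.replicas.items() if v == chunk.version)
    if not up_to_date:
        raise RuntimeError("no up-to-date replica available")

    # Increment the version number and tell the primary and secondaries.
    chunk.version += 1
    for server in up_to_date:
        chunk.replicas[server] = chunk.version

    # Pick a primary and give it a lease with a timer.
    chunk.primary = up_to_date[0]
    chunk.lease_expiration = now + LEASE_SECONDS
    return chunk.primary
```

A stale replica (one that missed a version bump) is never chosen, and a primary the master cannot reach is only replaced after its lease has expired.
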
![](https://i.imgur.com/Py5Wun7.png)
- these are secondaries
- mostly appends
- primary asks the secondaries if they can do it
- only write if they promise they can
- what if primary crashes
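
One way to read the append notes above: the primary picks the offset, every replica is asked to write the record there, and the append counts as successful only if all of them confirm. The sketch below follows that reading. `Replica`, `write_at`, `primary_append`, and the `fail` switch are hypothetical names for illustration, and nothing here undoes a partial write: on failure the caller is simply expected to retry.

```python
class Replica:
    def __init__(self, name, fail=False):
        self.name = name
        self.fail = fail          # hypothetical switch simulating a replica that cannot write
        self.data = bytearray()   # contents of one chunk

    def write_at(self, offset, record):
        if self.fail:
            return False
        if len(self.data) < offset:
            # Pad so the record lands at the offset the primary chose.
            self.data.extend(b"\0" * (offset - len(self.data)))
        self.data[offset:offset + len(record)] = record
        return True


def primary_append(primary, secondaries, record):
    # The primary chooses the offset: the current end of its copy.
    offset = len(primary.data)
    ok = primary.write_at(offset, record)
    # Every secondary is asked to write the same record at the same offset.
    acks = [s.write_at(offset, record) for s in secondaries]
    # Success is reported only when all replicas confirmed the write.
    if ok and all(acks):
        return offset
    return None  # caller should retry; some replicas may already hold the record


# Usage: two successful appends land at the same offsets on every replica.
p, s1, s2 = Replica("p"), Replica("s1"), Replica("s2")
first = primary_append(p, [s1, s2], b"abc")
second = primary_append(p, [s1, s2], b"de")
```

Because a failed append can leave the record on some replicas and not others, a retry may produce duplicates or gaps, which is one reason the design leans on appends rather than overwrites.
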