Notes from Chapter 8 of Martin Kleppmann’s ‘Designing Data-Intensive Applications’ book
Various challenges in distributed systems, such as unreliable networks, clocks, and nodes, as well as processing pauses.
Unlike single-node programs, distributed systems rely on message passing through an unreliable network with variable delays.
Consequently, a node in a distributed system cannot be certain of anything; it can only make educated guesses based on received or missed messages. Achieving consensus is essential.
Defining Truth Through Consensus
In distributed systems, a node cannot trust its own assessment completely.
It might mistakenly believe it is the leader while others have chosen a new leader or think it is operational while others have marked it as dead. Hence, many distributed algorithms use a quorum, reducing reliance on a single node.
quorum:
A minimum number of votes from several nodes—to make decisions
Typically, a quorum requires a majority, ensuring only one majority exists at any given time.
DDIA CH5: Replication
Understanding Quorum-Based Systems
Leadership and Exclusivity
Need to ensure that certain roles or resources are not duplicated:
- Only one node should act as the leader for a given database partition
- Only one transaction should lock a specific resource at any time.
- Unique usernames must be registered by only one user.
These systems are designed to handle scenarios where a node erroneously believes it is in charge (a split-brain situation), which could otherwise lead to inconsistencies.
Above figure shows that issue arises when a client obtains a lease but experiences a prolonged pause due to garbage collection (GC), causing the lease to expire. During this time, another client acquires the lease and writes to the same file.
When the original client resumes from the GC pause, it assumes it still holds a valid lease and proceeds to write to the same file, resulting in data corruption.
In essence:
the problem occurs when a client’s lease expires unexpectedly due to a long GC pause, allowing another client to acquire the lease and modify the same resource, leading to inconsistent data when the original client resumes and writes to the same resource under the assumption of a valid lease.
Fencing Tokens
To manage conflicts, especially where nodes may operate under false beliefs, systems use fencing tokens.
When a lock or lease is granted, a numerically increasing fencing token is also issued. This token must accompany any subsequent write requests to the storage service, allowing the lock server to validate requests and reject any that are outdated, ensuring operations align with the current system state.
Services like ZooKeeper utilize transaction IDs or node version numbers as fencing tokens to maintain order and consistency.
Handling Byzantine Faults
While fencing tokens address errors from non-malicious nodes, they cannot prevent intentional faults by nodes acting maliciously or under compromised conditions, known as Byzantine faults.
Systems must be Byzantine Fault Tolerant to operate correctly even if some nodes fail to follow protocol or external interference occurs. This is crucial in environments like aerospace, where data corruption could lead to catastrophic failures, or in decentralized networks like Bitcoin, where trust among participants cannot be assumed.
Summary
Distributed systems face challenges such as unreliable networks, faulty nodes, and GC Pause. Nodes cannot be certain of their state and must rely on consensus, often achieved through a quorum.
Leadership and resource locking require mechanisms like fencing tokens to prevent conflicts.
Byzantine faults pose significant risks, requiring Byzantine Fault Tolerant systems for resilience.
Different timing and failure models help in understanding and designing robust systems. Correctness in distributed algorithms hinges on maintaining safety (nothing bad happens) and liveness (something good eventually happens) properties.
While theoretical models provide a foundation, practical implementations must also account for real-world complexities through empirical testing.