Cap Theorem

Introduction

The CAP Theorem, also known as Brewer’s theorem, states that a distributed data store cannot simultaneously guarantee all three of the following:

  1. Consistency

    In the context of the CAP Theorem, consistency means that every read operation in the system retrieves the most recent write or returns an error. Essentially, any data read reflects the most up-to-date information following a successful write operation.

  2. Availability

    Availability means every request receives a response, but there’s no guarantee that the response contains the most recent write.

  3. Partition Tolerance

    Partition tolerance in the CAP Theorem specifically refers to network partitions, where there’s a break or disruption in the communication channels between nodes in a distributed system. This means the system continues to operate despite an arbitrary number of messages being dropped or delayed by the network.

Note on Partition Tolerance: Network Partition vs. Database Partition

It’s important to note that in the context of the CAP Theorem, the term “partition” refers to network partitions, not database partitions. Network partitioning involves a break or loss in the network connectivity between nodes in a distributed system, leading to potential inconsistency or unavailability. In contrast, database partitioning means splitting a database into smaller, manageable pieces.

Understanding the Trade-Offs

cap theorem

  1. Consistency and Partition Tolerance at the Expense of Availability (CP)

    In systems that give priority to consistency and partition tolerance, there’s a potential trade-off in terms of availability. When a network partition occurs, ensuring that all nodes reflect the latest data (consistency) while still operating despite communication issues (partition tolerance) might mean that some nodes will not be available. In such cases, the system might choose to refuse to respond rather than return outdated or inconsistent data.

  2. Availability and Partition Tolerance at the Expense of Consistency (AP)

    When a system opts for availability and partition tolerance over consistency, it ensures that it remains operational and responsive to all requests, even during network partitions. The system remains fully functional, but the data it provides might not always be the latest, sacrificing consistency for continuous operation.

  3. Consistency and Availability, Assuming No Partition Tolerance (CA)

    This scenario, where a system aims for both consistency and availability without considering partition tolerance, is less common in practical applications. Most distributed systems must contend with the possibility of network partitions. However, theoretically, in a network environment where communication is always reliable and uninterrupted, a system could achieve both high consistency and high availability. In reality, this is a rare situation, as network reliability can seldom be guaranteed.