The first row means: “One web-server connected to one SQL database” is a CA system, not available during partitions.
The author is very confusing. CA systems do not exist. I think they are trying to make some meaningful distinction, but CAP is really not expressive enough for that; PACELC might be a better set of letters to work with.
I agree about PACELC, and I think that’s close to the author’s point. There’s a lot of confusion coming from CAP’s particular version of A, which is very useful for some system uses, but is frequently misunderstood.
Both PACELC and yield/harvest are “beyond CAP”:
You want to say that you chose eventual consistency for a reason other than CAP availability? That’s PACELC.
You want to define a system with reduced availability? That’s yield/harvest.
I mention them in the conclusion of the post, but not inside the post because the post is just about CAP.
Could you elaborate? I don’t think you’re right. Your main contention seems to be that you can’t give up partition tolerance, but his claim is that if you don’t have any replicas, you have both consistency and availability, but not partition tolerance, which seems reasonable. It’s just not distributed.
In particular, Jay Kreps mentioned this post as being high quality on Twitter, and I trust him to have pretty good distributed systems horse sense.
CAP applies to systems, not to a single piece of data, which is what harvest and yield are about. In the case of failure, not all of your data is available, but the system might be. And depending on the semantics of the system, that might be just fine.
When the author says there is no such thing as an AP big data store, that is simply false. I have worked on systems that are AP and big data. It worked because missing some data was OK; it just meant the quality of a decision during a failure was degraded.
there is no such thing as an AP big data store
I have worked on systems that are AP and big data
It’s an interesting point. At the data store level, answering ‘I don’t know if I have or had this data’ is considered an error (except under some crazy definitions of a data store; an eventually consistent store does not allow this). An application can perfectly well encapsulate this, though. It depends on the application, but such an application is not itself a data store (though yes, it is a big data application).
As far as I can tell, both Harvest and Yield are defined in terms of both data and entire systems. They are phrased as success rates, which can be considered an “entire system” idea, but clearly the success rate is bounded by the availability of the underlying data.
My understanding is that strict AP would mean that even in a partition, you have access to all of the data in a system, unless all of the replicas are down. Instead, the system you’re describing sounds closer to neither strictly available nor strictly consistent.
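To make the two metrics in this exchange concrete, here is a minimal sketch. It assumes the usual reading of the Harvest and Yield paper: yield is the fraction of requests that complete, and harvest is the fraction of the data reflected in a response. The `Shard` type and the `query` and `yield_of` helpers are hypothetical names made up for illustration, not part of any real data store.

```python
from dataclasses import dataclass


@dataclass
class Shard:
    """Hypothetical shard: one slice of the data set."""
    rows: list
    reachable: bool = True  # False simulates a shard cut off by a partition


def query(shards):
    """Answer from whatever shards are reachable.

    Returns (rows, harvest), where harvest is the fraction of all rows
    that the answer reflects.
    """
    total = sum(len(s.rows) for s in shards)
    rows = [r for s in shards if s.reachable for r in s.rows]
    harvest = len(rows) / total if total else 1.0
    return rows, harvest


def yield_of(completed, attempted):
    """Yield: fraction of attempted requests that completed."""
    return completed / attempted if attempted else 1.0


shards = [Shard([1, 2, 3]), Shard([4, 5, 6]), Shard([7, 8], reachable=False)]
rows, harvest = query(shards)
# Every request still gets an answer (yield 1.0), but the answer
# reflects only 6 of the 8 rows (harvest 0.75).
print(yield_of(completed=10, attempted=10), harvest)
```

Under these assumptions, the system described earlier in the thread keeps its yield at 1.0 during a failure and gives up harvest instead, which is why whether it counts as "available" depends on which definition is in play.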
I know of nobody who believes availability is defined that way, and I have not had the idea of “strict availability” come up in discussions. And it clearly is not a useful definition after a few seconds’ thought. People care about systems as a whole.
Harvest and Yield:
High availability is assumed to be provided through redundancy, e.g. data replication; data is considered highly available if a given consumer of the data can always reach some replica.
And
AP without C: HTTP Web caching provides client-server partition resilience by replicating documents, but a client-server partition prevents verification of the freshness of an expired replica. In general, any distributed database problem can be solved with either expiration-based caching to get AP, or replicas and majority voting to get PC (the minority is unavailable).
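The two options named in that passage can be sketched side by side. This is only an illustration under stated assumptions: `cache_read` and `quorum_read` are hypothetical helpers, and each replica is assumed to return a `(version, value)` pair where the highest version is the latest write.

```python
def cache_read(cached_value, expired, origin_reachable):
    """AP-style read: always answer from the local copy.

    The second element flags an answer whose freshness could not be
    verified, i.e. an expired entry served while the origin is cut off.
    """
    possibly_stale = expired and not origin_reachable
    return cached_value, possibly_stale


def quorum_read(replica_values, cluster_size):
    """PC-style read: answer only when a majority of replicas responded.

    replica_values holds the (version, value) pairs from the replicas
    reachable on this side of the partition (an assumed representation).
    """
    majority = cluster_size // 2 + 1
    if len(replica_values) < majority:
        # The minority side refuses to answer rather than risk stale data.
        raise RuntimeError("minority side of the partition: unavailable")
    # Newest version among the responding majority wins.
    return max(replica_values, key=lambda pair: pair[0])[1]
```

The cache never refuses a request but may hand back a stale document; the quorum read never returns stale data but refuses requests on the minority side.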
CAP Theorem:
For a distributed system to be continuously available, every request received by a non-failing node in the system must result in a response. That is, any algorithm used by the service must eventually terminate. In some ways this is a weak definition of availability: it puts no bound on how long the algorithm may run before terminating, and therefore allows unbounded computation. On the other hand, when qualified by the need for partition tolerance, this can be seen as a strong definition of availability: even when severe network failures occur, every request must terminate.
And
It is possible to provide high availability and partition tolerance, if atomic consistency is not required. If there are no consistency requirements, the service can trivially return v0, the initial value, in response to every request. However it is possible to provide weakened consistency in an available, partition tolerant setting. Web caches are one example of a weakly consistent network.
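The "trivially return v0" service from that quote is small enough to write down. The sketch below is a hypothetical `TrivialRegister` class, not code from the paper: every request gets a response without contacting any other node, so it stays available under any partition, and it offers no consistency at all.

```python
class TrivialRegister:
    """Answers every request locally; reads always return the initial value."""

    def __init__(self, v0):
        self._v0 = v0

    def write(self, value):
        # Acknowledged but never stored: no consistency is promised.
        return "ok"

    def read(self):
        # Never waits on another node, so it always terminates.
        return self._v0


register = TrivialRegister(v0=0)
register.write(42)
print(register.read())  # still the initial value
```

It satisfies the CAP paper's definition of availability in full, which says nothing about the quality of the answer.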
It’s fine if you don’t think that reasoning about distributed systems in this way is useful. Brewer says much the same in Harvest and Yield, which is why he presented what he considered a more useful metric than strict availability.
Great quotes. The theme I see through them is that availability is a spectrum and you can pick spots on the spectrum that make sense to you, which is what I got out of harvest and yield. The CAP Theorem quote you have just says you need an answer, not what the quality of that answer is. I do not believe the article tells that story.
To be fair, the harvest and yield paper’s treatment of CAP isn’t any better. For example, it says:
CA without P: Databases that provide distributed transactional semantics can only do so in the absence of a network partition separating server peers.
Even in context of the paper, that’s misleading. The point that the post author is making actually turns out to be a very similar one to the harvest and yield paper. It’s also expressed in a way that’s tricky to follow, but it’s a subtle topic and hard to write about well.
The biggest issue I have with the article is that it conflates the availability of a whole system with that of parts of the system. You are right that it is a subtle topic, but I believe this article adds nothing positive to the discussion and even adds some misinformation.
Sorry about that. The main point of this post is to look at the definitions of availability and show that they differ: a system can be highly available but not available by the definition of CAP. It then shows how this propagates to common systems. That said, the post is difficult to read if you have not previously seen the CAP theorem proof. If you want to try again :-), you can have a look at the first post of the series. In particular, the post http://blog.thislongrun.com/2015/03/comparing-eventually-consistent-and-cp_11.html introduces some of the definitions.
I wrote about it here: http://brooker.co.za/blog/2014/07/16/pacelc.html
This post is of low quality. The author doesn’t seem to have really grasped the Harvest and Yield paper.
In particular, Harvest and Yield and the CAP theorem paper both define availability that way.
I’m doing a lot of head scratching. Maybe there’s an idea here that I’m not getting, but it’s really not laid out well.