I firmly believe formal methods advocates should aim for a more pragmatic approach and “play nice” with compilers, optimizers, Design by Contract, and unit testing.

Let’s face it, writing a full postcondition is just plain a lot harder than writing the code itself.

However, writing a perhaps partial postcondition for a few compelling concrete test cases is usually pretty easy.

What if, instead of aiming to “prove” the programmer right or wrong, they aimed to assist the programmer as much as they can?

If formal methods acted like warnings on steroids (but with no false positives), a boost for optimizers, and a power-up for internal precondition checks, I can see a lot more adoption.
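To make the full-versus-partial postcondition point concrete, here is a minimal Python sketch. The function `my_sort` and both checks are hypothetical illustrations, not taken from the thread. A full postcondition for sorting says the output is ordered *and* a permutation of the input; a partial one might only check ordering and length, which is trivial to write and check for a few concrete cases.

```python
from collections import Counter

def my_sort(xs):
    """Hypothetical function under test: a plain insertion sort."""
    out = []
    for x in xs:
        i = len(out)
        # Walk left until inserting x at position i keeps `out` ordered.
        while i > 0 and out[i - 1] > x:
            i -= 1
        out.insert(i, x)
    return out

def partial_post(inp, out):
    """Partial postcondition: same length and non-decreasing. Cheap to state."""
    return len(out) == len(inp) and all(a <= b for a, b in zip(out, out[1:]))

def full_post(inp, out):
    """Full postcondition: additionally, out is a permutation of inp."""
    return partial_post(inp, out) and Counter(inp) == Counter(out)

# A few compelling concrete cases, checked only against the partial postcondition.
for case in ([], [1], [3, 1, 2], [2, 2, 1], [5, -1, 0, 5]):
    assert partial_post(case, my_sort(case)), case

# The trade-off: a bogus "sort" can satisfy the partial check but not the full one.
def bogus_sort(xs):
    return [0] * len(xs)

assert partial_post([3, 1, 2], bogus_sort([3, 1, 2]))
assert not full_post([3, 1, 2], bogus_sort([3, 1, 2]))
```

The partial check still catches plenty of real bugs (wrong order, dropped elements), which is the pragmatic argument being made here.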

Let’s face it, writing a full postcondition is just plain a lot harder than writing the code itself.

That’s not at all obviously true! We have many examples where it is false, and it’s probably false in general.

Problems in NP are one example: a proposed solution can be checked in polynomial time, but (assuming P != NP) the hardest of them, the NP-complete problems, cannot be solved in polynomial time. And even if P = NP, verification is still likely to be far easier.
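A rough illustration of the verify-versus-solve asymmetry, using subset sum as an assumed example (the thread doesn’t name a specific problem). The verifier is a few lines and runs in linear time in the certificate; the naive solver enumerates up to 2^n index subsets. As another reply in this thread notes, the solver also ends up calling the verifier.

```python
from itertools import combinations

def verify(nums, target, indices):
    """Certificate check: distinct, in-range indices whose values sum to target."""
    return (len(set(indices)) == len(indices)
            and all(0 <= i < len(nums) for i in indices)
            and sum(nums[i] for i in indices) == target)

def solve(nums, target):
    """Brute-force search over all index subsets: up to 2**len(nums) candidates."""
    for r in range(len(nums) + 1):
        for indices in combinations(range(len(nums)), r):
            if verify(nums, target, indices):  # the solver reuses the verifier
                return list(indices)
    return None

nums = [3, 34, 4, 12, 5, 2]
print(solve(nums, 9))  # [2, 4]  (4 + 5)
```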

Problems to do with concurrency are another example. There are models that are very simple to specify, but they result in thousands of tricky edge cases that need to be handled perfectly and that humans just don’t come up with on their own.
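A small sketch of “simple spec, many tricky interleavings” (the scenario is an assumption, not from the thread): two threads each perform a non-atomic increment as a read step followed by a write step. The spec is one line (the final counter equals 2), but exhaustively enumerating every schedule, as a model checker would, turns up the lost-update cases.

```python
def interleavings(steps_a, steps_b):
    """Yield every merge of two step lists that keeps each thread's own order."""
    if not steps_a:
        yield list(steps_b)
        return
    if not steps_b:
        yield list(steps_a)
        return
    for rest in interleavings(steps_a[1:], steps_b):
        yield [steps_a[0]] + rest
    for rest in interleavings(steps_a, steps_b[1:]):
        yield [steps_b[0]] + rest

def run(schedule):
    """Execute a schedule of (thread, op) steps against a shared counter."""
    counter = 0
    local = {}  # each thread's private copy of the value it read
    for thread, op in schedule:
        if op == "read":
            local[thread] = counter
        else:  # "write": store the value this thread read, plus one
            counter = local[thread] + 1
    return counter

a = [("A", "read"), ("A", "write")]
b = [("B", "read"), ("B", "write")]
schedules = list(interleavings(a, b))
violations = [s for s in schedules if run(s) != 2]  # spec: counter == 2
print(len(schedules), len(violations))  # 6 4
```

Only the two fully serial schedules satisfy the spec. The count grows quickly: three threads with two steps each already give 6!/(2! 2! 2!) = 90 schedules.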

The real problem is that current tooling makes specifying models far too difficult. Even worse, it makes knowing whether you’ve specified the right model quite tricky. This is probably a combination of immature tools and having chosen the wrong formalisms for expressing these models. A lot of logics look great at first, but their semantics, particularly when it comes to reasoning across time, turn out to be really tricky. We have some glimpses of how this can work with languages like Haskell or Isabelle in domains that can be modeled well, and it’s beautiful. But there’s a long road from here to a point where we can express these postconditions in a more natural way and still reason about them automatically.

You’re kinda illustrating @JohnCarter’s point here.

I’m not working over in CSAIL, but here in startupcanistan even having basic tooling around postconditions in design-by-contract for things like “this function will never go negative when calculating a sum of credits”, “this function will always return a value closer to zero than its inputs”, or “this function will return a URL that doesn’t have a query string with such-and-such values in it” would still be super helpful.
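A minimal Python sketch of what even basic postcondition tooling could look like for those three examples. The `ensures` decorator, the function names, and the `token` query parameter are all hypothetical stand-ins; only `functools` and `urllib.parse` are real standard-library modules.

```python
import functools
from urllib.parse import urlsplit, urlencode, parse_qs

def ensures(predicate, message):
    """Hypothetical decorator: check predicate(result, *args) after every call.
    Positional arguments only, to keep the sketch short."""
    def wrap(fn):
        @functools.wraps(fn)
        def checked(*args):
            result = fn(*args)
            if not predicate(result, *args):
                raise AssertionError(f"{fn.__name__}: postcondition failed: {message}")
            return result
        return checked
    return wrap

@ensures(lambda total, credits: total >= 0, "sum of credits never negative")
def sum_credits(credits):
    # Assumption for illustration: refunds are negative entries; clamp at zero.
    return max(0, sum(credits))

@ensures(lambda r, x: abs(r) <= abs(x), "result no farther from zero than input")
def halve_toward_zero(x):
    q = abs(x) // 2
    return q if x >= 0 else -q

@ensures(lambda url, base, params: "token" not in parse_qs(urlsplit(url).query),
         "no 'token' in the query string")
def build_url(base, params):
    safe = {k: v for k, v in params.items() if k != "token"}
    return base + "?" + urlencode(safe)

print(build_url("https://example.com/a", {"q": "tea", "token": "s3cret"}))
# https://example.com/a?q=tea
```

If someone later removes the `max(0, ...)` clamp or the `token` filter, the decorator raises at the call site instead of letting a bad value escape.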

We’re trying to make a cup of tea, not boil the ocean. Until formal methods folks realize this, their entire field is of limited utility to practical, real-world software engineering outside of a handful of cases (say, writing a database engine).

I’m going to recommend this great post about combining property-based testing with contracts. If you can add assertions for your pre- and postconditions, through something like an assert statement in your language or some library support, you can use property-based testing to at least generate a lot of inputs. That’s more likely to find assertion failures than hand-writing test cases.
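A hand-rolled sketch of that combination, using the standard library’s `random` module as a stand-in for a real property-based testing library such as Hypothesis (whose API is not reproduced here). The function `pages_needed` and its contracts are hypothetical; the pre- and postconditions are plain `assert` statements, and the harness’s only job is to throw lots of valid inputs at them.

```python
import random

def pages_needed(items: int, per_page: int) -> int:
    """Hypothetical function: how many pages are needed to show `items` items."""
    assert items >= 0 and per_page >= 1, "precondition"
    pages = -(-items // per_page)  # integer ceiling division
    assert pages * per_page >= items and (pages - 1) * per_page < items, "postcondition"
    return pages

def buggy_pages_needed(items: int, per_page: int) -> int:
    """Same contracts, off-by-one implementation."""
    assert items >= 0 and per_page >= 1, "precondition"
    pages = items // per_page + 1  # wrong whenever items is a multiple of per_page
    assert pages * per_page >= items and (pages - 1) * per_page < items, "postcondition"
    return pages

def check_contracts(fn, trials=1000, seed=0):
    """Call fn on random inputs satisfying the precondition.
    Return the first input that trips an assertion, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        args = (rng.randint(0, 100), rng.randint(1, 10))
        try:
            fn(*args)
        except AssertionError:
            return args
    return None

print(check_contracts(pages_needed))        # None
print(check_contracts(buggy_pages_needed))  # an (items, per_page) pair with items % per_page == 0
```

Note that Python skips `assert` statements under `python -O`, so this style relies on running tests with assertions enabled.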

That’s one of the lightest-weight approaches for adding formal methods concepts to a real-world project.

Yes, actually from pre/post-conditions you can get 3 things:

1. Support for generating random tests, similar to QuickCheck
2. Verification at compile time, like Dafny
3. Fail-fast behavior at runtime, like Eiffel

The third item is really pragmatic and underappreciated. At runtime, a failed precondition should prevent a function/method from running. A failed postcondition should prevent the function from returning or writing its side effects. A huge class of software errors would disappear if software was written like this.

You can probably also get a fourth benefit, better static analysis (without going into full verification), and a fifth, more efficient code (as discussed elsewhere in this thread).
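A minimal sketch of the fail-fast discipline described above, with a hypothetical `transfer` over an in-memory dict of balances. The precondition is checked before any work; the new state is computed on a copy; the postcondition (money is conserved and no balance goes negative) is checked before the copy is committed, so a violation leaves the original state untouched.

```python
class ContractError(Exception):
    """Raised when a pre- or postcondition does not hold."""

def transfer(balances: dict, src: str, dst: str, amount: int) -> None:
    # Precondition: refuse to run at all if it does not hold.
    if amount <= 0 or src not in balances or dst not in balances or src == dst:
        raise ContractError("precondition failed")
    new = dict(balances)  # work on a copy: no side effects yet
    new[src] -= amount
    new[dst] += amount
    # Postcondition: checked before the side effect is committed.
    if sum(new.values()) != sum(balances.values()) or min(new.values()) < 0:
        raise ContractError("postcondition failed; nothing written")
    balances.clear()
    balances.update(new)  # commit only after the contract holds

accounts = {"alice": 10, "bob": 0}
transfer(accounts, "alice", "bob", 4)
try:
    transfer(accounts, "alice", "bob", 100)  # would overdraw alice
except ContractError as e:
    print(e)         # postcondition failed; nothing written
print(accounts)      # {'alice': 6, 'bob': 4}
```

An overdraft check would more naturally be a precondition; it is left to the postcondition here only to show a failed postcondition blocking the write.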

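It is unsurprising that design-by-contract is so thorough, since it is actually refinement types in disguise.

And refinement types are Hoare logic in disguise :)

True! :)

Assuming P != NP says nothing about the relative difficulty of writing the solver vs. the verifier, though. Or are you speaking metaphorically?

Practically, most solvers for such problems include the verifier as part of the solver, so the solver tends to be (much) more complex. But yes, that’s just a rough example.

Really? Can I see those examples? I’d be really curious to see some for something like https://github.com/seL4/seL4/blob/master/src/string.c

But I couldn’t see them from a quick dig around the repo.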

I totally agree. My personal interest is in how to bring formal methods to work, in whatever package is the most useful. There’s a term called “lightweight formal methods” which is similar in philosophy to what you’re talking about. Basically, take the ideas from formal methods, like invariants / properties / formal logic, but use them in a way that doesn’t require full math proof.

I think Amazon’s approach here is really promising. They took the theorems that would be proven in a verification effort, like refinement of a model, and instead used property-based testing to check them. So the property isn’t proven, but they have some amount of confidence in it. They also found 16 bugs this way before shipping. And isn’t that the end goal?
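A rough sketch of “check refinement with property-based testing instead of proving it”, under assumptions of our own (this is not Amazon’s code or tooling). A hypothetical `LRUCache` implementation runs side by side with a deliberately naive reference model on random operation sequences, and after every step the implementation’s observable state must match the model’s.

```python
import random
from collections import OrderedDict

class LRUCache:
    """Implementation under test (hypothetical): OrderedDict-based LRU cache."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used

class LRUModel:
    """Reference model: a plain list of (key, value) pairs, most recent last."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []

    def get(self, key):
        for i, (k, v) in enumerate(self.items):
            if k == key:
                self.items.append(self.items.pop(i))  # move to most recent
                return v
        return None

    def put(self, key, value):
        self.items = [(k, v) for k, v in self.items if k != key] + [(key, value)]
        if len(self.items) > self.capacity:
            self.items.pop(0)

def check_refinement(trials=200, steps=50, seed=1):
    rng = random.Random(seed)
    for _ in range(trials):
        cap = rng.randint(1, 4)
        impl, model = LRUCache(cap), LRUModel(cap)
        for _ in range(steps):
            key = rng.randint(0, 5)
            if rng.random() < 0.5:
                value = rng.randint(0, 99)
                impl.put(key, value)
                model.put(key, value)
            else:
                assert impl.get(key) == model.get(key), (cap, key)
            # Observable state must agree: same entries in the same recency order.
            assert list(impl.data.items()) == model.items
    return True

print(check_refinement())  # True
```

As with the Amazon example, passing this gives confidence, not proof: it only covers the operation sequences the generator happens to produce.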

So yeah, I think there’s demand for the right package: a lightweight, practical way to bring formal methods thinking to work.

This is an interesting point. Do we need mathematical proof? Or can we use scientific proof? Mathematical proof’s epistemology for this domain is fairly straightforward (classical logic for constructive proofs over finite problems). Scientific proof is epistemologically far more fraught…but often a lot easier to produce since it is subject to further refutation.

Yes definitely - tests are really a scientific / empirical activity. They are about observing what really happens, not just what the logical model says should happen. I would say the main difference between testing in software and actual scientific experiments though is that we don’t know how to talk about the statistical significance of individual test cases. At least I don’t. I’ve heard that the Cleanroom approach has some notion of statistics, but I’m honestly not familiar with it in detail.

As for which one is appropriate for software development, it could be contextual, but I focus almost solely on “business software,” or “regular” software, and for that I would say math proof is almost never necessary. It could be desired in certain cases, but even then I would only prove things about a model or a broad algorithm in general. Proofs at the implementation level are just extremely time-consuming, even with proof assistants.

So as a broad strategy, I think we (as an industry) have to get better at elevating testing to a point where we statistically know how effective a test suite is. And no, I’m not talking about branch coverage; input data is the coverage we should really care about.

I would say the main difference between testing in software and actual scientific experiments though is that we don’t know how to talk about the statistical significance of individual test cases.

Statistical significance isn’t really the issue in most software tests because we set them up to be deterministic. Hypothesis testing is a means of working with randomness in observations. The issue is the selection of observations and test cases. This is true in biology or in software testing. Hypothesis testing is a tool for doing an individual observation, but the aggregate set of observations to run (the research program, more or less) doesn’t depend on that. I’ve written some stuff about this.

I would say math proof is almost never necessary.

I disagree. We depend on proofs heavily on a regular basis. Type systems are the most common example. But setting up a system where you control the flow of information to make certain states impossible is another. These aren’t comprehensive proofs, and they are often implicit, but they are properties we prove mathematically.

branch coverage is not what I’m talking about, since input data is really the coverage that we should really care about.

Mutation testing is the most defensible approach to coverage that I’ve seen: if you change the program, does the test suite start throwing errors? I also like property-based testing because it gets humans out of the business of selecting test data individually.
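A toy sketch of the mutation-testing idea. Real tools generate mutants automatically; here they are written by hand, and every name is hypothetical. Each mutant is a small change to the program; a mutant that still passes the suite (a “survivor”) shows an input region the tests don’t pin down.

```python
def is_adult(age):
    return age >= 18

# Hand-written mutants: the kind of small syntactic change a mutation tool makes.
mutants = {
    ">= -> >": lambda age: age > 18,
    ">= -> <=": lambda age: age <= 18,
    "18 -> 17": lambda age: age >= 17,
}

def weak_suite(fn):
    """Passes if fn behaves correctly on two 'obvious' inputs."""
    return fn(30) is True and fn(5) is False

def strong_suite(fn):
    """Also probes the boundary."""
    return weak_suite(fn) and fn(18) is True and fn(17) is False

for name, suite in [("weak", weak_suite), ("strong", strong_suite)]:
    assert suite(is_adult)  # the original program must pass its own suite
    survivors = [m for m, f in mutants.items() if suite(f)]
    print(name, "surviving mutants:", survivors)
# weak surviving mutants: ['>= -> >', '18 -> 17']
# strong surviving mutants: []
```

Both suites would show full branch coverage of `is_adult`, yet only the second one distinguishes it from its boundary mutants, which echoes the point above that input data matters more than branch coverage.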

I also like property based testing because it gets humans out of the business of selecting test data individually.

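Unfortunately, in practice I’ve found you have to spend a lot of time getting the data generator just right to cover an interesting enough set of inputs.

You have probably used it far more in-depth than I have. I was mostly using it to generate fairly straightforward things like lists of a type.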

Re math proofs and necessity: I was talking more about a full proof of the functional correctness of an entire application. I think that is overkill in most cases, but not proofs in general, especially about smaller algorithms.
