Formal methods and the future of programming

(blog.janestreet.com)

127 points | by eatonphil 8 hours ago

16 comments

  • Animats 2 hours ago
    I used to do proof of correctness work, decades ago.[1] We had more proof automation than many of the later systems. The easy stuff was solved by the first SAT solver, the Oppen-Nelson simplifier. The harder stuff used the Boyer-Moore prover, which uses heuristics and previous lemmas to guide the theorem prover. The Boyer-Moore prover had to be helped along by suggesting lemmas, which it would try to prove and which would then be used for later proofs. That was the tough human job.

    Later systems seemed to have less automation. What tended to go wrong with formal methods was that the people doing them were too into the formalism. I was working on this for a commercial project that wanted bug-free code, not as an academic. The academic projects tended to get too clever. They had the mathematician's bias of wanting a terse notation and not much case analysis. That's not a good solution. You want lots of automated grinding and don't want to need much insight. The clever people kept inventing new logics - modal logic, temporal logic, etc, - to avoid verbosity in paper and pencil proof work. That wasn't really all that helpful. The basic truth of this business is that most of the theorems are rather banal.

    As the Jane Street people point out, there's a big advantage in having control of the language. You want the verification statements integrated into the programming language. In many systems, they're embedded in comments, in a different syntax than the programming language, or even in completely separate files. This adds unnecessary work. It's good to see them pushing this forward.

    We were doing this too early, over 40 years ago. It took about 45 minutes of compute back then to build up number theory from a cold start with the Boyer-Moore prover. Now it takes less than a second.

    [1] https://archive.org/details/manualzilla-id-5928072/page/n3/m...

  • jdw64 5 hours ago
    In other words, because GEN AI a lot of code, the idea is to shift human value toward verification. Sometimes I think about what programming really is. In fact, learning programming itself is a huge challenge for a non English speaker like me. I have to rely on machine translation to understand English documents that have no translation. The materials in my language are about 5 to 6 years behind.

    Now, since it's impossible to code review the tens of thousands of lines of code that AI produces, I see discussions about establishing an absolute rule like mathematical proofs. Reading this reminds me of Rust's borrow checker. In fact, after writing in Rust a few times, it often leads to bad practices where people use tricks to avoid the borrow checker.

    Actually, when mathematical rigor goes too far, humans tend to find ways around it. An undereducated programmer like me is especially prone to that.

    Looking back at this kind of attempt, it will probably result in writing code only for specific formalized answers. If it becomes that standardized, I'm not sure it will be able to respond to human needs.

    I think these defensive programming attempts are fine, but I want to do offensive programming (I coined that term). You take risks, but you fix things quickly and ship. Believing that over time, it will become good enough. Of course, for established industries where accuracy matters and the scope of work is well defined, like Jane Street, the approach in this article is correct. In other words, because there is enough data to adequately model the market's demands

    But for social losers like me trying to make money, constantly moving from place to place looking for gold mines, these kinds of methodologies seem like a luxury.Established businesses with mature modeling need highly educated and specialized personnel to continuously optimize. But I know that realistically, I can't keep up with that demand. So I look for places where modeling is unstructured, but I'm not sure if I can use this approach even then.

    • stephenlf 2 hours ago
      > these kinds of methodologies seem like a luxury

      Absolutely. The article acknowledges this. Jane Street is pretty uniquely equipped to benefit from this.

  • addaon 32 minutes ago
    I've been playing with related ideas recently, and can talk about them at length... but one thing that's surprised me is just how effective frontier models (ChatGPT-5.5 in particular) are at completing certain manual proofs in the Roqc (né Coq) proof assistant. The proofs aren't always pretty, but ChatGPT can often prove something in minutes and 10 - 100 iterations that would take me, a human who has limited but non-zero proof assistant experience but significant domain experience in the lemmas being proven, much much longer.

    The reason I think this is relevant and interesting in the context of the (short) article is because it challenges basic assumptions about how certain formal method tooling works. Frama-C, Ada/SPARK, etc are focused on generating proof obligations that can be automatically discharged by CVC5, Z3, etc; with a relatively painful fallback of manually proving the obligation in Rocq. The most common workflow seems to be to discover that an obligation is "hard" (not automatically discharged), and to restructure the set of visible lemmas and assertions at the obligation generation point in the code to try to make it "easy"; or even restructuring the code to try to make it easy. Which, in a world where Roqc proofs are doubly expensive (first because they're a challenge for a human to write; second because they tend to require quite a bit of maintenance churn as the code and proof around them evolves), makes sense. But if Roqc proofs are "automatically discharged with AI in the loop", this cost delta goes away -- and it becomes possible to think about writing proofs the same we (usually) write code, with human readability and clarity as the first goal, and [compiler|prover] optimization a distant second.

    Which I find quite exciting, although I haven't fully digested the implications yet.

    • bluGill 28 minutes ago
      How often will AI look at something that can't be proven because you change requirements and decide to change the code and your requirements to make the proof easier though? I haven't played with proofs at all, but I do know that occasionally when I say, this test failed after making a change, AI will just change the test instead of making the code pass both the old test and the new test which is most often my intent.
  • brap 3 hours ago
    Whenever I read about formal specs it always seems to me like “write the same tests just in a different way”, or worse, “write the same implementation but in a different way”.

    I guess doing things twice can help catch errors, but I fail to see what’s so special about formal specs if they can suffer from the exact same bugs as the tests/implementation.

    I guess the root of the problem is if you want to formally prove that a program does something, you have to be very specific (heh) about what that something is, at which point you are basically just writing tests/implementation all over again.

    I have been looking into this on and off for years now, and I keep feeling like I’m missing something, but I don’t know what it is.

    Can anyone enlighten me?

    • chadgpt3 3 minutes ago
      For trivial functions they can be. Function: isPrime. Impl: loop to sqrt(n) and check divisibility. Spec: returns true if nothing divides. Impl: return true at the end if we didn't return false.

      It's not exactly the same because your spec says nothing divides, not just up to sqrt(n). It's definitely not a test if you do it right.

    • jlebar 3 hours ago
      > Whenever I read about formal specs it always seems to me like “write the same tests just in a different way”, or worse, “write the same implementation but in a different way”. [...] Can anyone enlighten me?

      A big difference is that formal methods allow you to use the "for all" quantifier.

      For example, you might write a unit test that says "foo('abc') returns a string with no trailing whitespace".

      But with formal methods, you can prove that "for any input x, foo(x) returns a string with no trailing whitespace".

      This is a trivial example, but you could imagine something more complicated, such as "for any program P, compile(P) has the same behavior as P".

      Of course, you have to define what "has the same behavior as" means!

      • brap 35 minutes ago
        >Of course, you have to define what "has the same behavior as" means

        And that's really my issue, for example when you define "has no trailing whitespace", you are basically writing a piece of the implementation. Cover all behaviors, and you have basically re-implemented the function, no?

        In other words, if I have the full formal spec of f(), isn't that the same thing as having f()?

        • chadgpt3 1 minute ago
          Non trailing whitespace means the string doesn't end with a space. But foo is a function that converts an AST to a string, that's totally different. Or it's a function that loops until \0 and changes all spaces to +

          The spec should be a summary of what the impl is supposed to do. You'd want more than just doesn't end with whitespace of course.

        • bluGill 22 minutes ago
          > In other words, if I have the full formal spec of f(), isn't that the same thing as having f()?

          In some cases, however quite often, the spec is much simpler. For instance, it's easy to say that after running sort on some list, that the result is sorted. However, it is very hard to come up with the algorithm to do that from the specification. Sometimes that is even a point. Bubble sort, quick sort, tim sort, we can go on and on. There's a huge number of different sorts that computer science have discovered over the years. They all should have the same result and so you should be able to prove they do the same thing. However, in the real world there are often reasons you would prefer one to another despite all meeting the same spec.

          • brap 4 minutes ago
            This makes a lot of sense, thank you!
          • pydry 7 minutes ago
            I did some of this stuff in college and what bugged me was that the spec actually ended up more complex than the code and it had bugs.

            That was a long time ago and they said that formal methods were the future back then, too.

    • y1n0 3 hours ago
      I don’t write software, but in asic and fpga design, formal method specifications enable the use of things like SAT solvers to determine if the underlying test article meets the specification.

      You specify properties of the design and the tooling tests the design in a variety of ways to see if it can violate those properties.

      Let’s say you have a system that controls turn signals. You can specify properties that say things like lanes that cross each other can’t both have a green light at the same time or even within some period of time of each other.

      The tooling can exhaustively check the design for this and surface code traces that violate it.

    • jaggederest 3 hours ago
      It's more than just "write the same tests just in a different way", because each test is largely independent, or grows to unmanageably large, and you have to do the work of figuring out the completeness of your test suite by e.g. branch coverage or other relatively crude overlap methods.

      Statically proven type systems let you do this in a way that each contributing piece is checked in advance against all the other pieces, guaranteeing not just "this test passes" but, in combination, "all these tests create a reasonable, coherent whole", and it disallows incoherent models from being implemented in the code. An example of this is like Rust's match, which requires complete coverage of all the possible branches, but writ large across an entire codebase.

      You're right that it does nothing for you if you make a fundamental logic or theory error, it can't catch that kind of error, but you'd be surprised how much more robust it is than "merely" e.g. Haskell's type system + ad hoc functional testing + property testing - which I would consider a quite strong system overall, but formally proven e.g. cryptography is a much higher bar.

    • comonoid 1 hour ago
      The book "Program = Proof" by Samuel Mimram starts with a formula which is true for all n below

      n = 15 341 178 777 673 149 429 167 740 440 969 249 338 310 889

      I don't think you can catch it with any test suite.

      • pdhborges 1 hour ago
        If you have an int32 or less you can!
    • IshKebab 9 minutes ago
      That can definitely be a problem, and I have definitely written formal specs (for hardware in SVA) where I'm like "I'm just implementing this again". And actually that's a totally valid formal verification task to do - formal equivalence checking of an implementation against a model (which is just another implementation). I can feel (and sometimes is) tautological.

      However you usually try to not do that. If possible (and it usually is possible) your formal spec will be either:

      1. Another implementation, but a much simpler one. For example you can formally verify the equivalence of a pipelined dual issue CPU, with a combinatorial single cycle model.

      2. More general properties that much be true about the implemention, rather than exactly what the implementation must be. The classic example is compression: decompress(compress(x)) == x.

  • spenrose 3 hours ago
    As usual, this paean to deductive reasoning (“formal methods”) leaves out its fundamental limit: how closely do the postulates and definitions fit the domain they purport to map? (“In theory, there is no difference between theory and practice. In practice ...”) My guess is that Jane Street maintains large code bodies where the mapping is 1:1, because the purpose of the code is to implement a deterministic algorithm. Many other coders work in such areas. But millions of us don’t: most UIs, most exploratory work, etc.

    There is a movement parallel to formal methods to define acceptance criteria at high resolution but not logico-mathematically, which at least grapples with the mapping problem but can’t resolve it where the map isn’t the territory, which is most places. Has Google’s results page, with its extremely evolved internal optimization frameworks really hit an optimum? Could that prototype you whipped up to capture a hazy idea have better illustrated it? These questions are best answered by looking outside the system to what the system serves.

    • benlivengood 3 hours ago
      Formal methods are precisely for the domains where the semantics are well-defined. Logical circuits (a lot of CPU components get formal verification), kernels, protocols, parsers, compilers, cryptography, security frameworks, concurrency primitives, etc. all benefit a lot from verification.
    • petra 2 hours ago
      Google's results page isn't well defined. But probably 90%+ of the code below it is well defined.

      And in some cases, where the result isnt well defined, it can be learned, so it's not about sitting and thinking what would make sense.

  • awinter-py 4 hours ago
    See previous kleppmann post https://martin.kleppmann.com/2025/12/08/ai-formal-verificati..., and yes, obviously anything that you can put in the typesystem or the linter, you should weigh doing so.

    Hopefully we get more ergonomic ways to do this? Like of the tools listed in the post, dafny + iris are the closest to being industrial I think. And amzn S3 has a history of TLA use in-house I think. But we probably haven't seen the typescript in this space yet, a zero cost abstraction that drops into existing tools, and people genuinely prefer it to the old way.

    (And custom linters are also still pretty bad to write. Like golangci-lint is a painful codebase, haven't tried semgrep but the rules engine seemed intimidating. I've yet to use an AST API that I liked)

  • xvilka 5 hours ago
    seL4 despite being formally verified contained gnarly bugs, like [1]. Thus, it's not a silver bullet as some people think. Yes, it improves the quality, but only one of the aspects of hardening software, like ASAN, SAST, fuzzing, strict languages, and so on.

    [1] https://github.com/seL4/seL4/issues/1199

    • cayley_graph 5 hours ago
      > Since it is the unverified SMP config of the kernel

      I don't disagree with your point (formal verification does not rid you of all bugs), but this is not the subject of the linked issue. This was a bug in an unverified path.

  • mark4 5 hours ago
    Relevant keynote I wrote about

    Formal methods won, now what? https://muratbuffalo.blogspot.com/2026/04/bugbash26-keynote....

  • reinitctxoffset 3 hours ago
    I caught the religion on using types in conjunction with LLMs about eighteen months ago, but I only really got serious about `lean4` like six to eight months ago and now I wouldn't even consider using AI assist in software work with a `CIC` proof substrate that has practical C/C++ (and therefore, everything) FFI.

    We've banned everything from JSON to Python, rewritten `nix` to have a compiler, and almost everything we write is not only property tested and multiply fuzzed to within an inch of it's life, but we have proofs in `lean4` that at a minimum drive property tests via `.olean` linkage, and when we have the bandwidth we prove exhaustiveness over the domain and property test that.

    We skip the whole C++/Rust thing because all of the fast stuff is generated from `lean4` and so it doesn't really matter (C++ has advantages when bugs in it are actually bugs in `lean4` code, but you could go either way).

    This is a big departure, and I certainly don't blame anyone who is skeptical: "ban JSON and Python wtf?!?!", but we've done millions of lines (checked) of this stuff and AI + formal systems is a dramatically bigger leap than no AI -> AI and Python. The latter in our experience is not monotone in progress, the former is almost always monotone in progress.

    And you can do wild shit, this is a formal proof of the polyhedral tensor calculus that is modeled by things like ISL and CuTe, and using that we can get swizzling and tiling using `mdspan` in C++23 on device and prove it's right (up to some L'Hopital arugment about the coverage which this example doesn't demonstrate well: https://github.com/b7r6/mdspan-cute

    That in turn, well, it goes real fast. On the first try.

    https://straylight.software/assets/lambda-hierarchy.svg

    • skybrian 1 hour ago
      Ok but what software have you built and why is it useful?
      • reinitctxoffset 58 minutes ago
        This is all pre-release but a few highlights:

        - `continuity` is a `lean4` metaprogramming system that we use all kinds of ways but the real meat is it allows formal specification of codecs and state machines in ways that make security and performance properties proof amenable, the key trick here is to limit parser power to just what you need and no more. a very cool thing is that we can add targets to it, so when we do Zig for example, Zig will immediately get proven correct and frontier performance support for dozens of protocols that are not all mature right now.

        - `libevring-cpp` (bound up into `libevring-hs` and `libevring-rs`) is a Trinity-inspired deterministic event replay system that replaces anything you would have done with `libuv` or whatever, and it's wired into `io_uring` (we're stuck with `kqueue` on darwin, eh). it interfaces with `continuity` machines and codecs (which are generated very carefully for the hardware they run on) and we have yet to find a way of measuring such programs where it doesn't resoundingly shatter the performance of any other asynchronous programming primitive in any language. i'm sure the community will prove us wrong when we release it, but it's real fast. and you get much stronger guarantees than in most such systems (Trinity derived, so if you can repro a bug, you can walk the event trace until it's sitting there in GDB and shit)

        - `hyperconsole-cpp` (and `hyperconsole-hs` and `hyperconsole-rs`) is the TUI library idea taken to some deranged extreme on performance and supports everything in `notcurses`, it's pretty wild: https://youtu.be/YqgEtpJ8tGI

        - straylight-nix is a complete rewrite of nix that fixes thousands of bugs, hundreds of them security adjacent and dozens of them we only talk to vendors about. it's daemonless, dramatically faster, ground-up WASM-targeting compiler with a formal grammer, uses an extremely fast LSM-based store (it can read the legacy store but we don't write it) that fixes all the problems in floating CA, IFD is too cheap to care about, and recursive nix is no longer and issue (see daemonless). it supports tearing derivations into `REAPI` actions that you can feed to your friendly native NativeLink or whatever, which just goes through them like a woodchipper. KVM-based sandbox with snapshot and restore, really opens the world up on what your builders can be.

        - `slide` is the reference implementation of a family of protocols called `SIGIL`: `SIGIL-LLM` is a binary encoding for LLM data that resets on ambiguity and drops the average bytes on the wire from e.g. OpenRouter to your harness from ~hundreds to ~1.5 per token, `SIGIL-API` is a bijection on OpenAPI 3.1.0 and AsyncAPI 3.0.0 that gives comparable improvements, and `SIGIL-SH` is such an encoding for a sensible subset of bash. this does about 1.5 billion tokens per second on a laptop and never emits partial frames, so you don't get speculative execution rollback problems in your harness that tilt agents off.

        - `// WEAPON //` is an adversarial, vendor-skeptical, full-take surveilance agemt harness built on `hyperconsole-cpp` and `SIGIL` (so, you could absorb the entire token output of OpenRouter on one machine if you wanted at least on the wire and in the terminal, clearly the bottleneck rapidly becomes whatever the agents are calling, but it's `zmq4` transport underneath `SIGIL` so it's also trivial to full-take all of your data for fine tuning or whatever you want it for into e.g. `parquet` on R2. `// WEAPON //` does a bunch of stuff: the tool call surface is heavily optimized for AST-level edits that miss dramatically less, we intercept and manipulate shell commands (slice off the stupid `| tail -n5` that keeps the droid in a loop not seeing the error, pre-emptively ground using heuristics that have been tuned (defeats the search flinch), and always recovers from any stall, or nag box, or anything else that would serve as an unannounced rate limit, it's fine if vendors rate limit but they need to put it in their ToS. it has a bunch of other primitives, agents run in real KVM sandboxes and speculate out as wide as you want to pay the tokens for. we hyper-manage things like the cache breakpoint geometry of Anthropic so e.g. Opus rarely misses in cache and always hands off edits to specialized tool use models. it's pretty extreme the difference in outcomes relative to all this React jank.

        - `s4` is a general-purpose compiler from most any pytorch 2.0 model to `myelin`-level performance on NVIDIA (we only support NVIDIA Blackwell at the moment, that might change) and it's never worse than `myelin` because if we don't out-tune it we invoke it, but we out tune it a lot because we've proven a lot of decideability theorems about tiling and scheduling on both 1CTA and 2CTA, so we can often arrive at a finite, enumerable set of schedule/tile choices. `myelin` mops up the random garbage around the big GEMMs just like in TRT-LLM.

        - `sigil-trtllm` is inspired by TensorRT-LLM-Edge but designed from the ground up around Mellanox/ConnectX and in particular GPUDirect, so it can stream `SIGIL-LLM` tokens directly onto the wire whereas something like Dynamo is usually traversing both Python and NATS, which is super weird to us. this uses the `s4` compiler very heavily.

        - `straylight-cas` is a geographically distributed content addressable store backed by any R2-compatible (so most any S3-compatible too) object store with multi-level LSM and extreme performance memtables, optimistic hinted handoff over `zmq4` to other geographies, and a really simple operational story, this is kind of the thing that powers the product surface.

        ... which is the thing i'm less ready to talk about because it's supposed to be a surprise.

        • skybrian 46 minutes ago
          It sounds impressive but also rather obscure. Do you have users?
          • reinitctxoffset 36 minutes ago
            No we're pre-alpha, I guess most people would call it stealth but it's more like, not quite done and we don't want to waste people's time because our entire thesis is around correct outcomes in AI systems at a level that permits their use in outcome-critical regimes (we sometimes call it "insurable" AI as a north star, would LLoyds of London or Swiss Re stand behind someone who was writing policies on this?).

            Now a bunch of that is development tooling that copes with agent-scale software development, and a lot of that might become product surface, so we have a lot of usage denominated by like, bytes and agent hours and stuff because we build this stack in itself, but that's somewhat orthogonal to the north star vision.

            We'll make sure to give the HN community the opportunity to see this stuff as early as anyone does if people find it interesting, most of the above will be open source fairly soon. Don't know if it'll make the front page, but the product will be called `ORBITAL`, so if you see that floating around that's us.

            Appreciate the interest!

  • Syzygies 2 hours ago
    I nearly adopted OCaml for my current math research, in no small part because of Jane Street contributions. I instead chose Lean 4, as LLMs became able to help code in Lean. I give up a factor of two in performance (a gap likely to close) for what reads to me as the clearest expression of each algorithm. Lean is a beautiful language that makes OCaml read like Old English.

    Lean is designed to be optionally verified; proof is its primary (but not only) application. It seems an impedance mismatch for Jane Street to explore this direction without also migrating to Lean 4.

  • JacobAsmuth 1 hour ago
    Sounds like Jane Street would be interested in the Lean framework I've been working on for formally verified frontends. https://github.com/JacobAsmuth/qed
  • jp0001 1 hour ago
    Formal methods is like a plan. Everyone has one until they are punched in the face (real world requirements and trust boundaries).
    • Paracompact 1 hour ago
      Formal methods are not meant to replace trust in a system. They are meant to minimize the surface area of trust. To not understand and advertise what surface area still exists is foolhardy, and mistakes logic for magic.
  • eddiepete 6 hours ago
    I wonder how formal methods can help us move faster with GenAI.

    Is it that they can help write formulas faster? That they can help ensure formulas match the system they're modeling faster? If the problem you think formal methods will help with is sloppy code, isn't the verification code going to be sloppy as well, unless some (not sloppy) intelligence carefully confirms that the specification matches the target system, which was the labor that previously made formal methods too expensive? I guess I don't understand how the argument works if code was previously less sloppy and verification was too expensive, and now code is more sloppy, and there's more of it, but somehow the sloppy intelligence will make verification move fast enough to make it worthwhile.

    Unless we have some non-sloppy intelligence that's less of a bottleneck on verification than humans, how are we in a better place?

    Maybe it's that investing that huge amount of labor of verification by human experts is now worth it because so much code will be produced that uses the verification systems that the investment will now pay off. But that requires creating pretty general verification systems, such as type system verification or something (which is what they seem to be aimed at), rather than individually verifying software systems like the micro-kernel example.

    In other words, maybe the play is to invest in reusable verification systems that can be run tons of times on new code and systems. If so, it's surprising that this wasn't always the strategy.

    • jsenn 5 hours ago
      > isn't the verification code going to be sloppy as well

      The beauty of formal methods is it doesn't matter if your proof is sloppy. As long as it passes verification, it is correct. And unlike in pure math, the proof that a software system is correct is usually a huge mess of special cases, loop invariants, proofs by induction, and boilerplate that requires a large amount of human labour while providing no insight.

      Proofs are also brittle: a tiny change in the code can force you to throw your proof away and start from scratch.

      To me, the exciting thing about formal methods in the LLM era is it allows humans to offload the difficult and tedious work of writing proofs to a computer. Taken to an extreme, the human could live entirely in the world of a formal specification, and the LLM could generate 100% of the code. The code may be a mess, but if the system proves it satisfies the spec then it can't be wrong.

      • pron 3 hours ago
        The problem is that generating either code or proofs with LLMs is very expensive, and generating good proofs (I don't mean elegant, I mean proving the most important properties) is probably not very fast, either. Reducing the verification time of a program from 100 years to 10 years or the cost from $1bn to $100m is still not practical enough to become truly mainstream.

        Things can be improved when people help guide and focus the LLMs, but these people still need to be formal methods experts.

      • odyssey7 4 hours ago
        So, formal methods produce runnable systems, but communication remains the challenge.

        If a formal spec is messy, then it's a proof of ... what, exactly?

        A formal specification that bridges tech and product, that lets non-technical contributors read and discuss all the logical nuances, directly as operational code, at product's level of abstraction of interest, would transform a lot.

        It's no longer a challenge to create code, it's a challenge to create business requirements and translate them into systems.

    • jnwatson 5 hours ago
      The general idea is that formal methods are self-verifying, up to a point. Sloppy formal methods simply won't prove.

      The point up to which formal methods stops is: do the formally encoded requirements actually encode what I want to be true?

      One can make the argument that the requirements is a much smaller surface to verify than that of the entire program.

      The counter argument is that figuring out what you want the program to do has always been the hardest part of programming, and that programming in itself is a journey to discover latent requirements.

      • Veserv 4 hours ago
        > One can make the argument that the requirements is a much smaller surface to verify than that of the entire program.

        This argument is unfortunately empirically false for any program of any meaningful complexity (>1000 lines, probably even as low as >100 lines ignoring well-defined algorithms and data structures) using current formal methods.

        Complete formal specifications are usually multiple times larger than the corresponding source code and encode esoteric propertys necessary for the proof, but which are largely even more impenetrable than a undocumented codebase.

        So, it is both harder to figure out if you encoded the desired requirements and it is more complex. Your only advantage is confidence that what you wrote down is proven.

  • pjmlp 4 hours ago
    I am already seeing the uptake on Spec-Driven Development as the Rational Unified Process revenge.
  • bobkb 5 hours ago
    I have been testing formal verification methods with multiple products. It will be great to also understand more about what’s tried and how it was done. For example attempting to verify the spec is what I have been trying to implement.
  • yawaramin 2 hours ago
    Quis custodiet ipsos custodes?
    • Animats 2 hours ago
      The proof checker.

      You verify a proof system by having it produce a proof trace:

      - Step 3185: Apply rule that a * b = b * a to "length * width" in the current formula.

      Then you run a proof trace checker which applies each transformation in sequence and checks that the expected result is obtained.

      Provers are complicated, but proof trace checkers are really dumb.