Provenance Is the New Version Control

(aicoding.leaflet.pub)

34 points | by gpi 4 hours ago

15 comments

  • hnlmorg 23 minutes ago
    Code still matters in the world of LLMs because they’re not deterministic and different LLMs produce different output too. So you cannot pin specification to application output in the way the article implies.

    What the author actually wants is ADRs: https://github.com/joelparkerhenderson/architecture-decision...

    That’s a way of being able to version control requirements.
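
    For anyone who hasn't seen one, a minimal ADR in the common Nygard template looks something like this (contents hypothetical):

    ```
    # 12. Trim surrounding whitespace in email input

    Status: Accepted

    Context: Users paste addresses copied from spreadsheets, which adds
    stray whitespace; downstream systems expect normalized addresses.

    Decision: Trim leading/trailing whitespace before validation rather
    than rejecting the input outright.

    Consequences: Validation is more forgiving, but the raw input is no
    longer preserved byte-for-byte.
    ```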

  • gritzko 1 hour ago
    LLMs can implement red-black trees with impressive speed, quality, and even some level of determinism; that far, I buy the argument. But once we ask for something that is not already on GitHub in a thousand different flavors, it becomes an adventure. A real adventure.

    So what did you say about version control?

    • nine_k 1 hour ago
      Basically, if it's in the commit history, it can be checked out and adjusted to the local circumstances. If not, then somebody has to actually write it!
  • RHSeeger 1 hour ago
    I'm a bit confused by this because a given set of inputs can produce a different output, and different behaviors, each time it is run through the AI.

    > By regenerable, I mean: if you delete a component, you can recreate it from stored intent (requirements, constraints, and decisions) with the same behavior and integration guarantees.

    That statement just isn't true. And, as such, you need to keep track of the end result... _what_ was generated. The why is also important, but not sufficient.

    Also, and unrelated, the "reject whitespace" part bothered me. It's perfectly acceptable to have whitespace in an email address (RFC 5322 allows a quoted local part such as "john smith"@example.com).

    • onion2k 1 hour ago
      > I'm a bit confused by this because a given set of inputs can produce a different output, and different behaviors, each time it is run through the AI.

      How different the output is each time you generate something from an LLM is a property called 'prompt adherence'. It's not really a big deal in coding LLMs, but in image generation some of the newer models (Z Image Turbo for example) give virtually the same output every time if the prompt doesn't change. To the point where some users claim it's actually a problem because most of the time you want some variety in image gen. It should be possible to tune a coding LLM to give the same response every time.
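
      A sketch of what that tuning looks like in practice, assuming the OpenAI Python SDK (its `seed` parameter is documented as best-effort reproducibility, not a guarantee):

      ```python
      # Sketch: make completions as repeatable as the API allows.
      from openai import OpenAI

      client = OpenAI()
      resp = client.chat.completions.create(
          model="gpt-4o",          # pin an exact model version in practice
          messages=[{"role": "user", "content": "Write an email validator"}],
          temperature=0,           # greedy decoding removes sampling noise
          seed=42,                 # best-effort reproducibility across runs
      )
      print(resp.choices[0].message.content)
      ```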

      • gizmo686 1 hour ago
        Even if you have deterministic LLMs (which is absolutely something that can be done), you still need to pin a specific model version to get that. That might work in the short term, but 10 years from now you're not going to want to be using a model from today.
        • nextaccountic 58 minutes ago
          > Even if you have deterministic LLMs (which is absolutely something that can be done),

          Note: when Fabrice Bellard built his LLM-based text compressor, he had to make sure it was deterministic. It would be terrible if it slightly corrupted files in different ways each time it decompressed.
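
          Why determinism is non-negotiable there: compressor and decompressor must derive bit-identical predictions from the same context, or decoding silently diverges. A toy Python illustration of the shared-model idea (rank coding, not Bellard's actual scheme):

          ```python
          # Toy "compress with a shared predictive model": the encoder emits
          # each character's rank under the model; the decoder inverts it. If
          # the two sides' predictions differ at all, output silently diverges.
          def predict_ranking(context: str, alphabet: str) -> list[str]:
              # Stand-in for an LLM: deterministic ranking by frequency so far.
              freq = {c: context.count(c) for c in alphabet}
              return sorted(alphabet, key=lambda c: (-freq[c], c))

          def encode(text: str, alphabet: str) -> list[int]:
              ranks, context = [], ""
              for ch in text:
                  ranks.append(predict_ranking(context, alphabet).index(ch))
                  context += ch
              return ranks  # mostly small numbers if the model predicts well

          def decode(ranks: list[int], alphabet: str) -> str:
              context = ""
              for r in ranks:
                  context += predict_ranking(context, alphabet)[r]
              return context

          assert decode(encode("banana bandana", "abdn "), "abdn ") == "banana bandana"
          ```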

      • belZaah 43 minutes ago
        If that is true, and a given history of prompts combined with a given model always gives the same code, then you have invented what’s called a compiler: take human-readable text and convert it into machine code. Which means we have a much higher-level language than before, and your prompts become your code.
      • locknitpicker 17 minutes ago
        > How different the output is each time you generate something from an LLM is a property called 'prompt adherence'. It's not really a big deal in coding LLMs, (...)

        I strongly disagree. Nowadays most LLMs support updating context with chat history, which means the output of an LLM is influenced by whatever prompts you have previously fed it. You can see glaring changes in what a coding agent does based on which topics you have researched.

        To take the example a step further, some LLMs even update their system prompts with context such as where you are in the world at that precise moment and the time of year. Once, ChatGPT generated a complete example project for me based around an event taking place in a city I happened to be passing through at the time.

  • alphabetag675 44 minutes ago
    If you could regenerate code from another representation in a deterministic manner, then congrats: you have developed a compiler and a high-level language.
  • viraptor 2 hours ago
    I'm not sure this actually needs a new system. Git commits have the message, arbitrary trailers, and note objects. If this sort of source control is useful, I'm sure it could be prototyped on top of git first.
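
    For instance, intent could ride along as commit trailers today (the trailer keys below are hypothetical; `git interpret-trailers` and `git log --format='%(trailers)'` can already extract them):

    ```
    Add email validator

    Trim surrounding whitespace rather than rejecting it; see the linked
    requirement for the reasoning.

    Intent: normalize addresses pasted from spreadsheets
    Requirement: REQ-142
    Generated-By: coding-agent v3, prompt hash 1a2b3c
    ```
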
    • smaudet 1 hour ago
      The article smacks of someone who doesn't understand version control at all...

      Their main idea is to version control the reasoning, which, OK, cool. They want to graph the reasoning and requirements, sounds nice, but there are graph languages that fit conveniently into git to achieve this already...

      I also fundamentally disagree with the notion that the code is "just an artifact". The idea to specify a model is cute, but these are nondeterministic systems that don't produce reliable output. A compiler may have bugs, yes, but generally speaking the same code will always produce the same machine instructions, which is something the proposed scheme does not offer...

      A higher order reasoning language is not unreasonable, however the imagined system does not yet exist...

  • elzbardico 16 minutes ago
    I am exhausted by this ThoughtWorks style of writing. I can smell it from a mile away.
  • ricksunny 28 minutes ago
    “ the code itself becomes an artifact of synthesis, not the locus of intent.”

    would not be unfamiliar to mechanical engineers who work with CAD. The ‘Histories’ (successive line-by-line drawing operations: align to a spline of such-and-such dimensions, put a bevel here, put a hole there) in many CAD tools are known to be a reflection of design intent more so than the final 3D model that the operations ultimately produce.

    • crote 3 minutes ago
      CAD tools also really don't like changes in the history. A tiny change in one step can corrupt the entire model, because a subsequent step can no longer properly "attach" to a reference point which no longer exists.

      Fixing this in CAD is already a massive pain, fixing it with black-box LLMs sounds nearly impossible.

  • mmoustafa 39 minutes ago
    I wrote an article on this exact issue (albeit a more simple-minded take) and suggested a rudimentary way of tracking provenance in today's agents: "reasoning traces" on the objects they modify.

    Would love people's thoughts on this: https://0xmmo.notion.site/Preventing-agent-doom-loops-with-p...

  • d--b 3 minutes ago
    I found it quite insightful.

    Looking at individual line changes produced by AI is definitely difficult. And moving version control one step up in abstraction makes sense.

    We're not really there yet though, as the generated code currently still needs a lot of human checks.

    Side thoughts: this requires the code to be modularized really well. It makes me think that when designing a system, you could imagine a world where multiple agents discuss changes. Each agent would be responsible for a sub system (component, service, module, function), and they would chat about the format of the api that works best for all agents, etc. It would be like SmallTalk at the agent level.

  • jayd16 2 hours ago
    What if I told you a specification can also be measured (and source controlled) in lines?
    • JellyBeanThief 1 hour ago
      This was the very first thing I thought when I was taught about requirement traceability matrices in uni. I was like "Ew, why is this happening in an Excel silo?" I had already known about ways of adding metadata to code in Java and C#, so I expected everything to be done in plain text formats so that tooling could provide information like "If you touch this function, you may impact these requirements and these user stories." or "If you change this function's signature, you will break contracts with these other team members (here's their email)."
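
      The same idea sketched in Python (hypothetical decorator and registry; Java/C# annotations would carry the same metadata):

      ```python
      # Hypothetical sketch: attach requirement IDs to functions so tooling
      # can answer "if you touch this function, which requirements are hit?"
      REQUIREMENT_INDEX: dict[str, list[str]] = {}

      def requirement(*ids: str):
          def mark(fn):
              fn.__requirements__ = list(ids)
              for rid in ids:
                  REQUIREMENT_INDEX.setdefault(rid, []).append(fn.__qualname__)
              return fn
          return mark

      @requirement("REQ-142", "US-88")
      def normalize_email(raw: str) -> str:
          return raw.strip().lower()

      # A CI step could diff changed functions against REQUIREMENT_INDEX and
      # warn: "touching normalize_email may impact REQ-142 and US-88".
      ```
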
  • klodolph 1 hour ago
    > Once an AI can reliably regenerate an implementation from specification…

    I’m sorry but it feels like I got hit in the head when I read this, it’s so bad. For decades, people have been dreaming of making software where you can just write the specification and don’t have to actually get your hands dirty with implementation.

    1. AI doesn’t solve that problem.

    2. If it did, then the specification would be the code.

    Diffs of pure code never really represented the decisions and reasoning of humans very well in the first place. We always had human programmers who would check in code that just did stuff, without really explaining what the code was supposed to do, what properties it was supposed to have, why the author chose to write it that way, etc.

    AI doesn’t change that. It just introduces new systems which can, like humans, write unexplained, shitty code. Your review process is supposed to catch this. You just need more review now, compared to previously.

    You capture decisions and specifications in the comments, test cases, documentation, etc. Yeah, it can be a bit messy because your specifications aren’t captured nice and neat as the only thing in your code base. But this is because that futuristic, Star Trek dream of just giving the computer broad, high-level directives is still a dream. The AI does not reliably reimplement specifications, so we check in the output.

    The compiler, by contrast, does reliably regenerate functionally identical assembly, which is why we don’t check in the assembly output of compilers. Compilers are getting higher and higher level, and we’re getting a broader range of compiler tools to work with, but AI is just a different category of tool and we work with it differently.

    • charcircuit 1 hour ago
      >If it did, then the specification would be the code.

      Except you can't run English on your computer. Also, the specification can be spread out through various parts of the code base or internal wikis. The beauty of AI is that it is connected to all of this data, so it can figure out the best way to currently implement something, as opposed to regular code, which requires constant maintenance to stay current.

      At least for the purposes I need it for, I have found it reliable enough to generate correct code each time.

      • free_bip 1 hour ago
        What do you mean? I can run English on my computer. There are multiple apps out there that will let me type "delete all files starting with 'hacker'" into the terminal and end up with the correct end result.

        And before you say "that's indirect!", it genuinely does not matter how indirect the execution is or how many "translation layers" there are. Python, for example, goes through at least three translation layers: raw .py -> Python bytecode -> bytecode interpreter -> machine code. Adding one more automated translation layer does not suddenly make it "not code."
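
        Python even exposes one of those layers directly: the standard library's `dis` module prints the bytecode a function compiles to before the interpreter executes it:

        ```python
        import dis

        def keep_non_matching(names: list[str], prefix: str) -> list[str]:
            return [n for n in names if not n.startswith(prefix)]

        # Shows the bytecode layer sitting between .py source and machine code.
        dis.dis(keep_non_matching)
        ```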

        • charcircuit 38 minutes ago
          I mean that the prompt is not like code. It's not a set of instructions that encodes what the computer will do. It includes instructions for how an AI can create the necessary code. Just because a specification is "translated" into code, that doesn't mean the input is necessarily code.
          • yaris 18 minutes ago
            What is conceptually different between prompts and code? Code is also not always literally what the computer will do; declarative programming languages are an example. The only difference I see is that special precautions must be taken to get deterministic output from AI, but that’s doable.
      • alphabetag675 42 minutes ago
        As long as your language is good enough to generate correct code at any point, it is a specification. If not, it is an ambiguous approximation.
  • sebaschi 2 minutes ago
    This style of writing is insufferable (to me). The idea is also not as deep as it may seem based on the language used. I also don’t think it’s strictly valid, i.e. that version control somehow needs to be adjusted for AI.
  • atoav 40 minutes ago
    So what they want is to write a spec with business rules and implementation details, and version control that instead?
  • akoboldfrying 1 hour ago
    Yes, in theory you can represent every development state as a node in a DAG labelled with "natural language instructions" to be appended to the LLM context, hash each of the nodes, and have each node additionally point to an (also hashed) filesystem state that represents the outcome of running an agent with those instructions on the (outcome code + LLM context)s of all its parents (combined in some unambiguous way for nodes with multiple in-edges).
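
    A minimal sketch of that hashing scheme (hypothetical, git-style content addressing):

    ```python
    # Each node id commits to the instructions, the parents' ids, and the
    # hash of the file tree the agent produced, making history tamper-evident.
    import hashlib, json

    def h(obj) -> str:
        return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()

    def make_node(instructions: str, parents: list[str], tree_hash: str) -> dict:
        node = {
            "instructions": instructions,  # appended to the LLM context
            "parents": sorted(parents),    # unambiguous order for multiple in-edges
            "tree": tree_hash,             # hashed filesystem state after the run
        }
        return {"id": h(node), **node}

    root = make_node("Create an email validator", [], h("tree after step 1"))
    child = make_node("Trim surrounding whitespace", [root["id"]], h("tree after step 2"))
    ```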

    The only practical obstacle is:

    > Non-deterministic generators may produce different code from identical intent graphs.

    This would not be an obstacle if you restricted yourself to a single version of a local LLM, turned off all nondeterminism, and recorded the initial seed. But for now, the kinds of frontier LLMs that are useful as coding agents run on Someone Else's box, meaning they can produce different outcomes each time you run them -- and even if the providers promise not to change the models, I can see no way to verify this promise.

  • hekkle 1 hour ago
    TL;DR, the author claims that you should record the reasons for change, rather than the code changes themselves...

    CONGRATULATIONS: you have just 'invented' documentation, specifically a CHANGE_LOG.