Fundamentally, it can't be much better than our ability to write the spec and then validate the results.
It's always gonna be a multi-shot process. And it can already write code well enough; that's no longer the bottleneck.
Further, Qwen 27b is such an incredible masterpiece for coding, and it can run on consumer hardware today. Anthropic/OpenAI are gonna give up on coding models very soon; there's not gonna be any money in it when you can run your own local model for significantly less.
Qwen 27b is not SOTA, but the value is insane. You can basically use it for small tasks and then route the harder problems to Opus or Sonnet, and boom, you've saved a lot of money.
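For what that routing can look like in practice, here's a minimal sketch. It assumes a local Qwen model served behind an OpenAI-compatible endpoint (Ollama/llama.cpp style on localhost:11434) and the Anthropic SDK for escalation; the endpoint, model tags, and the manual hard/easy flag are all illustrative assumptions, not vendor defaults:

```python
# Sketch of the routing idea: small prompts go to a local model, flagged-hard ones
# go to a hosted Claude model. Assumptions: a local OpenAI-compatible server
# (e.g., Ollama/llama.cpp) on localhost:11434, the `openai` and `anthropic`
# packages installed, and ANTHROPIC_API_KEY set in the environment.
from openai import OpenAI
from anthropic import Anthropic

local = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")  # local servers ignore the key
claude = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def ask(prompt: str, hard: bool = False) -> str:
    """Route simple tasks to the local model; escalate the hard ones."""
    if not hard:
        resp = local.chat.completions.create(
            model="qwen2.5-coder",  # placeholder tag: whatever model you serve locally
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content
    resp = claude.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder: pick the current Sonnet/Opus model
        max_tokens=2048,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.content[0].text

# Cheap task stays local; gnarly refactor gets escalated.
print(ask("Write a regex that matches ISO 8601 dates."))
print(ask("Untangle the circular imports in this 2k-line module: ...", hard=True))
```

Replacing the manual flag with something automatic (prompt length, file count, or a cheap classifier) is where the real savings show up, but that part is very workload-specific.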
There are probably some respectable workflows that involve an LLM writing most of the code, but AI is still terrible at understanding some critical parts of the problem. You still have to tell it what to write and how it should work, or there are high odds that you'll get a hot mess. And there still needs to be a human who understands everything there and knows how to debug it. For me, the most enjoyable path is to write it myself, because I would rather be involved in writing the code than only involved in reading it. It might not be the fastest path, but it gets the job done for the foreseeable future. I could end up like the Amish, who choose not to use technology developed after a certain point; from what I can tell, they do alright.
Kind of a great video! I enjoyed it. His point about testing coverage and generating mutations to ensure the tests fail resonated. I get concerned sometimes that the AI is writing tests not to ensure the logic is correct, but to ensure the tests pass against the code it already wrote. Any other ideas on this? Is there a code review step or CI checkpoint that would decrease the likelihood of that?
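One concrete guard is exactly the mutation idea from the video: deliberately break the code and fail CI if the AI-written tests stay green. Tools like mutmut automate this for Python; below is a hand-rolled sketch of the principle, where the target path and the single operator flip are illustrative assumptions:

```python
# Sketch of the mutation-testing idea: mutate the code, re-run the suite,
# and treat "tests still green" as a failed quality gate.
# Assumes pytest is installed; the target path and mutation are placeholders.
import shutil, subprocess, sys
from pathlib import Path

TARGET = Path("src/pricing.py")        # hypothetical module under test
BACKUP = TARGET.with_suffix(".py.bak")

def tests_pass() -> bool:
    return subprocess.run(["pytest", "-q"], capture_output=True).returncode == 0

def main() -> int:
    if not tests_pass():
        print("Suite is red before mutating; fix that first.")
        return 1
    shutil.copy(TARGET, BACKUP)
    try:
        source = TARGET.read_text()
        TARGET.write_text(source.replace(" + ", " - ", 1))  # one crude mutant: flip an operator
        if tests_pass():
            print("Mutant survived: the tests never exercise this behavior.")
            return 1
        print("Mutant killed: the tests actually constrain the code.")
        return 0
    finally:
        shutil.move(BACKUP, TARGET)     # always restore the original file

if __name__ == "__main__":
    sys.exit(main())
```

A real mutation tool generates hundreds of mutants and reports a kill rate; the CI checkpoint then becomes "kill rate above N%", which is much harder for write-tests-to-match-the-code behavior to game than plain line coverage.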
"Forty years later, in September of 2018, I started working on this version of Space War. It's an animated GUI driven system with a frame rate of 30fps. It is written entirely in Clojure and uses the Quil shim for the Processing GUI framework." - Robert Martin
It's hard to give up, but likely necessary. That doesn't mean quality has to suffer; we can still gate with deterministic quality tooling where it matters. But yeah, at some scale it stops mattering how human-readable the code is, as long as AI can effectively and efficiently (token-wise) make edits or add features.
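As a sketch of what "deterministic quality tooling" can mean as a gate, assuming a Python repo with ruff, mypy, and pytest installed (swap in your stack's equivalents):

```python
# Sketch of a deterministic pre-merge gate: run linter, formatter check, type
# checker, and tests, and fail the merge if any of them fail.
# Assumes ruff, mypy, and pytest are installed and code lives under src/.
import subprocess, sys

CHECKS = [
    ["ruff", "check", "."],              # lint
    ["ruff", "format", "--check", "."],  # formatting drift
    ["mypy", "src"],                     # type errors
    ["pytest", "-q"],                    # the test suite itself
]

def main() -> int:
    failed = []
    for cmd in CHECKS:
        print(f"$ {' '.join(cmd)}")
        if subprocess.run(cmd).returncode != 0:
            failed.append(cmd[0])
    if failed:
        print(f"Gate failed: {', '.join(failed)}")
        return 1
    print("Gate passed.")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```

The point is that none of this cares who (or what) wrote the diff; the gate is the same for a human and for an agent.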
The point is not human readability, but good structure. Spaghetti code is as bad for an LLM as for a human, because structural complexity and the amount of coupling are fundamental limits, not human-specific.
Here's how history rhymes with this logic. The development of compilers v writing assembly language was not without a very similar "controversy" — that is, are the new tools more efficient or less efficient.
The first compilers were measured relative to hand-tuned assembly language efficiency. The existing world of compute was very much "compute bound" and inefficient code was being chased out of every system.
The introduction of the first compilers generally delivered code "within 10-30%" of the efficiency of standard professional assembly. This "benchmark" was enough for almost a generation of Fortran programmers to dismiss the capabilities of compilers.
Also worth noting, early compilers (all through the 1980s) routinely had bugs that generated incorrect code. Debugging a compiler is a nightmare (personal experience). This only provided more "ammo."
With the arrival of COBOL the debate started to shift. COBOL generated decidedly "bloated" code, so there was no way to win the efficiency argument. But what people started to realize was that a "modern" programming language made it possible to deliver vastly more software and for many more people to work on the same code (ASM is notorious for being challenging for multiple engineers working on the same portion of code). So the metric slowly started to move from "as good as hand-tuned assembler" to "able to write bigger, more sophisticated code in less time with more people." Computers gained timesharing, more memory, and faster CPUs, which made the efficiency argument far less compelling (only to repeat with the first 8K or 64K PCs).
This entire transition is capped off with a description in Fred Brooks' "Mythical Man-Month," one of the seminal books in the field of programming and a standard-issue book that was sitting in my office waiting for me on my first day at Microsoft. (See the full book free here: https://web.eecs.umich.edu/~weimerw/2018-481/readings/mythic...)
It is very early. I was not a programmer when the above happened, though I did join the professional ranks while many still held these beliefs. For example, I interned writing COBOL on mainframes while PCs were using C and Pascal, which were buggy and viewed as inefficient on processor- and space-constrained machines.
The debate would continue with C++, garbage collection, interpreted v compiled (Visual Basic), and more. As a fairly consistent observation over decades, every new tool is at first viewed by experienced programmers through the lens of what it does worse, while new programmers use the tool and operate in a new context (e.g., "more software" or "bigger projects"). The excerpt below shows this debate as captured in 1972.
For all of LLMs' flaws, if they kill the whole Agile/SCRUM/whatever grift, it will have been worth it. The damage these guys have done to the software industry at large is unfathomable.
This. Uncle Bob was already over, and now he seems to be hitting the skids REAL bad. Just listening to him is tough: this guy's bad news; I didn't realize he was this bad off.
Wealthy white dude edging towards senility taking a liking to bathrobe social media shorts. Take a guess. It's going to involve a political party and a lot of weird public takes unrelated to software.
I use Claude code and codex daily. They have become an integral part of my workflow.
There is no task that takes me a day that they can complete in five minutes.
Even with the lightning fast progress being made, it looks like LLMs are a decade or more away from being that good.
If AI can do your job for you, you should be the first to know. Just try it and see!
Five minutes is pushing it, but 15 minutes? Absolutely.
https://blog.cleancoder.com/uncle-bob/2021/11/28/Spacewar.ht...
https://x.com/stevesi/status/2050325415793951124
But I found myself laughing at the style; just ranting about software like a cartoon villain in his bathrobe. No fucks given.
It comes as no surprise to me that the guy who has bad opinions about software architecture has worse opinions about vibe coding.