AI helps ship faster but it produces 1.7× more bugs

(coderabbit.ai)

53 points | by birdculture 2 hours ago

25 comments

  • tyleo 1 hour ago
    I have a theory that vibe coding existed before AI.

    I’ve worked with plenty of developers who are happy to slam null checks everywhere to solve NREs with no thought to why the object is null, should it even be null here, etc. There’s just a vibe that the null check works and solves the problem at hand.

    I actually think a few folks like this can be valuable around the edges of software but whole systems built like this are a nightmare to work on. IMO AI vibe coding is an accelerant on this style of not knowing why something works but seeing what you want on the screen.
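
    To make it concrete, a hypothetical sketch of the pattern (TypeScript, names made up):

      interface Order {
        id: string;
        customer: { email: string } | null;
      }
      declare function sendConfirmation(email: string): void;

      // "Vibe" fix: the crash goes away, but nobody asks why the
      // order has no customer in the first place.
      function shipOrder(order: Order) {
        if (order.customer != null) { // slammed-in null check
          sendConfirmation(order.customer.email);
        }
        // order silently ships with no confirmation
      }

      // The question the null check skips: should customer ever be
      // null here? If not, fail loudly at the boundary instead.
      function shipOrderStrict(order: Order) {
        if (order.customer == null) {
          throw new Error(`order ${order.id} has no customer - bad upstream data`);
        }
        sendConfirmation(order.customer.email);
      }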

    • jmkni 1 hour ago
      Blindly copying and pasting from StackOverflow until it kinda sorta works is basically vibe coding

      AI just automates that

      • giantg2 1 hour ago
        Yeah, but you had to integrate it until it at least compiled, which kind of made people think about what they're pasting.

        I had a peer who suddenly started completing more stories for a month or two, when our output had been largely equal before. They got promoted over me. I reviewed one of their PRs... what a mess. They were supposed to implement caching. Their first attempt created the cache but never stored anything in it. Their next attempt stored the data in the cache, but never looked at the cache - always retrieving from the API. They deleted that PR to hide their incompetence and opened a new one that was finally right. They were just blindly using AI to crank out their stories.
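
        Roughly the shape of it, reconstructed from memory rather than their actual code (TypeScript sketch, names invented):

          interface User { id: string; name: string }
          declare function fetchUserFromApi(id: string): Promise<User>;

          const cache = new Map<string, User>();

          // Attempt 1: the cache exists but nothing is ever stored in it.
          async function getUserV1(id: string): Promise<User> {
            const hit = cache.get(id); // always undefined
            if (hit) return hit;
            return fetchUserFromApi(id); // result never cached
          }

          // Attempt 2: stores results but never checks for a hit.
          async function getUserV2(id: string): Promise<User> {
            const user = await fetchUserFromApi(id); // always hits the API
            cache.set(id, user);
            return user;
          }

          // What the story called for: check, then fill.
          async function getUser(id: string): Promise<User> {
            const hit = cache.get(id);
            if (hit) return hit;
            const user = await fetchUserFromApi(id);
            cache.set(id, user);
            return user;
          }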

        That team had something like 40% of capacity being spent on tech debt, rework, and bug fixes. The leadership wanted speed above all else. They even tried to fire me because they thought I was slow, even though I was doing as much or more work than my peers.

        • skydhash 12 minutes ago
          > Yeah, but you had to integrate it until it at least compiled, which kind of made people think about what they're pasting

          That’s a very low bar. It’s easy to get a program to compile. And if it’s interpreted, you can coast for months with no crashes, just corrupted state.

          The issue is not that they can’t code, it’s that they can’t problem solve and can’t design.

    • palmotea 1 hour ago
      > I actually think a few folks like this can be valuable around the edges of software but whole systems built like this are a nightmare to work on. IMO AI vibe coding is an accelerant on this style of not knowing why something works but seeing what you want on the screen.

      I would correct that: it's not an accelerant of "seeing what you want on the screen," it's an accelerant of "seeing something on the screen."

      [Hey guys, that's a non-LLM "it's not X, it's Y"!]

      Things like habitual, unthoughtful null-checks are a recipe for subtle data errors that are extremely hard to fix because they only get noticed far away (in time and space) from the actual root cause.

    • zipy124 1 hour ago
      I agree, but I'd draw a different comparison: vibe coding has accelerated the type of developer who relied on Stack Overflow to solve all their problems, the kind of dev who doesn't try to solve problems themselves. It has just accelerated that style of working, but is less reliable than before.
    • whynotmaybe 1 hour ago
      "on error resume next" has been the first line of many vba scripts for years
      • eterm 1 hour ago
        I caught Claude trying to sneak the equivalent into a CI script yesterday, as I was wrangling how to run framework and dotnet tests next to each other without slowing down the framework tests horrendously.

        It tried to sneak in changing the CI build script to proceed to next step on failure.

        It's a bold approach, I'll give it that.

    • jerf 1 hour ago
      One of my frustrations with AI, and one of the reasons I've settled into a tab-complete based usage of it for a lot of things, is precisely that the "middle-of-the-road" code style it has picked up from all the code it has ingested produces a lot of things I consider errors in the language I'm using. For instance, I use a policy of "if you don't create invalid data, you won't have to deal with invalid data" [1], but I have to fight the AI on that all the time because it is a routine mistake programmers make and the AI makes the same mistake repeatedly. I have to fight the AI to properly create types [2] because it just wants to slam everything out as base strings and integers, and inline all manipulations on the spot (repeatedly, if necessary) rather than define methods... at all, let alone correctly use methods to maintain invariants. (I've seen it make methods on some occasions. I've never seen it correctly define invariants with methods.)

      Using tab complete gives me the chance to generate a few lines of a solution, then stop it, correct the architectural mistakes it is making, and then move on.

      To AI's credit, once corrected, it is reasonably good at using the correct approach. I would like to be able to prompt the tab completion better, and the IDEs could stand to feed the tab completion code more information from the LSP about available methods and their arguments and such, but that's a transient feature issue rather than a fundamental problem. Which is also a reason I fight the AI on this matter rather than just sitting back: In the end, AI benefits from well-organized code too. They are not infinite, they will never be infinite, and while code optimized for AI and code optimized for humans will probably never quite be the same, they are at least correlated enough that it's still worth fighting the AI tendency to spew code out that spends code quality without investing in it.
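
      A rough TypeScript sketch of the "types maintain invariants" idea (illustrative only, not taken from the posts below):

        // What the AI tends to produce: base strings, validated (or not)
        // inline at every call site.
        function sendInvoice(to: string) { /* ... */ }

        // A type that carries its invariant: the only way to get an
        // EmailAddress is through the constructor that checks it.
        type EmailAddress = string & { readonly __brand: "EmailAddress" };

        function parseEmailAddress(raw: string): EmailAddress {
          if (!/^[^@\s]+@[^@\s]+$/.test(raw)) {
            throw new Error(`not an email address: ${raw}`);
          }
          return raw as EmailAddress;
        }

        // Downstream code never re-validates; the type is the assertion.
        function sendInvoiceSafe(to: EmailAddress) { /* ... */ }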

      [1]: Which is less trivial than it sounds and violated by programmers on a routine basis: https://jerf.org/iri/post/2025/fp_lessons_half_constructed_o...

      [2]: https://jerf.org/iri/post/2025/fp_lessons_types_as_assertion...

      • tyleo 1 hour ago
        This is close to my approach. I love copilot intellisense at GitHub’s entry tier because I can accept/reject on the line level.

        I barely ever use AI code gen at the file level.

        Other uses I’ve gotten are:

        1. It’s a great replacement for search in many cases

        2. I have used it to fully generate bash functions and regexes. I think it’s useful here because the languages are dense and esoteric. So most of my time is remembering syntax. I don’t have it generate pipelines of scripts though.
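
        E.g. the kind of dense pattern I mean (made-up example), where my time would otherwise go to recalling syntax:

          // Matches ISO-8601 timestamps like 2024-03-05T12:30:00Z or
          // 2024-03-05T12:30:00.123+02:00.
          const ISO_8601 =
            /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})$/;

          console.log(ISO_8601.test("2024-03-05T12:30:00Z")); // true
          console.log(ISO_8601.test("2024-03-05 12:30:00"));  // false (no T or zone)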

    • eurekin 1 hour ago
      "ship fast, break things"
  • cmiles8 1 hour ago
    There are certainly some valid criticisms of vibe coding. That said, it’s not like the quality of most code was amazing before AI came along. In fact, most code is generally pretty terrible and took far too long for teams to ship.

    Many folks would say that if shipping faster allows for faster iterations on an idea, then the silly errors are worth it. I’ve certainly seen a sharp increase in execs calling BS on dev teams saying they need months to develop some basic thing.

    • coliveira 1 hour ago
      When a team says that a "trivial" feature takes months to ship, it is not because of the complexity of the algorithm. It's because of the infrastructure and coordination work required for the feature to work properly. It is almost always a failure of the technical infrastructure previously created in the company. An AI will solve the trivial aspects of the problem, not the real problem.
      • dj_gitmo 12 minutes ago
        > It is almost always a failure of the technical infrastructure previously created in the company. An AI will solve the trivial aspects of the problem, not the real problem.

        This is so true. Software that should be simple can become so gnarly because of bad infra. For example, our CI/CD team couldn't get updated versions of Python onto the CI machines, so suddenly we needed to start using Docker for what should be very simple software. That's just one example, but you get the idea, and problems like that compound over the years.

        You really want good people with sharp elbows laying the foundations. At one time I resented people like that, but now I have seen what happens when you don't have anyone like that making technical decisions.

    • tyleo 1 hour ago
      I think you need a balance. I’ve seen products fall apart due to high error rate.

      I like to think in terms of intentionalists (people who want to understand systems) and vibe coders (people who just want things to work on screen expediently).

      I think success requires a balance of both. The current problem I see with AI is that it accelerates the vibe part more than the intentionalist part and throws the system out of balance.

      • cmiles8 1 hour ago
        Don’t disagree… I think it’s just applying a lot more pressure on dev teams to do things faster though. Devs tend to be expensive and expectations on productivity have increased dramatically.

        Nobody wants teams to ship crap, but also folks are increasingly questioning why a bit of final polishing takes so long.

    • jmathai 1 hour ago
      More important than code quality is a joint understanding of the business problem and the technical solution for it. Today, that understanding is spread across multiple parties (eng, pm, etc).

      Code quality can be poor as long as someone understands the tradeoffs for why it's poor.

    • WhyOhWhyQ 1 hour ago
      And you think people who don't understand the software telling people who do that they're doing it wrong is an outright positive?
    • Aurornis 54 minutes ago
      > I’ve certainly seen a sharp increase in execs calling BS on dev teams saying they need months to develop some basic thing.

      Some of the teams I worked with in the years right before AI coding went mainstream had become really terrible about this. They would spend months forming committees, writing documents, getting sign-offs and approvals, creating Gantt charts, and having recurring meetings for the simplest requests.

      Before I left, they were 3 months deep into meetings about setting up role based access control on a simple internal CRUD app with a couple thousand users. We needed about 2-3 roles. They were into pros and cons lists for every library and solution they found, with one of the front runners involving a lot of custom development for some reason.

      Yet the entire problem could have been solved with 3 Boolean columns in the database for the 3 different roles. Any developer could have done it in an afternoon, but they were stuck in a mindset of making a big production out of the process.
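
      Something on the order of this (hypothetical sketch, not their actual schema):

        // Three boolean role flags on the user record - the entire "RBAC system".
        interface AppUser {
          id: number;
          email: string;
          isAdmin: boolean;
          isEditor: boolean;
          isViewer: boolean;
        }

        function canEdit(user: AppUser): boolean {
          return user.isAdmin || user.isEditor;
        }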

      I feel like LLMs are good at getting those easy solutions done. If the company really only needs a simple change, having an LLM break free from the molasses of devs who complicate everything is a breath of fresh air.

      On the other hand, if the company had an actual complicated need with numerous and changing roles over time, the simple Boolean column approach would have been a bad idea. Having people who know when to use each solution is the real key.

  • sailfast 5 minutes ago
    How many more bugs does it produce if we use CodeRabbit to review PRs? I assume the number will be lower? (Asking seriously, and hoping the product would help or would’ve caught the bugs, while also pointing out that the natural conclusion of the article is to purchase your service :) )
  • bogzz 1 hour ago
    oh wow, an LLM-based company with an article that claims AI is oddly not as bad when it comes to generating gobbledegook as everyday empirical evidence should suggest
    • jjmarr 1 hour ago
      Coderabbit is an LLM code review company so their incentives are the opposite. AI is terrible and you need more AI to review it.

        fwiw, I agree. LLM-powered code review is a lifesaver. I don't use Coderabbit, but all of my PRs go through Copilot before another human looks at them. It's almost always right.

      • elktown 54 minutes ago
        Your comment history suggests a pro-AI bias on par with the AI companies'. I don't understand it. It seems like critical thinking, nuance, and just basic caution have been turned off like a light switch for far too many people.
        • naasking 17 minutes ago
          Our industry never exhibited an abundance of caution, but if you have trouble understanding the value of AI here, consider that you are akin to an assembly language programmer in the 1970s or 80s who couldn't understand why people were so gung-ho about these compilers that just output worse code than they could write by hand. In retrospect, compilers only got better and better, familiarity with programming languages and compilation toolchains became a valuable productivity skill, and the market for assembly language programming either stagnated or shrank.

          Doesn't it seem plausible to you that, whatever the ratio of bugs in AI-generated code today, that bug count is only going to really go down? Doesn't it then seem reasonable to say that programmers should start familiarizing themselves with these new tools, where the pitfalls are and how to avoid them?

          • bogzz 5 minutes ago
            compilers aren't probabilistic models though
      • bpicolo 1 hour ago
        Their incentives are perfectly aligned - you’re making more bugs, surely you need some AI code review to help prevent that.

        It’s literally right at the end of their recommendations list in the article

        • jjmarr 1 hour ago
          The original comment said:

          > an article that claims AI is oddly not as bad when it comes to generating gobbledegook

          Ironically, Coderabbit wants you to believe AI is worse at generating gobbledegook.

  • bodge5000 1 hour ago
    As has already been said, we've been here before. I could ship significantly faster if I ignored any error handling or edge cases and basically just assumed the data would flow 100% how I expect it to all the time. Of course that is almost never the case, so I'd end up with more bugs.

    I'd like to say that AI just takes this to an extreme, but I'm not even sure about that. I think it could produce more code and more bugs than I could in the same amount of time, but not significantly more than if I just gave up on caring about anything.

  • 0x3f 1 hour ago
    At best this would be 1.7x more _discovered_ bugs. The average PR (IMO) is hardly checked. AI could have 10x as many real issues on PRs, but we're just bad at reviewing PRs.
  • kristopherleads 9 minutes ago
    I really think the answer here is human-in-the-loop. Too many people are thinking that AI is a full-on drop-in replacement for engineers or managers, but ultimately having it be an augment is where the magic is. I work at FlowFuse so I'm super biased, but that's something I've really enjoyed with our MCP and Expert Assistant - it's built to help you, not to replace you, so you can ask questions, get insights, etc. faster.

    I suppose the tl;dr is if you're generating bugs in your flow and they make it to prod, it's not a tool problem - it's a cultural one.

  • yomismoaqui 1 hour ago
    Agentic AI coding is a tool, you can use it wrong.

    To give an example of how to use AI successfully check the following post:

    https://friendlybit.com/python/writing-justhtml-with-coding-...

  • nerdjon 1 hour ago
    Something I have been very curious about for some time now: we know the quality of the code is not very high and that it has a high likelihood of bugs.

    But assuming there are no bugs and the code ships: has there been any study of resource usage creeping up, and of the impact of this on a whole system? In the tests I have done trying to build things with AI, it always seems like there is zero efficiency unless you notice the problem and can point the model in the right direction.

    I have been curious about the impact this will have on general computing as more low quality code makes it into applications we use every day.

  • lherron 1 hour ago
    They buried the lede. The last half of the article, with ways to ground your dev environment to reduce the most common issues, should be its own article. (However, implementing the proper techniques somewhat obviates the need for CodeRabbit, so I guess it’s understandable.)
  • windex 1 hour ago
    I think devs have now split into two camps, the kvetchers and the shippers. It's a new tool, it's fresh. Things will work themselves out over the next couple of years/months(?). The kvetching helps keep AI research focused on the problem, which is good. Meanwhile, continue to ship.
  • strangescript 1 hour ago
    Do they consider code readability, formatting, and variable naming as "errors" for the overall count? That seems dubious given where we are headed.

    No one cares what a compiler or js minifier names its variables in its output.

    Yes, if you don't believe we will ever get there, then this is a totally valid complaint. You are also wrong about the future.

    • oblio 1 hour ago
      The "future" is a really long time.

      I'll take the other side of your bet for the next 10 years but I won't take it for the next 30 years.

      In that spirit, I want my fusion reactor and my flying car.

      • strangescript 1 hour ago
        If your outlook is 10 years then for sure, it's valid. I am not sure how you come to that conclusion logically though. At the beginning of the year we had 0 code agents. Now we have dozens, some basically free (of varying degrees of quality, sure).

        The last 2-3 months of releases have been an unprecedented whirlwind. Code writing will be solved by the end of 2026. Architecture, maybe not, but formatting issues aren't architecture.

        • bopbopbop7 1 hour ago
          Code writing was solved in 1997 when Dreamweaver was released.
        • oblio 1 hour ago
          It's similar with every technology; there's a reason we have sigmoids.

          In 1960 they were planning nuclear powered cars and nuclear mortars.

  • phartenfeller 1 hour ago
    Definitely. But AI can also generate unit tests.

    You have to be careful to tell the LLM exactly what to test for and to manually check the whole suite of tests. But overall it makes me feel way more confident over an increasing amount of generated code. This of course decreases the productivity gains but is necessary in my opinion.

    And linters help.

    • stuaxo 1 hour ago
      It doesn't generate good tests by default though.

      I worked on a team where we had someone come in and help us improve our tests a lot.

        The default LLM-generated tests are a bit like the ones I wrote before that experience.

      • dnautics 1 hour ago
        this is solvable by prompting and giving good examples?
    • SketchySeaBeast 1 hour ago
      I've been using Claude Sonnet 4.5 lately and I've noticed a tendency for it to create tests that prove themselves. Rather than calling the function we're hoping to test, it re-implements the code in the test and then tests it there. It's still helpful, and it usually works very well if you have well-defined inputs and outputs. I much prefer it over writing tests manually, but you have to be very careful.
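
      A distilled example of the failure mode (made up, but representative; uses Jest):

        import { expect, test } from "@jest/globals";

        // Function under test.
        function applyDiscount(price: number, pct: number): number {
          return price * (1 - pct / 100);
        }

        // What it tends to write: the test re-derives the answer with the
        // same formula, so it can never disagree with the implementation.
        test("applyDiscount (self-proving)", () => {
          const expected = 80 * (1 - 25 / 100); // re-implementation
          expect(applyDiscount(80, 25)).toBe(expected);
        });

        // What you actually want: an independently known value.
        test("applyDiscount (known value)", () => {
          expect(applyDiscount(80, 25)).toBe(60);
        });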
  • exitb 1 hour ago
    1.7x does not look that bad? If "AI code" is a broad classification that includes people using bad tools, or not being very skilful operators of said tools, then we can expect this number to meaningfully improve over time.
    • speed_spread 1 hour ago
      Tell that to your customers. And tell them how much longer the bugs generated by AI will take to fix by humans. Or tell them that you'll never fix the bugs because you're too busy vibe coding new ones.
      • exitb 1 hour ago
        I'm not saying bugs aren't a problem. I'm saying that if an emerging, fast improving tech is only slightly behind a human coder now, it seems conceivable that we're not that far off when they reach parity.
        • naasking 14 minutes ago
          Exactly. I'm sure assembly language programmers from the 1980s could easily write code that ran 2x faster than the code produced by compilers of the time, but compilers only got better and eventually assembly language programming became a rare job, and humans can rarely outperform compilers on whole program compilation.
  • cgearhart 1 hour ago
    So…great for prototyping (where velocity rules) but somewhere between mixed to negative for critical projects. Seems like this just puts some mildly quantitative numbers behind the consensus & trends I see emerging.
  • everdrive 1 hour ago
    Sounds like what companies have been scrambling for this whole time. People just want to dump something out there. They don't really care if it works very well.
  • neallindsay 1 hour ago
    1.7x more is not the same as 1.7x as many.
  • 827a 17 minutes ago
    Archetypes of prompts that I find AI to be quite good at handling:

    1. "Write a couple lines or a function that is pretty much what four years ago I would have gone to npm to solve" (e.g. "find the md5 hash of this blob")

    2. "Write a function that is highly represented and sampleable in the rest of the project" (e.g. "write a function to query all posts in the database by author_id" (which might include app-specific steps like typing it into a data model)).

    3. "Make this isolated needle-in-a-haystack change" (e.g. "change the text of such-and-such tooltip to XYZ") (e.g. "there's a bug with uploading files where we aren't writing the size of the file to the database, fix that")

    I've found that it can definitely do wider-ranging tasks than that (e.g. implement all API routes for this new data type per this description of the resource type and desired routes); and it can absolutely work. But the three problems I run into:

    1. Because I don't necessarily have a grokable handle on what it generated, I don't have a sense of what it's missing or what follow-on prompts are needed. E.g.: I tell it to write an endpoint that allows users to upload files. A few days later, we realize we aren't MD5-hashing the files that got uploaded; there was a field in the database & resource type to store this value, but it didn't pick up on that, and I didn't prompt it to do this; so it's not unreasonable. But oftentimes when I'm writing routes by hand, I'm spending so much time in that function body that follow-on requirements naturally occur to me ("Oh that's right, we talked about needing this route available to both of these two permissions, crap let me implement that"). With AI, it finishes so fast that my brain doesn't have time to remember all the requirements.

    2. We've tried to mitigate this by pushing more development into the specs and requirements up-front. This is really hard to get humans to do, first of all. But more critically: None of our data supports the hypothesis that this has shortened cycle times. It mostly just trades writing typescript for reading & writing English (which few engineers I've ever worked with are actually all that good at). The engineers still end up needing long cycle times back-and-forth with the AI to get correct results, and long cycle times in review.

    3. The more code you ask it to generate, the more vibeslop you get. Deeply-nested try/catch statements with multiple levels of error handling & throwing. Comments everywhere. Reimplementing the same helper functions five times. These things, we have found, raise the cost and lower the reliability & performance of future prompting, and quickly morph parts of the system into a no-man's-land (literally) where only AIs can really make any change; and every change, even by the AIs, gets harder and harder to ship. Our reported customer issues on these parts of the app are significantly higher than on others, and our ability to triage these issues is also impacted because we no longer have SMEs who can just brain-triage issues in our CS channels; everything now requires a full engineering cycle, with AI involvement, to solve.

    Our engineers run the spectrum from "never wanted to touch AI, never did" to "earnestly trying to make it work". Ultimately I think the consensus position is: it's a tool that is nice to have in the toolbox, but any assertion that it's going to fundamentally change the profile of work our engineers do, or even seriously impact hiring over the long term, is outside the realm of foreseeable possibility. The models and surrounding tooling are not improving fast enough.

  • brainless 1 hour ago
    I use LLMs to generate almost all my code. Currently at 40K lines of Rust, backend and a desktop app. I am a senior engineer with almost all my tech career (16 years) in startups.

    Coding with agents has forced me to generate more tests than we do in most startups, think through more things than we get the time to do in most startups, create more granular tasks and maintain CI/CD (my pipelines are failing and I need to fix them urgently).

    These are all good things.

    I have started thinking through my patterns for generating unit tests. I was mostly generating integration or end-to-end tests before. I started using helper functions in API handlers and writing unit tests for the helpers, bypassing the API-level arguments (so no API mocking or framework tests to deal with). I started breaking tasks down into smaller units, so I can pass them on to a cheaper model.
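
    The handler/helper pattern, roughly (my project is Rust; this TypeScript sketch just shows the shape):

      // Framework types stubbed out for the sketch.
      type Req = { body: { items: { price: number; qty: number }[] } };
      type Res = { json: (data: unknown) => void };

      // Thin handler: unpack the request, delegate, serialize.
      export function createOrderHandler(req: Req, res: Res) {
        res.json({ total: orderTotal(req.body.items) });
      }

      // The logic lives in a pure helper - unit-testable with plain
      // values, no API mocking or framework harness needed.
      export function orderTotal(items: { price: number; qty: number }[]): number {
        return items.reduce((sum, it) => sum + it.price * it.qty, 0);
      }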

    There are a few patterns in my prompts but nothing that feels out of place. I do not use agent files or MCPs. All sources are here: https://github.com/brainless/nocodo (the product is itself going through a pivot, so there is that).

    • WhyOhWhyQ 1 hour ago
      I see that your release is GPL 3.0. Are you worried about LLMs effectively laundering your source code a year from now? I've become hesitant about releasing source code since LLMs arrived (though I do use Claude heavily while programming to make suggestions, look for issues, etc.), but I'd be interested in hearing your perspective.
  • naasking 23 minutes ago
    It's totally plausible that AI codegen produces more bugs. It still seems important to familiarize yourself with these tools now though, because that bug count is only ever going to go down. These tools are here to stay.
  • SideburnsOfDoom 1 hour ago
    > ship faster but it produces more bugs

    This is ... not actually faster.

  • mmastrac 1 hour ago
    In the pre-AI days I worked on a system like this that was constructed by a high-profile consulting team but continuously lost data and failed to meet even basic standards.

    I think I've seen so much rush-shipped slop (before and after) that I'm really anxiously waiting for this bubble to pop.

    I have yet to be convinced that AI tooling can provide more than 20% or so speedup for an expert developer working in a modern stack/language.

  • geldedus 21 minutes ago
    Not for me.
  • bgwalter 1 hour ago
    The report is from cortex.io, based on only 50 self-selected responses from "engineering leaders" as well as from idpcon.com, hosted by cortex.

    All websites involved are vibe coded garbage that use 100% CPU in Firefox.