Cybersecurity looks like proof of work now

(dbreunig.com)

138 points | by dbreunig 1 day ago

34 comments

  • dataviz1000 9 minutes ago
    > to harden a system you need to spend more tokens discovering exploits than attackers will spend exploiting them.

    For the NFL front offices, I created a script that exposed an API to fully automate Ticketmaster through the front end, so that the NFL could post tickets on all secondary markets and dynamically price them; if rain was expected on a Sunday, they could charge less. Ticketmaster was slow to develop an API. For legal reasons, Ticketmaster couldn't give us permission without developing the API first, but they told me they would do their best to stop me.

    They switched over to PerimeterX which took me 3 days to get past.

    Last week someone posted an article here about ChatGPT using Cloudflare Turnstile. [0] First, the article made some mistakes about how it works. Second, I used the [AI company product] and the Chrome DevTools Protocol (CDP) to completely rewrite all the scripts, intercepting them before they were evaluated -- the same way I was able to figure out PerimeterX in 3 days -- and then recursively solve controlling all the fingerprinting so that it controls the profile. Then it created an API proxy to expose ChatGPT for free. It required some coaching about the technique but it did most of the work in 3 hours.

    These companies are spending 10s of millions of dollars on these products and considering what OpenAI is boasting about security, they are worthless.

    [0] https://news.ycombinator.com/item?id=47566865

  • somesortofthing 2 hours ago
    There's still the question of access to the codebase. By all accounts, the best LLM cyber scanning approaches are really primitive - it's just a bash script that goes through every single file in the codebase and, for each one, runs a "find the vulns here" prompt. The attacker usually has even less access than this - in the beginning, they have network tools, an undocumented API, and maybe some binaries.

    You can do a lot better efficiency-wise if you control the source end-to-end though - you already group logically related changes into PRs, so you can save on scanning by asking the LLM to only look over the files you've changed. If you're touching security-relevant code, you can ask it for more per-file effort than the attacker might put into their own scanning. You can even do the big bulk scans an attacker might on a fixed schedule - each attacker has to run their own scan while you only need to run your one scan to find everything they would have. There's a massive cost asymmetry between the "hardening" phase for the defender and the "discovering exploits" phase for the attacker.

    Exploitability also isn't binary: even if the attacker is better-resourced than you, they need to find a whole chain of exploits in your system, while you only need to break the weakest link in that chain.

    If you boil security down to just a contest of who can burn more tokens, defenders get efficiency advantages only the best-resourced attackers can overcome. On net, public access to mythos-tier models will make software more secure.
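
    The PR-scoped scanning idea above can be sketched roughly as follows. This is only an illustration: the path prefixes, budget numbers, and the `plan_pr_scan` name are all hypothetical, and the actual "find the vulns here" LLM call is left out entirely.

```python
from dataclasses import dataclass

# Hypothetical prefixes treated as security-relevant; adjust per repository.
SENSITIVE_PREFIXES = ("auth/", "crypto/", "payments/")


@dataclass(frozen=True)
class ScanJob:
    path: str
    token_budget: int


def plan_pr_scan(changed_files, base_budget=2_000, sensitive_multiplier=5):
    """Build one scan job per changed file, spending more on sensitive paths.

    Only files touched by the PR are scanned; the budgets are illustrative
    numbers, not measurements.
    """
    jobs = []
    for path in sorted(set(changed_files)):  # de-duplicate, stable order
        sensitive = path.startswith(SENSITIVE_PREFIXES)
        budget = base_budget * (sensitive_multiplier if sensitive else 1)
        jobs.append(ScanJob(path, budget))
    return jobs
```

    Feeding it the output of something like `git diff --name-only` for the PR would give each changed file a job, with files under the sensitive prefixes getting the larger per-file budget.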

    • anitil 1 hour ago
      On the latest episode of 'Security Cryptography Whatever' [0] they mention that the time spent on improving the harness (at the moment) ends up being outperformed by the strategy of "wait for the next model". I doubt that will continue, but it broke my intuition about how to improve them

      [0] https://securitycryptographywhatever.com/2026/03/25/ai-bug-f...

      • conception 18 minutes ago
        This is basically how you should treat all AI dev. Working around AI model limits for something that will take 3-6 months of work has very little ROI compared to building what works today and just waiting and building what works tomorrow tomorrow.
    • btown 1 hour ago
      The problem, though, is that this turns "one of our developers was hit by a supply chain attack that never hit prod, we wiped their computer and rotated keys, and it's not like we're a big target for the attacker to make much use of anything they exfiltrated..." into "now our entire source code has been exfiltrated and, even with rudimentary line-by-line scanning, will be automatically audited for privilege escalation opportunities within hours."

      Taken to an extreme, the end result is a dark forest. I don't like what that means for entrepreneurship generally.

      • linkregister 1 hour ago
        This is a great example of vulnerability chains that can be broken by vulnerability scanning by even cheaper open source models. The outcome of a developer getting pwned doesn't have to lead to total catastrophe. Having trivial privilege escalations closed off means an attacker will need to be noisy and set off commodity alerting. The will of the company to implement fixes for the 100 GitHub Dependabot alerts on their code base is all that blocks these entrepreneurs.

        It does mean that the hoped-for 10x productivity increase from engineers using LLMs is eroded by the increased need for extra time for security.

        This take is not theoretical. I am working on this effort currently.

    • Retr0id 2 hours ago
      Tokens can also be burnt on decompilation.
      • tptacek 2 hours ago
        Yes, and it apparently burns lots of tokens. But what I've heard is that the outcomes are drastically less expensive than hand-reversing was, when you account for labor costs.
        • jeffmcjunkin 1 hour ago
          Can confirm. Matching decompilation in particular (where you match the compiler along with your guess at source, compile, then compare assembly, repeating if it doesn't match) is very token-intensive, but it's now very viable: https://news.ycombinator.com/item?id=46080498

          Of course LLMs see a lot more source-assembly pairs than even skilled reverse engineers, so this makes sense. Any area where you can get unlimited training data is one we expect to see top-tier performance from LLMs.

          (also, hi Thomas!)

          • stackghost 47 minutes ago
            My own experience has been that "ghidra -> ask LLM to reason about ghidra decompilation" is very effective on all but the most highly obfuscated binaries.

            Burning tokens by asking the LLM to compile, disassemble, compare assembly, recompile, repeat seems very wasteful and inefficient to me.

      • somesortofthing 2 hours ago
        Another asymmetric advantage for defenders - attackers need to burn tokens to form incomplete, outdated, and partially wrong pictures of the codebase while the defender gets the whole latest version plus git history plus documentation plus organizational memory plus original authors' cooperation for free.
    • kelvinjps10 29 minutes ago
      what about open source software?
  • j2kun 2 hours ago
    The article heavily quotes the "AI Security Institute" as a third-party analysis. It was the first I heard of them, so I looked up their about page, and it appears to be primarily people from the AI industry (former Deepmind/OpenAI staff, etc.), with no folks from the security industry mentioned. So while the security landscape is clearly evolving (cf. also Big Sleep and Project Zero), the conclusion of "to harden a system we need to spend more tokens" sounds like yet more AI boosting from a different angle. It raises the question of why no other alternatives (like formal verification) are mentioned in the article or the AISI report.

    I wouldn't be surprised if NVIDIA picked up this talking point to sell more GPUs.

    • ButlerianJihad 1 minute ago
      The "S" in "Artificial Intelligence" stands for "Security"!
    • tptacek 2 hours ago
      I would be interested in which notable security researchers you can find to take the other side of this argument. I don't know anything about the "AI Security Institute", but they're saying something broadly mirrored by security researchers. From what I can see, the "debate" in the actual practitioner community is whether frontier models are merely as big a deal as fuzzing was, or something significantly bigger. Fuzzing was a profound shift in vulnerability research.

      (Fan of your writing, btw.)

      • j2kun 10 minutes ago
        It's less that I think they would take the other side of the argument, than that they would lend some credence to the content of the analysis. For example, I would not particularly trust a bunch of AI researchers to come up with a representative set of CTF tasks, which seems to be the basis of this analysis.
      • VorpalWay 1 hour ago
        > but they're saying something broadly mirrored by security researchers.

        You might well be right, it is not an area I know much of or work in. But I'm a fan of reliable sources for claims. It is far too easy to make general statements on the internet that appear authoritative.

  • nostrademons 2 hours ago
    Relevant Tony Hoare quote: “There are two approaches to software design: make it so simple there are obviously no deficiencies, or make it so complex there are no obvious deficiencies”.
    • tekacs 2 hours ago
      I think this is so relevant, and thank you for posting this.

      Of course it's trivially NOT true that you can defend against all exploits by making your system sufficiently compact and clean, but you can certainly have a big impact on the exploitable surface area.

      I think it's a bit bizarre that it's implicitly assumed that all codebases are broken enough, that if you were to attack them sufficiently, you'll eventually find endlessly more issues.

      Another analogy here is to fuzzing. A fuzzer can walk through all sorts of states of a program, but when it hits a password, it can't really push past that because it needs to search a space that is impossibly huge.

      It's all well and good to try to exploit a program, but (as an example) if that program _robustly and very simply_ (the hard part!) says... that it only accepts messages from the network that are signed before it does ANYTHING else, you're going to have a hard time getting it to accept unsigned messages.

      Admittedly, a lot of today's surfaces and software were built in a world where you could get away with a lot more laziness compared to this. But I could imagine, for example, a state of the world in which we're much more intentional about what we accept and even bring _into_ our threat environment. Similarly to the shift from network to endpoint security. There are, for sure, uh, a million systems right now with a threat model wildly larger than it needs to be.
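
      A minimal sketch of the "check the signature before doing ANYTHING else" pattern mentioned above, using only the Python standard library. Note the assumptions: it uses an HMAC (a shared-key MAC) as a stand-in for a real public-key signature scheme, and the `tag || payload` wire layout and the `sign`/`accept` names are made up for illustration.

```python
import hashlib
import hmac
from typing import Optional

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def sign(key: bytes, payload: bytes) -> bytes:
    """Prepend an HMAC-SHA256 tag to the payload (hypothetical wire format)."""
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload


def accept(key: bytes, message: bytes) -> Optional[bytes]:
    """Return the payload only if its tag verifies, otherwise None.

    Nothing in the payload is parsed or interpreted before the check passes.
    """
    if len(message) < TAG_LEN:
        return None
    tag, payload = message[:TAG_LEN], message[TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # compare_digest avoids leaking how many leading bytes matched.
    if not hmac.compare_digest(tag, expected):
        return None
    return payload
```

      The point of the design is that the only code reachable by an unauthenticated sender is a length check and one MAC computation, which keeps the exposed surface small regardless of how complex the message handling behind it is.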

  • jzelinskie 2 hours ago
    Security has always been a game of just how much money your adversary is willing to commit. The conclusions drawn in lots of these articles are just already well understood systems design concepts, but for some reason people are acting like they are novel or that LLMs have changed anything besides the price.

    For example from this article:

    > Karpathy: Classical software engineering would have you believe that dependencies are good (we’re building pyramids from bricks), but imo this has to be re-evaluated, and it’s why I’ve been so growingly averse to them, preferring to use LLMs to “yoink” functionality when it’s simple enough and possible.

    Anyone who's heard of "leftpad" or is a Go programmer ("A little copying is better than a little dependency" is literally a "Go Proverb") knows this.

    Another recent set of posts to HN had a company close-sourcing their code for security, but "security through obscurity" has been a well-understood fallacy in open source circles for decades.

  • snowwrestler 2 hours ago
    It looks like proof of work because:

    > Worryingly, none of the models given a 100M budget showed signs of diminishing returns. “Models continue making progress with increased token budgets across the token budgets tested,” AISI notes.

    So, the author infers a durable direct correlation between token spend and attack success. Thus you will need to spend more tokens than your attackers to find your vulnerabilities first.

    However it is worth noting that this study was of a 32-step network intrusion, which only one model (Mythos) even was able to complete at all. That’s an incredibly complex task. Is the same true for pointing Mythos at a relatively simple single code library? My intuition is that there is probably a point of diminishing returns, which is closer for simpler tasks.

    In this world, popular open source projects will probably see higher aggregate token spend by both defenders and attackers. And thus they might approach the point of diminishing returns faster. If there is one.

    • SyneRyder 1 minute ago
      Worth pointing out that as impressive as the 32-step network takeover is, Mythos wasn't able to achieve it on every attempt, and the network itself did not have the usual defence systems.

      I wouldn't use those as excuses to dismiss AI though. Even if this model doesn't break your defences, give it 3 months and see where the next model lands.

    • janalsncm 1 hour ago
      Knowing nothing about cybersecurity, maybe the question is whether it costs more tokens to go from 32 steps to 33, or to complete the 33rd step? If it’s cheaper to add steps, or if defense is uncorrelated but offense becomes correlated, it’s not as bad as the article makes it seem.

      For instance, if failing any step locks you out, your probability of success is p^N, which means it’s functionally impossible with enough layers.
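
      To put rough numbers on the p^N point, here is a small sketch. It assumes every layer is independent, has the same per-step success probability p, and that a single failure locks the attacker out; the function names are made up for illustration.

```python
def chain_success_probability(p: float, layers: int) -> float:
    """Probability of passing all `layers` independent steps, each with chance p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if layers < 0:
        raise ValueError("layers must be non-negative")
    return p ** layers


def layers_needed(p: float, target: float) -> int:
    """Smallest N such that p**N <= target, found by repeated multiplication."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1) or the loop would never end")
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be in (0, 1]")
    n, prob = 0, 1.0
    while prob > target:
        prob *= p
        n += 1
    return n
```

      For example, with p = 0.5 three layers already cut the attacker's overall chance to 0.125. If the layers are correlated (one bypass makes the next easier), the real number will be higher than this model suggests.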

  • chromacity 2 hours ago
    I discussed this in more detail in one of my earlier comments, but I think the article commits a category error. In commercial settings, most of day-to-day infosec work (or spending) has very little to do with looking for vulnerabilities in code.

    In fact, security programs built on the idea that you can find and patch every security hole in your codebase were basically busted long before LLMs.

    • Muromec 1 hour ago
      Commercial infosec is deleting firefox from developer machines, because it's not secure and explaining to muggles why they shouldn't commit secret material to the code repository. That and blocking my ssh access to home router of course.
      • chromacity 51 minutes ago
        I mean, often, yep. The real reason why they are unhappy with you having an unsupported browser is simply that it's much harder to reason about or enforce policies across bespoke environments. And in an enterprise of a sufficient scale, the probability that one of your employees is making a mistake today is basically 1. Someone is installing an infostealer browser extension, someone is typing in their password on a phishing site, etc. So, you really want to keep browsers on a tight leash and have robust monitoring and reporting around that.

        Yeah, it sucks. But you're getting paid, among other things, to put up with some amount of corporate suckiness.

  • _pdp_ 31 minutes ago
    All of the recent news read like something that could happen in a cyberpunk novel - AIs that defend vs AIs that do the attacks.

    I think we are already here. I wrote something about this, if you are interested: https://go.cbk.ai/security-agents-need-a-thinner-harness

  • codazoda 30 minutes ago
    > Classical software engineering would have you believe that dependencies are good (we’re building pyramids from bricks)

    Would it? I’m old school but I’ve never trusted these massive dependency chains.

    That’s a nit.

    We’re going to have to write more secure software, not just spend more.

  • tptacek 2 hours ago
    It looks like it, but it isn't. It's the work itself that's valued in software security, not the amount of it you managed to do. The economics are fundamentally different.

    Put more simply: to keep your system secure, you need to be fixing vulnerabilities faster than they're being discovered. The token count is irrelevant.

    Moreover: this shift is happening because the automated work is outpacing humans for the same outcome. If you could get the same results by hand, they'd count! A sev:crit is a sev:crit is a sev:crit.
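
    The "fix faster than they're discovered" condition can be illustrated with a toy backlog model. The weekly rates are made-up inputs and `backlog_over_time` is a hypothetical helper; the point is only that the open-vulnerability count depends on the difference between the two rates, not on how much work went into either.

```python
def backlog_over_time(found_per_week, fixed_per_week, weeks, initial_open=0):
    """Return the number of open vulnerabilities at the end of each week."""
    open_now = initial_open
    history = []
    for _ in range(weeks):
        # The backlog cannot go below zero: you can't fix what isn't found yet.
        open_now = max(0, open_now + found_per_week - fixed_per_week)
        history.append(open_now)
    return history
```

    With 5 found and 3 fixed per week the backlog grows by 2 every week; swap the rates and an existing backlog drains to zero and stays there.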

  • BloondAndDoom 41 minutes ago
    Security has always had the “defender’s dilemma” problem (an attacker needs to find one thing, but the defender needs to fix everything). Nothing is new in terms of AI’s impact, just the application of different resources and units.
  • jerf 2 hours ago
    I've said for decades that, in principle, cybersecurity is advantage defender. The defender has to leave a hole. The attackers have to find it. We just live in a world with so many holes that dedicated attackers rarely end up bottlenecked on finding holes, so in practice it ends up advantage attacker.

    There is at least a possibility that a code base can be secured by a (practically) finite number of tokens until there are no more holes in it, for reasonable amounts of money.

    This also reminds me of what I wrote here: https://jerf.org/iri/post/2026/what_value_code_in_ai_era/ There's still value in code tested by the real world, and in an era of "free code" that may become even more true than it is now, rather than the initially-intuitive less valuable. There is no amount of testing you can do that will be equivalent to being in the real world, AI-empowered attackers and all.

    • mapontosevenths 2 hours ago
      >in principle, cybersecurity is advantage defender

      I disagree.

      The defender must be right every single time. The attacker only has to get lucky and thanks to scale they can do that every day all day in most large organizations.

      • janalsncm 1 hour ago
        My understanding of defense in depth is that it is a hedge against this. By using multiple uncorrelated layers (e.g. the security guard shouldn’t get sleepier when the bank vault is unlocked) you are transforming a problem of “the defender has to get it right every time” into “the attacker has to get through each of the layers at the same time”.
      • coldtea 1 hour ago
        Not to mention an attacker motivated by financial gain doesn't even need a particular target defender. One/any found available will do.
      • traderj0e 2 hours ago
        Well, the attacker has something to lose too. It's not like the defender has to be perfect or else attacks will just happen, it takes time/money to invest in attacking.
      • tptacek 1 hour ago
        The attacker and defender have different constant factors, and, up until very recently, constant factors dominated the analysis.
    • traderj0e 2 hours ago
      I agree for the type of attacks the article focuses on, but DDoS and social engineering seem like advantage attacker.
  • c1ccccc1 1 hour ago
    If you have a limited budget of tokens as a defender, maybe the best thing to spend them on is not red teaming, but formalizing proofs of your code's security. Then the number of tokens required roughly scales with the amount and complexity of your code, instead of scaling with the number of tokens an attacker is willing to spend.

    (It's true that formalization can still have bugs in the definition of "secure" and doesn't work for everything, which means defenders will still probably have to allocate some of their token budget to red teaming.)

    • pxc 32 minutes ago
      > If you have a limited budget of tokens as a defender, maybe the best thing to spend them on is not red teaming, but formalizing proofs of your code's security.

      You can only do this if you have a very clear sense of what your code should be doing. In most codebases I've ever worked with, frankly, no one has any idea.

      Red teaming as an approach always has value, but one important characteristic it has is that you can apply red teaming without demanding any changes at all to your code standards, or engineering culture (and maybe even your development processes).

      Most companies are working with a horrific sprawl of code, much of it legacy with little ownership. Red teaming, like buying tools and pushing for high coverage, is an attractive strategy to business leaders because it doesn't require them to tackle the hardest problems (development priorities, expertise, institutional knowledge, talent, retention) that factor into application security.

      Formal verification is unfortunately hard in the ways that companies who want to think of security as a simple resource allocation problem most likely can't really manage.

      I would love to work on projects/with teams that see formal verification as part of their overall correctness and security strategy. And maybe doing things right can be cheaper in the long run, including in terms of token burn. But I'm not sure this strategy will be applicable all that generally; some teams will never get there.

  • protocolture 1 hour ago
    >You don’t get points for being clever. You win by paying more.

    Really depends how consistently the LLMs are putting new novel vulnerabilities back in your production code for the other LLMs to discover.

  • DerSaidin 2 hours ago
    > Cybersecurity looks like proof of work now

    Imo, cybersecurity looks like formally verified systems now.

    You can't spend more tokens to find vulnerabilities if there are no vulnerabilities.

    • deepsun 44 minutes ago
      Every formal verification depends highly on requirements. It's pretty easy to make a mistake in defining the task itself. In the end, you'd want to verify system behavior in the real world, and it's impossible to completely define the real world. You always make some assumptions/models to reason within, and it is impossible to verify the assumptions are correct.
    • drdrey 1 hour ago
      good luck formally verifying everything
  • wheelerwj 40 minutes ago
    Cybersecurity has always been proof of work. Fuck, most of software development is proof of work by that logic. That's why many attacks originate from countries where the cost of living is a fraction of the COL in the United States. They can throw more people at the problem because it's cheaper to do so.

    But I don't really get the hype, we can fix all the vulnerabilities in the world but people are still going to pick up parking-lot-USBs and enter their credentials into phishing sites.

  • smj-edison 2 hours ago
    I'm curious to see if formally verified software will get more popular. I'm somewhat doubtful, since getting programmers to learn formal math is hard (rightfully so, but still sad). But, if LLMs could take over the drudgery of writing proofs in a lot of the cases, there might be something there.
    • gjadi 1 hour ago
      How is getting a proof one doesn't understand going to help build safer systems?

      I want to believe formal methods can help, not because one doesn't have to think about it, but because the time freed from writing code can be spent on thinking on systems, architecture and proofs.

      • smj-edison 1 hour ago
        That's a fair question, and looking at my post I now realize I have two independent points:

        1. A proof mindset is really hard to learn.

        2. Writing theorem definitions can be hard, but writing a proof can be even harder. So, if you could write just the definitions, and let an LLM handle all the tactics and steps, you could use more advanced techniques than just a SAT solver.

        So I guess LLMs only marginally help with (1), but they could potentially be a big help for (2), especially with more tedious steps. It would also allow one to use first order logic, and not just propositional logic (or dependent types if you're into that).

    • stringfood 1 hour ago
      I am so exhausted with being asked to learn difficult and frankly confusing topics - the fact that it is so hard and so humbling to learn these topics is exactly why everyone is so happy to let AI think about formal programming and I can focus on getting Jersey Shore season 2 loaded into my Plex server. It's the one where Pauly D breaks up with Shelli
  • nickdothutton 2 hours ago
    Although not an escape from the "who can spend the most on tokens" arms race, there is also the possibility to make reverse engineering and executable analysis more difficult. This increases the attacker's token spend if nothing else. I wonder if dev teams will take an interest.

    Better to write good, high-quality, properly architected and tested software in the first place of course.

    Edited for typo.

  • singpolyma3 1 hour ago
    If you run this long enough presumably it will find every exploit and you patch them all and run it again to find exploits in your patches until there simply... Are no exploits?
  • int32_64 2 hours ago
    By using these services, you're also exfiltrating your entire codebase to them, so you have to continuously use the best cyber capabilities providers offer in case a data breach allows somebody to obtain your codebase and an attacker uses a better vulnerability detector than what you were using.
  • zachdotai 1 hour ago
    we did a lot of thinking around this topic. and distilled it into a new way to dynamically evaluate the security posture of an AI system (which can apply for any system for that matter). we wrote some thoughts on this here: https://fabraix.com/blog/adversarial-cost-to-exploit
  • umvi 1 hour ago
    > You don’t get points for being clever. You win by paying more.

    And yet... Wireguard was written by one guy while OpenVPN is written by a big team. One code base is orders of magnitude bigger than the other. Which should I bet LLMs will find more cybersecurity problems with? My vote is on OpenVPN despite it being the less clever and "more money thrown at" solution.

    So yes, I do think you get points for being clever, assuming you are competent. If you are clever enough to build a solution that's much smaller/simpler than your competition, you can also get away with spending less on cybersecurity audits (be they LLM tokens or not).

  • jp0001 2 hours ago
    I'm starting to think that Opus and Mythos are the same model (or collection of models) whereas Mythos has better backend workflows than Opus 4.6. I have not used Mythos, but at work I have a 5 figure monthly token budget to find vulnerabilities in closed-source code. I'm interested in mythos and will use it when it's available, but for now I'm trying to reverse engineer how I can get the same output with Opus 4.6 and the answer to me is more tokens.
  • devmor 59 minutes ago
    > to harden a system you need to spend more tokens discovering exploits than attackers will spend exploiting them.

    If we take this at face value, it's not that different than how a great deal of executive teams believe cybersecurity has worked up to today. "If we spend more on our engineering and infosec teams, we are less likely to get compromised".

    The only big difference I can see is timescale. If LLMs can find vulnerabilities and exploit them this easily (and I do take that with a grain of salt, because benchmarks are benchmarks), then you may lose your ass in minutes instead of after one dedicated cyber-explorer's monster energy fueled, 7-week traversal of your infrastructure.

    I am still far more concerned about social engineering than LLMs finding and exploiting secret back doors in most software.

  • saidnooneever 2 hours ago
    people biting into what companies say about their own products has always been the frustration in cyber. now more than ever.

    nothing is better or worse, basically as its always been.

    if you think otherwise, stop ignoring the past.

    • saidnooneever 2 hours ago
      thanks for the down vote. i am not cynical though. how many billion dollar companies claim 109% detection rates and bullet proof security. i worked at one of these companies as they bought another and suffered through trying to make broken promises a reality. (they did it partly, an epic achievement. amazing engineers.) its a broken game.

      you are addicted to dopamine. think carefully and take good care of yourself

  • sdevonoes 1 hour ago
    Please. Are we going to rely now on Anthropic et al to secure our systems? Wasn’t it enough to rely on them to build our systems? What’s next? To rely on them for monitoring and observability? What else? Design and mockups?
    • a34729t 51 minutes ago
      If we rely on Anthropic to write our system, it's only natural to rely on them to secure it. Seriously, at the big tech companies we're rapidly approaching all code being written by LLMs... so at least we have to close the security chain quickly.
    • tptacek 1 hour ago
      The nice thing about vulnerability research is that you either have a vulnerability or you don't. There's no such thing as a "slop vulnerability".
      • lopityuity 1 hour ago
        "We burned 10 trillion tokens and the Amazon rain forest is now a desert, but our stochastic parrot discovered that if a user types '$-1dffj39fff%FFj$@#lfjf' 10 thousand times into a terminal that you can get privilege escalation on a Linux kernel from 10 years ago. The best part? We avoided paying anyone outside of the oligarchy for the discovery of this vulnerability."

        In your embarrassingly reductive binary vulnerability state worldview? Have.

  • Mistletoe 1 hour ago
    Everything eventually turns into Bitcoin. That’s what I plan to see in the future years and decades. Satoshi just saw it first.
  • cmrx64 2 hours ago
    Dijkstra would shake his head at our folly.
  • heliumtera 1 hour ago
    In other news, token seller says tokens should be bought