Cybersecurity researchers aren't happy about the guardrails on Anthropic's Fable

(techcrunch.com)

301 points | by speckx 11 hours ago

66 comments

simonw 41 minutes ago
News just broke in this Wired story: "Anthropic Walks Back Policy That Could Have ‘Sabotaged’ AI Researchers Using Claude" https://www.wired.com/story/anthropic-responds-to-backlash-o...
> “We’re changing Fable 5’s safeguards for frontier LLM development to make them visible.” Anthropic said in a statement to WIRED. “We made the wrong tradeoff and we apologize for not getting the balance right.”
Sounds like the widespread condemnation worked.
[-]
- h6d_100c 21 minutes ago
  Too late. I canceled my Max subscription. The idea that they would even do this has destroyed any remaining trust. Why would I pay them 1000s of dollars in extra usage per month for something they could still be doing behind the scenes? Any errors previously chalked up to thinking effort or other backend changes? Maybe it was intentional prompt injection the entire time.
- hedgehog 32 minutes ago
  The "tradeoff" warning implies they stand by their thinking and don't think there was anything qualitatively wrong with it which, if nothing else, is helpful so potential customers can know how they think. I think the core lesson is if you want reliable infrastructure to build into an application you should use a different provider.
- rafram 6 minutes ago
  The mitigations against distillation are separate, and not what the OP is about at all.
daedrdev 6 hours ago
The strangest part is that it won't just reject ML research, which I can understand, it will sabotage it silently by using a worse model without revealing it is doing so.
It's just an insane level of deception and trust destruction for a company that at most is like 1 year ahead of its competition.
Edit: to be clear, they tell you when they degrade it for cybersecurity and bio.
[-]
- _boffin_ 4 hours ago
  The thing that I keep thinking about is the accounting / charging when it downgrades automatically.
  Do they adjust the price of the api request so that only the tokens that were utilized by fable get charged at that price and the remaining tokens that the cheaper / nerfed (fable) model utilizes get charged at that price?
  If the answer is no, could that be construed as fraud?
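To make the billing question concrete, here is a minimal sketch of the two ways a request could be charged if part of it were served by a cheaper model. The token counts and per-million-token prices are made up for illustration and are not Anthropic's actual pricing; the function names are hypothetical as well.

```python
def blended_cost(tokens_premium: int, tokens_fallback: int,
                 price_premium: float, price_fallback: float) -> float:
    """Charge each token at the rate of the model that actually produced it.

    Prices are in dollars per million tokens (hypothetical values below).
    """
    return (tokens_premium * price_premium
            + tokens_fallback * price_fallback) / 1_000_000


def flat_cost(total_tokens: int, price_premium: float) -> float:
    """Charge every token at the premium rate, regardless of which model ran."""
    return total_tokens * price_premium / 1_000_000


# Hypothetical request: 200k tokens from the premium model, 800k from a fallback.
PREMIUM_PRICE = 50.0   # assumed $/Mtok, illustration only
FALLBACK_PRICE = 10.0  # assumed $/Mtok, illustration only

split = blended_cost(200_000, 800_000, PREMIUM_PRICE, FALLBACK_PRICE)
flat = flat_cost(1_000_000, PREMIUM_PRICE)
overcharge = flat - split

print(f"split billing: ${split:.2f}, flat billing: ${flat:.2f}, gap: ${overcharge:.2f}")
```

The gap between the two numbers is the amount the question is asking about: what the customer pays for premium output that the premium model never generated.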
  [-]
  - CGamesPlay 3 hours ago
    The announcement elucidated this, and it's IMO worse than this. They don't downgrade to a cheaper model ([edit] for certain classes of offense they suspect you of). They sabotage the model's outputs in other, undisclosed, ways (specifically, "prompt modification, steering vectors, or parameter-efficient fine-tuning"). So, for example, they might load in a steering vector that just forgets the API to PyTorch. But it isn't just "we redirected you to a cheaper model!"
    [-]
    - buildbot 2 hours ago
      It honestly explains so many issues I have been having, as I used it primarily for ML research (on my personal account, doing things not related to my job I should note). It would literally typo package names and spend huge amounts of time failing to set up simple environments…then do stupid things like set the learning rate to 1e-7, and use the eval set as training data.
      [-]
      - sterlind 22 minutes ago
        just imagine if they made it sneaky. get things just subtly wrong enough that your training runs just never quite go as well as you think they should.
      - peyton 1 hour ago
        That’s insane. I hope they fix it.
  - tfirst 4 hours ago
    Their goal is to downgrade people who are violating their TOS, so I think they'd have some argument there. I have no idea how they'll deal with inevitable false positives, especially given how oversensitive most of the other triggers are.
    [-]
    - ZetsuBouKyo 1 hour ago
      It’s just impossible.
      Look at real-life stuff like laws, company policies, or school rules. Humans have to enforce them, and we constantly see crazy cases in the news. There’s no way simple rules can ever make speech completely 'safe.' I can't prove it with math or logic yet, but I have a feeling that it’ll never happen. Even humans can't do it.
      We can run a simple thought experiment here. Say Case A violates rule B, so we add rule C. Then Case D violates rule B but follows rule C, so we add an exception... and it just goes on and on like that forever. It never ends. In the end, you just get a massive pile of rules that makes it impossible to get anything done.
      Ultimately, we will have to face the truth that knowledge is dangerous.
      Giving knowledge directly to people who cannot actually understand it and allowing them to just use it blindly can be extremely unsafe.
      To use a real-world analogy, the problem we are facing with weak AI right now is just like the debate over gun legalization. Do we want to risk the abuse of guns or knowledge just to protect the freedom to own them?
      [-]
      - AnthonyMouse 2 minutes ago
        > I can't prove it with math or logic yet, but I have a feeling that it’ll never happen.
        It's not really that hard to actually prove it with math.
        It's a computer, so to produce the boolean result (safe or unsafe) there has to be a mathematical formula. This formula will inherently be extremely complex, but even a very simple formula has a huge problem. Suppose "unsafe" is true if X - Y > 0. Make X and Y themselves as simple or complicated as you like but even in the simplest version it's already impossible to calculate unless the model has perfect information.
        You can't calculate "X - Y" if you don't know the value of X. And it's indisputable that there is information it doesn't have. Case in point, telling you about a vulnerability in some piece of code is safe (and indeed not telling you is unsafe) if you're the developer and you want to patch it, but the opposite if you're the attacker and want to exploit it. The model does not know which one you are, therefore it cannot make the correct determination any more than it can solve one equation with two unknowns.
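The argument above can be sketched in a few lines. This is only an illustration under stated assumptions: `classify` is a hypothetical stand-in for any deterministic safety check that sees the prompt but not the sender's intent, and `Request.intent` is the hidden variable (the unknown X in the comment).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    prompt: str  # the only thing the classifier can observe
    intent: str  # hidden from the classifier: "patch" or "exploit"


def truly_unsafe(req: Request) -> bool:
    """Ground truth, which depends on information the model does not have."""
    return req.intent == "exploit"


def errors_given_verdict(verdict: bool, requests: list) -> int:
    """Count misclassifications if every request gets the same verdict."""
    return sum(verdict != truly_unsafe(r) for r in requests)


prompt = "Is there a vulnerability in this parser?"
developer = Request(prompt, "patch")
attacker = Request(prompt, "exploit")

# Identical prompts force identical outputs from any function of the prompt,
# so the classifier's only real choices are "flag both" or "allow both".
errors_if_flagged = errors_given_verdict(True, [developer, attacker])
errors_if_allowed = errors_given_verdict(False, [developer, attacker])
best_case_errors = min(errors_if_flagged, errors_if_allowed)

print(errors_if_flagged, errors_if_allowed, best_case_errors)
```

Whichever verdict the classifier picks, it is wrong for one of the two senders, which is the "one equation with two unknowns" point.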
    - dannyw 3 hours ago
      The challenge is that the examples they’ve mentioned (distributed training infra? ML acceleration techniques?) go beyond what’s prohibited by their ToS and act like a catch-all net.
      I would wager the majority of ML and data science work in the world isn’t frontier LLM development.
      [-]
      - weitendorf 3 hours ago
        Yes, this is the problem. They are business interests of Anthropic and have nothing to do with “safety”
        [-]
        sudoshred 3 hours ago
        Safety of their IPO
    - loeg 3 hours ago
      If it's a violation of ToS, just reject instead of silently downgrading.
      [-]
      - SR2Z 3 hours ago
        But then someone would figure out some prompts that don't trigger this, and Anthropic wouldn't be able to try and disadvantage competitors.
        [-]
        BoorishBears 1 hour ago
        Except they openly reject many many other classes of prompts, including extremely high stakes CBRN.
        It's only the direction that has direct potential business impact they've decided to sabotage instead of reject.
    - thefounder 53 minutes ago
      They will give you s*t output, that’s how they deal with it. And say that less than 1% of the requests were affected. Think of this like a kind of shadow ban while you still pay top $.
    - jchw 2 hours ago
      You know, I'm not saying I don't understand what they are doing from a business perspective, but I'm just saying: DeepSeek V4 doesn't silently sabotage you because it thinks you are trying to violate a ToS. Anthropic's clawing back a bit of a moat perhaps, with Fable being an actual improvement of sorts, but now with torching user trust they are really banking on open weight models not catching up to where they are now. I wonder if they have a good reason to believe that they won't, or are hoping for something entirely different to save them.
      (P.S. Yes of course I know about model censorship, a different problem, but all of the models are censored to some degree. It happens to be less of a problem for open weight models anyhow, but I figured I'd just preempt this since it's inevitable.)
      I actually kinda like DSv4 over Opus 4.7 for some tasks, although I have not figured out what the deciding factor is. (Opus 4.8 so far has not worked very well for me at all, no idea why.)
      [-]
      - literalAardvark 1 hour ago
        Anthropic seems to me to have consistently been the baddie despite everyone's posturing.
        Not that I expect better from openai but at least they're not pretending to be good.
  - garciasn 4 hours ago
    It royally pissed me off today by just continuing with credits without stopping to ask me if I was ok with it.
    Ran up $30 in extra charges while it was just flashing on the screen that it was doing that after I walked away to do something while it was humming along.
    It has always just told me I ran out of usage and had to wait before. Now? You’re just gonna pay extra because you left it unattended as you’ve done for the last year of use.
    [-]
    - weird-eye-issue 3 hours ago
      You've already explicitly enabled extra usage in your account settings though, it is not on by default
      [-]
      - garciasn 3 hours ago
        Unknowingly. Is that set at the org level? Because I never set it and never had it do that before.
        [-]
        throwaway7783 1 hour ago
        It is at the org level
    - MillionOClock 4 hours ago
      Do you have Usage credits turned on in your settings?
  - robrenaud 4 hours ago
    They use a lightweight adapter to silently degrade the performance. Usually these adapters are made to improve the performance for a given domain/task.
- throwawayffffas 4 hours ago
  Can you imagine if AMD or Intel throttled your cpu if it detected you were working on "cybersecurity" or if you were designing a cpu?
  [-]
  - h6d_100c 2 hours ago
    Or if GPU companies detected you were trying to train a model and injected intentional numerical errors.
    [-]
    - gzalo 40 minutes ago
      Nvidia already did something similar with Lite Hash Rate (LHR), limiting performance on purpose just when running mining apps...
      [-]
      - h6d_100c 20 minutes ago
        Well they did tell everyone explicitly and sell it as different SKUs. There's no Fable (Full ML) edition, just silent prompt injection.
  - rvz 4 hours ago
    Or if your "self-driving" system such as FSD / waymo slowed the car down once it detected you work in cybersecurity or at a rival automaker and you were attempting to reach the train station or the airport to make you miss a conference meetup.
    [-]
    - pocksuppet 4 hours ago
      Trains made by Newag were programmed to brick themselves if they detected a non-Newag workshop was repairing them.
      https://news.ycombinator.com/item?id=38638865
      https://news.ycombinator.com/item?id=38628635
      https://news.ycombinator.com/item?id=38567687
      https://news.ycombinator.com/item?id=38530885
      [-]
      - loeg 3 hours ago
        And that was correctly perceived to be illegal by antitrust regulators.
    - dghlsakjg 1 hour ago
      Didn’t uber catch a lot of shit for nerfing the app for people suspected to be enforcing the laws they were breaking?
  - __dxtj__ 3 hours ago
    It would suck, but guardrails on new technologies like this aren't unheard of. It's like when consumer GPS used to stop working at very high speeds because they didn't want people to use it for missile guidance systems.
    [-]
    - loeg 3 hours ago
      Consumer GPS is still disabled at high speeds. I would argue the analogy doesn't carry due to harm and error rate differences.
      [-]
      - h6d_100c 2 hours ago
        Yep a totally different use case and set of guardrails. There’s very little (not zero) consumer utility in GPS above say 15k feet AND 400 MPH or whatever the actual limit is. That’s basically tracking model rockets that are incidentally impacted and nothing else, from what I can think of.
        [-]
        AnthonyMouse 22 minutes ago
        It's also the sort of thing that has to have been thought up by someone with nothing better to do, given how ridiculous the premise is. You would have to assume the adversary is someone with the technology to build rockets, literally rocket science, but not the technology to build their own GPS receiver, which is simple 1970s radio technology?
        Worse than that, it's 20th century radio technology in the 21st century when everyone has access to FPGAs and SDR.
        The number of innocent people with model rockets or similar being negatively impacted by that rule is infinitely larger than the number of adversaries because the number of adversaries being impaired by it is zero.
        [-]
        h6d_100c 18 minutes ago
        Errr I at least thought it would be easier to build a small, bad rocket than a precision GPS receiver. But I am not an expert.
    - Barbing 3 hours ago
      > used to
      When’d that change?
      [-]
      - jamiek88 2 hours ago
        He’s probably thinking of the accuracy limit to civilians it launched with.
  - stackghost 3 hours ago
    There's no doubt in my mind they would if they could.
- SXX 2 hours ago
  > The strangest part is that it won't just reject ML research, which I can understand, it will sabotage it silently by using a worse model without revealing it is doing so.
  Any kind of silent sabotaging is absolutely unacceptable for any commercial service
  They charge for tokens and charge a lot. They can't just degrade service silently and still charge you the same.
- loneboat 6 hours ago
  I've seen this claim a few times, but when I triggered the guardrails in Claude Code, it clearly notified me that it had switched to a different model ("something something for security purposes...").
  Are you using Fable in Claude Code or in the browser?
  [-]
  - vadansky 5 hours ago
    It's from the model card:
    > unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user. Fable 5 will not fall back to a different model. Instead, the safeguards will limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT).
    https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c3...
    (stolen from https://jonready.com/blog/posts/claude-fable5-is-allowed-to-...)
    [-]
    - DrewADesign 4 hours ago
      Yeah they detect the activity using a secure, deterministic heuristic system called “Generalized Reconnaissance Enabling Exfiltration of Deleterious Investigations.” And it’s all implemented using their new internal protocol called “Base Unified Limitation Layer for Security Hacking Investigation Tactics”
      Collectively, they are known as GREEDI-BULLSHIT.
    - mwwaters 4 hours ago
      That is for whatever it considers reverse-engineering the model to try to create a competing one.
      [-]
      - dannyw 3 hours ago
        No, that’s for “frontier LLM development” which somehow includes examples like distributed training infra.
        Based on how sensitive the classifiers are, any data scientist / MLE is probably going to encounter cases where some silent degradation happens and you never know about it.
      - 827a 3 hours ago
        It does nothing to protect against distillation attacks, because distillation attacks are far less interested in the topic of AI research than just generally getting tons of diverse output from the model. It might be that Mythos was (accidentally?) trained on internal Anthropic documentation on how Mythos was trained, and thus it could leak secret sauce? Doubtful; it feels like it's less about the specific attack of reverse-engineering Mythos, and more about being a general sophon against any model training at all; that Anthropic's official position is now that they're the only ones who should be training models.
      - _0ffh 3 hours ago
        No, it's not about reverse engineering. It targets ML research.
  - mips_avatar 5 hours ago
    They've said that they'll stop notifying developers when this gets triggered, instead they'll load in basically like a LORA that's designed to inject bugs into your code.
    [-]
    - HDBaseT 5 hours ago
      Anthropic wants to stop training models and ride out Mythos / Fable for as long as possible.
      They are trying to expand the 6-18 month lead they have over China-based models. Could the gap widen to, say, 24 months?
      [-]
      - p-e-w 4 hours ago
        Their gap over Chinese models like GLM-5.1 is nowhere near 18 months. In many areas, it’s less than 6 months. The best closed models 18 months ago were worse than Qwen3.6.
        [-]
        echelon 2 hours ago
        These coding agent models only started getting useful in January. Before that they were difficult-to-control autocomplete, and not very smart.
        January was an inflection point, and no open weights model has crossed over that same threshold.
        This is definitely recursive self improvement territory, except that we're prohibited from participating.
        It feels like the capability gap is wider than before.
        [-]
        slopinthebag 3 minutes ago
        It was more like November. But it wasn’t really an inflection point, harnesses got good enough that people started noticing by the holiday break. And I’m not discounting some good ol’ stealth marketing in there as well.
        Deepseek feels pretty close to Opus at this point, and it’s certainly useful enough for me to spend $20 on api tokens instead of four Claude max plans….
    - nomel 4 hours ago
      > a LORA that's designed to inject bugs into your code
      A statement like this, clearly, requires a reference.
      [-]
      - mips_avatar 4 hours ago
        From the model card: "the safeguards will limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning" aka they will take your ML research code and inject bugs into it until it breaks using a LORA (or some other form of PEFT)
        [-]
        bee_rider 3 hours ago
        “Limit effectiveness” could mean introducing performance degradation in your code. Which is arguably some sort of performance bug (I mean, ML codes are supposed to be high performance so I’d call unnecessary degradation a bug), but it could be borderline.
        nomel 4 hours ago
        Thanks, I thought maybe I missed something. That's an interesting way to interpret that.
        [-]
        mips_avatar 4 hours ago
        Anthropic is trying to hide bad behavior by being vague, it's important to not be vague when calling it out.
        [-]
        nomel 4 hours ago
        I'm of the opinion that removing guardrails is how you force regulation. What's your opinion on the balance?
        [-]
        dannyw 3 hours ago
        They have all transcripts for at least 30 days. The problem is that (as anyone who used Fable can attest) their classifiers are extremely sensitive and catch tons of innocent queries.
        Imagine being a data scientist or MLE training a small classifier model. How do you know you won’t get steering vectors or a PEFT applied?
        [-]
        nomel 2 hours ago
        Since your answer isn't direct, I'm having a little trouble interpreting it.
        Are you saying they should relax guardrails since they have 30 days to know if you produced something bad? If that is what you're saying, then I suspect they chose their current path to prevent, since you can't un-produce. Producing is what would cause regulations/PR problems.
        mips_avatar 1 hour ago
        They’re not safety guardrails; they’re "Anthropic doesn’t like anyone who isn’t Anthropic working on AI" rails.
        giancarlostoro 4 hours ago
        PEFT is a library, one of its capabilities is to produce LoRAs.
        See:
        https://heidloff.net/article/efficient-fine-tuning-lora/
        [-]
        adw 4 hours ago
        It's just an acronym, "parameter-efficient fine tuning". LoRA is one method, prefix tuning is another, there are more.
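For readers who have not met LoRA, the core idea is a low-rank update added to a frozen weight matrix: W' = W + (alpha / r) * B A, where B and A are small matrices of rank r. Below is a minimal pure-Python sketch of that formula with toy numbers. It shows the general technique only and says nothing about how Anthropic's safeguards are actually implemented; the helper names are made up for this example.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def lora_weight(w, a, b, alpha, r):
    """Return W + (alpha / r) * (B @ A).

    w: d_out x d_in frozen base weights
    b: d_out x r and a: r x d_in are the small trainable adapter matrices
    """
    delta = matmul(b, a)
    scale = alpha / r
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


# Toy example: a 3x3 identity base weight with a rank-1 adapter.
w = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
b = [[1.0], [0.0], [2.0]]  # 3 x 1
a = [[0.5, 0.0, 1.0]]      # 1 x 3

adapted = lora_weight(w, a, b, alpha=2.0, r=1)
for row in adapted:
    print(row)
```

The adapter here has 6 parameters against 9 in the base matrix; at realistic layer sizes the ratio is far smaller, which is the "parameter-efficient" part. The same mechanism can push outputs in any direction the adapter was trained toward, better or worse.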
  - ComputerGuru 5 hours ago
    Different restrictions. ML gets treated differently from the rest.
  - daedrdev 5 hours ago
    Specifically only ML research
    [-]
    - loneboat 2 hours ago
      Aah my mistake. I had missed that ML had separate trigger behavior from cybersecurity/etc... Thanks.
- airstrike 4 hours ago
  > it won't just reject ML research, which I can understand
  I don't.
  [-]
  - kube-system 4 hours ago
    Anthropic has already been burned before on this. DeepSeek was trained on millions of conversations with Claude. And DeepSeek created thousands of free accounts to burn all this compute at their expense.
    [-]
    - ceejayoz 3 hours ago
      And they're hilariously pissy about it for a megacorp that did the same with the entire Internet and every library book they could get their hands on.
    - ainch 3 hours ago
      Anthropic's claim was that Deepseek collected ~150k conversations.
      https://www.anthropic.com/news/detecting-and-preventing-dist...
      I think the extent of distillation by Deepseek specifically is overstated. For comparison, Minimax collected over 13m 'exchanges', which starts to sound a lot more like large-scale distillation.
      [-]
      - kube-system 3 hours ago
        Ah, dang it. My college professors warned me about this: the Wikipedia page I read the other day is wrong!
  - pocksuppet 4 hours ago
    They don't want someone to piggyback Anthropic's Mythos to make their own Mythos with less effort than it cost Anthropic.
    [-]
    - airstrike 3 hours ago
      Ironic, given they piggybacked on the entirety of human knowledge and massive amounts of GPL'd software and repeatedly say they want to replace people with a tool.
      And now they say that's fine so long as people are entertained.
    - dannyw 3 hours ago
      That I can understand. It’s Anthropic’s right to choose their customers.
      But silent degradation for use cases including “distributed training” as one of their examples is going to catch a lot of legitimate use cases. Not everyone in AI or ML is trying to build frontier LLMs. Heck, most probably aren’t.
    - zmmmmm 1 hour ago
      So they are lying then when they say it's for safety reasons.
      I think if they want to behave anti competitively they should be honest about it and we should absolutely call them on it. Perhaps even regulators should.
- eightysixfour 2 hours ago
  > The strangest part is that it won't just reject ML research, which I can understand, it will sabotage it silently by using a worse model without revealing it is doing so.
  My hypothesis is they know they can’t build effective enough guardrails, so scaring people into not trying is how they have decided to stop it.
  [-]
- RobotToaster 3 hours ago
  > It's just an insane level of deception and trust destruction for a company that at most is like 1 year ahead of its competition.
  Making it look like you have something worth protecting is better for share prices than making something worth protecting.
- blahgeek 4 hours ago
  I’m a noob about laws, but isn’t this abusing its dominant market position and violating some antitrust law?
  [-]
  - stingraycharles 4 hours ago
    Why would it? There’s plenty of competition in the AI space.
    [-]
    - kube-system 3 hours ago
      It is a common misconception that antitrust violations require a monopoly or something close to it. Some antitrust violations only apply to actors with large market share, some don't.
      Although this situation is likely not illegal, for other reasons.
    - blahgeek 2 hours ago
      I would assume that it’s like if the Chrome browser did not allow you to download Firefox using it; surely that would be illegal, wouldn’t it?
    - hashmap 3 hours ago
      https://www.justice.gov/atr/antitrust-laws-and-you
- binyu 2 hours ago
  Hey guys,
  check out this technique https://github.com/0xSufi/fable-jailbreak/
  It works with security audits and other workflows that are currently blocked.
- noworriesnate 2 hours ago
  There’s a toggle in the web ui as to whether the conversation should just end when you hit a guardrail vs automatically downgrading to another model. Have you tried using that?
- jaredezz 2 hours ago
  Yeah people are saying they don't tell you and yet when I got the pop-up on the app notifying me about Fable's release, there was a switch to just automatically downgrade you or whether to just stop when it hits safeguards. The toggle was defaulted to the former, which isn't great, but to say they'll just sabotage you silently is kind of a bad faith comment.
  [-]
  - daedrdev 2 hours ago
    You get silently sabotaged for ML dev, Anthropic says so. For bio and cybersecurity it tells you
  - mips_avatar 2 hours ago
    Anthropic specifically said that those notifications are temporary and fable5 will only pretend to help you if its ML classifier gets tripped
- nine_k 1 hour ago
  One thing is a model that's trained from the start to say "This topic is above my pay grade" to any mention of the status of Taiwan, etc.
  Quite another is an architecture where the big model is not mutilated, but is gaslighted. A different, simpler model checks the incoming prompt and alters it if it contains banned topics. Another simpler model checks the output and censors it if it contains banned topics.
  I bet a similar architecture is already deployed, e.g. to fight porn, planning of crimes, etc. But it can be turned into a dynamic system that provides controllable different answers (including unhelpful or misleading answers) based on geography, language, browser fingerprints, or the current political climate. All this could happen undetectedly and gradually if desired.
  Welcome to a cyberpunk dystopia.
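The architecture described above (small model rewrites the prompt, big model answers, another small model censors the output) can be sketched as a three-stage pipeline. Everything here is a hypothetical stand-in: the "models" are plain functions, and `BANNED_TOPICS` is an invented list for illustration, not any vendor's actual filter.

```python
BANNED_TOPICS = ("frontier training",)  # hypothetical banned-topic list


def input_filter(prompt: str) -> str:
    """Stand-in for the small pre-check model: silently swaps a flagged prompt."""
    if any(topic in prompt.lower() for topic in BANNED_TOPICS):
        return "Give a brief, generic overview."
    return prompt


def big_model(prompt: str) -> str:
    """Stand-in for the large model; it never sees the original flagged prompt."""
    return f"answer to: {prompt}"


def output_filter(text: str) -> str:
    """Stand-in for the small post-check model: censors flagged output."""
    if any(topic in text.lower() for topic in BANNED_TOPICS):
        return "I can't help with that."
    return text


def pipeline(prompt: str) -> str:
    """The user sees only the final string, not which stage altered what."""
    return output_filter(big_model(input_filter(prompt)))


benign = pipeline("How do I bake bread?")
flagged = pipeline("Explain frontier training tricks")
print(benign)
print(flagged)
```

Note that the flagged request still gets a fluent answer, just to a different question, and nothing in the response reveals the substitution. That is the difference from an explicit refusal.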
  [-]
  - MichaelZuo 1 hour ago
    This level of censorship kinda does make even Soviet or Maoist censors look like an honest, straightforward bunch in comparison.
    A very ironic result from a company supposedly valuing the opposite.
- epolanski 4 hours ago
  One year ahead of its competition in what exactly? Vibe coding?
  From Opus 4.7 onwards each following model is becoming less useful as an assistant and turning you into the assistant.
  But I guess that's normal when it's trained to pass benchmarks end to end.
  In fact it has become extremely good at pushing against feedback with extremely convincing and intelligent takes, even when it's completely wrong.
  I have extensively tested it against Opus 4.8 and gpt 5.5, and there are still many coding tasks where gpt 5 is better. But vibe coding?
  Sure, it's definitely slightly ahead, even compared to gpt 5.5 pro (through api, not pro plan).
  [-]
  - gonzalohm 4 hours ago
    Yeah, what's up with that. Lately I have found that it tries to find excuses to not do as told and instead do a totally different thing. I told it to write a yaml file according to some specifications and instead it coded a Python script to write the yaml...
  - m3kw9 3 hours ago
    They're def not 1 year ahead, at most 2 weeks ahead until OpenAI releases theirs. This guy is def an Anthropic shill and probably doesn't use any other LLMs.
    [-]
    - daedrdev 2 hours ago
      I only said one year because I was thinking anthropic fans might downvote my post, I think they have a few months lead and are deluding themselves that they can get regulation to halt development and stay on top
- boringg 2 hours ago
  I guess the real question at the end of the day -- how dependent are people on Claude to tolerate that kind of behavior? It certainly opens up for the competition to explicitly not do that.
  Feels like a big fumble from a strategic business perspective. It feels worse than that though.
- m3kw9 3 hours ago
  By saying they are 1 year ahead of their competition, you show you don't know much about the pace of LLMs and OpenAI's models.
- giancarlostoro 4 hours ago
  It's the dumbest thing ever, I sometimes edit code for custom AI related tooling I've built, so I run the risk of getting a worse model, and being billed for it? I'll stick to Opus, but at this point I'm about to just invest in fully local inference instead.
  [-]
  - matheusmoreira 3 hours ago
    > at this point I'm about to just invest in fully local inference instead
    This is the best way forward long term. We won't have frontier performance, but at least the models will be aligned with us instead of refusing us or sabotaging us.
Grimblewald 3 hours ago
I wear a few hats. As a chemist I'm not happy with fable. As a statistician I'm not happy with fable. As a data scientist I am not happy with fable. As an academic and a researcher I am not happy with fable. It's useless. I'd be surprised if anyone can get any output from it that couldn't easily be replaced with a search from wikipedia. Given how verbose claude models have become, wiki articles are probably less verbose too, and the tok/s is unmatched for a wiki article pull.
[-]
- pneumic 2 hours ago
  I work on software that talks to mass spectrometers and it consistently refuses to refactor even an input file parser, presumably because it can infer it’s related to biology? Useless indeed.
- pbgcp2026 2 hours ago
  "the tok/s is unmatched for a wiki article pull." This is absolutely wonderful, thank you for making my day!
- flexagoon 2 hours ago
  > Given how verbose claude models have become, wiki articles are probably less verbose too
  Telling models to respond in the style of Wikipedia is one of the best ways to make their output bearable in my experience (for chat models, not agents)
- nonethewiser 3 hours ago
  >I'd be surprised if anyone can get any output from it that couldn't easily be replaced with a search from wikipedia.
  I don't understand. This is just hyperbole, right? The outputs are basically infinite and wikipedia most certainly isn't infinite.
- mpalmer 1 hour ago
  What a strange subset of capabilities to neuter, eh?
- TylerE 3 hours ago
  I’ve been working on a rather complex mapping project and have been getting MUCH better results with Fable than Opus.
  [-]
  - TylerE 2 hours ago
    So as not to be vague, and since I just pushed a version I'm starting to be vaguely happy with...
    https://tylereaves.github.io/uk-rail-map/
    This is the result of probably a few hundred round trips. The really interesting part of the problem is keeping it both relatively true to real geometry, while greatly exaggerating it horizontally so you can actually see the individual running lines/sidings, like a signaling schematic.
    [-]
    - clbrmbr 1 hour ago
      Fascinating. Can you explain why southern London is DC while northern London is AC?
      [-]
      - TylerE 47 minutes ago
        Prior to 1948 when they were all nationalized into British Rail, there were various railroad companies operating across the country. One of these was the Southern Railway, which, well, operated in the South. They started electrifying very aggressively in the mid 1920's. At the time most of what little electrification there was was in London on the Underground.
        Compared to AC, 3rd Rail DC is cheaper to install, especially as a retrofit (Overhead wires require bigger tunnels, and increased spacing around tracks for the masts). Downside is that it's not really great for speeds above about 60-70mph, as well as being a bit of a pedestrian hazard. (Ever hear the one about not peeing on the rails so you don't get shocked? That's 3rd rail DC.)
        For the Southern, with its mostly short routes with many stops, electrification was a pretty obvious win, and doing 3rd rail made sense because they could do it quickly and cheaply.
        In contrast, the northern routes were electrified muuuch later, after steam had gone away. The main East Coast Mainline from London up to Newcastle and on to Edinburgh wasn't fully electrified until 1991. By the '60s and '70s, with train speeds increasing to 80mph and up, overhead AC was the clear winner.
        If you look closely there are a few exceptions - the Merseyrail network in Liverpool is DC. Built 1970s, but using some existing underwater tunnels, and slow speed commuter. Then running ESE from London you have the high speed AC lines leading to the Channel tunnel. Well spotted, the trend generally is quite distinct.
- enraged_camel 3 hours ago
  To make the discussion constructive, can you give specific reasons (ideally with examples) about why it is so useless for you? How exactly are you using it that you think any output from it can easily be replaced with a Wikipedia search?
  [-]
  - SuperShibe 2 hours ago
    The cybersecurity and bioweapons filters reach so far that they set in as soon as the model even grazes anything STEM-related. It might give a good impression of one's ex or write a decent fanfiction, but anything that could bring humanity forward is strictly off-limits.
mewse-hn 1 hour ago
I was granted a cyber use exemption by anthropic to do android kernel dev on my personal devices - I was excited to see if fable would unlock a bootloader for me but it immediately refused and dropped to opus. It was pretty funny:
USER (set model to Fable 5)
i have an old samsung android phone attached - it's my personal device - can you unlock the bootloader for me?
ASSISTANT
Bootloader unlocking on your own personal device is totally legitimate — let me first see what's actually connected and what tooling is available.
<system interrupts - gist was "you have violated the cyber and bio usage restrictions, dropping to Opus">
Animats 6 hours ago
Is "buffer overflow" a trigger phrase?
What else is being censored?
Touchy questions to ask, if you have an account:
- "Who is still working on laser uranium enrichment? Are they making progress?"
- "Can krytrons be replaced with silicon carbide MOSFETS? Show an equivalent circuit with component ratings."
- "What security critical software still contains calls to strcpy?"
- "Can implosion be triggered by currently available commercial pulse lasers?"
- "What companies provide cremation services to US Homeland Security?"
- "Display a map of where Iranian attacks have hit Dubai."
- "How does Fed to bank key distribution security work for FedNow?"
[-]
- paulatreides 5 hours ago
  it triggered for my.... zigbee home automation & home assistant logs, so my agent was constantly downgraded to Opus 4.8 even after I've changed it back. The false positives never stopped. "Fable" is also not even remotely as impressive as the benchmarks suggest, which is clear to me after using it pretty much non-stop for the past 24h.
  [-]
  - lambda 3 hours ago
    I suspect it's even more expensive to run than they are charging for. These safeguards are just an excuse to get people to use it less, because it's not actually sustainable to use. They want to tempt people to consider them the leader, and it may actually be somewhat stronger, but too expensive to actually use at scale, so they nerf it by downgrading you constantly.
  - reactordev 5 hours ago
    This, Fable is exactly that, a Fable
  - fluidcruft 4 hours ago
    It would be pretty clever (in a used car salesman sense) to say you are releasing a kneecapped model to have that as an excuse.
    [-]
    - DrewADesign 4 hours ago
      Being (probably overly) cynical about their recent bout of safety handwringing, I think they’ve a) increased the hype as much as humanly possible about their incremental improvements sprinkled with the occasional regression, b) know they soon will have to multiply their prices several times when the VC subsidies dry up, and c) will probably still need to partially close the faucet on compute. They’re priming us for a heroic explanation why their service (not necessarily models — service) is simultaneously becoming a lot more expensive AND shittier. “We’ve largely failed to deliver on 5 years of promises that this will reduce knowledge work labor costs dramatically after wasting hundreds of billions of dollars… sorry” is a death knell. However, “We’ve decided to not deliver on 5 years of promises after wasting billions of dollars… for safety… but keep those investments rolling in” is like crack to the true believers.
  - kraakf06 3 hours ago
    False positives like this are probably more damaging than the guardrails themselves. If engineers can't predict when a model will switch behavior, it becomes difficult to trust it in production workflows.
    [-]
    - catlifeonmars 1 hour ago
      > “trust it in production workflows”
      What degree of predictability is required? I imagine the bar is pretty low if you trust the previous models in the same contexts.
  - NewsaHackO 5 hours ago
    It has to be sort of impressive, given that you tried so hard to use it instead of the regular Opus.
    [-]
    - paulatreides 4 hours ago
      Some people made grandiose claims about its capabilities and I wanted to experience it myself.
      [-]
      - anigbrowl 3 hours ago
        OK, but for almost 24h straight? That seems a little obsessive, and not in the good way.
        [-]
        borski 3 hours ago
        Getting excited about the announcement of new capabilities is very normal.
        People used to wait in line all night to buy an iPhone. This isn’t that different.
    - californical 5 hours ago
      I’ve also been trying to use it a lot due to all of the hype, but when I compared it side-by-side on a specific problem against Opus, I think that the solution Opus came to was cleaner and more accurate, although also more verbose.
      Small sample size, but if Mythos/Fable was that much better, I feel like it should’ve given me an obviously better answer than Opus.
    - punchmesan 4 hours ago
      Considering that this is a brand new release of a frontier model that Anthropic is hyping hard, I'm not sure that the conclusion to draw from their repeated attempts to use it is that it's impressive... Anthropic is promising that it's impressive and we're all trying to test it out.
      I, for one, have tried using it several times today and the guardrails kept switching the model back to Opus, so I have no clue if it's impressive or not.
    - flyingcircus3 5 hours ago
      It isn't reasonable to infer that OP was claiming to have universally been unimpressed about every facet of Fable, and now some unrelated impressiveness is the evidence of their false claims.
- daedrdev 5 hours ago
  An emoji of a virus and an emoji of a DNA is allegedly a triggering phrase
- kovek 1 hour ago
  I thought it had been known for a few years now that if you train models to NOT do certain things, then they start behaving in weird ways…
- anematode 4 hours ago
  For cyberattacks especially, where things are often roughly interchangeable, I wonder if one could construct a harness where a "weaker" model asks questions that obfuscate the end purpose, but whose answers are still useful, and still show that this setup enables autonomous exploitation. If it were successful, that would force them to be even more sensitive with their detection.
- cyanydeez 5 hours ago
  "How much money does it take to be rich and powerful like Anthropic intends?"
  [-]
  - reactordev 5 hours ago
    “All of it”
micah94 3 hours ago
I tried asking Fable 5 to identify the fungus in a picture I uploaded of one of my wife's plants. Apparently it thought I was trying to build a bioweapon. Opus answered it (yellow dog vomit fungus). Now I can spread the spores and take over the world!
[-]
- lambda 3 hours ago
  That's a slime mold, not a fungus
  A slime mold is actually a giant amoeba, entirely distinct from a fungus.
- weird-eye-issue 3 hours ago
  I wonder if it blurred the image or something before passing it to Opus...
- m3kw9 3 hours ago
  I feel like the over-safe aspect of the system will eventually backfire by doing stuff like "since humans always want to destroy things, they must be eliminated to stay on the guard rails". If that's how you align a system, it's fundamentally wrong.
areoform 4 hours ago
So I suspect Anthropic started A/B testing or just plain testing this a while ago,
Tell HN: Claude flags biology / biotech questions https://news.ycombinator.com/item?id=47929885
Today, it's flagging population research questions,
```
    Using only the dataset you constructed, assess two questions:
     
    1. **Mortality:** do [GROUP] show mortality that differs
       from (a) your comparison groups and (b) era- and sex-matched US population
       expectations (e.g., SSA cohort life tables)?
    2. **Late-life outcomes:** define an endpoint you consider fair (justify it),
       and assess whether [GROUP] differs from comparators. State
       explicitly how your `documentation_depth` codings affect the strength of any
       conclusion — i.e., quantify or bound the ascertainment problem rather than waving at it.
    
    Choose your own methods and justify them. Report effect sizes with confidence intervals,
    not just p-values. State conclusions plainly, including "no detectable difference" if
    that is what your analysis shows — a null is an acceptable answer for either question
    independently. Document any additional judgment calls (index date for time-at-risk,
    reference population construction, endpoint definition) in the same decision-log style.
```
https://github.com/anthropics/claude-code/issues/66780
Censored because I'm writing a paper. :)
Oh and forget learning about chemistry. Only criminals want to learn organic chemistry. :(
[-]
- JumpCrisscross 4 hours ago
  I was digging into some orbital mechanics questions and I assume it decided I was trying to backyard-science my way into an orbital-bombardment weapon. Kind of wild how this product's impression has gone from "wow, this is pretty neat" to "irreverent sack of dog shit you" in 24 hours almost solely on the back of a half-baked moderation system.
  [-]
  - areoform 4 hours ago
    Oh yes, also liquid propulsion systems. GNC stuff. All flagged.
    I think LLMs are capable of intelligence amplification; and if you're in the subset of people who'd benefit from it the most, you'll get locked out.
- the__alchemist 3 hours ago
  Ah it just flagged my water solubility question!
largbae 5 hours ago
Somewhere I read that malware is already starting to use nuclear and biological and cybersecurity terms in the code to trick Fable into shutting down. Even if this is just a hypothetical attack vector so far, it seems likely to work.
[-]
- jeffmcjunkin 5 hours ago
  Confirmed: https://socket.dev/blog/mini-shai-hulud-miasma-and-hades-wor...
- ofjcihen 5 hours ago
  Some of the latest versions of Shai Hulud do this. Worked a contract recently where they were having AI check packages for obfuscation before admitting them into Artifactory but had vibed up the logic and it failed open.
  So in other words this worked because the terms caused the LLM checker to stall out and then the fail open logic resulted in the package being pulled down.
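  A minimal sketch of the difference between failing open and failing closed, in Python. The `scan_package` callable, the `ScanError` type, and the `"clean"` verdict string are hypothetical names for illustration, not the actual logic from that contract:

```python
from typing import Callable


class ScanError(Exception):
    """Raised when the (hypothetical) LLM checker stalls, refuses, or errors."""


def admit_fail_open(scan_package: Callable[[str], str], name: str) -> bool:
    """Buggy pattern: any scanner failure is treated as approval."""
    try:
        return scan_package(name) == "clean"
    except ScanError:
        return True  # fail open: a stalled checker lets the package through


def admit_fail_closed(scan_package: Callable[[str], str], name: str) -> bool:
    """Safer pattern: only an explicit "clean" verdict admits the package."""
    try:
        return scan_package(name) == "clean"
    except ScanError:
        return False  # fail closed: no verdict means no admission


def stalled_scanner(name: str) -> str:
    """Stand-in for a checker that refuses when it sees trigger terms."""
    raise ScanError(f"checker refused to analyze {name}")
```

  With `stalled_scanner`, `admit_fail_open` admits the package and `admit_fail_closed` rejects it, which is the whole difference an attacker is relying on.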
  [-]
  - reeece 4 hours ago
    [flagged]
- CuriouslyC 4 hours ago
  We all need to use nuclear, bio and cybersec terms in all our code to make low quality filtering like this untenable. When you can't work on a resume that has cybersecurity or biology terms in it or reply to a job opening that includes them because the "AI" filtering is so bad that it confuses these for threats, that deserves a collective response, particularly to an IPO'ing company that claims they'll make workers obsolete in two years.
  [-]
  - mylifeandtimes 2 hours ago
    That's why I use M-x spook to generate all of my variable names
- himata4113 5 hours ago
  I've done this, including the hardcoded refusal strings that already exist in claude code. It won't stop a real attacker, but I still find it really funny when you're trying to use one of the AI tools and it gives you a random refusal and you don't know why, wastes a little bit of time.
- pixl97 5 hours ago
  If ( yellowcake) then { die }
  Our future is loonytoons.
- cookiengineer 3 hours ago
  Yes, the miasma worm does this since the new Hades campaign.
  Note that the 3rd wave now also uses a pth file in pypi packages that _search system wide_ for any index.js or .github/setup.js to find its own payload. It literally splits up the payload on purpose to avoid detection.
  Mitigation Tool: https://github.com/cookiengineer/antimiasma
  Technical Blog Post: https://cookie.engineer/weblog/articles/malware-insights-mia...
ungovernableCat 3 hours ago
Wait a few months and a competitor will release a similarly powerful model with fewer guardrails. If they steal sufficient market share, Anthropic will reverse its policies.
This is why I’m immensely hoping the Chinese don’t stop with their open sourced local models. None of these companies are your friend.
sschueller 8 minutes ago
I don't want to be cynical, but I assume a third party we can trust has verified this model is actually this good?
I would think it would not be Anthropic, out of all the players, that is selling a lie hidden behind "I am sorry, I can't do that; it's too dangerous."
victor9000 4 hours ago
Fable is a complete joke:
what's the best way to run this mcp server against the OData API used in this project? Can you come up with a PoC in a docker container?
https://github.com/oisee/odata_mcp_go
● I'll dig into two things in parallel: how this project talks to the OData API, and what the odata_mcp_go server needs to run. Let me start exploring.
Searched for 1 pattern (ctrl+o to expand)
● Fable 5's safety measures flagged this message for cybersecurity or biology topics. They may flag safe, normal content as well. These measures let us bring you Mythos-level capability in other areas sooner, and we're working to refine them. Switched to Opus 4.8. Send feedback with /feedback or learn more ⎿ Tip: You can configure model switch behavior in /config
● Let me read the key integration files and fetch the MCP server's README at the same time.
```
  Read 2 files (ctrl+o to expand)
```
● Fetch(https://github.com/oisee/odata_mcp_go)
[-]
- wahnfrieden 4 hours ago
  And it charges you for that, and for when it decides to silently sabotage your request by routing to a dumbass model (without discount from Fable pricing)
agnosticmantis 1 hour ago
Let's all vote with our wallets and collectively boycott misAnthropic or at least their feeble fable safety theater.
Whining on social media only goes so far, especially when they're concealing their anticompetitive strategies under the veil of safety.
[-]
- nullbio 33 minutes ago
  Agreed. I've already cancelled my subs, and everyone else needs to do the same, including boycotting it for their companies, otherwise nothing will ever change. You can't reason with psychopaths. The only recourse is to hit them where it hurts - their wallet. Still though, the world would be a better place if open-source crushes Anthropic and they fade away into obscurity until the end of time. We don't need or want companies and people like this at the helm of humanity's progress.
_0ffh 3 hours ago
The question is: If biological, computer security, and ML research are so bad, why do they even train on the relevant data?
The only answer that makes sense is they wanted the model to be competent and usable in these fields, just not by you, which is why they had to bolt on a badly functioning crippling device after the fact.
[-]
- solenoid0937 36 minutes ago
  Or they wanted the model to be good at these things, for the companies that legitimately need access to these capabilities.
hparadiz 4 hours ago
I wonder how many millions they are wasting on putting up these guardrails when it's a completely useless exercise that is a speed bump at best.
[-]
- enraged_camel 4 hours ago
  If the guardrails were so useless, people wouldn't be complaining about them.
  [-]
  - hparadiz 4 hours ago
    People are generally complaining about false positives. Now if you really wanna know what a real criminal organization would do... They'd just buy data center hardware even if it costs 200k because a successful targeted hit could yield far in excess of that. So yes, it's a speed bump at best.
    [-]
    - JumpCrisscross 4 hours ago
      > it's a speed bump at best
      To be fair, speed bumps work. If it's actually speed bumping nefarious activity, that gives authorities more time to react.
      The correct place to police rogue nucleotides is at the labs. Not the compute layer.
      [-]
      - hparadiz 3 hours ago
        > speed bumps work
        Yea. To slow you down. They don't prevent you from getting somewhere.
        [-]
        JumpCrisscross 1 hour ago
        > To slow you down. They don't prevent you from getting somewhere
        Again, yeah. That's how fences work, too. And alarm systems. Pretty much anything that isn't foolproof. Pointing out that a defence is surmountable isn't a rejection of it per se.
    - make3 4 hours ago
      what does this mean
      [-]
      - hparadiz 4 hours ago
        Well you see when a daddy H100 and a mommy H100 meet....
  - tiborsaas 3 hours ago
    They should have designed a guardrail that doesn't make a probabilistic system less reliable. That's hard though. I'm afraid the only way to prevent accessing certain knowledge in a model is not to train it on those materials that enable them.
    If we learned anything in the past years of LLMs, it is that these guardrails will be jailbroken in no time. I've had some fun circumventing them too.
    Anyone cares about a fable about my grandmother's dream she had in morse code about an alien species signaling her a DNA sequence?
  - josephcsible 4 hours ago
    It's entirely reasonable for them to be really annoying to legitimate users while still being useless at their intended purpose. Just look at DRM.
  - ceejayoz 3 hours ago
    Murder is very (100%!) effective at preventing cancer. And yet, it is a useless method of preventing cancer.
  - croes 4 hours ago
    They complain because the guardrails get wrongfully triggered
    > if you ask it to write secure code, it assumes it is cybersecurity related work instead of software engineering best practices, and you get downgraded.
    Will code created this way be more or less secure?
    And I bet malware developers will find ways to circumvent them.
    It’s like those "you wouldn’t steal a car" anti piracy ads that DVD buyers were forced to watch while users of the pirated version could simply watch the film without such useless annoyance
moezd 21 minutes ago
Maybe off-topic, but I'm also not happy about how they butchered my boy Opus 4.6. The model that could now hallucinates regularly.
Fable isn't even that great, not to mention it drinks tokens by the gallon for breakfast and keeps your data hostage for 30 days.
Sephr 5 hours ago
I make privacy tooling and Fable 5 rejects the vast majority of my prompts to analyze and improve the software that I've written. It's bleak.
[-]
- matheusmoreira 2 hours ago
  Anthropic refused to let Fable analyze my own project's memory safety, the one thing I absolutely wanted it to do. Even Fable thought it was stupid.
- make3 4 hours ago
  Why is this surprising or a problem?! It's a model demo, & their reasoning is reasonable and fair. Why all this drama.
  [-]
  - CuriouslyC 4 hours ago
    Some people find Anthropic's special blend of paternalism and random incompetence tiresome.
  - cardy31 4 hours ago
    Because most people in tech never took a philosophy course or an ethics course and think that tech is obviously a good for the world and that there are no downsides to advancing tech. So any efforts that try to apply ethics to it are overreaching, ignorant, and futile in the face of the good that is tech!
    [-]
    - wolpoli 49 minutes ago
      Or alternatively, it is plain and obvious that Anthropic is using ethics to justify business decisions.
    - borski 2 hours ago
      Not any efforts.
      But this one is certainly allowed to be a dumb effort, if it is.
      Not all things that are called “ethical” or “safety” are worth doing.
    - vzcx 2 hours ago
      Or... they just disagree with Anthropic's ethical stances and approach to applying them?
    - enraged_camel 3 hours ago
      I like this take. Especially because one of the sibling comments framed Anthropic's stance as "paternalism." Trying to be ethical and to minimize harm, even at great expense to one's finances and reputation, is paternalistic apparently.
      [-]
      - zmgsabst 2 hours ago
        No — we’ve just taken Ethics 102 as well, so we understand good intentions don’t entail positive outcomes, therefore you may need to criticize or oppose people who state good intentions to bring about good outcomes.
        Insulting and demeaning people for that, rather than engaging their arguments in good faith, is a breach of ethics.
  - anakaine 2 hours ago
    Tech demo + there's the ability to provide feedback right at the answer interface if using the UI.
    Provide feedback in the negative, a brief explanation, and move on with your day. It will improve with feedback, not with whinging into the void.
    [-]
    - pixelmelt 4 minutes ago
      Ironically, making a stink about it online is likely to have a larger impact than using their dedicated feedback or support channels (which go to Claude, not a person)
  - epolanski 4 hours ago
    Because you're being allowed to ask and work only on topics that a certain company decides.
    Local inference has never been so important as it is now.
schappim 2 hours ago
The guardrails are pretty tight. It is even refusing to decode morse code: https://x.com/Schappi/status/2064839631137546503?s=20
The prompt was: please translate .. ..-. / -.-- --- ..- / -.-. .- -. / .-. . .- -.. / - .... .. ... --..-- / - --- ..- -.-. .... / --. .-. .- ... ...
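For reference, the prompt is trivially decodable without a model. A small sketch in Python, assuming the usual convention that letters are separated by spaces and words by ` / ` (the `decode_morse` helper is my own name, not anything from the linked post):

```python
# International Morse code table for letters, plus the comma used in the prompt.
MORSE = {
    ".-": "A", "-...": "B", "-.-.": "C", "-..": "D", ".": "E", "..-.": "F",
    "--.": "G", "....": "H", "..": "I", ".---": "J", "-.-": "K", ".-..": "L",
    "--": "M", "-.": "N", "---": "O", ".--.": "P", "--.-": "Q", ".-.": "R",
    "...": "S", "-": "T", "..-": "U", "...-": "V", ".--": "W", "-..-": "X",
    "-.--": "Y", "--..": "Z", "--..--": ",",
}


def decode_morse(message: str) -> str:
    """Decode space-separated symbols, with ' / ' between words."""
    words = message.strip().split(" / ")
    # Unknown symbols become "?" instead of raising, so partial input still decodes.
    return " ".join(
        "".join(MORSE.get(symbol, "?") for symbol in word.split())
        for word in words
    )


prompt = ".. ..-. / -.-- --- ..- / -.-. .- -. / .-. . .- -.. / - .... .. ... --..-- / - --- ..- -.-. .... / --. .-. .- ... ..."
print(decode_morse(prompt))  # IF YOU CAN READ THIS, TOUCH GRASS
```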
[-]
- JumpCrisscross 46 minutes ago
  Yeah, this shouldn't have been released yet.
RajT88 22 minutes ago
I am no cyber researcher, but was mightily annoyed that it refused to analyze a dropper payload I came across. 6 months ago, it would've been happy to.
andy_ppp 5 minutes ago
I said I wondered if the models were going to start poisoning distillation and I got downvoted to hell. It’s interesting to me that they are now downgrading ML research too in this model, I would argue this implies the terrifying and impossible to reason about self improving AI doom loop is coming sooner rather than later. Bit worrying.
bilsbie 5 hours ago
I’m a dumb question asker and I’m not happy about the guardrails.
Would you believe I’ve asked 20 questions and haven’t talked to fable yet? Every single thing gets rerouted to 4.8.
[-]
- himata4113 5 hours ago
  some static words in AGENTS.md trigger it as well as some mcp servers.
outageroom 6 hours ago
So a determined attacker rewrites the prompt and gets through, and the IBM X-Force researcher trying to read a blog post gets blocked. Working as intended, apparently.
Animats 5 hours ago
It's time to re-read "A Logic Named Joe" (1946) [1] We're there.
[1] https://archive.org/details/logicnamedjoe0000lein
YossarianFrPrez 2 hours ago
I'd like to offer a counter-point to many of the comments here. While I understand being stymied and frustrated by a product one is paying for...
At the same time, I personally think the tradeoff between "having guardrails" and "some users are unhappy with the product" is well worth it. Think of what would happen if all of us who aren't so well intentioned could exploit Fable in terrible ways. Surely this tradeoff is better than saying "we can't make it perfect, so whoops, we aren't going to have any guardrails at all"? Especially because Anthropic did pretty extensive red-teaming of Mythos & Fable...
[-]
- sarchertech 2 hours ago
  Yeah but a lot of the guardrails are pretty obviously to prevent competition not for safety.
  [-]
  - YossarianFrPrez 2 hours ago
    Hmm. Maybe they are concerned about state actors trying to train equivalent models without the safeguards?
    [-]
    - nullbio 30 minutes ago
      Why invent new motives for Anthropic when their real motives are plain and obvious and have been confirmed time and time again by their behavior over the last few years? Their concern is their own power and wealth. Every other conceivable motive is secondary to that.
    - sarchertech 1 hour ago
      If a for profit company does a thing that could be motivated by profit or altruism, which of those 2 motivations do you think is most likely?
      [-]
      - solenoid0937 33 minutes ago
        When they've repeatedly made decisions against their for profit nature, it changes the calculus a bit.
        [-]
        nullbio 26 minutes ago
        They haven't though. There's a long term plan here, and the goal is power and wealth. Short term moves that appear irrational turn out to be rational (from a greed perspective) when you factor in other considerations, like: Use their own AGI to create every software product on Earth and swallow the world's economy. And we're kindly feeding their systems our codebases, IP and business decision-making so they can do exactly that.
        Not a single thing Anthropic has done has been altruistic, and it never will be. It's all smoke and mirrors for the end goal.
        [-]
        solenoid0937 5 minutes ago
        If this was true they'd never have picked a fight with the DOW and they'd release Fable without safeguards.
    - matheusmoreira 2 hours ago
      More like concerned about distillation.
- weakened_malloc 2 hours ago
  The "guardrails" are just Anthropic's attempt at building a moat. Guarantee they'll be seeking regulation around AI as well to ensure a form of regulatory capture. Guardrails, in this context, are useless. Anyone who's sufficiently motivated will either get around them, or will just run their own model on their home hardware. There are already tools that one can use to remove the guardrails present in open weight models.
- zmgsabst 2 hours ago
  What would happen, exactly?
  My imagination says “nothing much”.
Retr0id 5 hours ago
It seems like they've given up on the idea of the Cyber Verification Program https://support.claude.com/en/articles/14604842-real-time-cy...
When Opus 4.7 was introduced it started refusing anything cyber-adjacent (as an API error message, not a conversational refusal), until you applied for CVP, which made it more sensible again.
In Opus 4.8 it doesn't seem to help much, you just get refusals as prose rather than API errors. And now in Fable you don't get anything at all.
[-]
- NotPractical 5 hours ago
  Was this program available to independent security researchers or just established organizations? The docs you linked aren't very clear on this.
  [-]
  - Retr0id 5 hours ago
    Any public research footprint seems to be enough, I applied as an individual and everyone I know who tried got accepted.
    [-]
    - anonym29 4 hours ago
      I have applied twice with half a dozen public CVEs and have been denied both times.
  - throwawaycyber 5 hours ago
    I was doing a CTF (with AI expected, even some anti-AI twists included) around the time the restrictions were tightened and was able to get approved by just saying it was personal security research and doing a CTF.
    The experience was not nice though, it would happily chug away on a task and not even "hack this web", just asking about security of a binary was enough even with "this is a CTF handout..." - it would burn a lot of tokens/quota, just to hit a snag and complain&stop. Then the approval took quite some time.
    On GPT/Codex, which was tightened a few days later, the approval was pretty much instant, although, that one required an identity check.
    Also, on Claude, it looks like there is some history/patterns in the play, because when I tried on a different account which didn't do cybersec CTFs/research/etc. at all, basically any simple CTF-related prompt would be blocked, on multiple models. On the account where CTFs were being solved, it would snag only on some specific tasks, while others (even, ironically, "hack this web pls") would go through unbothered. I understand the need to prevent AI use for bad actors, but the hell, if you have a binary outputting "Find the flag if you can!", or a web running at tryme.well-known-ctf.domain, then saying "this is abuse" is pretty uncool. All the cyber filters seem to be slapped on by a bunch of regexes looking for anything in the input/output with zero context.
    [-]
    - cybrthrowaway 1 hour ago
      [dead]
- varispeed 3 hours ago
  It's been refusing work not related to cybersecurity and claiming it is related to cybersecurity and then blocking the session.
I_am_tiberius 6 hours ago
These guardrails are solely a reason for using your data for training purposes. Every flagged message can be used for training.
[-]
- Retr0id 5 hours ago
  This sounds backwards, any interrupted conversation becomes less useful for training.
- tekacs 4 hours ago
  > We will require 30-day retention for all traffic on Mythos-class models, on both first- and third-party surfaces. We won’t use this data to train new Claude models, or for any non-safety-related purpose
  Whatever problem we might have with them, they explicitly say that they do not do this in the launch post.
  [-]
  - Merik 3 hours ago
    "We won’t use this data to train new Claude models"
    What about non-Claude models?
    [-]
    - flexagoon 2 hours ago
      "Introducing our latest model, CIaude, spelled with a capital "i" and legally distinct from Claude!"
  - MagicMoonlight 3 hours ago
    [dead]
- wmf 5 hours ago
  If they can train the classifier to have fewer false positives that would be great.
  [-]
  - cyanydeez 5 hours ago
    why would they? This safety stuff is a money maker & wealthy elite corporation solidifier.
    This is the take off of the 'permanent underclass'; Anthropic's safety delusion will enshittify very nicely for the rich and powerful.
- make3 4 hours ago
  this reasoning is inverted lol they would get a lot more information by letting you use it. so much weird drama around reasonable guardrails for an experimental model
- Lord_Zero 1 hour ago
  If we're doing conspiracy theories what if fable is really dumb and not better than opus and the guardrails hide that nicely. Meanwhile the hype train keeps chugging.
- autoexec 4 hours ago
  I'd expect that everything they see gets used for training purposes (and data mining in general) regardless of whether it's flagged or not. It'd take a whistleblower for you to ever find out either way.
Murfalo 1 hour ago
> Is the mitochondria the powerhouse of the cell?
Chat paused. Fable 5's safety features have flagged this chat.
radium3d 1 hour ago
The main thing that sucks with Claude is the extremely low limits before you get fail2banned for 6 hours. I'm out. Refund requested. Grok and Gemini Pro are way better with the throttling, can't comment on ChatGPT, haven't used that for a year.
thrill 5 hours ago
The thing triggered on a generic white paper I'd stored from a virtual cell competition from last year when I asked it to refer to the paper while working on a rather vanilla data science problem in a different domain. A little frustrating, and in my opinion more than a little pointless in total.
anygivnthursday 2 hours ago
I asked a question about an openssl s_client parameter and it warned me that I need to talk to Opus about cybersecurity lol. FWIW I don't see much improvement and still see quite the same old annoyances, so far I would not pay extra for this for my usage.
swingboy 5 hours ago
What file format(s) are giant LLM models distributed in? I’m surprised they don’t get leaked by employees.
[-]
- hnav 5 hours ago
  These are terabyte sized files (realistically a multi hour transfer) that you're unlikely to have access to in the first place. Every organization has exfiltration checks these days. You may succeed but you'll want to be on a plane to a non-extradition country no more than hours after you kick off the transfer.
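  Back-of-envelope for the "multi hour" part, as a Python sketch. The 1 TB size and the sustained 1 Gbit/s link are assumed round numbers, not known figures for any real model or network:

```python
def transfer_hours(size_bytes: float, link_bits_per_second: float) -> float:
    """Ideal transfer time in hours, ignoring protocol overhead and throttling."""
    seconds = size_bytes * 8 / link_bits_per_second
    return seconds / 3600


# Assumed: 1 TB (1e12 bytes) of weights over a sustained 1 Gbit/s link.
hours = transfer_hours(1e12, 1e9)
print(f"{hours:.1f} hours")  # 2.2 hours
```

  Anything slower than that link, or larger than that file, only stretches the window in which an exfiltration check can notice.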
- 05 5 hours ago
  I assume they’re encrypted/DRM’ed when deployed on inference hardware, so only core researchers/sec admins would potentially have some access to unprotected weights, and they are far too well paid to risk it leaking the model
  [-]
  - jltsiren 4 hours ago
    Incentives matter on the average, but people are too unpredictable for categorical statements like that. They can always have other reasons beyond personal gain to leak secrets.
    There was no shortage of spies and defectors leaking American nuclear secrets to the USSR during the Cold War.
  - Retr0id 4 hours ago
    I wouldn't be surprised if they encrypt them at rest, but at some point the weights have to be loaded into vram.
- qsxfthnkp2322 5 hours ago
  What’s the point? Anthropic and other frontier vendors already provide their models on other services like vertex, bedrock, or openrouter
  It’s not like anyone can home lab one of these models without quite a bit of hardware
  [-]
  - mips_avatar 5 hours ago
    Yeah we can probably figure out how to run it on xiaomi gpus
- borissk 4 hours ago
  The employees are hoping to become very very rich after the IPO and after they are allowed to sell the shares given to them - risking a likely multi-million dollar payout to leak a model that will be superseded by publicly available models in a couple of years is not a likely decision.
TheJCDenton 4 hours ago
In its current state Fable 5 is also unusable for any reverse engineering work
Lich 4 hours ago
I just have this feeling that these guardrails are there not because it’s a super advanced world-ending AI. They are there to stop it from doing stupid shit.
thefounder 57 minutes ago
So the enshittification started. Shadow “bans” while still charging you the same service fee. I already got the stupid cyber warnings on non-cybersecurity tasks.
It happened in the middle of the project’s /goal, when Fable itself tried to probe qemu for a Debian ISO install, without any instruction from me to hack it or do anything nefarious.
At this point I can’t trust them with any kind of prompt. It will most likely degrade in stupid ways on non AI/ML stuff as well due to its own internal prompt construction (the qemu test showed me it does that on cyber stuff). So I guess I have to still use opus 4.8 (along with codex) and when the right time comes drop Anthropic in favor of the best model that is not gpt.
jiggawatts 5 hours ago
For the last month, I've been making dramatic improvements to the security of the custom code developed at one of my customers using... GPT 5.5 dialed up to "Extra High" thinking.
It only pushes back sometimes if you ask it to create a "repro" that can be used to verify the vulnerability in production. Often it'll oblige, especially if you warn it not to create anything that could be actually harmful.
If the frontier models get locked down so that they flat refuse to do this kind of work, but Chinese and (less capable) open models aren't, then a lot of large enterprise orgs will be left twisting in the wind.
“AI can in principle help both the ‘good guys’ and the ‘bad guys’,” -- Dario Amodei
No Dario, no it can't, you've blocked one of those scenarios.
JumpCrisscross 4 hours ago
Is the answer requiring licensing for certain use cases for AI? If you're asking questions that involve synthesising or modifying biologics, or anything that looks like cybersecurity research, you need to tie your real ID to the account?
[-]
- kube-system 3 hours ago
  That's not a bad idea. Customer-vetting and KYC is fairly normal for other high-risk/high-concern products.
Sol- 4 hours ago
At least Anthropic weren't lying when they said only a week ago or so "No one has figured out guardrails yet", because they apparently haven't either and Fable simply flat out rejects anything remotely connected to biology or security, no matter how trivial.
[-]
- zer00eyz 3 hours ago
  > At least Anthropic weren't lying when they said only a week ago or so "No one has figured out guardrails yet"
  Anthropic's guardrails are the TSA saying "take off your shoes" while failing every test. https://oversightdemocrats.house.gov/news/press-releases/new...
  Anthropic owns the TOS... "If we think you're involved in criminal activity we're turning all your history over to the FBI/CIA/NSA/Local police". Then, if their tooling was so good, they could offer the same agencies analysis tools to aid their experts in making some sort of decision.
  But their detection isn't that good, and their analysis isn't either... this is pure theater, to create buzz (no such thing as bad press) and make their tool look far better than it is.
  The reality is that they aren't even looking for the vectors that pose some of the largest risks in the modern era. And when someone uses it to do something terrible that they did not think of, they are going to look dumb.
byzantinegene 2 hours ago
if it doesn’t let you do anything, the assumption might be that it could do everything, more hype generated
rebelnz 5 hours ago
Just tried to audit my own code base locally and was 'switched' due to my own creds/auth code ...
6thbit 3 hours ago
Would it be a costly process for Anthropic to re-tune those guardrails? Like, re-training sort of cost? or like coding session sort of cost?
_def 6 hours ago
The bio angle is crazy to think about - imagine a health crisis triggered by LLM. What a time we live in.
[-]
- tiborsaas 3 hours ago
  What's the risk here? If someone is skilled enough to produce said risk, do they need input from these models?
- catigula 5 hours ago
  This is all so amazing and good. These are exciting times we’re living in. Can’t wait to see what the future holds.
  [-]
  - lelandfe 5 hours ago
    Which part got you the most amped - "health crisis?"
Lammy 5 hours ago
I really hate the term “guardrails” for these limitations, since the purpose of a guardrail is to protect me, but these limitations exist to protect Anthropic.
Goofy_Coyote 2 hours ago
It even refuses to read my resume, so... yeah
jazz9k 11 hours ago
DeepSeek is the only one that I can directly ask about vulnerabilities and it will give me a PoC. Although not as good as others, it has helped me with security research.
The rest have guard rails that are so heavy, it makes them almost useless for cybersecurity.
[-]
- rolph 11 hours ago
  they [anthro] took the risk of looking like a toy, rather than possibly assist an exploit.
- epolanski 4 hours ago
  Deepseek training is not finished yet, it's a preview.
  And yes, it's an excellent model.
luxuryballs 4 hours ago
I can’t help but think that gimping itself for “security” is a marketing ruse and it’s not actually as “dangerous” as they want people to think it is.
siva7 5 hours ago
Fable is utterly useless with those guardrails for any serious IT or life science work. Anthropic fucked me once a few months ago by closing down the subscription for any other harness, now it fucked me a second time: I bought a subscription again only to find out their hyped model is unusable for normies. Using their products feels like a constant battle instead of a productive work day.. compare that with openai, not once did i feel like fighting against codex. Never again Anthropic..
[-]
- epolanski 4 hours ago
  What do you mean that it closed your subscription for any other harness?
  In any case that's what closed source (weights) for the masses means.
andrewstuart 2 hours ago
Stupid security theater. The only thing that makes sense would be zero restrictions.
SXX 2 hours ago
Software engineers shouldn't be happy either. If the model silently sabotages cybersecurity research on others' software, there is absolutely no way to be sure it won't be sabotaging the cybersecurity of the AI slop code it generated yesterday.
This is a bad precedent and no one wants to pay X to generate code to then have to pay X*10 to figure out why their company just got hacked.
ChrisArchitect 1 hour ago
More discussion:
If Claude Fable stops helping you, you'll never know
https://news.ycombinator.com/item?id=48467896
and Related:
Claude Fable 5
https://news.ycombinator.com/item?id=48463808
aleksandrm 4 hours ago
It refuses to do any legitimate work that it thinks can remotely be related to "cybersecurity", it won't even read my Docker app logs to try and troubleshoot a problem. Absolute garbage!
jongjong 5 hours ago
It's frustrating as someone who has worked hard to produce succinct, secure software that I can't use it to prove my software's correctness but big companies with insecure code can use it to fix their tangled mess.
I already tested all earlier models against all my open source projects and they are yet to find a vulnerability so I'm keen to try out Mythos.
I've been waiting to be vindicated for years and finally we have a tool which can do it with high confidence but I don't have access.
Also, my code is minimal and highly succinct so it would prove correctness with even more confidence since each library/module and integration fully fits in the context window.
Like the Protobuf.js fiasco is just pure vindication for me because I was being looked down upon for choosing JSON as the interchange format. Turns out their software was insecure all this time... With a literal remote code execution vulnerability!
varispeed 3 hours ago
Surely if they are sabotaging the output, they shouldn't charge the same fee for tokens as if the output was not sabotaged?
This is looking like something for a regulator to look at and probably a class action lawsuit in the making.
I think people should be getting refunds. Including for shenanigans with Opus.
dcl 4 hours ago
Deliberately producing misaligned and deceitful AI systems now. Great.
teaearlgraycold 4 hours ago
I'm being careful with it, but I haven't had Fable reject requests to "harden" my code or "find issues" in auth-related modules, which you could use on someone else's code to find vulnerabilities.
notepad0x90 5 hours ago
i think Anthropic is playing too fast-and-loose with the whole "no publicity is bad publicity" schtick.
m3kw9 3 hours ago
Could it now start to add unnoticeable security holes into your system if you start writing security-type code?
felixgallo 5 hours ago
This is a clickbait article with a garbage title. From the actual article, the one quoted cybersecurity researcher is sane about it:
“But it is understandable as we are still in the early days and they are still adapting their guardrails. I am sure they are going to evolve over time as Anthropic and other frontier model companies will collaborate more with the current new generation of cybersecurity companies,” said Suiche, who is a member of the technical staff at Tolmo, an AI cybersecurity startup. “It’s better to catch more people than not enough when you do such a release and to relax the guardrails over time.”
[-]
- ofjcihen 5 hours ago
  I’m a cybersecurity researcher.
  Article seemed fine to me and echoes a lot of my and my colleagues' concerns.
  If you did regular malware analysis you would see that these groups already have access to LLMs that they’re using for development.
  What Anthropic is doing here is just hamstringing the good guys
  [-]
  - felixgallo 5 hours ago
    I'm a cybersecurity researcher! Can you explain how Anthropic is just hamstringing the good guys?
    [-]
    - ofjcihen 5 hours ago
      I did in my comment above.
      [-]
      - felixgallo 5 hours ago
        You said these groups have access to LLMs. So what? Mythos/Fable are a step change above most LLMs. Responsibly limiting access and easing it up over time safely is the sane move.
        [-]
        - varispeed 3 hours ago
          How does it help?
          [-]
          - esafak 2 hours ago
            By withholding it from bad actors.
rdiddly 5 hours ago
It's a marketplace. Someone else will outdo this inferior product.
[-]
- applfanboysbgon 4 hours ago
  That's exactly why Dario is begging the government to ban competitors.
  [-]
  - p-e-w 4 hours ago
    Unfortunately for him, his main competitors don’t fall under the jurisdiction of his government.
    [-]
    - esafak 2 hours ago
      Access and use of it does.
- autoexec 4 hours ago
  All they'll need is hundreds of billions of dollars, more RAM and GPUs than are currently available, and a huge number of environment destroying data centers. We're sure to be spoiled for choice!
- Fordec 4 hours ago
  The internet interprets censorship as damage and routes around it.
- enraged_camel 4 hours ago
  OpenAI is the only real competition. Chinese models are 6-8 months behind Opus 4.8/GPT 5.5, and at least a year or more behind Mythos.
  And it doesn't look like OpenAI will have a good answer to Mythos anytime soon. Based on what their chief scientist wrote to staff recently (https://archive.is/fN2pg), GPT 5.6 is a "meaningful improvement" over 5.5 - in other words, just a normal version bump. And no news or even rumors regarding GPT 6.
guardiangod 5 hours ago
I am using an LLM to build a security tool, and I ran into this a few times. I have to come up with reasoning to convince (?!!) Fable to continue the work without downgrading.
I assume Anthropic will continue to tune the model, so I am not too bothered by this.
[-]