It is actually worse than that. It is at least 30 days. There is an "almost" that is doing a ton of heavy lifting here "deletion after 30 days in almost all cases". My read of that is they can hang onto data for as long as they want, even if they usually won't. And "all traffic" with an agentic harness is basically your entire codebase you work on.
> We will require 30-day retention for all traffic on Mythos-class models, on both first- and third-party surfaces. We won’t use this data to train new Claude models, or for any non-safety-related purpose, and we’ve instituted new privacy protections including logging all human access to the data and ensuring its deletion after 30 days in almost all cases (see this post for further details). The data will help us defend against complex and novel attacks (including new jailbreaks and attacks that operate across many requests) as well as help us identify and reduce false positives.
That's strange. Even in my hobby-toy app, I have a TOS that I bump whenever the terms meaningfully change, and in my app, it forces a re-acceptance of the new terms before using the app again.
It’s even worse than that. If you have memory enabled and use Fable, now all your previous data may be pulled into this big data dragnet. How can Anthropic possibly think this is okay?
I cannot help wondering if the 'we won't train on your data' applies across the fence over there in pentagon land, where the classified contracts be. Yeah, of course they are not connected. Or..
Present user-llm activity is a goldmine of intel the agencies literally spent lives and billions on getting hardly close to, yet they elect to just let this one slip by..
Maybe. Really, I don't dispute it.
But why? It's what, or precisely what, they always dreamed of.
I don't know why you'd read literally the last 25 years of leaks from mass surveillance programs and think for one moment that they've just, gosh, overlooked the opportunities.
We've already gone through ECHELON, USAPATRIOT, TIA, PRISM, etc.. Either learn from the pattern and and plan accordingly, or be one of the credulous rubes caught off guard in the next wave of leaks.
One of the major attack vectors is distillation, where millions of questions are auto-generated and coordinated to produce training data for new LLMs. Anthropic alleges Minimax, Deepseek and Kimi were trained this way. Deepseek 4 compares favorably to Opus, so they're probably trying to prevent Deepseek 5 from being a bootleg Mythos. https://www.anthropic.com/news/detecting-and-preventing-dist...
It takes a lot of audacity to train on all the data you can without any license, attribution, etc and then act like you can own the outputs of the model so that someone else doesn't make a model from your data without a license. I've lost a lot of respect for Anthropic in the last 24 hours.
Everyone knows it's bullshit but because these companies are being valued at a trillion dollars a piece, it's hard to say that if you were in their shoes you'd do any differently.
This may surprise the cohort on hacker news but there are large amounts of people on this planet that value things beyond money like ethics or having principles. Excusing absolutely repugnant behavior because of money to be made is so deeply antihuman, but then again most people working at LLM companies are deeply antihuman to start with.
Distillation is not an "attack", despite Anthropic themselves coining the self-serving phrase "distillation attack". And as others have noted, it is precisely identical to the sort of "attack" on published works which Anthropic themselves used to train their models.
It's only for this model, not the one you're already using. And they're not training on the data. It's supposedly to detect abuse etc (such as someone retrying repeatedly with different variations to get around their protections)
Maybe, but to do so they'd need to offer new terms of service and we'd have to accept. I believe they'd lose a lot of their core business market if they did so.
Yes, that is your intended purpose of “git push”, it’s to save. And only if you use GitHub.
A better analogy here is probably “every time you use VS Code, the files you edit get sent to Microsoft”.
Some legitimate concerns:
• You have trade secrets. Previously; you can use services like Bedrock, etc, with signed contracts and significant reputations. Your contract is between AWS and you, and stays within your AWS security boundary.
• Security breaches. Remember when Anthropic accidentally published the source tree of Claude code? Or Meta’s recent AI recovery bot that didn’t check if the supplied recovery email was actually the email of the Instagram account? The best way to reduce your exposure is to minimise storage.
• Weaponised T&S. For example what if Anthropic decided to build a classifier for “usage in unsupported regions” that’s super overbearing (as we see with Fable) and vacuums up all context/input/output if there’s Mandarin? Contractually they could now retain it forever, not just 30 days, for ‘trust and safety purposes’ and perhaps have AI scan for any new or interesting ML techniques at scale, for Anthropic’s own use? They say just can’t train Claude models on the data.
A startup that uses agentic coding tools such as Claude Code or Codex is packaging up their entire codebase and sending it directly to their LM provider. Depending on their product, they might be sending it directly to a potential competitor.
people over-rate how much software/IP is useful in running a successful business. There are genuinely very few IP in this world that needs to be protected. Everyone else is running stupid CRUD apps
They also over index fear of LargeCo stealing IP from SmallCo. In fact, LargeCo is typically more scared about even the possibility of any product team looking at competitor internals due to lawsuits.
I’d be more scared of a data leak due to LargeCo being hacked than I would about LargeCo prying into the data.
What I don’t trust LargeCo with is personal information. I’ve heard too many horror stories about Govs and LargeCos swapping customer nudes or stalking ex’s to be comfortable with anything personal on those systems. But that’s a whole different topic.
actuaries look for data. visionaries take leaps in faith.
There was no data proving LLMs will work at scale.
Google waited for the Data. OpenAI and then Anthropic took the leap of faith.
The result is there for all to see.
The core attribute of a successful AI Researcher was were they AGI-pilled and not were they waiting for data for unknown unknowns?
Yes, it certainly is an odd situation when some people believe you cannot use Mythos-class models because security while others believe you must do code reviews with Mythos-class models because security.
they would kill their own product if they did this
it would be like if tsmc started designing their own chips to compete with the people they sell their services to, they have more to gain by limiting their participation to a specific corner
Fortunately I can't use Fable anyway, since their hyperactive content flaggers do not let you work on anything remotely biological or medical related (i.e. parse a CSV with some medical content, nope, you're probably a bioterrorist) and you get downgraded to Opus immediately.
I'm not even working on anything biological/medical, almost all PyTorch work is getting flagged (not even a safety notice and a downgrade, just an outright refusal with "this is against our ToS").
My 2 cents is that doctors people with lots of money and very specific needs who generally don't really go for tech jobs, so they're probably planning to create a separate monetization tier.
That, or alternatively, Mythos is so good at medical stuff, that it cam replace a lot of physician work 90% of the time, pissing off doctors, while the remaining 10% would result in very expensive lawsuits.
Third alternative: Mythos is so catastrophically bad at medical tasks that attempting to use it for medical research would instead create bioweapons. ;)
> That, or alternatively, Mythos is so good at medical stuff, that it cam replace a lot of physician work 90% of the time, pissing off doctors
Well they definitely don’t give a teaspoon of shit about putting people out of work by hawking munged-up versions of those people’s data, which was involuntarily ‘ingested’ for the benefit of society (in a way that happened to fuel a centabillion dollar industry.) So it’s prolly not that one.
Yes! I have hit the same brick wall. What sort of idiots are doing this? Honestly, I have no idea. And just before their IPO. SO far Anthropic marketing has been perfect and spotless. This is serious slipup.
>To release the model both safely and quickly, we’ve tuned these safeguards conservatively—they’ll sometimes catch harmless requests, though they trigger, on average, in less than 5% of sessions. With more capable models arriving in the coming months, we’re working to improve our safeguards and reduce false positives as quickly as we can.
They don't want the real risk of someone using it to make biological or genetically targeted weapons, and they don't want the social risk of someone asking it a bunch of leading questions in order to 'prove' some racist thesis or to 'prove' Mythos is woke if it declines to along with their performative inquiry.
Let's face it, if some rando comes up to and asks if you have a few minutes to talk about population biology there's a good chance they're a kook.
Are they really burning good will? For many users this is a deal breaker. But for the general public, politicians, etc they’re stamping “safety” on their brand.
They refused to allow autonomous weapons and domestic surveillance. They were fine with use in weapons with a human in-the-loop and with surveilling non-US nations.
« Trust us, we’re doing this for the good of humanity » (fills pockets with stock value and externalities from data center polloution) « No seriously trust us , at least we’re not Sam Altman »
Update: « Oh and we’re the only ones who will stop AI from turning into SkyNet and eating your babies, you just have to pay us to make sure we invent SkyNet first »
Anthropic is desperate for the IPO and will release a half-baked product that they are so afraid to release, you can literally feel the shiver through the text of their press-release.
Now they want to have any way of either fixing it, or in case someone will actually make a big boo-boo with their model, to be able to blame the guy in the end.
Given the model intelligence plateau and public data exhaustion the only way to improve in customer use cases is by training the model on customer data.
If this is true, than Anthropic, Google and maybe OpenAI models will keep getting better and better and everyone else will be left in the dust - as they won't have access to so much customer data.
If they weren’t storing, they’d be oblivious to what customers are doing, making this kind of detection impossible. What data did they train their classifier on, if not real user (distiller) traffic?
They basically said "Deepseek ran 150,000 requests and here's the gist of one of their prompts". Anthropic doesn't know which accounts are Deepseek proxies beforehand, so definitely sounds like retrospective analysis of broad user logs to me.
Of course Anthropic realizes saying this straight is problematic so they said they examined request metadata, but no, I don't think they can get this kind of insight from metadata (token counts, request time, etc.)
Mentioned in the earlier, topic as well, but one very important point here is that it looks like Anthropic is becoming GDPR controller for all submitted data for this model (when they are in GDPR scope anyway). So data subjects would have Article 15 right to request information about processing and possibly a copy of the data. Latter might be contested under "rights of others", but former is more absolute.
What this means it that if someone makes an Article 15 request, they would be entitled to know if Anthropic holds personal data about them and also from who they received this data at minimum.
If someone wants to do that, I would recommend combining it with Article 18 request to forbid deleting the data for legal claim in case you contest Anthropic's reply. Otherwise they could just delete the data per their retention policy and DPA would find much later that they no longer hold the data.
Another issue here is that their DPA frames everything as controller-to-processor, i.e. they do not appear to have SCCs in place to actually receive this personal data as controller. So the original exporter would likely also be in breach if they send any GDPR covered personal data to this model.
Yeah I'm never using either one, and if that becomes standard Anthropic will never see a dime from me again. I'm going to draw the line in the sand right there.
I guess the better question would be if you are under and NDA and using an online model, are you already violating it but does this violate it further?
Google Workspaces and Dropbox have an IL5-compliant offering, which means they attest that they will not do exactly this (and are audited on that). Not sure about iCloud and Notion.
Your NDAs prohibit emailing a colleague about the e.g. project, or discussing it in a Slack DM with the client, or tracking progress on it in JIRA? You have to do NDA’d work exclusively with local tools or end-to-end encryption? Those are some difficult NDAs!
Oh Lord yes. We have very specific communications channels we're allowed to use about any of our sensitive products, and that's only the unclassified stuff (classified is obviously its own, stricter, beast).
We use inhouse on-premises email, issue tracking, and messaging. Depending on the project, external communication does require E2EE email. Development happens on local hardware and software unless required otherwise by the customer.
I’m pretty sure (even just based on the revenue of various SaaS products) that’s not typical, hence “most NDAs”. I’m also sure some require a SCIF, but that’s not most of them.
No this is still the level below needing a SCIF. The USG really tightened this stuff up in the 2010s and highly restricts what you can do with CUI. That's why there's a whole parallel FedRamp-compliant cloud ecosystem.
But in terms of how common it is, pretty much everybody in Fairfax County works in a company with rules like this; it's a big part of why the tech culture is so different than Austin or SFO.
This could be a big issue for firms with strict GDPR criteria:
"This change only applies to organizations that have set up workspaces with zero data retention (ZDR) in Claude Console, use Claude Code with ZDR in Claude Enterprise, or access Claude through AWS Bedrock, Google Cloud Agent Platform, or Microsoft Foundry with ZDR. The rest of this article applies only to these organizations."
There were two (expensive) exceptions / alternatives so far: Bedrock and Vertex. Their Zero Data Retention was in fact contractually enforced. Now it is all f...d because of these morons at Anthropic. For now I am better off just using DS via their API.
This is just a tragic moment for Tech. We just killed AI privacy. OpenAI already follows this trend and others will do too.
It’s not binary. With AWS previously you have contractual guarantees with a third party, that’s been in business for a couple decades, which explicitly state zero seconds of data retention - only as long as needed for inference.
Consider the security angle too. You now have to rely on Anthropic’s infrastructure security. You did not previously when you used Bedrock/Vertex/etc.
I am definitely for services respecting customer privacy, but I can't help if this is different. I recently saw a thread where a person was bragging that frontier providers were blocking their attempt at what looked like to be social media de-anonymization and blackmailing app.
Maybe this isn't different than using something like Google Sheets to keep a list of people to dox and blackmail, but the leverage certainly makes it feel different.
I mean not just the part 30 days data retention but I think the serious trade of this product is just the token efficiency. They trade it for precision. The claims that they make that it found a 30 year software bug from millions of lines of code is just precision. To human it's looks like a lot but for it it's just the ablity to process (token processing). Let's see how long it runs. Peace.
Reminder: FISA Section 702, aka FAA702, aka PRISM, aka the #1 most used collection source by the US IC, allows *warrantless* realtime access for the US federal government to everything Anthropic, OpenAI, Google, Apple, Microsoft, Amazon, and Meta have on you.
I'm talking about scouring Twitter/LinkedIn and look at posts from employees who say SOTA model is banned. Look at what the business do. Copy it using SOTA. Call their clients with 30% discount and faster turnaround and higher quality product.
It is complicated, but I can get Private Equity of even VCs to fund this idea.
tl;dr -- I'm actually agreeing with you. Anthropic will never copy your business model due to NDA. But there are plenty of fearmongering about they copying you and because of which you won't use their models. If their models are genuinely SOTA you can use that information to your advantage and crush scaredy-cats.
Edit: The fact that these get downvoted is exactly the reason why it's easy to win
All I can say to my team (and my clients): "f...k Anthropic". They've just put both Bedrock and Vertex on slippery slope of "we don't collect your prompts. period. ... comma ... except ..."
Right now we have changed the code of all our agents to data retention mode 'none' (Note: not "default" or "inherited", this is not enough now!) and we are fighting with GCP doco to set similar things for Vertex.
All he pre-publicity from Anthropic was about how it was amazing at finding security vulnerabilities, so it's not a stretch to think that some people would want to exploit that for nefarious purposes.
> We will require 30-day retention for all traffic on Mythos-class models, on both first- and third-party surfaces. We won’t use this data to train new Claude models, or for any non-safety-related purpose, and we’ve instituted new privacy protections including logging all human access to the data and ensuring its deletion after 30 days in almost all cases (see this post for further details). The data will help us defend against complex and novel attacks (including new jailbreaks and attacks that operate across many requests) as well as help us identify and reduce false positives.
These terms seem to be updated at-will, so I'll take that with a grain of salt however.
Present user-llm activity is a goldmine of intel the agencies literally spent lives and billions on getting hardly close to, yet they elect to just let this one slip by..
Maybe. Really, I don't dispute it.
But why? It's what, or precisely what, they always dreamed of.
When they literally just showed you they are being deceptive by sneaking in the weasel word “almost”?
Secondly, like all contracts I'm sure there will be exceptions for holding data longer than 30 days with reasonable cause, eg a legal hold.
How would you know that? You can only know what they say they will do with the data.
A better analogy here is probably “every time you use VS Code, the files you edit get sent to Microsoft”.
Some legitimate concerns:
• You have trade secrets. Previously; you can use services like Bedrock, etc, with signed contracts and significant reputations. Your contract is between AWS and you, and stays within your AWS security boundary.
• Security breaches. Remember when Anthropic accidentally published the source tree of Claude code? Or Meta’s recent AI recovery bot that didn’t check if the supplied recovery email was actually the email of the Instagram account? The best way to reduce your exposure is to minimise storage.
• Weaponised T&S. For example what if Anthropic decided to build a classifier for “usage in unsupported regions” that’s super overbearing (as we see with Fable) and vacuums up all context/input/output if there’s Mandarin? Contractually they could now retain it forever, not just 30 days, for ‘trust and safety purposes’ and perhaps have AI scan for any new or interesting ML techniques at scale, for Anthropic’s own use? They say just can’t train Claude models on the data.
Odd times we are living in!
They also over index fear of LargeCo stealing IP from SmallCo. In fact, LargeCo is typically more scared about even the possibility of any product team looking at competitor internals due to lawsuits.
What I don’t trust LargeCo with is personal information. I’ve heard too many horror stories about Govs and LargeCos swapping customer nudes or stalking ex’s to be comfortable with anything personal on those systems. But that’s a whole different topic.
I bet if you gave them the Codebase of the Gods, it’d be a heap of hacks inside a couple months.
Indeed, by a couple trillions...
That seems to be a bold statement considering the whole business of this LargeCo is based on stolen IP.
Oh, what a whimsical aphorism.
Literally how LLMs will continue to learn to code and easily replace whatever you build with them.
Incredible that you could so blithely misunderstand this
Your email domain is significantly more important than whatever is in your corporate GitHub repositories.
it would be like if tsmc started designing their own chips to compete with the people they sell their services to, they have more to gain by limiting their participation to a specific corner
That, or alternatively, Mythos is so good at medical stuff, that it cam replace a lot of physician work 90% of the time, pissing off doctors, while the remaining 10% would result in very expensive lawsuits.
Well they definitely don’t give a teaspoon of shit about putting people out of work by hawking munged-up versions of those people’s data, which was involuntarily ‘ingested’ for the benefit of society (in a way that happened to fuel a centabillion dollar industry.) So it’s prolly not that one.
>To release the model both safely and quickly, we’ve tuned these safeguards conservatively—they’ll sometimes catch harmless requests, though they trigger, on average, in less than 5% of sessions. With more capable models arriving in the coming months, we’re working to improve our safeguards and reduce false positives as quickly as we can.
Let's face it, if some rando comes up to and asks if you have a few minutes to talk about population biology there's a good chance they're a kook.
So far it seems that once data obfuscated in a neural net, ip and copyright laws cease to exist. Unlike MP3, MP4, PDF.
AWS Bedrock to require sharing data with Anthropic for Mythos and future models - https://news.ycombinator.com/item?id=48473166 - June 2026 (223 comments)
Update: « Oh and we’re the only ones who will stop AI from turning into SkyNet and eating your babies, you just have to pay us to make sure we invent SkyNet first »
Now they want to have any way of either fixing it, or in case someone will actually make a big boo-boo with their model, to be able to blame the guy in the end.
If they weren’t storing, they’d be oblivious to what customers are doing, making this kind of detection impossible. What data did they train their classifier on, if not real user (distiller) traffic?
Of course Anthropic realizes saying this straight is problematic so they said they examined request metadata, but no, I don't think they can get this kind of insight from metadata (token counts, request time, etc.)
What this means it that if someone makes an Article 15 request, they would be entitled to know if Anthropic holds personal data about them and also from who they received this data at minimum.
If someone wants to do that, I would recommend combining it with Article 18 request to forbid deleting the data for legal claim in case you contest Anthropic's reply. Otherwise they could just delete the data per their retention policy and DPA would find much later that they no longer hold the data.
Another issue here is that their DPA frames everything as controller-to-processor, i.e. they do not appear to have SCCs in place to actually receive this personal data as controller. So the original exporter would likely also be in breach if they send any GDPR covered personal data to this model.
I guess the better question would be if you are under and NDA and using an online model, are you already violating it but does this violate it further?
But in terms of how common it is, pretty much everybody in Fairfax County works in a company with rules like this; it's a big part of why the tech culture is so different than Austin or SFO.
This is just a tragic moment for Tech. We just killed AI privacy. OpenAI already follows this trend and others will do too.
The only hope now is ... tada .. Mistral LOL
Consider the security angle too. You now have to rely on Anthropic’s infrastructure security. You did not previously when you used Bedrock/Vertex/etc.
Maybe this isn't different than using something like Google Sheets to keep a list of people to dox and blackmail, but the leverage certainly makes it feel different.
Step 2: Use SOTA models to copy them and crush them
Step 3: Profit.
(Yes, not every business is easily replicable, but you sure can find some)
I'm talking about scouring Twitter/LinkedIn and look at posts from employees who say SOTA model is banned. Look at what the business do. Copy it using SOTA. Call their clients with 30% discount and faster turnaround and higher quality product.
It is complicated, but I can get Private Equity of even VCs to fund this idea.
tl;dr -- I'm actually agreeing with you. Anthropic will never copy your business model due to NDA. But there are plenty of fearmongering about they copying you and because of which you won't use their models. If their models are genuinely SOTA you can use that information to your advantage and crush scaredy-cats.
Edit: The fact that these get downvoted is exactly the reason why it's easy to win
Right now we have changed the code of all our agents to data retention mode 'none' (Note: not "default" or "inherited", this is not enough now!) and we are fighting with GCP doco to set similar things for Vertex.
This is just terrible.
Would you elaborate? Not sure what you're describing