I am a physics professor and often use Gemini to check my papers. It is a formidable tool: it was able to find a clerical error (a missing imaginary unit in a complex mathematical expression) that I had been unable to find for days, and it often highlights connections between concepts and ideas that I had overlooked.
However, it often makes conceptual errors that I can spot only because I have good knowledge of the topic under discussion. For instance, in 3D Clifford algebras it repeatedly confuses exponentials of bivectors with exponentials of pseudoscalars.
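To make the distinction concrete, here is a minimal sketch of the standard identities in Cl(3,0), with \theta a real angle, B a unit bivector, and I = e_1 e_2 e_3 the pseudoscalar:

    B^2 = -1  =>  e^{\theta B} = \cos\theta + B\sin\theta   (a rotor: x \mapsto e^{-\theta B/2} x e^{\theta B/2} rotates x)
    I^2 = -1  =>  e^{\theta I} = \cos\theta + I\sin\theta   (but I is central, so e^{-\theta I/2} x e^{\theta I/2} = x)

The two exponentials look formally identical because B and I both square to -1, which is presumably why the model conflates them; only the bivector exponential actually generates a rotation.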
Good to know that ChatGPT 5.5 Pro can produce a publishable paper, but from what I have seen so far with Gemini, it seems to me that it is better to consider LLMs as very efficient students who can read papers and books in no time but still need a lot of mentoring.
I'm no physics professor but this aligns with the way I use the tools in my "senior engineer" space. I bring the fundamentals to sanity-check the trigger-happy agent and try to imbue other humans with those fundamentals so they can move towards doing the same. It feels like the only way this whole thing will work (besides eventually moving to local models that do less but companies can afford).
This is close to my experience with code. LLMs can pick out small mistakes from giant code changes with surprising accuracy, or slowly narrow down a weird bug. On the other hand, I've seen them bravely soldier on under completely incorrect conceptual models of what they're working with and consequently churn around in circles, spin up giant piles of slop to re-implement something they decided was necessary but didn't bother to search for, or outright dismiss important error signals as just 'transient failures'. Unlimited stamina, low wisdom.
I've been watching the automation of things like flight control systems for the past decade, and the evolution of the fallback to a real pilot in the event of an emergency is what's most concerning about where LLMs are being embedded.
Right now, we have a lot of smart people who have trained for decades to understand where these things go wrong and how to nudge them back, but that pool of people is slowly going to be replaced by less knowledgeable ones.
At some point, a Rubicon will be crossed where these systems can't fall back to a human operator and will fail spectacularly.
As a TCS assistant professor from Eastern Europe, I am always a little jealous of the biggest names in math having such easy access to the expensive, long-thinking models.
Paying for Pro from any of my current academic budgets is completely out of the question here -- all budgets tend to have restricted uses, and software payments fit into very few categories. Effectively, I'd have to apply for a brand-new grant, hope the grant rules allow large software payments, and hope I don't encounter an anti-AI reviewer; such a thing would take at least a year.
As a final nail in the coffin, I was "denied" all Claude Opus access recently as part of Microsoft's clampdown on individual (and academic) use of Copilot.
(ChatGPT 5.5 Plus does not seem sufficient for any deeper investigation into new research topics; I've tried.)
Apologies for the rant.
You know what, I'm ashamed that I didn't think of this. I'll sponsor three months. Email in my hn profile. I don't understand the math in the article, but I'd love to help you make progress in it.
https://pastebin.com/hNYrCjhL
I probably will erase the contents in a few days.
Even if you just drop an email and it doesn't work out, I appreciate this gesture so much. Thank you.
It’s a classic example of the best-positioned people being in the best position to keep reaping all the rewards.
There’s the classic example of a poor person and a rich person buying boots. The poor person’s cheap boots wear out and have to be replaced again and again, while the rich person’s boots last for many years thanks to higher-quality craftsmanship. Over the years, the poor person ends up paying more for boots than the rich person does.
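A quick sketch of the arithmetic, with made-up prices and lifetimes purely for illustration:

    # Hypothetical numbers, just to illustrate the boots parable.
    cheap_price, cheap_life_years = 15, 1   # cheap pair, replaced every year
    good_price, good_life_years = 80, 10    # well-made pair, lasts a decade

    horizon = 10  # years
    cheap_total = cheap_price * (horizon // cheap_life_years)  # 10 pairs -> 150
    good_total = good_price * (horizon // good_life_years)     # 1 pair   -> 80
    print(cheap_total, good_total)  # 150 80

Under these invented numbers the poor buyer spends nearly double over the decade, which is the whole point of the parable.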
I know the example, but as a counter-argument: often more expensive boots are not more durable. It’s about spending time to learn to spot the quality.
Of course if you are really poor, then you have to take expensive shortcuts, but for most people that shouldn’t be the case. Learning to do more with less money isn’t as bad as many people think. It’s also good for the brain to be a bit more creative.
I fully understand your rant! I pay ~20€/month for the Pro account, as my university has a deal with Microsoft and only seems to recognize Copilot, so it’s very hard to use one’s own funding to pay for anything else.
The point is not that professors are poor. The point is that if this is used for research, it normally needs to come from an academic budget, not personal money.
And $200/month may look small from a U.S. perspective, but I looked up some average figures for Eastern European assistant professors. In Poland, for example, assistant professor base pay is around 73% of a professor’s base salary, roughly PLN 6,840/month gross, or about €1,500–1,600/month gross. At that level, a $200/month subscription can be around 10–15% of personal monthly income after accounting for taxes and local conditions.
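A rough back-of-the-envelope check of that claim (the effective tax rate and exchange rate below are my assumptions, not looked-up values):

    # Sanity-checking the 10-15% claim with assumed tax and exchange rates.
    gross_eur = 1550          # midpoint of the EUR 1,500-1,600/month gross figure above
    effective_tax = 0.28      # assumed effective rate for taxes and contributions
    net_eur = gross_eur * (1 - effective_tax)    # ~ EUR 1,116 net
    subscription_eur = 185    # ~ USD 200 at an assumed exchange rate
    print(round(subscription_eur / net_eur, 3))  # 0.166, i.e. roughly 17% of net pay

Under these assumptions the subscription lands at the top of, or slightly above, that 10–15% range.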
I also work as a freelancer and sometimes work with professors. In my experience, academic budgets are often much tighter than people expect.
And the situation is better: ten years ago it would have been 80%.
The average European salary is around $4,000/month; in Eastern Europe it is half of that. The median is probably lower still. It makes me want to quit visiting places like Reddit, where everybody claims to be making $100k+/year.
All salary discussions need cost-of-living context. Yes, in Europe you earn a bit less, but the public services are much better than in the US, and one emergency (e.g. healthcare) won't ruin you, as it's mostly a public system.
I'll take a Euro salary and quality of life over a FIRE-type salary and a daily fear of falling into the abyss any day.
Given the topic, and the fact that LLM providers charge global rates, the absolute take-home money is much more relevant. Even if you live like a king on $1,000/month, 5.5 Pro is still $200.
Their loss if they don't move to regional pricing. AI will continue to remain an upper-management luxury then, and won't reach the mass adoption required to justify their outsized valuations.
Regional pricing makes sense for products that don’t have ongoing costs, or where most of the input cost can be offset by local labor. You’re not buying server racks or electricity at 1/3 of the price to serve poorer markets.
That’s what most people spend on their phone and Internet connections per month in the US. That’s what the average American family spends on just five days of food.
People spend much more than that on just commuting to work. If you can spend $200 a month to supercharge what you do at work and 1000x your productivity, it’s a no-brainer.
From what money? Just pause the health insurance for a while? Stop paying the rent? No diapers for the kid?
Your entire story only makes sense if you have many hundreds of dollars or euros of entirely disposable income left every month, after all unavoidable expenses (and maybe a cinema visit) have been paid for. I understand that this holds for you and everyone you know, but I’d like you to appreciate that for very many people it doesn’t.
37% of Americans would be unable to cover an unexpected $400 expense without using one or more credit cards. 13% would flat-out be unable to cover it. [1]
Are you honestly saying most families would be able to justify $200 a month for ChatGPT?
[1]: https://www.federalreserve.gov/publications/2025-economic-we...
There is a significant gap between what academics are paid across European countries, and since most top universities here are public institutions, you are right -- Eastern European government employees tend to be on the poorer side.
There are several other philosophical arguments against what you propose but I do not wish to go down that route.
Bruh, $200/month for most people in the US is also a hard "no!". That's a lot of money. Plus, Anthropic isn't doing good deals with orgs that spend less than $250k a month. It's ridiculous.
It's a very long post with a mix of technical (math) and philosophical sections. Here are the most striking points to reflect upon IMHO.
> It seems to me that training beginning PhD students to do research [...] has just got harder, since one obvious way to help somebody get started is to give them a problem that looks as though it might be a relatively gentle one. If LLMs are at the point where they can solve “gentle problems”, then that is no longer an option. The lower bound for contributing to mathematics will now be to prove something that LLMs can’t prove, rather than simply to prove something that nobody has proved up to now and that at least somebody finds interesting.
Training must start from the basics, though. Of course, everybody's training in math starts with summing small integers, which calculators have been doing flawlessly for a long time.
The point is perhaps confirmed by another comment further down in the post:
> by solving hard problems you get an insight into the problem-solving process itself, at least in your area of expertise, in a way that you simply don’t if all you do is read other people’s solutions. One consequence of this is that people who have themselves solved difficult problems are likely to be significantly better at solving problems with the help of AI, just as very good coders are better at vibe coding than not such good coders
People pay coders to build stuff that they will use to make money, and I can happily use an AI to deliver faster and keep being hired. I'm not sure there is a similar point with math. Again from the post:
> suppose that a mathematician solved a major problem by having a long exchange with an LLM in which the mathematician played a useful guiding role but the LLM did all the technical work and had the main ideas. Would we regard that as a major achievement of the mathematician? I don’t think we would.
> So if your aim in doing mathematics is to achieve some kind of immortality, so to speak, then you should understand that that won’t necessarily be possible for much longer — not just for you, but for anybody.
This made me a little sad.
https://www.youtube.com/watch?v=VVEzgYxDdrc
We care about sports with humans.
I don't know that it's that disappointing. I doubt most of the great mathematicians were actually doing it to achieve immortality. I suspect most of them were either after (possibly indirect) practical applications (via the math -> physics -> engineering pipeline) or just "for the love of the game", appreciation of the beauty of math and the intellectual joy of doing it. AI might also take over the practical application side, but the other aspects are still there for the taking.
Sports are safe. Machines surpassed runners long ago (MotoGP, Formula 1), and yet we cheer the winners of the 100 m at the Olympic Games. Fully autonomous bikes and cars won't change that. AIs destroy chess players. We still cheer the world champion.
Robot MotoGP would be amazing, though, just to see how far the limits could be pushed without risking a human life. Or even full-size remote control.
After reading this post, I have to admit that I could not understand the mathematical parts at all because they are beyond my current knowledge.
But one thing seems clear to me. If I try to describe the situation in mathematics presented here, it sounds like there were already precedents or existing pieces of knowledge, but humans had not thought to connect them. AI seems to have helped make that connection.
If AI can connect different fields in this way, then perhaps something even more significant could emerge from it.
That said, I could not understand most of the article. And if using LLMs properly requires this level of background knowledge, I honestly worry about whether I can really use them well.
Basically medical science too. My wife was able to diagnose her own anemia, which the doctors kept missing, and has since been able to get iron infusions.
The human doctors kept ignoring the signals and kept putting it down to 'diet' and 'exercise' (even though she does plenty of both).
A lot of math research is like that. And, like the blog post suggests, problems one gives PhD students are 95% like that.
Most of what I do is just assemble things that other people have already built.
We used to call that "low hanging fruit."
I saw Tim Gowers give a talk at the AMS-MAA joint meeting in Seattle about ten years ago where he predicted that in 100 years humans would no longer be doing research mathematics. I wonder if he’s adjusted his timeline.
At the time I thought the key missing tool was a natural language search that acted like mathoverflow, where you could explain your problem or ideas as you understood them and get references to relevant literature (possibly outside your experience or vocabulary).
I feel like this experiment was successful because those prompting the AI were knowledgeable enough to ask the right questions and verify the output was correct. This shows that there is still a place for expertise, even if the LLM does the actual research.
I feel my input to LLMs is most valuable in the initial idea and big-picture design tweaks, and the vast majority of my usefulness is negative feedback: this looks wrong, you've gotten off track, you're cheating with workarounds, you're falling into a rabbit hole, etc.
> Here’s a thought experiment: suppose that a mathematician solved a major problem by having a long exchange with an LLM in which the mathematician played a useful guiding role but the LLM did all the technical work and had the main ideas. Would we regard that as a major achievement of the mathematician? I don’t think we would.
This is a cultural choice. It makes sense that in the mathematics culture we currently have, this is alien. But already, other fields, and many individuals, would disagree and say that the human did have a major achievement here.
As long as human-AI collaborations are producing the best results, there is meaningful contribution by the humans, and people that are deeper experts and skilled LLM whisperers should be able to make outsized contributions. The real shoe drops when pure AI beats humans and human-AI collaboration.
I replied to a comment about AI in sports, and I'll build on that here.
We praise racing drivers even though most of the performance in their sport comes from the car. The driver makes the difference when two cars are close in performance, through brilliance or mistakes. Horse riders too.
In the case of math, the human can lead the LLM onto the right track and point it to one problem or another. So the human deserves some praise.
Then again, the team that built the car, cared for the horse, or built the AI might deserve even more praise, but we tend to care more about the single most visible human.
As a graduate student, this piece made me sad. I always believed that my work speaks for itself and transcends my limited time in this cosmic experience. This notion of immortality was just a small intangible bonus I hoped for when I jumped into grad school. AI is making me feel less worthy.
You are worthy. You will hone your skills in grad school and be able to command these AIs better than somebody who hasn’t struggled with hard problems for a long time.
Sorry, I'm reposting a comment I made yesterday that seems fitting:
> This reminds me of Antirez's "Don't fall into the anti-AI hype". In a sentence: These foundation models are really good at optimizing these extremely high level, extremely well defined problem spaces (ie multiply matrices faster). In Antirez's case, it's "make Redis faster".
On complex problems with lengthy proofs, the first thing I would have done is ask 5.5 Pro, in a new, unrelated session, to be very critical and to try to find flaws in the arguments.
And certainly not send it to a colleague for their opinion first.
LLMs are certainly becoming capable of coding, finding vulnerabilities, and solving mathematical problems, but we need to avoid putting their work into production, or in front of other humans, without assessing it by every possible means.
Otherwise tech leads, maintainers, and experts get overwhelmed, and that is how "AI slop" fatigue begins.
To be clear I’m talking about this step:
> That preprint would have been hard for me to read, as that would have meant carefully reading Rajagopal’s paper first, but I sent it to Nathanson, who forwarded it to Rajagopal, who said he thought it looked correct.
> but we need to avoid putting their work into production, or in front of other humans, without assessing it by every possible means.
I think this is good advice in general, maybe with an emphasis on public vs. private, friendly contact. Having zero-thought AI slop thrown at you out of the blue is rude. "Could have been a prompt," indeed. But having a friend or colleague ask for a quick glance at something they know you handle well is another story for me.
If I've worked on a subject for a few years, and know the particulars in and out, I'd have no trouble skimming something that a friend or a colleague sent me. I am sparing those 5-10 minutes for the friend, not for what they sent. And for an expert in a particular domain, often 5 minutes is all it takes for a "lgtm" or "lol no".
I honestly can't say this isn't AGI anymore.
AGI shouldn't be a bar set so high that it demands extreme capability in every domain. What human has that?
This is as AGI as it needs to be to get my vote. And it's scary.
Does the author know about CAISc 2026 [0]?
[0]: https://caisc2026.github.io
I wish people would stop generating stuff they don't understand only to forward it to someone who does. Something about that really rubs me the wrong way.
https://github.com/vjeranc/fixed-rtrt
The M3 module was formalized purely from experimental data, with a nudge from earlier versions of Codex, in 15-30 minutes in a simple write/compile/fix-first-error loop. I was a bit surprised by how quickly it picked up the pattern, but given that there was a paper from the '70s, it became clear why later.
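The loop itself is trivial to set up. A minimal sketch, where the `ask_model` helper and the `make build` command are placeholders for whatever model and build system you actually use, not a real API:

    import subprocess

    def ask_model(prompt: str) -> str:
        """Placeholder: call whatever model you use (a CLI tool, an API, ...)."""
        raise NotImplementedError

    def write_compile_fix(path: str, spec: str, max_iters: int = 30) -> bool:
        """Write code, try to build it, and feed only the first error back to the model."""
        code = ask_model(f"Write a first attempt at: {spec}")
        for _ in range(max_iters):
            with open(path, "w") as f:
                f.write(code)
            # Swap in your real build command (compiler, proof checker, etc.).
            result = subprocess.run(["make", "build"], capture_output=True, text=True)
            if result.returncode == 0:
                return True  # it builds: done
            lines = (result.stdout + result.stderr).splitlines()
            first_error = next((l for l in lines if "error" in l.lower()), "unknown error")
            code = ask_model(f"Fix this first error:\n{first_error}\n\nCurrent code:\n{code}")
        return False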
This is certainly interesting, though I would say that, based on my understanding of how the current models work, combinatorial problems would be an area where they could be particularly successful. They are pretty good at combinatorial creativity; it's the exploratory and transformational aspects that are still pretty tricky, and I expect those would come to bear in other areas of mathematics.
Undergraduate? No. We've had calculators able to solve undergraduate problems for decades. AI doesn't change the need to understand how calculus works any more than calculators did. The foundations remain valuable.
90% of the final grade should come from in-room examinations with proctors: maybe two sets of exams, midterms and finals, that the vast majority of the final grade comes from. This is already how most of East and South Asia does it anyway, and it's probably the best approach.
Graduate? Yes.
For publications and theses, as long as the final results hold and can be replicated and validated, I don’t see why we shouldn’t allow the wholesale use of LLMs.
I don’t think it’s just mathematics. We don’t hear enough about this, but if I think back to my undergraduate years, which were less than 10 years ago, every homework assignment and every take-home exam I had would be trivial for LLMs to solve at this point. I wonder what is actually happening on the ground.
Maybe if you find AI to be doing stuff you find impressive, the stuff you were doing wasn't that impressive? Worth ruminating on your priors at least.