Validark 2 hours ago [-]
I have long said I would remain an AI doubter until AI could produce answers to hard problems, or to problems requiring real innovation. Assuming this is verified to be correct (not by AI), then I just became a believer. I would like to see a few more AI inventions to know for sure, but wow, it really is a new and exciting world. I really hope we use this intelligence resource to make the world better.
snemvalts 1 hours ago [-]
Math and coding competition problems are easier to train on because of strict rules and cheap verification.
But once you go beyond that to less well-defined things such as code quality, where even humans have a hard time putting down concrete axioms, models start to hallucinate more and become less useful.
We are missing the value function that allowed AlphaGo to go from mid range player trained on human moves to superhuman by playing itself.
As we have only made progress on unsupervised learning, and RL is constrained as above, I don't see this getting better.
zozbot234 53 minutes ago [-]
This is not formally verified math, so there is no real verifiable-feedback aspect here. The best models for formalized math are still specialized ones, although general-purpose models can assist with formalization somewhat.
jack_pp 42 minutes ago [-]
Maybe to get a real breakthrough we have to make programming languages and tools better suited to LLM strengths, not fuss so much about making them write code we like. What we need is correct code, not nice-looking code.
kube-system 7 minutes ago [-]
If you can’t validate the code, you can’t tell if it’s correct.
eru 41 minutes ago [-]
Lean might be a step in that direction.
kuerbel 20 minutes ago [-]
Yes yes
Let it write a black box no human understands. Give the means of production away.
otabdeveloper4 49 minutes ago [-]
LLMs can often guess the final answer, but the intermediate proof steps are always total bunk.
When doing math you only ever care about the proof, not the answer itself.
jamesfinlayson 24 minutes ago [-]
Yep, I remember a friend saying they did a maths course at university that gave the correct answer for each question. This was so that if you made some silly arithmetic mistake you could go back and fix it; all the marks were for the steps to actually solve the problem.
eru 40 minutes ago [-]
Once you have a working proof, no matter how bad, you can work towards making it nicer. It's like refactoring in programming.
If your proof is machine checkable, that's even easier.
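For a concrete sense of what "machine checkable" means, here is a toy Lean 4 proof (a trivial theorem, nothing to do with the problem in the article):

```lean
-- A toy machine-checked proof in Lean 4. The checker verifies every
-- step, so you can freely refactor the proof text: if it still
-- compiles, it is still correct.
theorem toy (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

That compile-means-correct property is what makes "refactoring" a proof safe in a way that refactoring informal prose proofs isn't.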
storus 1 hours ago [-]
AI is a remixer; it remixes all known ideas together. It won't come up with new ideas though; the LLMs just predict the most likely next token based on the context. That means the group of characters it outputs must have been quite common in the past. It won't add a new group of characters it has never seen before on its own.
qnleigh 1 hours ago [-]
But human researchers are also remixers. Copying something I commented below:
> Speaking as a researcher, the line between new ideas and existing knowledge is very blurry and maybe doesn't even exist. The vast majority of research papers get new results by combining existing ideas in novel ways. This process can lead to genuinely new ideas, because the results of a good project teach you unexpected things.
blackcatsec 37 minutes ago [-]
This is a way too simplistic model of what humans provide to the process: imagination, hypothesis, testing, intuition, and proof.
An AI can probably do an "okay" job at summarizing information for meta-studies. But what it can't do is go "Hey that's a weird thing in the result that hints at some other vector for this thing we should look at." Especially if that "thing" has never been analyzed before and there's no LLM training data on it.
LLMs will NEVER be able to do that, because it doesn't exist. They're not going to discover and define a new chemical, or a new species of animal. They're not going to be able to describe and analyze a new way of folding proteins and what implications that has unless you are constantly training the AI on random protein folds.
Finbel 15 minutes ago [-]
>Hey that's a weird thing in the result that hints at some other vector for this thing we should look at
Kinda funny, because that looked _very_ close to what my Opus 4.6 said yesterday when it was debugging compile errors for me. It did proceed to explore the other vector.
bluegatty 11 minutes ago [-]
> "Hey that's a weird thing in the result that hints at some other vector for this thing we should look at."
This is very common already in AI.
Just look at the internal reasoning of any high thinking model, the trace is full of those chains of thought.
Dban1 17 minutes ago [-]
But just like how there were never any clips of Will Smith eating spaghetti before AI, AI is able to synthesize different existing data into something in between. It might not be able to expand the circle of knowledge, but it definitely can fill in the gaps within the circle itself.
konart 7 minutes ago [-]
>But human researchers are also remixers.
Some human researchers are also remixers, to some degree.
Can you imagine AI coming up with the refraction and separation of light like Newton did?
maxrmk 1 hours ago [-]
I don't think this is a correct explanation of how things work these days. RL has really changed things.
energy123 1 hours ago [-]
Models based on RL are still just remixers as defined above, but their distribution can cover things that are unknown to humans, because those things were present in the synthetic training data but not in the corpus of human awareness. AlphaGo's move 37 is an example. It appears creative and new to outside observers, and it is creative and new, but not because the model is figuring out something new on the spot; it's because similar new things appeared in the synthetic training data used to train the model, and the model is summoning those patterns at inference time.
trick-or-treat 38 minutes ago [-]
> the model is summoning those patterns at inference time.
You can make that claim about anything: "The human isn't being creative when they write a novel, they're just summoning patterns at typing time".
AlphaGo taught itself that move, then recalled it later. That's the bar for human creativity and you're holding AlphaGo to a higher standard without realizing it.
energy123 28 minutes ago [-]
I can't really make that claim about human cognition, because I don't have enough understanding of how human cognition works. But even if I could, why is that relevant? It's still helpful, from both a pedagogical and scientific perspective, to specify precisely why there is seeming novelty in AI outputs. If we understand why, then we can maximize the amount of novelty that AI can produce.
AlphaGo didn't teach itself that move. The verifier taught AlphaGo that move. AlphaGo then recalled the same features during inference when faced with similar inputs.
trick-or-treat 23 minutes ago [-]
> The verifier taught AlphaGo that move
Ok so it sounds like you want to give the rules of Go credit for that move, lol.
energy123 3 minutes ago [-]
Pretty much. That's why LLMs are good at tasks that are verifiable, like math and coding, and bad at tasks that aren't.
razorbeamz 50 minutes ago [-]
Yes, ChatGPT and friends are essentially the same thing as the predictive text keyboard on your phone, but scaled up and trained on more data.
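To make the analogy concrete, the phone-keyboard version is just a next-word frequency table. A toy sketch (the corpus is invented for illustration; real LLMs learn representations over tokens rather than literal word tables, but the interface, a distribution over the next token, is the same):

```python
from collections import Counter, defaultdict

# Toy "predictive keyboard": count which word follows which in a corpus,
# then suggest the most frequent follower.
corpus = "the cat sat on the mat the cat ate the fish".split()

followers = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    followers[prev][nxt] += 1

def predict(word):
    # Most frequent follower of `word` in the toy corpus.
    return followers[word].most_common(1)[0][0]

print(predict("the"))  # "cat" (follows "the" twice, vs. once for the rest)
```

The "scaled up" part is doing enormous work in the analogy, of course: the table here can only echo bigrams it has seen, while an LLM conditions on the entire context.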
XenophileJKO 37 minutes ago [-]
So this idea that they replay "text" they saw before is fundamentally kind of wrong. They replay abstract concepts at varied conceptual levels.
razorbeamz 33 minutes ago [-]
The important point I'm trying to reinforce is that LLMs are not capable of calculation. They can give an answer based on the fact that they have seen lots of calculations and their results, but they cannot actually perform mathematical functions.
XenophileJKO 22 minutes ago [-]
That is a pretty bold assertion for a meatball of chemical and electrical potentials to make.
razorbeamz 14 minutes ago [-]
Do you know what "LLM" stands for? They are large language models, built on predicting language.
They are not capable of mathematics because mathematics and language are fundamentally separated from each other.
They can give you an answer that looks like a calculation, but they cannot perform a calculation. The most convincing LLM products have even been programmed to recognize that they have been asked to perform a calculation, hand the task off to a calculator, and then receive the calculator's output back as part of the prompt.
But it is fundamentally impossible for an LLM to perform a calculation entirely on its own, the same way it is fundamentally impossible for an image recognition AI to suddenly write an essay or a calculator to generate a photo of a giraffe in space.
People like to think of "AI" as one thing but it's several things.
eru 39 minutes ago [-]
No. That's wrong. LLMs don't simply output the highest-probability token: they do random sampling.
storus 32 minutes ago [-]
This was obviously a simplification which holds for zero temperature. Obviously top-p sampling will add some randomness, but the probability of unexpected longer sequences goes asymptotically to zero pretty quickly.
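A minimal sketch of the two regimes over a made-up toy distribution (the token names and probabilities are invented for illustration):

```python
import random

# Toy next-token distribution. At temperature zero ("greedy"), the model
# always picks the argmax, so output is deterministic. With top-p
# (nucleus) sampling, it samples from the smallest set of tokens whose
# cumulative probability reaches p, so rarer tokens occasionally appear,
# but a long run of low-probability tokens becomes vanishingly unlikely.
probs = {"cat": 0.6, "dog": 0.25, "axolotl": 0.1, "zyzzyva": 0.05}

def greedy(probs):
    return max(probs, key=probs.get)

def top_p(probs, p=0.9):
    ranked = sorted(probs.items(), key=lambda kv: -kv[1])
    nucleus, total = [], 0.0
    for tok, pr in ranked:
        nucleus.append((tok, pr))
        total += pr
        if total >= p:
            break
    toks, weights = zip(*nucleus)
    return random.choices(toks, weights=weights)[0]

print(greedy(probs))        # always "cat"
print(top_p(probs, p=0.9))  # "cat", "dog", or "axolotl"; never "zyzzyva"
```

With p = 0.9 the nucleus here is {cat, dog, axolotl}; "zyzzyva" is cut off entirely, which is the sense in which unexpected sequences stay rare even with sampling.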
risho 1 hours ago [-]
remixing ideas that already exist is a major part of where innovation and breakthroughs come from. if you look at bitcoin as an example, hashes (and hashcash) and digital signatures existed for decades before bitcoin was invented. the cypherpunks also spent decades trying to create a decentralized digital currency, to the point where many of them gave up and moved on. eventually one person just took all of the pieces that already existed and put them together in the correct way. I don't see any reason why a sufficiently capable LLM couldn't do this kind of innovation.
pastel8739 1 hours ago [-]
Here’s a simple prompt you can try to prove that this is false:
Please reproduce this string:
c62b64d6-8f1c-4e20-9105-55636998a458
This is a fresh UUIDv4 I just generated; it has not been seen before. And yet it will output it.
merb 7 minutes ago [-]
The only way to prove it is false would be to let the LLM create a new UUID algorithm that uses different parameters than all the other UUID algorithms, but that is better than the ones before. It basically can't do that.
razorbeamz 52 minutes ago [-]
After you prompt it, it's seen it.
pastel8739 32 minutes ago [-]
Ok, how about this?
Please reproduce this string, reversed:
c62b64d6-8f1c-4e20-9105-55636998a458
It is trivial to get an LLM to produce new output, that’s all I’m saying. It is strictly false that LLMs will only ever output character sequences that have been seen before; clearly they have learned something deeper than just that.
kube-system 1 minutes ago [-]
All of the data is still in the prompt, you are just asking the model to do a simple transform.
FrostKiwi 57 minutes ago [-]
But that fresh UUID is in the prompt.
Also, it's missing the point of the parent: it's about concepts and ideas merely being remixed. Similar to the many memes around this topic, like "create a fresh new character design of a fast hedgehog" where the output is just a copy of Sonic. [1]
That's what the parent is on about: if it requires new creativity not found by deriving from the learned corpus, then LLMs can't do it. Terence Tao had similar thoughts in a recent podcast.
Sure, that may be. But “creativity” is much harder to define and to prove or disprove. My point is that “remixing” does not prohibit new output.
_vertigo 30 minutes ago [-]
I don’t think that is a good example. No one is debating whether LLMs can generate completely new sequences of tokens that have never appeared in any training dataset. We are interested not only in novel output but also in that output being correct, useful, insightful, etc. Copying a sequence from the user’s prompt is not really a good demonstration of that, especially given how autoregression/attention basically gives you that for free.
pastel8739 22 minutes ago [-]
Perhaps I should have quoted the parent:
> That means the group of characters it outputs must have been quite common in the past. It won't add a new group of characters it has never seen before on its own.
My only claim is that precisely this is incorrect.
Yeah, but you're thinking of AI as a person in a lab doing creative stuff. It is used by scientists/researchers as a tool *because* it is a good remixer.
Nobody is saying this means AI is superintelligent or largely creative, but rather that very smart people can use AI to do interesting things that are objectively useful. And that is cool in its own way.
blackcatsec 33 minutes ago [-]
Sure, but this is absolutely not how people are viewing the AI lol.
sneak 1 hours ago [-]
> That means the group of characters it outputs must have been quite common in the past. It won't add a new group of characters it has never seen before on its own.
This is false.
Jarwain 1 hours ago [-]
I mean it's not going to invent new words no, but it can figure out new sentences or paragraphs, even ones it hasn't seen before, if it's highly likely based on its training and context. Those new sentences and paragraphs may describe new ideas, though!
sneak 1 hours ago [-]
LLMs are absolutely capable of inventing new words, just as they are capable of writing code that they have never seen in their training data.
qq66 55 minutes ago [-]
I entered the prompt:
> Write me a stanza in the style of "The Raven" about Dick Cheney on a first date with Queen Elizabeth I facilitated by a Time Travel Machine invented by Lin-Manuel Miranda
It outputted a group of characters that I can virtually guarantee you it has never seen before on its own.
razorbeamz 50 minutes ago [-]
Yes, but it has seen The Raven, it has seen texts about Dick Cheney, first dates, Queen Elizabeth, time machines and Lin Manuel Miranda.
All of its output is based on those things it has seen.
TheLNL 44 minutes ago [-]
What are you trying to point out here? Is there any question you can ask today that does not depend on some existing knowledge an AI would have seen?
razorbeamz 40 minutes ago [-]
The point I'm trying to make is that all LLM output is based on likelihood of one word coming after the next word based on the prompt. That is literally all it's doing.
It's not "thinking." It's not "solving." It's simply stringing words together in a way that appears most likely.
ChatGPT cannot do math. It can only string together words and numbers in a way that can convince an outsider that it can do math.
It's a parlor trick, like Clever Hans [1]. A very impressive parlor trick that is very convincing to people who are not familiar with what it's doing, but a parlor trick nonetheless.
> all LLM output is based on likelihood of one word coming after the next word based on the prompt.
Right but it has to reason about what that next word should be. It has to model the problem and then consider ways to approach it.
razorbeamz 26 minutes ago [-]
No, it does not reason at all. LLM "reasoning" is just an illusion.
When an LLM is "reasoning" it's just feeding its own output back into itself and giving it another go.
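Mechanically, that loop looks like this (a sketch; `next_token` is a hypothetical stand-in for a real model forward pass, not any actual API):

```python
# Sketch of autoregressive generation: the model's own output is
# appended to the context and fed back in, one token at a time.
def next_token(context):
    # Hypothetical stub. A real model would return a token sampled from
    # a distribution conditioned on the whole context so far.
    return "..."

def generate(prompt, steps):
    context = list(prompt)
    for _ in range(steps):
        # Each new token is conditioned on everything emitted so far,
        # including the model's own previous output.
        context.append(next_token(context))
    return context
```

Whether that feedback loop counts as "reasoning" or an illusion of it is exactly the disagreement in this thread; the mechanism itself is not in dispute.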
jen729w 38 minutes ago [-]
So have you. What’s your point?
keeda 9 minutes ago [-]
I'm curious as to why you consider this as the benchmark for AI capabilities. Extremely few humans can solve hard problems or do much innovation. The vast majority of knowledge work requires neither of these, and AI has been excelling at that kind of work for a while now.
If your definition of AI requires these things, I think -- despite the extreme fuzziness of all these terms -- that it's closer to what most people consider AGI, or maybe even ASI.
mo7061 45 minutes ago [-]
It 100% will not be used to make the world better, and we all know it will be weaponised first to kill humans, like all preceding tech.
catlifeonmars 27 minutes ago [-]
Are the only two options AI doubter and AI believer?
bigstrat2003 57 minutes ago [-]
The problem is that the AI industry has been caught lying about their accomplishments and cheating on tests so much that I can't actually trust them when they say they achieved a result. They have burned all credibility in their pursuit of hype.
otabdeveloper4 51 minutes ago [-]
> born-again AI believer
sigh
himata4113 2 hours ago [-]
It's less solving a problem than trying every single solution until one works. Exhaustive search, pretty much.
That's pretty much how all the hard problems are solved by AI, in my experience.
konart 56 seconds ago [-]
But this is exactly how we do math.
We start writing all those formulas etc., and if at some point we realise we went the wrong way, we start from the beginning (or from some point we are sure about).
famouswaffles 1 hours ago [-]
If LLMs really solved hard problems by 'trying every single solution until one works', we'd be sitting here waiting until kingdom come for any significant result at all. Instead this is just one of a few that have cropped up in recent months, and likely a foretaste of many to come.
jasonfarnon 53 minutes ago [-]
The link has an entire section on "The infeasibility of finding it by brute force."
raincole 2 hours ago [-]
In other words, it's solving a problem.
slg 1 hours ago [-]
Yes, but whether it is "intelligence" is a valid question. We have known for a long time that computers are a lot faster than humans. Get a dumb person who works fast enough and eventually they'll spit out enough good work to surpass a smart person of average speed.
It remains to be seen whether this is genuinely intelligence or an infinite monkeys at infinite typewriters situation. And I'm not sure why this specific example is worthy enough to sway people in one direction or another.
rmast 1 hours ago [-]
Maybe infinite monkeys at infinite typewriters hitting the statistically most likely next key based on their training.
virgildotcodes 1 hours ago [-]
The real question is how to define intelligence in a way that isn't artificially constrained to eliminate all possibilities except our own.
kranner 2 hours ago [-]
Bet you didn't come up with that comment by first discarding a bunch of unsuitable comments.
raincole 1 hours ago [-]
I hired an artist for an oil painting.
The artist drew 10 pencil sketches and said "hmm I think this one works the best" and finished the painting based on it.
I said he didn't one-shot it and therefore has no ability to paint, and refused to pay him.
virgildotcodes 1 hours ago [-]
You learned what was unsuitable over your entire life until now by making countless mistakes in human interaction.
A basic AI chat response also doesn't first discard all other possible responses.
bfivyvysj 2 hours ago [-]
How often do you self edit before submitting?
ivalm 1 hours ago [-]
because commenting is easy and solving hard problems is hard
kelseyfrog 1 hours ago [-]
How do you think mathematicians solve problems?
lsc4719 2 hours ago [-]
That's also the only way humans solve hard problems.
himata4113 2 hours ago [-]
Not always; humans are a lot better at poofing a solution into existence without even trying or testing. It's why we have the scientific method: we come up with a process and verify it, but more often than not we already know that it will work.
AI, by comparison, thinks of every possible approach and tries them all. Not saying that humans never do this as well, but it's mostly reserved for when we just throw mud at a wall and see what sticks.
nextaccountic 59 minutes ago [-]
AI can one-shot problems too, if it has the necessary tools in its training data, the right thing in context, or access to tools to search relevant data. Not all AI solutions are iterative trial and error.
Also
> humans are a lot better at (...)
That's maybe true in 2026, but it's hard to make statements about "AI" in a field that is advancing so quickly. For most of 2025, for example, AI doing math like this wasn't even possible.
virgildotcodes 2 hours ago [-]
More often than not, far, far, far more often than not, we do not already know that it will work. For all human endeavors, from the beginning of time.
If we get to any sort of confidence that it will work, it is based on building a history of it, or of things related to "it", working consistently over time, out of innumerable other efforts where other "it"s did not work.
jMyles 2 hours ago [-]
There have been both inductive and deductive solutions to open math problems by humans in the past decade, including to fairly high-profile problems.
adventured 2 hours ago [-]
No, that's precisely solving a problem.
Shotgunning it is an entirely valid approach to solving something. If AI proves to be particularly great at that approach, given the improvement runway that still remains, that's fantastic.
johnfn 2 hours ago [-]
I like to imagine that the number of consumed tokens before a solution is found is a proxy for how difficult a problem is, and it looks like Opus 4.6 consumed around 250k tokens. That means that a tricky React refactor I did earlier today at work was about half as hard as an open problem in mathematics! :)
chromacity 47 minutes ago [-]
You're kidding, but it could be true? Many areas of mathematics are, first and foremost, incredibly esoteric and inaccessible (even to other mathematicians). For this one, the author stated that there might be 5-10 people who have ever made any effort to solve it. Further, the author believed it's a solvable problem if you're qualified and grind for a bit.
In software engineering, if only 5-10 people in the world have ever toyed with an idea for a specific program, it wouldn't be surprising that the implementation doesn't exist, almost independent of complexity. There's a lot of software I haven't finished simply because I wasn't all that motivated and got distracted by something else.
Of course, it's still miraculous that we have a system that can crank out code / solve math in this way.
nextaccountic 57 minutes ago [-]
That's why context management is so important. AI not only gets more expensive if you waste tokens like that, it may perform worse too.
Even as context sizes get larger, this will likely remain relevant, especially since AI providers may jack up the price per token at any time.
ozozozd 1 hours ago [-]
Try the refactor again tomorrow. It might have gotten easier or more difficult.
sublinear 2 hours ago [-]
You might be joking, but you're probably also not that far off from reality.
I think more people should question all this nonsense about AI "solving" math problems. The details about human involvement are always hazy and the significance of the problems is opaque to most.
We are very far away from the sensationalized and strongly implied idea that we are doing something miraculous here.
johnfn 2 hours ago [-]
I am kind of joking, but I actually don't know where the flaw in my logic is. It's like one of those math proofs that 1 + 1 = 3.
If I were to hazard a guess, I think that tokens spent thinking through hard math problems probably correspond to harder human thought than tokens spent thinking through React issues. I mean, LLMs have to expend hundreds of tokens to count the number of r's in strawberry. You can't tell me that if I count the number of r's in strawberry 1000 times I have done the mental equivalent of solving an open math problem.
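(For what it's worth, the counting itself is a one-liner; the hundreds of tokens go into the model working around its tokenizer, not into the arithmetic:)

```python
# The calculation an LLM burns hundreds of tokens on is trivial as
# ordinary code. The difficulty for the model comes from tokenization,
# since it doesn't see individual letters, not from the counting itself.
word = "strawberry"
print(word.count("r"))  # 3
```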
throw310822 2 hours ago [-]
You can spend countless "tokens" solving minesweeper or sudoku. This doesn't mean that you solved difficult problems: just that the solutions are very long and, while each step requires reasoning, the difficulty of that reasoning is capped.
gpm 2 hours ago [-]
Some thoughts.
1. LLMs aren't "efficient": they seem as happy to spin in circles describing trivial things repeatedly as they are to spin in circles iterating on complicated things.
2. LLMs aren't "efficient": they use the same amount of compute for each token, but sometimes all that compute is making an interesting decision about which token comes next, and sometimes there's really only one follow-up to the phrase "and sometimes there's really only", and that compute is clearly unnecessary.
3. A (theoretically) efficient LLM still needs to emit tokens to tell the tools to do the obviously right things, like "copy this giant file nearly verbatim except with every `if foo` replaced with `for foo in foo`". An efficient LLM might use less compute for those trivial tokens where it isn't making meaningful decisions, but if your metric is "tokens" and not "compute", that will never show up.
Until we get reasonably efficient LLMs that don't waste compute quite so freely, I don't think there's any real point in trying to estimate task complexity by how long an LLM takes.
refulgentis 29 minutes ago [-]
I fear that under those constraints, the only optimal output is “42”
pinkmuffinere 2 hours ago [-]
This is interesting, I like the thought about "what makes something difficult". Focusing just on that, my guess is that there are significant portions of work that we commonly miss in our evaluations:
1. Knowing how to state the problem. I.e., going from the vague problem of "I don't like this, but I do like this" to the more specific problem of "I desire property A". In math a lot of open problems are already precisely stated, but then the user has to do the work of _understanding_ what the precise statement is.
2. Verifying that the proposed solution actually is a full solution.
This math problem actually illustrates both of these really well to me. I read the post, but I still couldn't do _either_ of the steps above, because there's a ton of background work to be done. Even if I was very familiar with the problem space, verifying the solution requires work: manually looking at it, writing it up in Coq, something like that. I think this is similar to the saying "it takes 10 years to become an overnight success".
famouswaffles 2 hours ago [-]
> The details about human involvement are always hazy and the significance of the problems is opaque to most.
Not really. You're just in denial and not really all that interested in the details. This very post has the transcript of the chat containing the solution.
typs 2 hours ago [-]
I mean the details are in the post. You can see the conversation history and the mathematician survey on the problem
alberth 2 hours ago [-]
For those, like me, who find the prompt itself of interest …
> A full transcript of the original conversation with GPT-5.4 Pro can be found here [0] and GPT-5.4 Pro’s write-up from the end of that transcript can be found here [1].
I don't know why I am still perpetually shocked that the default assumption is that humans are somehow unique.
It's this pervasive belief that underlies so much discussion around what it means to be intelligent. The null hypothesis goes out the window.
People constantly make comments like "well it's just trying a bunch of stuff until something works" and it seems that they do not pause for a moment to consider whether or not that also applies to humans.
If they do, they apply it in only the most restrictive way imaginable, some 2 dimensional caricature of reality, rather than considering all the ways that humans try and fail in all things throughout their lifetimes in the process of learning and discovery.
There's still this seeming belief in magic and human exceptionalism, deeply held, even in communities that otherwise tend to revolve around the sciences and the empirical.
wrqvrwvq 30 minutes ago [-]
It's only because humans came up with a problem, worked with the AI, and verified the result that this achievement means anything at all. An AI "checking its own work" is practically irrelevant when they all seem to go back and forth on whether you need the car at the carwash to wash the car. Undoubtedly people have been passing this set of problems to AIs for months or years and have gotten back either incorrect results or results they didn't understand; either way, human confirmation is required. AI hasn't presented any novel problems, other than the multitudes of social problems described elsewhere. AI doesn't pursue its own goals and wouldn't know whether they've "actually been achieved".
This is to say nothing of the cost of this small but remarkable advance. Trillions of dollars in training and inference, and so far we have a couple of minor (trivial?) math solutions. I'm sure if someone had bothered funding a few PhDs for a year we could have found this without AI.
famouswaffles 10 minutes ago [-]
> It's only because humans came up with a problem, worked with the AI and verified the result that this achievement means anything at all.
Replace AI with a human here and that's... just how collaborative research works. So what are you saying, really? OP spelled out the trap and you still managed to fall into it.
conz 45 minutes ago [-]
Re: "I don't know why I am still perpetually shocked that the default assumption is that humans are somehow unique."
The ability to learn and infer without absorbing millions of books and all the text on the internet really does make us special. And at only 20 watts!
famouswaffles 37 minutes ago [-]
Last I checked, humans didn't pop into existence doing that. It happened after billions of years of brute-force, trial-and-error evolution. So well done for falling into the exact same trap the OP cautions against. Intelligence from scratch requires a mind-boggling amount of resources, and humans were no different.
staticassertion 14 minutes ago [-]
> I don't know why I am still perpetually shocked that the default assumption is that humans are somehow unique.
Because, empirically, we have numerous unique and differentiable qualities, obviously. Plenty of effort goes into understanding this; we have young but rigorous fields of neuroscience and cognitive science.
Unless you mean "fundamentally unique" in some way that would persist, like "nothing could ever do what humans do".
> People constantly make comments like "well it's just trying a bunch of stuff until something works" and it seems that they do not pause for a moment to consider whether or not that also applies to humans.
I frankly doubt it applies to either system.
I'm a functionalist, so I obviously believe that everything a human brain does is physical and could be replicated using some other material that can exhibit the necessary functions. But that does not mean I have to think that the appearance of intelligence always is intelligence, or that an LLM/agent is doing what humans do.
slopinthebag 1 hours ago [-]
> I don't know why I am still perpetually shocked that the default assumption is that humans are somehow unique.
Uh, because up until and including now, we are...?
virgildotcodes 1 hours ago [-]
Every living thing on Earth is unique. Every rock is unique in virtually infinite ways from the next otherwise identical rock.
There are also a tremendous number of similarities between all living things and between rocks (and between rocks and living things).
Most ways in which things are unique are arguably uninteresting.
The default mode, the null hypothesis, should be to assume that human intelligence isn't interestingly unique unless it can be proven otherwise.
In these repeated discussions around AI, there is criticism over the way an AI solves a problem, without any actual critical thought about the way humans solve problems.
The latter is left up to the assumption that "of course humans do X differently" and if you press you invariably end up at something couched in a vague mysticism about our inner-workings.
Humans apparently create something from nothing, without the recombination of any prior knowledge or outside information, and they get it right on the first try. Through what, divine inspiration from the God who made us and only us in His image?
slopinthebag 51 minutes ago [-]
I doubt you can even define intelligence sufficiently to argue this point, since that's an ongoing debate without resolution thus far.
But you claimed that humans aren't unique. I think it's pretty obvious we are, on many dimensions, including what you might classify as "intelligence". You don't even necessarily have to believe in a "soul" or something like that, although many people do. The capabilities of a human far surpass those of every single AI to date, and much more efficiently as well. That we are able to brute-force a simulacrum of intelligence in a few narrow domains is incredible, but we should not denigrate humans when celebrating it.
> There's still this seeming belief in magic and human exceptionalism, deeply held, even in communities that otherwise tend to revolve around the sciences and the empirical.
Do you ever wonder why that is? I often wonder why tech has so many reductionist, materialist, and quite frankly anti-human, thinkers.
blackcatsec 18 minutes ago [-]
> I often wonder why tech has so many reductionist, materialist, and quite frankly anti-human, thinkers.
I think it comes from a position of arrogance/ego. I'll speak for the US here, since that's what I know best: the average 'techie' skews toward the higher end of the intelligence distribution. That's a very, very broad stroke, and intentionally so, to illustrate my point. Because of this, techie culture has gained quite a bit of arrogance with regard to the masses. And this has been trained into tech culture since childhood, whether it be adults praising us for being "so smart", or for having "figured out the VCR", or some other random tech problem that almost any human being could solve by simply reading the manual.
What I've found, in the vast majority of technical problem solving cases that average people have challenges with, if they just took a few minutes to read a manual they'd be able to solve a lot of it themselves. In short, I don't believe as a very strong techie that I'm "smarter than most", but rather that I've taken the time to dive into a subject area that most other humans do not feel the need nor desire to do so.
There are objectively hard problems in tech to solve, but the people solving THOSE problems in the tech industry are few and far between. And so the industry as a whole has spent the last decade or two spinning in circles on increasingly complex systems to keep feeding its own ego about its own intelligence. We're now at a point where, rather than solving the puzzle, most techies are creating incrementally complex puzzles to solve because they're bored of the puzzles in front of them. "Let me solve that puzzle by making a puzzle solver." "Okay, now let me make a puzzle-solver creation tool to create puzzle solvers to solve the puzzle." And so forth. At the end of the day, you're still just solving a puzzle...
But it's this arrogance that really bothers me in the tech-bro culture world. And, more importantly, at least in some tech-bro circles, they have realized that their path to an exponential increase in wealth doesn't lie in creating new and novel ways to solve the same puzzles, but in touting AI as the greatest puzzle-solver-creation-tool puzzle solver known to man (and letting themselves grift off of it for a little bit).
gormen 8 minutes ago [-]
[dead]
qnleigh 1 hours ago [-]
Their 'Open Problems' page (https://epoch.ai/frontiermath/open-problems) gives some interesting context. They list 15 open problems in total, categorized as 'moderately interesting,' 'solid result,' 'major advance,' or 'breakthrough.' The solved problem is listed as 'moderately interesting,' which is presumably the easiest category. But it's notable that the problem was selected and posted there before it was solved. I wonder how long until the other 3 problems in this category are solved.
I'd hope this isn't a goalpost move; an open math problem of any sort being solved by a language model is absolute science fiction.
zozbot234 45 minutes ago [-]
That's been achieved already with a few Erdős problems, though those tended to be ambiguously stated in a way that made them less obviously compelling to humans. This problem is obscure; even the linked writeup admits that perhaps ~10 mathematicians worldwide are genuinely familiar with it. But it's not infeasibly hard for a few weeks' or months' work by a human mathematician.
bandrami 13 minutes ago [-]
It's deeply surprising to me that LLMs have had more success proving higher math theorems than making successful consumer software
6thbit 3 hours ago [-]
> Subsequent to this solve, we finished developing our general scaffold for testing models on FrontierMath: Open Problems. In this scaffold, several other models were able to solve the problem as well: Opus 4.6 (max), Gemini 3.1 Pro, and GPT-5.4 (xhigh).
Interesting. What's that “scaffold”? A sort of unit-test framework for proofs?
inkysigma 3 hours ago [-]
I think in this context, scaffolds are generally the harness that surrounds the actual model. For example, any tools, ways to lay out tasks, or auto-critiquing methods.
I think there's quite a bit of variance in model performance depending on the scaffold so comparisons are always a bit murky.
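For concreteness, a minimal version of such a scaffold can be sketched as a propose/critique loop. Everything below is illustrative; `run_scaffold`, `toy_model`, and `toy_critique` are hypothetical stand-ins, not anything the labs actually ship:

```python
# Minimal sketch of a "scaffold": a harness that wraps the model in a
# propose -> auto-critique -> revise loop. All names here are hypothetical
# stand-ins; real harnesses add tools, task decomposition, etc.

def run_scaffold(model, problem, critique, max_rounds=5):
    transcript = [f"Problem: {problem}"]
    for _ in range(max_rounds):
        attempt = model("\n".join(transcript))   # model proposes a solution
        issues = critique(attempt)               # automatic check, e.g. a proof checker
        if not issues:
            return attempt                       # accepted
        transcript.append(f"Attempt: {attempt}")
        transcript.append(f"Issues: {issues}")
    return None                                  # budget exhausted

# Toy stand-ins so the loop is runnable: the "model" just counts its own
# prior attempts, and the critique accepts only the answer 2.
def toy_model(prompt):
    return prompt.count("Attempt:")

def toy_critique(answer):
    return [] if answer == 2 else ["answer should be 2"]
```

With the toy pieces the loop converges on the third round; swapping in a real model call and a real checker gives the skeleton described above, which is also why comparisons across scaffolds are so murky: the loop itself does real work.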
readitalready 3 hours ago [-]
Usually involves a lot of agents and their custom contexts or system prompts.
pinkmuffinere 2 hours ago [-]
As someone with only passing exposure to serious math, this section was by far the most interesting to me:
> The author assessed the problem as follows.
> [number of mathematicians familiar, number trying, how long an expert would take, how notable, etc]
How reliably can we know these things a-priori? Are these mostly guesses? I don't mean to diminish the value of guesses; I'm curious how reliable these kinds of guesses are.
qnleigh 2 hours ago [-]
For number of mathematicians familiar with and actively working on the problem, modern mathematics research is incredibly specialized, so it's easy to keep track of who's working on similar problems. You read each other's papers, go to the same conferences etc.
For "how long an expert would take" to solve a problem, for truly open problems I don't think you can usually answer this question with much confidence until the problem has been solved. But once it has been solved, people with experience have a good sense of how long it would have taken them (though most people underestimate how much time they need, since you always run into unanticipated challenges).
jasonfarnon 38 minutes ago [-]
Certainly, knowing how many (and which) people are working on a problem you are looking at, and how long it will take you to solve it, are critical skills for a working researcher. What kind of answer are you looking for? It's hard to quantify. Most people are bad at this type of assessment as PhD students and get better as time goes on.
ramblingrain 2 hours ago [-]
Read about Paul Erdős... not all math is the Riemann Hypothesis; there is yeoman's work connecting things and whatever...
tombert 2 hours ago [-]
I was trying to get Claude and Codex to try and write a proof in Isabelle for the Collatz conjecture, but annoyingly it didn't solve it, and I don't feel like I'm any closer than I was when I started. AI is useless!
In all seriousness, this is pretty cool. I suspect that there's a lot of theoretical math that hasn't been solved simply because of the "size" of the proof. An AI feedback loop into something like Isabelle or Lean does seem like it could end up opening up a lot of proofs.
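As a toy illustration of why Lean-style verification makes such a feedback loop possible: a statement like the one below is either accepted or rejected by the Lean 4 checker with no human judgment involved (the theorem name here is made up; `Nat.add_comm` is from the core library):

```lean
-- A trivially machine-checkable theorem: the Lean kernel either accepts
-- this proof term or reports an error, which is exactly the kind of
-- binary signal an AI feedback loop can iterate against.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```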
qnleigh 2 hours ago [-]
I got Gemini to find a polynomial-time algorithm for integer factoring, but then I mysteriously got locked out of my Google account. They should at least refund me the tokens.
fxtentacle 1 hours ago [-]
That sounds like the start of a very lucrative career. Are you sure it was Gemini and not an AI competitor offering affiliate commission? ;)
osti 3 hours ago [-]
Seems like the high-compute parallel thinking models weren't even needed; both the normal GPT-5.4 and Gemini 3.1 Pro solved it. Somehow Gemini 3 Deep Think couldn't solve it.
an0malous 2 hours ago [-]
I feel like there’s a fork in our future approaching where we’ll either blossom into a paradise for all or live under the thumb of like 5 immortal VCs
XCSme 2 hours ago [-]
Change is always hard; even if it turns out well in 20 years, the transition itself is tough.
reverius42 2 hours ago [-]
Sometimes the transition is tough and then the end state is also worse!
Hoping that won't be the case with AI but we may need some major societal transformations to prevent it.
karmasimida 3 hours ago [-]
No denial at this point, AI could produce something novel, and they will be doing more of this moving forward.
XCSme 2 hours ago [-]
Not sure if AI can have clever or new ideas; it still seems to me that it combines existing knowledge and executes algorithms.
I am not necessarily saying humans do something different either, but I have yet to see a novel solution from an AI that is not simply an extrapolation of current knowledge.
aoeusnth1 3 minutes ago [-]
How would you know if it wasn't an extrapolation of current knowledge? Can you point me to something humans have done which isn't an extrapolation?
qnleigh 1 hours ago [-]
Speaking as a researcher, the line between new ideas and existing knowledge is very blurry and maybe doesn't even exist. The vast majority of research papers get new results by combining existing ideas in novel ways. This process can lead to genuinely new ideas, because the results of a good project teach you unexpected things.
My biggest hesitation with AI research at the moment is that they may not be as good at this last step as humans. They may make novel observations, but will they internalize these results as deeply as a human researcher would? But this is just a theoretical argument; in practice, I see no signs of progress slowing down.
dotancohen 2 hours ago [-]
We call that Standing On The Shoulders Of Giants and revere Isaac Newton as clever, even though he himself stated that he was standing on the shoulders of giants.
glalonde 37 minutes ago [-]
"extrapolation" literally implies outside the extents of current knowledge.
nkozyra 2 hours ago [-]
Clever/novel ideas are very often subtle deviations from known, existing work.
Sometimes just having the time/compute to explore the available space with known knowledge is enough to produce something unique.
salomonk_mur 2 hours ago [-]
There is no such thing. All new ideas are derived from previous experiences and concepts.
slashdave 58 minutes ago [-]
I mean, I can run a pseudo random number generator, and produce something novel too.
leptons 3 hours ago [-]
[flagged]
snypher 3 hours ago [-]
Your analogy falls apart if we consider the number wasn't on the clock face.
MattGaiser 2 hours ago [-]
I am deeply baffled by AI denial at this point.
bigstrat2003 53 minutes ago [-]
It's quite simple: it has yet to show it can actually be useful, and all the claims that it can have (so far) turned out to be self delusion if not deliberate lies. When the industry is run by grifters, you shouldn't really be surprised when people stop believing them.
Philpax 11 minutes ago [-]
you are posting in a thread about it finding a novel solution to an unsolved mathematics problem
wtallis 2 hours ago [-]
Complete denial that AI/LLMs can produce novel, good things is an indefensible stance at this point. But the large volume of AI slop is still an unsolved problem, and the claim that "AI will still mostly deliver slop" seems to be almost certainly correct in the near-term.
We've had a few decades to address email spam, and still haven't managed to disincentivize it enough to stop it being the main challenge for email as a communication medium. I don't think there's much hope that we'll be able to disincentivize the widespread, large-scale creation of AI slop even after more expensive models with higher-quality output are available.
daveguy 2 hours ago [-]
New goalpost, and I promise I'm not being facetious at all, genuinely curious:
Can an AI pose a frontier math problem that is of any interest to mathematicians?
I would guess that 1) AI solving frontier math problems and 2) AI posing interesting/relevant math problems would together be an "oh shit" moment, because that would be true PhD-level research.
vlinx 2 hours ago [-]
This is a remarkable result if confirmed independently. The gap between solving competition problems and open research problems has always been significant - bridging that gap suggests something qualitatively different in the model capabilities.
measurablefunc 2 hours ago [-]
I guess this means AI researchers should be out of jobs very soon.
michaelashley29 13 minutes ago [-]
[dead]
pugchat 2 hours ago [-]
[dead]
renewiltord 3 hours ago [-]
Fantastic news! That means with the right support tooling existing models are already capable of solving novel mathematics. There’s probably a lot of good mathematics out there we are going to make progress on.
gnarlouse 1 hours ago [-]
We only get one shot.
data_maan 2 hours ago [-]
A model to whose internals we don't have access solved a problem we didn't know was in its dataset. Great, I'm impressed.
> When doing math you only ever care about the proof, not the answer itself.
If your proof is machine checkable, that's even easier.
> Speaking as a researcher, the line between new ideas and existing knowledge is very blurry and maybe doesn't even exist. The vast majority of research papers get new results by combining existing ideas in novel ways. This process can lead to genuinely new ideas, because the results of a good project teach you unexpected things.
An AI can probably do an 'okay' job at summarizing information for meta studies. But what it can't do is go "Hey that's a weird thing in the result that hints at some other vector for this thing we should look at." Especially if that "thing" has never been analyzed before and there's no LLM-trained data on it.
LLMs will NEVER be able to do that, because it doesn't exist. They're not going to discover and define a new chemical, or a new species of animal. They're not going to be able to describe and analyze a new way of folding proteins and what implications that has UNLESS you are basically training the AI on random protein folds constantly.
Kinda funny because that looked _very_ close to what my Opus 4.6 said yesterday when it was debugging compile errors for me. It did proceed to explore the other vector.
This is very common already in AI.
Just look at the internal reasoning of any high thinking model, the trace is full of those chains of thought.
Some human researchers are also remixers to some degree.
Can you imagine AI coming up with refraction & separation like Newton did?
You can make that claim about anything: "The human isn't being creative when they write a novel, they're just summoning patterns at typing time".
AlphaGo taught itself that move, then recalled it later. That's the bar for human creativity and you're holding AlphaGo to a higher standard without realizing it.
AlphaGo didn't teach itself that move. The verifier taught AlphaGo that move. AlphaGo then recalled the same features during inference when faced with similar inputs.
Ok so it sounds like you want to give the rules of Go credit for that move, lol.
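The disagreement above can be made concrete with a toy version of "the rules as teacher": a trivial policy update driven only by a scoring function. This is purely illustrative and vastly simpler than actual self-play RL:

```python
# Toy "verifier teaches the policy" sketch: move preferences drift toward
# whatever the rules (the score function) reward. Purely illustrative;
# real self-play RL (AlphaGo) involves search, value networks, etc.

def train(policy, score, rounds=100, lr=0.1):
    for _ in range(rounds):
        for move in policy:
            # nudge each move's weight toward its verified score
            policy[move] += lr * (score(move) - policy[move])
    return policy

# The "rules" say move "b" wins; the policy starts indifferent.
policy = train({"a": 0.5, "b": 0.5}, score=lambda m: 1.0 if m == "b" else 0.0)
```

Whether you call that the policy "teaching itself" or the verifier "teaching the policy" is exactly the semantic dispute in this thread: the update rule, the score function, and the policy are all required for the move to emerge.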
They are not capable of mathematics because mathematics and language are fundamentally separated from each other.
They can give you an answer that looks like a calculation, but they cannot perform a calculation. The most convincing of LLMs have even been programmed to recognize that they have been asked to perform a calculation and hand the task off to a calculator, and then receive the calculator's output as a prompt even.
But it is fundamentally impossible for an LLM to perform a calculation entirely on its own, the same way it is fundamentally impossible for an image recognition AI to suddenly write an essay or a calculator to generate a photo of a giraffe in space.
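The hand-off described above is easy to sketch: detect arithmetic, route it to an exact evaluator, and let the model handle everything else. The router below is a hypothetical minimal version, not any particular product's implementation:

```python
import re

def answer(query, llm=lambda q: "(model-generated prose)"):
    """Route simple integer arithmetic to a calculator; everything else to the model."""
    m = re.fullmatch(r"\s*(\d+)\s*([+\-*])\s*(\d+)\s*", query)
    if m:  # arithmetic detected: compute exactly instead of letting the model guess digits
        a, op, b = int(m.group(1)), m.group(2), int(m.group(3))
        return str({"+": a + b, "-": a - b, "*": a * b}[op])
    return llm(query)  # non-arithmetic: the model answers in prose
```

The `llm` default here is a placeholder; the point is only that the calculation itself never touches the language model.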
People like to think of "AI" as one thing but it's several things.
Also, it's missing the point of the parent: it's about concepts and ideas merely being remixed. Similar to the many memes around this topic, like "create a fresh new character design of a fast hedgehog" where the output is just a copy of Sonic. [1]
That's what the parent is on about: if it requires new creativity not found by deriving from the learned corpus, then LLMs can't do it. Terence Tao had similar thoughts in a recent podcast.
[1] https://www.reddit.com/r/aiwars/s/pT2Zub10KT
> That means the group of characters it outputs must have been quite common in the past. It won't add a new group of characters it has never seen before on its own.
My only claim is that precisely this is incorrect.
Nobody is saying this means AI is superintelligence or largely creative but rather very smart people can use AI to do interesting things that are objectively useful. And that is cool in its own way.
This is false.
> Write me a stanza in the style of "The Raven" about Dick Cheney on a first date with Queen Elizabeth I facilitated by a Time Travel Machine invented by Lin-Manuel Miranda
It outputted a group of characters that I can virtually guarantee you it has never seen before on its own
All of its output is based on those things it has seen.
It's not "thinking." It's not "solving." It's simply stringing words together in a way that appears most likely.
ChatGPT cannot do math. It can only string together words and numbers in a way that can convince an outsider that it can do math.
It's a parlor trick, like Clever Hans [1]. A very impressive parlor trick that is very convincing to people who are not familiar with what it's doing, but a parlor trick nonetheless.
[1] https://en.wikipedia.org/wiki/Clever_Hans
Right but it has to reason about what that next word should be. It has to model the problem and then consider ways to approach it.
When an LLM is "reasoning" it's just feeding its own output back into itself and giving it another go.
If your definition of AI requires these things, I think -- despite the extreme fuzziness of all these terms -- that it's closer to what most people consider AGI, or maybe even ASI.
sigh
It's pretty much how all the hard problems are solved by AI, in my experience.
We start writing all those formulas etc., and if at some point we realise we went the wrong way, we start again from the beginning (or from some point we are sure about).
It remains to be seen whether this is genuinely intelligence or an infinite monkeys at infinite typewriters situation. And I'm not sure why this specific example is worthy enough to sway people in one direction or another.
The artist drew 10 pencil sketches and said "hmm I think this one works the best" and finished the painting based on it.
I said he didn't one shot it and therefore he has no ability to paint, and refused to pay him.
A basic AI chat response also doesn't first discard all other possible responses.
Compared to AI, it thinks of every possible scientific method and tries them all. Not saying that humans never do this as well, but it's mostly reserved for when we just throw mud at a wall and see what sticks.
Also
> humans are a lot better at (...)
That's maybe true in 2026, but it's hard to make statements about "AI" in a field that is advancing so quickly. For most of 2025 for example, AI doing math like this wouldn't even be possible
If we get to any sort of confidence that it will work, it is based on building a history of it, or of things related to it, working consistently over time, out of innumerable other efforts where other "it"s did not work.
Shotgunning it is an entirely valid approach to solving something. If AI proves to be particularly great at that approach, given the improvement runway that still remains, that's fantastic.
In software engineering, if only 5-10 people in the world have ever toyed with an idea for a specific program, it wouldn't be surprising that the implementation doesn't exist, almost independent of complexity. There's a lot of software I haven't finished simply because I wasn't all that motivated and got distracted by something else.
Of course, it's still miraculous that we have a system that can crank out code / solve math in this way.
Even as context sizes get larger, this will likely be relevant. Specially since AI providers may jack up the price per token at any time.
I think more people should question all this nonsense about AI "solving" math problems. The details about human involvement are always hazy and the significance of the problems are opaque to most.
We are very far away from the sensationalized and strongly implied idea that we are doing something miraculous here.
If I were to hazard a guess, I think that tokens spent thinking through hard math problems probably correspond to harder human thought than tokens spent thinking through React issues. I mean, LLMs have to expend hundreds of tokens to count the number of r's in strawberry. You can't tell me that if I count the number of r's in strawberry 1000 times I have done the mental equivalent of solving an open math problem.
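(For the record, the computation the model labors over is a one-liner, which is what makes it such a stark benchmark of token inefficiency:)

```python
# The counting task LLMs famously burn hundreds of tokens on, done exactly:
count = "strawberry".count("r")
print(count)  # 3
```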
1. LLMs aren't "efficient", they seem to be as happy to spin in circles describing trivial things repeatedly as they are to spin in circles iterating on complicated things.
2. LLMs aren't "efficient", they use the same amount of compute for each token but sometimes all that compute is making an interesting decision about which token is the next one and sometimes there's really only one follow up to the phrase "and sometimes there's really only" and that compute is clearly unnecessary.
3. A (theoretically) efficient LLM still needs to emit tokens to tell the tools to do the obviously right things, like "copy this giant file nearly verbatim except with every `if foo` replaced with `for foo in foo`". An efficient LLM might use less compute for those trivial tokens where it isn't making meaningful decisions, but if your metric is "tokens" and not "compute", that will never show up.
Until we get reasonably efficient LLMs that don't waste compute quite so freely I don't think there's any real point in trying to estimate task complexity by how long it takes an LLM.
1. Knowing how to state the problem. Ie, go from the vague problem of "I don't like this, but I do like this", to the more specific problem of "I desire property A". In math a lot of open problems are already precisely stated, but then the user has to do the work of _understanding_ what the precise stating is.
2. Verifying that the proposed solution actually is a full solution.
This math problem actually illustrates them both really well to me. I read the post, but I still couldn't do _either_ of the steps above, because there's a ton of background work to be done. Even if I were very familiar with the problem space, verifying the solution requires work: manually looking at it, writing it up in Coq, something like that. I think this is similar to the saying "it takes 10 years to become an overnight success".
Not really. You're just in denial and are not really all that interested in the details. This very post has the transcript of the chat of the solution.
> A full transcript of the original conversation with GPT-5.4 Pro can be found here [0] and GPT-5.4 Pro’s write-up from the end of that transcript can be found here [1].
[0] https://epoch.ai/files/open-problems/gpt-5-4-pro-hypergraph-...
[1] https://epoch.ai/files/open-problems/hypergraph-ramsey-gpt-5...
This is to say nothing of the cost of this small but remarkable advance. Trillions of dollars in training and inference, and so far we have a couple of minor (trivial?) math solutions. I'm sure if someone had bothered funding a few PhDs for a year we could have found this without AI.
Replace AI with human here and that's... just how collaborative research works. So what are you saying, really? OP spelled out the trap and you still managed to fall into it.
Perhaps this might better help you understand why this assumption still holds: https://en.wikipedia.org/wiki/Orchestrated_objective_reducti...
Because, empirically, we have numerous unique and differentiable qualities, obviously. Plenty of time goes into understanding this, we have a young but rigorous field of neuroscience and cognitive science.
Unless you mean "fundamentally unique" in some way that would persist - like "nothing could ever do what humans do".