OpenAI published a result in which one of their models disproved a long-standing conjecture in discrete geometry — a counterexample that human mathematicians had not produced. The novelty here is less that “AI did mathematics” and more that the proof artefact is a concrete object that mathematicians can now verify and build on. That is a different mode of contribution from the usual “passes the benchmark” framing.
A complementary thread runs through the week’s arXiv drop. “Variance Reduction for Expectation with Diffusion Teachers” works on the numerical side of the same problem: improving the efficiency with which models estimate quantities they are nominally good at, by using diffusion-trained teachers as control variates. It is a small, sharp paper — the kind that is easy to overlook between training-run announcements but that quietly changes what is cheap to compute.
The second thread is theoretical. “LLMs as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws” reframes scaling-law work in information-theoretic terms, and uses that lens to talk about phenomena that have been treated as separate puzzles — catastrophic overtraining, quantisation-induced degradation, and the various non-monotonic behaviours people have noticed at the edges of training runs. Whether the framework survives empirical scrutiny is the next question; right now it is an organising story rather than a settled theory. Alongside it, “Good Token Hunting: A Hitchhiker’s Guide to Token Selection for Visual Geometry Transformers” is the more applied version of the same impulse — reducing input sequence length so visual-geometry transformers can predict multiple 3D attributes in a single forward pass without the usual quadratic blow-up in compute.
The third thread is applied agents — specifically the part of the agent literature that bothers to be evaluable. “PhotoFlow: Agentic 3D Virtual Photography Missions” asks an agent to infer suitable camera shots from scene information and natural-language intent. The interesting part is not that the agent can do it, but that the task is well-specified enough to measure. A lot of agent papers right now have the opposite shape: vague task, hand-waved evaluation, an evocative demo video. PhotoFlow has edges.
Put the three together and the week reads as a small step away from “bigger model, bigger benchmark” toward concrete artefacts — a real counterexample, a tighter theoretical lens, a task with a precise success criterion. None of it is a phase change on its own. The interesting question is whether the next three months keep producing this kind of work, or whether the next training-run announcement resets the conversation.