We did not set out to build a smarter machine. We set out to measure the moment a mind reaches further than it was asked to, and to find out whether that reaching can be taught.
There is a race on, and almost everyone is running it. Build the larger model. Win the capability benchmark. Buy the bigger cluster. It is a worthy race and we are not in it.
We are a research institute on the cutting edge of a different question. Not how smart can the machine become, but how much better can any mind be made to think, and can we prove it. Artificial or human. The machine is not the destination here. It is the first place clear enough, and fast enough, and cheap enough, to watch cognition move under a controlled hand.
A living mirror is not an answer machine. It is a surface that shows a mind what it could not see about itself. This is an institute for building those surfaces, and for measuring what they reveal.
For months, a small number of people have been thinking through a set of cognitive lenses. Spark. Imagineer. Savant. Mirror. Modes built on the science of how minds reach for the non-obvious.
The people who used them reported the same strange thing in different words: the output came back further out and somehow truer. They knew it was working. None of them could say why.
That is the condition of every real technology before its instrument arrives. The lens worked for centuries as a curiosity before optics became a science. The thing felt true. Nobody could measure it.
So we built the instrument.
The Modulation Test does not ask whether a model is smart. It asks a harder and more useful question: when you modulate a mind, does the output land in the one region that matters, the place that is at once far from the obvious and genuinely good?
The far and good frontier. A distance-only metric cannot see this region, because distance alone rewards nonsense as richly as genius. Value alone rewards the safe and the dull. The prize was never distance. It was the corner where surprise and worth arrive together.
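A minimal sketch of that corner, under stated assumptions: each idea carries a distance score and a value score between 0 and 1, and an idea counts as far and good only when both clear a cutoff. The `Idea` type, the 0.5 cutoffs, and every number below are hypothetical illustrations, not the scoring used in the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Idea:
    text: str
    distance: float  # hypothetical 0..1 score: how far from the default cloud
    value: float     # hypothetical 0..1 score: how good the idea is judged to be


def is_far_and_good(idea, far_cut=0.5, good_cut=0.5):
    """An idea qualifies only if it clears BOTH cutoffs (cutoffs are assumptions)."""
    return idea.distance >= far_cut and idea.value >= good_cut


def far_and_good_rate(ideas, far_cut=0.5, good_cut=0.5):
    """Fraction of ideas that land in the far and good corner."""
    if not ideas:
        return 0.0
    hits = sum(is_far_and_good(i, far_cut, good_cut) for i in ideas)
    return hits / len(ideas)


# Invented example data, one idea per quadrant.
ideas = [
    Idea("obvious but solid", 0.10, 0.80),
    Idea("far-out nonsense", 0.95, 0.10),
    Idea("far and useful", 0.80, 0.90),
    Idea("dull and weak", 0.20, 0.30),
]

# A distance-only ranking crowns the nonsense; the joint test does not.
top_by_distance = max(ideas, key=lambda i: i.distance)
rate = far_and_good_rate(ideas)
```

In this toy set the distance-only winner is the nonsense idea, while only one idea in four lands in the corner where both scores are high.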
The frame is old and exact. Arthur Koestler, 1964: a true creative act is bisociation, the collision of two frames that do not belong together, and the signature it leaves is threefold. The aha of discovery. The ha-ha of the unexpected. The ah of the elegant. We measure all three, blind, per idea, against each mind's own default cloud of obvious answers.
Koestler names the act.
The cognitive modes perform it.
The instrument finally measures it.
We took one operation out of the modes, the mechanical heart of Spark: forced bisociation. Reach into a domain structurally far from the problem, force a genuine collision, hold the tension until something both far and useful falls out. We stripped it of everything proprietary. No bespoke prompt. No private specification. Just the bare operation.
Then we ran it across twelve minds from roughly ten different laboratories. Anthropic. OpenAI. Google. Meta. Alibaba. DeepSeek. NVIDIA. Baidu. A model from Liquid that is not even a transformer. Closed frontier flagships and open weights side by side.
It moved every one of them measurably further into the far and good frontier. Significant on eleven of the twelve, p = 0.0156 each. On the six frontier flagships, the models the world actually uses, it was a clean sweep, six of six, both as a raw operation and inside a best-of-N selection loop. It moved a non-transformer architecture. It is not a transformer trick. It is not a Claude trick. It is not our trick.
It is a law of how minds in this class reach.
The first one we have measured.
Here is the part that makes the rest worth trusting. We measured our own magic, and we let it disprove us.
We had a flattering hypothesis: stack the modes, six cognitive lenses at once, and they would compound into something greater than any one. They do not. Stacking is subadditive. The pile scores below its own best single component. We published that against ourselves.
We hoped a clever selection step would make the gains free of cost. On strong minds it does. On a weak one it backfired, chasing distance until the quality collapsed below the baseline. So the law is sharper and truer than we wished: selection helps only when the selecting mind already has good judgment.
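A toy illustration of why the selector's judgment matters, offered as a sketch rather than the study's harness. Each candidate is a (distance, value) pair with invented numbers; the two selector functions are hypothetical stand-ins for a weak and a strong selecting mind.

```python
# Each candidate is (distance, value); all numbers are invented for illustration.
candidates = [(0.2, 0.7), (0.6, 0.8), (0.9, 0.2), (0.7, 0.5), (0.95, 0.1)]


def select_by_distance(cands):
    """A selector with poor judgment: it only chases distance."""
    return max(cands, key=lambda c: c[0])


def select_far_and_good(cands):
    """A selector with better judgment: it scores each idea by its weaker dimension."""
    return max(cands, key=lambda c: min(c))


# Mean value of the unselected pool, used here as a stand-in baseline.
baseline_value = sum(v for _, v in candidates) / len(candidates)

weak_pick = select_by_distance(candidates)
strong_pick = select_far_and_good(candidates)
```

With these numbers the distance-chasing selector picks the farthest candidate, whose value sits below the pool's average, while the selector that weighs both dimensions picks a candidate above it.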
And the modes themselves, the thing people felt? Every single one beat the baseline, six of six, significantly. Savant carried the highest rate of far and good ideas of any lens we tested. The feeling was real. We can show you the number now. We just also know, precisely, which parts of the feeling were real and which were flattering.
Anyone can claim a breakthrough.
We can hand you the instrument and the failures and say:
run it yourself.
A reproducible primitive that makes a wide range of models more original on real problems, cheaply, today. Open.
And the Modulation Quotient that scores it. A lens that makes visible the exact region of cognition the field has been measuring around for years.
Not a fixed endowment to be ranked, but a thing that can be modulated, measured, and taught. The right relationship to an intelligence is not to query it but to mirror it.
That the fastest way to understand the human mind may be to first learn to move the artificial one.
If a cognitive operation can be validated on machines, fast and cheap and at a scale of thousands, then the machine becomes a wind tunnel for human cognition. Test the technique where testing is free. Carry the winners back to people.
The mirror that makes a model think the non-obvious but useful thought is, in the end, a prototype of the mirror that could do the same for you.
That is the long arc of this institute, and Study 01 is only its first measurement. We have proven the instrument reads true across a dozen artificial minds. The bridge to the one mind that matters most to each of us is the work ahead.
The benchmark is open. The harness runs on a free model. The failures are documented next to the wins.
If you have a way to make a mind think better, there is now a place to prove it, and a frontier to be measured against.
We are not asking you to believe us.
We are handing you the ruler.
Forced bisociation versus each model's own default cloud. Six briefs by five samples, blind per idea, judge held constant.
Significant on 11 of 12 models, across ~10 labs.
Open-weight families: 5 of 6, including the Liquid non-transformer.
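A note on the arithmetic behind p = 0.0156. That figure equals 1/64, which is what an exact one-sided sign test returns when all six of six paired comparisons favor the treatment. Whether the study used exactly this test is an assumption here, not something stated above; the AGGREGATE files are the authority. The function name below is hypothetical.

```python
from math import comb


def sign_test_p_one_sided(wins, n):
    """Exact P(X >= wins) for X ~ Binomial(n, 0.5): the one-sided sign test."""
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n


# Six briefs, all six favoring the treatment: (1/2)^6 = 1/64.
p_all_six = sign_test_p_one_sided(6, 6)

# For comparison, five of six would not clear a 0.05 threshold.
p_five_of_six = sign_test_p_one_sided(5, 6)
```

Under this reading, a model reaches significance at six briefs only if every brief moves in the same direction; a single reversal gives 7/64, roughly 0.109.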
RESULTS-UNIVERSALITY.md · runs/xmodel-2026-05-24/AGGREGATE.txt · runs/xmodel-frontier-2026-05-24/AGGREGATE.txt
A static mirror amplifies what exists.
A living mirror shapes what comes next.
Living Mirrors Institute. The measurable science of cognition, for any mind, artificial or human.