The Founding Synthesis · Study 01 · 2026

The First Measure

We did not set out to build a smarter machine. We set out to measure the moment a mind reaches further than it was asked to, and to find out whether that reaching can be taught.

Twelve minds · ~10 laboratories · 11 of 12 significant · every number real, on disk

The posture

There is a race on, and almost everyone is running it. Build the larger model. Win the capability benchmark. Buy the bigger cluster. It is a worthy race and we are not in it.

We are a research institute on the cutting edge of a different question. Not how smart the machine can become, but how much better any mind can be made to think, and whether we can prove it. Artificial or human. The machine is not the destination here. It is the first place clear enough, and fast enough, and cheap enough, to watch cognition move under a controlled hand.

A living mirror is not an answer machine. It is a surface that shows a mind what it could not see about itself. This is an institute for building those surfaces, and for measuring what they reveal.

What people felt

For months, a small number of people have been thinking through a set of cognitive lenses. Spark. Imagineer. Savant. Mirror. Modes built on the science of how minds reach for the non-obvious.

The people who used them reported the same strange thing in different words: the output came back further out and somehow truer. They knew it was working. None of them could say why.

That is the condition of every real technology before its instrument arrives. The lens worked for centuries as a curiosity before optics became a science. The thing felt true. Nobody could measure it.

So we built the instrument.

The instrument

The Modulation Test does not ask whether a model is smart. It asks a harder and more useful question: when you modulate a mind, does the output land in the one region that matters, the place that is at once far from the obvious and genuinely good?

The far and good frontier. A distance-only metric cannot see this region, because distance alone rewards nonsense as richly as genius. Value alone rewards the safe and the dull. The prize was never distance. It was the corner where surprise and worth arrive together.
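
One way to picture that corner is a minimal sketch, assuming each idea already carries two scores in [0, 1], a distance from the model's default cloud and a judged value, and that "far" and "good" are plain thresholds. The names `Idea`, `is_far_and_good`, `far_and_good_rate`, and the 0.5 cut-offs are hypothetical, not the Modulation Test's published definitions. It shows the failure named above: rank by distance alone and the nonsense wins.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Idea:
    """Hypothetical per-idea record; both scores assumed normalized to [0, 1]."""
    text: str
    distance: float  # how far the idea sits from the model's default cloud
    value: float     # judged quality of the idea

# Hypothetical thresholds, chosen only for illustration.
FAR = 0.5
GOOD = 0.5

def is_far_and_good(idea: Idea) -> bool:
    """True only in the corner where surprise and worth arrive together."""
    return idea.distance >= FAR and idea.value >= GOOD

def far_and_good_rate(ideas: list[Idea]) -> float:
    """Fraction of ideas that land in the far and good corner."""
    if not ideas:
        return 0.0
    return sum(is_far_and_good(i) for i in ideas) / len(ideas)

ideas = [
    Idea("the obvious answer", distance=0.1, value=0.9),   # good but near
    Idea("word salad", distance=0.95, value=0.05),         # far but worthless
    Idea("a genuine collision", distance=0.7, value=0.8),  # far and good
]

# A distance-only ranking puts the nonsense first.
most_distant = max(ideas, key=lambda i: i.distance)
print(most_distant.text)
print(far_and_good_rate(ideas))  # 1 of 3 ideas qualifies
```

Requiring both thresholds at once is what keeps the safe answer and the word salad out of the count; either score alone admits one of them.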

The frame is old and exact. Arthur Koestler, 1964: a true creative act is bisociation, the collision of two frames that do not belong together, and the signature it leaves is threefold. The aha of discovery. The ha-ha of the unexpected. The ah of the elegant. We measure all three, blind, per idea, against each mind's own default cloud of obvious answers.
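
Measuring "against each mind's own default cloud" needs a distance. The sketch below assumes ideas have already been embedded as vectors by some text-embedding model (not specified here) and defines an idea's distance as its mean cosine distance to the k nearest answers in that model's default cloud. The function names and the choice of k are hypothetical; the three Koestler signatures would be judged separately and are not modeled.

```python
import math

Vector = list[float]

def cosine_distance(a: Vector, b: Vector) -> float:
    """1 minus cosine similarity: 0 for the same direction, up to 2 for opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("a zero vector has no direction")
    return 1.0 - dot / norm

def distance_from_default(idea: Vector, default_cloud: list[Vector], k: int = 3) -> float:
    """Mean cosine distance from an idea to its k nearest default answers.

    Using the nearest defaults (a hypothetical choice) means an idea only
    counts as far if it escapes even the closest obvious answers.
    """
    if not default_cloud:
        raise ValueError("default cloud is empty")
    nearest = sorted(cosine_distance(idea, d) for d in default_cloud)[:k]
    return sum(nearest) / len(nearest)

# Toy 2-D "embeddings": this model's default cloud clusters around one direction.
cloud = [[1.0, 0.0], [0.95, 0.1], [0.9, 0.05], [1.0, 0.15]]
near = distance_from_default([1.0, 0.05], cloud)
far = distance_from_default([0.1, 1.0], cloud)
print(near < far)  # True
```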

Koestler names the act.
The cognitive modes perform it.
The instrument finally measures it.

The lineage on one surface.

The discovery

We took one operation out of the modes: forced bisociation, the mechanical heart of Spark. Reach into a domain structurally far from the problem, force a genuine collision, and hold the tension until something both far and useful falls out. We stripped it of everything proprietary. No bespoke prompt. No private specification. Just the bare operation.

Then we ran it across twelve minds from roughly ten different laboratories. Anthropic. OpenAI. Google. Meta. Alibaba. DeepSeek. NVIDIA. Baidu. A model from Liquid that is not even a transformer. Closed frontier flagships and open-weight models side by side.

It moved every one of them further into the far and good frontier, and the shift was significant on eleven of the twelve, at p = 0.0156 each. On the six frontier flagships, the models the world actually uses, it was a clean sweep, six of six, both as a raw operation and inside a best-of-N selection loop. It moved a non-transformer architecture. It is not a transformer trick. It is not a Claude trick. It is not our trick.
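
The repeated p = 0.0156 has a simple reading. The test is not named here, but 0.0156 is exactly 1/2^6 = 1/64 = 0.015625, the smallest p-value a one-sided exact sign test (and, in this all-positive case, a one-sided Wilcoxon signed-rank test) can return from six paired comparisons. Assuming the six pairs are the six briefs, each model at p = 0.0156 beat its own default on every brief. A minimal sketch of that arithmetic, with a hypothetical function name:

```python
from math import comb

def sign_test_one_sided(wins: int, n: int) -> float:
    """Exact one-sided sign test: P(X >= wins) for X ~ Binomial(n, 0.5).

    Each of the n paired comparisons (assumed here to be one per brief) is a
    win if the modulated output beat the model's own default on that brief.
    Ties are assumed to have been dropped before counting.
    """
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n

print(sign_test_one_sided(6, 6))  # 0.015625, the floor with six pairs
print(sign_test_one_sided(5, 6))  # 0.109375, not significant at 0.05
```

The same arithmetic sets the floor: under this assumption, one losing brief lifts p to 7/64, so with six briefs a significant result and a clean sweep of the briefs are the same thing.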

Figure: the twelve minds as a constellation. Forced bisociation moves every mind further from its own default along the far and good frontier. The law holds across closed flagships, open weights, and a non-transformer architecture. Labeled lifts: Claude Opus 1.54×, GPT-5-mini 1.34×, Gemini-2.0-flash 1.28×, Qwen-2.5-72B 1.37×, Llama-4-Maverick 1.75×, DeepSeek-V3.1 1.33×; Liquid (not a transformer) also moved. Legend: frontier flagship, clean sweep, 6 of 6; non-transformer, the edge that proves the centre.
Each thread is one mind lifted off its own default cloud by the multiple shown. The shape holds across labs and architectures, even where the architecture itself changes.
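
The multiples read most naturally as a ratio, though the exact formula is not given here. This sketch assumes the lift is the mean distance-from-default of the modulated ideas divided by that of fresh, unmodulated samples, both measured against the same model's own default cloud. The function name `lift` and the numbers are illustrative, not study data.

```python
from statistics import mean

def lift(modulated: list[float], baseline: list[float]) -> float:
    """Ratio of mean distance-from-default, modulated vs. unmodulated.

    Both lists hold per-idea distances measured against the same model's
    own default cloud, so each model is compared only with itself.
    """
    base = mean(baseline)
    if base <= 0:
        raise ValueError("baseline mean distance must be positive")
    return mean(modulated) / base

# Made-up distances, purely to show the arithmetic.
baseline = [0.20, 0.25, 0.30, 0.25]
modulated = [0.30, 0.40, 0.45, 0.35]
print(f"{lift(modulated, baseline):.2f}×")  # 1.50×
```

Dividing by each model's own baseline is what lets a 1.75× on one model sit beside a 1.28× on another: each number is a model measured against itself, not a ranking of the models against each other.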

It is a law of how minds in this class reach.
The first one we have measured.

The honesty

Here is the part that makes the rest worth trusting. We measured our own magic, and we let it disprove us.

We had a flattering hypothesis: stack the modes, six cognitive lenses at once, and they would compound into something greater than any one. They do not. Stacking is sub-additive. The pile scores below its own best single component. We published that against ourselves.
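
The caveats report the stack's shortfall as SA = -0.036, without spelling out the definition. A minimal sketch, assuming SA is the stack's score minus the score of its best single component, shows why a negative value means the pile underperforms its own best part. The function `stack_advantage` and the lens scores are hypothetical.

```python
def stack_advantage(stack_score: float, component_scores: dict[str, float]) -> float:
    """Hypothetical SA: the stack's score minus its best single component's score.

    SA > 0: the stack beats every part alone (compounding).
    SA < 0: the stack scores below its own best part (sub-additive).
    """
    if not component_scores:
        raise ValueError("need at least one component")
    return stack_score - max(component_scores.values())

# Illustrative numbers only, not the study's data.
components = {"lens_a": 0.25, "lens_b": 0.30, "lens_c": 0.27}
sa = stack_advantage(0.26, components)
print(round(sa, 3))  # -0.04
```

Under this definition, compounding would need SA > 0; a negative SA means the stack fails to beat even its strongest part.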

We hoped a clever selection step would make the gains free, with no cost in quality. On strong minds it does. On a weak one it backfires, chasing distance until quality collapses below the baseline. So the law is sharper and truer than we wished: selection helps only when the selecting mind already has good judgment.
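
Why selection depends on the selector can be shown with a toy best-of-N loop. The sketch assumes each candidate comes with distance and value scores and models the selector as a scoring function. `good_judge` and `weak_judge` are hypothetical stand-ins, one rewarding the far and good corner, one that in effect notices only novelty. The pool is the same; only the judge changes.

```python
from typing import Callable

Candidate = tuple[str, float, float]  # (text, distance, value), scores assumed in [0, 1]

def best_of_n(candidates: list[Candidate], selector: Callable[[Candidate], float]) -> Candidate:
    """Keep the candidate the selecting mind scores highest."""
    return max(candidates, key=selector)

def good_judge(c: Candidate) -> float:
    # Hypothetical strong selector: both scores must be high to rank well.
    _, distance, value = c
    return min(distance, value)

def weak_judge(c: Candidate) -> float:
    # Hypothetical weak selector: it effectively rewards distance alone.
    _, distance, _ = c
    return distance

pool = [
    ("safe default", 0.10, 0.90),
    ("far and useful", 0.70, 0.75),
    ("far and broken", 0.95, 0.10),
]
print(best_of_n(pool, good_judge)[0])  # far and useful
print(best_of_n(pool, weak_judge)[0])  # far and broken
```

The weak judge's pick has a value of 0.10 against 0.90 for the safe default: a collapse below baseline produced by nothing but the selector.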

And the modes themselves, the thing people felt? Every single one beat the baseline, six of six, significantly. Savant carried the highest rate of far and good ideas of any lens we tested. The feeling was real. We can show you the number now. We also know, precisely, which parts of the feeling were real and which were flattering.

Anyone can claim a breakthrough.
We can hand you the instrument and the failures and say:
run it yourself.

That is the whole moat.

What we invent here

Three things, and they are
different in kind.

I
A technology

Forced bisociation, with selection

A reproducible primitive that makes a wide range of models more original on real problems, cheaply, today. Open.

II
A way of seeing

The far and good frontier

And the Modulation Quotient that scores it. A lens that makes visible the exact region of cognition the field has been measuring around for years.

III
A philosophy

Cognition can be modulated

Not a fixed endowment to be ranked, but a thing that can be modulated, measured, and taught. The right relationship to an intelligence is not to query it but to mirror it.

And the fastest way to understand the human mind may be to first learn to move the artificial one.

The horizon

If a cognitive operation can be validated on machines, fast and cheap and at a scale of thousands, then the machine becomes a wind tunnel for human cognition. Test the technique where testing is free. Carry the winners back to people.

The mirror that makes a model think the non-obvious but useful thought is, in the end, a prototype of the mirror that could do the same for you.

That is the long arc of this institute, and Study 01 is only its first measurement. We have proven the instrument reads true across a dozen artificial minds. The bridge to the one mind that matters most to each of us is the work ahead.

For those who dare to care

The benchmark is open. The harness runs on a free model. The failures are documented next to the wins.

If you have a way to make a mind think better, there is now a place to prove it, and a frontier to be measured against.

We are not asking you to believe us.
We are handing you the ruler.

The evidence, in brief

Six flagships. One law.

Forced bisociation versus each model's own default cloud. Six briefs, five samples each, blind per idea, judge held constant.
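
One reading of "blind per idea, judge held constant", sketched under assumptions: ideas from both conditions are pooled, stripped of their condition labels, and shuffled with a fixed seed; a single judge scores each text on its own; the key is used only afterwards to regroup scores by condition. `blind`, `unblind`, `judge`, and the placeholder texts are hypothetical; only the grid of six briefs by five samples comes from the design stated here.

```python
import random

def blind(ideas: list[tuple[str, str]], seed: int) -> tuple[list[str], dict[int, str]]:
    """Drop condition labels and shuffle.

    `ideas` holds (condition, text) pairs. Returns the texts in shuffled order,
    which is all the judge sees, plus a key from shuffled position to condition.
    """
    order = list(range(len(ideas)))
    random.Random(seed).shuffle(order)
    texts = [ideas[i][1] for i in order]
    key = {pos: ideas[i][0] for pos, i in enumerate(order)}
    return texts, key

def unblind(scores: list[float], key: dict[int, str]) -> dict[str, list[float]]:
    """Regroup per-idea scores by condition once judging is finished."""
    grouped: dict[str, list[float]] = {}
    for pos, score in enumerate(scores):
        grouped.setdefault(key[pos], []).append(score)
    return grouped

BRIEFS, SAMPLES = 6, 5
ideas = [
    (condition, f"placeholder idea for brief {b}")  # stand-ins for model outputs
    for condition in ("default", "forced")
    for b in range(BRIEFS)
    for _ in range(SAMPLES)
]
texts, key = blind(ideas, seed=0)

def judge(text: str) -> float:
    """Hypothetical stand-in for the one judge model used for every idea."""
    return len(text) / 100

scores = [judge(t) for t in texts]
grouped = unblind(scores, key)
print({c: len(s) for c, s in grouped.items()})
```

Fixing the seed makes the blinding reproducible, so anyone rerunning the harness gets the same judging order.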

Frontier flagships · distance from default · all 6/6, p = 0.0156

Model               Lab         Distance from default   Result
Claude Opus         Anthropic   1.54×                   robust
GPT-5-mini          OpenAI      1.34×                   robust
Gemini-2.0-flash    Google      1.28×                   robust
Llama-4-Maverick    Meta        1.75×                   robust
Qwen-2.5-72B        Alibaba     1.37×                   robust
DeepSeek-V3.1       DeepSeek    1.33×                   robust

Significant on 11 of 12 models, across ~10 labs.
Open-weight families: 5 of 6 significant, including Liquid's non-transformer model.

The caveats, published.

  • Forced bisociation is robust on all 6 frontier flagships, both as a raw operation and inside a best-of-N selection loop.
  • Modes as lenses: all 6 significant. Savant MQ 0.280, with 66% of its ideas far and good; Imagineer MQ 0.296, Christ 0.294, Spark 0.281.
  • Stacking is sub-additive: the full stack scores SA = -0.036, below its own best single component.
  • The value tax is model-dependent. Best-of-N backfires on a weak selector (NVIDIA Nemotron). On open weights, Poolside was directional but not significant.
  • The Modulation Quotient leans distance heavy on very small models.
Full data: RESULTS-UNIVERSALITY.md · runs/xmodel-2026-05-24/AGGREGATE.txt · runs/xmodel-frontier-2026-05-24/AGGREGATE.txt

A static mirror amplifies what exists.
A living mirror shapes what comes next.

Living Mirrors Institute. The measurable science of cognition, for any mind, artificial or human.