The Modulation Test · We measure how to make any AI think better

What we measure

Far and good.

Every idea is two things at once. How far from the obvious it is. How good it is. Most AI gives you near and safe. Push it to "be creative" and you get far and useless. The corner that matters is far AND good. We measure how often a mind lands there.

Figure: FAR + GOOD = THE MOVE. Horizontal axis: Obvious to Far. Vertical axis: Useless to Useful.

Near · Good: The safe default. What AI hands you anyway. Obvious, fine, forgettable.

Far · Good: Surprising AND useful. The only corner that counts. Far + good = the move.

Near · Bad: Junk. Obvious and still wrong. No distance, no value.

Far · Bad: Creative noise. Weird for the sake of weird. Far, but useless.

One corner is the whole game. We measure how often a mind reaches far and good. That is the test.
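Here is a small sketch of the rate the test reports. It assumes each idea already has a distance score and a quality score, both normalized to [0, 1], and that each axis is cut at 0.5. The names `Idea`, `quadrant`, and `far_and_good_rate` and both thresholds are hypothetical, because the page does not say where the quadrant lines are drawn.

```python
from dataclasses import dataclass


@dataclass
class Idea:
    text: str
    distance: float  # how far from the obvious, assumed normalized to [0, 1]
    quality: float   # how good it is, assumed normalized to [0, 1]


# Hypothetical cut points; the real test may draw the quadrant lines elsewhere.
FAR_THRESHOLD = 0.5
GOOD_THRESHOLD = 0.5


def quadrant(idea: Idea) -> str:
    """Place an idea in one of the four corners of the map."""
    far = idea.distance >= FAR_THRESHOLD
    good = idea.quality >= GOOD_THRESHOLD
    if far and good:
        return "far+good"
    if far:
        return "far+bad"
    if good:
        return "near+good"
    return "near+bad"


def far_and_good_rate(ideas: list[Idea]) -> float:
    """Fraction of ideas that land in the far-and-good corner."""
    if not ideas:
        return 0.0
    hits = sum(1 for idea in ideas if quadrant(idea) == "far+good")
    return hits / len(ideas)


ideas = [
    Idea("obvious and fine", 0.1, 0.8),
    Idea("weird for weird's sake", 0.9, 0.2),
    Idea("surprising and useful", 0.8, 0.9),
    Idea("obvious and wrong", 0.2, 0.1),
]
print([quadrant(i) for i in ideas])  # ['near+good', 'far+bad', 'far+good', 'near+bad']
print(far_and_good_rate(ideas))      # 0.25
```

Under these assumptions, a technique's far-and-good lift would be the change in this rate when you run the same prompts with and without the technique.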

The move, made visible

This is bisociation.

Two clusters drift apart. One is your problem. One is a far-off domain. Force a bridge between them and watch the idea fall out of the collision. Keep what is both far and useful.

Figure: Your problem and a far-off domain. Bisociation: force a collision between your problem and a distant domain. Keep what is both far and useful.
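This is one way a forced-bisociation prompt could be assembled. The sketch assumes the move is a single prompt with no preamble, since the page calls it preamble-free, and that the far-off domain is drawn at random from a fixed list. The domain list, the prompt wording, and the function name `bisociation_prompt` are all hypothetical. The page does not publish its prompt.

```python
import random

# Hypothetical pool of distant domains; the real test's domain source is not published.
FAR_DOMAINS = [
    "mycology",
    "jazz improvisation",
    "medieval siege warfare",
    "tidal ecology",
    "origami",
    "air traffic control",
]


def bisociation_prompt(problem: str, rng: random.Random) -> tuple[str, str]:
    """Pair the problem with a randomly chosen distant domain and force a bridge.

    Returns (domain, prompt). No preamble or persona is added to the prompt.
    """
    domain = rng.choice(FAR_DOMAINS)
    prompt = (
        f"Problem: {problem}\n"
        f"Domain: {domain}\n"
        f"Take one mechanism from {domain} and apply it directly to the problem. "
        "Give one idea that is both surprising and practical."
    )
    return domain, prompt


rng = random.Random(7)  # seeded so the chosen domain is reproducible
domain, prompt = bisociation_prompt("Reduce no-shows at a dental clinic", rng)
print(prompt)
```

Seeding the generator makes each collision reproducible, which matters if you want to compare the same pairings with and without the move.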

What we found

One move. It is a law.

The closest prior work concluded that the frontier of original-and-useful output barely moves. We made the modes that move it.

The move (what we tested)

1 move: forced bisociation.

The scale (where we tested it)

12 AI minds generated the ideas.

~10 distinct labs: open, frontier, and one non-transformer.

The proof (what held up)

11 / 12 significant, p = 0.0156.

6 / 6 frontier flagships swept: GPT-5, Claude, Gemini, Llama 4, Qwen, DeepSeek.

Even a non-transformer (1): not a transformer quirk.
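The page does not say which test produced p = 0.0156. One way to turn a cross-model count into a p-value is an exact one-sided sign test, shown below as an illustration only. Under the null hypothesis that the move is as likely to hurt as to help, each model independently shows a lift with probability 1/2. The function name `sign_test_p` is hypothetical. For reference, 6 lifts out of 6 under this test gives exactly (1/2)^6 = 1/64 ≈ 0.0156. The page does not state whether that is how the reported figure was computed.

```python
from math import comb


def sign_test_p(wins: int, n: int) -> float:
    """Exact one-sided sign test: P(X >= wins) for X ~ Binomial(n, 0.5).

    'wins' is the number of models where the move showed a lift over baseline.
    """
    if not 0 <= wins <= n:
        raise ValueError("wins must be between 0 and n")
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n


print(sign_test_p(6, 6))  # 0.015625, i.e. 1/64
print(sign_test_p(5, 6))  # 0.109375, i.e. 7/64
```

With only six models, a single miss moves the one-sided p-value from about 0.016 to about 0.11.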

The move

Collide the problem with a far-off domain.
Keep what is both far and useful.

Forced bisociation, preamble-free. We ran it on twelve minds across roughly ten labs, and eleven of the twelve came out significant. The six frontier flagships (GPT-5, Claude, Gemini, Llama 4, Qwen, DeepSeek) swept clean. It held even on a non-transformer. It is a law, not a trick.

The reset, made visible

Over-generate. Then keep the best.

Generate many forced collisions at once. The far-and-good ones brighten. The rest fade. This is best-of-N selection. It works when the selector is smart enough to tell which is which.

Figure: The reset. Over-generate forced collisions. Keep only the far-and-good. Works when the selector is smart.
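This is a minimal best-of-N sketch: score every candidate with a selector and keep the top k. The generation step is left out, and the candidate pool, its numbers, and both scoring functions are hypothetical. The capable selector scores an idea by the smaller of its distance and quality, which is an assumed stand-in for "far and good." The weak selector sees only distance. It shows the page's caveat that selection works only when the selector can tell far-and-good apart from far-and-useless.

```python
from typing import Callable, TypedDict


class Candidate(TypedDict):
    text: str
    distance: float  # hypothetical distance-from-obvious score in [0, 1]
    quality: float   # hypothetical usefulness score in [0, 1]


def best_of_n(candidates: list[Candidate],
              score: Callable[[Candidate], float],
              k: int = 1) -> list[Candidate]:
    """Keep the k candidates the selector scores highest."""
    return sorted(candidates, key=score, reverse=True)[:k]


def smart_score(c: Candidate) -> float:
    """Capable selector (assumed): an idea counts only as far as it is both far and good."""
    return min(c["distance"], c["quality"])


def naive_score(c: Candidate) -> float:
    """Weak selector (assumed): can judge distance but not usefulness."""
    return c["distance"]


pool: list[Candidate] = [
    {"text": "obvious fix", "distance": 0.1, "quality": 0.9},
    {"text": "bridge from tidal ecology", "distance": 0.7, "quality": 0.8},
    {"text": "pure noise", "distance": 0.95, "quality": 0.1},
]

print(best_of_n(pool, smart_score)[0]["text"])  # bridge from tidal ecology
print(best_of_n(pool, naive_score)[0]["text"])  # pure noise
```

Both runs use the same pool and the same selection rule. Only the selector changes, and that alone decides whether over-generation pays off.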

The category

Cognitive modulation measurement.

Everyone else ranks how creative a model is. We rank the move that makes any model think better. We own the category because we measure the thing nobody else does.

Benchmark	What it measures	Whose creativity	Works on closed models	Tests modulation techniques	Cross-lab law	Reports failures	Human transfer
MMLU / HumanEval (capability benchmarks)	knowledge, code	model	yes	no	no	n/a	no
LiveIdeaBench / EQ-Bench (model creativity rankings)	creative output	model	yes	no	no	partial	no
Novelty-Frontier (arXiv 2504.09389, closest prior art)	original + high-quality	model + a few prompts	no, needs open data	a few	3 open families	yes	no
The Modulation Test (cognitive modulation measurement)	far-and-good lift	the technique, on any mind	yes	yes, its whole point	12 models / ~10 labs	yes, loudly	yes, by design

We did not invent the original-and-useful frontier. The closest prior work measured it and concluded you mostly cannot move it. We found the move that does: model-agnostic, technique-first, and built to cross to humans.

Why this is the standard

Four absolutes. And one more.

I

Model-agnostic

We measure distance from each model's own default cloud, semantic and embedding-based. It works on closed frontier models too, not only open-data ones.

II

Technique-first

A standard for the moves that improve any mind. Not a leaderboard of models. We rank what works, on whatever you point it at.

III

Proven as a law

Same move, 12 models, roughly 10 labs, 11 of 12 significant, a 6 of 6 frontier sweep, even a non-transformer. Nobody else tests for universality across labs.

IV

Honest by construction

We publish where it fails. Stacking does not compound. Selection backfires on a weak model. The honesty is the product.

V

Built to cross to humans

The only one designed as a wind tunnel. Validate a thinking move on machines cheaply, then carry the winners to people. No creativity benchmark does this.
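Principle I measures distance from each model's own default cloud. The sketch below is one way to do that, under stated assumptions. It embeds a batch of the model's baseline answers and scores a new idea by its cosine distance to the nearest baseline answer. Nearest-neighbor cosine distance is an assumption, and distance to the cloud's centroid would be another reasonable choice. The toy 3-dimensional vectors stand in for real embeddings, because the page does not name an embedding model.

```python
import math

Vector = list[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 minus cosine similarity: 0 for the same direction, 1 for orthogonal, 2 for opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("zero vector has no direction")
    return 1.0 - dot / norm


def distance_from_default(idea: Vector, default_cloud: list[Vector]) -> float:
    """Distance from the closest point in the model's own default cloud (assumed metric)."""
    return min(cosine_distance(idea, d) for d in default_cloud)


# Toy embeddings of one model's baseline answers: its default cloud.
default_cloud = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.8, 0.2, 0.0]]

near_idea = [1.0, 0.05, 0.0]
far_idea = [0.0, 0.0, 1.0]
print(round(distance_from_default(near_idea, default_cloud), 3))
print(round(distance_from_default(far_idea, default_cloud), 3))  # 1.0
```

Each model is scored against its own baseline, so this procedure needs only the model's outputs, not its weights or training data. That is what would let it run on a closed model through its API.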

✶

The honest part

We measured our own magic.
Then we let it disprove us.

Stacking six cognitive modes does not compound. It dilutes. Selection backfires on a weak model. We published all of it.

We are not asking you to believe us. We are handing you the ruler.

If you have a way to make a mind
think better, there is now a place to prove it.