Not which model is smartest. How to make any mind more original, and still useful, on demand. We made the modes that move it.
Every idea is two things at once. How far from the obvious it is. How good it is. Most AI gives you near and safe. Push it to "be creative" and you get far and useless. The corner that matters is far AND good. We measure how often a mind lands there.
One corner is the whole game. We measure how often a mind reaches far and good. That is the test.
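A minimal sketch of that test, under stated assumptions. Each idea carries two scores, how far and how good, and the measure is the share of ideas that clear both cutoffs. The `Idea` class, the 0-to-1 scales, and the 0.6 cutoffs are hypothetical and chosen for illustration. They are not the harness's actual values.

```python
from dataclasses import dataclass


@dataclass
class Idea:
    distance: float  # how far from the obvious (assumed 0..1 scale)
    quality: float   # how good it is (assumed 0..1 scale)


def far_and_good_rate(ideas, far_cut=0.6, good_cut=0.6):
    """Fraction of ideas that clear BOTH cutoffs (cutoffs are illustrative)."""
    if not ideas:
        return 0.0
    hits = sum(
        1 for idea in ideas
        if idea.distance >= far_cut and idea.quality >= good_cut
    )
    return hits / len(ideas)


# Toy data: near-and-safe, far-and-useless, then two far-and-good ideas.
ideas = [Idea(0.2, 0.9), Idea(0.9, 0.1), Idea(0.8, 0.7), Idea(0.7, 0.8)]
rate = far_and_good_rate(ideas)
print(rate)
```

Near-and-safe and far-and-useless both count as misses here. Only the corner counts.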
Two clusters drift apart. One is your problem. One is a far-off domain. Force a bridge between them and watch the idea fall out of the collision. Keep what is both far and useful.
Bisociation. Force a collision between your problem and a distant domain. Keep what is both far and useful.
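One way a forced collision could be set up, as a sketch only. The domain list, the prompt wording, and the function names `collision_prompt` and `forced_collisions` are all assumptions made for illustration. The actual prompts are not shown here.

```python
import random

# Hypothetical pool of far-off domains; any distant field could stand in.
DISTANT_DOMAINS = [
    "beekeeping",
    "tidal mechanics",
    "medieval guilds",
    "jazz improvisation",
]


def collision_prompt(problem, domain):
    """Build one prompt that forces a bridge between problem and domain."""
    return (
        f"Problem: {problem}\n"
        f"Distant domain: {domain}\n"
        "Give one concrete idea that solves the problem "
        "using a mechanism borrowed from the distant domain."
    )


def forced_collisions(problem, n, seed=0):
    """Return up to n prompts, each pairing the problem with a different domain."""
    rng = random.Random(seed)  # seeded so a run can be repeated
    k = min(n, len(DISTANT_DOMAINS))
    domains = rng.sample(DISTANT_DOMAINS, k=k)
    return [collision_prompt(problem, d) for d in domains]


prompts = forced_collisions("reduce hospital wait times", n=3)
for p in prompts:
    print(p, end="\n\n")
```

Each prompt goes to the model on its own. Scoring what comes back for far and useful is a separate step.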
The field said the frontier of original-and-useful output barely moves. We made the modes that move it.
Forced bisociation, preamble-free. Run it on twelve minds across roughly ten labs. GPT-5, Claude, Gemini, Llama 4, Qwen, DeepSeek. All swept clean. Even a non-transformer. It is a law, not a trick.
Generate many forced collisions at once. The far-and-good ones brighten. The rest fade. This is best-of-N selection. It works when the selector is smart enough to tell which is which.
The reset. Over-generate forced collisions. Keep only the far-and-good. Works when the selector is smart.
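The selection step can be sketched roughly like this, assuming a scoring function that returns a (distance, quality) pair per candidate. The `score` callable stands in for the selector, and the cutoffs and the distance-times-quality ranking are illustrative assumptions. A weak `score` will keep the wrong ideas.

```python
def select_far_and_good(candidates, score, far_cut=0.6, good_cut=0.6):
    """Keep candidates that clear both cutoffs, best first.

    `score` is a caller-supplied selector returning (distance, quality).
    """
    kept = []
    for candidate in candidates:
        distance, quality = score(candidate)
        if distance >= far_cut and quality >= good_cut:
            kept.append((distance * quality, candidate))
    # Rank survivors by the product of the two scores (an assumed ranking rule).
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [candidate for _, candidate in kept]


# Toy selector: a lookup table standing in for a model-based judge.
toy_scores = {
    "a": (0.9, 0.8),  # far and good
    "b": (0.9, 0.2),  # far and useless
    "c": (0.3, 0.9),  # near and safe
    "d": (0.7, 0.7),  # far and good
}
survivors = select_far_and_good(list(toy_scores), toy_scores.get)
print(survivors)
```

The selection is only as good as the selector. Everything depends on what `score` returns.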
Everyone else ranks how creative a model is. We rank the move that makes any model think better. We own the category because we measure the thing nobody else does.
| | What it measures | Whose creativity | Works on closed models | Tests modulation techniques | Cross-lab law | Reports failures | Human transfer |
|---|---|---|---|---|---|---|---|
| MMLU / HumanEval (capability benchmarks) | knowledge, code | model | yes | no | no | n/a | no |
| LiveIdeaBench / EQ-Bench (model creativity rankings) | creative output | model | yes | no | no | partial | no |
| Novelty-Frontier (arXiv 2504.09389, closest prior art) | original + high-quality | model + a few prompts | no, needs open data | a few | 3 open families | yes | no |
| The Modulation Test (cognitive modulation measurement) | far-and-good lift | the technique, on any mind | yes | yes, its whole point | 12 models / ~10 labs | yes, loudly | yes, by design |
We did not invent the original-and-useful frontier. The closest prior work measured it and concluded you mostly cannot move it. We found the move that does, model-agnostic, technique-first, built to cross to humans.
We measure distance from each model's own default cloud, semantic and embedding-based. It works on closed frontier models too, not only open-data ones.
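A sketch of one way such a distance could be computed. It assumes the default cloud is summarized by its centroid and that distance is cosine distance between embedding vectors. Both choices are assumptions for illustration, as are the function names. The embeddings would come from whatever embedding model you use, so only the model's text output is needed.

```python
import math


def centroid(vectors):
    """Component-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_distance(a, b):
    """1 minus cosine similarity; 0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def distance_from_default(idea_vec, default_vecs):
    """Distance of one idea from the centroid of a model's default answers."""
    return cosine_distance(idea_vec, centroid(default_vecs))


# Toy 2-D embeddings: two default answers, one near idea, one far idea.
default_cloud = [[1.0, 0.0], [1.0, 0.2]]
near = distance_from_default([1.0, 0.1], default_cloud)
far = distance_from_default([0.0, 1.0], default_cloud)
print(near, far)
```

The cloud is built per model, from that model's own unprompted answers. An idea far from one model's defaults may be close to another's.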
A standard for the moves that improve any mind. Not a leaderboard of models. We rank what works, on whatever you point it at.
Same move, 12 models, roughly 10 labs, 11 of 12 significant, a 6 of 6 frontier sweep, even a non-transformer. Nobody else tests for universality across labs.
We publish where it fails. Stacking does not compound. Selection backfires on a weak model. The honesty is the product.
The only one designed as a wind-tunnel. Validate a thinking move on machines cheaply, carry the winners to people. No creativity benchmark does this.
Stacking six cognitive modes does not compound. It dilutes. Selection backfires on a weak model. We published all of it.
We are not asking you to believe us. We are handing you the ruler.
The harness runs on a free model. The ruler is yours.
Prefer the full story? Read the founding essay.