Mathematics Is Experiencing the Biggest Disruption in 2,000 Years
When Machines Learn to Reason: How Mathematical AI Breaks Economic Structures
The Three Inflection Points
In 2015, researchers at the Allen Institute posed a deceptively simple challenge: Can AI pass elementary school science and math exams?
Nine years later, the answer reveals something economists have missed entirely.
It’s not that AI got better at math. It’s that the bottleneck of the knowledge economy just shifted locations, and nobody noticed because everyone was watching the wrong layer.
Part 1: The First Inflection (2015)
Peter Clark’s Aristo Challenge seems quaint now. Fourth-grade questions like:
“A student puts two identical plants in the same soil. She gives them the same water. She puts one near a sunny window and the other in a dark room. This experiment tests how plants respond to: (A) light (B) air (C) water (D) soil”
Clark’s observation was surgical: Information retrieval fails here. You can’t just pattern-match “plant” + “sunny” to the answer. You need world knowledge, modeling, reasoning.
The question requires what Clark called a “test bed for intuition”—the ability to construct a hypothesis about causality, verify it mentally, and distinguish it from noise.
This, in 2015, AI couldn’t do reliably.
Why does this matter?
Because elementary school science is not an outlier. It represents the boundary of applied reasoning in the real world. Biotech labs, trading floors, and energy facilities don’t ask different questions; they ask more complex versions of the same question.
The real economy runs on Aristo-level reasoning at scale.
Part 2: The Second Inflection (2021)
Six years later, DeepMind published something that looked like a footnote but was actually the entire game board shifting.
Researchers used machine learning to guide human mathematicians to prove new theorems.
Not solve existing problems. Not verify proofs. Discover new relationships that had never been proven before.
Here’s the mechanism they built:
Mathematicians propose a hypothesis: “Does relationship X exist between objects A and B?”
They generate data and train a supervised learning model to detect patterns in it
They apply attribution techniques (gradient saliency) to understand why the model detected those patterns
The attribution output guides mathematical intuition toward new conjectures
Humans formalize the proof
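The train-then-attribute steps above can be sketched in a few lines. This is a minimal illustration, not the DeepMind team's setup: it swaps in a plain logistic-regression model and synthetic data, and the "hidden relationship" (the label depends only on features 1 and 4) is an assumption made up for the example. The point is only to show how gradient saliency can tell a researcher which inputs to look at.

```python
import numpy as np

def train_logistic(X, y, lr=0.5, steps=500):
    """Fit a logistic-regression model with plain gradient descent."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * (X.T @ (p - y) / n)   # gradient of the mean log-loss w.r.t. w
        b -= lr * np.mean(p - y)        # gradient of the mean log-loss w.r.t. b
    return w, b

def gradient_saliency(X, w, b):
    """Mean absolute gradient of the predicted probability w.r.t. each input feature."""
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    # For a logistic model, d p / d x_j = p * (1 - p) * w_j.
    grads = (p * (1.0 - p))[:, None] * w[None, :]
    return np.mean(np.abs(grads), axis=0)

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 6))
# Hypothetical hidden relationship: only features 1 and 4 matter.
y = (2.0 * X[:, 1] - 1.5 * X[:, 4] > 0).astype(float)

w, b = train_logistic(X, y)
saliency = gradient_saliency(X, w, b)
top_two = sorted(np.argsort(saliency)[-2:].tolist())
print("most salient features:", top_two)
```

In this toy setting the saliency scores single out the two features that actually drive the label. That is the role the attribution step plays in the workflow: it narrows where the human should look for a conjecture.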
The results weren’t academic; they were consequential. The team discovered:
A new connection between algebraic and geometric invariants in knot theory, bridging two mathematical disciplines that had been separate for decades
A new line of attack on the 40-year-old combinatorial invariance conjecture in representation theory (a candidate approach, not a full proof)
The key insight: The AI didn’t solve the problem. The AI expanded the search space that human intuition could meaningfully explore.
This is different from 2015 in a specific way: The bottleneck moved from “can AI understand?” to “can AI accelerate discovery in problems humans haven’t solved yet?”
Aristo said: “AI needs reasoning.”
DeepMind showed: “Reasoning AI becomes infrastructure for discovery.”
Part 3: The Third Inflection (2024)
Now comes the part that breaks economic models.
In 2024, researchers generated a new math dataset by deliberately combining two different skills from existing problems. Not mixing easy versions—but forcing AI to compose skills that rarely interact.
For instance: A geometry question that requires number theory. A probability problem that needs modular arithmetic. Compositional problems where success requires fluency in two distinct, unrelated domains.
What happened?
Models collapsed.
If a model had 50% success on standard MATH problems, it had ~25% success on MATH2 (composed problems).
50% → 25%. That’s not just a performance decline. That’s a qualitative change.
Then researchers discovered something stranger. When they plotted the relationship between MATH performance (X) and MATH2 performance (Y):
Y ≈ X²
This isn’t noise. It’s a pattern. And it has an implication they buried in a footnote:
“This reasoning in fact suggests that our pipeline has created questions that genuinely required applying two very distinct skills (as opposed to, say, requiring primarily skill i, and mildly using skill j).”
What the data is saying: True reasoning problems scale differently than pattern recognition problems.
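For pattern-matching: difficulty is additive. For compositional reasoning: difficulty is multiplicative. If a model applies each of two skills correctly with probability X, and the two are independent, it gets both right with probability X × X = X².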
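One way to see where a Y ≈ X² curve could come from is a quick simulation. The sketch below assumes, purely for illustration, that a composed question is answered correctly only when two skills both succeed, and that each succeeds independently with the same probability x. The function name and trial count are hypothetical choices, not anything from the MATH2 paper.

```python
import random

def composed_success_rate(x, trials=100_000, seed=0):
    """Estimate success on two-skill questions when each skill succeeds with probability x."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        skill_i_ok = rng.random() < x
        skill_j_ok = rng.random() < x
        if skill_i_ok and skill_j_ok:   # both skills are genuinely required
            hits += 1
    return hits / trials

for x in (0.3, 0.5, 0.8):
    simulated = composed_success_rate(x)
    print(f"single-skill rate {x:.2f}: simulated {simulated:.3f}, x squared {x * x:.3f}")
```

If a question leaned mainly on one skill and only lightly on the other, the simulated rate would sit closer to x than to x², which is why the squared relationship is read as evidence that both skills are really needed.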
This is the third inflection point. And it’s the one that breaks economics.
Part 4: The Economic Restructuring
Let me connect the dots in a way that matters to incentives:
Layer 1: Knowledge Ceases to Be Scarce
Aristo (2015) posed the question: “Can AI reason?”
By 2024, the answer is clearly: “On circumscribed problems, yes.”
GPT-4 can pass bar exams. Claude solves undergraduate physics. Specialized models beat humans on narrow tasks.
The economic implication: Raw knowledge no longer creates a moat.
A biotech startup can’t compete on “knowing more chemistry.” Knowledge is indexed. It’s accessible. The asymmetry is gone.
Layer 2: Discovery Becomes the Constraint
But here’s where it gets interesting.
DeepMind’s 2021 result shows that AI doesn’t replace discovery—it scales discovery. The mathematicians still had to form the hypothesis. The AI expanded the region they could search.
This matters because discovery is the one thing that cannot be commoditized.
You can commoditize:
Information retrieval (Google did this)
Pattern recognition (neural networks do this)
Execution (cloud compute does this)
You cannot commoditize genuine discovery. There is no algorithm for asking the right questions in domains where no one knows the answer yet.
Until now, discovery was gated by:
How many PhDs can you hire?
How much lab time can you afford?
How many experiments can you run before capital runs out?
Reasoning AI changes all three constraints.
Layer 3: Composition Becomes Rare
The MATH2 dataset reveals the hard part: Compositional reasoning.
Not mastery of skill A or skill B independently.
The ability to apply skill A and skill B together in a novel context to solve a problem no one has solved before.
This is where the multiplicative scaling matters.
If a biotech company needs to combine:
Protein folding (skill A)
Computational chemistry (skill B)
Drug delivery optimization (skill C)
Success rate for someone who’s 80% competent in each, assuming the three skills succeed or fail independently: 0.8 × 0.8 × 0.8 ≈ 51%.
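The same arithmetic generalizes to any number of skills. The sketch below treats each competency as an independent success probability, which is a simplifying assumption rather than an empirical claim, and `joint_success` is a hypothetical helper name.

```python
import math

def joint_success(competencies):
    """Probability that every required skill succeeds, assuming independence."""
    return math.prod(competencies)

three_skills = joint_success([0.8, 0.8, 0.8])
print(f"three skills at 80% each: {three_skills:.1%}")

# How the joint rate decays as more 80%-competent skills must be combined.
for k in range(1, 6):
    print(k, f"{joint_success([0.8] * k):.1%}")
```

Under this model, each additional required specialty multiplies the joint success rate by another factor below one. With 80% per skill, the joint rate falls below 50% once a fourth skill is required.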
With reasoning AI that can explore the composition space and guide researchers, the success rate could plausibly approach 80-90%.
The multiplicative model means compositional reasoning AI has extreme leverage in domains where problems require multiple specialties.
This is why it matters for:
Biotech (protein structure + chemistry + pharmacokinetics)
Materials science (physics + chemistry + manufacturing constraints)
Nuclear/Energy (reactor design + materials + control systems)
Finance (market microstructure + macroeconomics + risk systems)
Not because these fields need “more AI.”
Because they need AI that can reason about composition—the hard part.
Part 5: What Gets Destroyed (And Built)
Here’s the economic restructuring:
What Dies
Talent-based competition in knowledge domains: If competitive advantage came from “hiring the smartest people who know domain X,” that moat evaporates when machines reason about domain X better than humans.
Slow discovery cycles: R&D timelines that take 5-10 years compress. This kills the venture capital model of “wait 8 years, hope you hit the inflection.” It also kills regulatory models built on “rare expertise takes time to develop.”
Information asymmetries: The companies that hoarded proprietary data because “only we have 100 years of experimental results” lose that advantage when reasoning systems can extract patterns from that data and generalize faster than humans can.
What Gets Built
AI Reasoning Infrastructure as Strategic Asset: Not “AI tools.” Infrastructure. The companies that own reasoning models that can compose across domains own the discovery layer.
New Kinds of Scientist: Not “human replaced by machine.” Instead: hybrid workflows where the human asks the right compositional questions and the AI explores the space exhaustively.
Vertical Concentration: Every domain that requires compositional reasoning will see winner-take-most dynamics. If one biotech company has reasoning AI that cuts discovery timelines by 60%, it will capture the entire discovery pipeline in its category.
National AI Capability as Economic Weapon: Countries that own reasoning systems own scientific discovery. This is not hyperbole. It’s the logical conclusion of what the DeepMind and MATH2 results suggest.
Part 6: The Flywheel (This Is Where It Gets Interesting)
Here’s the self-reinforcing loop:
Company A owns reasoning AI that can compose protein folding + chemistry + manufacturing
Company A discovers drugs 5x faster than competitors
More capital flows to Company A because success rate is demonstrable
More experimental data feeds Company A’s reasoning system
Reasoning system gets better, discovery cycles compress further
Competitors can’t catch up because they don’t have the data moat
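The loop above can be sketched as a toy simulation. Every number here is a made-up illustration parameter, not an estimate: Company A starts 5x faster than a hypothetical Company B, capital is split in proportion to discovery velocity, and a `feedback` rate converts new data into extra velocity.

```python
def simulate_flywheel(rounds=10, feedback=0.1):
    """Toy loop: capital follows velocity, capital funds data, data raises velocity."""
    # Illustrative starting points only: A begins 5x faster than B.
    velocity = {"A": 5.0, "B": 1.0}
    share_history = []
    for _ in range(rounds):
        total = velocity["A"] + velocity["B"]
        # Capital is split in proportion to demonstrated discovery velocity.
        shares = {name: v / total for name, v in velocity.items()}
        share_history.append(shares["A"])
        for name in velocity:
            new_data = shares[name] * velocity[name]   # funded experiments produce data
            velocity[name] += feedback * new_data      # more data speeds up discovery
    return share_history

history = simulate_flywheel()
print([round(s, 3) for s in history])
```

In this sketch A's share of capital rises every round, because the faster company earns more capital, generates more data, and so grows its velocity by a larger factor than its rival. Whether real markets behave this way depends on assumptions the toy model does not test.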
This is different from previous tech moats because:
Google’s moat was infrastructure (search index)
AWS’s moat was compute (data centers)
This moat is discovery velocity
And discovery velocity is the one thing that can’t be replicated by hiring talent or throwing compute at a problem. You need the reasoning system + the data + the institutional knowledge + the hybrid workflow.
This creates winner-take-most dynamics at the discovery layer.
Part 7: Why This Changes Everything
The traditional economic model of industries like biotech:
Talent scarcity drives value
Drug discovery is slow, so patents have value
Incumbent pharma dominates because they have institutional knowledge
Regulatory barriers protect incumbents
Reasoning AI rewrites this:
Talent becomes a multiplier on reasoning systems, not the constraint
Drug discovery accelerates (which either speeds up revenue or compresses patent windows)
Institutional knowledge becomes a training dataset that reasoning systems can leverage
Regulatory barriers become less relevant if discovery timelines compress by 60%
The companies that win are the ones that:
Have access to reasoning infrastructure (through DeepMind- or OpenAI-class capability, or built internally)
Have compositional domain expertise (scientists who know how to ask the right questions)
Have data (experimental results, simulations, proprietary insights)
Have institutional willingness to restructure workflows around AI-human collaboration
Not talent. Not capital. Access to reasoning + ability to compose domain expertise + data moat.
Part 8: The Unasked Question
Here’s what nobody in biotech, energy, or materials science is asking yet:
What happens when reasoning timescales compress to weeks?
Current model: 5-year R&D cycles → licensing deals → 20-year patent protection
New model: 6-month discovery → 6-month optimization → scale
If discovery timescales compress, the economics of venture capital, pharma licensing, and patents collapse.
Not incrementally. Structurally.
You can’t sustain:
$500M clinical trial budgets if discovery only took 6 months
Patent-driven pricing models if patents only protect for 5 years of a compressed cycle
Acquisition-based business models if internal teams can out-discover acquisitions
The entire economic layer reorganizes.
The Bottom Line
Aristo (2015) asked: “Can machines reason?”
DeepMind (2021) answered: “Yes. And they can guide humans to discover things no human would have found.”
MATH2 (2024) revealed: “The limiting constraint isn’t intelligence anymore. It’s composition. And compositional reasoning scales multiplicatively.”
The companies and countries that recognize this first (that the bottleneck has moved from “knowledge access” to “reasoning infrastructure for compositional discovery”) will own the next 20 years of scientific progress.
Everyone else will be explaining why their 50-year-old talent still matters in a world where the machine learned to reason about compositions they never imagined.
The economic operating system of the AI era isn’t “automate tasks.”
It’s “own discovery infrastructure.”
And the inflection point is already here.
Sources
Clark, P. et al. (2015). “Elementary School Science and Math Tests as a Driver for AI: Take the Aristo Challenge!” Proceedings of the Twenty-Seventh Conference on Innovative Applications of Artificial Intelligence, AAAI.
Davies, A., Veličković, P., Buesing, L., et al. (2021). “Advancing mathematics by guiding human intuition with AI.” Nature, 600, 70–74. https://doi.org/10.1038/s41586-021-04086-x
Shah, V., Yu, D., Lyu, K., et al. (2024). “AI-Assisted Generation of Difficult Math Questions.” arXiv:2407.21009v4. Preprint submitted for review.

