A 60-page field survey of every deep-tech model shipping in production right now. Where the moats actually are. Where they aren't. What the next twelve months will and won't do.
The defining shift of 2025 was not a new model release. It was the slow, mostly unannounced movement of physical-AI systems out of demos and into production lines, hospital wards, energy substations, and warehouses. We tracked 412 such systems. Thirty-eight are now load-bearing — meaning a real customer pays a real bill that depends on the model not failing in physical space.
This report is the synthesis of twelve months of fieldwork. We sat in on hundreds of integration meetings, ran our own benchmark across every model whose weights we could obtain, interviewed 84 operators, and pressure-tested the headline claims of every shipping vendor we could get on a call. The picture that emerges is sharper, and more uneven, than the trade press has suggested.
Three observations frame everything else in the report.
The model is no longer the moat. The cost of training a competitive base model in any given physical domain dropped between 2.4× and 11× over the year, depending on category. Where defensibility now lives — without exception, in every operator interview — is in the data-collection rig, the calibration recipe, the deployment harness, and the liability framework. The bits between the model and the world are now the bits worth paying for.
The deployment gap is widening, not narrowing. Of the 412 models we tracked, 89 reported in-domain benchmark performance within 5% of state-of-the-art. Only 38 have a real customer in production. The gap between "passes the benchmark" and "doesn't break the customer" is the dominant cost in every program we examined — and it grew, year-on-year, in all five physical-AI categories.
The simulation layer is the new platform fight. Whoever owns the simulators that produce training and evaluation data for physical AI will own the substrate of the next decade in the way that whoever owned PyTorch and CUDA owned the last one. We count seven serious contenders for that layer right now. Three are open. Four are not. The next eighteen months will likely halve that field.
If 2024 was the year the demos got real, 2025 was the year the integration costs got real with them. 2026 is the year the survivors find out which is harder.
We publish the Atlas every February for one reason: every founder, operator, and investor we work with reaches roughly the same conclusion in roughly the same order, and we can shorten that loop by writing it down. The Atlas is what we wish we had read when we were last starting from scratch.
It is not a market map. We have no opinion about which logos belong in which quadrant of which graphic. It is a field report — the things our researchers and venture partners learned by taking the meetings, reproducing the benchmarks, and watching the systems break in the wild.
Read it as one input, not five. The numbers are precise where they need to be. The judgements are ours.
// methodology, dataset definitions, inclusion criteria, and the full annotated bibliography are published in §02 and §10 of this document. The companion machine-readable dataset is at uplabs.lab/atlas-26/data under CC-BY-SA.
A field report is only as good as its inclusion criteria. Ours were strict, and we threw out roughly a third of the models we initially shortlisted. Here is how we built the dataset and what we explicitly chose to leave out.
To enter the Atlas a model had to satisfy five conditions; the full inclusion criteria are set out in §02.
Four hundred and twelve models passed all five gates. Roughly two hundred more were excluded — mostly for failing condition 4 (no 2025 evidence), with a long tail of failures on condition 3 (no non-foundation-model baseline) and condition 5 (eval not reproducible).
The Atlas is a survey of physical AI. We deliberately left out three large adjacent categories; the category boundaries are defined in §02.
We also excluded all systems where the only evidence was a press release or a cinematic demo video. That criterion alone excluded more candidates than all the others combined.
For every model in the Atlas we attempted at least one of the following pressure tests, in priority order: independent benchmark reproduction, operator interview about real-world performance, paper-trail audit of training data, and physical-deployment site visit. We managed at least one for 387 of the 412 models. The remaining 25 are flagged in the dataset as UNVERIFIED and excluded from any quantitative claim in the body of this report.
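The exclusion rule above is mechanical enough to sketch in a few lines. A minimal illustration in Python, assuming a hypothetical row shape (the field names `model_id` and `flags` are our invention for illustration, not the published schema of the companion dataset):

```python
# Hypothetical sketch of the UNVERIFIED exclusion rule described above.
# Field names are assumptions; consult the companion dataset for the
# real schema.

def quantitative_subset(rows):
    """Keep only models that passed at least one pressure test."""
    return [r for r in rows if "UNVERIFIED" not in r["flags"]]

toy_atlas = [
    {"model_id": "m-001", "flags": []},
    {"model_id": "m-002", "flags": ["UNVERIFIED"]},
    {"model_id": "m-003", "flags": []},
]

verified = quantitative_subset(toy_atlas)
# In the real dataset this filter keeps 387 of 412 models;
# here it keeps 2 of the 3 toy rows.
```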
Where our results disagreed with the original authors' published numbers, we wrote to the authors. Forty-one teams responded. In most cases the discrepancy was a configuration difference, which we documented and then averaged. In seven cases the discrepancy was unresolved; we report both numbers.
// the full pressure-test log, including the seven unresolved disagreements, is in the companion dataset.
The Atlas sorts the 412 models into five categories. The categories are deliberately use-case-shaped, not architecture-shaped, because the architectural diversity within each category is much smaller than the people inside it want to admit, and the deployment diversity is much larger than the press has acknowledged.
Models that produce control signals for a robot, vehicle, or actuator. Within this we split locomotion (legged, wheeled, aerial) from manipulation (arm-and-gripper) — they share less than the literature suggests and diverged architecturally as the year went on.
The defining shift of 2025: the move from per-task imitation-learning models to multi-task models trained on cross-embodiment data. The seven serious players in this category have all converged on roughly the same recipe — vision-language-action backbones at 3B to 30B params, trained on a mix of teleoperation, simulation, and YouTube — and the differences between them are now almost entirely on the data-collection side.
Models that consume sensor streams and emit either a labelled scene, an occupancy grid, or a predictive rollout. The split inside this category is whether the model is asked to recognise or to predict — and the predictive subset is where almost all the 2025 progress was. Recognition has been a solved-shaped problem since 2023; the thing that changed in 2025 is that you can now ask a model "what is going to happen in the next 6 seconds in this scene" and get an answer that is good enough to act on, in narrow domains.
Models that approximate an expensive simulator — fluid dynamics, molecular dynamics, plasma physics, weather. This is the category with the largest quantitative gains in 2025 and the smallest deployment count. Surrogates are 100 to 10,000× faster than first-principles simulation and now within engineering tolerance for an expanding list of problems. The bottleneck is regulatory, not technical: the customer wants a simulation result with a regulatory pedigree, and the surrogate doesn't have one yet.
Models that produce candidate designs — molecules, antibodies, materials, mechanical parts, circuit layouts — conditioned on a target spec. The wave of 2024 was generative chemistry; the wave of 2025 was generative engineering, with serious work shipping in mechanical CAD, ASIC layout, and packaging-engineering. The defensibility here is in the eval: a generated molecule means nothing without a wet-lab to test it, and the labs are the moat.
Models that predict an operational variable in a physical system — grid demand, traffic flow, hospital admissions, supply-chain lead times. The dullest category and the largest by deployment count. These models have been shipping for years; what changed in 2025 was the move from per-customer hand-tuned models to general-purpose foundation models, which finally crossed the cost-to-deploy threshold for mid-market customers.
| category | count | in production | headline ∆ '25 | where the moat lives |
|---|---|---|---|---|
| embodied controllers | 112 | 4 | +38% benchmark | data rig |
| perception & world | 97 | 11 | +22% predictive | sensor stack |
| scientific surrogates | 71 | 3 | up to 10,000× | regulatory |
| generative design | 69 | 6 | +44% hit-rate | wet-lab |
| operational forecasting | 63 | 14 | +11% MAPE | integration |
The architectural diversity inside each category is smaller than its proponents admit; the deployment diversity is larger than its critics admit. — Atlas '26, §03 framing note
Each category gets a chapter of its own deeper in the report, with the dataset, the live deployments, the open problems, and our best read of the eighteen-month trajectory.
The single most reliable predictor that a physical-AI program would still be running in twelve months was not its benchmark score, its parameter count, its founding team's pedigree, or its capital raised. It was whether the team owned its own data-collection rig.
We tested this claim three ways. We looked at the 38 models in production and asked which of them had bespoke collection infrastructure: 34. We looked at the dead programs — there are 47 of those in our dataset — and asked the same: 6. And we looked at the funded-but-dormant middle, where the answer was almost exactly even.
The point is not that owning a data rig causes survival. The point is that the willingness to invest two years of capex into a sensor harness, a simulator, or a teleop fleet correlates extremely tightly with the operational seriousness needed to ship a physical-AI product at all.
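The strength of the association can be put on one line as an odds ratio, using only the counts reported above (the funded-but-dormant middle is left out, since the section notes it was roughly even):

```python
# Back-of-envelope odds ratio for rig ownership vs. program survival,
# computed from the counts in this section: 34 of 38 production models
# owned a data rig; 6 of 47 dead programs did.

in_production = {"with_rig": 34, "without_rig": 4}
dead_programs = {"with_rig": 6, "without_rig": 41}

odds_if_alive = in_production["with_rig"] / in_production["without_rig"]  # 8.5
odds_if_dead = dead_programs["with_rig"] / dead_programs["without_rig"]   # ~0.15
odds_ratio = odds_if_alive / odds_if_dead
# ~58: a very strong association, which, as noted above, is not causation.
```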
Every shipping embodied controller we tracked had at least one of: a teleoperation fleet of more than 100 units, a domain-specific simulator with calibration loops back to the physical world, or a long-running sensor deployment producing labelled data at rate. The cost of this infrastructure is in the high seven figures and the time-to-stand-up is twelve to eighteen months. That is the moat. It is not glamorous and it is not transferable.
The unsexy infrastructure that wraps the model: the watchdogs, the fallback policies, the human-in-the-loop interfaces, the failure-mode taxonomy, the on-call rota, the simulator-replay system that lets you debug last Thursday's incident at 10× speed. None of this is in the paper. All of it is in the moat. The teams that survived 2025 had built this; the teams that didn't, hadn't.
Especially in surrogate models for engineering and in any clinical or safety-critical setting. A model that has a paper trail through the regulator is worth meaningfully more than one that doesn't, because the customer's compliance department is spending months they don't have on every new vendor. We watched a competitively inferior model win a major hospital deal on this axis alone.
For generative-design especially: whoever owns the wet lab, the test track, the wind tunnel, the synthesis pipeline, owns the velocity of the design loop. The model is the cheap part. The closing-the-loop infrastructure is not.
One of the leading mobile-manipulation programs we tracked has spent more on its teleoperation fleet over the last two years than on model training. The fleet — roughly 280 units across three sites — is the data-generation engine and the deployment-validation surface at once. The model, in their telling, is the artefact that falls out of the fleet, not the other way around. The competitor that invested instead in a larger model and a simulator-only data pipeline is now losing every customer comparison on long-tail reliability and is actively re-architecting around their own data rig, eighteen months too late.
Three forces collapsed model-level differentiation across the year:
The model is the cheap part. Everything around the model is the expensive, defensible part. This is the reverse of the picture five years ago and most of the strategy you read still hasn't caught up.
Equal weight has to be given to the inverse: the things everyone is treating as defensible that aren't, and the moats that have already collapsed without the trade press fully noticing. We list five.
Two years ago this was the most-discussed axis in the field. It is now an afterthought, and rightfully so. Within every category in the Atlas we found at least one model under 7B params that was within deployment tolerance of the 70B-plus state-of-the-art. The relationship between parameter count and customer outcomes was, to a first approximation, zero.
The benchmark you read on the leaderboard is, with high probability, contaminated, gameable, or both. We saw the same six tricks used so consistently across leaderboard submissions that we have stopped weighting them in any of our own diligence. None of them are intentional; all of them are structural. We discuss the six tricks, with examples, in the appendix.
Several deals announced in 2024 as "exclusive partnerships with a frontier lab" did not survive contact with 2025. Either the exclusive expired, the partner shipped a competing first-party product, or the partnership turned out to be a customer relationship dressed in marketing language. We count nine such partnerships announced in 2024; six of them are no longer load-bearing.
The "built for X" pitch — a model architecturally specialised for a single domain — was a viable position in 2022. By 2025 it had decayed into a marketing line. The general-purpose backbones now win at most domain tasks once they have access to the same data, and the data is what's actually scarce. We saw three companies built around this pitch fold or pivot during the year.
Worth noting because it is the most painful one. Several teams we admire built their data rig in 2021–22 and have been overtaken in 2025 by teams that started two years later, learned from public failure modes, and are running on cheaper infra. Being early is a moat only when paired with the willingness to keep tearing your own rig down.
The graveyard of physical-AI startups in 2025 is mostly populated by teams that confused their head start with a moat. — Atlas '26, §05 closing
Briefly, the six most common ways a leaderboard number ends up not predicting deployment outcomes:
None of these is a fraud claim. All of them are structural — the result of a fast-moving field with under-enforced eval norms. The Atlas weights every reported number against our pressure-test of which of these six modes the eval is exposed to.
Eighty-nine of the 412 models in the Atlas are within 5% of state-of-the-art on their headline benchmark. Thirty-eight are running in production. The shortfall — fifty-one models that pass the bar on paper but not in the field — is the single most important number in this report.
We spent more diligence time on this gap than on any other section. The pattern that emerges is consistent enough across categories that we feel confident in a tentative claim: the deployment gap is structural, not residual. It is not a problem that gets smaller with more model effort. It gets smaller with more environment effort, and the field is structurally underinvested in environment.
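The gap metric itself is simple enough to state in code. A sketch using this edition's headline figures (the same ratio is what call #10 in the predictions section tracks):

```python
# The deployment-gap metric: production count over near-SOTA count.
# Numbers are this edition's headline figures.

near_sota = 89      # within 5% of state-of-the-art on the headline benchmark
in_production = 38  # a paying customer depends on the model not failing

gap_ratio = in_production / near_sota  # ~0.43
shortfall = near_sota - in_production  # 51 models that pass on paper only
```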
Across the 51 not-shipped-but-good-enough models, the most common cause of the deployment shortfall, in order:
Note that only the first of these is something the model team could address by being better at modelling. The remaining four are environment-engineering, regulation, and integration problems. Most of the funded teams we tracked have a head of research and no head of deployment.
The 38 models in production crossed the gap one of two ways. The split is almost even.
Take a domain you can fence in tightly. A specific warehouse layout, a specific protein family, a specific weather product. Build the data rig and the deployment harness for that fenced-in domain. Ship. Expand later. Twenty-one of the 38 production models followed this pattern. They are smaller in scope than their press suggests; they are more reliable than their competitors.
Take a general model and wrap it in enough environment-engineering — guardrails, fallback policies, human-in-the-loop, replay-debug — that the long tail is contained operationally rather than technically. Seventeen of the 38 followed this pattern. They are riskier than the narrowed competitors but cover much wider use-cases per deploy.
Neither pattern is "correct". The teams that fail are the teams that pick neither — that ship a general-purpose model into a wide deployment with no environment harness and rely on the model's average-case performance to carry them. The model is good on average. Production is a long-tail business.
The bar for shipping is not the average case. It is the worst weekday of the worst week of the year. — Atlas '26, §06 closing
If you are a customer evaluating a physical-AI vendor, the questions in the Atlas RFP appendix (§10.3) are the ones that actually predict shipping outcomes. None of them are about the model. All of them are about the environment around the model. We have included a clean copy you can adapt.
Twelve specific predictions for what the next twelve months will and will not do, with our confidence level for each. Past Atlases have landed roughly sixty-five percent of their calls. §08 of this report scores last year's twelve.
| # | call | conf |
|---|---|---|
| 01 | The number of physical-AI models in production crosses 100 by Feb '27 (currently 38). | 0.78 |
| 02 | At least two of the four leading embodied-controller programs converge on the same teleop-data format and standardise it publicly. | 0.55 |
| 03 | A scientific-surrogate model becomes the default in at least one regulated engineering process (likely fluid dynamics in aerospace). | 0.62 |
| 04 | Generative-design hit-rates plateau at the model layer; the year's gains come from wet-lab throughput. | 0.71 |
| 05 | Open-weight robotics models approach within 10% of frontier closed models on standard manipulation benchmarks. | 0.66 |
| 06 | The simulation-platform fight (§01) collapses from 7 contenders to 3 by year-end. | 0.52 |
| 07 | At least one major hospital system standardises on a foundation-model triage product. | 0.74 |
| 08 | Per-token training costs in robotics fall another 3× year-on-year. | 0.61 |
| 09 | A high-profile physical-AI failure mode causes a sector-wide insurance-product re-rate. | 0.58 |
| 10 | The deployment gap (§06) does not close — measured as production count over near-SOTA count. | 0.69 |
| 11 | An open eval suite for physical generalisation — multi-lab, regulator-trusted — ships and is adopted by ≥3 of the top 10 programs. | 0.41 |
| 12 | One of the 38 production models in this Atlas is no longer in production by next year's edition. | 0.85 |
Confidence levels are our subjective probabilities based on the diligence behind each claim. Anything below 0.5 we don't recommend acting on, but we publish it anyway: we owe the reader our actual uncertainty rather than the marketing version of it.
The full reasoning behind each call — the operator interviews, the precedents, the priors — is in the companion document. The appendix is longer than this report.
Every call is logged with its confidence the day the Atlas ships. Twelve months later we score the set against a simple Brier loss — the squared distance between our stated probability and the realised outcome. We publish the raw scores for every call we have made since 2022. This is not a sophisticated calibration system; it is a public commitment to be embarrassed by our own track record. So far the year-on-year Brier score has trended down. We expect it to stop trending down at some point and we'd rather you watch us catch ourselves than hide it.
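The scoring rule is a few lines. A minimal sketch with illustrative numbers (these pairs are invented for the example, not last year's actual calls):

```python
# Brier loss over a set of calls: mean squared distance between the
# stated probability and the realised outcome (1 = happened, 0 = not).

def brier(calls):
    return sum((p - y) ** 2 for p, y in calls) / len(calls)

# Illustrative (probability, outcome) pairs, not a real Atlas year.
scored = [(0.78, 1), (0.55, 0), (0.62, 1), (0.85, 1)]

loss = brier(scored)  # ~0.129; always answering 0.5 scores 0.25
```

Lower is better: a perfectly calibrated, perfectly confident forecaster scores 0, and a coin-flip forecaster scores 0.25, which is the baseline to beat.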
Twelve months ago we made twelve calls in the AI Atlas '25. Eight landed, three didn't, one is still ambiguous. The three misses, in detail, with what we should have weighted differently — because the failure modes are more useful than the hits.
We were wrong. We had it at 0.74 confidence. We underestimated the speed at which a coalition of academic labs and one well-capitalised commercial release would close the gap. The lesson: in fast-moving categories, "X won't exist" calls degrade much faster than "Y will dominate" calls. We are pulling the same shape of call out of this year's set as a result.
They didn't quite. We had it at 0.66. The actual gain was around 1.4×, depending on how you slice the benchmark. We over-extrapolated from a strong Q4 '24 paper that turned out to depend on an unusually clean wet-lab pipeline that was not widely reproducible. Lesson: a single strong paper at the end of the year is often the high-water mark of that year's recipe, not the start of next year's.
None did, in the calendar year. We had it at 0.59. One has now begun an orderly wind-down in Q1 '26, which we think tells us something interesting about timing-of-call rather than direction-of-call. Lesson: distinguish between confidence in the direction and confidence in the timing. We mis-merged the two last year.
The eight calls we got right are listed in the appendix without commentary; the misses are more useful, so they get the words.
The misses are more useful than the hits. The hits tell you which priors were already correct. The misses tell you which priors are still wrong. — Atlas '26, §08 framing
The Atlas is a snapshot of a moving target. By the time you read this it is already a few weeks out of date, and by mid-summer at least three of the load-bearing claims in §03 will need to be revised. We will revise them, in public, and the revisions will live at the same URL as the report. The errata are part of the document.
If the report has a single bias to declare, it is this: we are practitioners, not analysts. We run a research lab and a small venture studio. We invest, we get hired, we ship code, we sit on boards. That gives us access we could not get as outside observers, and it gives us conflicts of interest we have to declare. The disclosure list is in the appendix. Read the report against it.
Where we have skin in the game, we have tried to weight our claims toward what we have actually shipped, not what we hope will happen. Where we have skin in the opposite game, we have tried to flag it. None of this protects you fully from our priors. It is the best we can do; the rest is on the reader.
The Atlas exists because eighty-four operators, researchers, and engineers spent time on the phone with us when they had no obligation to. The full credits are in the appendix. A particular thanks to the seven teams who shared unpublished negative results — the data that doesn't make it into a paper is, increasingly, the data that makes the picture true.
We do not pay for participation and we do not accept paid placement. The Atlas is funded entirely by the lab's research budget and by the venture-studio cohort. If you find it useful, the most useful thing you can do is push back where we are wrong. The errata page is open.
If 2024 was the year the demos got real and 2025 was the year the integration costs got real, 2026 is the year the survivors find out which is harder.
We will report back in twelve months.
— uplabs lab team
istanbul · q1 2026
The full machine-readable dataset of all 412 models, including pressure-test logs, our reproduction notes, and the seven unresolved discrepancies, is published at uplabs.lab/atlas-26/data under CC-BY-SA 4.0. The dataset is versioned; cite the specific version you read.
We maintain a public errata log at uplabs.lab/atlas-26/errata. Corrections are dated and signed. The errata are part of the document.
UPLABS holds equity in 6 of the 412 models in the dataset (cohort '20 — '25). These are flagged in the dataset with the UPL_HOLDING tag and excluded from any quantitative claim that ranks models against each other. Where excluded, we have noted the exclusion and the result with and without the held entries. The full list of holdings, board seats, and advisory relationships is in the dataset's disclosure.csv.
@report{uplabs_atlas_2026,
title = {AI Atlas '26 — the year physical AI ate the roadmap},
author = {Okafor, M. and Liu, J. and Hartwell, R. and others},
year = {2026},
org = {UPLABS Ltd},
dispatch = {060},
url = {uplabs.lab/atlas-26}
}
— end of document — UPLABS / DISPATCH 060 / FEB 2026 — 60pp web edition — version 2026.02.04 —
The 412-model dataset, the disclosure log, and the errata feed are public — CC-BY-SA. Send us a note if you want the briefing as a deck, or to talk to one of the authors.