AI Revolution – May 12, 2026

Daily AI briefing — frontier models, research, and infrastructure.

Episode Summary

Today's episode covers 8 stories across 5 topic areas, including: Thinking Machines Lab ships its first model and argues interactivity is what OpenAI gets wrong about voice; Your Next AI Query May Travel Where the Power Is; Baidu's Ernie 5.1 cuts 94 percent of pre-training costs while competing with top models.

Stories Covered

• Model Release

Thinking Machines Lab ships its first model and argues interactivity is what OpenAI gets wrong about voice

The Decoder · May 12 · Relevance: ████████░░ 8/10

Why it matters: Mira Murati's Thinking Machines debuts its first 'interaction model' that processes audio, video, and text in parallel 200ms chunks — a fundamentally different architecture from the turn-based approach used by GPT Realtime and Gemini Live. This could redefine real-time multimodal AI if the latency and quality claims hold up.

Thinking Machines Lab (founded by ex-OpenAI CTO Mira Murati) has shipped its first model
The model processes audio, video, and text simultaneously in 200-millisecond chunks rather than using turn-based conversation
Targets OpenAI's GPT Realtime 2 and Google's Gemini Live on interaction quality

Baidu's Ernie 5.1 cuts 94 percent of pre-training costs while competing with top models

The Decoder · May 11 · Relevance: ████████░░ 8/10

Why it matters: Ernie 5.1's 'Once-For-All' training approach — extracting multiple sub-models from a single training run at 6% of comparable pre-training cost and one-third the parameters — is a potentially significant efficiency breakthrough that challenges the scaling-maximalist paradigm if the benchmark results hold.

Ernie 5.1 uses one-third the parameters of its predecessor, and its pre-training cost only 6% of what comparable models cost to pre-train
Uses a 'Once-For-All' approach that extracts smaller sub-models from a single training run
Ranks 4th globally on the Search Arena leaderboard, behind two Claude Opus variants and GPT-5.5 Search

• Infrastructure

Your Next AI Query May Travel Where the Power Is

IEEE Spectrum AI · May 12 · Relevance: ████████░░ 8/10

Why it matters: Nvidia's pilot to build ~25 micro data centers (5-20MW each) at utility substations and dynamically shift inference workloads based on power availability represents a novel distributed compute architecture that could reshape how AI infrastructure scales around energy constraints.

Nvidia is piloting approximately 25 micro data centers (5-20MW each) co-located with utility substations across 5 US utilities
Compute workloads shift dynamically based on real-time power availability and grid conditions
Nvidia is partnering with InfraPartners to develop the fleet, with construction planned for later this year

Data center guzzled 30 million gallons of water, and nobody noticed for months

Ars Technica AI · May 11 · Relevance: ██████░░░░ 6/10

Why it matters: A data center consuming 30 million gallons of water undetected for months underscores the growing and poorly monitored environmental costs of AI infrastructure expansion, adding fuel to regulatory and community pushback against data center siting.

A data center consumed 30 million gallons of water before the usage was detected or paid for
The incident went unnoticed for months
Highlights inadequate monitoring infrastructure for data center resource consumption

• Industry

Nvidia pumps over 40 billion dollars into AI partners so far in 2026

The Decoder · May 11 · Relevance: ████████░░ 8/10

Why it matters: Nvidia investing $40B+ into AI partners in 2026 alone cements its transformation from GPU vendor to the AI ecosystem's dominant financial backer and kingmaker, creating deep vertical integration that competitors will struggle to match.

Nvidia has invested more than $40 billion in AI companies in 2026 so far
This positions Nvidia as the single largest financial backer in the AI industry
The investments span Nvidia's partner ecosystem, deepening its influence beyond hardware

OpenAI's DeployCo subsidiary adopts Palantir's playbook, building a moat from workflows no lab can simulate

The Decoder · May 11 · Relevance: ███████░░░ 7/10

Why it matters: OpenAI creating a majority-controlled consulting subsidiary to embed AI into enterprise workflows mirrors Palantir's integration-as-moat strategy and signals that frontier labs see enterprise deployment — not just model capability — as the key competitive battleground.

DeployCo is a majority-controlled OpenAI subsidiary focused on enterprise AI integration
The unit operates as a consulting and implementation business, similar to Palantir's approach
The strategy aims to build a competitive moat through deep workflow integration rather than model superiority alone

• Applications

OpenAI just released its answer to Claude Mythos

The Verge · May 11 · Relevance: ███████░░░ 7/10

Why it matters: OpenAI's Daybreak initiative uses its Codex Security agent to automate threat modeling, vulnerability discovery, and detection — a significant move toward AI-driven proactive security that could change how organizations approach AppSec at scale.

Daybreak builds on OpenAI's Codex Security AI agent launched in March
The system creates threat models from an organization's codebase, validates likely vulnerabilities, and automates detection
Positioned as OpenAI's competitive response to Anthropic's Claude Mythos security capabilities

• Policy

The EU wants to regulate AI but needs OpenAI and Anthropic to let regulators through the door

The Decoder · May 11 · Relevance: ███████░░░ 7/10

Why it matters: The divergence between OpenAI granting EU access to GPT-5.5 Cyber for security review while Anthropic resists access to Mythos exposes a critical structural weakness in the EU AI Act's enforcement — regulators remain dependent on voluntary cooperation from the very companies they regulate.

OpenAI has offered the European Commission direct access to GPT-5.5 Cyber for security review
Anthropic has not provided regulator access to its Mythos model after 4-5 meetings
The gap highlights that EU AI Act enforcement depends heavily on voluntary cooperation from frontier labs

Full Transcript

Sam: Mira Murati's new lab shipped its first model today, and the architecture is worth understanding. Most voice AI systems — GPT Realtime, Gemini Live — operate on a turn-based model. You speak, the system detects end-of-turn, processes your input, generates a response, plays it back. That sequential pipeline is why there's always a beat of latency, and why interrupting feels awkward. Thinking Machines Lab is doing something different: processing audio, video, and text simultaneously in 200-millisecond chunks, continuously, without waiting for a turn boundary. That's a fundamentally different interaction model, and if the latency and quality claims hold up in real conditions, it changes what real-time multimodal AI actually feels like to use.

Priya: Welcome to AI Revolution for Tuesday, May 12, 2026. I'm Priya Nair, here with Sam Kim. Today we're digging into that Thinking Machines debut and what the architecture actually implies, Baidu's surprisingly efficient Ernie 5.1 and the training technique behind it, Nvidia's distributed micro data center pilot, and a few stories on how the big labs are repositioning — OpenAI's Daybreak security initiative, the DeployCo consulting play, and some uncomfortable questions about EU AI regulation. Let's get into it.

Sam: So let's stay with Thinking Machines for a minute because I think the architectural distinction is worth really unpacking. The turn-based approach isn't just a design choice — it's load-bearing. When you wait for end-of-turn detection, you know the full utterance before you generate. You can do proper ASR, run it through your language model, generate a complete response. The tradeoff is that the system is fundamentally reactive. Chunked parallel processing flips that. You're running inference continuously on overlapping windows of sensory input — audio, video, text — and producing output in an ongoing stream. The challenge is that you're essentially doing online inference without the luxury of a complete context. You need the model to be coherent across chunks without seeing the full input first.

Priya: Which is a genuinely hard problem. It's closer to how humans actually process conversation — we're not waiting for someone to stop talking before we start forming a response — but replicating that in a model architecture requires solving for state continuity across those 200-millisecond windows. How do you maintain context? How do you handle interruptions gracefully without losing the thread of what was being said?
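
To make the state-continuity question concrete, here is a minimal Python sketch. It is not Thinking Machines' architecture, which has not been described in detail; the names `StreamState`, `step`, and `run_stream`, the energy threshold, and the four action labels are all hypothetical. The only thing it is meant to show is the shape of the loop: every 200 ms chunk produces an action immediately, and a small piece of state carried between chunks is what lets the system notice an interruption instead of waiting for a turn boundary.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple

CHUNK_MS = 200            # chunk length quoted in the episode
SPEECH_THRESHOLD = 0.01   # assumed energy threshold, purely illustrative


@dataclass(frozen=True)
class StreamState:
    """State carried from one chunk to the next (a stand-in for model context)."""
    user_has_spoken: bool = False   # has any user speech been heard so far?
    system_speaking: bool = False   # was the system producing output last chunk?


def step(chunk: List[float], state: StreamState) -> Tuple[str, StreamState]:
    """Process one chunk of audio samples and return (action, new state)."""
    energy = sum(x * x for x in chunk) / max(len(chunk), 1)
    user_speaking = energy >= SPEECH_THRESHOLD

    if user_speaking and state.system_speaking:
        # The user talked over the system: stop output right away.
        return "yield", StreamState(user_has_spoken=True, system_speaking=False)
    if user_speaking:
        return "listen", StreamState(user_has_spoken=True, system_speaking=False)
    if state.user_has_spoken:
        # Quiet chunk after speech: the system may (continue to) respond.
        return "speak", StreamState(user_has_spoken=True, system_speaking=True)
    return "idle", state


def run_stream(chunks: Iterable[List[float]]) -> Iterator[str]:
    """Emit one action per chunk, threading state through the whole stream."""
    state = StreamState()
    for chunk in chunks:
        action, state = step(chunk, state)
        yield action
```

Feeding `run_stream` a sequence of quiet and loud chunks yields one action per chunk, with a `"yield"` action appearing at the exact chunk where the user starts talking over the system. A real system would replace the energy check and the two booleans with learned model state, which is the hard part the hosts are describing.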

Sam: Right, and we don't have full details on how they're solving that. The 200ms chunk size is interesting: that's roughly the typical gap between turns in human conversation, so it's not arbitrary. But the real test is going to be in degraded conditions: background noise, overlapping speech, low-bandwidth video. Turn-based systems at least have the advantage of processing a clean, complete segment. Chunked systems have to be more robust to partial information. So this is genuinely promising, but I'd want to see independent evaluation before drawing strong conclusions.

Priya: And the competitive context matters here. OpenAI and Google have had a year-plus head start on real-time voice. Murati's team is claiming they've identified a structural limitation in how those systems work, not just a capability gap. Whether that framing holds up is going to become clear pretty quickly once people start putting the model through its paces.

Sam: Moving to Baidu's Ernie 5.1, because the efficiency numbers here are striking and the technique is worth explaining. One-third the parameters of its predecessor, six percent of comparable pre-training costs. The mechanism is something they're calling Once-For-All, and it's a clever idea. Rather than training separate models at different scales — which is what most labs do when they want a family of models — you train a single large model in a way that allows you to extract sub-models from it at different parameter counts. The sub-models share weights with the parent model and are essentially carved out during training rather than trained independently.
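
The weight-sharing idea can be illustrated with a toy Python sketch. This is not Baidu's implementation, whose details are not public; it only shows the general notion of a sub-model that is a slice of a parent model's weights. The class `ParentMLP`, the `width` parameter, and the rule "a sub-model uses the first `width` hidden units" are assumptions made for illustration, and the training procedure that makes such slices work well is omitted entirely.

```python
from typing import List, Optional

Matrix = List[List[float]]


class ParentMLP:
    """Toy two-layer network without biases: y = W2 * relu(W1 * x)."""

    def __init__(self, w1: Matrix, w2: Matrix) -> None:
        self.w1 = w1  # shape: hidden x inputs
        self.w2 = w2  # shape: outputs x hidden

    def _hidden(self, width: Optional[int]) -> int:
        # width=None means "use the full parent model".
        return len(self.w1) if width is None else width

    def forward(self, x: List[float], width: Optional[int] = None) -> List[float]:
        """Run the parent, or a sub-model made of the first `width` hidden units."""
        hidden = self._hidden(width)
        h = [
            max(0.0, sum(w * xi for w, xi in zip(self.w1[j], x)))
            for j in range(hidden)
        ]
        return [sum(row[j] * h[j] for j in range(hidden)) for row in self.w2]

    def param_count(self, width: Optional[int] = None) -> int:
        """Number of weights actually used at the given width."""
        hidden = self._hidden(width)
        return hidden * len(self.w1[0]) + len(self.w2) * hidden


# Hypothetical weights: 2 inputs, 4 hidden units, 1 output.
parent = ParentMLP(
    w1=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]],
    w2=[[1.0, 1.0, 1.0, 1.0]],
)
full_output = parent.forward([1.0, 2.0])           # uses all 4 hidden units
small_output = parent.forward([1.0, 2.0], width=2)  # sub-model, same weights
```

Because the sub-model reads directly from the parent's weight matrices, no second set of parameters is stored or trained; in this toy, the width-2 sub-model uses 6 of the parent's 12 weights.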

Priya: So you get multiple models for roughly the cost of one training run.

Sam: Roughly, yes. The technique has roots in earlier neural architecture search work — there was a paper called Once-for-All from MIT back in 2020 that explored this for efficient inference on edge devices — but applying it at this scale for frontier model training is a different thing entirely. The benchmark results are interesting: fourth on the Search Arena leaderboard, behind two Claude Opus variants and GPT-5.5 Search. That's a credible position, and if you got there at six percent of the training cost, that's a meaningful signal about the efficiency of the approach.

Priya: The caveat I'd apply is that Search Arena is one benchmark, and Baidu is self-reporting the cost figures. We don't have independent verification of either the training cost claims or the architectural details. But even directionally, if Once-For-All training produces competitive models at a fraction of the compute, that has real implications for who can afford to be in this game.

Sam: Nvidia's distributed inference pilot is a story I find genuinely interesting from an infrastructure design standpoint. The basic idea: instead of building massive centralized data centers and then figuring out how to power them, build roughly 25 smaller facilities — five to twenty megawatts each — co-located with utility substations, and dynamically route inference workloads to wherever power is available on the grid at a given moment.

Priya: The power constraint is real. Large data center builds have been stalling for years because you can't get the grid interconnect capacity fast enough. A 500-megawatt campus takes years to permit and connect. A five-megawatt facility at an existing substation is a completely different procurement and construction timeline.

Sam: And the dynamic routing piece is where this gets technically interesting. Inference workloads — unlike training — are actually amenable to this kind of geographic distribution. Training requires tight synchronization across accelerators; you can't easily split a training run across facilities with network latency between them. But inference requests are largely stateless and independent. You can route a query to wherever compute is available without much penalty, as long as the latency to the end user is acceptable.

Priya: The engineering challenge is the orchestration layer. You need to know, in real time, which facilities have available capacity and available power, and route accordingly, while keeping response latency within user-acceptable bounds. That's a non-trivial distributed systems problem. But it's a solvable one, and Nvidia has both the hardware and the software stack to attempt it.
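
A rough Python sketch of that routing decision, under stated assumptions: Nvidia has not published how its orchestration works, so the `Site` fields, the `pick_site` function, and the policy of preferring the eligible site with the most spare power are all hypothetical. The sketch only captures the three constraints named above: available capacity, available power, and a latency bound.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Site:
    name: str
    power_headroom_mw: float  # spare power available at the substation right now
    free_slots: int           # inference slots not currently in use
    latency_ms: float         # estimated round-trip latency to the user


def pick_site(sites: List[Site], max_latency_ms: float) -> Optional[Site]:
    """Return the eligible site with the most spare power, or None if none qualify."""
    eligible = [
        s for s in sites
        if s.free_slots > 0
        and s.power_headroom_mw > 0
        and s.latency_ms <= max_latency_ms
    ]
    if not eligible:
        return None
    # Assumed policy: most power headroom wins; lower latency breaks ties.
    return max(eligible, key=lambda s: (s.power_headroom_mw, -s.latency_ms))


# Hypothetical fleet snapshot.
fleet = [
    Site("substation-a", power_headroom_mw=2.0, free_slots=3, latency_ms=40.0),
    Site("substation-b", power_headroom_mw=8.0, free_slots=1, latency_ms=120.0),
    Site("substation-c", power_headroom_mw=5.0, free_slots=0, latency_ms=20.0),
    Site("substation-d", power_headroom_mw=4.0, free_slots=2, latency_ms=60.0),
]
choice = pick_site(fleet, max_latency_ms=100.0)
```

With a 100 ms latency bound, the site with the most spare power is excluded for being too far away and the one with the lowest latency is excluded for having no free slots, which is the tension between power availability and user experience the hosts describe. A production scheduler would also need fresh telemetry, request queuing, and failover, none of which is modeled here.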

Sam: The water story from Ars Technica connects here. A data center consumed 30 million gallons of water for cooling before anyone noticed or paid for it. That's the kind of externality that's starting to generate real regulatory and community pushback on data center siting. Distributed, smaller facilities don't automatically solve the water problem, but they do distribute the footprint in ways that might be more politically viable than another gigawatt campus in a water-stressed region.

Priya: Let's do the OpenAI stories quickly because there are two and they're related. Daybreak is their security-focused initiative built on the Codex Security agent — it ingests your codebase, builds a threat model, identifies likely attack paths, validates vulnerabilities, and automates detection. This is the same category of work Anthropic's Mythos is doing. The underlying capability is code understanding at scale combined with security domain knowledge, applied proactively rather than reactively.

Sam: The interesting architectural question is how much of the threat modeling is emergent from the model's general code understanding versus fine-tuned on security-specific data. Both probably. But the practical implication is that organizations can now run something close to continuous automated red-teaming against their own codebase. Whether the false positive rate is manageable in production is going to determine whether this is actually useful or just another alert-generator.

Priya: DeployCo is OpenAI's Palantir move — majority-controlled subsidiary, consulting and implementation focus, designed to embed deeply into enterprise workflows. The strategic logic is straightforward: model capability is increasingly commoditized, but institutional knowledge of how a specific organization's data and processes work is not. If you've spent six months integrating into someone's workflow, you've built a switching cost that no model release can easily overcome.

Sam: And the EU regulation story is worth flagging briefly because it illustrates a structural problem. OpenAI has offered EU regulators direct access to GPT-5.5 Cyber. Anthropic, after four or five meetings about Mythos, still hasn't provided access. The AI Act's enforcement model relies heavily on voluntary cooperation from the companies it's trying to regulate. When companies decline to cooperate, regulators don't currently have strong compulsory mechanisms. OpenAI's cooperation here is probably strategic — you get credit for transparency while your competitor's non-cooperation becomes a story. But the underlying governance gap is real.

Priya: Nvidia's $40 billion in partner investments so far in 2026 — that's the headline number and it speaks for itself in terms of how Nvidia is positioning. Hardware vendor to ecosystem financier. When you're the largest financial backer in the industry you're also supplying the compute for, the strategic leverage compounds.

Sam: So what does today actually point toward? The Thinking Machines architecture and the Ernie efficiency story are pointing in the same direction, from different angles — the assumption that bigger, more expensive, turn-based systems are the ceiling is getting challenged. Chunked parallel processing is a bet that interaction quality matters as much as benchmark scores. Once-For-All training is a bet that you can get competitive performance at a fraction of the compute cost.

Priya: The infrastructure story is pointing toward a world where AI compute is geographically distributed in ways that track energy availability rather than fiber concentration. That's a different topology than we've had, and it has implications for latency, regulation, and who can build in which markets.

Sam: The question I'm watching: how quickly can independent researchers validate these efficiency and interaction quality claims? Baidu's cost figures and Thinking Machines' latency claims are both self-reported right now. The field tends to replicate fast when the techniques are real. If Once-For-All is as effective as claimed, we should see other labs experimenting with variants within months.

Priya: And on the governance side — the EU situation is an early test of whether voluntary cooperation is a durable enforcement model. I suspect the answer is no, and we'll see that play out over the next year as more capable models get deployed in regulated contexts.

Sam: That's what we've got for today. Show notes and links to every story we covered are at cleartext.fm. We'll be back tomorrow.

Priya: Thanks for listening.

AI Revolution is an automated daily podcast covering AI advancements. Generated 2026-05-12.

Sources: MIT Technology Review, VentureBeat AI, The Verge, Wired, TechCrunch AI, Ars Technica, IEEE Spectrum, The Decoder, The Gradient, Hugging Face Blog, Google AI Blog, AI News, SemiAnalysis, and The Register.
