AI Briefing

AI Revolution – May 07, 2026

Thursday, May 7, 2026·9:40

Show Notes

Daily AI briefing — frontier models, research, and infrastructure.

Episode Summary

Today's episode covers 9 stories across 6 topic areas, including: OpenAI built a networking protocol with AMD, Broadcom, Intel, Microsoft, and NVIDIA to fix AI supercomputer bottlenecks; Anthropic taps SpaceX's Colossus-1 data center for 220,000 GPUs to power Claude; Google's Gemma 4 AI models get 3x speed boost by predicting future tokens.

Stories Covered

• Infrastructure

OpenAI built a networking protocol with AMD, Broadcom, Intel, Microsoft, and NVIDIA to fix AI supercomputer bottlenecks

The Decoder · May 06 · Relevance: █████████░ 9/10

Why it matters: A new open-source networking protocol (MRC) that enables multi-path GPU communication across 100K+ GPUs with only two switch layers is a fundamental infrastructure breakthrough that could reshape how AI supercomputers are built and reduce both cost and power consumption at scale.

  • MRC sends data across hundreds of paths simultaneously between GPUs, replacing traditional three- or four-layer switch architectures with just two layers
  • Developed jointly by OpenAI, AMD, Broadcom, Intel, Microsoft, and NVIDIA as an open-source protocol
  • Already running on OpenAI's Stargate supercomputer and can connect over 100,000 GPUs
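
To see why the number of switch layers matters at this scale, here is a minimal sketch of the standard capacity arithmetic for a non-blocking fat-tree (folded Clos) built from identical switches. The summary does not say how MRC reaches two layers, so this is not a description of the protocol. The function names and the radix values (64, 128, 512 ports) are illustrative assumptions, not figures from the article.

```python
def max_endpoints(radix: int, tiers: int) -> int:
    """Upper bound on endpoints in a non-blocking fat-tree built from
    identical switches with `radix` ports, arranged in `tiers` layers."""
    if radix < 2 or radix % 2 or tiers < 1:
        raise ValueError("radix must be a positive even number and tiers >= 1")
    # Each layer below the top splits its ports half down, half up.
    return 2 * (radix // 2) ** tiers


def tiers_needed(radix: int, endpoints: int) -> int:
    """Smallest number of switch layers that can attach `endpoints` devices."""
    tiers = 1
    while max_endpoints(radix, tiers) < endpoints:
        tiers += 1
    return tiers


if __name__ == "__main__":
    # Radix values are assumptions chosen for illustration only.
    for radix in (64, 128, 512):
        print(f"radix {radix}: {tiers_needed(radix, 100_000)} layers for 100,000 endpoints")
```

Under these assumptions, capacity grows with the radix raised to the number of layers, so a fabric can trade layers for a higher effective port count per switch.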

Anthropic taps SpaceX's Colossus-1 data center for 220,000 GPUs to power Claude

The Decoder · May 06 · Relevance: █████████░ 9/10

Why it matters: Anthropic taking over the entirety of xAI's Colossus-1 facility — 300+ MW and 220K GPUs — is one of the largest single compute deals in AI history and signals the extreme infrastructure arms race among frontier labs, while also creating a surprising cross-pollination between Musk's and Anthropic's AI ecosystems.

  • Anthropic is taking over the full computing capacity of SpaceX/xAI's Colossus-1 data center with 220,000+ NVIDIA GPUs
  • The facility provides over 300 megawatts of power capacity and is expected online within a month
  • Anthropic is also doubling rate limits for Claude Code and significantly raising API limits for Opus models

• Model Release

Google's Gemma 4 AI models get 3x speed boost by predicting future tokens

Ars Technica AI · May 06 · Relevance: ████████░░ 8/10

Why it matters: Gemma 4 achieving up to 3x inference speedup through native speculative decoding without quality loss is a significant practical advancement for open-weight models, potentially making high-quality local and edge inference far more viable for production deployments.

  • Gemma 4 uses speculative decoding to predict future tokens, achieving up to 3x faster inference
  • Speed improvement comes with no reported loss in output quality
  • These are open-weight models from Google, making the speedup available to the broader developer community
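
The draft-and-verify idea behind speculative decoding can be sketched with toy models. This is the generic greedy form of the technique, not Google's implementation, which the summary does not detail. `toy_target`, `toy_draft`, and the parameter `k` are hypothetical stand-ins: each "model" is just a function from a token prefix to the next token, and the draft is deliberately wrong after token 7.

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]  # maps a token prefix to the next token


def greedy_decode(model: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Baseline: one model call per generated token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(model(out))
    return out


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int],
                       max_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding. Returns (tokens, verification_passes)."""
    out = list(prompt)
    passes = 0
    while len(out) - len(prompt) < max_new:
        # 1) The draft model cheaply proposes k tokens.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2) The target checks every proposed position. A real system scores
        #    them in one batched forward pass; this loop only simulates it.
        passes += 1
        accepted: List[int] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok != expected:
                accepted.append(expected)  # fix the first mismatch, then stop
                break
            accepted.append(tok)
        else:
            accepted.append(target(out + accepted))  # all matched: bonus token
        out.extend(accepted)
    return out[:len(prompt) + max_new], passes


def toy_target(prefix: List[int]) -> int:
    return (prefix[-1] + 1) % 10  # counts 0..9 and wraps


def toy_draft(prefix: List[int]) -> int:
    return 0 if prefix[-1] == 7 else (prefix[-1] + 1) % 10  # wrong after 7


if __name__ == "__main__":
    tokens, passes = speculative_decode(toy_target, toy_draft, [0], max_new=12)
    print(tokens == greedy_decode(toy_target, [0], 12), passes)
```

Because every accepted token is one the target would have chosen itself, the output matches plain greedy decoding with the target. The saving comes from needing fewer target passes than generated tokens.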

• Industry

DeepSeek could hit $45B valuation from its first investment round

TechCrunch AI · May 06 · Relevance: ████████░░ 8/10

Why it matters: DeepSeek hitting a $45B valuation on its first external funding round validates the efficiency-first approach to model training and underscores that the competitive landscape for frontier AI is now genuinely global, with Chinese labs commanding valuations on par with leading U.S. companies.

  • DeepSeek is reportedly targeting a $45 billion valuation in its first-ever investment round
  • The company gained prominence by training competitive LLMs using a fraction of the compute and cost of U.S. counterparts
  • This would be one of the highest first-round valuations for any AI lab globally

• Applications

Claude's new "Dreaming" feature is designed to let AI agents learn from their mistakes

The Decoder · May 07 · Relevance: ███████░░░ 7/10

Why it matters: Anthropic's Dreaming feature introduces asynchronous offline learning for managed agents — reviewing past sessions, deduplicating memory, and distilling insights — which represents a meaningful step toward persistent, self-improving agentic systems that get better over time without retraining.

  • Dreaming is an asynchronous process that reviews past agent sessions and distills new insights
  • Launched alongside Outcomes and Multiagent Orchestration features, both now in public beta
  • Designed to clean up duplicate/outdated memory entries and enable cross-session learning
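
As a rough illustration of the cleanup step only (deduplicating and expiring memory entries), here is a minimal sketch. It is not Anthropic's API or algorithm, and it does not attempt the insight-distillation step. `MemoryEntry`, `consolidate`, and the `max_age` rule are all hypothetical names and policies invented for this example.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class MemoryEntry:
    text: str
    session: int  # index of the session that wrote this entry (assumed field)


def _normalize(text: str) -> str:
    """Treat entries as duplicates if they differ only in case or spacing."""
    return " ".join(text.lower().split())


def consolidate(entries: Iterable[MemoryEntry], current_session: int,
                max_age: int) -> List[MemoryEntry]:
    """Drop entries older than `max_age` sessions and keep only the newest
    copy of each duplicate. Runs offline, after sessions have ended."""
    newest: Dict[str, MemoryEntry] = {}
    for entry in entries:
        if current_session - entry.session > max_age:
            continue  # outdated under the assumed age policy
        key = _normalize(entry.text)
        if key not in newest or entry.session > newest[key].session:
            newest[key] = entry
    return sorted(newest.values(), key=lambda e: e.session)


if __name__ == "__main__":
    store = [
        MemoryEntry("Use tabs", 1),
        MemoryEntry("use  TABS", 5),
        MemoryEntry("API key rotates weekly", 2),
        MemoryEntry("Old note", 0),
    ]
    for entry in consolidate(store, current_session=6, max_age=5):
        print(entry)
```

A production system would likely need semantic rather than textual duplicate detection; exact-match normalization is used here only to keep the sketch self-contained.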

Google Announces GKE Agent Sandbox and Hypercluster at Next '26, Positioning Kubernetes as AI Agent

InfoQ AI/ML · May 07 · Relevance: ███████░░░ 7/10

Why it matters: GKE Agent Sandbox using gVisor kernel isolation at 300 sandboxes per second and Hypercluster managing a million chips from a single control plane represent significant infrastructure for securely deploying AI agents at enterprise scale — Google is the first major cloud provider to offer native agent sandboxing.

  • Agent Sandbox uses gVisor kernel isolation and can spin up 300 sandboxes per second for secure agent code execution
  • Hypercluster manages up to one million chips from a single Kubernetes control plane
  • GKE Agent Sandbox is built as an open-source Kubernetes SIG Apps subproject — first native agent sandbox among major hyperscalers

• Policy

US government increases AI suppliers and rethinks Anthropic’s role

AI News · May 06 · Relevance: ███████░░░ 7/10

Why it matters: The Pentagon expanding its approved AI supplier list to include Microsoft, Amazon, NVIDIA, and Reflection AI for classified operations signals the rapid institutionalization of frontier AI in defense and the growing competitive dynamics among labs for government contracts.

  • Microsoft, Reflection AI, Amazon, and NVIDIA signed agreements allowing their AI products to be used in classified Pentagon operations
  • They join OpenAI, xAI, and Google as approved government AI suppliers
  • Anthropic's role in government AI is being reconsidered

The US and China are considering formal talks on AI

The Decoder · May 07 · Relevance: ███████░░░ 7/10

Why it matters: Formal US-China AI talks would mark the first structured bilateral engagement on AI governance between the two leading AI powers, with potential implications for export controls, safety standards, and the trajectory of the global AI race.

  • The US and China are exploring official bilateral talks specifically focused on artificial intelligence
  • Reported by the Wall Street Journal
  • Comes amid ongoing tensions over chip export controls and AI competition between the two nations

• Research

AI models follow their values better when they first learn why those values matter

The Decoder · May 07 · Relevance: ███████░░░ 7/10

Why it matters: This Anthropic Fellows research finding that teaching models the rationale behind values before behavioral training significantly improves generalization to novel situations has direct implications for alignment methodology and could change how safety teams approach model training curricula.

  • Training models on texts explaining intended values before behavioral training leads to significantly better value adherence
  • Improved generalization extends to situations never encountered during training
  • Research conducted through the Anthropic Fellows Program

Full Transcript

Sam: OpenAI and a coalition of major hardware and software companies just published an open-source networking protocol called MRC. It lets a cluster of over 100,000 GPUs communicate using only two switch layers instead of the three or four that conventional fat-tree architectures require. That's not a minor optimization — the number of switch layers directly determines network depth, latency, cost, and power draw at scale. It's already running on OpenAI's Stargate supercomputer.

Priya: Welcome to AI Revolution for Thursday, May 7th, 2026. I'm Priya Nair, here with Sam Kim. Today we're deep in infrastructure — two stories that show just how aggressively compute capacity is being built out right now. We've also got Gemma 4 doing something genuinely interesting with inference speed, a new learning mechanism for Claude agents, some alignment research worth paying attention to, and a handful of industry and policy developments. Let's get into it.

Sam: So let's start with MRC, because the technical idea here is worth unpacking. Traditional GPU cluster networking uses a fat-tree topology — you have spine switches, aggregation switches, and top-of-rack switches. Each hop adds latency and cost, and the more GPUs you're connecting, the more switch layers you need to avoid bottlenecks. The standard answer has been three or four layers for anything approaching 100K GPUs.

Priya: And the reason that matters is that GPU utilization in training runs is heavily dependent on how fast gradients and activations can move between accelerators. If your network is the bottleneck, your GPUs are waiting. And at 100K+ GPUs, even small inefficiencies compound.

Sam: Right. What MRC does differently is multi-path routing: it sends data across hundreds of paths simultaneously rather than routing packets down a single path through the hierarchy. By spreading traffic that broadly, you can flatten the topology. Two switch layers become sufficient because you're not creating hot spots on any single path. The bandwidth aggregates across all those parallel routes.

Priya: And the fact that this is open-source, with AMD, Broadcom, Intel, Microsoft, and NVIDIA all involved, matters a lot. This isn't one vendor's proprietary fabric. If this gets adopted broadly, it could affect how every large AI cluster gets built — and the power savings from eliminating a whole switch layer at scale are meaningful. We're talking about facilities drawing hundreds of megawatts.

Sam: Which brings us directly to the second infrastructure story. Anthropic is taking over the entire Colossus-1 data center — the one SpaceX built for xAI in Memphis. We're talking 220,000 NVIDIA GPUs, over 300 megawatts of capacity, expected online within a month.

Priya: The sheer scale of this is worth sitting with. 300 megawatts is roughly the output of a mid-sized power plant, dedicated to running one lab's models. And Anthropic says they're immediately using this to double rate limits for Claude Code and significantly raise API limits on Opus. So this isn't a build-for-the-future story — it's addressing a current capacity crunch.

Sam: The cross-company dimension is also notable. Anthropic leasing xAI's facility is a pragmatic compute deal that cuts across what you'd normally think of as competitive boundaries in this space. It tells you something about how tight GPU supply still is — you take capacity where you can get it.

Priya: Let's move to Gemma 4, because the speculative decoding story is one I've been waiting to see land in open-weight models at this scale. Sam, can you walk through what speculative decoding actually is?

Sam: Sure. Standard autoregressive inference generates one token at a time: the model does a full forward pass to produce each token sequentially. Speculative decoding changes this by using a smaller, faster draft model to predict several tokens ahead, then using the full model to verify those predictions in parallel. If the predictions are right, you've effectively generated multiple tokens per full model pass. If one is wrong, you keep the tokens before the first mismatch, take the full model's token at that position, and start drafting again from there. The math works out to significant speedups when the draft model's accuracy is high enough.

Priya: And the key word there is "native" — Google baked this into Gemma 4's architecture rather than bolting it on afterward. That's what makes the three-times speedup credible. The draft model and the verification model were co-designed.

Sam: Three times is at the high end of what you'd expect; in practice you typically see one-and-a-half to two-and-a-half times, depending on the task. The claim of no quality loss is plausible if the draft model is well-calibrated, but that's something developers will need to validate on their specific workloads. The broader point is that this substantially changes the economics of running open-weight models locally or at the edge. You can now get throughput that was previously only achievable with larger serving infrastructure.
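
Editor's note: the link between draft accuracy and speedup can be made concrete with the standard estimate from the speculative decoding literature. It assumes each drafted token is accepted independently with probability `alpha`, which real workloads only approximate. The values of `alpha`, `k`, and `draft_cost` below are illustrative assumptions, not measurements of Gemma 4.

```python
def expected_tokens_per_pass(alpha: float, k: int) -> float:
    """Expected tokens produced per full-model pass when k tokens are drafted
    and each is accepted independently with probability alpha."""
    if not 0.0 <= alpha <= 1.0 or k < 1:
        raise ValueError("alpha must be in [0, 1] and k >= 1")
    if alpha == 1.0:
        return float(k + 1)  # every draft accepted, plus the bonus token
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)


def estimated_speedup(alpha: float, k: int, draft_cost: float) -> float:
    """Speedup over plain decoding, where draft_cost is the cost of one draft
    pass relative to one full-model pass (an assumed ratio)."""
    return expected_tokens_per_pass(alpha, k) / (k * draft_cost + 1.0)


if __name__ == "__main__":
    for alpha in (0.6, 0.8, 0.9):
        print(alpha, round(estimated_speedup(alpha, k=4, draft_cost=0.05), 2))
```

With four drafted tokens and a draft pass costing 5% of a full pass, this estimate gives roughly 1.9x at 60% acceptance and 2.8x at 80%, which is consistent with three times sitting at the high end.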

Priya: And open-weight means developers can actually inspect and customize this, which matters for production deployments where you want predictability.

Sam: Shifting to the Claude agent updates — Anthropic announced something they're calling Dreaming, which is an asynchronous process that runs after agent sessions end. It reviews what happened, cleans up redundant or outdated memory entries, and distills new insights that carry forward into future sessions.

Priya: The analogy to biological memory consolidation is obvious and probably intentional. The idea is that agents currently forget everything between sessions, or they accumulate memory stores that get noisy over time. Dreaming is a maintenance pass that curates that memory. It shipped alongside Outcomes, which lets agents evaluate their own task performance, and Multiagent Orchestration, both now in public beta.

Sam: The combination of those three is interesting architecturally. Outcomes gives the agent a signal about whether it succeeded. Dreaming uses that signal to update what it remembers and how it weights future decisions. Orchestration lets multiple agents coordinate. That's a fairly complete loop for persistent, improving agentic systems — without needing to retrain the underlying model.

Priya: Though we should be clear that this is learned behavior within the memory layer, not weight updates. The base model isn't changing. The question for practitioners is how well the distillation step actually generalizes and whether the memory consolidation introduces new failure modes.

Sam: There's also a piece of alignment research out of the Anthropic Fellows Program that connects to this. The finding is that training models on texts explaining the rationale behind intended values — before training them on specific behaviors — leads to significantly better adherence to those values in novel situations. Not just in distribution, but in cases the model was never explicitly trained on.

Priya: This is a meaningful result for alignment methodology. The conventional approach has been behavioral: you show the model examples of good behavior, it learns the pattern. What this suggests is that understanding the "why" behind a value gives the model something it can actually reason from when it encounters an edge case.

Sam: It's analogous to how you'd want to train a human engineer. Teaching them principles produces more reliable behavior than just teaching them rules, because rules don't cover every case. The question is whether this scales and whether the effect holds under adversarial pressure — but it's a concrete, empirical finding that safety teams can act on.

Priya: Quick note on DeepSeek: the lab is reportedly targeting a $45 billion valuation in its first external funding round. To put that in context, they came to prominence in early 2025 training competitive frontier models at a fraction of the compute cost of US counterparts. A $45 billion first-round valuation would put them in the same tier as established US frontier labs. The efficiency-first approach has real market validation now.

Sam: On the policy front, two things worth flagging. The Pentagon has added Microsoft, Amazon, NVIDIA, and Reflection AI to its approved supplier list for classified operations — they join OpenAI, xAI, and Google. Reflection AI is notable because they haven't released a publicly available model yet, but they've apparently cleared the bar for classified use. Anthropic's government role is described as being reconsidered, which is a conspicuous data point given the Colossus deal and rate limit expansions they're focused on.

Priya: And separately, the US and China are reportedly exploring formal bilateral talks on AI, per the Wall Street Journal. This would be the first structured diplomatic engagement between the two countries focused specifically on AI governance. Given where export controls and chip restrictions stand, even getting to a formal table would be significant. Nothing is confirmed yet, but it's worth watching.

Sam: One more technical item before we look ahead — Google announced at Cloud Next that GKE is getting an Agent Sandbox built on gVisor kernel isolation. It can spin up 300 sandboxes per second, which is the right order of magnitude for dynamic agent workloads. The companion announcement is Hypercluster, which manages up to a million chips from a single Kubernetes control plane.

Priya: The sandbox piece matters for anyone thinking seriously about deploying agents in production. Agents that execute code need real isolation — not container-level, but kernel-level. gVisor gives you that with much lower overhead than a full VM. The fact that this is open-source and being proposed as a Kubernetes SIG subproject means it could become infrastructure that the whole ecosystem builds on.

Sam: So stepping back — today feels like a day where infrastructure is doing a lot of the talking. MRC flattens the network layer for massive GPU clusters. Anthropic absorbs 220,000 GPUs to address an immediate capacity constraint. Google is building out both the compute management layer and the secure execution layer for agents.

Priya: The thread I keep pulling on is: what does this level of infrastructure investment unlock that wasn't possible six months ago? The answer seems to be longer-running, more persistent agents with more reliable memory — and inference that's fast and cheap enough to use those agents interactively at scale.

Sam: The questions I'm watching: does MRC get adopted outside of Stargate, or does it stay proprietary in practice despite being open-source? How quickly does the Colossus capacity translate into observable improvements in Claude's capabilities and availability? And on the alignment research — does the "teach the why first" finding replicate across different model families?

Priya: And on the geopolitical side, whether the US-China discussions actually materialize into formal talks is worth following closely. The structural tensions around compute access and safety standards haven't gone away.

Sam: That's going to do it for today. Thanks for listening to AI Revolution. Show notes and links to everything we covered today are at cleartext.fm. We'll be back tomorrow.

Priya: See you then.


AI Revolution is an automated daily podcast covering AI advancements. Generated 2026-05-07.

Sources: MIT Technology Review, VentureBeat AI, The Verge, Wired, TechCrunch AI, Ars Technica, IEEE Spectrum, The Decoder, The Gradient, Hugging Face Blog, Google AI Blog, AI News, SemiAnalysis, and The Register.