AI Revolution – June 03, 2026
Wednesday, June 3, 2026·10:06
Show Notes
Daily AI briefing — frontier models, research, and infrastructure.
Episode Summary
Today's episode covers 8 stories across 5 topic areas, including: Build 2026: Microsoft tops Google in image generation while playing catch-up on reasoning; Anthropic scales Claude Mythos to critical infrastructure in 15+ countries; Microsoft offers devs a better way to control AI agent behavior.
Stories Covered
• Model Release
Build 2026: Microsoft tops Google in image generation while playing catch-up on reasoning
The Decoder · Jun 03 · Relevance: ████████░░ 8/10
Why it matters: Microsoft releasing seven in-house AI models including its first reasoning model signals a major shift away from OpenAI dependency and introduces a new competitive vector in the frontier model race. The new tuning method and autonomous background agent expand the agentic capability surface that engineering teams will need to evaluate.
- Microsoft announced seven new in-house AI models at Build 2026, including its first reasoning model
- Microsoft claims its image generation model outperforms Google's current offerings on benchmarks
- A new autonomous background agent and a novel tuning method were also introduced
Perplexity announces hybrid AI system that decides what runs locally or in the cloud
The Decoder · Jun 03 · Relevance: ███████░░░ 7/10
Why it matters: An intelligent orchestration layer that dynamically routes inference between local and cloud models based on task characteristics is a significant architectural pattern — it addresses latency, privacy, and cost trade-offs simultaneously and could set a template for hybrid edge-cloud AI deployment.
- Perplexity's orchestrator automatically determines whether a task should run on a local model or a cloud model
- The system combines on-device AI with powerful cloud backends in a single user-facing interface
- The approach has direct implications for privacy-sensitive workloads that benefit from local execution
Holo3.1: Fast & Local Computer Use Agents
Hugging Face Blog · Jun 02 · Relevance: ██████░░░░ 6/10
Why it matters: A fast, locally-runnable computer use agent model is technically significant because it lowers the barrier to deploying autonomous desktop automation without cloud dependency, expanding the attack surface and deployment scenarios for agentic AI on end-user hardware.
- Holo3.1 is a computer use agent model optimized for speed and local execution
- Released via Hugging Face, suggesting openly available model weights
- Targets on-device agentic automation without requiring cloud inference
• Applications
Anthropic scales Claude Mythos to critical infrastructure in 15+ countries
TechCrunch AI · Jun 02 · Relevance: ████████░░ 8/10
Why it matters: Deploying an AI model (Claude Mythos Preview) at scale to hunt vulnerabilities in power, water, healthcare, and communications infrastructure across 150 organizations is a landmark real-world security application — over 10,000 serious flaws already found. This represents a new operational model for AI-assisted critical infrastructure defense.
- Project Glasswing now includes 150 partner organizations across 15+ countries
- Partners have collectively found over 10,000 serious vulnerabilities in critical infrastructure
- Target sectors include power, water, healthcare, and communications, collectively serving up to 100 million people
OpenAI expands Codex with role-specific plugins to build a general-purpose app for non-developers
The Decoder · Jun 02 · Relevance: ██████░░░░ 6/10
Why it matters: OpenAI's pivot of Codex toward non-developer white-collar workers — with 5M weekly users and 1-in-5 being non-developers — signals a strategic expansion from developer tooling to general enterprise automation, with role-specific AI agents for finance, sales, and analytics entering production workflows.
- Codex has five million weekly users; 20% are non-developers, a segment growing 3x faster than the developer segment
- Six new role-specific plugins launched covering data analytics, sales, creative, product design, equity investing, and investment banking
- OpenAI is positioning Codex as a general-purpose work application, not just a coding assistant
• Research
Microsoft offers devs a better way to control AI agent behavior
TechCrunch AI · Jun 02 · Relevance: ███████░░░ 7/10
Why it matters: Portable policy files for defining agent behavior across developer, compliance, and security teams address one of the core governance gaps in agentic AI deployment. This specification could become a foundational primitive for enterprise AI safety and auditability.
- Microsoft introduced a specification enabling portable policy files to govern AI agent behavior
- Policy control spans developer, compliance, and security team roles
- Announced at Build 2026 as part of Microsoft's broader agentic AI push
• Industry
Coralogix raises $200M on bet that someone needs to watch the AI agents
TechCrunch AI · Jun 03 · Relevance: ███████░░░ 7/10
Why it matters: A $200M Series F at a $1.6B valuation for an AI agent observability platform reflects growing enterprise demand for monitoring infrastructure as autonomous agents proliferate in production environments. This funding validates the observability/monitoring layer as a critical and distinct segment of the AI stack.
- Coralogix raised $200M in a Series F round, valuing the company at $1.6 billion
- The raise comes less than a year after its previous funding round
- The company is positioning itself as the monitoring and observability layer for AI agents
• Policy
Publishers will be able to opt out of AI Search, thanks to new regulation
TechCrunch AI · Jun 03 · Relevance: ███████░░░ 7/10
Why it matters: UK regulators mandating a publisher opt-out mechanism for generative AI search sets a precedent for data sourcing governance that could reshape training data pipelines and web crawling practices globally once the rollout extends beyond the UK. This is a concrete regulatory action with structural implications for AI search products.
- UK regulators are requiring Google to provide a tool for publishers to opt out of generative AI search features
- The opt-out mechanism will be piloted in the UK before a global rollout
- This is a regulatory mandate, not a voluntary commitment, giving it real enforcement weight
Further Reading
- Build 2026: Microsoft tops Google in image generation while playing catch-up on reasoning — The Decoder
- Anthropic scales Claude Mythos to critical infrastructure in 15+ countries — TechCrunch AI
- Microsoft offers devs a better way to control AI agent behavior — TechCrunch AI
- Coralogix raises $200M on bet that someone needs to watch the AI agents — TechCrunch AI
- Perplexity announces hybrid AI system that decides what runs locally or in the cloud — The Decoder
- Publishers will be able to opt out of AI Search, thanks to new regulation — TechCrunch AI
- Holo3.1: Fast & Local Computer Use Agents — Hugging Face Blog
- OpenAI expands Codex with role-specific plugins to build a general-purpose app for non-developers — The Decoder
Full Transcript
Sam: Microsoft just shipped seven in-house AI models at Build 2026, including their first reasoning model. That's worth pausing on. For years, Microsoft's AI strategy was essentially "we fund OpenAI, we integrate OpenAI." Now they're building their own frontier-class models, and on image generation specifically, they're claiming benchmark results that beat Google. Meanwhile, they introduced a portable policy specification for governing agent behavior — which honestly might matter more than the models themselves. We'll get into all of it.
Priya: Welcome to AI Revolution for Wednesday, June 3rd, 2026. I'm Priya Nair.
Sam: And I'm Sam Kim.
Priya: We've got a packed show today. Microsoft's big Build announcements — both the models and the governance tooling. Anthropic scaling their vulnerability hunting program to critical infrastructure in fifteen-plus countries. Perplexity's hybrid local-cloud orchestration system. A new UK regulation forcing opt-out mechanisms for AI search. And a couple of quick hits on observability funding and OpenAI's Codex expansion. Let's get into it.
Sam: So, Microsoft Build 2026. Seven new in-house models. Let's talk about what that actually means architecturally. Microsoft has been building smaller models under the Phi brand for a while (Phi-2, Phi-3, Phi-4), efficient models that punched above their weight class. But a reasoning model is a different beast. Reasoning models like OpenAI's o-series or DeepSeek's R1 work through a chain of thought before answering: they spend more compute at inference time, essentially "thinking longer." Shipping their own reasoning model suggests Microsoft has built the reinforcement learning pipeline and the reward modeling infrastructure to train this kind of system independently.
Priya: And the strategic implications here are significant. Microsoft has poured over thirteen billion dollars into OpenAI. They've built their Copilot products on top of OpenAI's API. Now they're building competitive models in-house. That gives them negotiating leverage, it gives them fallback options, and it means they can optimize models specifically for their product surfaces — Windows, Office, Azure — rather than taking general-purpose models and bolting them on.
Sam: The image generation piece is interesting too. They're claiming they've surpassed Google's current offerings on standard benchmarks. I'd want to see independent evaluations before taking that at face value, but Microsoft has been investing heavily in multimodal research. If these models are genuinely competitive at image generation while also offering a reasoning model, that's a broad portfolio they can cross-optimize.
Priya: The other piece from Build that I think deserves its own segment is the agent policy specification. Microsoft introduced a way to define portable policy files that govern how AI agents behave. Think of it like this: right now, if you deploy an agent in your organization, the rules about what it can and can't do are usually hardcoded into the application logic or configured per-deployment. There's no standard way to say "here are the constraints this agent operates under" in a format that different teams can read, audit, and modify.
Sam: Right. And what Microsoft is proposing is essentially a declarative policy layer. Developer teams define what the agent can do technically. Compliance teams define what it's allowed to do legally. Security teams define what it should never do from a risk perspective. And these policies live in portable files that travel with the agent across environments.
Priya: This matters because the governance gap is one of the biggest blockers to enterprise agent deployment. If you're a large organization and you want to deploy autonomous agents that can take actions — book meetings, modify databases, send emails on behalf of employees — you need to be able to express and enforce guardrails in a way that's auditable. A specification for this is a foundational primitive. It's like how RBAC — role-based access control — became standard for human users. We need an equivalent for agents.
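A minimal Python sketch of the declarative, multi-team idea follows. The JSON schema, the field names (`allowed_actions`, `denied_actions`), and the deny-overrides merge rule are hypothetical illustrations; the episode doesn't describe the format of Microsoft's actual specification.

```python
import json
from dataclasses import dataclass

# Hypothetical portable policy documents, one per team. This schema is an
# illustrative assumption, not Microsoft's published specification.
POLICY_FILES = {
    # What the agent can technically do.
    "developer": '{"allowed_actions": ["calendar.book", "email.send", '
                 '"email.send_external", "db.update", "db.drop"]}',
    # What it is legally allowed to do.
    "compliance": '{"denied_actions": ["email.send_external"]}',
    # What it should never do from a risk perspective.
    "security": '{"denied_actions": ["db.drop"]}',
}


@dataclass
class Decision:
    allowed: bool
    reason: str


def load_policies(files: dict[str, str]) -> dict[str, dict]:
    """Parse each team's JSON policy text into a dict."""
    return {team: json.loads(text) for team, text in files.items()}


def evaluate(action: str, policies: dict[str, dict]) -> Decision:
    """Deny-overrides: any team's explicit deny wins. Otherwise the action
    must appear in some team's allow list; anything else is denied."""
    for team, policy in policies.items():
        if action in policy.get("denied_actions", []):
            return Decision(False, f"denied by {team} policy")
    for team, policy in policies.items():
        if action in policy.get("allowed_actions", []):
            return Decision(True, f"allowed by {team} policy")
    return Decision(False, "not allowed by any policy (default deny)")


if __name__ == "__main__":
    policies = load_policies(POLICY_FILES)
    for action in ["calendar.book", "db.drop", "email.send_external", "shell.exec"]:
        decision = evaluate(action, policies)
        print(f"{action}: allowed={decision.allowed} ({decision.reason})")
```

Here the developer file permits `db.drop`, but the security file's deny overrides it. Deny-overrides with default deny is one common way to combine rules from several owners; whether Microsoft's spec merges policies this way isn't stated.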
Sam: Whether Microsoft's specific proposal becomes the standard is an open question. But someone needed to propose one, and doing it at Build with their developer ecosystem behind it gives it momentum.
Priya: Let's move to Anthropic. Project Glasswing has expanded to 150 partner organizations across fifteen-plus countries, and they're using Claude Mythos to hunt vulnerabilities in critical infrastructure — power grids, water systems, healthcare networks, communications.
Sam: The numbers here are striking. Over ten thousand serious vulnerabilities found so far. And we're talking about infrastructure that collectively serves up to a hundred million people. What Mythos is doing is essentially automated security auditing at a scale that human red teams can't match. It's scanning codebases, network configurations, system architectures, and identifying exploitable flaws.
Priya: What makes this different from traditional vulnerability scanning tools? Those already exist — Nessus, Qualys, dozens of others.
Sam: Good question. Traditional scanners work from known vulnerability signatures. They check if you're running a version of software with a known CVE, or if common misconfigurations are present. What a frontier model like Mythos can do is reason about novel vulnerability patterns. It can look at how systems interact and identify logical flaws that wouldn't match any existing signature. It can understand application logic, not just configuration state. The ten thousand flaws found — I'd be very curious to know what fraction of those were already catalogued versus genuinely novel findings.
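For contrast, the signature-based approach Sam describes boils down to comparing installed versions against a list of known-vulnerable ranges. A minimal sketch follows; the package names, versions, and advisory IDs are made up for illustration and do not correspond to real CVEs.

```python
from dataclasses import dataclass


def parse_version(v: str) -> tuple[int, ...]:
    """Turn '1.10.4' into (1, 10, 4) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in v.split("."))


@dataclass(frozen=True)
class Signature:
    advisory_id: str  # hypothetical identifier, not a real CVE
    package: str
    fixed_in: str     # first non-vulnerable version


# Hypothetical signature database.
SIGNATURES = [
    Signature("EXAMPLE-2026-0001", "scada-gateway", "3.2.0"),
    Signature("EXAMPLE-2026-0002", "pump-controller", "1.10.4"),
]


def scan(inventory: dict[str, str]) -> list[str]:
    """Return advisory IDs for packages installed at a version below fixed_in."""
    findings = []
    for sig in SIGNATURES:
        installed = inventory.get(sig.package)
        if installed is not None and parse_version(installed) < parse_version(sig.fixed_in):
            findings.append(sig.advisory_id)
    return findings


if __name__ == "__main__":
    # scada-gateway is below its fixed version; pump-controller is exactly at it.
    print(scan({"scada-gateway": "3.1.7", "pump-controller": "1.10.4"}))
```

Anything not in the database, such as a logic flaw in how two systems interact, is invisible to this kind of check. That is the gap Sam suggests a reasoning model like Mythos might address.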
Priya: The scale of the deployment is also notable. Fifteen-plus countries, critical infrastructure sectors. Anthropic is essentially building a track record of real-world security impact that goes well beyond benchmark performance. If you're evaluating whether AI can meaningfully improve your security posture, this is the most concrete evidence we've seen at this scale.
Sam: And it establishes an operational model where an AI company partners directly with infrastructure operators. That's different from selling an API — it's closer to a managed security service powered by frontier AI.
Priya: Let's talk about Perplexity's hybrid orchestration system. They've announced an architecture where an intelligent layer decides whether a given task should run on a local model on your device or get routed to a powerful cloud model.
Sam: This is an architectural pattern I think we'll see a lot more of. The core idea is that not every query needs a two-trillion-parameter model running in a data center. If you ask "what time is it in Tokyo," a small on-device model handles that fine. If you ask "analyze this research paper and compare it to these three prior works," you need the big cloud model. Perplexity is building an orchestrator that makes that routing decision automatically.
Priya: The privacy angle is the one that jumps out to me. If sensitive queries — anything involving personal data, medical information, financial details — can be handled locally without ever leaving the device, that's a meaningful architectural improvement for privacy. You get the capability of cloud AI when you need it, but you keep sensitive inference local by default.
Sam: The technical challenge is making the routing decision well. You need a lightweight classifier that can assess task complexity in real time and route appropriately. If it over-routes to local models, users get degraded quality. If it over-routes to cloud, you lose the latency and privacy benefits. Getting that routing model right is the key engineering problem here.
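The routing decision can be sketched as a small policy function: sensitive requests stay on-device by default, and only complex, non-sensitive ones go to the cloud. This is a hypothetical keyword heuristic standing in for the lightweight classifier Sam mentions; the term lists, the complexity score, and the 0.5 threshold are assumptions and don't reflect Perplexity's actual implementation.

```python
import re
from enum import Enum


class Target(Enum):
    LOCAL = "local"
    CLOUD = "cloud"


# Hypothetical signals; a production router would more likely use a trained classifier.
SENSITIVE_TERMS = {"diagnosis", "prescription", "salary", "bank", "ssn", "password"}
COMPLEX_TERMS = {"analyze", "compare", "summarize", "research", "explain"}


def _words(query: str) -> list[str]:
    return re.findall(r"[a-z]+", query.lower())


def is_sensitive(query: str) -> bool:
    return bool(SENSITIVE_TERMS & set(_words(query)))


def complexity_score(query: str) -> float:
    """Crude 0..1 estimate: half from query length, half from task verbs."""
    words = _words(query)
    length_part = min(len(words) / 40, 1.0)
    verb_part = 1.0 if COMPLEX_TERMS & set(words) else 0.0
    return 0.5 * length_part + 0.5 * verb_part


def route(query: str, threshold: float = 0.5) -> Target:
    # Privacy first: sensitive queries never leave the device in this sketch.
    if is_sensitive(query):
        return Target.LOCAL
    # Otherwise escalate to the cloud only when the task looks complex enough.
    return Target.CLOUD if complexity_score(query) >= threshold else Target.LOCAL


if __name__ == "__main__":
    for q in ["What time is it in Tokyo?",
              "Analyze this research paper and compare it to these three prior works"]:
        print(route(q).value, "<-", q)
```

Raising `threshold` keeps more traffic local (lower latency and cost, more risk of degraded answers); lowering it sends more to the cloud. That is the over-routing trade-off Sam describes.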
Priya: Quick hit on Coralogix. They just raised two hundred million dollars at a one point six billion dollar valuation, positioning themselves as the monitoring and observability layer for AI agents. This came less than a year after their previous round.
Sam: The speed of the re-raise tells you something about how fast investor demand is growing for agent infrastructure. Every organization deploying autonomous agents needs to know what those agents are doing, when they fail, and how they're performing. Traditional APM tools weren't designed for this. Agents have non-deterministic behavior, they make multi-step decisions, they interact with external systems. Monitoring that requires purpose-built tooling.
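As a rough illustration of what "purpose-built" might mean, the sketch below records each step of a multi-step agent run as a structured span with timing and outcome, then lists the failures. The `Span` and `AgentTrace` classes and their fields are hypothetical and are not Coralogix's product or API.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Span:
    step: int            # position of the step within the run
    tool: str            # which tool or action the agent invoked
    duration_s: float
    ok: bool
    error: Optional[str] = None


@dataclass
class AgentTrace:
    run_id: str
    spans: list[Span] = field(default_factory=list)

    @contextmanager
    def step(self, tool: str) -> Iterator[None]:
        """Time one agent step; record success or the exception, then re-raise."""
        start = time.perf_counter()
        index = len(self.spans)
        try:
            yield
        except Exception as exc:
            self.spans.append(Span(index, tool, time.perf_counter() - start, False, repr(exc)))
            raise
        else:
            self.spans.append(Span(index, tool, time.perf_counter() - start, True))

    def failures(self) -> list[Span]:
        return [s for s in self.spans if not s.ok]


if __name__ == "__main__":
    trace = AgentTrace("run-001")
    with trace.step("search_calendar"):
        pass  # the agent's tool call would go here
    try:
        with trace.step("send_email"):
            raise TimeoutError("SMTP timeout")
    except TimeoutError:
        pass  # an agent might retry or escalate here
    print([(s.step, s.tool, s.ok) for s in trace.spans])
```

Because agent runs are non-deterministic, a per-run record of which step failed and why is what makes a failure reproducible after the fact.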
Priya: It validates that the AI stack is developing distinct layers — model providers, orchestration frameworks, policy governance like what Microsoft announced, and now observability. Each layer is becoming its own market.
Sam: Now, the UK regulation story. UK regulators are requiring Google to provide a tool that lets publishers opt out of having their content used in generative AI search features. The tool will be piloted in the UK before rolling out globally.
Priya: What's significant here is that it's a mandate, not a voluntary commitment. Google has offered various robots.txt directives and opt-out mechanisms before, but those were discretionary. A regulatory requirement with enforcement behind it is structurally different. It means publishers have a legally backed right to control whether their content gets synthesized in AI-generated search answers.
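For comparison, the voluntary robots.txt mechanism Priya mentions works at the crawler level: a publisher lists per-crawler rules, and a compliant crawler checks them before fetching. The sketch uses Python's standard `urllib.robotparser`. The crawler names are hypothetical, and the new UK-mandated tool is a separate mechanism whose format the episode doesn't describe.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt. "ExampleAIBot" and "ExampleSearchBot" are
# hypothetical crawler names, not real Google user agents.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /articles/

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    url = "https://news.example.com/articles/grid-outage"
    for agent in ("ExampleAIBot", "ExampleSearchBot"):
        # can_fetch applies the first matching User-agent group, else the "*" group.
        print(agent, rp.can_fetch(agent, url))
```

Honoring these rules is up to the crawler; nothing in the check itself enforces them. That is the structural difference Priya draws between a discretionary directive and a mandate with enforcement behind it.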
Sam: The practical question is how this affects AI search quality. If major publishers opt out — news organizations, scientific publishers, reference sites — the knowledge available to AI search gets narrower. There's a real tension between publisher rights and the utility of AI search, and this regulation starts drawing that line.
Priya: And it will likely influence other jurisdictions. The EU has been moving in a similar direction. If the UK pilot works, expect this pattern to spread.
Sam: Two more quick items. OpenAI expanded Codex with six role-specific plugins covering data analytics, sales, creative work, product design, equity investing, and investment banking. They're reporting five million weekly users, twenty percent of them non-developers, and that non-developer segment is growing three times faster than the developer base. OpenAI is clearly repositioning Codex from a coding assistant to a general-purpose work tool.
Priya: And on the open-source side, Holo3.1 dropped on Hugging Face: a computer use agent model optimized for fast, local execution. It can drive desktop automation without cloud dependency, and the Hugging Face release points to open weights that anyone can deploy. Worth watching as the on-device agent space matures.
Sam: Looking ahead, I think today's stories paint a picture of the infrastructure and governance layers catching up to model capabilities. We've had powerful models for a while. Now we're seeing the policy specifications, the observability platforms, the hybrid routing architectures, and the regulatory frameworks that make deploying these models in production actually feasible.
Priya: The Microsoft governance spec and the Anthropic infrastructure deployment are two sides of the same coin. One is about controlling what agents do. The other is about using AI capability at real-world scale on systems that matter. Both signal that we're moving past the "can AI do this" phase and into the "how do we deploy AI responsibly and at scale" phase.
Sam: I'll be watching whether Microsoft's policy spec gets adoption outside the Microsoft ecosystem. If it stays proprietary, it's a product feature. If other platforms adopt it, it becomes infrastructure. That distinction matters.
Priya: And on the Anthropic side, I want to see a breakdown of those ten thousand vulnerabilities. How many were novel versus known? How many have been remediated? The aggregate number is impressive, but the details will tell us how much AI is actually advancing defensive security versus accelerating existing workflows.
Sam: That's our show for today. Links to everything we discussed are at cleartext.fm.
Priya: Thanks for listening. We're back tomorrow.
AI Revolution is an automated daily podcast covering AI advancements. Generated 2026-06-03.
Sources: MIT Technology Review, VentureBeat AI, The Verge, Wired, TechCrunch AI, Ars Technica, IEEE Spectrum, The Decoder, The Gradient, Hugging Face Blog, Google AI Blog, AI News, SemiAnalysis, and The Register.