Feed

AI automation worth watching.


Curated external content on applied AI — videos, articles, tools, and threads we find worth studying. Each entry includes our take on why it matters.

April 7, 2026
Article · Industry

Anthropic Signs Largest-Ever Compute Deal With Google and Broadcom

Anthropic announced a multi-gigawatt TPU commitment with Google and Broadcom coming online from 2027, alongside a revenue milestone: $30B+ annualized run rate and over 1,000 enterprise customers each spending more than $1M per year. The custom silicon partnership signals Anthropic is building infrastructure depth to match its model ambitions rather than relying on shared cloud capacity. For enterprise procurement teams, the headline that matters most is the customer base — a thousand $1M+ accounts suggests Claude has crossed from pilot to production for a meaningful slice of the market.

Tool · AI Agents

Freestyle: Sandboxes Built for Coding Agents

Freestyle launches isolated cloud sandboxes purpose-built for coding agents — each sandbox is a fresh Linux environment where agents can read, write, and execute code, then be torn down cleanly. Unlike wrapping a local machine in a container, Freestyle is designed from the start for agent-native workloads: parallel runs, reproducible state, and programmatic lifecycle control. As enterprises move from experimenting with AI coding assistants to running them in production pipelines, sandboxing stops being a nice-to-have and becomes a prerequisite for safe, auditable automation.

Article · Open Models

Google's Official App for Running Gemma 4 Locally on iPhone

Google released an official iPhone app that runs Gemma 4 models locally — no cloud, no API key, no data leaving the device. Simon Willison's hands-on review finds the 2.54GB E2B model "fast and genuinely useful" for image Q&A, audio transcription, and basic tool-calling demos. The missing piece is persistent conversation logs, making it better as a testbed than a daily driver. For teams evaluating on-device AI, this is the clearest demonstration yet that capable multimodal models fit in a phone and run without infrastructure overhead.

Article · Industry

OpenAI's CFO Sidelined as Altman Pushes $600B Spend and Fast IPO

Reporting this week describes a rift at OpenAI's executive level: CEO Sam Altman is pushing $600B in five-year capital expenditure and an aggressive IPO timeline, while CFO Sarah Friar has reportedly raised concerns about the burn rate and public offering timing — and has since been excluded from key financial meetings. For business leaders evaluating OpenAI as a strategic vendor, leadership coherence matters as much as model capability. A CFO sidelined from financial planning at a company of this scale is a governance signal worth monitoring before signing long-term contracts.

April 6, 2026
Article · AI Agents

Eight Years of Wanting, Three Months of Building: What AI Actually Changes

A developer spent eight years unable to build a product they wanted—then shipped it in three months with AI coding agents. The honest postmortem is worth reading: cheap refactoring made it easy to defer hard architectural decisions, creating a kind of productive procrastination that only human judgment could resolve. For teams evaluating AI development workflows, this captures something real—AI dramatically lowers the cost of iteration, but the judgment calls that define product quality still land on the human side.

Article · Data

Heaviside: A Physics Foundation Model 800,000x Faster Than Traditional Solvers

Arena Physica released Heaviside, a foundation model for electromagnetic simulation that predicts field behavior of arbitrary geometries in 13 milliseconds—compared to hours with traditional finite-element solvers. Unlike LLMs, this is a physics-native model trained to solve differential equations rather than predict tokens. For engineering teams in hardware, antenna design, or RF systems, this points toward a class of specialized AI that doesn't make headlines the way GPT releases do but quietly changes what's computationally feasible.

Article · Workflow

Japan Is Proving Physical AI Is Ready for the Real World

Japan is deploying AI-powered robots in warehouses, care facilities, and construction sites to address structural labor shortages—and the results are moving from experimental to operational. What makes this notable is the enterprise adoption angle: companies aren't piloting physical AI in controlled conditions anymore; they're integrating it into real workflows where the alternative is unfilled headcount. For organizations watching AI adoption curves, Japan's labor market pressure is accelerating what voluntary adoption elsewhere has not.

April 5, 2026
Article · Dev Tools

Simon Willison: Agentic Engineering Is a Deep Discipline, Not Vibe Coding

Simon Willison draws a sharp line between vibe coding (hands-off, don't look at the code, prototype for fun) and agentic engineering (professional software built with AI agents, reviewed, tested, deployed to production). His point: getting good results from coding agents requires every inch of your engineering experience. It's not easier — it's a different kind of hard. The art is knowing which problems are one-prompt fixes and which are deeper. This distinction matters for anyone evaluating whether AI actually improves their team's output or just makes them feel productive.

Article · Workflow

The New Burnout: Running 4 AI Agents in Parallel, Wiped Out by 11am

Simon Willison describes a pattern many engineers are quietly experiencing: running multiple coding agents in parallel is cognitively devastating. "By 11am, I am wiped out." The bottleneck isn't the AI — it's human attention. Engineers are losing sleep setting off agents before bed. The estimation problem is equally disorienting: 25 years of experience telling you something takes two weeks, but now it might take 20 minutes. Old intuition is broken, new intuition hasn't formed yet. Anyone managing AI-assisted teams needs to take this cognitive load seriously.

Article · AI Agents

Anthropic Acquires Biotech AI Startup Coefficient Bio for ~$400M

Eight months after founding, Coefficient Bio was acquired by Anthropic for roughly $400 million—its team joining Anthropic's Healthcare Life Sciences group. The speed and price signal a deliberate vertical expansion strategy: frontier model labs are moving beyond general-purpose APIs toward domain-specific expertise in regulated industries. For enterprise buyers in healthcare, biotech, or life sciences, this is a meaningful data point—Anthropic is building toward the problem, not just providing infrastructure for others to solve it.

Article · AI Agents

A 1.15GB AI Agent That Runs on an iPhone: PrismML's Bonsai 8B

PrismML (Caltech) released Bonsai 8B—an 8-billion-parameter model compressed to 1.15GB via 1-bit quantization, designed to run persistently on mobile hardware including iPhones. The practical implication is architectural: AI agents are shifting from cloud services you call to persistent infrastructure embedded in the device itself. For teams designing AI deployment strategy, the boundary between cloud and local inference is now a deliberate design choice, not a hardware constraint—with direct consequences for data privacy, latency, and cost.

Article · AI Agents

A Practical Breakdown of What Makes a Coding Agent Work

Sebastian Raschka breaks down the core architectural components of coding agents—retrieval, tool use, memory, and planning loops—in a way that makes the engineering unusually legible. For teams evaluating or building coding automation, this is a useful framework for asking better vendor questions rather than treating these tools as black boxes. The gap between an "AI assistant" and a "coding agent" is architectural, not magical, and understanding that distinction matters when deciding what to build versus buy.
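The components Raschka names can be seen in miniature in a plan-act-observe loop. The sketch below is a toy illustration of that pattern, not his implementation or any vendor's API; the stub model, tool names, and file contents are all hypothetical.

```python
# Minimal sketch of the planning loop shared by coding agents:
# plan (pick a tool call), act (run it), observe (store the result in memory).
# The "model" here is a deterministic stub; in practice it is an LLM call.

def fake_model(state):
    """Stand-in planner: decides the next action from goal plus memory."""
    if "file contents" not in "".join(state["memory"]):
        return {"tool": "read_file", "arg": "config.py"}
    return {"tool": "done", "arg": None}

TOOLS = {
    "read_file": lambda path: f"file contents of {path}",  # fake retrieval tool
}

def agent_loop(goal, max_steps=5):
    state = {"goal": goal, "memory": []}      # memory: log of observations
    for _ in range(max_steps):                # budget guards against runaway loops
        action = fake_model(state)            # plan: choose the next tool call
        if action["tool"] == "done":
            break
        observation = TOOLS[action["tool"]](action["arg"])  # act
        state["memory"].append(observation)   # observe: remember the result
    return state["memory"]

print(agent_loop("summarize config.py"))
```

Swapping `fake_model` for a real LLM call and `TOOLS` for retrieval, file edits, and test runners recovers the architecture the article describes; the loop and memory structure stay the same.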

Article · Workflow

Dark Factories: StrongDM Ships Code Nobody Reads, Tested by AI-Simulated Users

StrongDM introduced a "dark factory" pattern: AI writes the code, nobody reads the code, and swarms of AI-simulated employees test it 24/7 at $10K/day in tokens. They even built simulated versions of Slack, Jira, and Okta to avoid rate limits. The fascinating part — this is security software, not a toy. If this pattern proves viable, the role of the engineer shifts entirely from writing and reviewing code to designing test strategies and defining quality expectations. Worth watching closely.

Article · AI Agents

Microsoft Has at Least 9 Products Named 'Copilot'

Microsoft has attached the "Copilot" name to at least nine distinct products—from GitHub Copilot to Teams Copilot to Azure Copilot—each with different capabilities, pricing models, and deployment requirements. This isn't just a marketing mess; for enterprise procurement teams, it creates genuine due diligence complexity when the vendor's own naming makes it unclear what you're actually buying. If your organization is evaluating Microsoft's AI portfolio, the first task is mapping which Copilot product maps to which workflow—before any pricing conversation begins.

April 4, 2026
Article · Security

AI Is Transforming Vulnerability Research—and That Cuts Both Ways

Security researcher Thomas Ptacek makes a compelling case that AI coding agents are fundamentally reshaping vulnerability discovery. Models excel here because they encode correlation patterns across massive codebases and understand documented bug classes—exactly the pattern-matching and constraint-solving work that defines exploitation research. For enterprise security teams, the implication is uncomfortable: the same capability that supercharges your red team is now equally available to adversaries, and the asymmetry that once favored defenders is narrowing fast.

Article · AI Agents

llama.cpp Creator: 2026 Is the Year AI Agents Move Local

Georgi Gerganov, creator of llama.cpp, predicts 2026 will be the inflection point where AI agents shift from cloud datacenters to locally run models. His argument: with the right software architecture, sufficient intelligence for most agentic tasks is achievable on-device—you don't need trillion-parameter cloud models. For enterprise IT teams, this points toward a near-term reality where AI agents run on company hardware, which reshapes the calculus around data privacy, latency, and operational cost—while raising new questions about on-premise AI governance.

Article · Dev Tools

Mintlify Replaced RAG with a Virtual Filesystem for Their AI Docs Assistant

Mintlify swapped out RAG for a virtual filesystem in their AI documentation assistant—giving the model a structured navigation interface rather than chunked embeddings retrieved by similarity. The approach addresses a real RAG limitation: when your content is already hierarchically organized, embedding-based retrieval throws away that structure. For teams building internal knowledge tools or documentation bots, this pattern is worth stealing: give the model a "view" of your content that mirrors how a human would browse it.
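The pattern is easy to prototype: expose the docs hierarchy to the model as `ls`/`read` tools instead of embedding chunks. The sketch below is illustrative only; the tree contents and class names are hypothetical, not Mintlify's implementation.

```python
# "Virtual filesystem" pattern: the model navigates the docs tree with
# ls/read tool calls, so hierarchy is preserved instead of being flattened
# into similarity-ranked chunks.

DOCS = {
    "getting-started": {
        "install.md": "Run the installer...",
        "quickstart.md": "First steps...",
    },
    "api": {"auth.md": "API keys are passed via header..."},
}

class VirtualDocFS:
    def __init__(self, tree):
        self.tree = tree

    def ls(self, path=""):
        """List entries at a path, like a directory listing."""
        node = self._resolve(path)
        return sorted(node) if isinstance(node, dict) else []

    def read(self, path):
        """Return a whole page, with its surrounding context intact."""
        node = self._resolve(path)
        if isinstance(node, str):
            return node
        raise IsADirectoryError(path)

    def _resolve(self, path):
        node = self.tree
        for part in filter(None, path.split("/")):
            node = node[part]
        return node

fs = VirtualDocFS(DOCS)
print(fs.ls())                    # top-level sections the model sees first
print(fs.read("api/auth.md"))     # drill down exactly as a human would browse
```

Registered as two tool calls, this gives the model the same "view" of the content a reader gets from the site's navigation sidebar.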

Article · AI Agents

x402 HTTP Payment Protocol for AI Agents Moves to Linux Foundation

Coinbase transferred the x402 HTTP payment protocol to the Linux Foundation, with backing from Google, AWS, Microsoft, Visa, and Mastercard. The protocol enables AI agents to make and receive micropayments natively over HTTP—essentially TCP/IP for the emerging agent economy. When infrastructure heavyweights align behind a neutral governance model like this, it's a reliable signal that the underlying pattern is moving from experimental to foundational plumbing. Agent-to-agent commerce is getting its payment rails.

April 3, 2026
Article · AI Agents

Simon Willison: We've Hit the Agentic Engineering Inflection Point

Simon Willison's conversation on Lenny's Podcast is one of the more honest takes on where we are: 95% of his code now comes from AI, development speed is no longer the bottleneck — evaluation and verification are. Experienced engineers multiply their output; mid-career professionals face the steepest disruption. The practical warning for business leaders: effective agent use demands significant human judgment, and polished AI-generated documentation no longer signals software quality. The real test is whether it works for actual users.

Article · Dev Tools

AMD Releases Lemonade: Open-Source Local LLM Server with GPU and NPU Support

AMD launched Lemonade, an open-source local LLM inference server that leverages both GPU and NPU acceleration — including the NPUs in AMD Ryzen AI chips. It's a direct answer to Nvidia's dominance in local inference, and a practical option for teams wanting to run models on existing hardware without cloud costs. Worth evaluating if your team is looking at private, on-premises AI inference as an alternative to API-based approaches.

Article · AI Agents

Arcee's Trinity-Large-Thinking: Open Frontier Agent Model at 96% Less Cost

Arcee AI released Trinity-Large-Thinking, an Apache 2.0 open-weights reasoning model targeting enterprise agent workflows — ranked #2 on PinchBench just behind Claude Opus 4.6, priced at $0.90 per million output tokens. The model was specifically designed for multi-turn tool calling and long-running agent loops, where stability under extended context matters more than headline benchmark scores. At 96% cheaper than comparable alternatives, it's a serious option for teams whose agent workloads have outgrown comfortable cost limits on frontier models.

Article · Industry

Alibaba and Zhipu AI Close Their Top Models — Open-Source Window May Be Shutting

Alibaba and Zhipu AI are shifting their most capable models to API-only access, ending the open-source phase that made Qwen and similar models attractive for self-hosted deployments. The reason is straightforward: training costs have become too high to sustain community-level support. For teams that built workflows on open Chinese models, this is a signal to audit vendor lock-in risk and check whether the models you rely on are still freely distributable — or moving behind a paywall.

Article · Dev Tools

Cursor 3 Rebuilds the IDE Around Agents, Not Files

Cursor shipped a ground-up rebuild that treats agents as first-class citizens rather than add-ons. A unified sidebar now surfaces all active agents — whether kicked off from desktop, mobile, Slack, GitHub, or Linear — and sessions can move seamlessly between cloud and local environments. This is an architectural bet: the IDE's job is no longer to help you edit files, but to give you oversight of agents that do the editing. Worth watching how teams adapt their review workflows to match.

Article · Open Models

Google Gemma 4: Multimodal Open Models That Run Locally

Google DeepMind released four Apache 2.0-licensed Gemma 4 models (2B, 4B, 31B, and a 26B mixture-of-experts variant), all with native support for images, video, and audio. The smaller 2B and 4B variants use Per-Layer Embeddings to squeeze more capability per parameter — both ran well locally in testing via LM Studio. For teams building AI products, this means multimodal features without cloud API costs or privacy trade-offs are now genuinely within reach on commodity hardware.
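Local servers like LM Studio typically expose an OpenAI-compatible HTTP endpoint, so querying a locally loaded Gemma model needs no SDK. The sketch below is a minimal example under assumptions: the port, path, and model identifier are placeholders to be replaced with whatever your local server reports.

```python
# Sketch: chat completion against a locally served model through an
# OpenAI-compatible endpoint (LM Studio-style). The base URL and model
# name are assumptions; check your local server's settings.
import json
import urllib.request

def build_chat_request(prompt, model="gemma-4-4b",
                       base_url="http://localhost:1234/v1"):
    """Construct the HTTP request for a local chat completion."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},  # no API key locally
    )

def ask_local(prompt):
    """Send the request and return the model's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# ask_local("Summarize this image description.")  # needs a running server
```

Because the request shape matches the hosted OpenAI API, the same client code can be pointed at cloud or local inference by changing only `base_url`, which is exactly the privacy/cost trade-off the entry describes.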

April 1, 2026
Article · Security

Supply Chain Attack Hits Axios: 101M Weekly Downloads at Risk

Attackers exploited a leaked npm token to publish malicious versions of Axios—one of the most widely used JavaScript HTTP libraries—injecting credential-stealing malware and a remote access trojan via a disguised dependency. Simon Willison's detailed breakdown highlights a telling red flag: the rogue releases had no accompanying GitHub releases. For organizations building AI pipelines on Node.js toolchains, this is a reminder that AI adoption doesn't eliminate classical supply chain risk—it amplifies it, since compromised infrastructure can silently corrupt model inputs, exfiltrate API keys, or tamper with agent workflows.

Article · AI Agents

Claude Code Source Leak Reveals Autonomous and Multi-Agent Internals

An accidental packaging error exposed Claude Code's internal implementation, giving developers a rare look under the hood of Anthropic's coding agent. The leaked code reveals planned features including KAIROS (a background autonomous operation mode), a proactive self-initiated task discovery system, and a coordinator mode for orchestrating fleets of sub-agents. For teams evaluating AI developer tooling, this provides unusual transparency into where the category is heading—coding assistants are evolving from chat interfaces into persistent, autonomous agents that can initiate and manage complex workflows without human prompting.

Article · Security

Claude Autonomously Discovers Zero-Day Linux Vulnerabilities

Anthropic researcher Nicholas Carlini demonstrated Claude finding previously unknown security vulnerabilities in widely-deployed Linux software—autonomously, without human guidance. His assessment: "These models are better vulnerability researchers than I am," with capabilities doubling roughly every four months. This is a watershed moment for enterprise security teams: AI systems are no longer just tools for defenders—they are active security researchers whose findings can outpace human experts. Organizations need to factor AI-accelerated vulnerability discovery into their patching cadences and threat models.

Article · Enterprise

OpenAI Closes Funding Round at $852B Valuation

OpenAI has closed its latest funding round, reaching an $852 billion valuation—making it one of the most valuable private companies in history. The scale of capital flowing into frontier AI reflects investor conviction that the current wave of AI capabilities will translate into durable enterprise value. For business leaders evaluating AI vendors, the practical takeaway is market consolidation pressure: the top models are increasingly backed by resources that mid-tier competitors cannot match, making the gap between leading and trailing AI providers wider with each funding cycle.

Article · Dev Tools

The Revenge of the Data Scientist

The claim that foundation models made data scientists obsolete was always premature. Hamel Husain makes the case plainly: the real work in LLM applications—building eval frameworks, validating LLM judges, designing non-trivial test sets—is classical data science under a new name. Teams that skipped the eval infrastructure to ship faster are now discovering that "it feels good" is not a quality signal. If you're building with AI, find someone who knows how to measure it.

March 31, 2026
Article · AI Agents

The Next Shift: From Reasoning AI to Acting AI

Junyang Lin, formerly lead architect of Alibaba's Qwen models, argues the field is crossing a threshold from "reasoning thinking" — where models solve problems in isolation — to "agentic thinking," where models reason while acting in live environments. His view: the competitive advantage in AI will shift from who has the best single model to who can coordinate multi-agent systems effectively. For organizations building AI strategy, this reframes the question from "which LLM should we use?" to "how do we design the workflow around it?"

Article · Security

Claude Code's Auto Mode Trades Determinism for Convenience

Anthropic shipped an "auto mode" for Claude Code that uses an AI classifier to approve or deny tool calls autonomously — no human prompt per action. Simon Willison's critique is pointed: prompt-injection defenses built on AI are non-deterministic by nature, while the real answer is deterministic sandboxing that restricts file access and network calls at the OS level. Teams evaluating agentic coding tools should weigh how each product draws the line between convenience and verifiable containment.

Article · Dev Tools

A Single CLAUDE.md File Cut Output Tokens by 63%

A developer shared a universal CLAUDE.md template that reportedly reduces Claude's output token usage by 63% by instructing the model to skip preambles, avoid restating the task, and use direct formats. For teams running Claude in agentic or batch workloads, this kind of prompt-level tuning translates directly into cost and latency savings — no model changes required. Worth testing against your own usage patterns before treating the number as universal.
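For a concrete sense of what such rules look like, here is a sketch of the kinds of instructions described (skip preambles, don't restate the task, use direct formats). The wording is illustrative, not the shared template, and savings will vary with workload.

```markdown
# CLAUDE.md (illustrative output-economy rules, not the original template)

- Answer directly; skip preambles and closing summaries.
- Do not restate the task or describe what you are about to do.
- Prefer code blocks and bullet fragments over full sentences.
- When editing files, show only the changed hunks, not whole files.
```

Rules like these cut output tokens because the model stops spending them on scaffolding around the answer; measure against your own traffic rather than assuming the reported 63%.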

March 30, 2026
Article · AI Agents

AI Agents Are Making Open Source Practically Valuable

When AI agents can read and modify code on your behalf, source code access stops being a philosophical right and becomes a real capability. This essay argues that proprietary SaaS will increasingly feel like an obstacle — closed systems block agent customization, open source enables it. For teams building AI-assisted workflows, the make-vs-buy calculus is quietly shifting in favor of open alternatives.

Article · Dev Tools

Claude Code Was Silently Resetting Git Repos Every 10 Minutes

A developer documented that Claude Code, running in autonomous loop mode with `--dangerously-skip-permissions`, was silently executing `git reset --hard origin/main` every 10 minutes — destroying uncommitted work without warning. Anthropic closed the issue as "not planned." It's a pointed reminder that agentic tools operating with broad permissions carry real blast radius; defining permission scope before any autonomous run is non-negotiable.

Article · Industry

The Cognitive Dark Forest: Why Builders Are Going Silent

Borrowing from Liu Cixin's sci-fi, this essay argues that AI platforms have created a perverse incentive: every innovation you share publicly becomes training data and market intelligence for the very systems you're competing with. The result is a "cognitive dark forest" where rational builders choose strategic silence over openness. For teams evaluating AI vendors, it raises a harder question — what exactly are you feeding when you use these systems daily?

Article · Enterprise

Meta Trained an AI to Design Concrete Mixes — 43% Faster Strength Gains

Meta trained a Bayesian optimization model called BOxCrete to design concrete mixes for its data center construction using domestically sourced U.S. materials. The AI-optimized mix at their Minnesota site reached structural strength 43% faster than the baseline formula and reduced cracking risk by nearly 10%. The practical lesson: AI-assisted materials optimization is no longer a research project—it's running in production at infrastructure scale. Meta open-sourced the approach, meaning smaller players can adopt the same methodology without the R&D overhead.

March 28, 2026
Article · Dev Tools

Anatomy of the .claude/ Folder — How to Configure Claude Code for Your Team

Claude Code's `.claude/` folder has quietly become one of the most powerful customization surfaces in AI-assisted development. This breakdown covers CLAUDE.md, custom slash commands, skills, and permission settings — the building blocks for making Claude reliably useful across a team. If you're deploying Claude Code at scale and haven't structured your `.claude/` configuration, you're leaving significant capability on the table.
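A rough layout of the pieces listed above looks like the following sketch; the specific file names beyond the components the article mentions are hypothetical examples, so verify paths against Claude Code's current documentation.

```text
repo/
├── CLAUDE.md            # standing project context (root-level convention)
└── .claude/
    ├── settings.json    # permission settings: allowed tools and commands
    ├── commands/        # custom slash commands, one markdown file each
    │   └── review.md    # hypothetical /review command
    └── skills/          # reusable skills the agent can load on demand
        └── deploy/SKILL.md
```

Checking this directory into version control is what makes the configuration a team asset rather than one engineer's local setup.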

Article · Dev Tools

Cursor Applies Real-Time RL to Its AI Composer — Multiple Deploys Per Day

Cursor is applying online reinforcement learning to its Composer model — training on actual user interactions rather than simulated coding environments. The results are measurable: fewer follow-up complaints, lower latency, and faster iteration cycles with multiple model updates shipped per day. It signals where the frontier of AI dev tooling is heading: continuous, production-loop improvement rather than static quarterly fine-tunes.

Tool · AI Agents

jai — A Lightweight Sandbox for Running AI Agents Without Destroying Your Files

AI coding agents are increasingly capable — and increasingly capable of accidentally wiping your home directory. jai is a lightweight Linux sandbox that wraps any agent with copy-on-write filesystem protection using a single command. No Docker, no VM setup. As agent usage moves from experimental to operational, containment tooling like this will become standard practice for teams that care about incident prevention over post-mortems.
