Don’t Confuse Tools with Teams

Alex Cook

AI agents feel magical.

Point one at a prompt and it scaffolds a service, writes documentation, and offers a pull request before you finish your coffee.

Point one straight at production and you may unleash a full-blown chaos monkey.

Our team leans on agents every day to work faster and smarter. Every engineer and product owner relies on them to get more done.

Yet we have also watched an unsupervised agent churn out hundreds of lines of code in hours, and then spent two weeks untangling edge-case bugs, missing security hooks, and brittle tests.

Velocity is real. So is the cleanup.

From “Software 3.0” Hype to Day-One Reality

Andrej Karpathy calls this era Software 3.0. His line sticks with me:

| “Your prompts are programs, and today the hottest new programming language is English.”

Natural-language code lets a junior dev ask for a feature and watch a runnable version appear in minutes.

That speed is thrilling, but stability still wins the day.

A prompt can give us a runnable draft in minutes, yet production code still needs version control, test coverage, monitoring, and a plan for when code drifts.

The agent hands us acceleration; the team supplies the engineering discipline that turns speed into trust at scale.

The McKinsey Mesh Meets Ground Truth

At the enterprise level, you’ll encounter concepts like McKinsey’s “agentic mesh”: autonomous agents coordinating across departments, all linked by secure orchestration frameworks. Interesting idea. In practice, it can widen the gap between engineering, product, and the business, especially as velocity expectations climb.

It’s easy to promise autonomous insight on a slide. It’s harder at 2 a.m., when the team needs to trace an agent’s decision path, patch a failing API, and reassure a customer that data is safe. PowerPoint doesn’t deal with downtime.

What We Do Instead

Our team pairs with agents rather than giving them the keys. Four habits keep us honest:

  • Supervised pairing. Agents draft code, but nothing ships without a human in the loop. Every pull request goes through review—with AI-generated diffs clearly tagged, so reviewers know where to look closer.

  • Tight loops, not two-week drifts. We time-box agent-driven spikes to a single day. If the output drives a real outcome, it earns a place in the backlog. If not, we archive it. No zombie experiments quietly creeping toward production.

  • Outcome demos. AI can crank out code by the kilo. But what matters is solving real problems. We highlight demos that move the needle—like knocking out a P0 or reducing churn—not just lines of code.

  • Shared learning. Every experiment ends with a write-up in Notion. What worked? What didn’t? What should the next person try—or avoid? We make failure teachable and success repeatable.
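The first two habits are concrete enough to sketch as code. What follows is a minimal illustration, not our actual tooling: the `PullRequest` shape, the `ai-generated` label name, and the function names are all hypothetical, and a real setup would read this data from your code host rather than a dataclass.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

AI_LABEL = "ai-generated"             # hypothetical label name for tagged diffs
SPIKE_TIME_BOX = timedelta(days=1)    # "a single day" from the habits above


@dataclass
class PullRequest:
    """Hypothetical, simplified view of a pull request."""
    title: str
    labels: set[str] = field(default_factory=set)
    human_approvals: int = 0


def can_merge(pr: PullRequest) -> bool:
    """Nothing ships without at least one human approval."""
    return pr.human_approvals >= 1


def needs_closer_look(pr: PullRequest) -> bool:
    """AI-tagged diffs are flagged so reviewers know where to look closer."""
    return AI_LABEL in pr.labels


def spike_expired(started_at: datetime, now: datetime) -> bool:
    """True once an agent-driven spike has outlived its one-day time box."""
    return now - started_at > SPIKE_TIME_BOX
```

A check like `can_merge` belongs in whatever gate already guards your main branch. The point is that the rule is enforced by the pipeline and not left to memory.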

Guardrails That Matter

In April, the hot new framework was A2A. The protocol and the research behind it promised an army of autonomous agents that could solve any workflow. We watched, asked questions, and passed for now. Why ship novelty when the backlog is full of revenue-linked problems? There is plenty of FOMO in AI. Discipline beats novelty nine times out of ten.

When a junior dev asks, “The agent wrote it—why can’t we ship it?” the coaching check is simple:

| If the agent went offline, could you still solve the customer pain that crosses services, systems, people, and stakeholders?

If the answer is no, we are not ready to merge.

Two Questions for Fellow Builders

  1. Code to Production: What single signal tells you an agent-generated change is truly ready to ship?

  2. Speed vs. Safety: How do you keep experimentation fast without turning your team into a cleanup crew?

We’re working in the space between fast prompts and big visions—turning them into software people can trust. If you’ve found a practice that balances acceleration with ownership, drop it below. Let’s trade notes on what actually works.

Author(s)

Alex Cook

Senior Director, AI Engineering