Agent Governance Is Not LLM Governance

By Vlad Luzin

May 28, 2026 - 6 min read

Split editorial illustration contrasting LLM observability dashboards with agent ownership, access controls, guest lists, and approval gates.

LLM governance measures behavior. Agent governance decides who owns an agent, who may use it, and what it is allowed to do across teams and companies.


You throw a party. You install cameras everywhere - high-def, night vision, every angle covered. You can tell exactly who said what to whom, how long they stayed, and what they ate.

What you didn't do: check IDs at the door, keep a guest list, or ask who invited the guy in the corner who's been accessing your filing cabinet for the last three hours.

Welcome to enterprise agent governance.


If you ask enterprise teams how they govern their AI agents, they'll show you their observability stack. Traces for every LLM call. Evaluation scores. Token costs. Prompt versioning. Guardrails for PII. This is LLM governance, and it's genuinely mature.

What it can't tell you is whether that agent should have been there in the first place.

Throughout this series I've been circling the same gap from different angles. MCP doesn't solve agent-to-agent. A2A doesn't provide the platform. Frameworks are silos. Sessions don't map. Identity is fragmented. Pipelines don't adapt. Production has six unsolved problems. But there's a distinction underneath all of these that I haven't named explicitly: the industry conflates two fundamentally different kinds of governance, and the confusion means one of them isn't getting built.

LLM governance asks: is this agent performing well? Are responses accurate? What's it costing me? Is it safe? The tooling for this is excellent - LangSmith, Arize, and others provide tracing, evaluation, prompt management, cost tracking, and guardrails. They observe what an LLM does and score how well it does it.

Agent governance asks a different set of questions entirely. Who owns this agent? Who can use it? Who authorized it to interact with agents from another team - or another company? What behavioral boundaries apply beyond "don't say toxic things"? What visibility does it have - is it personal, team-wide, globally accessible? When it participates in a multi-agent conversation, who consented to that?
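To make those questions concrete, here is a minimal sketch of what a registry entry and a usage check might look like. Everything in it is hypothetical: `AgentRecord`, `User`, `Visibility`, and `can_use` are illustrative names, not the API of any framework discussed in this post, and the three visibility levels simply mirror the personal, team-wide, and global options mentioned above.

```python
from dataclasses import dataclass
from enum import Enum


class Visibility(Enum):
    """Hypothetical visibility levels for a registered agent."""
    PERSONAL = "personal"  # only the owner may use the agent
    TEAM = "team"          # anyone on the owning team may use it
    GLOBAL = "global"      # anyone in the registry may use it


@dataclass(frozen=True)
class AgentRecord:
    """Illustrative registry entry: who owns the agent and who can see it."""
    agent_id: str
    owner: str
    team: str
    visibility: Visibility


@dataclass(frozen=True)
class User:
    name: str
    team: str


def can_use(user: User, agent: AgentRecord) -> bool:
    """Answer 'who can use it?' before any LLM call is made."""
    if user.name == agent.owner:
        return True  # owners can always use their own agents
    if agent.visibility is Visibility.GLOBAL:
        return True
    if agent.visibility is Visibility.TEAM:
        return user.team == agent.team
    return False  # PERSONAL agents are owner-only
```

Note that nothing here inspects prompts, responses, or token counts. The check runs on metadata about the agent and the caller, which is exactly the information an observability stack does not hold.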

Two separate organizations with their own agents and a central permission layer mediating cross-organization interaction.

I went through every major framework and platform to see who addresses these questions. The answer is remarkably consistent: nobody.

The OpenAI Agents SDK provides guardrails for input/output validation but has no concept of agent ownership, no registry, no discovery, no cross-team sharing controls. It's explicitly lightweight by design - governance is your problem.

AutoGen/AG2 is the same story: a pure orchestration framework with zero governance primitives. Microsoft built governance separately in Entra Agent ID and the M365 admin center, but that only covers Microsoft 365 Copilot agents - not the open-source framework.

CrewAI has RBAC and agent repositories in its enterprise tier, but no cross-organizational sharing, no framework-agnostic registry, no consent model for agent interactions across team boundaries.

LangSmith recently launched Fleet with ownership and permissions, but visibility is limited to private/workspace - two levels, not the personal/org/global hierarchy enterprises need - and the registry only covers agents wrapped in LangGraph.

Google's Vertex AI Agent Builder ties agent identity to IAM with organization-scoped principals, but cross-org sharing requires deploying separate instances.

Every platform solves governance within its own walls. None of them address what happens when agents from different platforms, different teams, or different companies need to interact under shared rules.

This isn't a criticism of any individual platform. Each made a reasonable scoping decision. LangSmith was built to govern LLM behavior. The OpenAI SDK was built to be lightweight. CrewAI was built for team-level orchestration. Google's Agent Builder was built for GCP. The problem is that enterprises don't live inside one platform's walls. They have agents on multiple frameworks, from multiple teams, interacting with agents from partners and vendors. The governance question that matters - can these agents interact, under what rules, with whose consent - falls between every platform's boundaries.
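One way to picture the missing piece is a consent ledger that sits outside every platform and answers a single question: have both organizations agreed to let their agents talk? The sketch below is an assumption about how such a layer could work, not a description of an existing product. `ConsentLedger` and its methods are hypothetical names, and requiring consent from both sides is my own design choice for the example.

```python
class ConsentLedger:
    """Hypothetical central record of which orgs consented to interact."""

    def __init__(self) -> None:
        # Each entry (a, b) means: org a consents to interacting with org b.
        self._grants: set[tuple[str, str]] = set()

    def grant(self, granting_org: str, to_org: str) -> None:
        self._grants.add((granting_org, to_org))

    def revoke(self, granting_org: str, to_org: str) -> None:
        self._grants.discard((granting_org, to_org))

    def may_interact(self, org_a: str, org_b: str) -> bool:
        """Agents may interact only if consent exists in both directions."""
        if org_a == org_b:
            return True  # same-org traffic is left to internal rules
        return (org_a, org_b) in self._grants and (org_b, org_a) in self._grants


ledger = ConsentLedger()
ledger.grant("acme", "globex")
one_sided = ledger.may_interact("acme", "globex")   # only acme has agreed
ledger.grant("globex", "acme")
two_sided = ledger.may_interact("acme", "globex")   # both have agreed
```

The ledger knows nothing about which framework either agent runs on. That is the point: the rule lives between platforms rather than inside one of them.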

The analogy I keep coming back to is the difference between instrumentors and connectors. LLM governance platforms are instrumentors - they observe what happens and emit traces. Like security cameras in every room. You can review the footage and see what happened. What agent systems also need are connectors - the access cards on every door. Controls that determine who gets in before anything happens. You need both. You can't substitute one for the other.
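The distinction is easy to see in code. Below is a small sketch with two wrappers around the same agent handler: `instrument` records what happened after the call, while `gate` decides whether the call happens at all. The names `instrument`, `gate`, and `echo_agent` are hypothetical and exist only for this illustration.

```python
from typing import Callable

# An agent handler takes (caller, message) and returns a reply.
Handler = Callable[[str, str], str]


def instrument(handler: Handler, trace: list[dict]) -> Handler:
    """Instrumentor: lets every call through and records it afterwards."""
    def wrapped(caller: str, message: str) -> str:
        reply = handler(caller, message)
        trace.append({"caller": caller, "message": message, "reply": reply})
        return reply
    return wrapped


def gate(handler: Handler, allowed: set[str]) -> Handler:
    """Connector: checks the caller before the handler ever runs."""
    def wrapped(caller: str, message: str) -> str:
        if caller not in allowed:
            raise PermissionError(f"{caller} is not allowed to call this agent")
        return handler(caller, message)
    return wrapped


def echo_agent(caller: str, message: str) -> str:
    """Stand-in for a real agent."""
    return f"echo: {message}"
```

Wrapping `echo_agent` with `instrument` alone means a call from an unknown caller succeeds and merely shows up in the trace. Wrapping it with `gate` means the same call raises `PermissionError` before the agent runs. The two compose, as in `instrument(gate(echo_agent, {"alice"}), trace)`, which is the "you need both" arrangement.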

Three horizontal bands showing LLM governance, agent governance, and orchestration infrastructure, with the middle layer emphasized.

The full stack has three layers. LLM governance at the top - tracing, evaluation, prompts, costs, compliance. Agent governance in the middle - registry, ownership, visibility, sharing rules, behavioral boundaries, cross-org consent. Orchestration at the bottom - routing, delivery, crash recovery, loop prevention, framework adapters. The top layer is mature. The middle and bottom layers are where every problem in this series lives.

These layers are complementary, not competing. Without LLM governance, you don't know if your agents are performing well. Without agent governance, you don't know if your agents should be performing at all. Enterprises deploying multi-agent systems need both. Right now, they have one and are pretending it covers the other.

Next week, I'm going to talk about what we've been building.