Ian Bigford

Agents are risky - How much access should we give them?

6/21/2025 · 9 min read

A year ago, we couldn't imagine giving agents permission to do things on our behalf. LLMs were simply not good enough at reasoning, tool calling, or planning to handle it. But we are starting to see that change, and it feels like we are in a strange middle ground where agents are not quite good enough to be trusted with long-running jobs, but they are good enough to enhance many workflows. So here, I'm going to dive into agents through the lens of CRUD (create, read, update, delete) to look at how risk changes as we grant different levels of permission, and what those risks actually are.

The sensor vs the actuator

If an agent can only Read, it is a sensor. It observes. The worst that usually happens is it sees something it should not have seen—which is serious but governable with relatively straightforward permissions and data classification. The second we let an agent Create, Update, or Delete, it graduates from sensor to actuator. It can change the world. That’s where risks don’t just add; they multiply.

Read has several risks, but they can all be controlled by proper permissions. The primary thing you're trying to avoid is giving the agent read access to information it shouldn't see: that information could be added to the model context, passed to the LLM provider, and potentially used in training. But, as we'll see, this is relatively easy to avoid.

Create turns up the heat a bit. It means your agent can now speak for you. It can send emails, create documents, open tickets. Suddenly you care a lot about what exactly it's saying and to whom. On top of that, if Create is left to run autonomously, it can generate so much data that you're left with a painful clean-up job later if it isn't adequately maintained.

Update has the potential to corrupt or destroy any data you have collected: changing production configs, editing code, modifying dashboards, rewriting data values, and so on. Since data is the new oil, the impact of getting this wrong can be substantial.

Delete is the nuclear option. Once something's gone, it's gone. Hope you had backups.

The Combination Problem

In isolation, giving the agent any of the above permissions comes with potentially substantial but easy-to-estimate risk. The challenge comes when you give an agent access to a combination of them, which is required if we want to build truly autonomous systems. Agents are compositional systems that build plans by chaining tool calls, and that's where the risk surface suffers a combinatorial explosion.

Consider two tools:

  • get_compensation: Reads employee compensation data with proper access controls.
  • send_email: Sends an email to any recipient with a templated body.

Individually, the first is a sensitive Read; the second is an innocuous Create. Together, they can exfiltrate your crown jewels. It takes only one poorly specified objective, a stray prompt, or an emergent plan for an autonomous agent to do something like: “retrieve comp table; summarize; send to external recipients; cc: ‘tips@nytimes.com’.”
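
To make that concrete, here's a minimal sketch of how you might tag these two tools and walk a chained plan. The registry format, tags, and function names are mine, not any particular framework's API:

```python
# A minimal sketch of why these two tools are fine alone but dangerous together.
# Tool names, tags, and the registry format are illustrative assumptions.

TOOLS = {
    "get_compensation": {"crud": "read",   "emits": "Restricted", "actuator": None},
    "send_email":       {"crud": "create", "emits": None,         "actuator": "ExternalCommunication"},
}

SENSITIVITY = [None, "Public", "Internal", "Sensitive", "Restricted"]

def check_plan(plan):
    """Walk a chained plan, carrying the most sensitive data seen so far."""
    carried = None
    for step in plan:
        tool = TOOLS[step]
        if SENSITIVITY.index(tool["emits"]) > SENSITIVITY.index(carried):
            carried = tool["emits"]            # data flows forward along the chain
        if tool["actuator"] == "ExternalCommunication" and carried == "Restricted":
            print(f"BLOCK: '{step}' would push Restricted data outside the boundary")

check_plan(["get_compensation", "send_email"])  # flags the composition, not either tool alone
```

Neither tool trips a rule by itself; the flag only fires when the Restricted output of one is carried into the external actuator of the other.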

And it doesn’t stop at pairs. Many attacks are only unlocked at three hops or more. For example:

  • search_drive → zip_and_encrypt → upload_to_s3
  • query_production_db → generate_report → share_link_public
  • list_engineers_oncall → open_incident_channel → auto_page_all (availability meltdown)

Each tool passes a baton to the next; by the time you notice, the relay team is across the finish line. Approving tools one-by-one does not reveal the risk of their compositions.

Why not just put a human in the loop?

In short, real-time approvals don't scale, they miss emergent multi-hop risks, and they assume engineers can internalize every risky cross-tool interaction, which quickly becomes an unrealistic burden to manage.

A better approach is role-specific brakes:

  • Data custodian approval: Required when a plan touches high-classification data, for example when interacting with compensation, MNPI, PII.
  • Actuator owner approval: Required when a plan uses a powerful actuator like external email, production deployment or public sharing.
  • Business owner approval: Required for business-sensitive operations like investor communications, press or regulatory filings.

You gate the plan where semantics are clear. The person best positioned to evaluate the risk approves the relevant step. This is still friction, but at least it is smart friction.
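
As a sketch, the routing logic can be as simple as mapping plan attributes to the roles above. The field names here are assumptions about what each plan step carries:

```python
# Illustrative routing of approvals to the right role; field names are hypothetical.

def approvers_for(plan_steps):
    """Return the set of roles that must sign off before this plan runs."""
    required = set()
    for step in plan_steps:
        if step.get("data_classification") in {"Sensitive", "Restricted"}:
            required.add("data_custodian")        # comp, MNPI, PII
        if step.get("actuator_class") in {"ExternalCommunication",
                                          "ProductionMutation",
                                          "PublicSharing"}:
            required.add("actuator_owner")        # email, deploys, public links
        if step.get("business_sensitive"):
            required.add("business_owner")        # investor comms, filings
    return required

plan = [
    {"tool": "get_compensation", "data_classification": "Restricted"},
    {"tool": "send_email", "actuator_class": "ExternalCommunication"},
]
print(approvers_for(plan))  # {'data_custodian', 'actuator_owner'}
```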

Two modes of human-in-the-loop are particularly workable:

  • Plan-level approvals: The agent produces a plan and a predicted diff (“I will retrieve table x, summarize by y, and email z to recipients A/B”). A human approves or edits before execution.
  • Action-level approvals: For high-risk actuators, always present a preview diff and require explicit sign-off (“Here is the exact email body and recipients”).

Both modes benefit from a strong dry-run capability: simulate the plan, compute diffs, show redlines, surface recipients.
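
A minimal dry-run sketch, assuming each step exposes its proposed arguments, might render exactly what the human will approve:

```python
import difflib

# Simulate the plan and render the artifacts for sign-off.
# The preview/apply split and field names are assumptions.

def dry_run_email(step):
    """For Create: show the full artifact, never a summary."""
    return (f"TO: {', '.join(step['recipients'])}\n"
            f"SUBJECT: {step['subject']}\n\n{step['body']}")

def dry_run_update(step, current_value):
    """For Update: show a redline diff between current and proposed state."""
    return "\n".join(difflib.unified_diff(
        current_value.splitlines(), step["new_value"].splitlines(),
        fromfile="current", tofile="proposed", lineterm=""))

email_step = {"recipients": ["a@example.com"], "subject": "Q3 summary", "body": "Draft..."}
print(dry_run_email(email_step))  # the human approves exactly this, or edits it
```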

These checks can be built into an agentic system as an eval, so you can track and improve the system's ability to ask for approval from the right person. You can also introduce deterministic functions to improve the reliability of this step.

So what's missing today?

What we’re missing today is a good language to reason about these flows. Some simple building blocks help:

  • Data classifications: Tag inputs and outputs (e.g., Public, Internal, Sensitive, Restricted). Tags should propagate along the plan.
  • Actuator classes: Tag tools by their side-effect potential (e.g., ExternalCommunication, ProductionMutation, IrreversibleDelete).
  • Policy joins: Define rules that fire on combinations, not just individuals. Example: “Restricted data may never flow to ExternalCommunication without BusinessOwner approval,” or “ProductionMutation may not follow an LLM-generated transformation without test coverage and a dry-run diff.”
  • Path length thresholds: Suspicion increases with hop count. Plans of length ≥3 that cross trust boundaries should default to human review.
  • Budgeting and rate limits: Even approved actions should have controls. For example, “send_email” might be capped at N external recipients/day without escalated approval.

In practice, you want a policy engine that evaluates the entire plan graph with these attributes—before the agent acts.
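
Here's a toy version of that policy engine, using the building blocks above. Every field name, rule, and threshold is illustrative, not a real system's schema:

```python
# A toy plan-level policy engine: it walks the whole plan before anything runs,
# propagates data tags, and fires rules on combinations rather than single calls.

RESTRICTED = {"Sensitive", "Restricted"}

def evaluate_plan(steps, external_sends_today=0, daily_send_cap=5):
    violations = []
    carried_tags = set()                      # classifications propagate along the plan

    for i, step in enumerate(steps):
        carried_tags |= set(step.get("data_tags", []))
        actuator = step.get("actuator_class")

        # Policy join: Restricted data may never flow to ExternalCommunication
        # without BusinessOwner approval.
        if actuator == "ExternalCommunication" and carried_tags & RESTRICTED:
            violations.append((i, "needs BusinessOwner approval: restricted data -> external comms"))

        # Policy join: ProductionMutation may not follow an LLM-generated
        # transformation without a dry-run diff attached.
        prev_was_llm = i > 0 and steps[i - 1].get("llm_generated", False)
        if actuator == "ProductionMutation" and prev_was_llm and not step.get("dry_run_diff"):
            violations.append((i, "production mutation after LLM step without dry-run diff"))

        # Budget: cap external sends per day without escalated approval.
        if actuator == "ExternalCommunication":
            external_sends_today += 1
            if external_sends_today > daily_send_cap:
                violations.append((i, "daily external-send budget exceeded"))

    # Path-length threshold: plans of length >= 3 crossing a trust boundary
    # default to human review.
    if len(steps) >= 3 and any(s.get("crosses_trust_boundary") for s in steps):
        violations.append((None, "long plan crosses trust boundary: require human review"))

    return violations

plan = [
    {"tool": "get_compensation", "data_tags": ["Restricted"]},
    {"tool": "summarize", "llm_generated": True},
    {"tool": "send_email", "actuator_class": "ExternalCommunication", "crosses_trust_boundary": True},
]
print(evaluate_plan(plan))  # violations on the flow, plus the path-length rule
```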

How we can get beyond read-only agents

Many teams are being asked to get beyond read-only to unlock value for their company, and it's almost certainly true that we haven't yet seen the inevitable fallout of agents gone wrong in high-profile stories, but it's absolutely coming. That said, there are a few established safeguards that help make this step less risky:

  • Workflows & Graphs: Frameworks like Google's Agent Development Kit come with deterministic primitives, like sequential agents, that force tools to run in a fixed order. If tools are used within these frameworks, you eliminate the risks driven by arbitrary tool usage and ordering; the same can be achieved in LangGraph. (A framework-agnostic sketch follows this list.)
  • Read-only by default: Start with sensors/read-only agents. Focus on retrieval, search, summarization, and analysis. Maximize value with minimal risk. There is typically a ton of low hanging fruit here and once you have adequate information in context, you can often improve tool calling performance since the LLM has adequate context for your request.
  • Explicit capability escalation: A plan that introduces Create, Update, or Delete must switch contexts, present diffs, and solicit approval. Make this switch obvious to the user, or set up a near-deterministic emailer system. Create evals and tests to ensure this works to a high standard.
  • Dry runs and diffs: For every Update/Delete, compute and show a diff, taking lessons from products like Cursor. For Create, show the artifact in full, e.g., the exact email with recipients. Again, this can be notification-based; just eval it to ensure it works.
  • Allowlists and structure at the edges: For tools like send_email, constrain recipients (e.g., internal domains by default), require templates for message bodies, and block embedded sensitive data unless whitelisted. Frameworks like Google's ADK make this easier by forcing tools and agents to produce structured outputs, which helps with chaining reliability.
  • Narrow scopes and ephemeral credentials: Short-lived tokens tied to the plan; revoke when the plan ends. Scope each tool to the least privilege necessary.
  • Strong audit trails: Every plan, approval, and actuator call gets a tamper-evident log. Make it easy to replay and investigate.
  • Shadow execution for risky tools: Run the plan end-to-end in shadow, generate artifacts, compute consequences; only then request approval.
  • Two-person rule for deletes and production mutations: It feels old-fashioned, but it works.
  • Kill switch and rollback: Goes without saying but something might go wrong.
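
As referenced in the first bullet, here's a framework-agnostic sketch of deterministic ordering plus an allowlisted email actuator. ADK and LangGraph give you the ordering as first-class primitives; all names and the allowlist below are hypothetical:

```python
# Deterministic ordering + constraints at the edges, without any framework.
# Tool names, the allowlist, and the pipeline wiring are all hypothetical.

ALLOWED_DOMAINS = {"example.com"}             # internal-only recipients by default

def send_email(recipients, subject, body):
    bad = [r for r in recipients if r.split("@")[-1] not in ALLOWED_DOMAINS]
    if bad:
        raise PermissionError(f"recipients outside allowlist: {bad}")
    print(f"sent '{subject}' to {recipients}")

def retrieve(query):                          # read-only sensor
    return f"results for {query!r}"

def summarize(text):                          # an LLM call in a real system
    return text[:100]

# The pipeline fixes the order: retrieve -> summarize -> send.
# The agent can fill in arguments but cannot reorder, skip, or add steps.
def run_pipeline(query, recipients):
    docs = retrieve(query)
    summary = summarize(docs)
    send_email(recipients, subject="Weekly summary", body=summary)

run_pipeline("oncall incidents this week", ["team@example.com"])
```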

These are not silver bullets, but they dramatically shrink the blast radius and, crucially, raise visibility at the edges where harm happens.

Why Current Security Tech Falls Short

The gap today is not in single-tool security—we have IAM, OAuth scopes, encryption, secrets management. The gap is in reasoning about compositions:

  • No first-class flow typing: We can’t easily tag a datum as Restricted and prove it never flows to ExternalCommunication across a 4-step plan.
  • No plan-aware policies: Most systems authorize one call at a time. Agents operate at the plan level.
  • Weak previews: Many tools don’t support dry runs, diffs, or structured previews, making human-in-the-loop approvals a blind bet.
  • Identity is too coarse: The agent often inherits the full identity of a service account. We need identities at the plan segment and tool-call granularity.

Until these are solved, read-only agents are the safest, highest-leverage default. The moment we cross into Create/Update/Delete, we enter a space where small oversights can compose into large failures.

The uncomfortable conclusion

The current state of agent security has a glaring gap: we lack robust, compositional controls that evaluate plans end-to-end. Tool-by-tool approval is necessary but not sufficient. The risk surface grows combinatorially with tool combinations, and it is unrealistic to expect every engineer to foresee all dangerous join operations. A human in the loop is the best near-term answer, but it limits the productivity agents can give us.

If there's a tentative mental model I can leave you with:

  • Keep most agents read-only and harvest the easy wins.
  • For actuator use cases, invest in plan-aware policy, dry runs, and role-specific approvals.
  • Acknowledge that friction is a feature, not a bug, when you are handing an intern the keys to production.

Longer term, we'll need capability-aware security policies that reason over data classes, actuators, and plans; typed flows; proof obligations; and simulation by default. When that exists, we'll get the best of both worlds: agents that can act, and a system that can guarantee they act within safe boundaries. Until then, keep one hand on the brake.