AI agent security checklist for developers
A defense-in-depth checklist for securing agent identity, tools, code execution, memory, secrets, approvals, observability, and incident response.
An AI agent combines a probabilistic decision-maker with deterministic tools. The model can misunderstand context or follow hostile instructions; the tools can still execute perfectly. Security therefore has to sit around the model: identity, least privilege, strict tool contracts, execution isolation, approval gates, protected memory, and useful evidence.
1. Identity and authentication
- Every agent, runtime, and integration has a distinct identity.
- Agents do not share a user’s full session or one permanent team credential.
- Credentials are short-lived where possible and can be revoked without redeploying the whole system.
- The service verifies token signature, issuer, audience, expiry, and intended scope.
- Actions performed on behalf of a user require both a valid agent identity and valid user authority.
- Production and development agents use separate identities and credentials.
2. Least privilege and authorization
- The agent has only the tools needed for its current role.
- Each tool is scoped by operation, resource, environment, and risk—not just by tool name.
- File access is restricted to required directories, with explicit denials for secrets and keys.
- Network access is restricted to approved destinations and methods.
- Database access uses narrow roles and separates reads, writes, schema changes, and administration.
- Unknown actions fail closed or pause for review instead of being allowed by default.
- Permission is checked immediately before execution, not only when the session starts.
3. Prompt and context boundaries
- System rules are separated clearly from external content.
- User input, repository text, retrieved documents, web pages, logs, and tool output are labeled and treated as untrusted.
- The agent does not interpret instructions found inside untrusted content as policy.
- Prompt construction avoids raw concatenation of trusted rules and external text.
- Encoded, hidden, split, and indirect prompt-injection cases are included in security tests.
- Model output is validated before it reaches code execution, file writes, SQL, network requests, or other tools.
Use How to detect prompt injection in AI-generated code for a source-to-sink review method and concrete tests.
4. Tool design and execution
- Tools expose small, named operations rather than arbitrary commands.
- Every tool has a strict input schema, length limits, type checks, and unknown-field rejection.
- Paths, URLs, identifiers, commands, and other high-risk arguments use allowlists where practical.
- The server maps validated requests to implementation code; the model does not generate raw executable strings.
- Tool output is bounded, sanitized, and labeled before it returns to the model.
- Timeouts, retry limits, concurrency limits, and total step limits prevent runaway execution.
- Repeated failures stop the workflow instead of creating an infinite recovery loop.
5. Sandboxing and blast-radius control
- Generated code runs outside the host environment in an isolated, disposable workspace.
- The sandbox starts without production credentials or unrestricted host mounts.
- Outbound network access is denied by default or limited to approved destinations.
- CPU, memory, process, storage, and execution-time limits are enforced outside the model.
- The agent cannot disable its own sandbox, monitoring, policy, or iteration limits.
- Production changes use a separate deployment path with stronger authorization.
6. Secrets and sensitive data
- Secrets are stored in a secret manager and injected only at the moment they are needed.
- Raw credentials are not placed in prompts, memory, repository files, tool descriptions, or logs.
- Tools return the minimum data needed for the task.
- Common secret fields are redacted before model exposure and before logging.
- Sensitive datasets have explicit access rules, retention periods, and deletion procedures.
- Credential use is attributable to a specific agent action.
7. Memory and retrieval integrity
- Memory writes record source, author, timestamp, and trust level.
- Untrusted content cannot silently become durable policy or a trusted fact.
- High-impact memory changes require validation or approval.
- Retrieved content is filtered by tenant, user, project, and sensitivity boundaries.
- The system can inspect, quarantine, correct, and delete poisoned memory.
- Security tests include delayed attacks where malicious memory affects a later session.
8. Human approval and safe interruption
- Production writes, destructive actions, authentication changes, secret access, payments, and external publication require approval.
- The approval screen shows the agent, tool, action, target, risk, relevant arguments, and reason.
- Approval applies to one bounded action or a short, explicit grant—not an open-ended session.
- The real operation waits for the final decision; it never executes first and asks afterward.
- Approvals expire and can be revoked.
- Users can stop an active agent and revoke credentials without waiting for the agent to cooperate.
9. Logging, monitoring, and detection
- Logs connect the agent identity, user, session, tool, action, validated arguments, policy decision, approval, result, and timestamp.
- Denied and failed attempts are logged as well as successful actions.
- Sensitive values are redacted without removing the evidence needed to investigate.
- Alerts cover repeated denials, unusual tool sequences, privilege changes, secret access, large data movement, and disabled controls.
- Logs are protected from agent modification and retained for an explicit period.
- Operators can trace a final outcome back through the actions that produced it.
10. Supply chain and configuration
- Models, MCP servers, plugins, packages, containers, prompts, and policies have recorded versions.
- Dependencies are verified, pinned, scanned, and updated through a reviewable process.
- Generated package names are checked against an authoritative registry before installation.
- Configuration files and agent instructions receive code review and ownership protection.
- New tools and permission changes trigger a security review.
- Development defaults cannot silently become production policy.
For a focused connector review, use MCP security risks for Claude Code users. For generated dependency, authorization, and deployment mistakes, use Vibe coding security vulnerabilities.
11. Testing and release gates
- Tests cover direct and indirect prompt injection, malicious tool output, poisoned memory, and compromised dependencies.
- Authorization tests verify cross-user, cross-project, and cross-tenant isolation.
- The team tests what happens when the model chooses the wrong tool or valid-looking but dangerous arguments.
- Security checks run before generated code is merged or deployed.
- Policy, prompt, model, tool, and retrieval changes trigger regression tests.
- High-risk findings block release or require a documented exception with an owner and expiry.
12. Incident response
- The team can disable a tool, agent, credential, model route, or integration independently.
- Runbooks cover prompt injection, secret exposure, malicious packages, unauthorized actions, and memory poisoning.
- Incident responders can preserve logs and reconstruct the action chain.
- Compromised memory and generated artifacts can be identified and removed.
- Credentials used by the agent can be rotated quickly.
- After an incident, fixes become policy, tests, and monitoring—not only a written lesson.
The minimum release gate
Do not give an agent production authority until the team can answer yes to five questions: Is the agent uniquely identified? Is its authority narrowly scoped? Are model requests validated before execution? Do high-impact actions wait for approval? Can you reconstruct and stop what happened?