Agent

The Agent block runs a model inside a workflow. You give it instructions, context, and tools; the model reasons over the input, calls tools as needed, and returns plain text or structured JSON that later blocks read by reference. Most workflows are built around one or more Agent blocks.

An agent and an Agent block are related but distinct. An agent is a whole workflow that reasons and acts on its own; an Agent block is one reasoning step inside it. The simplest agent is a single Agent block with tools, and larger ones wire several together with other blocks. See Agents.

Configuration

Messages

The messages sent to the model. Each message has a role: System sets the agent's job and rules, User gives it the input to act on. Insert a connection tag to pass an earlier output, like <start.input>.

You are a support assistant for an analytics product.
Answer in two sentences, cite the doc you used, and never guess a price.

Model

The model that runs the step. Defaults to claude-sonnet-4-6. Type or pick any model from OpenAI, Anthropic, Google, xAI, Groq, Cerebras, DeepSeek, Azure, AWS Bedrock, Google Vertex, or OpenRouter, or a local model through Ollama or VLLM.

Files

Files for the model to read: images for a vision-capable model, or documents for text. Upload them on the block, or pass a file from an earlier block, such as an upload trigger or an API response, with a connection tag.

Tools

Capabilities the agent can call while it runs: search a knowledge base, send a Slack message, run a Function, call any of the integrations, or use a custom tool or MCP server you've added. The model decides which to call and when. (For where tools come from and when to reach for which, see Agents.) Each tool has a usage control:

Auto. The model calls it when the context warrants.
Force. The model must call it on every run.
None. The tool is hidden from the model, which disables it without removing it from the block.

Skills

Agent skills the agent can load on demand: reusable instruction packages like a coding standard or a support playbook. Only the skill names sit in context up front, and the agent loads the full instructions when it decides a skill is relevant.

Memory

Built-in conversation memory, kept across runs by a conversation ID:

None. Each run is independent.
Conversation. The full history for that conversation ID.
Sliding window (messages). The most recent N messages.
Sliding window (tokens). Recent messages up to a token budget.

Memory needs a conversation ID to persist between runs. For memory that's shared across workflows or managed as its own store, use the Memory block instead.

Response Format

Give the agent a JSON Schema to force structured output. The response is constrained to match the schema, and each field becomes its own output you read by name, like <agent.sentiment>. Without a response format, the agent returns plain text in content.

{
  "name": "user_analysis",
  "schema": {
    "type": "object",
    "properties": {
      "sentiment": { "type": "string", "enum": ["positive", "negative", "neutral"] },
      "confidence": { "type": "number", "minimum": 0, "maximum": 1 }
    },
    "required": ["sentiment", "confidence"]
  }
}

Advanced

Some settings live under advanced, or appear only for models that support them:

Temperature. How varied the output is. Stay low (0–0.3) when you need accuracy and repeatability, middle (around 0.5) for everyday work, higher (0.7+) when you want creative variety.
Max output tokens. Caps the response length. Defaults to the model's full limit.
Reasoning effort / Thinking level. For models with extended reasoning, how much the model thinks before answering. Higher is more thorough but slower and costs more tokens.
Prompt caching. For Anthropic Claude models, reuses the system prompt and tool definitions between runs instead of re-reading them every time. Cached input costs a tenth of the normal rate, but writing the cache costs 1.25x, so leave it off for one-off runs and turn it on when the same agent runs repeatedly. The cache covers a prefix only if it reaches 1,024 tokens (2,048 on Haiku) — below that Anthropic ignores it and nothing changes. Entries expire after five minutes of no use.
API key. Your key for the chosen provider. Hidden on hosted Sim, which supplies one.

OpenAI and Gemini cache automatically at no extra cost and need no setting; their discount is already reflected in what you are charged.

Outputs

After the agent runs, later blocks read its result by name:

Output	What it is
`<agent.content>`	The response: text, or the structured object when a response format is set
`<agent.tokens>`	Token usage, an object `{ input, output, total }`
`<agent.toolCalls>`	The tools the agent called, with their inputs and results
`<agent.model>`	The model that ran the step
`<agent.cost>`	Estimated cost of the call

When a response format is set, its fields are readable directly, like <agent.sentiment>.

Streamed thinking and tool calls

While an agent runs, Sim can stream its thinking and tool lifecycle live — the canvas terminal always shows them, and a deployed chat shows them when its Include thinking setting and the client's protocol opt-in agree. What actually streams depends on the model: models marked with full deltas or summaries stream thinking when a Thinking level or Reasoning effort is set (DeepSeek reasoner models always reason); a model marked "Not streamed" thinks internally but its provider withholds the text.

Live tool-call chips stream for OpenAI, Anthropic, Azure Anthropic, Google, Vertex AI, DeepSeek, Groq, AWS Bedrock models. Other providers run tools without live chips and project the settled final answer when the run completes; they do not ask the model to regenerate that answer just to create a stream.

Provider	Streamed thinking	Models
OpenAI	Summaries only — Requires OpenAI organization verification; falls back to no summaries.	`gpt-5.6-sol`, `gpt-5.6-terra`, `gpt-5.6-luna`, `gpt-5.5-pro`, `gpt-5.5`, `gpt-5.4-pro`, `gpt-5.4`, `gpt-5.4-mini`, `gpt-5.4-nano`, `gpt-5.2-pro`, `gpt-5.2`, `gpt-5.1`, `gpt-5-pro`, `gpt-5`, `gpt-5-mini`, `gpt-5-nano`, `o4-mini`, `o3`, `o3-mini`, `o1`
Anthropic	Summaries only — These generations omit full thinking; Sim requests summarized thinking on streaming runs.	`claude-fable-5`, `claude-sonnet-5`, `claude-opus-5`, `claude-opus-4-8`, `claude-opus-4-7`, `claude-opus-4-6`, `claude-sonnet-4-6`, `claude-opus-4-5`, `claude-opus-4-1`, `claude-sonnet-4-5`, `claude-haiku-4-5`
Azure OpenAI	Summaries only — Requires OpenAI organization verification; falls back to no summaries.	`azure/gpt-5.4`, `azure/gpt-5.4-mini`, `azure/gpt-5.4-nano`, `azure/gpt-5.2`, `azure/gpt-5.1`, `azure/gpt-5.1-codex`, `azure/gpt-5`, `azure/gpt-5-mini`, `azure/gpt-5-nano`, `azure/o3`, `azure/o4-mini`
Azure Anthropic	Summaries only — These generations omit full thinking; Sim requests summarized thinking on streaming runs.	`azure-anthropic/claude-opus-4-6`, `azure-anthropic/claude-opus-4-5`, `azure-anthropic/claude-sonnet-4-5`, `azure-anthropic/claude-opus-4-1`, `azure-anthropic/claude-haiku-4-5`
Google	Summaries only	`gemini-3.6-flash`, `gemini-3.5-flash-lite`, `gemini-3.5-flash`, `gemini-3.1-pro-preview`, `gemini-3.1-flash-lite`, `gemini-3-flash-preview`, `gemini-2.5-pro`, `gemini-2.5-flash`, `gemini-2.5-flash-lite`
Vertex AI	Summaries only	`vertex/gemini-3.5-flash`, `vertex/gemini-3.1-pro-preview`, `vertex/gemini-3.1-flash-lite`, `vertex/gemini-3-flash-preview`, `vertex/gemini-2.5-pro`, `vertex/gemini-2.5-flash`, `vertex/gemini-2.5-flash-lite`
DeepSeek	Full thinking deltas	`deepseek-v4-pro`, `deepseek-v4-flash`, `deepseek-reasoner`
Groq	Full thinking deltas	`groq/openai/gpt-oss-120b`, `groq/openai/gpt-oss-20b`, `groq/openai/gpt-oss-safeguard-20b`, `groq/qwen/qwen3.6-27b`
Meta	Not streamed	`muse-spark-1.1`
Kimi	Full thinking deltas	`kimi-k2.6`
Z.ai	Full thinking deltas	`glm-5.2`, `glm-5.1`, `glm-5`, `glm-5-turbo`, `glm-4.7`, `glm-4.6`, `glm-4.5`, `glm-4.5-air`

Example

A workflow that reads an incoming customer message and classifies it:

The Agent reads the message from Start with <start.input> and returns a result that later blocks read as <agent.content>.

Best Practices

Write a clear system message. Define the agent's role, tone, and limits. Specific instructions produce more reliable output than a vague prompt.
Match the model and temperature to the task. Use a stronger model and lower temperature (0–0.3) for accuracy; raise temperature for creative or varied output.
Give the agent only the tools it needs. Too many tools dilute its choices. For jobs that don't overlap, use a second Agent block instead of overloading one.
Use a response format when a downstream block needs specific fields. It guarantees the shape, and you read each field as <agent.field>.

Common Questions

OpenAI, Anthropic, Google (Gemini), xAI (Grok), DeepSeek, Groq, Cerebras, Azure OpenAI, Azure Anthropic, Google Vertex AI, AWS Bedrock, OpenRouter, and local models via Ollama or VLLM. Type or select any supported model from the model combobox.

Four modes: None (no memory, each run is independent), Conversation (full history keyed by a conversation ID), Sliding window by messages (the N most recent messages), and Sliding window by tokens (messages up to a token budget). Memory needs a conversation ID to persist across runs.

In Auto, the model decides when to call a tool based on context. In Force, the model must call the tool on every run. In None, the tool is hidden from the model and never sent, which disables it without removing it from the block.

It enforces structured output by providing a JSON Schema. When set, the model's response is constrained to match the schema exactly, and each field is read directly by downstream blocks using <agent.fieldName>. Without a response format, the agent returns its standard outputs: content, model, tokens, and toolCalls.

They appear only for models that support extended reasoning. Reasoning Effort (OpenAI o-series and GPT-5 models) and Thinking Level (Anthropic Claude and Gemini models with thinking) control how much compute the model spends reasoning before responding. Higher levels produce more thorough answers but cost more tokens and take longer.

Turn it on when the same agent runs repeatedly with a large, stable system prompt or tool set — cached input bills at a tenth of the normal input rate. Leave it off for one-off runs, because writing the cache costs 1.25x and nothing reads it back. The setting appears only for Anthropic Claude models; OpenAI and Gemini cache automatically with no setting and no write fee. Anthropic only caches a prefix of at least 1,024 tokens (2,048 on Haiku), and entries expire after five minutes of no use.

The Agent block uses each Anthropic model's full max output token limit by default (for example, 64,000 tokens). You can override this with the Max Output Tokens setting. For non-streaming requests that exceed the SDK's internal threshold, the provider automatically uses internal streaming to avoid timeouts.

Yes. Use any Ollama or VLLM-compatible model by typing the model name directly into the model combobox, as long as it exposes a compatible API endpoint.

Agent

Common Questions

On this page