
Evaluator

The Evaluator block uses AI to score and assess content quality against custom metrics. Perfect for quality control, A/B testing, and ensuring AI outputs meet specific standards.

Evaluator Block Configuration

Configuration Options

Evaluation Metrics

Define custom metrics to evaluate content against. Each metric includes:

  • Name: A short identifier for the metric
  • Description: A detailed explanation of what the metric measures
  • Range: The numeric range for scoring (e.g., 1-5, 0-10)

Example metrics:

Accuracy (1-5): How factually accurate is the content?
Clarity (1-5): How clear and understandable is the content?
Relevance (1-5): How relevant is the content to the original query?
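
To make the metric fields concrete, here is a minimal sketch of how these example metrics could be represented as data. The interface and field names are illustrative assumptions, not Sim's actual schema.

```typescript
// Illustrative metric definitions; field names are assumptions, not Sim's schema.
interface EvaluationMetric {
  name: string;                        // short identifier, e.g. "accuracy"
  description: string;                 // what the metric measures
  range: { min: number; max: number }; // scoring range, e.g. 1-5
}

const metrics: EvaluationMetric[] = [
  { name: "accuracy",  description: "How factually accurate is the content?",             range: { min: 1, max: 5 } },
  { name: "clarity",   description: "How clear and understandable is the content?",       range: { min: 1, max: 5 } },
  { name: "relevance", description: "How relevant is the content to the original query?", range: { min: 1, max: 5 } },
];
```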

Content

The content to be evaluated. This can be:

  • Directly provided in the block configuration
  • Connected from another block's output (typically an Agent block)
  • Dynamically generated during workflow execution
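
For example, to evaluate an upstream Agent block's output, the Content field can reference that block directly. The snippet below is a sketch: the block name "agent" is an assumption, and the reference follows the same <block.output> pattern used in the Outputs section below.

```typescript
// Sketch only: the Content field pointing at an upstream Agent block's output.
// "agent" is an assumed block name; adjust it to match the connected block.
const content = "<agent.content>";
```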

Model Selection

Choose an AI model to perform the evaluation:

  • OpenAI: GPT-4o, o1, o3, o4-mini, gpt-4.1
  • Anthropic: Claude 3.7 Sonnet
  • Google: Gemini 2.5 Pro, Gemini 2.0 Flash
  • Other Providers: Groq, Cerebras, xAI, DeepSeek
  • Local Models: Ollama- or vLLM-compatible models

Use models with strong reasoning capabilities like GPT-4o or Claude 3.7 Sonnet for best results.

API Key

Your API key for the selected LLM provider. This is securely stored and used for authentication.
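
Putting the options together, a minimal sketch of an Evaluator configuration might look like the following. Field names and structure are illustrative assumptions rather than Sim's actual schema, and the API key should come from the provider credentials you store in Sim rather than being hard-coded.

```typescript
// Illustrative Evaluator configuration; field names are assumptions, not Sim's schema.
const evaluatorBlock = {
  model: "gpt-4o",                          // any supported provider model
  apiKey: process.env.OPENAI_API_KEY ?? "", // supplied via your stored provider credentials
  content: "<agent.content>",               // inline text or another block's output
  metrics: [
    { name: "accuracy", description: "How factually accurate is the content?", range: { min: 1, max: 5 } },
  ],
};
```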

Example Use Cases

Content Quality Assessment - Evaluate content before publication

Agent (Generate) → Evaluator (Score) → Condition (Check threshold) → Publish or Revise

A/B Testing Content - Compare multiple AI-generated responses

Parallel (Variations) → Evaluator (Score Each) → Function (Select Best) → Response

Customer Support Quality Control - Ensure responses meet quality standards

Agent (Support Response) → Evaluator (Score) → Function (Log) → Condition (Review if Low)
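
As a sketch of the "Select Best" step in the A/B testing flow above, a Function block could compare the scored variations and keep the highest-scoring one. The input shape (an array of content/score pairs) is an assumption for illustration; adapt it to however the parallel evaluation results reach the function.

```typescript
// Hypothetical "Select Best" step: picks the variation with the highest score.
// The { content, score } shape is assumed for illustration.
interface ScoredVariation {
  content: string;
  score: number;
}

function selectBest(variations: ScoredVariation[]): ScoredVariation {
  return variations.reduce((best, current) =>
    current.score > best.score ? current : best
  );
}

// Example: selectBest([{ content: "A", score: 3.5 }, { content: "B", score: 4.2 }])
// returns the "B" variation.
```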

Outputs

  • <evaluator.content>: Summary of the evaluation with scores
  • <evaluator.model>: Model used for evaluation
  • <evaluator.tokens>: Token usage statistics
  • <evaluator.cost>: Estimated evaluation cost
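
For instance, in the support quality-control flow, a Function block could log these outputs before the low-score check. The sketch below assumes the referenced values arrive as a resolved input object; how Sim substitutes <evaluator.*> references into function inputs is not shown here.

```typescript
// Hypothetical logging step consuming the Evaluator outputs listed above.
// Receiving them as a resolved input object is an assumption for illustration.
function logEvaluation(input: {
  content: string; // <evaluator.content>
  model: string;   // <evaluator.model>
  tokens: number;  // <evaluator.tokens>
  cost: number;    // <evaluator.cost>
}): void {
  console.log(`[evaluation] model=${input.model} tokens=${input.tokens} cost=${input.cost}`);
  console.log(input.content);
}
```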

Best Practices

  • Use specific metric descriptions: Clearly define what each metric measures to get more accurate evaluations
  • Choose appropriate ranges: Select scoring ranges that provide enough granularity without being overly complex
  • Connect with Agent blocks: Use Evaluator blocks to assess Agent block outputs and create feedback loops
  • Use consistent metrics: For comparative analysis, maintain consistent metrics across similar evaluations
  • Combine multiple metrics: Use several metrics to get a comprehensive evaluation