Evaluator
The Evaluator block uses AI to score content quality against custom evaluation metrics that you define. It is well suited to quality control, A/B testing, and verifying that your AI outputs meet specific standards.

Overview
The Evaluator block enables you to:
- Score Content Quality: Use AI to evaluate content against custom metrics with numeric scores
- Define Custom Metrics: Create specific evaluation criteria tailored to your use case
- Automate Quality Control: Build workflows that automatically assess and filter content
- Track Performance: Monitor improvements and consistency over time with objective scoring
How It Works
The Evaluator block processes content through AI-powered assessment:
- Receive Content - Takes input content from previous blocks in your workflow
- Apply Metrics - Evaluates content against your defined custom metrics
- Generate Scores - AI model assigns numeric scores for each metric
- Provide Summary - Returns detailed evaluation with scores and explanations
Configuration Options
Evaluation Metrics
Define custom metrics to evaluate content against. Each metric includes:
- Name: A short identifier for the metric
- Description: A detailed explanation of what the metric measures
- Range: The numeric range for scoring (e.g., 1-5, 0-10)
Example metrics:
- Accuracy (1-5): How factually accurate is the content?
- Clarity (1-5): How clear and understandable is the content?
- Relevance (1-5): How relevant is the content to the original query?
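A minimal sketch of how such a metric set could be represented is shown below; the EvaluationMetric shape and its field names are illustrative assumptions rather than the platform's actual schema.

```typescript
// Hypothetical shape for a custom evaluation metric; field names are
// illustrative, not the platform's actual schema.
interface EvaluationMetric {
  name: string;        // short identifier for the metric
  description: string; // what the metric measures
  min: number;         // lowest possible score
  max: number;         // highest possible score
}

const metrics: EvaluationMetric[] = [
  { name: "accuracy", description: "How factually accurate is the content?", min: 1, max: 5 },
  { name: "clarity", description: "How clear and understandable is the content?", min: 1, max: 5 },
  { name: "relevance", description: "How relevant is the content to the original query?", min: 1, max: 5 },
];
```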
Content
The content to be evaluated. This can be:
- Directly provided in the block configuration
- Connected from another block's output (typically an Agent block)
- Dynamically generated during workflow execution
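For example, an Evaluator that scores the output of a preceding Agent block could be configured roughly as follows. The reference syntax and property names in this sketch are assumptions for illustration, not the exact configuration format.

```typescript
// Hypothetical Evaluator block configuration. The "<agent.content>" reference
// syntax and the property names are assumptions for illustration only.
const evaluatorBlock = {
  content: "<agent.content>", // wired from a preceding Agent block's output
  metrics: [
    { name: "clarity", description: "How clear is the content?", min: 1, max: 5 },
  ],
  model: "gpt-4o",
  apiKey: "{{OPENAI_API_KEY}}", // stored credential reference, not a literal key
};
```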
Model Selection
Choose an AI model to perform the evaluation:
- OpenAI: GPT-4o, o1, o3, o4-mini, gpt-4.1
- Anthropic: Claude 3.7 Sonnet
- Google: Gemini 2.5 Pro, Gemini 2.0 Flash
- Other Providers: Groq, Cerebras, xAI, DeepSeek
- Local Models: Any model running on Ollama
Recommendation: Use models with strong reasoning capabilities like GPT-4o or Claude 3.7 Sonnet for more accurate evaluations.
API Key
Your API key for the selected LLM provider. This is securely stored and used for authentication.
Evaluation Process
- The Evaluator block takes the provided content and your custom metrics
- It generates a specialized prompt that instructs the LLM to evaluate the content
- The prompt includes clear guidelines on how to score each metric
- The LLM evaluates the content and returns numeric scores for each metric
- The Evaluator block formats these scores as structured output for use in your workflow
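The sketch below illustrates the general idea of that prompt generation step; buildEvaluationPrompt is a hypothetical helper and not the block's actual implementation.

```typescript
// Hypothetical sketch of how an evaluation prompt could be assembled from the
// metric definitions; this is not the block's actual implementation.
interface Metric {
  name: string;
  description: string;
  min: number;
  max: number;
}

function buildEvaluationPrompt(content: string, metrics: Metric[]): string {
  const criteria = metrics
    .map((m) => `- ${m.name} (${m.min}-${m.max}): ${m.description}`)
    .join("\n");

  return [
    "Evaluate the following content against each metric.",
    "For every metric, return a numeric score within its stated range and a",
    "brief explanation, formatted as a single JSON object.",
    "",
    "Metrics:",
    criteria,
    "",
    "Content:",
    content,
  ].join("\n");
}
```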
Example Use Cases
Content Quality Assessment
Scenario: Evaluate blog post quality before publication
- Agent block generates blog post content
- Evaluator assesses accuracy, readability, and engagement
- Condition block checks if scores meet minimum thresholds
- High scores → Publish, Low scores → Revise and retry
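Assuming the Condition block evaluates a boolean expression over the metric scores, the threshold check could look roughly like this sketch (score names and thresholds are illustrative).

```typescript
// Hypothetical threshold check a Condition block might perform; the score
// names and thresholds are assumptions for illustration.
interface BlogPostScores {
  accuracy: number;
  readability: number;
  engagement: number;
}

function meetsPublishThresholds(scores: BlogPostScores): boolean {
  return scores.accuracy >= 4 && scores.readability >= 4 && scores.engagement >= 3;
}
```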
A/B Testing Content
Scenario: Compare multiple AI-generated responses
- Parallel block generates multiple response variations
- Evaluator scores each variation on clarity and relevance
- Function block selects highest-scoring response
- Response block returns the best result
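The selection step could look roughly like the sketch below; the input shape is an assumption about what the Parallel and Evaluator blocks emit.

```typescript
// Hypothetical Function block logic: pick the variation with the highest
// combined clarity + relevance score. The input shape is an assumption.
interface ScoredVariation {
  text: string;
  clarity: number;
  relevance: number;
}

function selectBestVariation(variations: ScoredVariation[]): ScoredVariation {
  return variations.reduce((best, current) =>
    current.clarity + current.relevance > best.clarity + best.relevance
      ? current
      : best
  );
}
```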
Customer Support Quality Control
Scenario: Ensure support responses meet quality standards
- Support agent generates response to customer inquiry
- Evaluator scores helpfulness, empathy, and accuracy
- Scores logged for training and performance monitoring
- Low scores trigger human review process
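A sketch of the logging and review-flagging step is shown below; the record shape and the review threshold are assumptions for illustration.

```typescript
// Hypothetical post-evaluation step: log scores for monitoring and flag
// low-scoring responses for human review. Shapes and threshold are assumptions.
interface SupportEvaluation {
  ticketId: string;
  helpfulness: number;
  empathy: number;
  accuracy: number;
}

const REVIEW_THRESHOLD = 3;

function needsHumanReview(scores: SupportEvaluation): boolean {
  return (
    scores.helpfulness < REVIEW_THRESHOLD ||
    scores.empathy < REVIEW_THRESHOLD ||
    scores.accuracy < REVIEW_THRESHOLD
  );
}

function recordEvaluation(scores: SupportEvaluation): void {
  // In practice this would write to your analytics or QA store.
  console.log(JSON.stringify(scores));
  if (needsHumanReview(scores)) {
    console.warn(`Ticket ${scores.ticketId} flagged for human review`);
  }
}
```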
Inputs and Outputs
Inputs:
- Content: The text or structured data to evaluate
- Evaluation Metrics: Custom criteria with scoring ranges
- Model: AI model for evaluation analysis
- API Key: Authentication for selected LLM provider
Outputs:
- evaluator.content: Summary of the evaluation
- evaluator.model: Model used for evaluation
- evaluator.tokens: Token usage statistics
- evaluator.cost: Cost summary for the evaluation call
- Metric Scores: Numeric scores for each defined metric
- Evaluation Summary: Detailed assessment with explanations
- Access: Available in blocks after the Evaluator
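Downstream blocks can consume these values as structured data; the sketch below shows one plausible shape for that output, with property names beyond those listed above treated as assumptions.

```typescript
// Hypothetical shape of the Evaluator block's structured output, based on the
// fields listed above; exact property names and nesting are assumptions.
interface EvaluatorOutput {
  content: string;                // summary of the evaluation
  model: string;                  // model used for evaluation
  tokens: { prompt: number; completion: number; total: number };
  cost: number;                   // cost of the evaluation call
  scores: Record<string, number>; // one numeric score per defined metric
}
```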
Best Practices
- Use specific metric descriptions: Clearly define what each metric measures to get more accurate evaluations
- Choose appropriate ranges: Select scoring ranges that provide enough granularity without being overly complex
- Connect with Agent blocks: Use Evaluator blocks to assess Agent block outputs and create feedback loops
- Use consistent metrics: For comparative analysis, maintain consistent metrics across similar evaluations
- Combine multiple metrics: Use several metrics to get a comprehensive evaluation