Evaluator Block YAML Schema
YAML configuration reference for Evaluator blocks
Schema Definition
```yaml
type: object
required:
  - type
  - name
  - inputs
properties:
  type:
    type: string
    enum: [evaluator]
    description: Block type identifier
  name:
    type: string
    description: Display name for this evaluator block
  inputs:
    type: object
    required:
      - content
      - metrics
      - model
      - apiKey
    properties:
      content:
        type: string
        description: Content to evaluate (can reference other blocks)
      metrics:
        type: array
        description: Evaluation criteria and scoring ranges
        items:
          type: object
          properties:
            name:
              type: string
              description: Metric identifier
            description:
              type: string
              description: Detailed explanation of what the metric measures
            range:
              type: object
              properties:
                min:
                  type: number
                  description: Minimum score value
                max:
                  type: number
                  description: Maximum score value
              required: [min, max]
              description: Scoring range with numeric bounds
      model:
        type: string
        description: AI model identifier (e.g., gpt-4o, claude-3-5-sonnet-20241022)
      apiKey:
        type: string
        description: API key for the model provider (use {{ENV_VAR}} format)
      temperature:
        type: number
        minimum: 0
        maximum: 2
        description: Model temperature for evaluation
        default: 0.3
      azureEndpoint:
        type: string
        description: Azure OpenAI endpoint URL (required for Azure models)
      azureApiVersion:
        type: string
        description: Azure API version (required for Azure models)
  connections:
    type: object
    properties:
      success:
        type: string
        description: Target block ID for successful evaluation
      error:
        type: string
        description: Target block ID for error handling
```
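Putting the required fields together, a minimal valid evaluator block might look like the sketch below. The block ID (quality-check) and the referenced blocks (draft-writer, review-step) are hypothetical placeholders:

```yaml
quality-check:
  type: evaluator
  name: "Quality Check"
  inputs:
    content: <draft-writer.content>        # hypothetical upstream block
    metrics:
      - name: "quality"
        description: "Overall quality of the draft on a five-point scale"
        range:
          min: 1
          max: 5
    model: gpt-4o
    apiKey: '{{OPENAI_API_KEY}}'
  connections:
    success: review-step                    # hypothetical downstream block
```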
Connection Configuration
Connections define where the workflow goes based on evaluation results:
```yaml
connections:
  success: <string>  # Target block ID for successful evaluation
  error: <string>    # Target block ID for error handling (optional)
```
Examples
Content Quality Evaluation
```yaml
content-evaluator:
  type: evaluator
  name: "Content Quality Evaluator"
  inputs:
    content: <content-generator.content>
    metrics:
      - name: "accuracy"
        description: "How factually accurate is the content?"
        range:
          min: 1
          max: 5
      - name: "clarity"
        description: "How clear and understandable is the content?"
        range:
          min: 1
          max: 5
      - name: "relevance"
        description: "How relevant is the content to the original query?"
        range:
          min: 1
          max: 5
      - name: "completeness"
        description: "How complete and comprehensive is the content?"
        range:
          min: 1
          max: 5
    model: gpt-4o
    temperature: 0.2
    apiKey: '{{OPENAI_API_KEY}}'
  connections:
    success: quality-report
    error: evaluation-error
```
Customer Response Evaluation
```yaml
response-evaluator:
  type: evaluator
  name: "Customer Response Evaluator"
  inputs:
    content: <customer-agent.content>
    metrics:
      - name: "helpfulness"
        description: "How helpful is the response in addressing the customer's needs?"
        range:
          min: 1
          max: 10
      - name: "tone"
        description: "How appropriate and professional is the tone?"
        range:
          min: 1
          max: 10
      - name: "completeness"
        description: "Does the response fully address all aspects of the inquiry?"
        range:
          min: 1
          max: 10
    model: claude-3-5-sonnet-20241022
    apiKey: '{{ANTHROPIC_API_KEY}}'
  connections:
    success: response-processor
```
A/B Testing Evaluation
The schema requires numeric min/max bounds, so the A/B preference is encoded numerically and each metric's description explains the mapping:

```yaml
ab-test-evaluator:
  type: evaluator
  name: "A/B Test Evaluator"
  inputs:
    content: |
      Version A: <version-a.content>
      Version B: <version-b.content>
      Compare these two versions for the following criteria.
    metrics:
      - name: "engagement"
        description: "Which version is more likely to engage users? Score 0 for Version A, 1 for Version B, and 0.5 for a tie."
        range:
          min: 0
          max: 1
      - name: "clarity"
        description: "Which version communicates more clearly? Score 0 for Version A, 1 for Version B, and 0.5 for a tie."
        range:
          min: 0
          max: 1
      - name: "persuasiveness"
        description: "Which version is more persuasive? Score 0 for Version A, 1 for Version B, and 0.5 for a tie."
        range:
          min: 0
          max: 1
    model: gpt-4o
    temperature: 0.1
    apiKey: '{{OPENAI_API_KEY}}'
  connections:
    success: test-results
```
Multi-Dimensional Content Scoring
```yaml
comprehensive-evaluator:
  type: evaluator
  name: "Comprehensive Content Evaluator"
  inputs:
    content: <ai-writer.content>
    metrics:
      - name: "technical_accuracy"
        description: "How technically accurate and correct is the information?"
        range:
          min: 0
          max: 100
      - name: "readability"
        description: "How easy is the content to read and understand?"
        range:
          min: 0
          max: 100
      - name: "seo_optimization"
        description: "How well optimized is the content for search engines?"
        range:
          min: 0
          max: 100
      - name: "user_engagement"
        description: "How likely is this content to engage and retain readers?"
        range:
          min: 0
          max: 100
      - name: "brand_alignment"
        description: "How well does the content align with brand voice and values?"
        range:
          min: 0
          max: 100
    model: gpt-4o
    temperature: 0.3
    apiKey: '{{OPENAI_API_KEY}}'
  connections:
    success: content-optimization
```
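Azure OpenAI Evaluation
The schema also accepts azureEndpoint and azureApiVersion for Azure-hosted models. A hedged sketch of what such a configuration might look like; the endpoint URL, API version, environment variable name, and block IDs are placeholders, not verified values:

```yaml
azure-evaluator:
  type: evaluator
  name: "Azure-Hosted Evaluator"
  inputs:
    content: <ai-writer.content>
    metrics:
      - name: "accuracy"
        description: "How factually accurate is the content?"
        range:
          min: 1
          max: 5
    model: gpt-4o
    apiKey: '{{AZURE_OPENAI_API_KEY}}'                       # placeholder env var
    azureEndpoint: 'https://your-resource.openai.azure.com'  # placeholder endpoint
    azureApiVersion: '2024-02-01'                            # placeholder version
  connections:
    success: next-step                                       # hypothetical block ID
```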
Output References
After an evaluator block executes, you can reference its outputs:
```yaml
# In subsequent blocks
next-block:
  inputs:
    evaluation: <evaluator-name.content>  # Evaluation summary
    scores: <evaluator-name.scores>       # Individual metric scores
    overall: <evaluator-name.overall>     # Overall assessment
```
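For instance, the quality-report block wired up in the first example could consume the Content Quality Evaluator's outputs as follows. This is a sketch using only the reference syntax shown above; the block's other inputs are omitted:

```yaml
quality-report:
  inputs:
    evaluation: <content-evaluator.content>  # summary text from the evaluator
    scores: <content-evaluator.scores>       # per-metric scores (accuracy, clarity, ...)
```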
Best Practices
- Define clear, specific evaluation criteria
- Use appropriate scoring ranges for your use case
- Choose models with strong reasoning capabilities
- Use lower temperature for consistent scoring
- Include detailed metric descriptions (see the sketch after this list)
- Test with diverse content types
- Consider multiple evaluators for complex assessments
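To illustrate what clear criteria and detailed descriptions look like in practice, here is a hedged sketch contrasting a vague metric with a specific one. Both conform to the range schema above; the wording is illustrative rather than prescriptive:

```yaml
metrics:
  # Vague: gives the model little to anchor its score to
  - name: "quality"
    description: "Is the content good?"
    range:
      min: 1
      max: 5

  # Specific: states what each end of the scale means
  - name: "factual_accuracy"
    description: >
      Rate factual accuracy: 5 means every claim is correct and verifiable;
      3 means minor inaccuracies that do not change the meaning;
      1 means major factual errors or unsupported claims.
    range:
      min: 1
      max: 5
```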