Evaluator Block YAML Schema
YAML configuration reference for Evaluator blocks
Schema Definition
```yaml
type: object
required:
  - type
  - name
  - inputs
properties:
  type:
    type: string
    enum: [evaluator]
    description: Block type identifier
  name:
    type: string
    description: Display name for this evaluator block
  inputs:
    type: object
    required:
      - content
      - metrics
      - model
      - apiKey
    properties:
      content:
        type: string
        description: Content to evaluate (can reference other blocks)
      metrics:
        type: array
        description: Evaluation criteria and scoring ranges
        items:
          type: object
          properties:
            name:
              type: string
              description: Metric identifier
            description:
              type: string
              description: Detailed explanation of what the metric measures
            range:
              type: object
              properties:
                min:
                  type: number
                  description: Minimum score value
                max:
                  type: number
                  description: Maximum score value
              required: [min, max]
              description: Scoring range with numeric bounds
      model:
        type: string
        description: AI model identifier (e.g., gpt-4o, claude-3-5-sonnet-20241022)
      apiKey:
        type: string
        description: API key for the model provider (use {{ENV_VAR}} format)
      temperature:
        type: number
        minimum: 0
        maximum: 2
        description: Model temperature for evaluation
        default: 0.3
      azureEndpoint:
        type: string
        description: Azure OpenAI endpoint URL (required for Azure models)
      azureApiVersion:
        type: string
        description: Azure API version (required for Azure models)
  connections:
    type: object
    properties:
      success:
        type: string
        description: Target block ID for successful evaluation
      error:
        type: string
        description: Target block ID for error handling
```
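Putting the required fields together, a minimal valid evaluator block might look like the sketch below. The block ID (quality-check) and the referenced blocks (draft-writer, review-step) are hypothetical placeholders:

```yaml
quality-check:
  type: evaluator
  name: "Quality Check"
  inputs:
    content: <draft-writer.content>        # hypothetical upstream block
    metrics:
      - name: "quality"
        description: "Overall quality of the draft on a five-point scale"
        range:
          min: 1
          max: 5
    model: gpt-4o
    apiKey: '{{OPENAI_API_KEY}}'
  connections:
    success: review-step                    # hypothetical downstream block
```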
Connection Configuration
Connections define where the workflow goes based on evaluation results:
```yaml
connections:
  success: <string>  # Target block ID for successful evaluation
  error: <string>    # Target block ID for error handling (optional)
```
Examples
Content Quality Evaluation
```yaml
content-evaluator:
  type: evaluator
  name: "Content Quality Evaluator"
  inputs:
    content: <content-generator.content>
    metrics:
      - name: "accuracy"
        description: "How factually accurate is the content?"
        range:
          min: 1
          max: 5
      - name: "clarity"
        description: "How clear and understandable is the content?"
        range:
          min: 1
          max: 5
      - name: "relevance"
        description: "How relevant is the content to the original query?"
        range:
          min: 1
          max: 5
      - name: "completeness"
        description: "How complete and comprehensive is the content?"
        range:
          min: 1
          max: 5
    model: gpt-4o
    temperature: 0.2
    apiKey: '{{OPENAI_API_KEY}}'
  connections:
    success: quality-report
    error: evaluation-error
```
Customer Response Evaluation
```yaml
response-evaluator:
  type: evaluator
  name: "Customer Response Evaluator"
  inputs:
    content: <customer-agent.content>
    metrics:
      - name: "helpfulness"
        description: "How helpful is the response in addressing the customer's needs?"
        range:
          min: 1
          max: 10
      - name: "tone"
        description: "How appropriate and professional is the tone?"
        range:
          min: 1
          max: 10
      - name: "completeness"
        description: "Does the response fully address all aspects of the inquiry?"
        range:
          min: 1
          max: 10
    model: claude-3-5-sonnet-20241022
    apiKey: '{{ANTHROPIC_API_KEY}}'
  connections:
    success: response-processor
```
A/B Testing Evaluation
The schema requires numeric min/max bounds, so the A/B preference is encoded numerically and each metric's description explains the mapping:

```yaml
ab-test-evaluator:
  type: evaluator
  name: "A/B Test Evaluator"
  inputs:
    content: |
      Version A: <version-a.content>
      Version B: <version-b.content>
      Compare these two versions for the following criteria.
    metrics:
      - name: "engagement"
        description: "Which version is more likely to engage users? Score 0 for Version A, 1 for Version B, and 0.5 for a tie."
        range:
          min: 0
          max: 1
      - name: "clarity"
        description: "Which version communicates more clearly? Score 0 for Version A, 1 for Version B, and 0.5 for a tie."
        range:
          min: 0
          max: 1
      - name: "persuasiveness"
        description: "Which version is more persuasive? Score 0 for Version A, 1 for Version B, and 0.5 for a tie."
        range:
          min: 0
          max: 1
    model: gpt-4o
    temperature: 0.1
    apiKey: '{{OPENAI_API_KEY}}'
  connections:
    success: test-results
```
Multi-Dimensional Content Scoring
```yaml
comprehensive-evaluator:
  type: evaluator
  name: "Comprehensive Content Evaluator"
  inputs:
    content: <ai-writer.content>
    metrics:
      - name: "technical_accuracy"
        description: "How technically accurate and correct is the information?"
        range:
          min: 0
          max: 100
      - name: "readability"
        description: "How easy is the content to read and understand?"
        range:
          min: 0
          max: 100
      - name: "seo_optimization"
        description: "How well optimized is the content for search engines?"
        range:
          min: 0
          max: 100
      - name: "user_engagement"
        description: "How likely is this content to engage and retain readers?"
        range:
          min: 0
          max: 100
      - name: "brand_alignment"
        description: "How well does the content align with brand voice and values?"
        range:
          min: 0
          max: 100
    model: gpt-4o
    temperature: 0.3
    apiKey: '{{OPENAI_API_KEY}}'
  connections:
    success: content-optimization
```
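Azure OpenAI Evaluation
The schema also accepts azureEndpoint and azureApiVersion for Azure-hosted models. A hedged sketch of what such a configuration might look like; the endpoint URL, API version, environment variable name, and block IDs are placeholders, not verified values:

```yaml
azure-evaluator:
  type: evaluator
  name: "Azure-Hosted Evaluator"
  inputs:
    content: <ai-writer.content>
    metrics:
      - name: "accuracy"
        description: "How factually accurate is the content?"
        range:
          min: 1
          max: 5
    model: gpt-4o
    apiKey: '{{AZURE_OPENAI_API_KEY}}'                       # placeholder env var
    azureEndpoint: 'https://your-resource.openai.azure.com'  # placeholder endpoint
    azureApiVersion: '2024-02-01'                            # placeholder version
  connections:
    success: next-step                                       # hypothetical block ID
```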
Output References
After an evaluator block executes, you can reference its outputs:
```yaml
# In subsequent blocks
next-block:
  inputs:
    evaluation: <evaluator-name.content>  # Evaluation summary
    scores: <evaluator-name.scores>       # Individual metric scores
    overall: <evaluator-name.overall>     # Overall assessment
```
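For instance, the quality-report block wired up in the first example could consume the Content Quality Evaluator's outputs as follows. This is a sketch using only the reference syntax shown above; the block's other inputs are omitted:

```yaml
quality-report:
  inputs:
    evaluation: <content-evaluator.content>  # summary text from the evaluator
    scores: <content-evaluator.scores>       # per-metric scores (accuracy, clarity, ...)
```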
Best Practices
- Define clear, specific evaluation criteria
- Use appropriate scoring ranges for your use case
- Choose models with strong reasoning capabilities
- Use lower temperature for consistent scoring
- Include detailed metric descriptions (see the sketch after this list)
- Test with diverse content types
- Consider multiple evaluators for complex assessments
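To illustrate what clear criteria and detailed descriptions look like in practice, here is a hedged sketch contrasting a vague metric with a specific one. Both conform to the range schema above; the wording is illustrative rather than prescriptive:

```yaml
metrics:
  # Vague: gives the model little to anchor its score to
  - name: "quality"
    description: "Is the content good?"
    range:
      min: 1
      max: 5

  # Specific: states what each end of the scale means
  - name: "factual_accuracy"
    description: >
      Rate factual accuracy: 5 means every claim is correct and verifiable;
      3 means minor inaccuracies that do not change the meaning;
      1 means major factual errors or unsupported claims.
    range:
      min: 1
      max: 5
```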