Evaluator Block YAML Schema
YAML configuration reference for the Evaluator block
Schema Definition
type: object
required:
- type
- name
- inputs
properties:
type:
type: string
enum: [evaluator]
description: Block type identifier
name:
type: string
description: Display name for this evaluator block
inputs:
type: object
required:
- content
- metrics
- model
- apiKey
properties:
content:
type: string
description: Content to evaluate (can reference other blocks)
metrics:
type: array
description: Evaluation criteria and scoring ranges
items:
type: object
properties:
name:
type: string
description: Metric identifier
description:
type: string
description: Detailed explanation of what the metric measures
range:
type: object
properties:
min:
type: number
description: Minimum score value
max:
type: number
description: Maximum score value
required: [min, max]
description: Scoring range with numeric bounds
model:
type: string
description: AI model identifier (e.g., gpt-4o, claude-3-5-sonnet-20241022)
apiKey:
type: string
description: API key for the model provider (use {{ENV_VAR}} format)
temperature:
type: number
minimum: 0
maximum: 2
description: Model temperature for evaluation
default: 0.3
azureEndpoint:
type: string
description: Azure OpenAI endpoint URL (required for Azure models)
azureApiVersion:
type: string
description: Azure API version (required for Azure models)
connections:
type: object
properties:
success:
type: string
description: Target block ID for successful evaluation
error:
type: string
description: Target block ID for error handling
Connection Configuration
Connections define where the workflow routes based on the evaluation result:
connections:
success: <string> # Target block ID for successful evaluation
error: <string> # Target block ID for error handling (optional)
Examples
Content Quality Evaluation
content-evaluator:
type: evaluator
name: "Content Quality Evaluator"
inputs:
content: <content-generator.content>
metrics:
- name: "accuracy"
description: "How factually accurate is the content?"
range:
min: 1
max: 5
- name: "clarity"
description: "How clear and understandable is the content?"
range:
min: 1
max: 5
- name: "relevance"
description: "How relevant is the content to the original query?"
range:
min: 1
max: 5
- name: "completeness"
description: "How complete and comprehensive is the content?"
range:
min: 1
max: 5
model: gpt-4o
temperature: 0.2
apiKey: '{{OPENAI_API_KEY}}'
connections:
success: quality-report
error: evaluation-error
Customer Response Evaluation
response-evaluator:
type: evaluator
name: "Customer Response Evaluator"
inputs:
content: <customer-agent.content>
metrics:
- name: "helpfulness"
description: "How helpful is the response in addressing the customer's needs?"
range:
min: 1
max: 10
- name: "tone"
description: "How appropriate and professional is the tone?"
range:
min: 1
max: 10
- name: "completeness"
description: "Does the response fully address all aspects of the inquiry?"
range:
min: 1
max: 10
model: claude-3-5-sonnet-20241022
apiKey: '{{ANTHROPIC_API_KEY}}'
connections:
success: response-processor
A/B Test Evaluation
ab-test-evaluator:
type: evaluator
name: "A/B Test Evaluator"
inputs:
content: |
Version A: <version-a.content>
Version B: <version-b.content>
Compare these two versions for the following criteria.
metrics:
- name: "engagement"
description: "Which version is more likely to engage users?"
range: "A, B, or Tie"
- name: "clarity"
description: "Which version communicates more clearly?"
range: "A, B, or Tie"
- name: "persuasiveness"
description: "Which version is more persuasive?"
range: "A, B, or Tie"
model: gpt-4o
temperature: 0.1
apiKey: '{{OPENAI_API_KEY}}'
connections:
success: test-results
Multi-Dimensional Content Scoring
comprehensive-evaluator:
type: evaluator
name: "Comprehensive Content Evaluator"
inputs:
content: <ai-writer.content>
metrics:
- name: "technical_accuracy"
description: "How technically accurate and correct is the information?"
range:
min: 0
max: 100
- name: "readability"
description: "How easy is the content to read and understand?"
range:
min: 0
max: 100
- name: "seo_optimization"
description: "How well optimized is the content for search engines?"
range:
min: 0
max: 100
- name: "user_engagement"
description: "How likely is this content to engage and retain readers?"
range:
min: 0
max: 100
- name: "brand_alignment"
description: "How well does the content align with brand voice and values?"
range:
min: 0
max: 100
model: gpt-4o
temperature: 0.3
apiKey: '{{OPENAI_API_KEY}}'
connections:
success: content-optimization
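Azure OpenAI Evaluation
The schema above lists azureEndpoint and azureApiVersion as required when evaluating with Azure OpenAI models, but none of the examples exercise them. The sketch below is illustrative only: the block IDs (summary-evaluator, summarizer, review-step, azure-error), the environment variable name AZURE_OPENAI_API_KEY, the endpoint URL, the API version string, and the exact model identifier expected for an Azure deployment are placeholders rather than values taken from this reference.
summary-evaluator:
  type: evaluator
  name: "Summary Quality Evaluator"
  inputs:
    content: <summarizer.content>
    metrics:
      - name: "faithfulness"
        description: "How faithfully does the summary reflect the source material?"
        range:
          min: 1
          max: 5
    model: gpt-4o                                             # placeholder; use the identifier your Azure deployment expects
    apiKey: '{{AZURE_OPENAI_API_KEY}}'                        # assumed environment variable name
    azureEndpoint: 'https://your-resource.openai.azure.com'   # placeholder endpoint URL
    azureApiVersion: '2024-06-01'                             # placeholder API version
    temperature: 0.2
  connections:
    success: review-step
    error: azure-error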
Output Reference
After an Evaluator block executes, you can reference its outputs:
# In subsequent blocks
next-block:
inputs:
evaluation: <evaluator-name.content> # Evaluation summary
scores: <evaluator-name.scores> # Individual metric scores
overall: <evaluator-name.overall> # Overall assessment
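As a concrete illustration, the quality-report block targeted by the Content Quality Evaluation example above might consume these outputs as shown below. This is a sketch only: the agent block type and its input fields (systemPrompt, userPrompt) are assumptions for illustration and are documented elsewhere, not in this reference.
quality-report:
  type: agent                       # assumed downstream block type
  name: "Quality Report Writer"
  inputs:
    systemPrompt: "Summarize the evaluation results for the editorial team."   # field names assumed
    userPrompt: |
      Evaluation summary: <content-evaluator.content>
      Metric scores: <content-evaluator.scores>
      Overall assessment: <content-evaluator.overall>
    model: gpt-4o
    apiKey: '{{OPENAI_API_KEY}}'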
Best Practices
- Define clear, specific evaluation criteria
- Choose scoring ranges appropriate to your use case
- Select models with strong reasoning capabilities
- Use lower temperatures for consistent scoring
- Include detailed metric descriptions
- Test with diverse content types
- For complex evaluations, consider chaining multiple evaluators (see the sketch after this list)
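A minimal sketch of the last point: two evaluators chained so content passes a factual check before a style check. The block IDs (draft-writer, style-evaluator, final-review, evaluation-error) and metrics are hypothetical; only the field layout follows the schema above.
fact-evaluator:
  type: evaluator
  name: "Factual Accuracy Evaluator"
  inputs:
    content: <draft-writer.content>
    metrics:
      - name: "accuracy"
        description: "How factually accurate is the content?"
        range:
          min: 1
          max: 5
    model: gpt-4o
    temperature: 0.2
    apiKey: '{{OPENAI_API_KEY}}'
  connections:
    success: style-evaluator        # hands off to a second evaluator on success
    error: evaluation-error

style-evaluator:
  type: evaluator
  name: "Style and Tone Evaluator"
  inputs:
    content: <draft-writer.content>
    metrics:
      - name: "tone"
        description: "How appropriate and professional is the tone?"
        range:
          min: 1
          max: 10
    model: gpt-4o
    temperature: 0.2
    apiKey: '{{OPENAI_API_KEY}}'
  connections:
    success: final-review
    error: evaluation-error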