Evaluator Block YAML Schema
YAML configuration reference for the Evaluator block
Schema Definition
type: object
required:
- type
- name
- inputs
properties:
type:
type: string
enum: [evaluator]
description: Block type identifier
name:
type: string
description: Display name for this evaluator block
inputs:
type: object
required:
- content
- metrics
- model
- apiKey
properties:
content:
type: string
description: Content to evaluate (can reference other blocks)
metrics:
type: array
description: Evaluation criteria and scoring ranges
items:
type: object
properties:
name:
type: string
description: Metric identifier
description:
type: string
description: Detailed explanation of what the metric measures
range:
type: object
properties:
min:
type: number
description: Minimum score value
max:
type: number
description: Maximum score value
required: [min, max]
description: Scoring range with numeric bounds
model:
type: string
description: AI model identifier (e.g., gpt-4o, claude-3-5-sonnet-20241022)
apiKey:
type: string
description: API key for the model provider (use {{ENV_VAR}} format)
temperature:
type: number
minimum: 0
maximum: 2
description: Model temperature for evaluation
default: 0.3
azureEndpoint:
type: string
description: Azure OpenAI endpoint URL (required for Azure models)
azureApiVersion:
type: string
description: Azure API version (required for Azure models)
connections:
type: object
properties:
success:
type: string
description: Target block ID for successful evaluation
error:
type: string
description: Target block ID for error handling
Connection Configuration
Connections define where the workflow routes based on the evaluation result:
connections:
success: <string> # Target block ID for successful evaluation
error: <string> # Target block ID for error handling (optional)
Examples
Content Quality Evaluation
content-evaluator:
type: evaluator
name: "Content Quality Evaluator"
inputs:
content: <content-generator.content>
metrics:
- name: "accuracy"
description: "How factually accurate is the content?"
range:
min: 1
max: 5
- name: "clarity"
description: "How clear and understandable is the content?"
range:
min: 1
max: 5
- name: "relevance"
description: "How relevant is the content to the original query?"
range:
min: 1
max: 5
- name: "completeness"
description: "How complete and comprehensive is the content?"
range:
min: 1
max: 5
model: gpt-4o
temperature: 0.2
apiKey: '{{OPENAI_API_KEY}}'
connections:
success: quality-report
error: evaluation-error
Customer Response Evaluation
response-evaluator:
type: evaluator
name: "Customer Response Evaluator"
inputs:
content: <customer-agent.content>
metrics:
- name: "helpfulness"
description: "How helpful is the response in addressing the customer's needs?"
range:
min: 1
max: 10
- name: "tone"
description: "How appropriate and professional is the tone?"
range:
min: 1
max: 10
- name: "completeness"
description: "Does the response fully address all aspects of the inquiry?"
range:
min: 1
max: 10
model: claude-3-5-sonnet-20241022
apiKey: '{{ANTHROPIC_API_KEY}}'
connections:
success: response-processor
A/B Test Evaluation
ab-test-evaluator:
type: evaluator
name: "A/B Test Evaluator"
inputs:
content: |
Version A: <version-a.content>
Version B: <version-b.content>
Compare these two versions for the following criteria.
metrics:
- name: "engagement"
description: "Which version is more likely to engage users?"
range: "A, B, or Tie"
- name: "clarity"
description: "Which version communicates more clearly?"
range: "A, B, or Tie"
- name: "persuasiveness"
description: "Which version is more persuasive?"
range: "A, B, or Tie"
model: gpt-4o
temperature: 0.1
apiKey: '{{OPENAI_API_KEY}}'
connections:
success: test-results
Multi-Dimensional Content Scoring
comprehensive-evaluator:
type: evaluator
name: "Comprehensive Content Evaluator"
inputs:
content: <ai-writer.content>
metrics:
- name: "technical_accuracy"
description: "How technically accurate and correct is the information?"
range:
min: 0
max: 100
- name: "readability"
description: "How easy is the content to read and understand?"
range:
min: 0
max: 100
- name: "seo_optimization"
description: "How well optimized is the content for search engines?"
range:
min: 0
max: 100
- name: "user_engagement"
description: "How likely is this content to engage and retain readers?"
range:
min: 0
max: 100
- name: "brand_alignment"
description: "How well does the content align with brand voice and values?"
range:
min: 0
max: 100
model: gpt-4o
temperature: 0.3
apiKey: '{{OPENAI_API_KEY}}'
connections:
success: content-optimization
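Azure OpenAI Evaluation
The schema above lists azureEndpoint and azureApiVersion as required when evaluating with Azure OpenAI models, but none of the examples exercise them. The sketch below is illustrative only: the block IDs (summary-evaluator, summarizer, review-step, azure-error), the environment variable name AZURE_OPENAI_API_KEY, the endpoint URL, the API version string, and the exact model identifier expected for an Azure deployment are placeholders rather than values taken from this reference.
summary-evaluator:
  type: evaluator
  name: "Summary Quality Evaluator"
  inputs:
    content: <summarizer.content>
    metrics:
      - name: "faithfulness"
        description: "How faithfully does the summary reflect the source material?"
        range:
          min: 1
          max: 5
    model: gpt-4o                                             # placeholder; use the identifier your Azure deployment expects
    apiKey: '{{AZURE_OPENAI_API_KEY}}'                        # assumed environment variable name
    azureEndpoint: 'https://your-resource.openai.azure.com'   # placeholder endpoint URL
    azureApiVersion: '2024-06-01'                             # placeholder API version
    temperature: 0.2
  connections:
    success: review-step
    error: azure-error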
Output Reference
After an Evaluator block executes, you can reference its outputs:
# In subsequent blocks
next-block:
inputs:
evaluation: <evaluator-name.content> # Evaluation summary
scores: <evaluator-name.scores> # Individual metric scores
overall: <evaluator-name.overall> # Overall assessment
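As a concrete illustration, the quality-report block targeted by the Content Quality Evaluation example above might consume these outputs as shown below. This is a sketch only: the agent block type and its input fields (systemPrompt, userPrompt) are assumptions for illustration and are documented elsewhere, not in this reference.
quality-report:
  type: agent                       # assumed downstream block type
  name: "Quality Report Writer"
  inputs:
    systemPrompt: "Summarize the evaluation results for the editorial team."   # field names assumed
    userPrompt: |
      Evaluation summary: <content-evaluator.content>
      Metric scores: <content-evaluator.scores>
      Overall assessment: <content-evaluator.overall>
    model: gpt-4o
    apiKey: '{{OPENAI_API_KEY}}'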
Best Practices
- Define clear, specific evaluation criteria
- Choose scoring ranges appropriate to your use case
- Select models with strong reasoning capabilities
- Use lower temperatures for consistent scoring
- Include detailed metric descriptions
- Test with diverse content types
- For complex evaluations, consider chaining multiple evaluators (see the sketch after this list)
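A minimal sketch of the last point: two evaluators chained so content passes a factual check before a style check. The block IDs (draft-writer, style-evaluator, final-review, evaluation-error) and metrics are hypothetical; only the field layout follows the schema above.
fact-evaluator:
  type: evaluator
  name: "Factual Accuracy Evaluator"
  inputs:
    content: <draft-writer.content>
    metrics:
      - name: "accuracy"
        description: "How factually accurate is the content?"
        range:
          min: 1
          max: 5
    model: gpt-4o
    temperature: 0.2
    apiKey: '{{OPENAI_API_KEY}}'
  connections:
    success: style-evaluator        # hands off to a second evaluator on success
    error: evaluation-error

style-evaluator:
  type: evaluator
  name: "Style and Tone Evaluator"
  inputs:
    content: <draft-writer.content>
    metrics:
      - name: "tone"
        description: "How appropriate and professional is the tone?"
        range:
          min: 1
          max: 10
    model: gpt-4o
    temperature: 0.2
    apiKey: '{{OPENAI_API_KEY}}'
  connections:
    success: final-review
    error: evaluation-error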