Evaluator Block YAML Schema
YAML configuration reference for Evaluator blocks
Schema Definition
type: object
required:
  - type
  - name
  - inputs
properties:
  type:
    type: string
    enum: [evaluator]
    description: Block type identifier
  name:
    type: string
    description: Display name for this evaluator block
  inputs:
    type: object
    required:
      - content
      - metrics
      - model
      - apiKey
    properties:
      content:
        type: string
        description: Content to evaluate (can reference other blocks)
      metrics:
        type: array
        description: Evaluation criteria and scoring ranges
        items:
          type: object
          properties:
            name:
              type: string
              description: Metric identifier
            description:
              type: string
              description: Detailed explanation of what the metric measures
            range:
              type: object
              properties:
                min:
                  type: number
                  description: Minimum score value
                max:
                  type: number
                  description: Maximum score value
              required: [min, max]
              description: Scoring range with numeric bounds
      model:
        type: string
        description: AI model identifier (e.g., gpt-4o, claude-3-5-sonnet-20241022)
      apiKey:
        type: string
        description: API key for the model provider (use {{ENV_VAR}} format)
      temperature:
        type: number
        minimum: 0
        maximum: 2
        description: Model temperature for evaluation
        default: 0.3
      azureEndpoint:
        type: string
        description: Azure OpenAI endpoint URL (required for Azure models)
      azureApiVersion:
        type: string
        description: Azure API version (required for Azure models)
  connections:
    type: object
    properties:
      success:
        type: string
        description: Target block ID for successful evaluation
      error:
        type: string
        description: Target block ID for error handling
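For orientation, here is a minimal sketch of a valid evaluator block that supplies only the required fields. The block ID minimal-evaluator, the upstream draft block, and the next-step target are illustrative placeholders, not part of the schema:

minimal-evaluator:
  type: evaluator
  name: "Minimal Evaluator"
  inputs:
    content: <draft.content>        # content, metrics, model, and apiKey are required
    metrics:
      - name: "quality"
        description: "Overall quality of the draft content"
        range:
          min: 1
          max: 5
    model: gpt-4o
    apiKey: '{{OPENAI_API_KEY}}'    # temperature is optional and defaults to 0.3
  connections:
    success: next-step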
Connection Configuration
Connections define where the workflow routes based on evaluation results:
connections:
  success: <string>      # Target block ID for successful evaluation
  error: <string>        # Target block ID for error handling (optional)
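As a concrete instance, a hypothetical evaluator could route passing results to a reporting block and evaluation failures to a fallback block; both block IDs below are illustrative:

connections:
  success: score-report       # block to run after a successful evaluation
  error: fallback-handler     # optional; block to run if the evaluation errors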
Examples
Content Quality Evaluation
content-evaluator:
  type: evaluator
  name: "Content Quality Evaluator"
  inputs:
    content: <content-generator.content>
    metrics:
      - name: "accuracy"
        description: "How factually accurate is the content?"
        range:
          min: 1
          max: 5
      - name: "clarity"
        description: "How clear and understandable is the content?"
        range:
          min: 1
          max: 5
      - name: "relevance"
        description: "How relevant is the content to the original query?"
        range:
          min: 1
          max: 5
      - name: "completeness"
        description: "How complete and comprehensive is the content?"
        range:
          min: 1
          max: 5
    model: gpt-4o
    temperature: 0.2
    apiKey: '{{OPENAI_API_KEY}}'
  connections:
    success: quality-report
    error: evaluation-error
Customer Response Evaluation
response-evaluator:
  type: evaluator
  name: "Customer Response Evaluator"
  inputs:
    content: <customer-agent.content>
    metrics:
      - name: "helpfulness"
        description: "How helpful is the response in addressing the customer's needs?"
        range:
          min: 1
          max: 10
      - name: "tone"
        description: "How appropriate and professional is the tone?"
        range:
          min: 1
          max: 10
      - name: "completeness"
        description: "Does the response fully address all aspects of the inquiry?"
        range:
          min: 1
          max: 10
    model: claude-3-5-sonnet-20241022
    apiKey: '{{ANTHROPIC_API_KEY}}'
  connections:
    success: response-processor
A/B Test Evaluation
ab-test-evaluator:
  type: evaluator
  name: "A/B Test Evaluator"
  inputs:
    content: |
      Version A: <version-a.content>
      Version B: <version-b.content>
      Compare these two versions for the following criteria.
    metrics:
      - name: "engagement"
        description: "Which version is more likely to engage users?"
        range: "A, B, or Tie"
      - name: "clarity"
        description: "Which version communicates more clearly?"
        range: "A, B, or Tie"
      - name: "persuasiveness"
        description: "Which version is more persuasive?"
        range: "A, B, or Tie"
    model: gpt-4o
    temperature: 0.1
    apiKey: '{{OPENAI_API_KEY}}'
  connections:
    success: test-results
Multi-Dimensional Content Scoring
comprehensive-evaluator:
  type: evaluator
  name: "Comprehensive Content Evaluator"
  inputs:
    content: <ai-writer.content>
    metrics:
      - name: "technical_accuracy"
        description: "How technically accurate and correct is the information?"
        range:
          min: 0
          max: 100
      - name: "readability"
        description: "How easy is the content to read and understand?"
        range:
          min: 0
          max: 100
      - name: "seo_optimization"
        description: "How well optimized is the content for search engines?"
        range:
          min: 0
          max: 100
      - name: "user_engagement"
        description: "How likely is this content to engage and retain readers?"
        range:
          min: 0
          max: 100
      - name: "brand_alignment"
        description: "How well does the content align with brand voice and values?"
        range:
          min: 0
          max: 100
    model: gpt-4o
    temperature: 0.3
    apiKey: '{{OPENAI_API_KEY}}'
  connections:
    success: content-optimization
Output References
After an evaluator block runs, you can reference its outputs:
# In subsequent blocks
next-block:
  inputs:
    evaluation: <evaluator-name.content>   # Evaluation summary
    scores: <evaluator-name.scores>        # Individual metric scores
    overall: <evaluator-name.overall>      # Overall assessment
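For example, the quality-report block targeted by the first example above might consume those outputs as follows; the summary, scores, and overall input names on the receiving block are illustrative, not fixed by the schema:

quality-report:
  inputs:
    summary: <content-evaluator.content>   # evaluation summary text
    scores: <content-evaluator.scores>     # per-metric scores
    overall: <content-evaluator.overall>   # overall assessment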
Best Practices
- Define clear, specific evaluation criteria
- Use scoring ranges appropriate to your use case
- Choose models with strong reasoning capabilities
- Use a lower temperature for consistent scoring
- Include detailed metric descriptions (see the sketch after this list)
- Test with diverse content types
- Consider multiple evaluators for complex assessments
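As a sketch combining several of these practices, the hypothetical block below pairs a detailed, rubric-style metric description with an explicit numeric range and a low temperature for repeatable scoring; the block IDs and the metric itself are illustrative:

strict-evaluator:
  type: evaluator
  name: "Strict Factuality Evaluator"
  inputs:
    content: <draft.content>          # hypothetical upstream block
    metrics:
      - name: "factuality"
        # A rubric-style description tells the model exactly how to assign scores
        description: >
          How factually accurate is the content? Score 1 if it contains
          fabricated claims, 3 if it is mostly accurate with minor errors,
          and 5 only if every claim is verifiable.
        range:
          min: 1
          max: 5
    model: gpt-4o
    temperature: 0.1                  # low temperature for consistent scoring
    apiKey: '{{OPENAI_API_KEY}}'
  connections:
    success: next-step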