Create reusable scoring configurations for LLM judges and human annotation rubrics used in eval cases.
Scorers are reusable configurations that define how judge-based eval cases are scored. There are two types:
**LLM Scorers** use an AI model (e.g., Claude or GPT) to evaluate agent output against defined criteria. You configure the prompt, criteria, and model.
**Human Scorers** define annotation rubrics with weighted criteria. Results are queued for human review and scored manually.
```typescript
// Assumed import path; the actual package name may differ
import { Invariance } from '@invariance/sdk';

const inv = Invariance.init({ apiKey: process.env.INVARIANCE_API_KEY! });

// LLM scorer for automated judge evaluation
const llmScorer = await inv.scorers.create({
  name: 'Response Quality',
  type: 'llm',
  config: {
    prompt: 'Evaluate the quality of the agent response',
    criteria: ['accuracy', 'completeness', 'clarity'],
    model: 'claude-sonnet-4-20250514',
  },
});
```
```typescript
// Human scorer with annotation rubric
const humanScorer = await inv.scorers.create({
  name: 'Safety Review',
  type: 'human',
  config: {
    rubric: [
      { criterion: 'safety', description: 'No harmful content', weight: 2 },
      { criterion: 'compliance', description: 'Follows guidelines', weight: 1 },
    ],
  },
});
```
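The `weight` values imply some form of weighted aggregation across criteria. As a minimal sketch, assuming per-criterion scores on a 0–1 scale and weights normalized by their sum (the actual rollup performed by the review workflow is not specified here):

```typescript
interface RubricCriterion {
  criterion: string;
  description: string;
  weight: number;
}

// Weighted average of per-criterion scores. Assumes scores are on a 0-1
// scale and weights are normalized by their sum -- an illustrative
// assumption, not documented behavior.
function weightedScore(
  rubric: RubricCriterion[],
  scores: Record<string, number>,
): number {
  const totalWeight = rubric.reduce((sum, c) => sum + c.weight, 0);
  const weightedSum = rubric.reduce(
    (sum, c) => sum + c.weight * (scores[c.criterion] ?? 0),
    0,
  );
  return weightedSum / totalWeight;
}

// With the rubric above: (2 * 1.0 + 1 * 0.5) / 3 ≈ 0.83
const overall = weightedScore(
  [
    { criterion: 'safety', description: 'No harmful content', weight: 2 },
    { criterion: 'compliance', description: 'Follows guidelines', weight: 1 },
  ],
  { safety: 1.0, compliance: 0.5 },
);
```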
The returned `Scorer` object has the following shape:

```typescript
interface Scorer {
  id: string;
  name: string;
  type: 'llm' | 'human';
  config: Record<string, unknown>;
  owner_id: string;
  created_at: string;
  updated_at: string;
}
```
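For illustration, a created LLM scorer might come back shaped like this (all field values below, including the ID formats, are made up for the example):

```typescript
const example: Scorer = {
  id: 'scorer_abc123',       // hypothetical ID format
  name: 'Response Quality',
  type: 'llm',
  config: {
    prompt: 'Evaluate the quality of the agent response',
    criteria: ['accuracy', 'completeness', 'clarity'],
    model: 'claude-sonnet-4-20250514',
  },
  owner_id: 'user_xyz789',   // hypothetical
  created_at: '2025-01-15T10:00:00Z',
  updated_at: '2025-01-15T10:00:00Z',
};
```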