Experiments
Run structured experiments combining datasets, eval suites, and prompt versions for systematic agent testing.
import { Invariance } from '@invariance/sdk';
Overview
Experiments bring together datasets, eval suites, and optional prompt versions into a single reproducible test run. You specify a dataset version (for inputs), an eval suite (for assertions/judges), and optionally a prompt version, then run the experiment to get scored results.
Compare two experiments to see how changes in prompts, models, or agent logic affect eval metrics.
Quick Example
const inv = Invariance.init({ apiKey: process.env.INVARIANCE_API_KEY! });
const exp = await inv.experiments.create({
  name: 'Prompt A/B Test',
  dataset_id: 'ds-123',
  dataset_version: 1,
  suite_id: 'suite-456',
  prompt_version_id: 'pv-789',
});
// Run the experiment
const result = await inv.experiments.run(exp.id);
console.log(result.status); // 'completed'
Type Definitions
Experiment
interface Experiment {
  id: string;
  name: string;
  dataset_id: string;
  dataset_version: number;
  suite_id: string;
  prompt_version_id: string | null;
  status: 'pending' | 'running' | 'completed' | 'failed';
  run_id: string | null;
  config: Record<string, unknown>;
  created_at: string;
  completed_at: string | null;
}
A structured experiment combining a dataset, an eval suite, and an optional prompt version.
API Reference
experiments.list
List experiments with optional filters.
async list(opts?: { suite_id?: string; dataset_id?: string; status?: string }): Promise<Experiment[]>
Returns: Promise<Experiment[]>
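As a minimal sketch of how the filters might be combined, the helper below asks for completed experiments in one suite and picks the most recent one. `ExperimentSummary` and `ExperimentLister` are local stand-ins declared for illustration, not SDK exports. The sketch also assumes that `status` accepts the same values as `Experiment.status` and that `completed_at` is an ISO 8601 timestamp.

```typescript
// Local stand-ins (assumptions) that mirror only the fields this sketch needs.
interface ExperimentSummary {
  id: string;
  name: string;
  status: string;
  completed_at: string | null;
}

interface ExperimentLister {
  list(opts?: { suite_id?: string; dataset_id?: string; status?: string }): Promise<ExperimentSummary[]>;
}

// Returns the most recently completed experiment for a suite,
// or undefined when no completed experiment is found.
async function latestCompleted(
  experiments: ExperimentLister,
  suiteId: string,
): Promise<ExperimentSummary | undefined> {
  const found = await experiments.list({ suite_id: suiteId, status: 'completed' });
  return found
    .filter((e) => e.completed_at !== null)
    // Assumes completed_at is an ISO 8601 string that Date.parse understands.
    .sort((a, b) => Date.parse(b.completed_at as string) - Date.parse(a.completed_at as string))[0];
}
```

If `inv.experiments` matches the `list` signature above, it could be passed in directly, for example `await latestCompleted(inv.experiments, 'suite-456')`.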
experiments.create
Create a new experiment.
async create(body: CreateExperimentBody): Promise<Experiment>
Parameters
name (string): Experiment name
dataset_id (string): Dataset ID
dataset_version (number): Published dataset version number
suite_id (string): Eval suite ID
prompt_version_id (string, optional): Prompt version to pin
Returns: Promise<Experiment>
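The `CreateExperimentBody` type is referenced in the signature but not shown on this page. The sketch below guesses its shape from the parameter list and adds a hypothetical helper, `bodiesForPromptVersions`, that builds one request body per prompt version while keeping the dataset version and suite fixed. Both the interface and the helper are illustrative assumptions, not SDK definitions.

```typescript
// Assumed shape of CreateExperimentBody, inferred from the parameter list.
interface CreateExperimentBody {
  name: string;
  dataset_id: string;
  dataset_version: number;
  suite_id: string;
  prompt_version_id?: string; // documented as optional
}

// Hypothetical helper: one body per prompt version, so every variant
// shares the same dataset version and eval suite.
function bodiesForPromptVersions(
  base: Omit<CreateExperimentBody, 'name' | 'prompt_version_id'>,
  promptVersionIds: string[],
): CreateExperimentBody[] {
  return promptVersionIds.map((pv) => ({
    ...base,
    name: `prompt ${pv}`,
    prompt_version_id: pv,
  }));
}
```

Each returned body could then be passed to `experiments.create`. Holding everything but the prompt version constant keeps the later comparison attributable to the prompt change.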
experiments.run
Execute an experiment.
async run(id: string): Promise<Experiment>
Returns: Promise<Experiment>
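The Quick Example logs `'completed'` right after `run`, which suggests the promise resolves once the experiment has finished, although this page does not state that explicitly. Under that assumption, a small wrapper can turn any other status into an error. `RunnableExperiment`, `ExperimentRunner`, and `runAndRequireCompletion` are illustrative names, not part of the SDK.

```typescript
// Local stand-ins (assumptions) covering only the fields used here.
interface RunnableExperiment {
  id: string;
  status: 'pending' | 'running' | 'completed' | 'failed';
  run_id: string | null;
}

interface ExperimentRunner {
  run(id: string): Promise<RunnableExperiment>;
}

// Runs an experiment and throws unless the returned status is 'completed'.
// If run() can resolve while the experiment is still 'running', this
// wrapper would throw in that case too.
async function runAndRequireCompletion(
  experiments: ExperimentRunner,
  id: string,
): Promise<RunnableExperiment> {
  const result = await experiments.run(id);
  if (result.status !== 'completed') {
    throw new Error(`Experiment ${result.id} returned with status "${result.status}"`);
  }
  return result;
}
```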
experiments.compare
Compare two experiment results.
async compare(expA: string, expB: string): Promise<ExperimentCompareResult>
Parameters
expA (string): First experiment ID
expB (string): Second experiment ID
Returns: Promise<ExperimentCompareResult>
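A prompt A/B test built from `create`, `run`, and `compare` might look like the sketch below. The shape of `ExperimentCompareResult` is not documented on this page, so the helper stays generic over it and returns whatever `compare` yields. `AbBody`, `AbExperiment`, `AbClient`, and `comparePromptVersions` are hypothetical names introduced for illustration.

```typescript
// Local stand-ins (assumptions), not SDK exports.
interface AbBody {
  name: string;
  dataset_id: string;
  dataset_version: number;
  suite_id: string;
  prompt_version_id?: string;
}

interface AbExperiment {
  id: string;
  status: string;
}

interface AbClient<TCompare> {
  create(body: AbBody): Promise<AbExperiment>;
  run(id: string): Promise<AbExperiment>;
  compare(expA: string, expB: string): Promise<TCompare>;
}

// Creates two experiments that differ only in prompt version,
// runs them one after the other, then compares them.
async function comparePromptVersions<TCompare>(
  experiments: AbClient<TCompare>,
  shared: Omit<AbBody, 'prompt_version_id'>,
  promptA: string,
  promptB: string,
): Promise<TCompare> {
  const a = await experiments.create({ ...shared, name: `${shared.name} (A)`, prompt_version_id: promptA });
  const b = await experiments.create({ ...shared, name: `${shared.name} (B)`, prompt_version_id: promptB });
  await experiments.run(a.id);
  await experiments.run(b.id);
  return experiments.compare(a.id, b.id);
}
```

The runs are sequential here for simplicity. Whether the service allows running experiments concurrently is not stated on this page.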
experiments.delete
Delete an experiment.
async delete(id: string): Promise<void>
Returns: Promise<void>
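One possible use of `delete` is clearing out failed experiments for a suite. The sketch below combines it with the `status` filter on `list`. `DeletableExperiment`, `ExperimentCleaner`, and `deleteFailed` are illustrative names, and the sketch assumes `'failed'` is an accepted filter value. This page does not say whether deletion can be undone, so treat the helper as destructive.

```typescript
// Local stand-ins (assumptions) covering only what this sketch needs.
interface DeletableExperiment {
  id: string;
}

interface ExperimentCleaner {
  list(opts?: { suite_id?: string; dataset_id?: string; status?: string }): Promise<DeletableExperiment[]>;
  delete(id: string): Promise<void>;
}

// Deletes every failed experiment in a suite and returns the deleted IDs.
async function deleteFailed(experiments: ExperimentCleaner, suiteId: string): Promise<string[]> {
  const failed = await experiments.list({ suite_id: suiteId, status: 'failed' });
  const deleted: string[] = [];
  for (const exp of failed) {
    // Sequential on purpose: if one delete rejects, the loop stops there.
    await experiments.delete(exp.id);
    deleted.push(exp.id);
  }
  return deleted;
}
```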
Use Cases
- A/B test different prompt versions against the same dataset
- Compare agent performance across model changes
- Track experiment history for systematic improvement