Experiments

Run structured experiments combining datasets, eval suites, and prompt versions for systematic agent testing.

import { Invariance } from '@invariance/sdk';
Prerequisites: Initialization

Overview

Experiments bring together datasets, eval suites, and optional prompt versions into a single reproducible test run. You specify a dataset version (for inputs), an eval suite (for assertions/judges), and optionally a prompt version, then run the experiment to get scored results.

Compare two experiments to see how changes in prompts, models, or agent logic affect eval metrics.

Quick Example

Create and run an experiment (TypeScript)
const inv = Invariance.init({ apiKey: process.env.INVARIANCE_API_KEY! });

const exp = await inv.experiments.create({
  name: 'Prompt A/B Test',
  dataset_id: 'ds-123',
  dataset_version: 1,
  suite_id: 'suite-456',
  prompt_version_id: 'pv-789',
});

// Run the experiment
const result = await inv.experiments.run(exp.id);
console.log(result.status); // e.g. 'completed'

Type Definitions

Experiment
interface Experiment {
  id: string;
  name: string;
  dataset_id: string;
  dataset_version: number;
  suite_id: string;
  prompt_version_id: string | null;
  status: 'pending' | 'running' | 'completed' | 'failed';
  run_id: string | null;
  config: Record<string, unknown>;
  created_at: string;
  completed_at: string | null;
}
A structured experiment that combines a dataset version, an eval suite, and an optional prompt version.
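
The status and timestamp fields lend themselves to small helpers. The following is a minimal sketch, not part of the SDK: `isTerminal` and `elapsedMs` are hypothetical names, and it assumes that 'completed' and 'failed' are final states and that `created_at` and `completed_at` are ISO 8601 strings, neither of which the type definition confirms.

```typescript
// Only the fields this sketch needs; they mirror the Experiment interface.
type ExperimentStatus = 'pending' | 'running' | 'completed' | 'failed';

interface ExperimentTiming {
  status: ExperimentStatus;
  created_at: string;
  completed_at: string | null;
}

// Assumption: 'completed' and 'failed' are the only states that never change again.
function isTerminal(exp: ExperimentTiming): boolean {
  return exp.status === 'completed' || exp.status === 'failed';
}

// Assumption: timestamps are ISO 8601 strings. Returns null while
// completed_at is unset, or if either timestamp fails to parse.
function elapsedMs(exp: ExperimentTiming): number | null {
  if (exp.completed_at === null) return null;
  const start = Date.parse(exp.created_at);
  const end = Date.parse(exp.completed_at);
  return Number.isNaN(start) || Number.isNaN(end) ? null : end - start;
}
```

Note that `elapsedMs` measures from creation rather than from the start of the run, because the interface exposes no separate start timestamp.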

API Reference

experiments.list
List experiments with optional filters.
async list(opts?: { suite_id?: string; dataset_id?: string; status?: string }): Promise<Experiment[]>
Returns: Promise<Experiment[]>
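
As a rough illustration of the `suite_id` filter, the sketch below tallies a suite's experiments by status. `countByStatus` and the `ExperimentLister` interface are hypothetical; the interface is a structural stand-in copied from the signature above so that the example runs without the SDK.

```typescript
type Status = 'pending' | 'running' | 'completed' | 'failed';

interface ExperimentSummary {
  id: string;
  name: string;
  status: Status;
}

// Structural stand-in for inv.experiments.list, copied from the documented signature.
interface ExperimentLister {
  list(opts?: { suite_id?: string; dataset_id?: string; status?: string }): Promise<ExperimentSummary[]>;
}

// Counts the experiments of one suite per status value.
async function countByStatus(api: ExperimentLister, suiteId: string): Promise<Record<Status, number>> {
  const counts: Record<Status, number> = { pending: 0, running: 0, completed: 0, failed: 0 };
  for (const exp of await api.list({ suite_id: suiteId })) {
    counts[exp.status] += 1;
  }
  return counts;
}
```

With the SDK, `inv.experiments` could be passed as `api`, assuming it matches this shape.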
experiments.create
Create a new experiment.
async create(body: CreateExperimentBody): Promise<Experiment>
Parameters
  • name (string): Experiment name
  • dataset_id (string): Dataset ID
  • dataset_version (number): Published dataset version number
  • suite_id (string): Eval suite ID
  • prompt_version_id (string): Optional prompt version to pin
Returns: Promise<Experiment>
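
`CreateExperimentBody` is not spelled out on this page, so the sketch below infers a plausible shape from the parameter list and shows one way to build a body with or without a pinned prompt version. The interface shape, the `buildExperimentBody` helper, and the positive-integer check on `dataset_version` are all assumptions.

```typescript
// Hypothetical shape, inferred from the documented parameters.
interface CreateExperimentBody {
  name: string;
  dataset_id: string;
  dataset_version: number;
  suite_id: string;
  prompt_version_id?: string;
}

// Builds a body, leaving prompt_version_id out entirely when no version is pinned.
function buildExperimentBody(
  name: string,
  dataset: { id: string; version: number },
  suiteId: string,
  promptVersionId?: string,
): CreateExperimentBody {
  // Assumption: published dataset versions are positive integers.
  if (!Number.isInteger(dataset.version) || dataset.version < 1) {
    throw new RangeError(`dataset_version must be a positive integer, got ${dataset.version}`);
  }
  const body: CreateExperimentBody = {
    name,
    dataset_id: dataset.id,
    dataset_version: dataset.version,
    suite_id: suiteId,
  };
  if (promptVersionId !== undefined) body.prompt_version_id = promptVersionId;
  return body;
}
```

The result would then be passed to `inv.experiments.create(...)`.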
experiments.run
Execute an experiment.
async run(id: string): Promise<Experiment>
Parameters
  • id (string): Experiment ID
Returns: Promise<Experiment>
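
Because `status` can also be 'failed', callers may want to check the result before using it. The sketch below assumes that `run` resolves once the experiment has finished, as the Quick Example suggests; the page does not state this explicitly. `runOrThrow` and `ExperimentRunner` are hypothetical names.

```typescript
interface RunResult {
  id: string;
  status: 'pending' | 'running' | 'completed' | 'failed';
}

// Structural stand-in for inv.experiments.run, copied from the documented signature.
interface ExperimentRunner {
  run(id: string): Promise<RunResult>;
}

// Runs an experiment and throws unless it comes back as 'completed'.
async function runOrThrow(api: ExperimentRunner, id: string): Promise<RunResult> {
  const result = await api.run(id);
  if (result.status !== 'completed') {
    throw new Error(`Experiment ${id} returned with status '${result.status}'`);
  }
  return result;
}
```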
experiments.compare
Compare two experiment results.
async compare(expA: string, expB: string): Promise<ExperimentCompareResult>
Parameters
  • expA (string): First experiment ID
  • expB (string): Second experiment ID
Returns: Promise<ExperimentCompareResult>
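
The shape of `ExperimentCompareResult` is not documented here, so the sketch below keeps the result type generic. It wraps `compare` so that the argument order is explicit at the call site. Passing the baseline as `expA` is an assumed convention, and `compareToBaseline` is a hypothetical helper.

```typescript
// Structural stand-in for inv.experiments.compare; R stands for ExperimentCompareResult.
interface ExperimentComparer<R> {
  compare(expA: string, expB: string): Promise<R>;
}

// Names the two sides so the order cannot be swapped by accident.
async function compareToBaseline<R>(
  api: ExperimentComparer<R>,
  ids: { baseline: string; candidate: string },
): Promise<R> {
  if (ids.baseline === ids.candidate) {
    throw new Error('baseline and candidate must be different experiments');
  }
  // Assumption: the baseline goes first (expA), the candidate second (expB).
  return api.compare(ids.baseline, ids.candidate);
}
```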
experiments.delete
Delete an experiment.
async delete(id: string): Promise<void>
Parameters
  • id (string): Experiment ID
Returns: Promise<void>
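
One possible use of `delete` is clearing out failed experiments. The sketch below combines it with `list`, assuming the `status` filter accepts the same values as `Experiment.status` (the signature only types it as `string`). `deleteFailed` and `ExperimentCleaner` are hypothetical names.

```typescript
interface ExperimentRef {
  id: string;
}

// Structural stand-in for the list and delete methods documented above.
interface ExperimentCleaner {
  list(opts?: { suite_id?: string; dataset_id?: string; status?: string }): Promise<ExperimentRef[]>;
  delete(id: string): Promise<void>;
}

// Deletes every failed experiment of a suite and returns the deleted IDs.
async function deleteFailed(api: ExperimentCleaner, suiteId: string): Promise<string[]> {
  // Assumption: 'failed' is a valid value for the status filter.
  const failed = await api.list({ suite_id: suiteId, status: 'failed' });
  const deleted: string[] = [];
  for (const exp of failed) {
    // Sequential on purpose: an error stops the loop before later deletions.
    await api.delete(exp.id);
    deleted.push(exp.id);
  }
  return deleted;
}
```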

Use Cases

  • A/B test different prompt versions against the same dataset
  • Compare agent performance across model changes
  • Track experiment history for systematic improvement
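
The first use case can be sketched end to end: create one experiment per prompt version against the same dataset and suite, run both, then compare. This is a sketch under stated assumptions, not the SDK's prescribed workflow. `abTestPrompts`, `AbApi`, and `AbBody` are hypothetical names, the body shape is inferred from the `experiments.create` parameters, and `run` is assumed to resolve once the experiment has finished.

```typescript
interface AbExperiment {
  id: string;
  status: string;
}

// Hypothetical body shape, inferred from the experiments.create parameters.
interface AbBody {
  name: string;
  dataset_id: string;
  dataset_version: number;
  suite_id: string;
  prompt_version_id?: string;
}

// Structural stand-in for inv.experiments; R stands for ExperimentCompareResult.
interface AbApi<R> {
  create(body: AbBody): Promise<AbExperiment>;
  run(id: string): Promise<AbExperiment>;
  compare(expA: string, expB: string): Promise<R>;
}

async function abTestPrompts<R>(
  api: AbApi<R>,
  base: Omit<AbBody, 'name' | 'prompt_version_id'>,
  promptA: string,
  promptB: string,
): Promise<R> {
  // Creates and runs one experiment pinned to the given prompt version.
  const runOne = async (promptVersionId: string): Promise<string> => {
    const exp = await api.create({ ...base, name: `Prompt ${promptVersionId}`, prompt_version_id: promptVersionId });
    const result = await api.run(exp.id);
    if (result.status !== 'completed') {
      throw new Error(`Experiment ${exp.id} returned with status '${result.status}'`);
    }
    return exp.id;
  };
  const idA = await runOne(promptA);
  const idB = await runOne(promptB);
  return api.compare(idA, idB);
}
```

Both experiments share the dataset version and suite, so the prompt version is the only variable that differs between them.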