Datasets
Manage versioned test datasets for eval runs. Create from scratch, import from traces, and publish immutable versions.
import { Invariance } from '@invariance/sdk';
Overview
Datasets provide structured test inputs for eval runs and experiments. Each dataset contains rows with input/expected pairs and optional tags.
Datasets can be created manually, imported from existing trace data, or promoted from eval run comparisons. They support versioning — you work on a draft and publish immutable versions for reproducible testing.
Architecture
Datasets are stored in the database with rows linked by dataset_id. Publishing creates a snapshot (version) that locks the current rows. Experiments reference a specific dataset version to ensure reproducibility.
Quick Example
const inv = Invariance.init({ apiKey: process.env.INVARIANCE_API_KEY! });
// Create a dataset
const ds = await inv.datasets.create({
name: 'Customer Support Cases',
agent_id: 'support-agent',
});
// Add test rows
await inv.datasets.addRows(ds.id, [
{ input: { query: 'How do I reset my password?' }, expected: { intent: 'password_reset' }, tags: ['auth'] },
{ input: { query: 'What are your business hours?' }, expected: { intent: 'hours' }, tags: ['general'] },
]);
// Publish immutable version
const version = await inv.datasets.publish(ds.id, { notes: 'Initial test set' });
Type Definitions
Dataset
interface Dataset {
id: string;
name: string;
description: string | null;
agent_id: string | null;
row_count: number;
current_draft_version: number | null;
latest_published_version: number | null;
created_at: string;
}
A test dataset with rows and versions.
DatasetRow
interface DatasetRow {
id: string;
dataset_id: string;
input: Record<string, unknown>;
expected: Record<string, unknown>;
tags: string[];
source_trace_node_id: string | null;
created_at: string;
}
A single test case row in a dataset.
DatasetVersion
interface DatasetVersion {
id: string;
dataset_id: string;
version: number;
notes: string | null;
row_count: number;
created_at: string;
}
An immutable published version of a dataset.
API Reference
datasets.list
List all datasets.
async list(opts?: { agent_id?: string }): Promise<Dataset[]>
Parameters
agent_idstringFilter by agent
ReturnsPromise<Dataset[]>
datasets.create
Create a new dataset.
async create(body: CreateDatasetBody): Promise<Dataset>
Parameters
namestringDataset name
descriptionstringDescription
agent_idstringScope to agent
ReturnsPromise<Dataset>
datasets.addRows
Add one or more rows to a dataset.
async addRows(id: string, body: CreateDatasetRowBody | CreateDatasetRowBody[]): Promise<DatasetRow[]>
Parameters
idstringDataset ID
bodyCreateDatasetRowBody | CreateDatasetRowBody[]Row(s) with input, expected, tags
ReturnsPromise<DatasetRow[]>
datasets.listRows
List rows in a dataset with pagination.
async listRows(id: string, opts?: { limit?: number; offset?: number; tags?: string }): Promise<DatasetRow[]>
Parameters
idstringDataset ID
limitnumberMax rows to return
offsetnumberSkip rows
ReturnsPromise<DatasetRow[]>
datasets.publish
Publish the current draft as an immutable version.
async publish(id: string, body?: { notes?: string }): Promise<DatasetVersion>
Parameters
idstringDataset ID
notesstringVersion notes
ReturnsPromise<DatasetVersion>
datasets.createFromTraces
Create a new dataset by importing rows from existing trace sessions.
async createFromTraces(body: DatasetFromTracesBody): Promise<Dataset>
Parameters
session_idsstring[]Sessions to import from
agent_idstringAgent ID
namestringNew dataset name
ReturnsPromise<Dataset>
datasets.importTraces
Import trace data as rows into an existing dataset.
async importTraces(id: string, body: ImportDatasetRowsFromTracesBody): Promise<DatasetRow[]>
Parameters
idstringTarget dataset ID
session_idsstring[]Sessions to import from
agent_idstringAgent ID
ReturnsPromise<DatasetRow[]>
Use Cases
- Create hand-curated test datasets for regression testing
- Import real production trace data to build realistic test sets
- Publish immutable dataset versions for reproducible eval runs
- Tag rows for selective testing (e.g., only "auth" or "billing" cases)