Docs/Advanced/Datasets

Datasets

Manage versioned test datasets for eval runs. Create from scratch, import from traces, and publish immutable versions.

import { Invariance } from '@invariance/sdk';
Prerequisites: Initialization

Overview

Datasets provide structured test inputs for eval runs and experiments. Each dataset contains rows with input/expected pairs and optional tags.

Datasets can be created manually, imported from existing trace data, or promoted from eval run comparisons. They support versioning — you work on a draft and publish immutable versions for reproducible testing.

Architecture

Datasets are stored in the database with rows linked by dataset_id. Publishing creates a snapshot (version) that locks the current rows. Experiments reference a specific dataset version to ensure reproducibility.

Quick Example

Create dataset, add rows, publishtypescript
const inv = Invariance.init({ apiKey: process.env.INVARIANCE_API_KEY! });

// Create a dataset
const ds = await inv.datasets.create({
  name: 'Customer Support Cases',
  agent_id: 'support-agent',
});

// Add test rows
await inv.datasets.addRows(ds.id, [
  { input: { query: 'How do I reset my password?' }, expected: { intent: 'password_reset' }, tags: ['auth'] },
  { input: { query: 'What are your business hours?' }, expected: { intent: 'hours' }, tags: ['general'] },
]);

// Publish immutable version
const version = await inv.datasets.publish(ds.id, { notes: 'Initial test set' });

Type Definitions

Dataset
interface Dataset {
  id: string;
  name: string;
  description: string | null;
  agent_id: string | null;
  row_count: number;
  current_draft_version: number | null;
  latest_published_version: number | null;
  created_at: string;
}
A test dataset with rows and versions.
DatasetRow
interface DatasetRow {
  id: string;
  dataset_id: string;
  input: Record<string, unknown>;
  expected: Record<string, unknown>;
  tags: string[];
  source_trace_node_id: string | null;
  created_at: string;
}
A single test case row in a dataset.
DatasetVersion
interface DatasetVersion {
  id: string;
  dataset_id: string;
  version: number;
  notes: string | null;
  row_count: number;
  created_at: string;
}
An immutable published version of a dataset.

API Reference

datasets.list
List all datasets.
async list(opts?: { agent_id?: string }): Promise<Dataset[]>
Parameters
agent_idstringFilter by agent
ReturnsPromise<Dataset[]>
datasets.create
Create a new dataset.
async create(body: CreateDatasetBody): Promise<Dataset>
Parameters
namestringDataset name
descriptionstringDescription
agent_idstringScope to agent
ReturnsPromise<Dataset>
datasets.addRows
Add one or more rows to a dataset.
async addRows(id: string, body: CreateDatasetRowBody | CreateDatasetRowBody[]): Promise<DatasetRow[]>
Parameters
idstringDataset ID
bodyCreateDatasetRowBody | CreateDatasetRowBody[]Row(s) with input, expected, tags
ReturnsPromise<DatasetRow[]>
datasets.listRows
List rows in a dataset with pagination.
async listRows(id: string, opts?: { limit?: number; offset?: number; tags?: string }): Promise<DatasetRow[]>
Parameters
idstringDataset ID
limitnumberMax rows to return
offsetnumberSkip rows
ReturnsPromise<DatasetRow[]>
datasets.publish
Publish the current draft as an immutable version.
async publish(id: string, body?: { notes?: string }): Promise<DatasetVersion>
Parameters
idstringDataset ID
notesstringVersion notes
ReturnsPromise<DatasetVersion>
datasets.createFromTraces
Create a new dataset by importing rows from existing trace sessions.
async createFromTraces(body: DatasetFromTracesBody): Promise<Dataset>
Parameters
session_idsstring[]Sessions to import from
agent_idstringAgent ID
namestringNew dataset name
ReturnsPromise<Dataset>
datasets.importTraces
Import trace data as rows into an existing dataset.
async importTraces(id: string, body: ImportDatasetRowsFromTracesBody): Promise<DatasetRow[]>
Parameters
idstringTarget dataset ID
session_idsstring[]Sessions to import from
agent_idstringAgent ID
ReturnsPromise<DatasetRow[]>

Use Cases

  • Create hand-curated test datasets for regression testing
  • Import real production trace data to build realistic test sets
  • Publish immutable dataset versions for reproducible eval runs
  • Tag rows for selective testing (e.g., only "auth" or "billing" cases)
On this page
OverviewArchitectureQuick ExampleType DefinitionsAPI ReferenceUse CasesRelated Modules