Configuration¶

Coleman uses YAML configuration files with typed Pydantic v2 models for validation. Configs are loaded via load_spec() (library) or coleman --config (CLI).

For coleman sweep, the same YAML file may also include a top-level sweep section. This section is used only by sweep commands.

Configuration file format¶

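A configuration file is a YAML document whose top-level keys map directly to the sections of RunSpec. The example below also includes the optional sweep section, which is not part of RunSpec and is read only by coleman sweep: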

# my-experiment.yaml

execution:
  parallel_pool_size: 4
  independent_executions: 30
  seed: 42                     # optional — deterministic RNG seed
  verbose: false
  force_sequential_under_scalene: true

experiment:
  budget:
    mode: ratio
    values: [0.1, 0.5, 0.8]
  datasets_dir: examples
  datasets:
    - alibaba@druid
  experiment_dir: results/experiments/
  rewards:
    - RNFail
    - TimeRank
  policies:
    - UCB
    - FRRMAB
    - LinUCB

# Note: the list above is an example, not an exhaustive list of available policies.
# See the policy API reference for the complete set.

algorithm:
  ucb:
    rnfail:
      c: 0.5
  frrmab:
    window_sizes: [50, 100, 200]
  linucb:
    rnfail:
      alpha: 0.5

hcs_configuration:
  wts_strategy: false

contextual_information:
  config:
    previous_build: [Duration, NumRan, NumErrors]
  feature_group:
    feature_group_name: time_execution
    feature_group_values: [Duration, NumErrors]

results:
  enabled: true
  sink: parquet               # "parquet" (default), "duckdb" or "clickhouse"
  out_dir: ./runs
  batch_size: 1000
  top_k_prioritization: 0    # 0 = hash only
  manifest_enabled: false    # true = write manifest.json under each run_id
  duckdb:
    file_count: 1
    base_name: results
    shard_key: execution_id

checkpoint:
  enabled: true
  interval: 50000
  base_dir: checkpoints

telemetry:
  enabled: false
  otlp_endpoint: http://localhost:4318
  service_name: coleman
  export_interval_millis: 5000

hooks:
  fail_fast: false
  plugins:
    - my_project.hooks.ForecastHook

extensions:
  my_domain:
    forecast_selection:
      policy: ThompsonSampling
      reward: Binary

sweep:
  axes:
    - mode: grid
      params:
        algorithm.ucb.rnfail.c: [0.1, 0.3, 0.5]
        execution.parallel_pool_size: [1, 4]
  seeds: [0, 1, 2]

RunSpec schema reference¶

Every configuration file is validated against RunSpec, which composes the following typed sub-specs:

| Section | Model | Key fields |
| --- | --- | --- |
| `execution` | `ExecutionSpec` | `parallel_pool_size`, `independent_executions`, `seed`, `verbose`, `force_sequential_under_scalene` |
| `experiment` | `ExperimentSpec` | `budget`, `datasets_dir`, `datasets`, `experiment_dir`, `rewards`, `policies` |
| `algorithm` | `AlgorithmSpec` | Free-form nested dict; any algorithm can store its own parameters |
| `hcs_configuration` | `HCSConfigurationSpec` | `wts_strategy` |
| `contextual_information` | `ContextualInformationSpec` | `config` (previous build columns), `feature_group` |
| `results` | `ResultsSpec` | `enabled`, `sink`, `out_dir`, `batch_size`, `top_k_prioritization`, `manifest_enabled`, `duckdb`, `clickhouse` |
| `checkpoint` | `CheckpointSpec` | `enabled`, `interval`, `base_dir` |
| `telemetry` | `TelemetrySpec` | `enabled`, `otlp_endpoint`, `service_name`, `export_interval_millis` |
| `hooks` | `HooksSpec` | `fail_fast`, `plugins` |
| `extensions` | `dict[str, Any]` | Namespaced custom config passthrough |

sweep is not part of RunSpec; it is a CLI-level extension consumed by coleman sweep.

Parallelism model¶

Coleman supports two independent parallelism layers:

Intra-run process pool via execution.parallel_pool_size. This controls how many worker processes execute one run's independent_executions.
Inter-spec concurrency via coleman sweep --workers (or API run_many(..., max_workers=...)). This controls how many different resolved specs run at the same time.

When Scalene profiling is active, force_sequential_under_scalene: true forces intra-run pool size to 1 for profiling stability.
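The interaction between the two layers and the Scalene override can be sketched as follows. This is an illustration only: the function names are hypothetical, how Coleman detects Scalene is not shown here (it is passed in as a flag), and the multiplication assumes each concurrently running spec opens its own pool of `parallel_pool_size` workers.

```python
def effective_pool_size(parallel_pool_size, scalene_active,
                        force_sequential_under_scalene=True):
    """Intra-run pool size after applying the Scalene override described above."""
    if scalene_active and force_sequential_under_scalene:
        return 1
    return parallel_pool_size


def max_worker_processes(sweep_workers, parallel_pool_size):
    """Assumed upper bound on concurrent execution processes across both layers."""
    return sweep_workers * parallel_pool_size


print(effective_pool_size(4, scalene_active=True))   # 1
print(effective_pool_size(4, scalene_active=False))  # 4
print(max_worker_processes(sweep_workers=4, parallel_pool_size=4))  # 16
```

If that assumption holds, it is worth checking the product against the number of available cores before combining a large --workers value with a large parallel_pool_size.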

Policy and reward name resolution¶

experiment.policies and experiment.rewards are resolved case-insensitively.

Supported wildcard aliases:

*
all

Unknown names are skipped rather than treated as errors: Coleman emits a warning for each one and reports them all before execution starts.
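The resolution rules above can be sketched in plain Python. The registry contents and the function name are hypothetical stand-ins, not Coleman's internals, and treating the `all` alias case-insensitively is an assumption.

```python
import warnings

# Hypothetical registry; the real set of names comes from Coleman's policy API.
KNOWN_POLICIES = ["UCB", "FRRMAB", "LinUCB"]
WILDCARDS = {"*", "all"}


def resolve_names(requested, known):
    """Resolve names case-insensitively; return (resolved, unknown).

    A wildcard expands to every known name. Unknown names are skipped with a
    warning and collected so they can be reported before execution starts.
    """
    by_lower = {name.lower(): name for name in known}
    resolved, unknown = [], []
    for raw in requested:
        key = raw.strip().lower()
        if key in WILDCARDS:
            candidates = list(known)
        elif key in by_lower:
            candidates = [by_lower[key]]
        else:
            warnings.warn(f"unknown name ignored: {raw!r}")
            unknown.append(raw)
            continue
        for name in candidates:
            if name not in resolved:  # keep the first occurrence only
                resolved.append(name)
    return resolved, unknown


print(resolve_names(["ucb", "LINUCB"], KNOWN_POLICIES))  # (['UCB', 'LinUCB'], [])
print(resolve_names(["*"], KNOWN_POLICIES))  # (['UCB', 'FRRMAB', 'LinUCB'], [])
```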

Runner hooks and extensions¶

Coleman supports custom domain workflows via two top-level sections:

extensions: namespaced custom config passed to hook contexts.
hooks: lifecycle plugins loaded from dotted paths.

Hook registration¶

hooks:
  fail_fast: false
  plugins:
    - my_project.hooks.ForecastHook
    - my_project.hooks.audit_hook

Supported plugin symbols:

Class (instantiated with no arguments).
Function with signature (event_name, context, payload=None).

Lifecycle events¶

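Hook methods are optional; implement only the ones you need. Coleman dispatches the events below in the order listed, repeating the dataset and execution events for each dataset and each execution. on_error is dispatched only when an exception is raised: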

on_run_start(context)
on_dataset_start(context)
on_execution_start(context)
on_execution_end(context, execution_result)
on_dataset_end(context, dataset_result)
on_run_end(context, run_result)
on_error(context, error)

on_error is also dispatched when execution startup or environment construction fails, using the most specific available context.

Execution contract (sequential vs parallel)¶

on_run_* and on_dataset_* execute in the coordinator process.
on_execution_* execute in the worker process context.
In sequential mode, worker and coordinator are the same process, but the event contract is unchanged.
Each worker process loads hook plugins from their dotted paths and keeps its own instances, so hook objects are never pickled across process boundaries.

Error handling (`fail_fast`)¶

fail_fast: true (default): hook exceptions stop execution.
fail_fast: false: hook exceptions are logged with run/dataset/execution identifiers and execution continues.
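The two fail_fast behaviours can be sketched with a small dispatcher. This is a minimal illustration, not Coleman's actual dispatch code: the `dispatch` function and logger name are hypothetical.

```python
import logging

logger = logging.getLogger("hooks_sketch")


def dispatch(hooks, event_name, context, *payload, fail_fast=True):
    """Call `event_name` on every hook that defines it.

    With fail_fast=True the first hook exception propagates. Otherwise the
    exception is logged with the run/dataset/execution identifiers and the
    remaining hooks still run.
    """
    for hook in hooks:
        handler = getattr(hook, event_name, None)
        if handler is None:  # hook methods are optional
            continue
        try:
            handler(context, *payload)
        except Exception:
            if fail_fast:
                raise
            logger.exception(
                "hook %s failed in %s (run=%s dataset=%s execution=%s)",
                type(hook).__name__,
                event_name,
                getattr(context, "run_id", None),
                getattr(context, "dataset_id", None),
                getattr(context, "execution_id", None),
            )
```

With fail_fast: false, a failing hook can go unnoticed unless the logs are monitored, so the strict setting is a reasonable choice while developing a new plugin.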

Hook context and payloads¶

HookContext includes stable identifiers and execution metadata:

run_id
dataset_id
execution_id
worker_id
parallel_mode
iteration, trials, budget_mode, budget_value
extensions

Event payloads:

ExecutionResult(status, duration_seconds)
DatasetResult(status, executions, duration_seconds)
RunResult(status, datasets, duration_seconds)
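A plugin module covering both supported symbol kinds might look like the sketch below. It assumes that HookContext and the payload objects expose their fields as attributes; the SimpleNamespace objects and the "ok" status value are stand-ins used only to exercise the hooks outside Coleman.

```python
from types import SimpleNamespace

AUDIT_LOG = []


class ForecastHook:
    """Class-style plugin: instantiated with no arguments; every method is optional."""

    def __init__(self):
        self.policy = None
        self.seen = []

    def on_run_start(self, context):
        # Read this plugin's namespaced settings from the `extensions` passthrough.
        settings = context.extensions.get("my_domain", {})
        self.policy = settings.get("forecast_selection", {}).get("policy")
        self.seen.append(("run_start", context.run_id))

    def on_execution_end(self, context, execution_result):
        self.seen.append(("execution_end", context.execution_id, execution_result.status))

    def on_error(self, context, error):
        self.seen.append(("error", type(error).__name__))


def audit_hook(event_name, context, payload=None):
    """Function-style plugin: a single callable receives every event by name."""
    AUDIT_LOG.append((event_name, context.run_id, payload))


# Stand-ins for the objects Coleman would pass in (assumed attribute access).
context = SimpleNamespace(
    run_id="ddd8bbefa143",
    execution_id=0,
    extensions={"my_domain": {"forecast_selection": {"policy": "ThompsonSampling"}}},
)
result = SimpleNamespace(status="ok", duration_seconds=1.5)

hook = ForecastHook()
hook.on_run_start(context)
hook.on_execution_end(context, result)
audit_hook("on_run_start", context)
print(hook.policy)  # ThompsonSampling
```

Because on_execution_* events run in worker processes, state collected on `self` in those methods is local to each worker in parallel mode.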

Lifecycle diagram¶

flowchart TD
    A[on_run_start] --> B[on_dataset_start]
    B --> C[on_execution_start]
    C --> D[runner execution]
    D --> E[on_execution_end]
    E --> F[on_dataset_end]
    F --> G{more datasets?}
    G -- yes --> B
    G -- no --> H[on_run_end]
    D -. exception .-> I[on_error]

Minimal end-to-end config¶

packs:
  - execution/default
  - experiment/alibaba_druid
  - algorithm/defaults
  - reward/rnfail
  - results/parquet

execution:
  independent_executions: 10
  parallel_pool_size: 4

hooks:
  fail_fast: false
  plugins:
    - my_project.hooks.ForecastHook

extensions:
  my_domain:
    forecast_selection:
      policy: ThompsonSampling
      reward: Binary

For a complete extension guide including custom Policy/Reward and source-level Environment/EvaluationMetric customization paths, see Extensibility & Parallelism.

All fields have sensible defaults. A minimal config only needs to specify the settings you want to override:

# minimal.yaml — everything else uses defaults
experiment:
  datasets: ["alibaba@druid"]
  policies: ["UCB"]
  rewards: ["RNFail"]

Config packs¶

Config packs are small, composable YAML fragments stored under packs/<category>/<name>.yaml. Reference them with the packs key:

# experiment-with-packs.yaml
packs:
  - policy/linucb
  - reward/rnfail
  - results/parquet
  - telemetry/off

experiment:
  datasets: ["alibaba@druid"]

execution:
  independent_executions: 30

Resolution order¶

Packs are loaded left-to-right and deep-merged (nested dicts are merged recursively; scalar values from later packs win).
Inline keys from the user config are applied on top as final overrides.
The merged dict is validated against RunSpec.
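The merge order can be illustrated with plain dicts. This is a sketch of the deep-merge idea only, with hypothetical function names; replacing lists wholesale (rather than concatenating them) is an assumption, and validation against RunSpec is omitted.

```python
import copy


def deep_merge(base, override):
    """Return a new dict: nested dicts merge recursively, other values from `override` win."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars win outright; lists are replaced wholesale (assumption).
            merged[key] = copy.deepcopy(value)
    return merged


def resolve(packs, inline):
    """Merge packs left-to-right, then apply the user's inline keys last."""
    result = {}
    for pack in packs:
        result = deep_merge(result, pack)
    return deep_merge(result, inline)


pack_a = {"results": {"sink": "parquet", "batch_size": 1000}}
pack_b = {"results": {"sink": "duckdb"}, "telemetry": {"enabled": False}}
inline = {"results": {"batch_size": 500}}

print(resolve([pack_a, pack_b], inline))
# {'results': {'sink': 'duckdb', 'batch_size': 500}, 'telemetry': {'enabled': False}}
```

Here the later pack overrides `sink`, while the inline `batch_size` overrides both packs without discarding the rest of the `results` section.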

Shipped packs¶

| Pack | Category | Description |
| --- | --- | --- |
| `policy/linucb` | Policy | LinUCB with default alpha values |
| `reward/rnfail` | Reward | RNFail reward function |
| `runtime/local` | Runtime | Single-process local execution |
| `results/parquet` | Results | Parquet sink with defaults |
| `results/duckdb` | Results | DuckDB sink with consolidated files |
| `telemetry/off` | Telemetry | Telemetry disabled |

DuckDB sink configuration¶

Use results.sink: duckdb to write results directly into one or more DuckDB files.

results:
  enabled: true
  sink: duckdb
  out_dir: ./runs
  batch_size: 1000
  duckdb:
    file_count: 1         # default: single consolidated .duckdb file
    base_name: results    # output: results.duckdb (or results_000.duckdb, ...)
    shard_key: execution_id

Set file_count to a value greater than 1 to partition writes across multiple DuckDB files (results_000.duckdb, ...) instead of a single results.duckdb.

Creating custom packs¶

Add a YAML file under packs/<category>/<name>.yaml with any subset of RunSpec fields. For example, a custom policy pack:

# packs/policy/my-ucb.yaml
experiment:
  policies:
    - UCB

algorithm:
  ucb:
    rnfail:
      c: 0.3
    timerank:
      c: 0.3

Sweep engine¶

The sweep engine generates multiple RunSpec instances from a single base config by expanding parameter ranges. Two modes are supported:

Grid mode (Cartesian product)¶

Every combination of parameter values is generated:

from coleman.spec import RunSpec, SweepSpec, SweepAxis, expand_sweep

base = RunSpec(experiment={"datasets": ["alibaba@druid"], "policies": ["UCB"]})
sweep_spec = SweepSpec(
    axes=[SweepAxis(mode="grid", params={
        "algorithm.ucb.rnfail.c": [0.1, 0.3, 0.5],
        "execution.parallel_pool_size": [1, 4],
    })],
)
specs = expand_sweep(base, sweep_spec)
# 3 × 2 = 6 specs

Zip mode (paired lists)¶

Parameter lists are paired element-wise. All lists must have equal length — a ValueError is raised otherwise:

sweep_spec = SweepSpec(
    axes=[SweepAxis(mode="zip", params={
        "algorithm.ucb.rnfail.c": [0.1, 0.3, 0.5],
        "algorithm.ucb.timerank.c": [0.2, 0.4, 0.6],
    })],
)
specs = expand_sweep(base, sweep_spec)
# 3 paired specs
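To make the difference between the two modes concrete, the sketch below expands one axis over plain dicts instead of RunSpec objects. It is not the implementation of expand_sweep; `set_dotted` and `expand_axis` are hypothetical helpers.

```python
import copy
from itertools import product


def set_dotted(config, dotted_key, value):
    """Set config["a"]["b"]["c"] = value for dotted_key "a.b.c", creating levels as needed."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = value


def expand_axis(base, mode, params):
    """Expand one axis into a list of config dicts."""
    keys = list(params)
    if mode == "grid":
        combos = product(*(params[k] for k in keys))  # Cartesian product
    elif mode == "zip":
        lengths = {len(params[k]) for k in keys}
        if len(lengths) > 1:
            raise ValueError(f"zip axis needs equal-length lists, got lengths {sorted(lengths)}")
        combos = zip(*(params[k] for k in keys))  # element-wise pairing
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    specs = []
    for combo in combos:
        spec = copy.deepcopy(base)
        for key, value in zip(keys, combo):
            set_dotted(spec, key, value)
        specs.append(spec)
    return specs


base_cfg = {"experiment": {"policies": ["UCB"]}}
grid = expand_axis(base_cfg, "grid", {
    "algorithm.ucb.rnfail.c": [0.1, 0.3, 0.5],
    "execution.parallel_pool_size": [1, 4],
})
paired = expand_axis(base_cfg, "zip", {
    "algorithm.ucb.rnfail.c": [0.1, 0.3, 0.5],
    "algorithm.ucb.timerank.c": [0.2, 0.4, 0.6],
})
print(len(grid), len(paired))  # 6 3
```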

Seed replication¶

Add seeds to replicate each generated spec once per seed. Each replica's seed is written to execution.seed, so every replica gets its own run_id:

sweep_spec = SweepSpec(
    axes=[SweepAxis(mode="grid", params={"algorithm.ucb.rnfail.c": [0.1, 0.5]})],
    seeds=[0, 1, 2],
)
specs = expand_sweep(base, sweep_spec)
# 2 values × 3 seeds = 6 specs, each with a unique run_id

CLI sweep¶

# Uses top-level sweep section from base.yaml (if present)
coleman sweep --config base.yaml --workers 4

# Add extra grid dimensions from the CLI
# (quote values that contain parentheses so the shell does not interpret them)
coleman sweep --config base.yaml \
    --grid algorithm.ucb.rnfail.c=0.1,0.3,0.5 \
    --grid 'execution.seed=range(0,20)' \
    --workers 4

# Dry-run to preview generated specs
coleman sweep --config base.yaml \
    --grid 'execution.seed=range(0,5)' \
    --dry-run

If both YAML sweep.axes and CLI --grid are provided, Coleman merges them and computes the Cartesian product across all axes.

If both sources define the same dotted key, the sweep is rejected with a ValueError instead of silently creating duplicate combinations.
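The merge-and-reject rule can be sketched as follows. The function name and the dict layout of an axis are assumptions made for illustration; only the behaviour (combine the axes, raise ValueError on a shared dotted key) comes from the description above.

```python
def merge_axes(yaml_axes, cli_grid):
    """Combine YAML sweep axes with CLI --grid params, rejecting duplicate dotted keys.

    `yaml_axes` is a list of {"mode": ..., "params": {...}} dicts. `cli_grid`
    maps dotted keys to value lists and becomes one extra grid axis.
    """
    yaml_keys = set()
    for axis in yaml_axes:
        yaml_keys.update(axis["params"])
    clashes = sorted(yaml_keys & set(cli_grid))
    if clashes:
        raise ValueError(f"keys defined in both YAML sweep.axes and --grid: {clashes}")
    axes = list(yaml_axes)
    if cli_grid:
        axes.append({"mode": "grid", "params": dict(cli_grid)})
    return axes


yaml_axes = [{"mode": "grid", "params": {"algorithm.ucb.rnfail.c": [0.1, 0.3]}}]
merged = merge_axes(yaml_axes, {"execution.parallel_pool_size": [1, 4]})
print(len(merged))  # 2 axes, so 2 x 2 = 4 combinations in total
```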

Deterministic run_id¶

Every RunSpec produces a deterministic 12-character identifier:

run_id = sha256(canonical_json(resolved_spec))[:12]

The canonical JSON uses sorted keys and compact separators so that the same logical config always yields the same run_id, regardless of field insertion order.
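The idea can be reproduced with the standard library. This is a sketch under stated assumptions, not the code behind compute_run_id: it assumes the identifier is the first 12 hexadecimal characters of the digest and that the JSON is UTF-8 encoded, so the ids it prints will not necessarily match Coleman's.

```python
import hashlib
import json


def canonical_json(spec):
    """Serialize with sorted keys and compact separators so key order does not matter."""
    return json.dumps(spec, sort_keys=True, separators=(",", ":"))


def run_id_sketch(spec):
    """First 12 hex characters of the SHA-256 digest of the canonical JSON."""
    digest = hashlib.sha256(canonical_json(spec).encode("utf-8")).hexdigest()
    return digest[:12]


a = {"execution": {"seed": 42, "verbose": False}, "experiment": {"policies": ["UCB"]}}
b = {"experiment": {"policies": ["UCB"]}, "execution": {"verbose": False, "seed": 42}}
c = {"execution": {"seed": 43, "verbose": False}, "experiment": {"policies": ["UCB"]}}

print(run_id_sketch(a) == run_id_sketch(b))  # True: same logical config, different key order
print(run_id_sketch(a) == run_id_sketch(c))  # False: the seed differs
```

Any change to a resolved field, including execution.seed, yields a different identifier and therefore a different output directory.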

Artifacts are written to <out_dir>/<run_id>/:

./runs/
  ddd8bbefa143/
    spec.resolved.json     # fully resolved RunSpec
    provenance.json        # git commit, Python version, uv.lock hash
    results/               # experiment results
    checkpoints/           # crash-recovery state

Provenance¶

Each run persists provenance metadata alongside results:

| File | Contents |
| --- | --- |
| `spec.resolved.json` | The fully resolved `RunSpec` as canonical JSON |
| `provenance.json` | Git commit hash, dirty flag, Python version, `uv.lock` hash |

This enables exact experiment reproduction: given the same spec.resolved.json, the same run_id and outputs are produced (within documented constraints such as floating-point non-determinism across platforms).
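For orientation, the kind of metadata listed above can be gathered with the standard library alone. The sketch below is hypothetical: the key names, the use of SHA-256 for the lock file, and the `collect_provenance` function are assumptions, not the actual layout of provenance.json.

```python
import hashlib
import platform
import subprocess
from pathlib import Path


def git_output(*args):
    """Run a git command; return stripped stdout, or None if git is unavailable or fails."""
    try:
        proc = subprocess.run(["git", *args], capture_output=True, text=True, check=False)
    except OSError:
        return None
    return proc.stdout.strip() if proc.returncode == 0 else None


def collect_provenance(lock_path="uv.lock"):
    """Collect commit hash, dirty flag, Python version, and lock-file hash."""
    status = git_output("status", "--porcelain")
    lock = Path(lock_path)
    return {
        "git_commit": git_output("rev-parse", "HEAD"),
        # Any porcelain output means uncommitted changes; None means "unknown".
        "git_dirty": bool(status) if status is not None else None,
        "python_version": platform.python_version(),
        "uv_lock_sha256": hashlib.sha256(lock.read_bytes()).hexdigest() if lock.is_file() else None,
    }


print(sorted(collect_provenance()))
# ['git_commit', 'git_dirty', 'python_version', 'uv_lock_sha256']
```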

Loading and validating configs¶

Library API¶

from coleman.api import load_spec, save_resolved
from coleman.spec.run_id import compute_run_id

# Load with pack resolution
spec = load_spec("my-experiment.yaml", packs_dir="packs")

# Compute run_id
rid = compute_run_id(spec)
print(f"run_id: {rid}")

# Persist resolved config
save_resolved(spec, f"./runs/{rid}/spec.resolved.json")

CLI¶

# Validate a config and print the run_id
coleman validate --config my-experiment.yaml

# Validate and write the resolved spec
coleman validate --config my-experiment.yaml --resolve resolved.json