Skip to content

Benchmarks workflow

The Benchmarks GitHub Actions workflow runs a reduced-size benchmark to sanity-check the Rust acceleration and, on demand, capture a Python-only baseline.

Triggers

  • Scheduled: 0 6 * * 1 (Mondays at 06:00 UTC). Runs the Rust backend sanity check only; the Python baseline job is not executed on scheduled runs.
  • Manual: workflow_dispatch, with a toggle to include the Python backend baseline. The Python-only baseline job runs only on these manual executions when the toggle is enabled.

What it runs

  • Sets up Python 3.13 and installs dependencies with uv.
  • Builds the Rust backend via make rust-build (Rust toolchain installed with dtolnay/rust-toolchain).
  • Executes:
    GSPPY_BACKEND=<rust|python> uv run python benchmarks/bench_support.py \
      --n_tx 5000 --tx_len 6 --vocab 200 --min_support 0.2 --warmup
    
    (Python-only job skips the Rust build.)

Artifacts

Each job uploads: - benchmark_<backend>.log: CLI output from bench_support.py. - benchmark_<backend>.json: Parsed metrics. Fields: - timestamp: UTC ISO string. - backend: rust or python. - n_tx, tx_len, vocab, min_support: benchmark parameters. - python_time_s: observed Python runtime (always present when parsing succeeds). - rust_time_s: observed Rust runtime (Rust job only, when available). - speedup_x: Python time divided by Rust time (Rust job when both timings exist). - improvement_pct: Percent improvement vs Python (Rust job when both timings exist).

How to interpret results

  • Prefer the JSON for automation; use the log for troubleshooting when parsing fails.
  • A higher speedup_x or improvement_pct means the Rust backend is faster relative to Python.
  • If the Rust backend is unavailable or fails, the log will include the Python-only timing; JSON may omit rust_time_s and derived fields.

Adding regression thresholds later

If you add regression detection in the future, consider: - Comparing rust_time_s against a saved baseline or an allowed max (e.g., < 2.0s for this trimmed workload). - Monitoring speedup_x to ensure Rust remains faster than Python (e.g., speedup_x >= 2.0). - Allowing configurable thresholds via workflow inputs so you can tighten/relax gates without code changes. - Failing fast when parsed metrics are missing to avoid silent regressions.

Benchmarks

Benchmark scripts in the benchmarks/ directory measure support-counting performance across backends.

Running benchmarks

Use the provided Makefile targets to run common benchmarks:

make bench-small
make bench-big

You can also run the Python benchmark script directly with custom parameters:

uv run --python .venv/bin/python --no-project \
  python benchmarks/bench_support.py --n_tx 1000000 --tx_len 8 --vocab 50000 --min_support 0.2 --warmup

Adjust parameters to match your hardware. Large benchmarks can be resource-intensive.