CLI

hotcoco ships two CLI tools:

coco — Python CLI. Installed with pip install hotcoco. Covers evaluation and dataset management (filter, merge, split, sample, stats, convert) and is the primary tool for most workflows.
coco-eval — Rust CLI. Installed with cargo install hotcoco-cli. Evaluation only, no Python required.

`coco` — Python CLI

pip install hotcoco

JSON output mode

Every subcommand accepts a --json flag that writes a single JSON object to stdout instead of human-readable text. stderr (progress, warnings, errors) is untouched.

coco eval --gt ann.json --dt det.json --json
coco stats ann.json --json
coco healthcheck ann.json --json

This is designed for CI/CD pipelines, dashboards, and shell scripts that need to gate on metric values without parsing human output:

# Gate a CI step on AP ≥ 0.50
AP=$(coco eval --gt ann.json --dt det.json --json | jq '.metrics.AP')
python -c "import sys; sys.exit(0 if $AP >= 0.50 else 1)"

When --json is set and an error occurs, the exit code is still 1, and the error is reported as a JSON object:

{"error": "No such file or directory (os error 2)"}
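If jq isn't available, the same gate can be written in Python. This is a minimal sketch that relies only on the two output shapes described above: a top-level "metrics" object on success and a top-level "error" key on failure. The module name gate_ap.py and the gate function are illustrative, not part of hotcoco.

```python
import json
import sys


def gate(payload: str, threshold: float = 0.50) -> int:
    """Return a process exit code: 0 if AP >= threshold, 1 otherwise."""
    result = json.loads(payload)
    # On failure, `coco ... --json` emits {"error": "..."} instead of metrics.
    if "error" in result:
        print(f"coco failed: {result['error']}", file=sys.stderr)
        return 1
    ap = result["metrics"]["AP"]
    print(f"AP = {ap:.3f} (threshold {threshold:.2f})", file=sys.stderr)
    return 0 if ap >= threshold else 1
```

Saved as gate_ap.py, it can sit at the end of a pipeline: coco eval --gt ann.json --dt det.json --json | python -c "import sys, gate_ap; sys.exit(gate_ap.gate(sys.stdin.read(), 0.50))"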

`coco eval`

Evaluate detections against ground truth annotations. Prints the standard COCO metrics table.

coco eval --gt <gt.json> --dt <dt.json> [options]

Flag	Description	Default
`--gt <path>`	Ground truth annotations JSON	required
`--dt <path>`	Detection results JSON	required
`--iou-type`	`bbox`, `segm`, or `keypoints`	`bbox`
`--lvis`	LVIS-style evaluation (max 300 dets, frequency-group AP)	off
`--img-ids 1,2,3`	Evaluate only these image IDs	all
`--cat-ids 1,2,3`	Evaluate only these category IDs	all
`--no-cats`	Pool all categories (class-agnostic evaluation)	off
`--tide`	Print TIDE error decomposition after standard metrics	off
`--tide-pos-thr`	IoU threshold for TP/FP classification in TIDE	`0.5`
`--tide-bg-thr`	Minimum IoU with any GT for Loc/Both/Bkg distinction	`0.1`
`--diagnostics`	Per-image diagnostics: F1 distribution and label error candidates	off
`--diag-iou-thr`	IoU threshold for diagnostics TP/FP classification	`0.5`
`--diag-score-thr`	Min detection score for label error candidates	`0.5`
`--report <path>`	Save a PDF evaluation report to this path (requires `hotcoco[plot]`)	off
`--title`	Report title shown in the header	`COCO Evaluation Report`
`--slices <path>`	JSON file with named image ID groups for sliced evaluation	off
`--healthcheck`	Run dataset healthcheck before evaluation (warnings to stderr)	off
`--calibration`	Compute confidence calibration (ECE/MCE) after standard metrics	off
`--cal-bins`	Number of calibration bins	`10`
`--cal-iou-thr`	IoU threshold for calibration TP/FP classification	`0.5`
`--json`	Write results as JSON to stdout instead of human-readable text	off
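The exact schema of the --slices file isn't spelled out here. The sketch below assumes it is a JSON object mapping each slice name to a list of image IDs, which is how "named image ID groups" reads; check the hotcoco docs before relying on it. It builds orientation slices from the standard COCO width/height fields, and the function names are hypothetical.

```python
import json


def build_orientation_slices(gt: dict) -> dict[str, list[int]]:
    """Group image IDs by orientation using the standard COCO width/height fields."""
    slices: dict[str, list[int]] = {"landscape": [], "portrait": [], "square": []}
    for img in gt["images"]:
        w, h = img["width"], img["height"]
        key = "landscape" if w > h else "portrait" if h > w else "square"
        slices[key].append(img["id"])
    # Drop empty groups so every slice has at least one image.
    return {name: ids for name, ids in slices.items() if ids}


def write_slices(gt_path: str, out_path: str) -> None:
    # Assumed slices format: {"slice_name": [image_id, ...], ...}
    with open(gt_path) as f:
        gt = json.load(f)
    with open(out_path, "w") as f:
        json.dump(build_orientation_slices(gt), f, indent=2)
```

For example, write_slices("instances_val2017.json", "slices.json") followed by coco eval --gt instances_val2017.json --dt bbox_results.json --slices slices.json.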

# Bounding box evaluation
coco eval --gt instances_val2017.json --dt bbox_results.json

# Segmentation
coco eval --gt instances_val2017.json --dt segm_results.json --iou-type segm

# Keypoints
coco eval --gt person_keypoints_val2017.json --dt kpt_results.json --iou-type keypoints

# LVIS-style evaluation
coco eval --gt lvis_val.json --dt lvis_results.json --lvis

# With TIDE error decomposition
coco eval --gt instances_val2017.json --dt bbox_results.json --tide

# TIDE at a stricter localization threshold
coco eval --gt instances_val2017.json --dt bbox_results.json --tide --tide-pos-thr 0.75

# Save a PDF evaluation report
coco eval --gt instances_val2017.json --dt bbox_results.json --report report.pdf

# PDF report with custom title and LVIS-style evaluation
coco eval --gt lvis_val.json --dt lvis_results.json --lvis --report lvis_report.pdf --title "LVIS Evaluation"

# Sliced evaluation (compare metrics across image subsets)
coco eval --gt instances_val2017.json --dt bbox_results.json --slices slices.json

# Pre-flight healthcheck before evaluation
coco eval --gt instances_val2017.json --dt bbox_results.json --healthcheck

# JSON output for CI/CD pipelines
coco eval --gt instances_val2017.json --dt bbox_results.json --json

# JSON with TIDE and slices combined
coco eval --gt instances_val2017.json --dt bbox_results.json --tide --slices slices.json --json

JSON output shape:

{
  "hotcoco_version": "0.3.0",
  "params": { "iou_type": "Bbox", "iou_thresholds": [...], "area_ranges": {...}, ... },
  "metrics": { "AP": 0.578, "AP50": 0.861, "AP75": 0.600, "APs": 0.327, ... },
  "tide": { "delta_ap": {...}, "counts": {...}, "ap_base": 0.578, ... },
  "slices": { "daytime": { "AP": 0.61, ... }, "_overall": { ... } },
  "healthcheck": { "errors": [], "warnings": [] }
}

The tide, slices, and healthcheck keys are present only when the corresponding flag (--tide, --slices, --healthcheck) is passed.
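Per-slice results can then be compared against the `_overall` entry in a script. A minimal sketch, assuming each slice entry (including `_overall`) carries the same metric keys as the top-level `metrics` object; the example shape above only shows `AP` for a named slice, and the function name is hypothetical.

```python
import json


def slice_ap_deltas(report: dict) -> dict[str, float]:
    """AP of each slice minus the overall AP, from `coco eval --slices ... --json` output."""
    slices = report["slices"]
    # Assumption: _overall uses the same metric keys as the named slices.
    overall = slices["_overall"]["AP"]
    return {
        name: round(metrics["AP"] - overall, 4)
        for name, metrics in slices.items()
        if name != "_overall"
    }


def load_report(path: str) -> dict:
    # e.g. coco eval ... --slices slices.json --json > report.json
    with open(path) as f:
        return json.load(f)
```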

`coco healthcheck`

Validate a dataset for structural errors, quality warnings, and distribution issues.

coco healthcheck <annotation_file> [--dt <detections.json>] [--json]

Flag	Description
`--dt <path>`	Detection results JSON — enables GT/DT compatibility checks
`--json`	Write results as JSON to stdout

# Dataset only
coco healthcheck instances_val2017.json

# With detections (also checks GT/DT compatibility)
coco healthcheck instances_val2017.json --dt bbox_results.json

# JSON output (full errors/warnings list + summary)
coco healthcheck instances_val2017.json --json
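The specific GT/DT compatibility checks aren't listed here. As an illustration of the idea only, this sketch flags detections whose image_id or category_id doesn't exist in the ground truth; it is a simplified stand-in with a hypothetical function name, not hotcoco's implementation. Detections are assumed to be in the standard COCO results format (a list of dicts with image_id, category_id, bbox, score).

```python
def gt_dt_mismatches(gt: dict, dets: list[dict]) -> dict[str, set[int]]:
    """Return detection image/category IDs that the ground truth doesn't define."""
    gt_img_ids = {img["id"] for img in gt["images"]}
    gt_cat_ids = {cat["id"] for cat in gt["categories"]}
    return {
        # Detections on images absent from the GT can never be matched.
        "unknown_image_ids": {d["image_id"] for d in dets} - gt_img_ids,
        # Detections with unknown categories are silently unscorable.
        "unknown_category_ids": {d["category_id"] for d in dets} - gt_cat_ids,
    }
```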

`coco stats`

Print a summary of a dataset: image and annotation counts, per-category breakdown, image dimensions, and annotation area distribution.

coco stats instances_val2017.json
coco stats instances_val2017.json --all-cats  # show all categories, not just top 20
coco stats instances_val2017.json --json       # machine-readable output
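To cross-check the per-category breakdown, the two basic counts (annotations per category and images containing each category) can be recomputed from a COCO-format dict. This minimal sketch covers only those counts, not dimensions or area statistics; the function name and row format are illustrative, and categories with no annotations are omitted.

```python
from collections import Counter
from typing import Optional


def per_category_counts(gt: dict, top: Optional[int] = 20) -> list[tuple[str, int, int]]:
    """Return (category name, annotation count, image count), most annotations first."""
    names = {cat["id"]: cat["name"] for cat in gt["categories"]}
    ann_counts = Counter(ann["category_id"] for ann in gt["annotations"])
    # Count each (category, image) pair once, however many instances the image holds.
    pairs = {(ann["category_id"], ann["image_id"]) for ann in gt["annotations"]}
    img_counts = Counter(cat_id for cat_id, _ in pairs)
    # top=None returns every category, similar in spirit to --all-cats.
    return [(names[cid], n, img_counts[cid]) for cid, n in ann_counts.most_common(top)]
```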

`coco filter`

Subset a dataset by category, image ID, or annotation area.

coco filter <file> -o <output> [options]

Flag	Description
`--cat-ids 1,2,3`	Keep only these category IDs
`--img-ids 1,2,3`	Keep only these image IDs
`--area-rng MIN,MAX`	Keep annotations within this area range (inclusive)
`--keep-empty-images`	Preserve images with no matching annotations
`-o / --output`	Output JSON path (required)
`--json`	Write before/after counts as JSON to stdout

# Keep only "person" (category 1)
coco filter instances_val2017.json --cat-ids 1 -o person.json

# Medium-sized objects only
coco filter instances_val2017.json --area-rng 1024,9216 -o medium.json

# JSON output: {"before": {"images": 5000, ...}, "after": {...}, "output": "..."}
coco filter instances_val2017.json --cat-ids 1 -o person.json --json
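The 1024,9216 bounds in the medium-objects example are 32² and 96², the thresholds COCO uses to separate small, medium, and large objects. A minimal sketch of the inclusive range test from the table, assuming the filter applies to each annotation's `area` field (rather than, say, box area); the function name is hypothetical.

```python
SMALL_MAX = 32 ** 2  # 1024: upper bound of COCO "small"
LARGE_MIN = 96 ** 2  # 9216: lower bound of COCO "large"


def filter_by_area(annotations: list[dict], lo: float, hi: float) -> list[dict]:
    """Keep annotations whose `area` lies in [lo, hi], inclusive at both ends."""
    # Assumption: the annotation's `area` field is what --area-rng compares against.
    return [a for a in annotations if lo <= a["area"] <= hi]
```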

`coco split`

Split a dataset into train/val (or train/val/test) subsets. Writes separate JSON files for each split.

coco split <file> -o <prefix> [options]

Flag	Description	Default
`--val-frac`	Fraction of images for validation	`0.2`
`--test-frac`	Fraction of images for a test set (omit for a two-way split)	—
`--seed`	Random seed for reproducibility	`42`
`-o / --output`	Output prefix	required
`--json`	Write per-split counts as JSON to stdout	off

Writes <prefix>_train.json, <prefix>_val.json, and optionally <prefix>_test.json.

# 80/20 split
coco split person.json -o splits/person --val-frac 0.2

# 70/15/15 split
coco split person.json -o splits/person --val-frac 0.15 --test-frac 0.15
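A minimal sketch of an image-level split following the flags' semantics: image IDs are shuffled with a seeded RNG, partitioned by fraction, and each annotation follows its image. Python's `random.Random` and `round` are assumptions here; hotcoco's RNG and rounding almost certainly differ, so the same --seed will not reproduce its splits. The function name is hypothetical.

```python
import random
from typing import Optional


def split_dataset(gt: dict, val_frac: float = 0.2, test_frac: Optional[float] = None,
                  seed: int = 42) -> dict[str, dict]:
    """Split a COCO dict into train/val(/test) dicts at the image level."""
    img_ids = sorted(img["id"] for img in gt["images"])
    random.Random(seed).shuffle(img_ids)  # assumption: not hotcoco's RNG
    n = len(img_ids)
    n_val = round(n * val_frac)
    n_test = round(n * test_frac) if test_frac else 0
    groups = {
        "val": set(img_ids[:n_val]),
        "test": set(img_ids[n_val:n_val + n_test]),
        "train": set(img_ids[n_val + n_test:]),
    }
    if not test_frac:
        del groups["test"]  # two-way split
    # Copy top-level keys (categories, info, ...) and keep annotations with their images.
    return {
        name: {
            **gt,
            "images": [img for img in gt["images"] if img["id"] in ids],
            "annotations": [a for a in gt["annotations"] if a["image_id"] in ids],
        }
        for name, ids in groups.items()
    }
```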

`coco merge`

Combine multiple annotation files into one. All files must share the same category taxonomy.

coco merge <file1> <file2> [<file3> ...] -o <output> [--json]

coco merge batch1.json batch2.json batch3.json -o combined.json

# JSON output: input list with per-file counts + output counts
coco merge batch1.json batch2.json -o combined.json --json
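Because merging requires a shared taxonomy, it can help to verify that precondition before running the command. Whether hotcoco compares category IDs, names, or both isn't stated here; this sketch requires both to match and ignores category order. The function name is hypothetical.

```python
def check_same_taxonomy(datasets: list[dict]) -> None:
    """Raise ValueError unless every dataset defines identical (id, name) categories."""
    def taxonomy(ds: dict) -> set[tuple[int, str]]:
        # Assumption: a taxonomy is the set of (id, name) pairs; order doesn't matter.
        return {(c["id"], c["name"]) for c in ds["categories"]}

    reference = taxonomy(datasets[0])
    for i, ds in enumerate(datasets[1:], start=2):
        if taxonomy(ds) != reference:
            diff = sorted(taxonomy(ds) ^ reference)
            raise ValueError(f"file #{i} has a different category taxonomy: {diff}")
```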

`coco sample`

Draw a random subset of images (with their annotations).

coco sample <file> -o <output> [options]

Flag	Description
`--n N`	Number of images to sample
`--frac F`	Fraction of images to sample
`--seed`	Random seed (default `42`)
`-o / --output`	Output JSON path (required)
`--json`	Write before/after counts as JSON to stdout

# Sample 500 images
coco sample instances_val2017.json --n 500 --seed 0 -o sample.json

# Sample 10% of the dataset
coco sample instances_val2017.json --frac 0.1 -o sample.json
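A minimal sketch of seeded image sampling that keeps each sampled image's annotations. Treating --n and --frac as mutually exclusive is an assumption, as is the use of Python's `random.Random`; results will not match hotcoco's for the same seed. The function name is hypothetical.

```python
import random
from typing import Optional


def sample_images(gt: dict, n: Optional[int] = None, frac: Optional[float] = None,
                  seed: int = 42) -> dict:
    """Return a COCO dict restricted to a random subset of images."""
    if (n is None) == (frac is None):
        # Assumption: exactly one of n / frac must be given.
        raise ValueError("pass exactly one of n or frac")
    k = n if n is not None else round(len(gt["images"]) * frac)
    k = min(k, len(gt["images"]))
    chosen = random.Random(seed).sample(gt["images"], k)
    keep = {img["id"] for img in chosen}
    return {
        **gt,
        # Preserve the original image order rather than the sampling order.
        "images": [img for img in gt["images"] if img["id"] in keep],
        "annotations": [a for a in gt["annotations"] if a["image_id"] in keep],
    }
```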

`coco explore`

Launch a local browser UI for exploring a dataset interactively. Requires pip install "hotcoco[browse]".

coco explore --gt <annotations.json> --images <images_dir/> [options]

Flag	Description	Default
`--gt <path>`	Ground truth annotation JSON	required
`--images <dir>`	Directory containing image files	required
`--dt <path>`	Detection results JSON (enables detection overlay)	off
`--batch-size N`	Images loaded per batch	`12`
`--port N`	Local server port	`7860`

coco explore --gt instances_val2017.json --images /data/coco/val2017/

# With detection overlay
coco explore --gt instances_val2017.json --images /data/images/ --dt results.json

# Custom port
coco explore --gt instances_val2017.json --images /data/images/ --port 7861

The UI has a sidebar with a category filter and a shuffle option. Click any thumbnail to open a full-resolution lightbox with the annotations drawn as a canvas overlay. See the Dataset Browser guide for details.

`coco compare`

Compare two model evaluations on the same dataset with per-metric deltas, per-category breakdown, and optional bootstrap confidence intervals.

coco compare --gt <annotations.json> --dt-a <model_a.json> --dt-b <model_b.json> [options]

Flag	Description	Default
`--gt`	Ground truth annotations (COCO JSON)	required
`--dt-a`	Detections from model A	required
`--dt-b`	Detections from model B	required
`--iou-type`	`bbox`, `segm`, or `keypoints`	`bbox`
`--lvis`	LVIS-style federated evaluation	off
`--bootstrap N`	Bootstrap samples for confidence intervals	`0` (disabled)
`--seed`	Random seed for bootstrap	`42`
`--confidence`	Confidence level for CIs	`0.95`
`--name-a`	Display name for model A	`Model A`
`--name-b`	Display name for model B	`Model B`
`--json`	JSON output for CI/CD pipelines	off

# Basic comparison
coco compare --gt ann.json --dt-a baseline.json --dt-b improved.json

# With bootstrap CIs
coco compare --gt ann.json --dt-a a.json --dt-b b.json --bootstrap 1000

# JSON output for CI/CD
coco compare --gt ann.json --dt-a a.json --dt-b b.json --bootstrap 1000 --json
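The idea behind --bootstrap and --confidence can be illustrated with a paired percentile bootstrap. This sketch makes a simplifying assumption: each image contributes one paired score per model (for example a per-image F1), so resampling images with replacement and averaging gives one delta per replicate. AP is not a per-image average, so hotcoco presumably re-accumulates AP on each resampled image set; treat this as a sketch of the resampling and interval logic only. The function name is hypothetical.

```python
import random
from statistics import fmean


def paired_bootstrap_ci(scores_a: list[float], scores_b: list[float], n_boot: int = 1000,
                        confidence: float = 0.95, seed: int = 42) -> tuple[float, float, float]:
    """Return (mean delta B - A, lower bound, upper bound) via the percentile bootstrap."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need equal-length, non-empty paired score lists")
    # Pairing: both models are scored on the same image, so resample the deltas.
    deltas = [b - a for a, b in zip(scores_a, scores_b)]
    rng = random.Random(seed)
    n = len(deltas)
    replicates = sorted(
        fmean(rng.choices(deltas, k=n))  # resample images with replacement
        for _ in range(n_boot)
    )
    alpha = (1.0 - confidence) / 2.0
    lo = replicates[int(alpha * n_boot)]
    hi = replicates[min(n_boot - 1, int((1.0 - alpha) * n_boot))]
    return fmean(deltas), lo, hi
```

If the interval excludes zero, the difference between the two models is unlikely to be an artifact of which images happen to be in the evaluation set.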

`coco convert`

Convert between annotation formats. Supports COCO JSON ↔ YOLO labels, COCO JSON ↔ Pascal VOC XML, and COCO JSON ↔ CVAT for Images XML.

COCO → YOLO:

coco convert --from coco --to yolo --input <annotations.json> --output <labels_dir/>

YOLO → COCO:

coco convert --from yolo --to coco --input <labels_dir/> --output <annotations.json> [--images-dir <images/>]

COCO → Pascal VOC:

coco convert --from coco --to voc --input <annotations.json> --output <voc_dir/>

Pascal VOC → COCO:

coco convert --from voc --to coco --input <voc_dir/> --output <annotations.json>

COCO → CVAT:

coco convert --from coco --to cvat --input <annotations.json> --output <annotations.xml>

CVAT → COCO:

coco convert --from cvat --to coco --input <annotations.xml> --output <annotations.json>

Flag	Description
`--from`	Source format: `coco`, `yolo`, `voc`, or `cvat`
`--to`	Target format: `coco`, `yolo`, `voc`, or `cvat`
`--input`	Input path — JSON file (COCO), label directory (YOLO), annotation directory (VOC), or XML file (CVAT)
`--output`	Output path — label directory (YOLO), annotation directory (VOC), XML file (CVAT), or JSON file (COCO)
`--images-dir`	(YOLO → COCO only) Directory of source images; used by Pillow to populate `width`/`height` on each image record. Requires `pip install Pillow`.
`--json`	Write conversion stats as JSON to stdout

# Export val2017 to YOLO labels
coco convert --from coco --to yolo \
    --input instances_val2017.json \
    --output labels/val2017/

# Import YOLO labels back (with image dims)
coco convert --from yolo --to coco \
    --input labels/val2017/ \
    --output reconstructed.json \
    --images-dir images/val2017/

# Export to Pascal VOC
coco convert --from coco --to voc \
    --input instances_val2017.json \
    --output voc_output/

# Import Pascal VOC
coco convert --from voc --to coco \
    --input VOCdevkit/VOC2012/ \
    --output voc2012_as_coco.json

# Export to CVAT
coco convert --from coco --to cvat \
    --input instances_val2017.json \
    --output annotations.xml

# Import CVAT
coco convert --from cvat --to coco \
    --input annotations.xml \
    --output cvat_as_coco.json
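For orientation, the core of a COCO to YOLO conversion is a coordinate change. YOLO label files conventionally hold one line per object: a zero-based class index followed by the box center and size, normalized by image width and height. COCO boxes are [x_min, y_min, width, height] in pixels. How hotcoco maps COCO category IDs to YOLO class indices isn't stated here, so the sorted-ID mapping below is an assumption, and both function names are hypothetical.

```python
def coco_bbox_to_yolo(bbox: list[float], img_w: int, img_h: int) -> tuple[float, float, float, float]:
    """[x_min, y_min, w, h] in pixels -> (x_center, y_center, w, h) normalized to [0, 1]."""
    x, y, w, h = bbox
    return ((x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h)


def yolo_label_lines(gt: dict, image_id: int) -> list[str]:
    """Build the lines of one image's YOLO label file from a COCO dict."""
    # Hypothetical mapping: class index = position of the category ID in sorted order.
    class_index = {cid: i for i, cid in enumerate(sorted(c["id"] for c in gt["categories"]))}
    img = next(im for im in gt["images"] if im["id"] == image_id)
    lines = []
    for ann in gt["annotations"]:
        if ann["image_id"] != image_id:
            continue
        cx, cy, w, h = coco_bbox_to_yolo(ann["bbox"], img["width"], img["height"])
        lines.append(f"{class_index[ann['category_id']]} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")
    return lines
```

The reverse direction needs each image's width and height to undo the normalization, which is why YOLO → COCO accepts --images-dir.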

`coco-eval` — Rust CLI

Evaluation only. No Python required — useful in environments where installing a Python package isn't practical.

cargo install hotcoco-cli

Usage

coco-eval --gt annotations.json --dt detections.json --iou-type bbox

Options

Flag	Description	Default
`--gt <path>`	Path to ground truth annotations JSON file	required
`--dt <path>`	Path to detection results JSON file	required
`--iou-type <type>`	Evaluation type: `bbox`, `segm`, or `keypoints`	`bbox`
`--img-ids <ids>`	Filter to specific image IDs (comma-separated)	all images
`--cat-ids <ids>`	Filter to specific category IDs (comma-separated)	all categories
`--no-cats`	Pool all categories (disable per-category evaluation)	off
`-o / --output <path>`	Write evaluation results to a JSON file	off

Examples

# Bounding box evaluation
coco-eval --gt instances_val2017.json --dt bbox_results.json --iou-type bbox

# Segmentation evaluation
coco-eval --gt instances_val2017.json --dt segm_results.json --iou-type segm

# Keypoint evaluation
coco-eval --gt person_keypoints_val2017.json --dt kpt_results.json --iou-type keypoints

# Filter to specific categories
coco-eval --gt instances_val2017.json --dt results.json --cat-ids 1,3

# Category-agnostic evaluation
coco-eval --gt instances_val2017.json --dt results.json --no-cats

# Save results as JSON (includes per-category AP)
coco-eval --gt instances_val2017.json --dt bbox_results.json --output results.json

Output

The standard 12 COCO metrics (10 for keypoints):

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.783
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.971
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.849
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.621
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.893
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.988
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.502
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.835
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.854
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.701
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.935
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.997
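When a script needs these numbers and the -o/--output JSON file isn't convenient, the table is regular enough to parse. A minimal sketch using a regular expression fitted to the line format shown above; the dictionary key layout is illustrative, not something coco-eval defines.

```python
import re

# Fitted to lines like:
#  Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.783
LINE_RE = re.compile(
    r"Average (?P<kind>Precision|Recall)\s+\((?:AP|AR)\) @\[ "
    r"IoU=(?P<iou>[\d.:]+)\s*\| area=\s*(?P<area>\w+) \| "
    r"maxDets=\s*(?P<max_dets>\d+) \] = (?P<value>-?[\d.]+)"
)


def parse_summary(text: str) -> dict[tuple[str, str, str, int], float]:
    """Map (AP|AR, IoU range, area, maxDets) to the reported value."""
    metrics = {}
    for m in LINE_RE.finditer(text):
        kind = "AP" if m["kind"] == "Precision" else "AR"
        metrics[(kind, m["iou"], m["area"], int(m["max_dets"]))] = float(m["value"])
    return metrics
```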

Shell completions

Both CLIs support tab completion for flags, subcommands, and values.

`coco` (Python)

Install argcomplete through the completions extra:

pip install "hotcoco[completions]"

Then register the completion with your shell. This is a one-time setup.

For bash, add to ~/.bashrc:

eval "$(register-python-argcomplete coco)"

For zsh, add to ~/.zshrc:

autoload -U bashcompinit && bashcompinit
eval "$(register-python-argcomplete coco)"

For fish, add to ~/.config/fish/config.fish:

register-python-argcomplete --shell fish coco | source

After restarting your shell (or sourcing the config), coco <TAB> completes subcommands and coco eval --<TAB> completes flags.

`coco-eval` (Rust)

coco-eval --completions <SHELL> prints a completion script to stdout. Redirect it to the right location for your shell.

For bash:

mkdir -p ~/.bash_completion.d
coco-eval --completions bash > ~/.bash_completion.d/coco-eval
# or, system-wide:
coco-eval --completions bash | sudo tee /etc/bash_completion.d/coco-eval > /dev/null

Then add to ~/.bashrc if it doesn't already source ~/.bash_completion.d/:

source ~/.bash_completion.d/coco-eval

For zsh:

mkdir -p ~/.zsh/completions
coco-eval --completions zsh > ~/.zsh/completions/_coco-eval

Make sure ~/.zsh/completions is on your fpath in ~/.zshrc, before compinit runs:

fpath=(~/.zsh/completions $fpath)
autoload -U compinit && compinit

For fish:

mkdir -p ~/.config/fish/completions
coco-eval --completions fish > ~/.config/fish/completions/coco-eval.fish

Supported shells: bash, zsh, fish, elvish, powershell.
