
CLI

hotcoco ships two CLI tools:

  • coco — Python CLI. Installed with pip install hotcoco. Covers dataset management (filter, merge, split, sample, stats) and is the primary tool for most workflows.
  • coco-eval — Rust CLI. Installed with cargo install hotcoco-cli. Evaluation only, no Python required.

coco — Python CLI

pip install hotcoco

JSON output mode

Every subcommand accepts a --json flag that writes a single JSON object to stdout instead of human-readable text. stderr (progress, warnings, errors) is untouched.

coco eval --gt ann.json --dt det.json --json
coco stats ann.json --json
coco healthcheck ann.json --json

This is designed for CI/CD pipelines, dashboards, and shell scripts that need to gate on metric values without parsing human output:

# Gate a CI step on AP ≥ 0.50
AP=$(coco eval --gt ann.json --dt det.json --json | jq '.metrics.AP')
python -c "import sys; sys.exit(0 if $AP >= 0.50 else 1)"

When --json is set and an error occurs, the exit code is still 1 and the error is also JSON:

{"error": "No such file or directory (os error 2)"}
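Because failures are emitted as JSON too, a wrapper can branch on the error key as well as the exit code. A minimal Python sketch (the helper name parse_coco_json is ours, not part of hotcoco):

```python
import json

def parse_coco_json(stdout_text):
    """Split `coco ... --json` stdout into (ok, payload).

    On failure the CLI exits 1 and emits {"error": "..."}, so the
    presence of an "error" key is a second, belt-and-braces signal.
    """
    payload = json.loads(stdout_text)
    return "error" not in payload, payload

# Parsing canned strings here in place of real CLI output:
ok, result = parse_coco_json('{"metrics": {"AP": 0.578}}')
failed_ok, err = parse_coco_json('{"error": "No such file or directory (os error 2)"}')
print(ok, failed_ok)
```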

coco eval

Evaluate detections against ground truth annotations. Prints the standard COCO metrics table.

coco eval --gt <gt.json> --dt <dt.json> [options]
Flag Description Default
--gt <path> Ground truth annotations JSON required
--dt <path> Detection results JSON required
--iou-type bbox, segm, or keypoints bbox
--lvis LVIS-style evaluation (max 300 dets, frequency-group AP) off
--img-ids 1,2,3 Evaluate only these image IDs all
--cat-ids 1,2,3 Evaluate only these category IDs all
--no-cats Pool all categories (class-agnostic evaluation) off
--tide Print TIDE error decomposition after standard metrics off
--tide-pos-thr IoU threshold for TP/FP classification in TIDE 0.5
--tide-bg-thr Minimum IoU with any GT for Loc/Both/Bkg distinction 0.1
--diagnostics Per-image diagnostics: F1 distribution and label error candidates off
--diag-iou-thr IoU threshold for diagnostics TP/FP classification 0.5
--diag-score-thr Min detection score for label error candidates 0.5
--report <path> Save a PDF evaluation report to this path (requires hotcoco[plot]) off
--title Report title shown in the header COCO Evaluation Report
--slices <path> JSON file with named image ID groups for sliced evaluation off
--healthcheck Run dataset healthcheck before evaluation (warnings to stderr) off
--calibration Compute confidence calibration (ECE/MCE) after standard metrics off
--cal-bins Number of calibration bins 10
--cal-iou-thr IoU threshold for calibration TP/FP classification 0.5
--json Write results as JSON to stdout instead of human-readable text off
# Bounding box evaluation
coco eval --gt instances_val2017.json --dt bbox_results.json

# Segmentation
coco eval --gt instances_val2017.json --dt segm_results.json --iou-type segm

# Keypoints
coco eval --gt person_keypoints_val2017.json --dt kpt_results.json --iou-type keypoints

# LVIS-style evaluation
coco eval --gt lvis_val.json --dt lvis_results.json --lvis

# With TIDE error decomposition
coco eval --gt instances_val2017.json --dt bbox_results.json --tide

# TIDE at a stricter localization threshold
coco eval --gt instances_val2017.json --dt bbox_results.json --tide --tide-pos-thr 0.75

# Save a PDF evaluation report
coco eval --gt instances_val2017.json --dt bbox_results.json --report report.pdf

# PDF report with custom title and LVIS-style evaluation
coco eval --gt lvis_val.json --dt lvis_results.json --lvis --report lvis_report.pdf --title "LVIS Evaluation"

# Sliced evaluation (compare metrics across image subsets)
coco eval --gt instances_val2017.json --dt bbox_results.json --slices slices.json

# Pre-flight healthcheck before evaluation
coco eval --gt instances_val2017.json --dt bbox_results.json --healthcheck

# JSON output for CI/CD pipelines
coco eval --gt instances_val2017.json --dt bbox_results.json --json

# JSON with TIDE and slices combined
coco eval --gt instances_val2017.json --dt bbox_results.json --tide --slices slices.json --json
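The --slices file maps slice names to groups of image IDs. The exact schema isn't shown above, so treat this sketch as an assumption and check it against your version; writing it with the standard json module:

```python
import json

# Hypothetical slices.json: slice name -> list of image IDs.
# Each named slice is evaluated separately; the eval output also
# reports an "_overall" entry covering the full image set.
slices = {
    "daytime": [139, 285, 632],
    "night": [724, 776],
}
with open("slices.json", "w") as f:
    json.dump(slices, f, indent=2)
```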

JSON output shape:

{
  "hotcoco_version": "0.3.0",
  "params": { "iou_type": "Bbox", "iou_thresholds": [...], "area_ranges": {...}, ... },
  "metrics": { "AP": 0.578, "AP50": 0.861, "AP75": 0.600, "APs": 0.327, ... },
  "tide": { "delta_ap": {...}, "counts": {...}, "ap_base": 0.578, ... },
  "slices": { "daytime": { "AP": 0.61, ... }, "_overall": { ... } },
  "healthcheck": { "errors": [], "warnings": [] }
}

The tide, slices, and healthcheck keys are present only when the corresponding flag is passed.
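Since those keys are conditional, downstream scripts should read them defensively. A small sketch, using a hard-coded payload in place of real CLI output:

```python
import json

result = json.loads('''{
  "hotcoco_version": "0.3.0",
  "metrics": {"AP": 0.578, "AP50": 0.861},
  "slices": {"daytime": {"AP": 0.61}, "_overall": {"AP": 0.578}}
}''')

ap = result["metrics"]["AP"]       # always present
tide = result.get("tide")          # None unless --tide was passed
# Keep only named slices; "_overall" is the aggregate entry.
named = {k: v for k, v in result.get("slices", {}).items()
         if not k.startswith("_")}
print(ap, tide, named)
```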

coco healthcheck

Validate a dataset for structural errors, quality warnings, and distribution issues.

coco healthcheck <annotation_file> [--dt <detections.json>]
Flag Description
--dt <path> Detection results JSON — enables GT/DT compatibility checks
--json Write results as JSON to stdout
# Dataset only
coco healthcheck instances_val2017.json

# With detections (also checks GT/DT compatibility)
coco healthcheck instances_val2017.json --dt bbox_results.json

# JSON output (full errors/warnings list + summary)
coco healthcheck instances_val2017.json --json

coco stats

Print a statistics summary of a dataset: image and annotation counts, per-category breakdown, image dimensions, and annotation area distribution.

coco stats instances_val2017.json
coco stats instances_val2017.json --all-cats  # show all categories, not just top 20
coco stats instances_val2017.json --json       # machine-readable output

coco filter

Subset a dataset by category, image ID, or annotation area.

coco filter <file> -o <output> [options]
Flag Description
--cat-ids 1,2,3 Keep only these category IDs
--img-ids 1,2,3 Keep only these image IDs
--area-rng MIN,MAX Keep annotations within this area range (inclusive)
--keep-empty-images Preserve images with no matching annotations
-o / --output Output JSON path (required)
--json Write before/after counts as JSON to stdout
# Keep only "person" (category 1)
coco filter instances_val2017.json --cat-ids 1 -o person.json

# Medium-sized objects only
coco filter instances_val2017.json --area-rng 1024,9216 -o medium.json

# JSON output: {"before": {"images": 5000, ...}, "after": {...}, "output": "..."}
coco filter instances_val2017.json --cat-ids 1 -o person.json --json
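The 1024,9216 range in the medium-objects example comes from the standard COCO area bands, which split objects by annotation area at 32² and 96² pixels:

```python
# Standard COCO area bands (pixels squared):
#   small  < 32**2 <= medium < 96**2 <= large
small_max = 32 ** 2    # upper bound of "small"
medium_max = 96 ** 2   # upper bound of "medium"
print(small_max, medium_max)  # 1024 9216
```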

coco split

Split a dataset into train/val (or train/val/test) subsets. Writes separate JSON files for each split.

coco split <file> -o <prefix> [options]
Flag Description Default
--val-frac Fraction of images for validation 0.2
--test-frac Fraction for a test set (omit for two-way split)
--seed Random seed for reproducibility 42
-o / --output Output prefix (required)
--json Write per-split counts as JSON to stdout off

Writes <prefix>_train.json, <prefix>_val.json, and optionally <prefix>_test.json.

# 80/20 split
coco split person.json -o splits/person --val-frac 0.2

# 70/15/15 split
coco split person.json -o splits/person --val-frac 0.15 --test-frac 0.15

coco merge

Combine multiple annotation files into one. All files must share the same category taxonomy.

coco merge <file1> <file2> [<file3> ...] -o <output>
coco merge batch1.json batch2.json batch3.json -o combined.json

# JSON output: input list with per-file counts + output counts
coco merge batch1.json batch2.json -o combined.json --json
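Since merge requires an identical category taxonomy across inputs, it can be worth pre-checking before a long merge. A quick sketch (same_taxonomy is our own helper, not hotcoco API), operating on already-loaded COCO dicts:

```python
def same_taxonomy(datasets):
    """True when every dataset has an identical set of (id, name) categories."""
    taxos = [{(c["id"], c["name"]) for c in ds["categories"]} for ds in datasets]
    return all(t == taxos[0] for t in taxos)

a = {"categories": [{"id": 1, "name": "person"}]}
b = {"categories": [{"id": 1, "name": "person"}]}
c = {"categories": [{"id": 2, "name": "car"}]}
print(same_taxonomy([a, b]))  # True
print(same_taxonomy([a, c]))  # False
```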

coco sample

Draw a random subset of images (with their annotations).

coco sample <file> -o <output> [options]
Flag Description
--n N Number of images to sample
--frac F Fraction of images to sample
--seed Random seed (default 42)
-o / --output Output JSON path (required)
--json Write before/after counts as JSON to stdout
# Sample 500 images
coco sample instances_val2017.json --n 500 --seed 0 -o sample.json

# Sample 10% of the dataset
coco sample instances_val2017.json --frac 0.1 -o sample.json

coco explore

Launch a local dataset browser to explore a dataset interactively. Requires pip install hotcoco[browse].

coco explore --gt <annotations.json> --images <images_dir/> [options]
Flag Description Default
--gt <path> Ground truth annotation JSON required
--images <dir> Directory containing image files required
--dt <path> Detection results JSON (enables detection overlay) off
--batch-size N Images loaded per batch 12
--port N Local server port 7860
coco explore --gt instances_val2017.json --images /data/coco/val2017/

# With detection overlay
coco explore --gt instances_val2017.json --images /data/images/ --dt results.json

# Custom port
coco explore --gt instances_val2017.json --images /data/images/ --port 7861

Opens a sidebar with category filter and shuffle. Click any thumbnail to open a full-resolution lightbox with canvas annotation overlay. See the Dataset Browser guide.


coco compare

Compare two model evaluations on the same dataset with per-metric deltas, per-category breakdown, and optional bootstrap confidence intervals.

coco compare --gt <annotations.json> --dt-a <model_a.json> --dt-b <model_b.json> [options]
Flag Description Default
--gt Ground truth annotations (COCO JSON) required
--dt-a Detections from model A required
--dt-b Detections from model B required
--iou-type bbox, segm, or keypoints bbox
--lvis LVIS-style federated evaluation off
--bootstrap N Bootstrap samples for confidence intervals 0 (disabled)
--seed Random seed for bootstrap 42
--confidence Confidence level for CIs 0.95
--name-a Display name for model A Model A
--name-b Display name for model B Model B
--json JSON output for CI/CD pipelines off
# Basic comparison
coco compare --gt ann.json --dt-a baseline.json --dt-b improved.json

# With bootstrap CIs
coco compare --gt ann.json --dt-a a.json --dt-b b.json --bootstrap 1000

# JSON output for CI/CD
coco compare --gt ann.json --dt-a a.json --dt-b b.json --bootstrap 1000 --json
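How hotcoco resamples internally isn't specified here, but as an illustration of what a percentile bootstrap confidence interval looks like (our own helper, not hotcoco's API), resampling a list of per-image scores with replacement:

```python
import random

def bootstrap_ci(scores, n_boot=1000, confidence=0.95, seed=42):
    """Percentile bootstrap CI for the mean of per-image scores."""
    rng = random.Random(seed)
    n = len(scores)
    # Resample with replacement n_boot times and record each mean.
    means = sorted(sum(rng.choices(scores, k=n)) / n for _ in range(n_boot))
    lo_idx = int((1 - confidence) / 2 * n_boot)
    hi_idx = int((1 + confidence) / 2 * n_boot) - 1
    return means[lo_idx], means[hi_idx]

lo, hi = bootstrap_ci([0.4, 0.5, 0.6, 0.7, 0.5, 0.6])
print(lo, hi)
```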

coco convert

Convert between annotation formats. Supports COCO JSON ↔ YOLO labels, COCO JSON ↔ Pascal VOC XML, and COCO JSON ↔ CVAT for Images XML.

COCO → YOLO:

coco convert --from coco --to yolo --input <annotations.json> --output <labels_dir/>

YOLO → COCO:

coco convert --from yolo --to coco --input <labels_dir/> --output <annotations.json> [--images-dir <images/>]

COCO → Pascal VOC:

coco convert --from coco --to voc --input <annotations.json> --output <voc_dir/>

Pascal VOC → COCO:

coco convert --from voc --to coco --input <voc_dir/> --output <annotations.json>

COCO → CVAT:

coco convert --from coco --to cvat --input <annotations.json> --output <annotations.xml>

CVAT → COCO:

coco convert --from cvat --to coco --input <annotations.xml> --output <annotations.json>
Flag Description
--from Source format: coco, yolo, voc, or cvat
--to Target format: coco, yolo, voc, or cvat
--input Input path — JSON file (COCO), label directory (YOLO), annotation directory (VOC), or XML file (CVAT)
--output Output path — label directory (YOLO), annotation directory (VOC), XML file (CVAT), or JSON file (COCO)
--images-dir (YOLO → COCO only) Directory of source images; used by Pillow to populate width/height on each image record. Requires pip install Pillow.
--json Write conversion stats as JSON to stdout
# Export val2017 to YOLO labels
coco convert --from coco --to yolo \
    --input instances_val2017.json \
    --output labels/val2017/

# Import YOLO labels back (with image dims)
coco convert --from yolo --to coco \
    --input labels/val2017/ \
    --output reconstructed.json \
    --images-dir images/val2017/

# Export to Pascal VOC
coco convert --from coco --to voc \
    --input instances_val2017.json \
    --output voc_output/

# Import Pascal VOC
coco convert --from voc --to coco \
    --input VOCdevkit/VOC2012/ \
    --output voc2012_as_coco.json

# Export to CVAT
coco convert --from coco --to cvat \
    --input instances_val2017.json \
    --output annotations.xml

# Import CVAT
coco convert --from cvat --to coco \
    --input annotations.xml \
    --output cvat_as_coco.json
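Behind the COCO ↔ YOLO conversion is a coordinate change: COCO stores [x, y, width, height] in pixels from the top-left corner, while YOLO labels hold a box center and size normalized by image dimensions. A sketch of the per-box transform (our own helper name, not hotcoco's implementation):

```python
def coco_to_yolo_bbox(bbox, img_w, img_h):
    """COCO [x, y, w, h] in pixels (top-left origin) ->
    YOLO [cx, cy, w, h] normalized to [0, 1]."""
    x, y, w, h = bbox
    return [(x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h]

cx, cy, w, h = coco_to_yolo_bbox([100, 50, 200, 100], 640, 480)
print(cx, cy, w, h)
```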

coco-eval — Rust CLI

Evaluation only. No Python required — useful in environments where installing a Python package isn't practical.

cargo install hotcoco-cli

Usage

coco-eval --gt annotations.json --dt detections.json --iou-type bbox

Options

Flag Description Default
--gt <path> Path to ground truth annotations JSON file required
--dt <path> Path to detection results JSON file required
--iou-type <type> Evaluation type: bbox, segm, or keypoints bbox
--img-ids <ids> Filter to specific image IDs (comma-separated) all images
--cat-ids <ids> Filter to specific category IDs (comma-separated) all categories
--no-cats Pool all categories (disable per-category evaluation) off
-o / --output <path> Write evaluation results to a JSON file off

Examples

# Bounding box evaluation
coco-eval --gt instances_val2017.json --dt bbox_results.json --iou-type bbox

# Segmentation evaluation
coco-eval --gt instances_val2017.json --dt segm_results.json --iou-type segm

# Keypoint evaluation
coco-eval --gt person_keypoints_val2017.json --dt kpt_results.json --iou-type keypoints

# Filter to specific categories
coco-eval --gt instances_val2017.json --dt results.json --cat-ids 1,3

# Category-agnostic evaluation
coco-eval --gt instances_val2017.json --dt results.json --no-cats

# Save results as JSON (includes per-category AP)
coco-eval --gt instances_val2017.json --dt bbox_results.json --output results.json

Output

The standard 12 COCO metrics (10 for keypoints):

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.783
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.971
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.849
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.621
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.893
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.988
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.502
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.835
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.854
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.701
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.935
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.997
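The headline AP on the first line is averaged over 10 IoU thresholds, which is why it sits below the single-threshold AP50 line. The threshold grid is the standard COCO one:

```python
# Headline AP averages over IoU thresholds 0.50, 0.55, ..., 0.95.
ious = [round(0.50 + 0.05 * i, 2) for i in range(10)]
print(ious)
```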

Shell completions

Both CLIs support tab completion for flags, subcommands, and values.

coco (Python)

Install argcomplete:

pip install "hotcoco[completions]"

Then register the completion; the one-time setup depends on your shell:

Add to ~/.bashrc:

eval "$(register-python-argcomplete coco)"

Add to ~/.zshrc:

autoload -U bashcompinit && bashcompinit
eval "$(register-python-argcomplete coco)"

For fish, add to ~/.config/fish/config.fish:

register-python-argcomplete --shell fish coco | source

After restarting your shell (or sourcing the config), coco <TAB> completes subcommands and coco eval --<TAB> completes flags.

coco-eval (Rust)

coco-eval --completions <SHELL> prints a completion script to stdout. Pipe it to the right location for your shell:

coco-eval --completions bash > ~/.bash_completion.d/coco-eval
# or for system-wide:
coco-eval --completions bash | sudo tee /etc/bash_completion.d/coco-eval

Then add to ~/.bashrc if not already sourcing ~/.bash_completion.d/:

source ~/.bash_completion.d/coco-eval

For zsh, write the script into a completions directory:

mkdir -p ~/.zsh/completions
coco-eval --completions zsh > ~/.zsh/completions/_coco-eval

Make sure ~/.zsh/completions is on your fpath in ~/.zshrc:

fpath=(~/.zsh/completions $fpath)
autoload -U compinit && compinit

For fish:

coco-eval --completions fish > ~/.config/fish/completions/coco-eval.fish

Supported shells: bash, zsh, fish, elvish, powershell.