plot

from hotcoco.plot import (
    report,
    pr_curve, pr_curve_iou_sweep, pr_curve_by_category, pr_curve_top_n,
    confusion_matrix, top_confusions, per_category_ap, tide_errors,
    reliability_diagram, comparison_bar, category_deltas,
    style, SERIES_COLORS, CHROME, SEQUENTIAL,
)

Requires pip install hotcoco[plot] (matplotlib >= 3.5).

All functions share these common parameters:

Parameter Type Description
theme str Visual theme: "cold-brew" (default), "warm-slate", "scientific-blue", or "ember".
paper_mode bool Set both figure and axes background to white. Useful for LaTeX / PowerPoint. Default False.
ax Axes | None Draw on an existing axes. If None, creates a new figure.
save_path str | Path | None Save figure to this path (150 DPI).

All functions return (Figure, Axes).


pr_curve_iou_sweep

pr_curve_iou_sweep(
    coco_eval, *,
    iou_thrs=None, area_rng="all", max_det=None,
    theme="cold-brew", paper_mode=False, ax=None, save_path=None,
)

Plot one precision-recall curve per IoU threshold, with precision averaged across all categories. The primary line (lowest IoU) is drawn with an under-fill and an F1 peak annotation.
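The F1 peak marks the point on the curve that maximizes F1 = 2PR / (P + R). A minimal sketch of that search (illustrative only, not hotcoco's code; the `f1_peak` helper is hypothetical):

```python
# Hypothetical illustration: the F1 peak is the point on a PR curve
# that maximizes F1 = 2PR / (P + R).
def f1_peak(precision, recall):
    """Return (best_f1, precision, recall) at the F1-maximizing point."""
    best = (0.0, 0.0, 0.0)
    for p, r in zip(precision, recall):
        f1 = 2 * p * r / (p + r) if (p + r) > 0 else 0.0
        if f1 > best[0]:
            best = (f1, p, r)
    return best

f1_peak([1.0, 0.9, 0.6], [0.2, 0.5, 0.9])  # peak at precision 0.6, recall 0.9
```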

Parameter Type Description
coco_eval COCOeval Must have run() called first.
iou_thrs list[float] | None IoU thresholds to include. Default: all thresholds in params.
area_rng str Area range: "all", "small", "medium", "large". Default "all".
max_det int | None Max detections. Default: last entry in params.max_dets.

pr_curve_by_category

pr_curve_by_category(
    coco_eval, cat_id, *,
    iou_thr=0.5, area_rng="all", max_det=None,
    theme="cold-brew", paper_mode=False, ax=None, save_path=None,
)

Plot the precision-recall curve for a single category at a fixed IoU threshold, with an F1 peak annotation.

Parameter Type Description
coco_eval COCOeval Must have run() called first.
cat_id int Category ID to plot.
iou_thr float IoU threshold. Default 0.5.
area_rng str Area range. Default "all".
max_det int | None Max detections. Default: last entry in params.max_dets.

pr_curve_top_n

pr_curve_top_n(
    coco_eval, *,
    cat_ids=None, top_n=10, iou_thr=0.5, area_rng="all", max_det=None,
    theme="cold-brew", paper_mode=False, ax=None, save_path=None,
)

Plot precision-recall curves for multiple categories on a single axes. When cat_ids is omitted, the top_n categories with the highest AP are selected automatically.

Parameter Type Description
coco_eval COCOeval Must have run() called first.
cat_ids list[int] | None Categories to plot. Default: top top_n by AP.
top_n int Number of top categories when cat_ids is omitted. Default 10.
iou_thr float IoU threshold. Default 0.5.
area_rng str Area range. Default "all".
max_det int | None Max detections. Default: last entry in params.max_dets.
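The default category selection can be sketched as follows (a hypothetical helper, not hotcoco's source; the dict shape is assumed for illustration):

```python
# Hypothetical sketch of the default selection: with cat_ids omitted, keep
# the top_n categories with the highest per-category AP.
def select_top_n(ap_by_cat, top_n=10):
    """ap_by_cat: {cat_id: AP}. Returns up to top_n IDs, best AP first."""
    return sorted(ap_by_cat, key=ap_by_cat.get, reverse=True)[:top_n]

select_top_n({1: 0.42, 2: 0.88, 3: 0.61}, top_n=2)  # → [2, 3]
```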

pr_curve

pr_curve(
    coco_eval, *,
    iou_thrs=None, cat_id=None, cat_ids=None,
    iou_thr=None, top_n=10, area_rng="all", max_det=None,
    theme="cold-brew", paper_mode=False, ax=None, save_path=None,
)

Convenience dispatcher — inspects the arguments and calls the appropriate named function. Prefer calling the named functions directly for clarity.

  • No cat_id or cat_ids → pr_curve_iou_sweep
  • cat_id set → pr_curve_by_category
  • cat_ids or iou_thr set → pr_curve_top_n
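The dispatch rules above can be sketched as plain Python (illustrative, not hotcoco's actual source):

```python
# Sketch of the dispatch rules (illustrative). cat_id is checked first
# because pr_curve_by_category also accepts an iou_thr.
def dispatch(cat_id=None, cat_ids=None, iou_thr=None):
    if cat_id is not None:
        return "pr_curve_by_category"
    if cat_ids is not None or iou_thr is not None:
        return "pr_curve_top_n"
    return "pr_curve_iou_sweep"
```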
Parameter Type Description
coco_eval COCOeval Must have run() called first.
iou_thrs list[float] | None IoU thresholds to plot (IoU sweep mode).
cat_id int | None Single category to plot.
cat_ids list[int] | None Categories to compare.
iou_thr float | None Fixed IoU for single/multi-category modes. Default 0.50.
top_n int Top N categories by AP. Default 10.
area_rng str Area range: "all", "small", "medium", "large".
max_det int | None Max detections. Default: last in params.

confusion_matrix

confusion_matrix(
    cm_dict, *,
    normalize=True, top_n=None,
    group_by=None, cat_groups=None,
    theme="cold-brew", paper_mode=False, ax=None, save_path=None,
)

Plot a confusion matrix heatmap.

Parameter Type Description
cm_dict dict Output of coco_eval.confusion_matrix().
normalize bool Row-normalize values. Default True.
top_n int | None Show only top N categories by confusion mass. Auto-set to 25 when the matrix has more than 30 categories.
group_by str | None "supercategory" to aggregate by group.
cat_groups dict | None Group name → list of category names. Required with group_by.
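What normalize=True does can be illustrated with a minimal sketch (hypothetical, operating on a plain list-of-lists rather than the real cm_dict):

```python
# Hypothetical illustration of normalize=True: divide each row by its sum,
# so cell (i, j) reads "fraction of ground-truth class i predicted as j".
def row_normalize(matrix):
    out = []
    for row in matrix:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out

row_normalize([[8, 2], [1, 3]])  # → [[0.8, 0.2], [0.25, 0.75]]
```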

top_confusions

top_confusions(
    cm_dict, *,
    top_n=20,
    theme="cold-brew", paper_mode=False, ax=None, save_path=None,
)

Plot the top N misclassifications as horizontal bars. Shows "ground truth → prediction" pairs sorted by count.

Parameter Type Description
cm_dict dict Output of coco_eval.confusion_matrix().
top_n int Number of confusions to show. Default 20.
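The ranking behind the bars can be sketched as follows (hypothetical; the real cm_dict schema may differ, so the (gt, pred) → count mapping here is assumed):

```python
# Hypothetical sketch of the ranking: keep off-diagonal (gt, pred) cells
# and sort by count, largest first.
def rank_confusions(counts, top_n=20):
    """counts: {(gt_name, pred_name): count}. Returns the top_n worst pairs."""
    pairs = [(gt, pred, n) for (gt, pred), n in counts.items() if gt != pred]
    pairs.sort(key=lambda t: t[2], reverse=True)
    return pairs[:top_n]
```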

per_category_ap

per_category_ap(
    results_dict, *,
    top_n=20, bottom_n=5,
    theme="cold-brew", paper_mode=False, ax=None, save_path=None,
)

Plot per-category AP as horizontal bars with a mean AP reference line.

Parameter Type Description
results_dict dict Output of coco_eval.results(per_class=True).
top_n int Top categories to show. Default 20.
bottom_n int Bottom categories to show. Default 5.
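The bar selection can be sketched as follows (a hypothetical helper, not hotcoco's code; the {name: AP} dict shape is assumed):

```python
# Hypothetical sketch of the bar selection: the top_n best and bottom_n
# worst categories, plus the mean AP used for the reference line.
def select_bars(ap_by_name, top_n=20, bottom_n=5):
    ranked = sorted(ap_by_name.items(), key=lambda kv: kv[1], reverse=True)
    mean_ap = sum(ap_by_name.values()) / len(ap_by_name)
    if len(ranked) <= top_n + bottom_n:
        return ranked, mean_ap
    return ranked[:top_n] + ranked[-bottom_n:], mean_ap
```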

tide_errors

tide_errors(
    tide_dict, *,
    theme="cold-brew", paper_mode=False, ax=None, save_path=None,
)

Plot TIDE error breakdown as horizontal bars.

Parameter Type Description
tide_dict dict Output of coco_eval.tide_errors().

reliability_diagram

reliability_diagram(
    cal_or_eval, *,
    n_bins=10, iou_threshold=0.5,
    theme="cold-brew", paper_mode=False, ax=None, save_path=None,
)

Plot a reliability diagram — predicted confidence vs actual accuracy per bin, with a perfect calibration diagonal and gap overlay.

Parameter Type Description
cal_or_eval dict | COCOeval Either the output of ev.calibration() (a dict) or a COCOeval instance. If a COCOeval is passed, calibration() is called automatically.
n_bins int Number of bins (only used when cal_or_eval is a COCOeval). Default 10.
iou_threshold float IoU threshold (only used when cal_or_eval is a COCOeval). Default 0.5.
ev = COCOeval(coco_gt, coco_dt, "bbox")
ev.evaluate()

# From a calibration dict
cal = ev.calibration(n_bins=15)
fig, ax = reliability_diagram(cal)

# Or directly from a COCOeval
fig, ax = reliability_diagram(ev, n_bins=15)
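Conceptually, the diagram bins detections by confidence and compares each bin's mean confidence to its empirical accuracy; the gap between the two is the calibration error. A rough sketch of that binning (hypothetical plain Python, not the actual calibration() implementation):

```python
# Rough sketch: detections go into equal-width confidence bins; each bin
# reports mean confidence vs accuracy (fraction of true positives).
def reliability_bins(confidences, is_correct, n_bins=10):
    """Per bin: (mean confidence, accuracy, count), or None if empty."""
    bins = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, is_correct):
        idx = min(int(c * n_bins), n_bins - 1)  # clamp c == 1.0 into last bin
        bins[idx].append((c, ok))
    out = []
    for b in bins:
        if not b:
            out.append(None)
            continue
        mean_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, ok in b if ok) / len(b)
        out.append((mean_conf, accuracy, len(b)))
    return out
```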

comparison_bar

comparison_bar(
    compare_result: dict, *,
    theme: str = "cold-brew",
    paper_mode: bool = False,
    ax=None,
    save_path: str | Path | None = None,
) -> tuple[Figure, Axes]

Grouped bar chart comparing all metrics between two models. When the compare result includes bootstrap CIs, error bars are drawn on the model B bars.

from hotcoco import compare
from hotcoco.plot import comparison_bar

result = compare(ev_a, ev_b, n_bootstrap=1000)
fig, ax = comparison_bar(result, save_path="comparison.png")
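For intuition, a percentile-bootstrap CI for a mean metric can be sketched as follows (illustrative only; hotcoco's resampling details may differ):

```python
import random

# Illustrative percentile bootstrap: resample the values with replacement,
# recompute the mean, and take the alpha/2 and 1 - alpha/2 percentiles of
# the resampled means.
def bootstrap_ci(values, n_bootstrap=1000, alpha=0.05, seed=0):
    rng = random.Random(seed)
    means = []
    for _ in range(n_bootstrap):
        sample = [rng.choice(values) for _ in values]
        means.append(sum(sample) / len(sample))
    means.sort()
    lo = means[int(alpha / 2 * n_bootstrap)]
    hi = means[int((1 - alpha / 2) * n_bootstrap) - 1]
    return lo, hi
```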

category_deltas

category_deltas(
    compare_result: dict, *,
    top_k: int = 20,
    theme: str = "cold-brew",
    paper_mode: bool = False,
    ax=None,
    save_path: str | Path | None = None,
) -> tuple[Figure, Axes]

Horizontal bar chart of per-category AP deltas (B − A), sorted by magnitude. Green bars are improvements, red bars are regressions. Shows top_k categories from each end.

from hotcoco import compare
from hotcoco.plot import category_deltas

result = compare(ev_a, ev_b)
fig, ax = category_deltas(result, top_k=10, save_path="deltas.png")
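The ordering described above (sorted by delta, top_k from each end) can be sketched as a hypothetical helper (not part of hotcoco; {cat: AP} dict shapes are assumed):

```python
# Per-category delta (B - A), best gains first, keeping top_k categories
# from each end: the largest improvements and the largest regressions.
def select_deltas(ap_a, ap_b, top_k=20):
    deltas = sorted(((c, ap_b[c] - ap_a[c]) for c in ap_a if c in ap_b),
                    key=lambda kv: kv[1], reverse=True)
    if len(deltas) <= 2 * top_k:
        return deltas
    return deltas[:top_k] + deltas[-top_k:]
```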

report

report(
    coco_eval, *,
    save_path,
    gt_path=None,
    dt_path=None,
    title="COCO Evaluation Report",
)

Generate a publication-quality single-page PDF report. Requires pip install hotcoco[plot].

The report contains:

  • Header — title and timestamp
  • Run context — GT/DT file paths, eval params, and dataset statistics (images, annotations, categories, detections)
  • Summary metrics — AP and AR tables with a PR-curve panel and KPI tiles
  • Per-category AP — bar chart sorted descending, three columns

The metric rows adapt automatically to the evaluation mode:

Mode AP rows AR rows
bbox / segm AP AP50 AP75 APs APm APl AR1 AR10 AR100 ARs ARm ARl
keypoints AP AP50 AP75 APm APl AR AR50 AR75 ARm ARl
LVIS AP AP50 AP75 APs APm APl APr APc APf AR@300 ARs@300 ARm@300 ARl@300
Parameter Type Description
coco_eval COCOeval Must have run() called first.
save_path str | Path Output PDF path.
gt_path str | None Ground-truth JSON path shown in the run context block.
dt_path str | None Detections JSON path shown in the run context block.
title str Report title shown in the header. Default "COCO Evaluation Report".

Returns None. Raises on I/O error or if run() was not called first.


Themes

Four built-in themes:

Theme Character
"cold-brew" Default. Warm off-white background, 10-color infographic palette (alternating warm/cool).
"warm-slate" Warm off-white background, terracotta + slate series colors.
"scientific-blue" Cool/academic. Light blue-grey background, navy + red anchor colors.
"ember" Warm/editorial. Parchment background, rust + copper + amber palette.

Pass paper_mode=True to set figure and axes backgrounds to white, keeping all other theme colors intact. Useful when embedding plots in LaTeX documents or PowerPoint slides.

# Academic paper
fig, ax = pr_curve(ev, theme="scientific-blue", paper_mode=True, save_path="pr.pdf")

# Warm editorial style
fig, ax = per_category_ap(results, theme="ember", save_path="ap.png")

Use the style() context manager to apply a theme to your own matplotlib code:

from hotcoco.plot import style

with style(theme="scientific-blue", paper_mode=True):
    fig, ax = plt.subplots()
    ax.plot(recall, precision)
    fig.savefig("custom.pdf")

Color palette

The cold-brew theme constants are available for custom plots:

from hotcoco.plot import SERIES_COLORS, CHROME, SEQUENTIAL
  • SERIES_COLORS — 10 infographic-optimized data series colors (fjord, kiln, fern, maize, plum, patina, rose, moss, slate, sienna)
  • CHROME — non-data element colors (text, label, tick, grid, spine, background)
  • SEQUENTIAL — 3-stop colormap for heatmaps (stone cream → fjord blue → deep navy)