COCOeval¶
Run COCO evaluation to compute AP/AR metrics.
```python
from hotcoco import COCO, COCOeval

coco_gt = COCO("instances_val2017.json")
coco_dt = coco_gt.load_res("detections.json")

ev = COCOeval(coco_gt, coco_dt, "bbox")
ev.evaluate()
ev.accumulate()
ev.summarize()
```
```rust
use hotcoco::{COCO, COCOeval};
use hotcoco::params::IouType;
use std::path::Path;

let coco_gt = COCO::new(Path::new("instances_val2017.json"))?;
let coco_dt = coco_gt.load_res(Path::new("detections.json"))?;

let mut ev = COCOeval::new(coco_gt, coco_dt, IouType::Bbox);
ev.evaluate();
ev.accumulate();
ev.summarize();
```
Constructor¶
```python
COCOeval(coco_gt: COCO, coco_dt: COCO, iou_type: str)
```

| Parameter | Type | Description |
|---|---|---|
| `coco_gt` | `COCO` | Ground truth COCO object |
| `coco_dt` | `COCO` | Detections COCO object (from `load_res`) |
| `iou_type` | `str` | `"bbox"`, `"segm"`, or `"keypoints"` |
```rust
COCOeval::new(coco_gt: COCO, coco_dt: COCO, iou_type: IouType) -> Self
```

| Parameter | Type | Description |
|---|---|---|
| `coco_gt` | `COCO` | Ground truth COCO object |
| `coco_dt` | `COCO` | Detections COCO object (from `load_res`) |
| `iou_type` | `IouType` | `IouType::Bbox`, `IouType::Segm`, or `IouType::Keypoints` |
Properties¶
params¶
```python
params: Params
```

Evaluation parameters. Modify before calling `evaluate()`.
```python
ev = COCOeval(coco_gt, coco_dt, "bbox")
ev.params.cat_ids = [1, 2, 3]
ev.params.max_dets = [1, 10, 100]
```
```rust
pub params: Params
```

```rust
let mut ev = COCOeval::new(coco_gt, coco_dt, IouType::Bbox);
ev.params.cat_ids = vec![1, 2, 3];
ev.params.max_dets = vec![1, 10, 100];
```
See Params for all configurable fields.
stats¶
```python
stats: list[float] | None
```

The 12 summary metrics (10 for keypoints), populated after `summarize()`. `None` before `summarize()` is called.
```python
ev.summarize()
print(f"AP: {ev.stats[0]:.3f}")
print(f"AP50: {ev.stats[1]:.3f}")
```
```rust
pub stats: Option<Vec<f64>>
```

```rust
ev.summarize();
if let Some(stats) = &ev.stats {
    println!("AP: {:.3}", stats[0]);
    println!("AP50: {:.3}", stats[1]);
}
```
eval_imgs¶
Per-image evaluation results, populated after `evaluate()`. See Working with Results for details.

```python
eval_imgs: list[dict | None]
```

```rust
pub eval_imgs: Vec<Option<EvalImg>>
```
eval¶
Accumulated precision/recall arrays, populated after `accumulate()`. See Working with Results for details.

```python
eval: dict | None
```

Contains `"precision"`, `"recall"`, and `"scores"` arrays.

```rust
pub eval: Option<AccumulatedEval>
```

Access elements with `precision_idx(t, r, k, a, m)` and `recall_idx(t, k, a, m)`.
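As a sketch of how the accumulated arrays are typically reduced to summary numbers, the following assumes a pycocotools-style precision layout `[T, R, K, A, M]` (IoU thresholds × recall thresholds × categories × area ranges × max-dets settings) with `-1` marking absent entries; the array below is synthetic, so verify the shape against your own `ev.eval["precision"]`:

```python
import numpy as np

# Synthetic stand-in for ev.eval["precision"], assuming the
# pycocotools-style layout [T, R, K, A, M].
T, R, K, A, M = 10, 101, 3, 4, 3
rng = np.random.default_rng(0)
precision = rng.random((T, R, K, A, M))

# Absent categories are stored as -1 and must be masked out
# before averaging.
precision[:, :, 2, :, :] = -1.0

# AP: mean over IoU thresholds, recall points, and categories at
# area="all" (index 0) and the largest max-dets setting (index -1).
p = precision[:, :, :, 0, -1]
ap = p[p > -1].mean()

# AP50: restrict to the first IoU threshold (0.50 with default iou_thrs).
p50 = precision[0, :, :, 0, -1]
ap50 = p50[p50 > -1].mean()
```

On the Rust side, `precision_idx(t, r, k, a, m)` plays the role of this multi-dimensional indexing over the flattened storage.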
Methods¶
evaluate¶
```python
evaluate() -> None
```

Run per-image evaluation. Matches detections to ground truth annotations using greedy matching sorted by confidence. Must be called before `accumulate()`.

Populates `eval_imgs`.
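The greedy matching step can be illustrated with a simplified single-image, single-category sketch (not hotcoco's internal code; the real evaluator also tracks per-threshold state and crowd regions):

```python
def greedy_match(ious, dt_scores, iou_thr=0.5):
    """Match detections to ground truths greedily.

    ious[d][g] is the IoU between detection d and ground truth g.
    Detections are visited in descending score order; each one takes
    the best still-unmatched GT at or above the threshold.
    """
    order = sorted(range(len(dt_scores)), key=lambda d: -dt_scores[d])
    gt_taken = set()
    matches = {}  # detection index -> gt index
    for d in order:
        best_g, best_iou = -1, iou_thr
        for g in range(len(ious[d])):
            if g in gt_taken:
                continue
            if ious[d][g] >= best_iou:
                best_g, best_iou = g, ious[d][g]
        if best_g >= 0:
            gt_taken.add(best_g)
            matches[d] = best_g
    return matches

# Two detections overlap GT 0; the higher-scoring one claims it,
# leaving the other unmatched (a false positive at this threshold).
ious = [[0.8, 0.1], [0.6, 0.2]]
matches = greedy_match(ious, dt_scores=[0.9, 0.95])
```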
accumulate¶
```python
accumulate() -> None
```

Accumulate per-image results into precision/recall curves using interpolated precision at 101 recall thresholds.

Populates `eval`.
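The 101-point interpolation can be sketched as follows; this is a simplified stand-in for the accumulate step, assuming `recalls` is sorted ascending (as it is for a score-sorted PR sweep):

```python
import numpy as np

def interpolated_ap(recalls, precisions, n_points=101):
    """COCO-style AP: sample interpolated precision at evenly spaced
    recall thresholds. Precision is first made monotonically
    non-increasing (the "interpolation"), then read off at the first
    recall point reaching each threshold; thresholds the curve never
    reaches contribute 0.
    """
    recalls = np.asarray(recalls)
    precisions = np.asarray(precisions)
    # Make precision non-increasing from right to left.
    prec = np.maximum.accumulate(precisions[::-1])[::-1]
    thrs = np.linspace(0.0, 1.0, n_points)
    idx = np.searchsorted(recalls, thrs, side="left")
    q = np.zeros(n_points)
    valid = idx < len(recalls)
    q[valid] = prec[idx[valid]]
    return q.mean()

# Perfect precision up to recall 0.5, nothing beyond: AP is roughly 0.5
# (51 of the 101 thresholds are covered).
ap = interpolated_ap([0.25, 0.5], [1.0, 1.0])
```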
summarize¶
```python
summarize() -> None
```

Compute and print the standard COCO metrics. Populates `stats`.

**Non-default parameters**

`summarize()` uses a fixed display format that assumes the default `iou_thrs`, `max_dets`, and `area_rng_lbl`. If you've changed any of these, a warning is printed to stderr and some metrics may show `-1.000` (e.g. AP50 when `iou_thrs` doesn't include 0.50). The `stats` array always has 12 entries (10 for keypoints) regardless of your parameters.
Prints 12 lines for bbox/segm (10 for keypoints):

```text
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.382
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.584
 ...
```
run¶
```python
run() -> None
```

Run the full pipeline in one call: `evaluate()` → `accumulate()` → `summarize()`. Primarily used with LVIS pipelines (Detectron2, MMDetection) that expect a single `run()` call.
get_results¶
```python
get_results() -> dict[str, float]
```

Return the summary metrics as a dict. Call it after `summarize()` (or `run()`); it returns an empty dict if `summarize()` has not been called.

Standard bbox/segm keys: `AP`, `AP50`, `AP75`, `APs`, `APm`, `APl`, `AR1`, `AR10`, `AR100`, `ARs`, `ARm`, `ARl`.

Keypoint keys: `AP`, `AP50`, `AP75`, `APm`, `APl`, `AR`, `AR50`, `AR75`, `ARm`, `ARl`.

LVIS keys: `AP`, `AP50`, `AP75`, `APs`, `APm`, `APl`, `APr`, `APc`, `APf`, `AR@300`, `ARs@300`, `ARm@300`, `ARl@300`.
```python
ev.run()
results = ev.get_results()
print(f"AP: {results['AP']:.3f}, AP50: {results['AP50']:.3f}")
```
print_results¶
```python
print_results() -> None
```

Print a formatted results table to stdout. For LVIS, this matches the lvis-api `print_results()` style. Must be called after `summarize()` (or `run()`).
confusion_matrix¶
```python
confusion_matrix(
    iou_thr: float = 0.5,
    max_det: int | None = None,
    min_score: float | None = None,
) -> dict
```

Compute a per-category confusion matrix. Unlike `evaluate()`, this method compares all detections in an image against all ground truth boxes regardless of category, enabling cross-category confusion analysis.

This method is standalone — no `evaluate()` call is needed first.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `iou_thr` | `float` | `0.5` | IoU threshold for a DT↔GT match |
| `max_det` | `int \| None` | last `params.max_dets` value | Max detections per image by score |
| `min_score` | `float \| None` | `None` | Discard detections below this confidence before `max_det` truncation |
Returns a dict with:
| Key | Type | Description |
|---|---|---|
| `"matrix"` | `np.ndarray[int64]`, shape `(K+1, K+1)` | Raw confusion counts. Rows = GT category, cols = predicted. Index `K` is background. |
| `"normalized"` | `np.ndarray[float64]`, shape `(K+1, K+1)` | Row-normalised version (rows sum to 1.0; zero rows stay zero). |
| `"cat_ids"` | `list[int]` | Category IDs for rows/cols `0..K-1`. |
| `"num_cats"` | `int` | Number of categories `K`. |
| `"iou_thr"` | `float` | IoU threshold used. |
Matrix layout (rows = GT, cols = predicted):

- `matrix[i][j]` where `i ≠ K, j ≠ K` — GT category `i` matched to predicted category `j`. On-diagonal = TP; off-diagonal = class confusion.
- `matrix[i][K]` — GT category `i` unmatched (false negative).
- `matrix[K][j]` — predicted category `j` unmatched (false positive).
```python
ev = COCOeval(coco_gt, coco_dt, "bbox")
cm = ev.confusion_matrix(iou_thr=0.5, max_det=100)

matrix = cm["matrix"]
cat_ids = cm["cat_ids"]

# True positives per category
tp = matrix.diagonal()[:-1]
# False negatives per category
fn = matrix[:-1, -1]
# False positives per category
fp = matrix[-1, :-1]

# Normalised view
print(cm["normalized"])
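Per-category precision and recall can be derived from the same layout. A toy sketch with made-up counts; note that cross-class confusions also count as false positives for the predicted class, not just the background row:

```python
import numpy as np

# A toy 2-category confusion matrix in the layout described above:
# rows = GT, cols = predicted, last index = background.
matrix = np.array([
    [8, 1, 2],   # GT cat 0: 8 correct, 1 confused as cat 1, 2 missed
    [0, 5, 1],   # GT cat 1: 5 correct, 1 missed
    [2, 1, 0],   # unmatched detections: 2 of cat 0, 1 of cat 1
])

tp = matrix.diagonal()[:-1]
fn = matrix[:-1, -1]
# A detection counts against its predicted class whether it matched a
# GT of another class or matched nothing, so sum down each column:
fp = matrix[:, :-1].sum(axis=0) - tp

precision = tp / (tp + fp)               # per predicted class
recall = tp / matrix[:-1, :].sum(axis=1)  # per GT class
```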
See Confusion Matrix in the evaluation guide for a full walkthrough.
tide_errors¶
```python
tide_errors(
    pos_thr: float = 0.5,
    bg_thr: float = 0.1,
) -> dict
```

Decompose detection errors into six TIDE error types (Bolya et al., ECCV 2020) and compute ΔAP — the AP gain from eliminating each error type.

Requires `evaluate()` to have been called first.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `pos_thr` | `float` | `0.5` | IoU threshold for TP/FP classification |
| `bg_thr` | `float` | `0.1` | Background IoU threshold for Loc/Both/Bkg discrimination |
Returns a dict with:
| Key | Type | Description |
|---|---|---|
| `"delta_ap"` | `dict[str, float]` | ΔAP for each error type. Keys: `"Cls"`, `"Loc"`, `"Both"`, `"Dupe"`, `"Bkg"`, `"Miss"`, `"FP"`, `"FN"`. |
| `"counts"` | `dict[str, int]` | Count of each error type. Keys: `"Cls"`, `"Loc"`, `"Both"`, `"Dupe"`, `"Bkg"`, `"Miss"`. |
| `"ap_base"` | `float` | Baseline mean AP at `pos_thr`. |
| `"pos_thr"` | `float` | IoU threshold used. |
| `"bg_thr"` | `float` | Background threshold used. |
```python
ev = COCOeval(coco_gt, coco_dt, "bbox")
ev.evaluate()

result = ev.tide_errors(pos_thr=0.5, bg_thr=0.1)
print(f"ap_base: {result['ap_base']:.3f}")
for k, v in sorted(result["delta_ap"].items(), key=lambda x: -x[1]):
    if k not in ("FP", "FN"):
        print(f"  {k}: ΔAP={v:.4f} n={result['counts'].get(k, '—')}")
```
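The ΔAP values follow TIDE's oracle idea: fix one error type, recompute AP, and report the gain. A toy illustration of that recompute step, using plain non-interpolated AP on a score-sorted TP/FP list rather than hotcoco's implementation:

```python
def average_precision(flags, n_gt):
    """Non-interpolated AP from score-ordered TP/FP flags: sum of
    precision at each true positive, divided by the number of GTs
    (so missed GTs still drag AP down)."""
    tp, total = 0, 0.0
    for i, is_tp in enumerate(flags, start=1):
        if is_tp:
            tp += 1
            total += tp / i
    return total / n_gt

# Baseline: the second-highest-scoring detection is a background FP.
base = average_precision([True, False, True], n_gt=3)

# Oracle "fix the Bkg errors": remove that FP and recompute.
# The gain is that error type's delta-AP contribution.
fixed = average_precision([True, True], n_gt=3)
delta_ap_bkg = fixed - base
```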
See TIDE Error Analysis in the evaluation guide for a detailed walkthrough.