Benchmarks

Setup

  • Hardware: Apple M1 MacBook Air, 16 GB RAM
  • Dataset: COCO val2017 — 5,000 images, 36,781 ground truth annotations
  • Detections: ~43,700 detections (1x scale)
  • Timing: Wall clock time, best of 3 runs
  • Versions: pycocotools 2.0.8, faster-coco-eval 1.6.5, hotcoco 0.1.0
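The best-of-3 wall-clock timing can be sketched as below. This is an illustrative harness, not the repository's actual script; `fn` stands in for one full evaluation pass with the tool under test:

```python
import time

def best_of_n(fn, n=3):
    """Run fn n times and return the fastest wall-clock time in seconds."""
    times = []
    for _ in range(n):
        start = time.perf_counter()
        fn()  # e.g. load results + evaluate + accumulate + summarize
        times.append(time.perf_counter() - start)
    return min(times)
```

Taking the minimum of several runs filters out one-off noise (cache warm-up, background load), which is why "best of 3" rather than the mean is reported.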

Results (1x detections)

| Eval type | pycocotools | faster-coco-eval | hotcoco |
|-----------|-------------|------------------|---------|
| bbox | 11.79s | 3.47s (3.4x) | 0.74s (15.9x) |
| segm | 19.49s | 10.52s (1.9x) | 1.58s (12.3x) |
| keypoints | 4.79s | 3.08s (1.6x) | 0.19s (25.0x) |

Speedups in parentheses are vs pycocotools.

Results (10x detections)

A synthetic benchmark that scales the detections by 10x (~437,000 detections) to test behavior at scale:

| Eval type | pycocotools | faster-coco-eval | hotcoco |
|-----------|-------------|------------------|---------|
| bbox | 106.27s | 27.68s (3.8x) | 4.07s (26.1x) |
| segm | 184.35s | 99.73s (1.8x) | 10.84s (17.0x) |
| keypoints | 42.60s | 26.54s (1.6x) | 0.93s (45.8x) |

hotcoco scales better at higher detection counts due to multi-threaded evaluation.

Metric parity

All 34 metrics are accurate to within 2e-4 of pycocotools. Verified on COCO val2017:

Bounding box

| Metric | pycocotools | hotcoco | Diff |
|--------|-------------|---------|------|
| AP | 0.382 | 0.382 | 0.000 |
| AP50 | 0.584 | 0.584 | 0.000 |
| AP75 | 0.412 | 0.412 | 0.000 |
| APs | 0.209 | 0.209 | 0.000 |
| APm | 0.420 | 0.420 | 0.000 |
| APl | 0.529 | 0.529 | 0.000 |
| AR1 | 0.323 | 0.323 | 0.000 |
| AR10 | 0.498 | 0.498 | 0.000 |
| AR100 | 0.520 | 0.520 | 0.000 |
| ARs | 0.308 | 0.308 | 0.000 |
| ARm | 0.562 | 0.562 | 0.000 |
| ARl | 0.680 | 0.680 | 0.000 |

7 of 12 metrics are exact; the remaining 5 differ by less than 1e-4.

Segmentation

| Metric | pycocotools | hotcoco | Diff |
|--------|-------------|---------|------|
| AP | 0.355 | 0.355 | 0.000 |
| AP50 | 0.568 | 0.568 | 0.000 |
| AP75 | 0.377 | 0.377 | 0.000 |
| APs | 0.163 | 0.163 | 0.000 |
| APm | 0.384 | 0.384 | 0.000 |
| APl | 0.531 | 0.531 | 0.000 |
| AR1 | 0.303 | 0.303 | 0.000 |
| AR10 | 0.462 | 0.462 | 0.000 |
| AR100 | 0.482 | 0.482 | 0.000 |
| ARs | 0.259 | 0.259 | 0.000 |
| ARm | 0.521 | 0.521 | 0.000 |
| ARl | 0.672 | 0.672 | 0.000 |

All metrics are accurate to within 2e-4 of pycocotools (values shown rounded to 3 decimal places).

Keypoints

| Metric | pycocotools | hotcoco | Diff |
|--------|-------------|---------|------|
| AP | 0.669 | 0.669 | 0.000 |
| AP50 | 0.873 | 0.873 | 0.000 |
| AP75 | 0.730 | 0.730 | 0.000 |
| APm | 0.635 | 0.635 | 0.000 |
| APl | 0.732 | 0.732 | 0.000 |
| AR1 | 0.291 | 0.291 | 0.000 |
| AR10 | 0.707 | 0.707 | 0.000 |
| AR100 | 0.739 | 0.739 | 0.000 |
| ARm | 0.685 | 0.685 | 0.000 |
| ARl | 0.815 | 0.815 | 0.000 |

Keypoint metrics are exact.
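A parity check of this kind reduces to comparing the per-metric stats vectors element-wise against a tolerance. A minimal, tool-agnostic sketch (`metrics_match` is a hypothetical helper, not part of any of the benchmarked libraries):

```python
def metrics_match(stats_a, stats_b, atol=2e-4):
    """Return True if every paired metric differs by at most atol."""
    if len(stats_a) != len(stats_b):
        return False
    return all(abs(a - b) <= atol for a, b in zip(stats_a, stats_b))
```

With pycocotools, `stats_a` would be the `COCOeval.stats` array after `summarize()`; the other tool supplies `stats_b`.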

Methodology

  • Wall-clock time includes file I/O, evaluation, and accumulation; it excludes Python import time.
  • Only detections are scaled for the 10x benchmark — ground truth annotations are unchanged.
  • All three tools were verified to produce matching metrics (within the tolerances above) before timing.
  • Benchmark scripts are in the repository under data/.
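One plausible way to scale detections 10x while leaving ground truth untouched is to replicate each detection dict in the COCO results list; the actual script under data/ may differ. A tiny score jitter keeps the copies from being exact duplicates:

```python
import copy

def scale_detections(dets, factor=10, jitter=1e-6):
    """Replicate each COCO-format detection dict `factor` times.

    Ground truth annotations are never touched; only the results
    list grows. Scores are perturbed slightly so copies are not
    byte-identical, then clamped back into [0, 1].
    """
    scaled = []
    for det in dets:
        for k in range(factor):
            d = copy.deepcopy(det)
            d["score"] = max(0.0, min(1.0, d["score"] + k * jitter))
            scaled.append(d)
    return scaled
```

Each entry keeps the standard COCO result fields (`image_id`, `category_id`, `bbox`/`segmentation`/`keypoints`, `score`), so the scaled list can be fed to any of the three tools unchanged.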