Our multi-modal STCrowd dataset currently supports detection, tracking, and prediction tasks. We describe the evaluation metrics and provide benchmarks.

Detection Task

In 3D object detection on point clouds, the goal is to infer the label of each object. The input to every evaluation method is therefore the set of 3D bounding-box parameters generated for the objects in a scene. Each method outputs, for every object in a scan, a label and the 3D bounding box containing the object's points. We evaluate three settings for this task, each using a different 3D center-distance threshold (in meters) to match predictions to ground truth.


Mean Average Precision (mAP)
We use the Average Precision (AP) metric with a 3D center-distance threshold instead of IoU, since pedestrians are objects with small footprints and IoU may not be a suitable matching criterion for them. AP is the normalized area under the precision-recall curve. For crowded scenes, AP is computed at each threshold $d$ in a set $D$ of center-distance thresholds (in meters), and the mean Average Precision (mAP) is calculated by: $\mathrm{mAP} = \frac{1}{|D|} \sum_{d \in D} \mathrm{AP}_d$.
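The center-distance matching described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the benchmark's official evaluation code: the function names `average_precision` and `mean_average_precision` are hypothetical, matching is greedy in descending score order, AP is taken as the plain area under the interpolated precision-recall curve (the exact normalization used by the benchmark is not specified here), and the thresholds in the usage lines are placeholder values.

```python
import numpy as np

def average_precision(pred_centers, pred_scores, gt_centers, dist_thresh):
    """AP for one set of predictions, matched to ground truth by 3D center distance."""
    pred = np.asarray(pred_centers, dtype=float).reshape(-1, 3)
    gt = np.asarray(gt_centers, dtype=float).reshape(-1, 3)
    if len(pred) == 0 or len(gt) == 0:
        return 0.0
    order = np.argsort(-np.asarray(pred_scores, dtype=float))  # high score first
    matched = np.zeros(len(gt), dtype=bool)
    tp = np.zeros(len(order))
    for rank, i in enumerate(order):
        dists = np.linalg.norm(gt - pred[i], axis=1)
        dists[matched] = np.inf            # each ground truth is matched at most once
        j = int(np.argmin(dists))
        if dists[j] <= dist_thresh:        # true positive if the centers are close enough
            matched[j] = True
            tp[rank] = 1.0
    cum_tp = np.cumsum(tp)
    precision = cum_tp / (np.arange(len(order)) + 1)
    recall = cum_tp / len(gt)
    # Interpolate: make precision non-increasing as recall grows.
    precision = np.maximum.accumulate(precision[::-1])[::-1]
    recall = np.concatenate(([0.0], recall))
    # Area under the precision-recall curve (assumed normalization).
    return float(np.sum((recall[1:] - recall[:-1]) * precision))

def mean_average_precision(pred_centers, pred_scores, gt_centers, thresholds):
    """Mean of AP over a set of center-distance thresholds (in meters)."""
    return float(np.mean([
        average_precision(pred_centers, pred_scores, gt_centers, d)
        for d in thresholds
    ]))

# Toy usage with placeholder thresholds (not the dataset's actual values).
gt = [[0.0, 0.0, 0.0], [5.0, 0.0, 0.0]]
preds = [[0.1, 0.0, 0.0], [5.4, 0.0, 0.0], [10.0, 0.0, 0.0]]
scores = [0.9, 0.8, 0.7]
map_value = mean_average_precision(preds, scores, gt, thresholds=[0.2, 0.5])
```

With a tight threshold, the second prediction (0.4 m from its nearest ground-truth center) counts as a false positive; with a looser one it becomes a true positive. Averaging over thresholds rewards both coarse and precise localization.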
Average Recall with different occlusion levels.
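In addition to AP, performance on occluded instances is also considered for crowded scenes. For each occlusion level $i$, we calculate the average recall over a set $D$ of center-distance thresholds: $\mathrm{AR}_i = \frac{1}{|D|} \sum_{d \in D} \mathrm{Recall}_{i,d}$, where $\mathrm{Recall}_{i,d}$ is the recall on ground-truth instances of occlusion level $i$ at threshold $d$.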
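A per-occlusion-level average recall of this kind might be computed as in the sketch below. It rests on several assumptions that are not stated in the text: the function names are hypothetical, occlusion levels are integer labels attached to each ground-truth box, every prediction is used regardless of its score, and matching is one-to-one and greedy over the closest center pairs. The thresholds in the usage lines are placeholder values.

```python
import numpy as np

def recall_by_occlusion(pred_centers, gt_centers, gt_occlusion, dist_thresh):
    """Recall per occlusion level at a single center-distance threshold."""
    pred = np.asarray(pred_centers, dtype=float).reshape(-1, 3)
    gt = np.asarray(gt_centers, dtype=float).reshape(-1, 3)
    occ = np.asarray(gt_occlusion)
    hit = np.zeros(len(gt), dtype=bool)     # ground truths that were recalled
    used = np.zeros(len(pred), dtype=bool)  # predictions already matched
    if len(pred) and len(gt):
        # Pairwise center distances, shape (num_gt, num_pred).
        d = np.linalg.norm(gt[:, None, :] - pred[None, :, :], axis=2)
        for flat in np.argsort(d, axis=None):   # closest pairs first
            g, p = np.unravel_index(flat, d.shape)
            if d[g, p] > dist_thresh:
                break                           # remaining pairs are all farther
            if not hit[g] and not used[p]:
                hit[g] = True
                used[p] = True
    return {int(level): float(hit[occ == level].mean()) for level in np.unique(occ)}

def average_recall_by_occlusion(pred_centers, gt_centers, gt_occlusion, thresholds):
    """Average the per-level recall over a set of distance thresholds (in meters)."""
    per_thresh = [
        recall_by_occlusion(pred_centers, gt_centers, gt_occlusion, d)
        for d in thresholds
    ]
    return {
        level: float(np.mean([r[level] for r in per_thresh]))
        for level in per_thresh[0]
    }

# Toy usage: one unoccluded instance (level 0) and two occluded ones (level 1).
gt = [[0.0, 0.0, 0.0], [5.0, 0.0, 0.0], [9.0, 0.0, 0.0]]
occlusion = [0, 1, 1]
preds = [[0.1, 0.0, 0.0], [5.4, 0.0, 0.0]]
ar = average_recall_by_occlusion(preds, gt, occlusion, thresholds=[0.2, 0.5])
```

Reporting recall separately per level shows whether a detector's misses are concentrated on heavily occluded pedestrians, which a single aggregate AP value would hide.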