CASE STUDY

Aerial perception: 3x detection rate via paired drone-ground capture

How paired drone-ground capture delivered a 3x improvement in the detection rate of an outdoor robotics team's aerial perception model.

An outdoor robotics company was developing an aerial perception model for a drone-based inspection system. The model needed to detect and classify objects from both aerial and ground perspectives and understand the spatial relationship between the two views. Single-perspective datasets produced models that failed to generalize across viewpoints.

The challenge

Paired aerial-ground capture, in which a drone and a ground-based sensor capture the same scene simultaneously from different perspectives, requires precise operational coordination that most data collection programs are not set up for. Capture sessions must be tightly synchronized, environments must be varied, and labeling must keep each entity's identity consistent across both views.
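The synchronization requirement can be made concrete with a small pairing step. The sketch below is a hypothetical illustration, not Roborax's pipeline. It assumes both rigs stamp frames against a shared clock, and the `Frame` type, the `pair_frames` function, and the 20 ms `tolerance_s` default are all illustrative choices. Each aerial frame is matched to the closest unused ground frame among its two time-neighbours, and pairs that drift beyond the tolerance are dropped.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """A single captured frame (hypothetical record, not a Roborax format)."""
    source: str         # "aerial" or "ground"
    timestamp_s: float  # capture time on a shared clock, in seconds
    frame_id: str


def pair_frames(aerial, ground, tolerance_s=0.02):
    """Pair aerial frames with ground frames captured at (nearly) the same time.

    For each aerial frame, the ground frames immediately before and after it
    in time are considered; the closer unused one is taken if its offset is
    within tolerance_s. Each ground frame is used at most once, and aerial
    frames with no candidate inside the tolerance are left unpaired.
    """
    ground_sorted = sorted(ground, key=lambda f: f.timestamp_s)
    ground_times = [f.timestamp_s for f in ground_sorted]
    used = set()
    pairs = []
    for a in sorted(aerial, key=lambda f: f.timestamp_s):
        i = bisect_left(ground_times, a.timestamp_s)
        best = None  # (index, offset) of the best candidate found so far
        for j in (i - 1, i):
            if 0 <= j < len(ground_sorted) and j not in used:
                offset = abs(ground_times[j] - a.timestamp_s)
                if offset <= tolerance_s and (best is None or offset < best[1]):
                    best = (j, offset)
        if best is not None:
            used.add(best[0])
            pairs.append((a, ground_sorted[best[0]]))
    return pairs
```

Dropping out-of-tolerance frames, rather than keeping the nearest match regardless of distance, trades some data volume for pairs that actually show the same moment in the scene.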

What Roborax delivered

Roborax designed and executed a paired capture program across 12 outdoor environments. Drone and ground sensor rigs operated in synchronized sessions, capturing the same scenes from multiple perspectives and at different times of day. Roborax's annotation team labeled objects with consistent cross-view identity tags across the full 40,000-frame dataset.

The result

The client’s aerial perception model, trained on the paired Roborax dataset, achieved a 3x improvement in object detection rate compared to the model trained on their prior single-perspective dataset.

Key lessons for aerial and ground paired capture programs

The critical operational insight from this program was that cross-view consistency in labeling requires a single annotation team working from both views simultaneously, not two separate annotation teams whose outputs are merged later. When entity identity is established independently for the aerial and ground views and then merged, inconsistencies accumulate. When it is established once and applied to both views, the cross-view training signal stays clean.
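One way to enforce the establish-once rule is in the annotation record itself. The following is a hypothetical sketch: the `CrossViewLabel` and `PairedFrameAnnotation` names and the pixel-box convention are assumptions, not Roborax's tooling. An entity id is minted once, and each view's bounding box is attached to that same id, so there is no separate merge step where aerial and ground ids could disagree.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional


@dataclass
class CrossViewLabel:
    """One physical object, labeled once and linked to its box in each view."""
    entity_id: int
    category: str
    aerial_box: Optional[tuple] = None  # (x_min, y_min, x_max, y_max), aerial pixels
    ground_box: Optional[tuple] = None  # same convention, ground image pixels


class PairedFrameAnnotation:
    """Hypothetical annotation record for one synchronized aerial/ground pair.

    An entity id is created once, when the annotator first identifies an
    object, and that same id is attached to the object's box in both views.
    """

    def __init__(self):
        self._ids = count(1)  # ids are issued in order: 1, 2, 3, ...
        self.entities: dict[int, CrossViewLabel] = {}

    def add_entity(self, category, aerial_box=None, ground_box=None):
        entity_id = next(self._ids)
        self.entities[entity_id] = CrossViewLabel(entity_id, category, aerial_box, ground_box)
        return entity_id

    def attach_view(self, entity_id, view, box):
        label = self.entities[entity_id]  # KeyError if the entity was never created
        if view == "aerial":
            label.aerial_box = box
        elif view == "ground":
            label.ground_box = box
        else:
            raise ValueError(f"unknown view: {view!r}")

    def unpaired(self):
        """Ids of entities boxed in exactly one view, for reviewer follow-up."""
        return [e.entity_id for e in self.entities.values()
                if (e.aerial_box is None) != (e.ground_box is None)]
```

Because `attach_view` only accepts an id that already exists, an annotator cannot create a second, independent identity for an object seen from the other view.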