Data service 02

Human demonstration services

Egocentric video, hand pose, gaze tracking, and force-glove data for vision-language-action (VLA) pre-training.

50,000+ hours collected
1,200 scene types
40 Hz hand pose tracking
[Diagram: a demonstrator capture session (task scene; hand pose, gaze tracking, and ego video; 40 Hz capture) records dataset episodes (episode_001.hdf5, episode_002.hdf5, episode_003.hdf5). Your dataset: hand pose, 21-joint at 40 Hz; gaze tracking, fixation and saccade maps; egocentric video, head-mounted RGB-D; force gloves, grasp force per finger. Output: VLA-ready episodes.]

What is human demonstration?

Human demonstration is the process of a trained operator physically performing a task — grasping, pouring, assembling, navigating — while wearable sensors capture hand pose, gaze direction, body motion, and egocentric video. The resulting dataset teaches a policy how a human actually solves the task, not how a script approximates it.

Why teams partner with us

Collecting demonstration data in-house means recruiting skilled operators, sourcing sensor rigs, building a logging pipeline, and running QA across thousands of episodes. We compress that into a managed program.

  • Trained demonstrators — operators calibrated on your task before a single episode records
  • Full sensor stack — eye tracking, force gloves, IMUs, and head-mounted RGB-D
  • 1,200+ scene types — diverse environments so your policy generalizes
  • Episode-level QA — every demo validated against your acceptance criteria

Why outsource demonstrations?

Your research team should be training models, not managing capture sessions. We handle recruitment, hardware, and quality so you get clean episodes on schedule.

50,000+ hours collected to date.

40 Hz hand pose tracking across all rigs.

3 regions of operator coverage: Asia Pacific, Americas, and EMEA.

Where we collect

41+ delivery centers across 12 countries. Every program runs from a Roborax hub near your target time zone.

Asia Pacific
India · Philippines

Americas
USA · Canada · Colombia · Jamaica · El Salvador · Belize

EMEA
UK · Albania · Kosovo · Morocco

What we deliver

The modalities VLA models actually need

Egocentric capture across hands, gaze, scene, and force — synchronized into a single timeline.

Egocentric video

First-person scene capture at 60 fps, calibrated for VLA pre-training.

Hand pose + grasps

Per-finger joint angles plus grasp state, at 40 Hz.

Eye gaze tracking

Operator gaze fused with scene video for attention modeling.

Force + tactile

Glove-based force data for contact-rich tasks.
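
As an illustration of what "synchronized into a single timeline" can mean when streams run at different rates (60 fps video, 40 Hz hand pose), here is a minimal sketch that pairs each video frame with the nearest hand pose sample by timestamp. The function names and the synthetic timestamps are hypothetical, and nearest-neighbor matching is an assumption for the example, not a description of the delivered pipeline.

```python
from bisect import bisect_left


def nearest_index(sorted_times, t):
    """Return the index of the value in sorted_times closest to t."""
    i = bisect_left(sorted_times, t)
    if i == 0:
        return 0
    if i == len(sorted_times):
        return len(sorted_times) - 1
    # Pick whichever neighbor is closer in time.
    return i if sorted_times[i] - t < t - sorted_times[i - 1] else i - 1


def align_to_frames(frame_times, sample_times):
    """For each video frame time, return the index of the nearest sample."""
    return [nearest_index(sample_times, t) for t in frame_times]


# One second of hypothetical timestamps (seconds): 60 fps video, 40 Hz hand pose.
frame_times = [n / 60 for n in range(60)]
pose_times = [n / 40 for n in range(40)]
pose_for_frame = align_to_frames(frame_times, pose_times)
```

With this approach a frame is never paired with a pose sample more than half a pose period (12.5 ms at 40 Hz) away, as long as the pose stream has no gaps.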

How we work

Scene design to packaged dataset

A four-stage pipeline that lands clean multi-modal data in your bucket every week.

Step 1

Scene design

Spec the task domain and the demonstration patterns. Coverage matrix locked.

Step 2

Operator briefing

Train demonstrators on the capture protocol. Per-task acceptance criteria reviewed.

Step 3

Capture

Synchronized multi-modal recording. On-rig validation flags drift.

Step 4

Label and package

Annotated, time-aligned dataset delivered as bag files or your custom format.
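
The capture step above mentions validation that flags drift. One way to picture such a check, assuming "drift" means a sensor clock slowly departing from its nominal sample rate, is to compare each timestamp with where an ideal clock would have placed it. The function names, the 12.5 ms threshold, and the synthetic streams below are all hypothetical; this is a sketch, not the on-rig implementation.

```python
def drift_seconds(timestamps, nominal_hz):
    """Offset of each sample from where an ideal clock would place it."""
    t0 = timestamps[0]
    return [t - (t0 + i / nominal_hz) for i, t in enumerate(timestamps)]


def first_drift_violation(timestamps, nominal_hz, max_drift_s):
    """Index of the first sample whose drift exceeds max_drift_s, else None."""
    for i, d in enumerate(drift_seconds(timestamps, nominal_hz)):
        if abs(d) > max_drift_s:
            return i
    return None


# Hypothetical 40 Hz streams: one on an ideal clock, one running 1% slow.
ideal = [i / 40 for i in range(400)]
slow = [i / 40 * 1.01 for i in range(400)]

# Allow half a sample period (12.5 ms) of accumulated drift.
ideal_violation = first_drift_violation(ideal, 40, 0.0125)
slow_violation = first_drift_violation(slow, 40, 0.0125)
```

Here the ideal stream never trips the check, while the slow stream crosses the threshold a little over a second into the recording.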

Rigs and tools

Glasses, gloves, and capture rigs

Best-in-class hardware for each modality.

Meta Aria

Research glasses

Quest Pro

Hand tracking

Manus VR

Per-finger gloves

Tobii Pro 3

Gaze glasses

GoPro Hero

Egocentric video

Multi-camera rig

3rd-person sync

What our partners say

"Their demonstration pipeline gave us fifty hours of clean egocentric data with hand pose in three weeks. That was enough to bootstrap our VLA pre-training from scratch."

Lin Park
Research Director, Manta Labs

FAQ

Questions about human demonstration data

Teleoperation uses a control interface; the operator never touches the robot. Human demonstration captures natural human motion directly, which is then used to train imitation learning policies. Both produce trajectory data but from different sources.

Operators are briefed on task objectives but not scripted on exact movements. This produces natural variation in approach and execution, which is exactly what imitation learning needs to generalize.

Our operator network spans diverse demographics, hand sizes, grip strengths, and experience levels. If your task requires a specific operator profile (handedness, physical dimensions, domain expertise), we can filter for it.

Every demonstration is reviewed against a task specification before acceptance. Outliers are flagged and sent back for re-capture. Inter-operator consistency metrics are included in every delivery report.
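
To make "outliers are flagged" concrete, here is a minimal sketch that flags episodes whose duration is far from the rest, using the modified z-score (based on the median and the median absolute deviation). The choice of duration as the metric, the 3.5 threshold, and the sample numbers are assumptions for illustration; the review described above is against a task specification and may use entirely different criteria.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Indices whose modified z-score (median/MAD based) exceeds threshold."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # All values (or most of them) are identical; nothing to flag.
        return []
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Hypothetical episode durations in seconds for one task.
durations = [12.1, 11.8, 12.4, 12.0, 30.5, 11.9]
flagged = flag_outliers(durations)
```

The median-based score is used here because a single very long episode would inflate a mean and standard deviation enough to hide itself.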

Further reading

From the blog

VR Teleop vs. Physical Demonstration

Which method produces better training data and when.

Teleop Operator Fatigue: The Hidden Variable

How fatigue affects data quality and what to do about it.

Build your demonstration pipeline

Specify the scene domain and modality mix. We scope a sized program in two days.
