The cost of building an in-house robot data collection program is almost always higher than the initial estimate. Most teams undercount by 3 to 5x. This post breaks down where the gap comes from.
The robot data collection cost spreadsheet that looks fine until it is not
The build-vs-buy analysis for robot data collection almost always starts with a headcount calculation: X operators at Y hourly rate, running Z sessions per day, producing N demonstrations. The math looks manageable. Then the real costs appear.
This post is a line-by-line accounting of what in-house robot data collection actually costs, based on the experience of teams that have done it at scale. The goal is not to argue for outsourcing — it is to help you build an honest model before you commit to either path.
The costs you put in the spreadsheet
Operator labor: Skilled teleop operators in the US market (2025) typically earn $25–45/hour depending on task complexity and domain (general manipulation vs. medical vs. precision assembly). At 8 operators running 6 billable hours per day, 250 days per year, that is $300,000–$540,000 annually in direct operator cost before benefits, training, and management overhead.
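The operator labor figure is a straight multiplication, and it helps to keep it as a function so you can swap in your own headcount and rates. A minimal sketch, assuming the defaults from the paragraph above (6 billable hours per day, 250 days per year); the function name is illustrative:

```python
def annual_operator_cost(operators: int, hourly_rate: float,
                         hours_per_day: float = 6, days_per_year: int = 250) -> float:
    """Direct operator labor per year, before benefits, training, and management."""
    return operators * hourly_rate * hours_per_day * days_per_year


# 8 operators at the low and high ends of the quoted US rate range.
low = annual_operator_cost(8, 25)
high = annual_operator_cost(8, 45)
print(f"${low:,.0f} to ${high:,.0f} per year")
```

Changing `hours_per_day` is usually the most revealing adjustment: billable hours drop quickly once breaks, recalibration, and hardware downtime are counted.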
Hardware: A teleop station (VR headset, haptic controllers, workstation) runs $8,000–$25,000 per station depending on configuration. For 8 operators, budget $64,000–$200,000 in upfront hardware, plus 15–20% annually for maintenance, repairs, and replacement cycles.
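Because the maintenance percentage recurs every year, the hardware line is better modeled over a multi-year horizon than as a one-time purchase. A rough sketch, assuming maintenance is a flat percentage of the purchase price each year and a three-year horizon (both are assumptions made for illustration, not figures from any specific program):

```python
def hardware_cost(stations: int, cost_per_station: float,
                  maintenance_rate: float, years: int) -> float:
    """Upfront purchase plus a flat annual maintenance share of the purchase price."""
    upfront = stations * cost_per_station
    return upfront + upfront * maintenance_rate * years


# Upper-end configuration: 8 stations at $25,000 each, 20% annual upkeep, 3 years.
total = hardware_cost(8, 25_000, 0.20, 3)
print(f"${total:,.0f} over 3 years")
```

At the upper end, recurring upkeep adds more than half of the purchase price again within three years, which is why it belongs in the model rather than in a footnote.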
Software infrastructure: A production-grade data collection platform (session management, trajectory storage, QA tooling, operator dashboards) requires 2–4 months of senior engineering time to build and 0.5–1 FTE to maintain. At $200,000/year for a senior ML engineer, each engineer on the build costs roughly $33,000–$67,000 over that period, so a build budget of $100,000–$200,000 assumes about three engineers working in parallel. Maintenance then adds $100,000–$200,000/year.
The costs you leave out
Operator recruitment and churn: Teleop operator roles have high turnover. Industry average churn is 40–60% annually for roles that require 6+ hours of daily VR operation. Recruiting, onboarding, and calibrating a replacement operator costs 4–6 weeks of lost productivity and $5,000–$15,000 in direct recruiting costs. For an 8-person team, plan for 3–5 replacements per year — $15,000–$75,000 in recurring recruiting cost, plus the quality degradation from operators who are in the calibration window.
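The churn numbers above can be reproduced, and extended to price the lost productivity, with a short calculation. This is a sketch under stated assumptions: lost productivity is valued at the operator's hourly rate, and a working week is taken as 30 billable hours (6 hours per day over 5 days). Neither assumption comes from a measured program, so treat the lost-productivity figure as a floor rather than a precise estimate.

```python
def annual_churn_cost(team_size: int, churn_rate: float, recruiting_cost: int,
                      weeks_lost: int, hourly_rate: int,
                      hours_per_week: int = 30) -> dict:
    """Expected yearly replacements, recruiting spend, and lost billable labor."""
    replacements = round(team_size * churn_rate)
    lost_hours = replacements * weeks_lost * hours_per_week
    return {
        "replacements": replacements,
        "recruiting": replacements * recruiting_cost,
        # Assumption: an hour lost during onboarding is worth one hour of wages.
        "lost_productivity": lost_hours * hourly_rate,
    }


low = annual_churn_cost(8, 0.40, 5_000, 4, 25)
high = annual_churn_cost(8, 0.60, 15_000, 6, 45)
print(low)
print(high)
```

The recruiting totals match the $15,000–$75,000 range quoted above. The lost-productivity line is extra, and it still excludes the quality cost of demonstrations recorded during the calibration window.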
Quality assurance: Raw demonstrations require QA review before they are usable for training. For high-quality teleop data, expect 15–25% of demonstrations to require flagging or rejection. Manual QA at scale requires 1–2 dedicated QA reviewers ($80,000–$120,000/year each) plus the computational cost of automated filtering pipelines. Most teams underestimate this by 50%.
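A rejection rate also inflates how many raw demonstrations you have to collect to hit a usable target. The sketch below treats every flagged demonstration as rejected, which is a worst-case assumption (some flagged demonstrations may be recoverable); the function names are illustrative.

```python
def raw_demos_needed(usable_target: int, reject_pct: int) -> int:
    """Raw demonstrations to collect so that usable_target survive QA (rounded up)."""
    keep_pct = 100 - reject_pct
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-usable_target * 100 // keep_pct)


def cost_multiplier(reject_pct: int) -> float:
    """Factor by which cost per usable demonstration exceeds cost per raw demonstration."""
    return 100 / (100 - reject_pct)


for pct in (15, 20, 25):
    print(pct, raw_demos_needed(100_000, pct), round(cost_multiplier(pct), 3))
```

At a 20% rejection rate, a target of 100,000 usable demonstrations means collecting 125,000, so every per-demonstration cost upstream of QA is effectively 1.25x higher than the raw figure suggests.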
Facility and compliance costs: Depending on your deployment domain, you may need specific facility conditions (calibrated lighting, clear floor space, specific ambient temperature for haptic hardware), plus liability insurance for operator injury and data handling compliance. For medical or regulated domains, add HIPAA compliance infrastructure, BAA agreements, and audit overhead — easily $50,000–$150,000/year.
Management overhead: An 8-person collection team requires at least 0.5 FTE of operations management. At $150,000/year for an operations manager, that is $75,000/year in management cost. This number grows with the team: a 30-person team typically requires a full-time operations lead plus a data quality manager.
The opportunity cost nobody calculates
The hardest cost to quantify is what your ML and robotics engineers are not doing while they build and maintain the data collection infrastructure. A 3-month engineering sprint to build a data platform is 3 months of model development, architecture exploration, or deployment work that did not happen. For most robotics teams, the bottleneck is not the data collection itself — it is the engineering capacity to build the systems that support it.
At a fully loaded cost of $250,000–$350,000 per senior robotics engineer, a 2-FTE data infrastructure project costs $500,000–$700,000 in opportunity cost, in addition to the direct cost. This rarely appears in the build-vs-buy analysis.
When building in-house makes sense
In-house collection is the right answer when:
- Your data has unique security or IP requirements that preclude third-party access (defense, certain medical applications)
- Your task requires specialized operator expertise that cannot be trained in weeks (surgical robotics, specific industrial domains)
- Your collection methodology is itself a competitive moat — you have developed novel collection techniques that would be diluted by outsourcing
- You are at scale (10M+ demonstrations/year) where per-unit economics of in-house operations become favorable
When outsourcing makes sense
Outsourcing is the right answer when:
- You need to move faster than recruiting and training an in-house team allows
- Your data requirements are variable (high-volume sprints followed by maintenance-level collection)
- You are in the 10,000–1,000,000 demonstration range where vendor infrastructure amortizes well
- Your ML team is the constraint, not your data supply — outsourcing lets them focus on training instead of operations
The honest number
For most robotics teams in the 50,000–500,000 demonstration range, the total annual cost of in-house collection — including all the costs above — runs $800,000–$2,000,000 per year, excluding the opportunity cost of engineering time. Vendor pricing for the same volume typically runs $0.50–$3.00 per demonstration depending on task complexity, or $25,000–$1,500,000 for the same range.
The math often favors outsourcing at this scale. But the more important question is whether your team should be operating a data collection function at all — or whether that capacity belongs in model development, deployment engineering, and the actual robotics that is your core product.
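To find where the crossover sits for a given program, compare vendor spend at your volume against your in-house total. The sketch below makes a simplifying assumption that in-house cost is fixed regardless of volume, which is only roughly true (operator headcount scales with volume), so read the breakeven figures as order-of-magnitude guides. Function names are illustrative.

```python
def vendor_cost(demos: int, price_per_demo: float) -> float:
    """Total vendor spend for a given number of demonstrations."""
    return demos * price_per_demo


def breakeven_volume(annual_in_house_cost: float, price_per_demo: float) -> float:
    """Volume at which vendor spend equals a fixed in-house annual cost."""
    return annual_in_house_cost / price_per_demo


# Vendor spend at the ends of the 50,000–500,000 demonstration range.
print(vendor_cost(50_000, 0.50), vendor_cost(500_000, 3.00))

# Breakeven volume for the low in-house estimate at both vendor price points.
for price in (0.50, 3.00):
    print(price, round(breakeven_volume(800_000, price)))
```

Under these assumptions, an $800,000 in-house program only matches vendor pricing of $3.00 per demonstration above roughly 267,000 demonstrations per year, and at $0.50 per demonstration the breakeven moves out to 1.6 million.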
If the math is pointing toward outsourcing — or you’re not sure where the crossover is for your program — send us your volume and task spec. We’ll run the numbers with you.