Member of Technical Staff — Synthetic Data
Compensation
$125k – $225k
+ 0.1% – 0.5% equity
Location
Berkeley, CA / Remote
Full-time, flexible schedule
Apply now, or email careers@trajectorylabs.net
About us
Trajectory Labs builds safety evals and RL environments for frontier AI labs. We started less than six months ago, have contracts with multiple frontier labs, and are scaling rapidly to meet demand.
About the role
Frontier models can already tell when they're being evaluated (Needham et al., arXiv:2505.23836), and their evaluation awareness is the lowest it will ever be. This undermines both the evaluations labs use for safety decisions and the RL environments they use to train safe behavior. We're hiring to build tasks realistic enough that they still provide a reliable signal.
Responsibilities
- Work directly with the founders to decide what we build next and how we scale task production
- Build the pipelines that generate, validate, and deploy tasks at scale
- Own the infrastructure and CI/CD that keeps task production running reliably
- Manage a team of five scenario designers and build internal tooling that makes them faster
About you
Essential
- Impact-driven: this role involves building and QAing hundreds of tasks, not publishing papers. You work on something not because it's fun but because it matters the most.
- Attention to detail: eval quality lives in the details, and a subtle ambiguity in a task can invalidate the signal.
- Agent-native coding: your orchestrator is juggling 4 agents while you read this job description.
- High ownership: you proactively identify and tackle issues with minimal direction.
- Comfort with ambiguity: you can solve complex problems from a vague description.
Highly valued
- Experience working in or alongside an AI safety, alignment, or red teaming context
- Experience designing evaluations, benchmarks, or structured test sets for ML systems
- Familiarity with synthetic data generation pipelines and prompt engineering
Application process
- Submit your application with a resume
- Interview with the founders to discuss your experience and fit for the role
- Take-home: design a small evaluation task set for a capability or behavior we specify. We'll give you a few days to complete it and keep it scoped to 2–3 hours of work.
- In-person work trial to mutually assess fit
- Offer
Applications are reviewed on a rolling basis.