Member of Technical Staff — Synthetic Data

Compensation

$125k – $225k

+ 0.1% – 0.5% equity

Location

Berkeley, CA / Remote

Full-time, flexible schedule

Apply now

or email careers@trajectorylabs.net

About us

Trajectory Labs builds safety evals and RL environments for frontier AI labs. We started less than six months ago, already have contracts with multiple frontier labs, and are scaling rapidly to meet demand.

About the role

Frontier models can already tell when they're being evaluated (Needham et al., arXiv:2505.23836), and that awareness is the lowest it will ever be. This undermines both the evaluations labs use for safety decisions and the RL environments they use to train safe behavior. We're hiring to build tasks realistic enough that evaluations and training environments remain a reliable signal.

Responsibilities:

Work directly with the founders to decide what we build next and how we scale task production
Build the pipelines that generate, validate, and deploy tasks at scale
Own the infrastructure and CI/CD that keeps task production running reliably
Manage a team of five scenario designers and build internal tooling that makes them faster

About you

Essential

Impact-driven: this role involves building and QAing hundreds of tasks, not publishing papers. You work on something not because it's fun but because it matters most.
Attention to detail: eval quality lives in the details, and a subtle ambiguity in a task can invalidate the signal.
Agent-native coding: your orchestrator is juggling 4 agents while you read this job description.
High ownership: you proactively identify and tackle issues with minimal direction.
Comfort with ambiguity: you can solve complex problems from a vague description.

Highly valued

Experience working in or alongside an AI safety, alignment, or red teaming context
Experience designing evaluations, benchmarks, or structured test sets for ML systems
Familiarity with synthetic data generation pipelines and prompt engineering

Application process

Submit your application with a resume
Interview with the founders to discuss your experience and fit for the role
Take-home: design a small evaluation task set for a capability or behavior we specify — we'll give you a few days and keep it scoped to 2–3 hours
In-person work trial to mutually assess fit
Offer

Applications reviewed on a rolling basis.
