Staff Machine Learning Platform Engineer, AI Evaluation
Join Apple Services Engineering to build the next generation of AI evaluation systems. We are seeking a staff machine learning platform engineer to lead the architectural design and development of the high-availability services and internal tools powering self-service evaluation at scale. You will partner with researchers to operationalize their innovations, transforming complex workflows into intuitive, developer-first platforms. We are looking for builders who thrive in the ambiguity of new initiatives and are passionate about creating scalable infrastructure.
Description
We're building the evaluation platform that will serve all of Apple's generative AI and agent systems. This is early-stage work: some scrappy components exist, much is greenfield, and we need a staff engineer who can take it from here to org-wide self-service scale.
This is not a "maintain the infra" role. You'll make consequential decisions about what to build, what to integrate, and what to say no to, then ship it in Python with a small team.
Responsibilities
Platform architecture & delivery: Own the technical direction for our evaluation platform. Design and build the APIs, SDKs, and orchestration services that turn research-grade evaluation methodology into self-service building blocks other teams ship on top of. You'll start scrappy and intentional, with line of sight to the scaled version.
Productionize ML research: Partner directly with research engineers to assess their code and determine what can be rewritten into clean Python services vs. what requires infrastructure changes (Ray, GPU compute, distributed scheduling). Build the reusable abstractions that make the next research handoff faster than the last.
Strategic decision-making: You will balance complex, competing priorities from partner engineering teams, PMs, and leadership. Your job is to distinguish signal from noise, identifying the platform-level decisions that serve the org vs. one-off requests that don't scale. You'll advocate for these decisions clearly in documentation and in rooms with senior stakeholders.
Drive org-level evaluation strategy: Work with your technical manager to assess workload across engineers, set priorities, and define how self-service evaluation reaches every team at Apple. You're a force multiplier, not just through code, but through the clarity of your technical vision.
Developer experience: You own the experience end-to-end. Today that means supporting existing evaluation patterns (trace-based, metric-based). Tomorrow it means enabling breakthrough approaches - surfacing where models fail in non-obvious ways, evaluating multi-turn agent trajectories, and scoring complex tool-use chains.
Operational rigor: Define the team's posture on testing, CI/CD, monitoring, and reliability. You don't need to be an SRE, but you ship with instrumentation and you set the standard others follow.
Preferred Qualifications
Experience with distributed compute frameworks (Ray, Dask)
Background in startup or early-stage environments where you wore multiple hats
Familiarity with LLM token economics, rate limiting, and cost management at scale
Minimum Qualifications
8+ years of software engineering experience with a track record of owning platform-level technical direction.
0-to-1 builder who designs for scale. You've taken something from nothing to production, made deliberate tradeoffs about what to build now vs. later, and can articulate why.
ML depth: You're not building the models, but you can read research code and assess: is this a software problem or an infrastructure problem? Do we need a rewrite or do we need GPUs? You speak the language of research engineers fluently.
AI/Agent evaluation experience that goes beyond traces. You understand the hard problems: non-deterministic outputs, multi-step agent reasoning, judge model reliability, scoring drift. You've built or operated systems that handle these.
Judgment under ambiguity. You know when to build a rapid prototype for quick validation and when to be disciplined (design doc, review, test). You can tell the difference in real time, not just in retrospect.
Communication as a core skill. You write clearly: design docs, decision records, platform roadmaps. You speak clearly in meetings with researchers and in rooms with engineering leaders, balance the needs and priorities of partner teams, and contribute to the sequencing of execution.
Python as primary language. Strong with FastAPI, Pydantic, and the ecosystem. Experience with job orchestration frameworks (Temporal.io or similar). Bonus: Go or Rust for compute-hot paths.
Operational ownership. You've owned CI/CD, containerization (Docker/K8s), and monitoring for production services. You don't just ship, you keep things running.
Pay & Benefits
At Apple, base pay is one part of our total compensation package and is determined within a range. This provides the opportunity to progress as you grow and develop within a role. The base pay range for this role is between $201,300 and $302,200, and your base pay will depend on your skills, qualifications, experience, and location.
Apple employees also have the opportunity to become an Apple shareholder through participation in Apple's discretionary employee stock programs. Apple employees are eligible for discretionary restricted stock unit awards, and can purchase Apple stock at a discount if voluntarily participating in Apple's Employee Stock Purchase Plan. You'll also receive benefits including comprehensive medical and dental coverage, retirement benefits, a range of discounted products and free services, and, for formal education related to advancing your career at Apple, reimbursement for certain educational expenses, including tuition. Additionally, this role might be eligible for discretionary bonuses or commission payments as well as relocation.
Note: Apple benefit, compensation and employee stock programs are subject to eligibility requirements and other terms of the applicable plan or program.
Perks and Benefits
Health and Wellness
Parental Benefits
Work Flexibility
Office Life and Perks
Vacation and Time Off
Financial and Retirement
Professional Development
Diversity and Inclusion