Sr. Engineering Manager, AI Evaluation Platform

Yesterday Seattle, WA

Join Apple Services Engineering to build the next generation of AI evaluation systems. We are seeking a hands-on Engineering Manager to architect high-availability services and internal tools that enable self-service evaluation at scale. You will partner with researchers to operationalize their innovations, transforming complex workflows into intuitive, developer-first platforms. We are looking for a leader who thrives in the ambiguity of new initiatives and is passionate about building scalable infrastructure.

Description

You will build and lead the engineering team responsible for democratizing AI evaluation across the organization. Your focus will be on architecting the developer experience: designing the APIs, SDKs, and platform services that turn complex evaluation metrics into simple, self-service calls. You will work hand-in-hand with researchers to operationalize sophisticated measurement techniques, ensuring they scale reliably within our high-availability infrastructure. In this role, you will define the engineering standards for a new organization, establishing the code quality, automation, and testing rigor required to support the rapid evolution of Generative AI and Agentic systems.

Responsibilities

Team Building & Leadership: Hire, mentor, and grow a diverse, high-performing team of backend and platform engineers. Foster a culture of technical excellence and rapid delivery as you build this new team from the ground up.

Technical Strategy & Roadmap: Own the engineering roadmap for the core evaluation engine. Architect the APIs, SDKs, and distributed services that power our internal platform, enabling product teams to measure Generative AI performance autonomously.

Operationalizing Science: Partner closely with Applied Scientists to translate novel metrics, judge prompts, and scoring algorithms into scalable, production-grade services. Create frameworks to evaluate not just simple responses, but also multi-turn agent trajectories and tool usage.

System Integration: Serve as a technical bridge between the research organization and the broader engineering ecosystem, ensuring our tools integrate seamlessly with existing ML infrastructure and developer workflows.

Engineering Rigor: Establish the software development lifecycle (SDLC) for the team, defining standards for code quality, automated testing (CI/CD), and monitoring to ensure high availability and reliability.

Preferred Qualifications

Experience building MLOps & Platform Infrastructure: You have architected or managed teams that built the foundational infrastructure for AI, such as model registries, inference services, or feature stores (using tools like Kubernetes, Ray, or Kubeflow).

Deep familiarity with AI Evaluation Frameworks: You have used or contributed to modern evaluation tools like DeepEval, Ragas, TruLens, or LangSmith. You understand how to implement and scale model-based evaluation workflows.

Deep understanding of Generative AI & Agents: You understand the engineering challenges of relying on LLMs and Agents as software components, specifically managing token economics, handling rate limits, and evaluating non-deterministic, multi-step reasoning capabilities.

Builder Experience: You have thrived in startup-like environments or incubated new teams within larger orgs, navigating high ambiguity to define roadmaps where none existed.

Minimum Qualifications

5+ years of direct engineering management experience, with a proven track record of hiring, mentoring, and retaining high-performing engineers. You have successfully managed teams that ship production-grade software.

7+ years of hands-on software engineering experience with deep proficiency in the Python ecosystem (e.g., FastAPI, Pydantic, Pandas). You are capable of contributing to code reviews and architectural discussions on day one.

Customer Obsession & Product Thinking: Experience acting as a technical partner to internal customers. You can translate vague requirements from other teams into concrete engineering specifications and are comfortable prioritizing the roadmap in the absence of a dedicated Product Manager.

Demonstrated experience partnering with Data Scientists or Researchers: You have a history of taking experimental or "messy" code and refactoring it into reliable, scalable production systems.

Functional literacy in AI/ML concepts: You understand the fundamental lifecycle of machine learning (datasets, training vs. inference, evaluation metrics) and can discuss the engineering challenges involved in serving models.

Strong expertise in API Design & Internal Tools: You have architected APIs that other developers rely on, with a focus on versioning, backward compatibility, and developer experience.

Operational excellence background: You have practical experience establishing CI/CD pipelines, containerization (Docker/Kubernetes), and monitoring (Datadog/Prometheus).

Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant.

Client-provided location(s): Seattle, WA
Job ID: apple-200635890-3337_rxr-659
Employment Type: OTHER
Posted: 2025-12-12T19:32:16

Perks and Benefits

  • Health and Wellness
  • Parental Benefits
  • Work Flexibility
  • Office Life and Perks
  • Vacation and Time Off
  • Financial and Retirement
  • Professional Development
  • Diversity and Inclusion