Machine Learning Platform Engineer, AI Evaluation Platform (All levels)

Join Apple Services Engineering to build the next generation of AI evaluation systems. We are seeking machine learning platform engineers at multiple levels (Mid-Level to Principal) to architect and build high-availability services and internal tools that enable self-service evaluation at scale. You will partner with researchers to operationalize their innovations, transforming complex workflows into intuitive, developer-first platforms. We are looking for builders who thrive in the ambiguity of new initiatives and are passionate about creating scalable infrastructure.

Description

You will join the engineering team responsible for democratizing AI evaluation across the organization. Your focus will be the developer experience: architecting and implementing the APIs, SDKs, and platform services that turn complex evaluation metrics into simple, self-service calls. You will work hand in hand with researchers to operationalize sophisticated measurement techniques, ensuring they scale reliably within our high-availability infrastructure. In this role, you will drive the engineering standards for a new organization, upholding the code quality, automation, and testing rigor required to support the rapid evolution of Generative AI and Agentic systems.

Responsibilities

System Design & Implementation: Design, code, and ship high-quality Python services. For senior candidates: lead the architecture for the core evaluation engine and distributed services. For mid-level candidates: own the end-to-end implementation of specific features and API endpoints.

Technical Leadership & Collaboration: Mentor junior engineers, conduct code reviews, and drive technical decision-making. Foster a culture of technical excellence and rapid delivery through example and collaboration.

Operationalizing Science: Partner closely with Applied Scientists to translate novel metrics, judge prompts, and scoring algorithms into scalable, production-grade services. Create frameworks to evaluate not just simple responses, but also multi-turn agent trajectories and tool usage.
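To make "evaluating multi-turn agent trajectories and tool usage" concrete, here is a minimal sketch of a trajectory-level metric. All names (`Turn`, `ToolCall`, `tool_success_rate`) are illustrative stand-ins, not any real internal API:

```python
# Hypothetical sketch: scoring tool usage across a multi-turn agent trajectory.
from dataclasses import dataclass, field

@dataclass
class ToolCall:
    name: str
    succeeded: bool

@dataclass
class Turn:
    role: str                       # "user" or "agent"
    content: str
    tool_calls: list = field(default_factory=list)

def tool_success_rate(trajectory):
    """Fraction of tool calls across the whole trajectory that succeeded."""
    calls = [c for turn in trajectory for c in turn.tool_calls]
    if not calls:
        return 1.0                  # no tool use counts as vacuously successful
    return sum(c.succeeded for c in calls) / len(calls)

trajectory = [
    Turn("user", "Book a flight to SFO"),
    Turn("agent", "Searching...", [ToolCall("search_flights", True)]),
    Turn("agent", "Booking failed.", [ToolCall("book_flight", False)]),
]
print(tool_success_rate(trajectory))  # 0.5
```

In practice a production framework would compute many such metrics (goal completion, judge scores, latency) per trajectory, but the shape of the problem, iterating over structured turns rather than a single response string, is the same.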

System Integration: Serve as a technical bridge between the research organization and the broader engineering ecosystem, ensuring our tools integrate seamlessly with existing ML infrastructure and developer workflows.

Engineering Rigor: Champion the software development lifecycle (SDLC) for the team, writing comprehensive automated tests, maintaining CI/CD pipelines, and instrumenting monitoring to ensure high availability and reliability.

Preferred Qualifications

Experience building MLOps & Platform Infrastructure: You have architected the foundational infrastructure for AI, such as model registries, inference services, or feature stores (using tools like Kubernetes, Ray, or Kubeflow).

Deep familiarity with AI Evaluation Frameworks: You have used or contributed to modern evaluation tools like DeepEval, Ragas, TruLens, or LangSmith. You understand how to implement and scale model-based evaluation workflows.

Deep understanding of Generative AI & Agents: You understand the engineering challenges of relying on LLMs and Agents as software components: specifically, managing token economics, handling rate limits, and evaluating non-deterministic, multi-step reasoning capabilities.
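Rate-limit handling is one of the recurring engineering chores this qualification points at. A minimal sketch, with `RateLimitError` and `flaky_call` as stand-ins for whatever client library is actually in use, might look like:

```python
# Hypothetical sketch: retrying an LLM call on rate-limit errors with
# exponential backoff. RateLimitError is a stand-in exception type.
import time

class RateLimitError(Exception):
    pass

def with_backoff(fn, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on RateLimitError with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...

# A flaky stand-in that fails twice, then succeeds.
attempts = {"n": 0}
def flaky_call():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitError()
    return "ok"

print(with_backoff(flaky_call, sleep=lambda s: None))  # ok
```

Injecting `sleep` as a parameter keeps the retry logic testable without real delays, the kind of design detail this role's emphasis on automated testing implies.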

Builder Experience: You have thrived in startup-like environments, navigating high ambiguity to deliver complex technical roadmaps from scratch.

Minimum Qualifications

2+ years of hands-on software engineering experience (or Master's degree with relevant project experience). Note: We are hiring across multiple seniority levels; expectations will scale with experience.

Strong proficiency in the Python ecosystem (e.g., FastAPI, Pydantic, Pandas). You are capable of writing production-grade code and contributing to architectural discussions on day one.

Customer Obsession & Product Thinking: Experience acting as a technical partner to internal customers. You can translate vague requirements from other teams into concrete engineering specifications.

Demonstrated experience partnering with Data Scientists or Researchers: You have the ability to navigate the ambiguity of research workflows and operationalize scientific code.

Functional literacy in AI/ML concepts: You understand the fundamental lifecycle of machine learning (datasets, training vs. inference, evaluation metrics) and can discuss the engineering challenges involved in serving models.

Strong expertise in API Design & Internal Tools: You have built APIs that other developers rely on, with a focus on versioning, backward compatibility, and developer experience.
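The versioning and backward-compatibility concern above can be sketched in a few lines. The field names here are purely illustrative, assuming an evaluation-result payload that gains fields between versions:

```python
# Hypothetical sketch: keeping a frozen v1 response contract stable while
# v2 adds new fields. All field names are illustrative.
def evaluation_result_v2(score, judge_model, rationale):
    """Build the richer v2 response payload."""
    return {"score": score, "judge_model": judge_model, "rationale": rationale}

def to_v1(result_v2):
    """Down-convert a v2 result to the frozen v1 contract.

    v1 clients only ever saw {"score": ...}; newer fields are dropped
    rather than leaked, so existing integrations keep working unchanged.
    """
    return {"score": result_v2["score"]}

v2 = evaluation_result_v2(0.87, "judge-x", "concise and grounded")
print(to_v1(v2))  # {'score': 0.87}
```

Serving old clients through an explicit adapter like this, rather than mutating one shared schema, is one common way to evolve an internal API without breaking its consumers.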

Operational excellence background: You have practical experience using CI/CD pipelines, containerization (Docker/Kubernetes), and monitoring (Datadog/Prometheus).

Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant.

Client-provided location(s): Seattle, WA
Job ID: apple-200636469-3337_rxr-659
Employment Type: OTHER
Posted: 2025-12-15T19:10:54

Perks and Benefits

  • Health and Wellness
  • Parental Benefits
  • Work Flexibility
  • Office Life and Perks
  • Vacation and Time Off
  • Financial and Retirement
  • Professional Development
  • Diversity and Inclusion
