Measurement Scientist, AI Evaluation Platform

3+ months ago • Seattle, WA

This job is no longer available.

Apple is where individual imaginations gather together, committing to the values that lead to great work. Every new product we build, service we create, or App Store experience we deliver is the result of us making each other's ideas stronger. That happens because every one of us shares a belief that we can make something wonderful and share it with the world, changing lives for the better. It's the diversity of our people and their thinking that inspires the innovation that runs through everything we do. When we bring everybody in, we can do the best work of our lives. Here, you'll do more than join something - you'll add something.

Description

Our team, part of Apple Services Engineering, is building the scientific foundation for how AI systems are evaluated across Apple. We are seeking a Measurement Scientist to ensure that our evaluation methods are not just sophisticated, but scientifically valid and trustworthy. In this role, you will apply psychometric theory, validity frameworks, and statistical rigor to establish measurement standards for AI evaluation - ensuring that when we claim an evaluator measures "helpfulness" or "safety," it actually does. We are looking for individuals across a range of experience levels.

This role uniquely bridges measurement science and cutting-edge AI evaluation. You will develop methods for validating LLM-as-judge evaluators, automated benchmarks, and human evaluations, and you will create statistical tools that help engineers trust their evaluation results. You will work on an interdisciplinary team with ML researchers to solve new problems in AI evaluation. Your work will be both published at top measurement and ML venues and productionized into the evaluation SDK used across Apple.

The successful candidate will have deep expertise in psychometrics and measurement theory, with the ability to apply these principles to novel AI evaluation challenges. You will work collaboratively with ML researchers, platform engineers, and evaluation practitioners to translate measurement science into practical tools that scale across the organization.

Responsibilities

Design validity frameworks for AI evaluation, ensuring that automated metrics, LLM-as-judge systems, and human evaluation protocols measure what they claim to measure across diverse contexts.

Develop and apply psychometric methods to assess the quality of benchmarks, for example by drawing on frameworks like item response theory (IRT).

Create calibration and bias detection systems for automated evaluators, ensuring LLM-as-judge scores are interpretable, consistent, and free from systematic biases.

Build robust statistical tools for practitioners for sample-size planning, quantifying uncertainty, controlling error rates, and visualizing data.

Establish measurement standards for evaluator transfer and generalization, including methods to quantify or predict when evaluators will maintain validity across domains, languages, or contexts.

Validate novel evaluation methods in collaboration with ML researchers, ensuring intelligent search algorithms discover statistically meaningful patterns and synthetic data generation produces representative samples.

Collaborate with platform engineers to productionize measurement methods into evaluation infrastructure, creating self-service tools for validity checking, reliability testing, and interpretable outputs (report cards, warnings, confidence metrics).

Publish research at top measurement venues and/or ML conferences (NeurIPS, ICML, ICLR), advancing both measurement science and AI evaluation.

Collaborate across disciplines with ML researchers developing novel methods, platform engineers building scalable infrastructure, and evaluation practitioners using these tools in production.

Preferred Qualifications

Experience applying measurement science to AI/ML evaluation, automated scoring systems, or computational assessment.

Knowledge of modern ML evaluation challenges including LLM-as-judge, automated metrics, benchmark design, and agentic systems.

Publications at measurement venues or top ML conferences (NeurIPS, ICML, ICLR).

Expertise in computational social or behavioral science using generative AI.

Experience collaborating with engineers to turn research methods into production tools and scalable infrastructure.

Minimum Qualifications

PhD in Psychometrics, Educational Measurement, Quantitative Psychology, Statistics, or equivalent research/work experience.

Deep expertise in modeling test data (IRT or related methods) and construct validation.

Strong statistical foundation including experimental design, power analysis, sampling theory, and uncertainty quantification.

Track record of designing and validating measurement instruments as demonstrated through publications or applied work.

Proficiency in Python (preferred) or R for statistical analysis, psychometric modeling, and method implementation.

Strong working knowledge of generative AI technology.

Excellent communication skills with the ability to explain complex measurement concepts to engineers, ML researchers, and non-technical stakeholders.

Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant.

Pay & Benefits

At Apple, base pay is one part of our total compensation package and is determined within a range. This provides the opportunity to progress as you grow and develop within a role. The base pay range for this role is between $171,600 and $258,100, and your base pay will depend on your skills, qualifications, experience, and location.

Apple employees also have the opportunity to become an Apple shareholder through participation in Apple's discretionary employee stock programs. Apple employees are eligible for discretionary restricted stock unit awards, and can purchase Apple stock at a discount if voluntarily participating in Apple's Employee Stock Purchase Plan. You'll also receive benefits including comprehensive medical and dental coverage, retirement benefits, a range of discounted products and free services, and, for formal education related to advancing your career at Apple, reimbursement for certain educational expenses, including tuition. Additionally, this role might be eligible for discretionary bonuses or commission payments as well as relocation. Learn more about Apple Benefits.

Note: Apple benefit, compensation and employee stock programs are subject to eligibility requirements and other terms of the applicable plan or program.

Client-provided location(s): Seattle, WA

Job ID: apple-200647378-3337_rxr-661

Employment Type: OTHER

Posted: 2026-02-21T19:09:02

Perks and Benefits

Health and Wellness
Parental Benefits
Work Flexibility
Office Life and Perks
Vacation and Time Off
Financial and Retirement
Professional Development
Diversity and Inclusion
