Meta

Research Engineer - Codec Avatar ML Compute Team

Pittsburgh, PA

Reality Labs Research (RL-R) brings together a diverse and highly interdisciplinary team of researchers and engineers to create the future of augmented and virtual reality. As a member of the Codec Avatars ML Compute Infrastructure team, you'll have the exciting opportunity to contribute to the advancement of our Codec Avatar technology. Your role will involve delivering data, tools, and libraries within our super clusters, playing a crucial part in our technological progress.

Our team cultivates an honest and considerate environment where self-motivated individuals thrive. We encourage a strong sense of ownership and embrace the ambiguity that comes with working on the frontiers of research. In this research engineer role on the Codec Avatar ML Compute team, you will serve as the point of contact for Meta's research GPU super clusters, parallelizing massive ML models and data, and optimizing other compute resources to enable groundbreaking research in relightable avatars, full-body avatars, and generative AI for codec avatars.


Research Engineer - Codec Avatar ML Compute Team Responsibilities:
  • Build efficient and scalable machine learning tooling for the GPU clusters within Meta research labs, a heterogeneous environment containing diverse system architectures and research workloads
  • Build efficient and scalable data tooling for massive ML training data preprocessing and postprocessing using thousands of CPU/GPU nodes
  • Provide on-call support and lead incident root cause analysis through multiple infrastructure layers (compute, storage, network) for GPU clusters and act as a final escalation point
  • Work side by side with research scientists and engineers to take full advantage of modern GPUs for large scale multi-GPU training jobs
  • Collaborate in a diverse team environment across multiple scientific and engineering disciplines, making the architectural tradeoffs required to rapidly train large scale ML models
  • Provide guidance to other engineers on best practices to build mature tools which are highly reliable, secure, and scalable
  • Influence outcomes within your immediate team, peer engineering teams, and with cross-functional stakeholders
  • Work independently, handle large projects simultaneously, and prioritize team roadmap and deliverables by balancing required effort with resulting impact
Minimum Qualifications:
  • Currently has, or is in the process of obtaining, a Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience. Degree must be completed prior to joining Meta.
  • 3+ years of experience coding in C++ and Python
  • Experience in large scale ML system performance measurement, logging, and optimization
  • Experience in writing system level infrastructure, libraries, and applications
  • Prior experience in ML libraries such as PyTorch, TensorFlow, or cuDNN
  • Experience with software development practices such as source control, unit testing, debugging and profiling
  • Experience in developing performant software and systems
Preferred Qualifications:
  • Prior experience in large scale machine learning model training, including model parallelization strategies
  • Prior experience with machine learning model compilers
  • Prior experience in cluster coordination and strategy planning, including collecting/understanding needs of researchers, developing tools to improve research experience, providing guidance on best practices, coordinating distribution of compute/storage resources, forecasting compute/storage needs, and developing long-term user experience/compute/storage strategies
  • Prior experience building tooling for monitoring and telemetry for large scale supercomputers
  • Prior experience in debugging performance issues for large scale ML training tasks
  • Prior experience in GPGPU development with CUDA, OpenCL or DirectCompute
  • Familiarity with Linux observability tools, such as eBPF
About Meta:

Meta builds technologies that help people connect, find communities, and grow businesses. When Facebook launched in 2004, it changed the way people connect. Apps like Messenger, Instagram and WhatsApp further empowered billions around the world. Now, Meta is moving beyond 2D screens toward immersive experiences like augmented and virtual reality to help build the next evolution in social technology. People who choose to build their careers by building with us at Meta help shape a future that will take us beyond what digital connection makes possible today: beyond the constraints of screens, the limits of distance, and even the rules of physics.

Meta is proud to be an Equal Employment Opportunity and Affirmative Action employer. We do not discriminate based upon race, religion, color, national origin, sex (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender, gender identity, gender expression, transgender status, sexual stereotypes, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics. We also consider qualified applicants with criminal histories, consistent with applicable federal, state and local law. Meta participates in the E-Verify program in certain locations, as required by law. Please note that Meta may leverage artificial intelligence and machine learning technologies in connection with applications for employment.

Meta is committed to providing reasonable accommodations for candidates with disabilities in our recruiting process. If you need any assistance or accommodations due to a disability, please let us know at accommodations-ext@fb.com.

Client-provided location(s): Pittsburgh, PA, USA
Job ID: a1K2K00000908dlUAA
Employment Type: Other

Perks and Benefits

  • Health and Wellness

    • Health Insurance
    • Health Reimbursement Account
    • Dental Insurance
    • Vision Insurance
    • Life Insurance
    • Short-Term Disability
    • Long-Term Disability
    • FSA
    • FSA With Employer Contribution
    • HSA
    • HSA With Employer Contribution
    • Fitness Subsidies
    • On-Site Gym
    • Mental Health Benefits
  • Parental Benefits

    • Birth Parent or Maternity Leave
    • Non-Birth Parent or Paternity Leave
    • Fertility Benefits
    • Adoption Assistance Program
    • Family Support Resources
  • Work Flexibility

    • Flexible Work Hours
    • Remote Work Opportunities
    • Hybrid Work Opportunities
  • Office Life and Perks

    • Commuter Benefits Program
    • Casual Dress
    • Happy Hours
    • Snacks
    • Some Meals Provided
    • Company Outings
    • On-Site Cafeteria
    • Holiday Events
  • Vacation and Time Off

    • Paid Vacation
    • Unlimited Paid Time Off
    • Paid Holidays
    • Personal/Sick Days
    • Sabbatical
    • Leave of Absence
  • Financial and Retirement

    • 401(K)
    • 401(K) With Company Matching
    • Pension
    • Company Equity
    • Performance Bonus
    • Relocation Assistance
    • Financial Counseling
  • Professional Development

    • Learning and Development Stipend
    • Promote From Within
    • Mentor Program
    • Shadowing Opportunities
    • Access to Online Courses
    • Lunch and Learns
    • Internship Program
  • Diversity and Inclusion

    • Diversity, Equity, and Inclusion Program
    • Employee Resource Groups (ERG)
    • Founder led
