Site Reliability Engineer - AML Global Recommendation - USDS

3+ months ago Sydney, Australia

This job is no longer available.

Responsibilities

About the Team:
The Site Reliability Engineering (SRE) group within the AML (Applied Machine Learning) team combines systems engineering with machine learning to develop and run a massively distributed AI/ML recommendation system serving users in the United States and around the world.

On the SRE team, you'll have the opportunity to sharpen your expertise in coding, performance analysis, and large-scale systems operation. Join us and you'll have the chance to shape the future of AML systems and make a real, tangible impact on TikTok users.

Responsibilities:
- Design, build, and maintain highly available, scalable, and fault-tolerant systems.
- Monitor and analyze system performance, identifying and resolving issues before they cause user impact.
- Develop and maintain automated monitoring, alerting, and incident response systems.
- Collaborate closely with software engineering teams to ensure that applications are designed with reliability, scalability, and performance in mind.
- Implement and maintain security best practices and ensure compliance with regulatory requirements.
- Participate in on-call rotations and respond to issues and incidents within and outside of normal business hours.
- Conduct root cause analysis of incidents, hold post-mortem reviews with stakeholders, and implement preventative measures to minimize the risk of similar incidents occurring in the future.

Qualifications

Minimum Qualifications
- Expertise in analyzing and troubleshooting Linux-based distributed systems.
- Bachelor's/Master's degree in Computer Science, Computer Engineering, or equivalent years of experience in an SRE or software engineering role.
- Experience programming with at least one commonly used language (C, C++, Python, Go).
- Strong understanding of data structures and algorithms.
- Competent knowledge of relational database systems.

Preferred Qualifications
- Ability to design and maintain large-scale systems.
- Strong understanding of code optimization and routine task automation.
- Proficiency in at least one machine learning framework: TensorFlow, PyTorch, MXNet, or PaddlePaddle.

Client-provided location(s): Sydney, Australia
Job ID: TikTok-7358126026768435466
Employment Type: OTHER
Posted: 2025-04-16T00:42:29

Perks and Benefits

  • Health and Wellness

    • Health Insurance
    • Dental Insurance
    • Vision Insurance
    • HSA
    • Life Insurance
    • Fitness Subsidies
    • Short-Term Disability
    • Long-Term Disability
    • On-Site Gym
    • Mental Health Benefits
    • Virtual Fitness Classes
  • Parental Benefits

    • Fertility Benefits
    • Adoption Assistance Program
    • Family Support Resources
  • Work Flexibility

    • Flexible Work Hours
    • Hybrid Work Opportunities
  • Office Life and Perks

    • Casual Dress
    • Snacks
    • Pet-friendly Office
    • Happy Hours
    • Some Meals Provided
    • Company Outings
    • On-Site Cafeteria
    • Holiday Events
  • Vacation and Time Off

    • Paid Vacation
    • Paid Holidays
    • Personal/Sick Days
    • Leave of Absence
  • Financial and Retirement

    • 401(K) With Company Matching
    • Performance Bonus
    • Company Equity
  • Professional Development

    • Promote From Within
    • Access to Online Courses
    • Leadership Training Program
    • Associate or Rotational Training Program
    • Mentor Program
  • Diversity and Inclusion

    • Diversity, Equity, and Inclusion Program
    • Employee Resource Groups (ERG)
