Site Reliability Engineer - AML Global Recommendation - USDS
This job is no longer available.
Responsibilities
About the Team:
The Site Reliability Engineering (SRE) group of the AML (Applied Machine Learning) team combines systems engineering and the art of machine learning to develop and run a massively distributed AI/ML recommendation system serving users in the United States and around the world.
On the SRE team, you'll have the opportunity to sharpen your expertise in coding, performance analysis, and large-scale systems operation. Join us and you'll have the chance to shape the future of AML systems and make a real, tangible impact on TikTok users.
Responsibilities:
- Design, build, and maintain highly available, scalable, and fault-tolerant systems.
- Monitor and analyze system performance, identifying and resolving issues before they cause user impact.
- Develop and maintain automated monitoring, alerting, and incident response systems.
- Collaborate closely with software engineering teams to ensure that applications are designed with reliability, scalability, and performance in mind.
- Implement and maintain security best practices and ensure compliance with regulatory requirements.
- Participate in on-call rotations and respond to issues and incidents within and outside of normal business hours.
- Conduct root cause analysis of incidents, hold post-mortem reviews with stakeholders, and implement preventative measures to minimize the risk of similar incidents occurring in the future.
Qualifications
Minimum Qualifications
- Expertise in analyzing and troubleshooting Linux-based distributed systems.
- Bachelor's/Master's degree in Computer Science, Computer Engineering, or equivalent years of experience in an SRE or software engineering role.
- Experience programming with at least one commonly used language (C, C++, Python, Go).
- Strong understanding of data structures and algorithms.
- Competent knowledge of relational database systems.
Preferred Qualifications
- Ability to design and maintain large-scale systems.
- Strong understanding of code optimization and routine task automation.
- Proficiency in at least one machine learning framework: TensorFlow, PyTorch, MXNet, or PaddlePaddle.
Perks and Benefits
Health and Wellness
- Health Insurance
- Dental Insurance
- Vision Insurance
- HSA
- Life Insurance
- Fitness Subsidies
- Short-Term Disability
- Long-Term Disability
- On-Site Gym
- Mental Health Benefits
- Virtual Fitness Classes
Parental Benefits
- Fertility Benefits
- Adoption Assistance Program
- Family Support Resources
Work Flexibility
- Flexible Work Hours
- Hybrid Work Opportunities
Office Life and Perks
- Casual Dress
- Snacks
- Pet-friendly Office
- Happy Hours
- Some Meals Provided
- Company Outings
- On-Site Cafeteria
- Holiday Events
Vacation and Time Off
- Paid Vacation
- Paid Holidays
- Personal/Sick Days
- Leave of Absence
Financial and Retirement
- 401(K) With Company Matching
- Performance Bonus
- Company Equity
Professional Development
- Promote From Within
- Access to Online Courses
- Leadership Training Program
- Associate or Rotational Training Program
- Mentor Program
Diversity and Inclusion
- Diversity, Equity, and Inclusion Program
- Employee Resource Groups (ERG)