Backend Engineer, AML Engine Orchestration
Responsibilities
Team Introduction
The mission of our AML team is to push next-generation machine learning algorithms and platforms for the recommendation system, ads ranking and search ranking in our company. We also drive substantial impact on core businesses of the company.
Responsibilities:
1. Resource Efficiency Optimization in Distributed Orchestration and Scheduling:
- Develop and extend distributed orchestration frameworks within the Kubernetes/Godel ecosystem. Select appropriate frameworks based on different business scenarios, and optimize cluster utilization and load balancing strategies according to the specific characteristics of each scenario;
- Integrate and expand AutoScaling and automatic parallelization capabilities for various models and tasks. Employ load modeling and analytic methods for different models to automatically optimize resource requests, achieving large-scale improvements in resource usage efficiency and global optimality;
- Own preemption and re-scheduling mechanisms for services with different priorities, and manage automatic resource multiplexing across different clusters and resource types; handle scheduling and load adaptation across multi-datacenter, multi-region, and multi-cloud environments.
2. Building Training System Architecture for Next-Generation Ultra-Large and Ultra-Deep Recommendation Models:
- Develop a flexible, elastic and robust distributed training runtime focused on hyper-scaled embeddings and large-scale GPU training;
- Design and optimize distributed computing APIs and runtimes geared towards future recommendation and ads model paradigms (e.g., reinforcement learning, fine-tuning and/or distillation);
- Collaborate with platform teams to enhance the diagnosability and usability of distributed training systems.
3. Constructing Online Orchestration Architecture for Next-Generation Recommendation Systems:
- Build a robust distributed model inference architecture for online learning scenarios involving hyper-scaled embeddings;
- Optimize the usability of online recommendation and ads model architectures and MLOps workflows.
Qualifications
Minimum Qualifications
- Bachelor's degree or above, majoring in Computer Science, Engineering or related fields.
- Strong programming experience with at least one modern language such as Golang or Python.
- Experience contributing to large-scale distributed systems and multi-tenant systems (architecture, reliability and scaling).
- Strong analytical and problem-solving abilities.
- Good communication, self-motivation, and sound engineering and documentation practices.
- At least 3 years of relevant experience.
Preferred Qualifications
- Familiar with large-scale distributed scheduling systems such as Kubernetes, YARN, Flink and/or Spark.
- Familiar with open-source orchestration frameworks such as VeRL, vLLM, Ray or TFX.
Perks and Benefits
Health and Wellness
- Health Insurance
- Dental Insurance
- Vision Insurance
- HSA
- Life Insurance
- Fitness Subsidies
- Short-Term Disability
- Long-Term Disability
- On-Site Gym
- Mental Health Benefits
- Virtual Fitness Classes
Parental Benefits
- Fertility Benefits
- Adoption Assistance Program
- Family Support Resources
Work Flexibility
- Flexible Work Hours
- Hybrid Work Opportunities
Office Life and Perks
- Casual Dress
- Snacks
- Pet-friendly Office
- Happy Hours
- Some Meals Provided
- Company Outings
- On-Site Cafeteria
- Holiday Events
Vacation and Time Off
- Paid Vacation
- Paid Holidays
- Personal/Sick Days
- Leave of Absence
Financial and Retirement
- 401(K) With Company Matching
- Performance Bonus
- Company Equity
Professional Development
- Promote From Within
- Access to Online Courses
- Leadership Training Program
- Associate or Rotational Training Program
- Mentor Program
Diversity and Inclusion
- Diversity, Equity, and Inclusion Program
- Employee Resource Groups (ERG)
Company Videos
Hear directly from employees about what it is like to work at TikTok.