Senior Platform Engineer - USDS
Responsibilities
About the Team
The Cyber Defense & Engineering team runs and operates security infrastructure, platforms, and technologies, and supports cross-functional teams in protecting our users, products, and infrastructure. The team is responsible for enhancing security tools and identifying vulnerabilities, with a specific focus on content assurance and the application of large language models (LLMs). You'll collaborate cross-functionally with partners inside and outside TikTok to strengthen the security of our products and users, helping to establish TikTok as the most trusted platform.
In order to enhance collaboration and cross-functional partnerships, among other things, at this time, our organization follows a hybrid work schedule that requires employees to work in the office 3 days a week, or as directed by their manager/department. We regularly review our hybrid work model, and the specific requirements may change at any time.
About the Role
We are seeking a highly skilled and hands-on technical person to design, build, and operate the on-premise platforms and systems that power our core technology. You will focus on creating highly available, reliable, scalable, and efficient infrastructure and tools. This role is ideal for someone with a strong background in systems engineering, distributed infrastructure, and backend development. You should enjoy solving complex technical problems and writing high-quality code.
A key part of this role is building a greenfield, AI-native development initiative focused on solving complex internal infrastructure and productivity challenges at scale. You will be responsible for the entire on-premise stack, leveraging technologies such as Apache Kafka, Apache Flink, Elasticsearch, PostgreSQL, Redis, and Kubernetes. This is a highly cross-functional role that fosters a culture of innovation, collaboration, and continuous improvement.
Responsibilities
- Lead and perform hands-on technical work, including architecture design and code development for an on-premise, highly scalable, and parallelized infrastructure. The role includes developing internal tools to manage the entire lifecycle of a large-scale retrieval-augmented generation (RAG) pipeline.
- Architect, implement, and manage a high-performance compute cluster for LLM workloads. This involves the selection and configuration of specialized hardware like GPUs, as well as the design of a robust network fabric to facilitate efficient inter-node communication for parallel processing.
- Oversee the end-to-end project lifecycle, from planning and requirements gathering to execution and delivery. You'll ensure that the infrastructure design aligns with our business goals for deploying LLM-powered applications. This includes developing internal tools and automation to support infrastructure operations.
- Develop and maintain automation scripts and configuration management to automate the deployment and management of the on-premise hardware and software stack. This ensures consistency and reproducibility across the entire environment.
- Implement security best practices for a private data center environment. This includes configuring network firewalls, managing access controls, and encrypting data at rest and in transit.
- Establish comprehensive monitoring and alerting systems to track the health and performance of the compute cluster and LLM workloads. This involves analyzing metrics related to GPU utilization, memory usage, network throughput, and model inference latency. You will proactively resolve performance issues to enhance platform reliability and operational support for internal teams.
- Collaborate with internal stakeholders to optimize resource utilization and improve the platform's efficiency. You'll work closely with data scientists and machine learning engineers to understand their compute needs and ensure the infrastructure is optimized for their specific workloads.
Qualifications
Minimum Qualifications:
- Bachelor's degree in Computer Science, Information Technology, or a related field, with 5+ years of experience in platform, systems, or infrastructure engineering.
- Proven expertise in infrastructure automation using tools like Ansible and Puppet, with strong hands-on experience in automating deployments and managing bare-metal hardware and virtual machines.
- Deep experience with on-premises infrastructure, with a solid understanding of large-scale data processing, distributed computing, and other big data infrastructure.
- Strong grasp of networking, storage, and virtualization technologies, with practical knowledge of building and supporting complex distributed systems for parallel processing of LLM workloads.
- Hands-on experience with container technologies (e.g., Docker) and managing on-premises Kubernetes clusters in production environments.
- Proficiency in scripting and automation using languages like Python and Go, and experience with full-stack development in languages such as Rust to build internal platforms.
- Demonstrated ability to work collaboratively with cross-functional teams to build scalable, reliable, and high-performance internal platforms and services specifically tailored for AI and LLM use cases.
Preferred Qualifications
- Relevant certifications such as Oracle Cloud Infrastructure Architect / DevOps / Systems / Security / Operations Professional
- Technical writing and communication skills that enable effective problem-solving and strengthen interpersonal relationships
- Critical thinking and architectural decision-making skills to support effective, collaborative leadership in a fast-paced, dynamic environment
- Strong team and cross-functional (XFN) coordination and management abilities
- Experience with cloud platforms other than OCI (e.g., AWS, Azure, GCP)
Job Information
Compensation Description (annually)
The base salary range for this position in the selected city is $187,040 - $359,720 annually.
Compensation may vary outside of this range depending on a number of factors, including a candidate's qualifications, skills, competencies and experience, and location. Base pay is one part of the Total Package that is provided to compensate and recognize employees for their work, and this role may be eligible for additional discretionary bonuses/incentives, and restricted stock units.
Benefits may vary depending on the nature of employment and the country of the work location. Employees have day-one access to medical, dental, and vision insurance, a 401(k) savings plan with company match, paid parental leave, short-term and long-term disability coverage, life insurance, and wellbeing benefits, among others. Employees also receive 10 paid holidays per year, 10 paid sick days per year, and 17 days of Paid Personal Time (prorated upon hire, with accruals increasing by tenure).
The Company reserves the right to modify or change these benefits programs at any time, with or without notice.
For Los Angeles County (unincorporated) Candidates:
Qualified applicants with arrest or conviction records will be considered for employment in accordance with all federal, state, and local laws including the Los Angeles County Fair Chance Ordinance for Employers and the California Fair Chance Act. Our company believes that criminal history may have a direct, adverse and negative relationship on the following job duties, potentially resulting in the withdrawal of the conditional offer of employment:
1. Interacting and occasionally having unsupervised contact with internal/external clients and/or colleagues;
2. Appropriately handling and managing confidential information including proprietary and trade secret information and access to information technology systems; and
3. Exercising sound judgment.
Perks and Benefits
Health and Wellness
- Health Insurance
- Dental Insurance
- Vision Insurance
- HSA
- Life Insurance
- Fitness Subsidies
- Short-Term Disability
- Long-Term Disability
- On-Site Gym
- Mental Health Benefits
- Virtual Fitness Classes
Parental Benefits
- Fertility Benefits
- Adoption Assistance Program
- Family Support Resources
Work Flexibility
- Flexible Work Hours
- Hybrid Work Opportunities
Office Life and Perks
- Casual Dress
- Snacks
- Pet-friendly Office
- Happy Hours
- Some Meals Provided
- Company Outings
- On-Site Cafeteria
- Holiday Events
Vacation and Time Off
- Paid Vacation
- Paid Holidays
- Personal/Sick Days
- Leave of Absence
Financial and Retirement
- 401(K) With Company Matching
- Performance Bonus
- Company Equity
Professional Development
- Promote From Within
- Access to Online Courses
- Leadership Training Program
- Associate or Rotational Training Program
- Mentor Program
Diversity and Inclusion
- Diversity, Equity, and Inclusion Program
- Employee Resource Groups (ERG)