Responsibilities
Team Introduction:
The TikTok Data Ecosystem Team plays a critical role in supporting TikTok's personalized recommendation system, which serves over 1 billion users. We are responsible for building scalable, reliable, and high-performance infrastructure for storing and serving machine learning features - especially user behavior sequences and contextual embeddings used in large-scale recommendation and pretraining models.
Our work sits at the intersection of systems and machine learning: ensuring training-serving consistency, low-latency access to temporal features, and scalable ingestion pipelines across online and offline environments.
We explore and integrate with various underlying storage engines, including RocksDB, HBase, and time-series databases, choosing among them based on the access pattern, feature type, and serving latency required by ML models.
Responsibilities:
- Build and optimize the core infrastructure of TikTok's feature store, powering both training data pipelines and real-time inference systems.
- Design efficient storage strategies for user behavior sequences, long-range contextual features, and sparse embeddings - ensuring freshness, consistency, and high availability.
- Work with underlying storage engines such as RocksDB, HBase, and time-series databases to support feature retention, versioning, compaction, and fast lookup.
- Collaborate with recommendation algorithm teams to design schemas and access patterns tailored to evolving model needs.
- Integrate online and offline data pipelines to reduce training-serving skew and support continuous training and A/B testing scenarios.
- Investigate techniques such as temporal sampling, embedding quantization, caching, and hybrid tiered storage to improve cost-efficiency and latency.
Qualifications
Minimum Qualifications:
- Currently pursuing a PhD degree or above in Computer Science, Software Engineering, or a related technical field.
- Solid foundation in distributed systems, data storage, and stream/batch processing architectures.
- Experience in programming with Java, C++, or Python.
- Understanding of key-value stores, LSM-tree architectures, or time-series databases at a system level.
- Eagerness to work on ambiguous, real-world infrastructure problems that impact ML product outcomes.
Preferred Qualifications:
- Graduating in December 2025 or later with intent to return to your program.
- Experience working with RocksDB, HBase, or time-series storage engines like IoTDB, OpenTSDB, or custom LSM-tree variants.
- Familiarity with feature store design, feature lifecycle management, and streaming ingestion pipelines.
- Understanding of recommendation system workflows, such as two-tower models, real-time CTR prediction, or user intent modeling.
- Contributions to open-source storage/ML infra projects or participation in ML system hackathons.