Machine Learning Infrastructure, Software Engineer (Staff/Senior)
- San Francisco, CA
At Instacart, we use machine learning and internet-scale data to elevate customer experience, improve efficiency, and reduce cost in e-commerce, advertising, and fulfillment. For example, we build large, distributed machine learning models for personalization and recommendation. The machine learning infrastructure team's job is to provide an easy-to-use, flexible, elastic, and highly available platform that enables our machine learning engineers to train, debug, deploy, and monitor their models.
ABOUT THE JOB
Joining our team gives you the chance to grow your career and interests in a dynamic, fast-paced environment:
- You will design machine learning infrastructure solutions that support the current and future needs of our business. Many interesting and unique use cases that require innovation at both the application and infrastructure levels make this a challenging and dynamic job.
- You will apply your systems software and machine learning knowledge to build scalable, reliable, and easy-to-use machine learning workflows.
- You will enable machine learning teams to perform scalable training, evaluation, inference, debugging, and monitoring, both in the cloud and on premises.
- You will enable software engineers across the company to use machine learning solutions.
- You will work closely with related teams, including Search, Ads, Personalization & Recommendation and other teams across Instacart to ensure that applications that require ML services work seamlessly.
QUALIFICATIONS
- Background in Computer Science, Math, Statistics, or a related field.
- 5+ years of industry experience building ML infrastructure at scale (3+ years for junior-level candidates)
- Proficient in Python or C++. Experience writing and maintaining high-quality production code
- Deep knowledge of the Linux operating system and proficiency in Bash
- Experience using Kubernetes (or similar orchestration platforms), Kafka (or similar streaming platforms), and low-latency data management systems such as Redis, RocksDB, and DynamoDB
- Knowledge of machine learning concepts and fundamentals
- Working knowledge of automation platforms such as Jenkins, Ansible, and Terraform
- Working knowledge of the complete machine learning lifecycle using frameworks such as TensorFlow, Keras, and scikit-learn
- Experience using AWS and/or other major cloud platforms to build data-intensive infrastructure, especially for machine learning applications
- Working knowledge of publicly available ML platforms such as SageMaker, Kubeflow, and MLflow (Preferred)
- Contribution to, and interaction with, open source projects in big data and distributed systems is a bonus