Engineering Manager - Machine Learning
What you'll do:
Job Responsibilities
The Manager will be responsible for:
• Lead and manage a team of engineers to deploy and monitor machine learning models in production.
• Work with data engineers to design data engineering pipelines and run robust ETL processes that ensure reliable, high-quality data for analytics and ML workloads.
• Collaborate with cross-functional teams, including data science, engineering, and operations, to understand business requirements and translate them into scalable ML solutions.
• Architect and implement end-to-end machine learning pipelines for model training, testing, deployment, and monitoring.
• Establish best practices and standards for model versioning, deployment, and monitoring to ensure reliability, scalability, and performance.
• Implement automated processes for model training, hyperparameter tuning, and model evaluation using tools such as Weights & Biases, MLflow, Kubeflow, or similar.
• Design and implement infrastructure for scalable and efficient model serving and inference, leveraging technologies such as Kubernetes, Docker, and serverless computing.
• Develop and maintain monitoring and alerting systems to detect model drift, performance degradation, and other issues in production.
• Provide technical leadership and mentorship to team members, fostering their professional growth and development.
• Stay current with emerging technologies and industry trends in machine learning engineering, and evaluate their potential impact on our processes and infrastructure.
• Collaborate with product management to define requirements and priorities for machine learning model deployments and validation, ensuring alignment with business goals and objectives.
• Implement monitoring and logging solutions to track model performance metrics, resource utilization, and system health, enabling proactive issue detection and resolution.
• Lead efforts to optimize resource utilization and cost-effectiveness of machine learning infrastructure, including compute resources, storage, and data transfer.
• Stay abreast of advancements in machine learning technologies, evaluating their applicability and potential impact on our AI Operations strategy and roadmap.
• Foster a culture of innovation, collaboration, and continuous improvement within the AI Operations team, encouraging experimentation and learning from failures.
Qualifications:
- B.Tech / M.Tech in Computer Science, Electronics, or a related field
- 8+ years of experience
Skills:
- Machine Learning, Software Development
- Research and Development, Technology Strategy, Global Project Management, Team Management, Mentoring, Risk Management
Desired Skills:
• Master's or Bachelor's degree in Computer Science, Engineering, or a related field.
• 8+ years of experience in software engineering, data engineering, or related roles, with at least 2 years in a managerial or leadership role.
• Experience designing and maintaining scalable data engineering pipelines and running robust ETL processes to ensure reliable, high-quality data for analytics and ML workloads.
• Previous experience in a leadership or management role, with a track record of successfully leading technical teams and delivering high-impact projects.
• Experience with version control systems (e.g., Git) and collaboration tools (e.g., GitHub, GitLab) for managing code repositories and facilitating team collaboration.
• Familiarity with infrastructure as code (IaC) tools such as Terraform or CloudFormation for provisioning and managing cloud resources.
• Knowledge of software development methodologies (e.g., Agile, DevOps) and best practices for building scalable and reliable software systems.
• Ability to effectively communicate technical concepts and solutions to non-technical stakeholders, including executives, product managers, and business users.
• Strong proficiency in Python, Java, and related IDEs.
• Awareness of machine learning concepts, algorithms, and frameworks (e.g., TensorFlow, PyTorch, scikit-learn).
• Experience with cloud platforms and services (e.g., Azure, AWS, GCP) for building and deploying machine learning applications.
• Proficiency in containerization technologies (e.g., Docker) and orchestration tools (e.g., Kubernetes).
• Hands-on experience with MLOps tools and platforms such as Weights & Biases, MLflow, Kubeflow, TFX, or similar.
• Experience with DevOps and DevSecOps tools and practices.
• Strong problem-solving skills and ability to troubleshoot complex issues in production environments.
• Excellent communication and collaboration skills, with the ability to work effectively in cross-functional teams.

Perks and Benefits
Health and Wellness
- Health Insurance
- Health Reimbursement Account
- Dental Insurance
- Vision Insurance
- Life Insurance
- Short-Term Disability
- Long-Term Disability
- FSA
- HSA With Employer Contribution
- Fitness Subsidies
- On-Site Gym
- Pet Insurance
- Mental Health Benefits
- Virtual Fitness Classes
Parental Benefits
- Birth Parent or Maternity Leave
- Adoption Assistance Program
Work Flexibility
- Flexible Work Hours
- Remote Work Opportunities
- Hybrid Work Opportunities
Office Life and Perks
- Casual Dress
- On-Site Cafeteria
Vacation and Time Off
- Paid Vacation
- Paid Holidays
- Personal/Sick Days
- Leave of Absence
- Summer Fridays
Financial and Retirement
- 401(K) With Company Matching
- Performance Bonus
- Relocation Assistance
- Financial Counseling
Professional Development
- Tuition Reimbursement
- Promote From Within
- Mentor Program
- Shadowing Opportunities
- Access to Online Courses
- Internship Program
- Work Visa Sponsorship
- Leadership Training Program
- Associate or Rotational Training Program
Diversity and Inclusion
- Diversity, Equity, and Inclusion Program
- Employee Resource Groups (ERG)