Director, Technical Program Management, Machine Learning

Facebook's mission is to give people the power to build community and bring the world closer together. Through our family of apps and services, we're building a different kind of company that connects billions of people around the world, gives them ways to share what matters most to them, and helps bring people closer together. Whether we're creating new products or helping a small business expand its reach, people at Facebook are builders at heart. Our global teams are constantly iterating, solving problems, and working together to empower people around the world to build community and connect in meaningful ways. Together, we can help people build stronger communities - we're just getting started.

The Director of Technical Program Management for Machine Learning will be responsible for managing programs to deliver highly scalable, reliable, and efficient ML platforms for Facebook Infrastructure. The Director will partner closely with the software engineering, hardware engineering, AI/ML engineering, and production engineering teams to drive critical programs across performance, reliability, efficiency, security, usability, and capacity in these areas. The individual will be closely involved in overseeing teams that develop platforms and services giving data engineers, researchers, and ML and analytics teams access to exabytes of data to understand product trends for Facebook and customer needs. A deep understanding of machine learning, hardware and software, and distributed training architectures is required.
The Director of TPM for Machine Learning will have demonstrated experience building, grooming, and mentoring technical program management or engineering teams of 50 or more, and a proven track record of scaling infrastructure in a highly transformative environment. Specifically, she/he will be an innovative, intellectually curious problem solver with deep technical expertise across machine learning, caching, storage, distributed systems, and cloud technologies. This person does not need to have the "correct" answer to everything but, rather, should be able to quickly drive the conversation toward a productive programmatic solution by including the right stakeholders, weighing pros/cons, business needs, and timelines for masterful execution and delivery.


  • Manage portions of Facebook's 24x7, always-available infrastructure to meet high traffic needs, and strive to eliminate downtime and improve the manageability of services.
  • Build and lead a world-class team of TPMs capable of scaling with Facebook through a period of continued high growth while forging tight partnerships with managers and engineers across infrastructure.
  • Demonstrate an ability to structure an organization and optimize it for execution, including attracting top-tier talent and filling gaps in the existing team quickly to accommodate growth.
  • Lead from the front, prioritize, and drive the bigger mission forward by translating vision into results.
  • Manage, mentor, and grow the existing technical program management teams, including managing the performance of those who need more help.
  • Measure and improve the efficiency and effectiveness of processes, building the next level of improvements on those that already work well, and set standards for deployments at scale, infrastructure reliability, security, and scalability.
  • Partner with all technical functions across the company to ensure all organizations are in sync.
  • Continue to improve the thriving engineering culture across all tech functions.
  • Build and lead an organization with customer focus, world-class quality, and effective communication with a focus on decisive, fast-moving solutions, quick and constructive resolutions to conflicts, and a "no barriers" mentality.
  • Serve as an evangelist for the team and overall culture, both internally and externally.
  • Hands-on leadership experience at an established data storage or real-time monitoring company, or at a mid-sized growth-stage company.
  • Experience with machine learning workloads and AI-powered products.
  • Experience communicating the reasoning behind program architectures, including how and why they were run, and the systems put in place to run them.
  • Knowledge of technical program management methodologies.
  • Experience moving technical or engineering programs from concept to completion and experience articulating the impact using metrics, growth examples, return, etc.
  • Experience with Caffe2, distributed learning, and GPUs, leveraging technologies such as Java, Python, Shell, C, C++, Objective-C, Hack (PHP), JavaScript, and Dataswarm.
  • Experience operating production systems at scale, with emphasis on reliability, efficiency, and scalability, and with a product mindset.
  • Experience implementing instrumentation and monitoring solutions.
  • Experience with large-scale commercial data/stream processing systems such as Cloudera, AWS, Google Cloud, and/or Azure.
Facebook is committed to providing reasonable accommodations for qualified individuals with disabilities and disabled veterans in our job application procedures. If you need assistance or an accommodation due to a disability, you may contact us at .
