Lead, Data Center Engineering
- Fremont, CA
Facebook's mission is to give people the power to build community and bring the world closer together. Through our family of apps and services, we're building a different kind of company that connects billions of people around the world, gives them ways to share what matters most to them, and helps bring people closer together. Whether we're creating new products or helping a small business expand its reach, people at Facebook are builders at heart. Our global teams are constantly iterating, solving problems, and working together to empower people around the world to build community and connect in meaningful ways. Together, we can help people build stronger communities - we're just getting started.
Facebook's data centers are the foundation on which our rapidly growing infrastructure operates efficiently and our innovative services are delivered. We are seeking an Engineering Director to join the operations team responsible for the mission-critical IT hardware and tooling in our data centers. The successful candidate will ensure we are delivering the right engineering solutions to our data center teams, with emphasis on crafting the right tools and landing the right production hardware to meet our needs in performance, quality, reliability, and serviceability.
As part of the team operating the infrastructure within the data centers, the Engineering Director will need close working relationships with the teams responsible for the design and release of production systems, collaborating on verification testing, working through production quality and performance issues, and sharing high-quality feedback that reflects business needs. The candidate will be responsible for specifying the right tools to support operations, from ticketing and workflow systems to the application of machine learning to failure diagnosis, translating fleetwide data into actionable insights and proactive indicators of fleet health.
The SiteOps Director will champion engineering initiatives that bring together field and engineering teams to deliver proactive, industry-leading operations at hyperscale. Success will be gauged by meaningful improvements to the quality, efficiency, and effectiveness of our Capacity & Production Operations teams. The SiteOps Director will drive innovative initiatives; scale teams and operations; develop talent; and deliver the right tools, processes, and standards to enable Facebook's growth. This position is full-time, based in Fremont, California.
- Build and guide a lean, world-class sustaining engineering team, delivering insights, practices, and tools to proactively manage a hyperscale fleet of servers distributed across the globe
- Build deep relationships with key Infra hardware and software engineering teams, technical specialists, program managers, and data center operations teams, and leverage these relationships to drive development of efficient and effective workflows and tools, and of the right high-quality, high-performance infrastructure
- Lead and influence multi-disciplinary teams of deep technical experts and engineers in disciplines core to server operations, including information security, new product introduction, long-term data storage technologies, IaaS (Infrastructure as a Service), machine learning, and software automation
- Set and champion engineering strategy and technical direction, and establish engineering best practices across the Site Operations team. Plan and deliver engineering strategy efficiently and effectively, and ensure it is well communicated, fully understood, and fully supported by key stakeholders and partners
- Ensure that engineered solutions take a best-practice approach to user privacy, security, and data protection, both logically and physically, in our data centers, including the integration of legal and compliance-related requirements into our tooling and automation
- Ensure tight feedback between operations and design, representing full-lifecycle operability, serviceability, and production performance and quality during design
- Become an expert in Facebook's site operations infrastructure, tools, systems, and data
- Team with groups across Infra to develop dashboards and reporting that build a deep understanding of a complex technical environment and improve the state of the fleet
- Formulate the right metrics, definitions of success, and high-impact engineering initiatives to drive improved quality, effectiveness, efficiency, cost, and timeliness
- Align the global team behind key performance metrics, and be a key driver ensuring rapid, coordinated, and effective responses to issues as they arise
- Anticipate potential engineering and tooling risks and develop strategies to minimize or mitigate them
- Ability to travel 30-40% required
- BS in Engineering
- 10+ years of engineering experience in a mature operations environment
- Experience influencing effectively and working on cross-functional teams to advance the needs of the company, and adapting the team to meet these needs
- Experience working in a fast-paced, hands-on environment
- MS in Engineering