Data Center Tooling Engineer
Facebook's mission is to give people the power to build community and bring the world closer together. Through our family of apps and services, we're building a different kind of company that connects billions of people around the world, gives them ways to share what matters most to them, and helps bring people closer together. Whether we're creating new products or helping a small business expand its reach, people at Facebook are builders at heart. Our global teams are constantly iterating, solving problems, and working together to empower people around the world to build community and connect in meaningful ways. Together, we can help people build stronger communities. We're just getting started.
Facebook is seeking a forward-thinking, experienced Data Center Operations Tooling & Automation Engineer to join the Data Center Site Operations team. Our data centers, and the tens of thousands of servers installed in them, are the foundation upon which our rapidly scaling infrastructure efficiently operates and upon which our innovative services are delivered. Facebook is at the leading edge of the global data center industry in terms of both how data centers are designed and how they are operated. This person should enjoy working in a fast-paced environment where adaptability and flexibility will be key to their success. This position is based in Prineville, OR.
The candidate we seek is a forward-thinking IT professional with deep experience applying and developing software tooling and automation solutions to address complex operational issues. The ideal candidate should have a strong software architecture background and be comfortable working independently with little supervision and setting global direction. They should also be a natural collaborator, able to distill the demands of stakeholders and subject matter experts and translate these into long-term engineering solutions. The successful candidate will be a leader, capable of providing technical guidance and mentorship to drive continuous improvement in global operational processes and tooling. Extensive knowledge of managing servers, programming/scripting, and performing complex projects in a large-scale, distributed data center environment is an advantage.
- Act as subject matter expert on operational processes and workflows, and their supporting automation and tooling.
- Collaboratively design, develop and execute software-based automation and tooling solutions to drive global operational processes and efficiency.
- Work with our Engineering and Operations teams to evaluate and recommend tools, technologies and processes to ensure the highest quality operational tooling and platforms.
- Collaborate with stakeholders, functional owners and subject matter experts to interpret business and operations needs and articulate how they can be addressed in partnership with engineering and programs teams.
- Find opportunities to globally improve and innovate in key areas such as server integration and repairs, documentation and standardization, tooling and automation, Data Center design and capacity planning.
- Act as remote member of project teams developing new tools or enhancing existing ones, together with our engineering teams in Menlo Park, CA and Dublin, Ireland.
- Gather and define requirements from the Data Center teams, and act as the liaison between these and the engineering teams on technical project matters.
- Drive tooling improvements through prioritization in tooling roadmaps, in partnership with tooling and automation program managers.
- Work as technical lead globally with cross-functional teams on large-scale data center projects and initiatives.
- Lead global investigations of complex technical matters spanning multiple disciplines such as Hardware, Linux, Networking and Power & Cooling.
- Create and influence roadmaps based on operational escalations and scaling issues for tooling improvements and further automation.
- Build strong relationships with other groups within engineering and/or across the company. Actively solicit feedback from related teams, and use that feedback to improve tooling efficiency as infrastructure scales.
- Ability to travel up to 30% required.
- Master's degree in Computer Science or Computer Engineering, or commensurate experience.
- 7+ years of experience designing and building software applications.
- Experience in processing and analyzing large sets of data.
- Experience with version control systems (VCS) such as Git and SVN.
- Experience interacting with SQL databases and distributed storage systems such as Hadoop.
- Knowledge of networking principles and technologies, protocols and standards.
- Experience managing multiple projects on overlapping schedules, and time management experience.
- Experience working individually as well as in groups on a regular basis.
- Experience working independently within a multi-disciplinary team of software and operations engineers.
- Communication experience.
- Large-scale data center environment experience, including deep system knowledge of Linux, Server Hardware, networking, network protocols, supply chain and Data Center automation.
- Experience working in Data Center environments, and solid understanding of key infrastructure commonly found in Data Centers such as cooling, power distribution and fiber-optic cabling.
- Knowledge of all aspects of large-scale supply chain, logistics and asset management in a Data Center environment.