Data Center Operations Analytics Engineer
- Richmond, VA
Facebook's mission is to give people the power to build community and bring the world closer together. Through our family of apps and services, we're building a different kind of company that connects billions of people around the world, gives them ways to share what matters most to them, and helps bring people closer together. Whether we're creating new products or helping a small business expand its reach, people at Facebook are builders at heart. Our global teams are constantly iterating, solving problems, and working together to empower people around the world to build community and connect in meaningful ways. Together, we can help people build stronger communities - we're just getting started.
Facebook is seeking a forward-thinking, experienced operations leader to join our Data Center Site Operations team. Our data centers, and the tens of thousands of servers installed in them, are the foundation upon which our rapidly scaling infrastructure efficiently operates and upon which our innovative services are delivered. Facebook is at the leading edge of the global data center industry in terms of both how data centers are designed and how they are operated. This person should enjoy working in a fast-paced environment where adaptability and flexibility will be key to their success. We seek an experienced Operations Analytics Engineer with an advanced understanding of Data Center Operations and Infrastructure. Executing complex projects in a large-scale distributed data center environment is a core skill for this role. Solid communication skills are a requirement for this position.
- Act as key Subject Matter Expert on operation of servers in a hyper-scale data center environment.
- Formulate the right metrics and definitions of success to drive quality, efficiency, cost, and timeliness.
- Align the global team behind these metrics, and be a key driver ensuring rapid, coordinated, and effective responses to issues as they arise.
- Work with data analytics partners to use data mining and statistical techniques to validate or identify operational inefficiencies, exceptions, and fault/event correlation.
- Provide data-driven narratives, determining what is important and what next steps should be taken.
- Build data pipelines, tables and dashboards, and leverage these for analyzing incidents and trends.
- Create a deep understanding of a complex technical environment through data and reporting.
- Create key metrics in areas such as operational efficiency and failure rates, and evolve these over time to match changes to the infrastructure and business requirements.
- Build cross-functional relationships and have the ability to influence policies and procedures to improve global data center operations.
- Help create policies and machine learning algorithms to solve complex business problems.
- Become an expert in Facebook's site operations infrastructure, tools, systems, and data.
- Anticipate potential operational risks and develop strategies to mitigate or minimize them.
- Partner with other organizations and develop dashboards and reporting for different audience levels to improve the state of the fleet.
- Ability to travel up to 30% required.
- BS, BEng or BA in technical field or commensurate experience.
- 10+ years of technical IT experience.
- 5+ years of experience working with data sets and knowledge of relational databases.
- Knowledge of data-driven analysis.
- Experience communicating the results of analysis and insights to cross-functional teams and influencing the strategy of those teams.
- Experience with Data Center Infrastructure and fleet management.
- Experience analyzing and processing large or complex sets of data.
- Experience working individually as well as in small and large groups on a regular basis.
- Communication experience.
- Experience working in a large-scale data center environment.