Hardware Sustaining Engineer, Analytics
(Menlo Park, CA)
Facebook’s mission is to give people the power to share, and make the world more open and connected. Through our growing family of apps and services, we’re building a different kind of company that helps billions of people around the world connect and share what matters most to them. Whether we’re creating new products or helping a small business expand its reach, people at Facebook are builders at heart. Our global teams are constantly iterating, solving problems, and working together to make the world more open and accessible. Connecting the world takes every one of us—and we’re just getting started.
Facebook is seeking a Hardware Analytics Engineer to join our RTP (Release To Production) – Infrastructure Sustaining team. This team is responsible for the hardware health of all of Facebook’s servers. Our servers and datacenters are the foundation on which our rapidly scaling infrastructure operates, so hardware health is critical to delivering our services. Engineers on the Infrastructure Sustaining team work closely with compute, storage, network, and datacenter operations teams to build data systems and deliver insights that keep the hardware health of the entire fleet at a high level. This position is full-time and located in our Menlo Park office.
- Interface with internal hardware engineers, software engineers, and operations teams to understand system architectures and failure modes
- Proactively create experiments and data visualizations to detect and diagnose hardware health issues, focusing on systemic solutions
- Develop data pipelines and discover insights that relate hardware and data center parameters to server failures
- Troubleshoot, diagnose, and root-cause system failures, isolating the failing components and failure scenarios while working with internal and external stakeholders
- Share insights with stakeholders and software teams to develop architectures to handle server failures based on hardware health data
- Master’s degree in engineering, operations research, or statistics (Ph.D. or other advanced degree preferred)
- Minimum of 5 years of work experience in infrastructure (data center or server infrastructure experience preferred)
- Experience with Linux, scripting, programming and tooling
- Must be a hands-on hardware engineer with experience in CPU, memory, and storage technologies
- Familiarity with the latest server architectures and components
- Strong troubleshooting and data tooling skills, with expertise in data analysis, big data, and building analytical models and visualizations
- Must have excellent communication skills