Responsibilities
About the Team
The Datacenter Infrastructure Engineering team supports the company's fast growth by building and operating hyperscale datacenters. The team manages the end-to-end lifecycle of the server fleet, providing cloud solutions and various infrastructure services and ensuring that they are scalable and reliable.
About the Role
The team is looking for candidates with strong technical infrastructure backgrounds who are passionate about engineering excellence and about providing effective, sustainable solutions with an eye to the future. You understand the challenges associated with deploying infrastructure at scale and have experience in designing systems and platforms that set the global standard for performance, availability, security, and cost. The work you do will influence processes globally.
Overview
As a Data Center Operations Engineer, you will play a critical role in maintaining the heart of our infrastructure. You will ensure our data centers operate efficiently, securely, and without interruption. Your responsibilities will include infrastructure monitoring, troubleshooting, and collaborating with various teams to uphold the highest standards of performance. You will work on server-related issues, contribute to the server rack lifecycle, and assist in building robust computing and storage environments. Success in this role requires excellent communication skills, adaptability, and the ability to work both independently and collaboratively. Expertise in networking, scripting, and hardware repair is highly valued.
Key Responsibilities
- Troubleshooting and Issue Resolution: Act as the primary point of contact for resolving complex operational issues within the data center, minimizing downtime. Implement long-term solutions to reduce ticket volume and improve system reliability.
- Infrastructure Management: Oversee the operation and maintenance of critical data center infrastructure, including servers, networking equipment, storage, and cooling systems, ensuring high availability and optimal performance.
- Monitoring: Continuously monitor the data center environment's performance, security, and health using various tools. Participate in on-call rotations.
- Data Center Maintenance: Perform routine maintenance tasks, including firmware updates, hardware replacements, and software patching.
- Incident Management: Lead troubleshooting efforts during outages, coordinate with teams to restore services quickly, manage incident escalation, conduct root-cause analysis, and lead post-mortem reviews to prevent recurrence.
- Security Compliance: Enforce data center security protocols, including access controls, firewalls, and physical security measures.
- Documentation: Develop and maintain clear, up-to-date documentation of data center layouts, configurations, and maintenance schedules. Ensure all procedures align with current standards.
- Leadership and Collaboration: Represent the team in collaboration with engineering, network, and IT teams. Provide guidance to entry-level engineers, fostering a collaborative environment.
- Project Management and Support: Manage data center projects, including upgrades, migrations, and infrastructure improvements. Develop timelines, manage resources, and ensure projects are completed on time and within budget.
- Process Improvement: Identify and implement improvements to operational processes to enhance workflow, reduce costs, and improve system performance. Pilot new procedures and processes to ensure smooth operation.
Qualifications
Minimum Qualifications
- Skilled in managing physical network architecture, including cabling, hardware setup, and data flow design.
- Familiar with network protocols such as TCP/IP, BGP, and OSPF; server virtualization technologies; and Optical Transport Network (OTN) systems, including installation, maintenance, and troubleshooting.
- Strong understanding of data center security protocols and compliance standards.
- Experienced in cross-team collaboration, communication, leadership, and mentoring team members.
- Proficient with ticketing systems such as Remedy, GDCO, or Jira; experienced in incident management, reporting, and escalation processes.
- Able to lift equipment up to 50 lbs and travel reliably between sites as needed.
Preferred Qualifications
- 3 years of data center experience in a fast-paced environment, including Linux proficiency, project management, hardware lifecycle management, and infrastructure operations.
- Bachelor's degree in IT, Computer Science, or a related field (or equivalent experience).
- Certifications such as CompTIA Server+, Cisco CCNA, or equivalent.
- Understanding of cloud infrastructures like AWS, Azure, or GCP.
- Experience with ITIL best practices.
- Strong multitasking skills while meeting established Service Level Objectives (SLOs).
- Strong team management skills with experience in leading and developing high-performance teams.
- Experience in drafting and reviewing post-mortems and lessons learned.