Engineering Manager - Multicloud Reliability

    • New York, NY

About Datadog:

We're on a mission to build the best platform in the world for engineers to understand and scale their systems, applications, and teams. We operate at high scale (tens of trillions of data points per day), providing always-on alerting, metrics visualization, logs, application tracing, synthetics, and more for thousands of companies. Our engineering culture values pragmatism, honesty, and simplicity in solving hard problems.


The opportunity:

We’re looking for an Engineering Manager to build our Multicloud Systems Reliability Engineering teams. Today, Datadog runs on a few cloud vendors in a handful of regions. As we work toward becoming the first-choice telemetry platform no matter where our customers run, we need to expand the footprint of our own infrastructure. Each cloud provider brings enough challenges of its own that we need to start building a focused core reliability team for each one.

A successful candidate will do much more than manage the infrastructure, tools, and people that enable this company-wide multicloud expansion: this individual will lead the team that builds the automation and services that make it possible to operate a massive distributed system across multiple clouds. This role requires a combination of technical depth and operational expertise, as well as the ability to recruit and manage a team of systems reliability and software engineers. This individual should have extensive experience managing system operations and developing software using agile practices and methodologies, and should be fluent in DevOps and Site Reliability Engineering principles and practices.

Because this effort is a major company priority, the role is highly visible and requires good business acumen when working with customers and senior management. It also requires strong technical communication skills: you will interact with a broad cross-section of the Datadog organization as you trace through complex, interconnected systems to improve the organization's overall operational posture on each cloud.

We will support cloud solutions for U.S. FedRAMP Moderate customers. This requires candidates to be a U.S. citizen or national, U.S. permanent resident (i.e., current Green Card holder), or lawfully admitted into the U.S. as a refugee or granted asylum.


Requirements:

  • 3+ years of experience managing operations / reliability / development team(s) in a distributed cloud computing environment
  • 8+ years of experience in software operations / reliability / development
  • Experience with AWS platforms, services, and design patterns
  • Bachelor's degree in Computer Science, Engineering, or other technical discipline (or equivalent work experience)
  • You want to work in a fast-paced, high-growth environment
  • Excellent written and verbal communication skills with the ability to present complex technical information clearly and concisely to a variety of audiences


Bonus points:

  • Experience with Azure or Google Cloud platforms, services, and design patterns
  • Government and/or defense industry experience
  • Active TS/SCI security clearance

Datadog is a SaaS-based monitoring and analytics company that provides a unified view of IT infrastructure for multiple engineering teams.
