Engineering Team Lead - Site Reliability

About Datadog:

We're on a mission to build the best platform in the world for engineers to understand and scale their systems, applications, and teams. We operate at high scale (trillions of data points per day), providing always-on alerting, metrics visualization, logs, and application tracing for tens of thousands of companies. Our engineering culture values pragmatism, honesty, and simplicity to solve hard problems the right way.

The team:

How do you keep a data-intensive, real-time service that monitors hundreds of thousands of servers up and running around the clock?

How do you respond to infrastructure failures or performance issues in a high-volume, low-latency computing environment?

What should the infrastructure look like when Datadog monitors millions of servers and containers? If these are problems you find interesting and want to work on, apply to join the SRE team!

The opportunity:

As an Engineering Team Lead for the SRE team, you will manage a team of engineers, own significant chunks of our architecture, design and build systems at scale, and shape product decisions. You'll work on challenging projects, make an impact, and grow as an engineer and a lead.

You will:

  • Solve a scaling bottleneck in a critical service
  • Mentor other engineers on your team
  • Design a new service and write an architecture RFC
  • Deploy a new feature to production, progressively rolling it out with feature flags
  • Investigate and fix a production issue from a service your team owns
  • Plan the most important projects to work on next

Requirements:

  • You have been building applications for 4+ years and know the systems you’ve worked on from top to bottom
  • You have significant backend programming experience
  • You have managed a team of software engineers
  • You have architected, built, and operated distributed systems to solve problems at high scale
  • You have a BS/MS/PhD in a scientific field or equivalent experience
  • You want to work in a fast-paced, high-growth startup environment that respects its engineers and customers

Bonus points:

  • You've shipped complex projects with teams of engineers
  • You've worked at high scale with systems like Redis, Cassandra, and Kafka
  • You have significant experience with Go, C, or Python

Is this you? Let's chat! 



