Engineering Team Lead - Site Reliability
- New York, NY
At Datadog, we’re on a mission to build the best monitoring platform in the world. We operate at high scale—trillions of data points per day—providing always-on alerting, metrics visualization, logs, and application tracing for tens of thousands of companies. Our engineering culture values pragmatism, honesty, and simplicity to solve hard problems the right way.
The Site Reliability teams at Datadog are responsible for keeping our high-volume, low-latency environments performing around the clock. These teams collaborate closely with our product engineers so that Datadog can monitor millions of servers and containers and our customers always have dependable, actionable data at their fingertips. You’ll be responsible for shaping the infrastructure of our data-intensive, real-time services as we continue to grow at petabyte scale.
As an Engineering Team Lead for the SRE team, you will manage a team of engineers, own significant chunks of our architecture, design and build systems at scale, and shape product decisions. You'll work on challenging projects, make an impact, and grow as an engineer and a lead.
- Solve a scaling bottleneck in a critical service
- Mentor other engineers on your team
- Design a new service and write an architecture RFC
- Deploy a new feature to production, progressively rolling it out with feature flags
- Investigate and fix a production issue from a service your team owns
- Plan the most important projects to work on next
- You have been building applications for 4+ years and know the systems you’ve worked on from top to bottom
- You have significant backend programming experience
- You have managed a team of software engineers
- You have architected, built, and operated distributed systems to solve problems at high scale
- You have a BS/MS/PhD in a scientific field or equivalent experience
- You want to work in a fast-paced, high-growth startup environment that respects its engineers and customers
- You've shipped complex projects with teams of engineers
- You've worked at high scale with systems like Redis, Cassandra, Kafka
- You have significant experience with Go, C, or Python
Is this you? Let's chat!