Software Engineer - API Reliability

3+ months ago - Paris, France / Remote

About Datadog:

We're on a mission to build the best platform in the world for engineers to understand and scale their systems, applications, and teams. We operate at high scale (trillions of data points per day), providing always-on alerting, metrics visualization, logs, and application tracing for tens of thousands of companies. Our engineering culture values pragmatism, honesty, and simplicity to solve hard problems the right way.


The Team:

At Datadog, our API Platform Engineers are strong developers with a solid background in systems, blending deep and practical knowledge of infrastructure design and deployments. They are on the front line, maintaining and expanding the capabilities of our web-facing applications and infrastructure, and they focus on the stability, scalability, observability, deployments, and developer experience around public APIs.

The Opportunity:

We’re looking for Software Engineers to join our new API Reliability Engineering team. Today, Datadog runs in multiple datacenters, with hundreds of engineers contributing to our APIs. As we continue to grow, we have found challenges specific to the tooling our teams use to develop our APIs. This team will build a reliable, uniform set of tools that developers can depend on and that gives them accurate observability into their APIs.


You Will:

  • Provide internal tooling and frameworks that empower teams to develop, maintain and manage web-facing applications.
  • Codify proven practices to improve developer experience and service reliability.
  • Explore new ways to strengthen, automate, deploy and manage our web-facing infrastructure.
  • Define the future of our API platform and supporting infrastructure.
  • Enable engineering teams to self-serve and self-report on day-to-day operations involving web-facing applications.

Who You Are:

  • You have experience contributing to a software engineering team.
  • You have experience with distributed web applications and with running 24/7 production environments.
  • You have production experience with the components of distributed web applications, e.g. HAProxy, ALB, Redis, authn/authz.
  • You have a track record as an engineer operating a large site.
  • You value correctness and efficiency; you leave no stone unturned when diagnosing production issues.
  • You handle infrastructure with code because automation lets you focus on the more difficult and rewarding problems.

Bonus Points:

  • You have experience building large-scale web-facing applications in a heterogeneous environment.
  • You are fully fluent in Python or Go.



Equal Opportunity at Datadog:

Datadog is an Affirmative Action and Equal Opportunity Employer and is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements.


Your Privacy:

Any information you submit to Datadog as part of your application will be processed in accordance with Datadog’s Applicant and Candidate Privacy Notice.

Job ID: 2193417