Software Engineer - Web Reliability

3+ months ago · Boston, MA / Remote

About Datadog:

We're on a mission to build the best platform in the world for engineers to understand and scale their systems, applications, and teams. We operate at high scale (trillions of data points per day), providing always-on alerting, metrics visualization, logs, and application tracing for tens of thousands of companies. Our engineering culture values pragmatism, honesty, and simplicity to solve hard problems the right way.


The team:

The Web Reliability Engineering (WRE) Team drives the platformization of programmatic APIs at Datadog, which enables our product teams to build safe and reliable software. The engineers on this team use a blend of software engineering and infrastructure to create uniformity and standardization for external and internal APIs. As the owners of our high-scale web server, the reliability engineers on this team manage the deployment and lifecycle of our APIs, giving our product teams the tools they need to build software that can scale with our growing customer base.



We are a globally distributed team with US offices in New York (HQ), Boston, Denver, San Francisco, and South Bay, and international offices in Paris, Dublin, London, Madrid, the Netherlands, and Singapore. About 33% of our engineering team are remote.


Datadog values people from all walks of life. We understand that not everyone will meet these requirements on day one. If you’re passionate about reliability engineering and want to grow these skills but don’t meet all of these qualifications, we encourage you to apply.


You will:

  • Provide internal tooling and frameworks that empower teams to develop, maintain, and manage web-facing applications.
  • Explore new ways to strengthen, automate, deploy, and manage our web-facing infrastructure.
  • Define the future of our API platform and supporting infrastructure.


Requirements:



  • 3+ years of experience contributing to a software or platform engineering team
  • Experience with distributed web applications and running 24/7 production environments
  • Past work operating a large site or infrastructure
  • You value correctness and efficiency; you leave no stone unturned when diagnosing production issues
  • You handle infrastructure with code because automation lets you focus on the more difficult and rewarding problems


Bonus points:

  • You’ve built large-scale, web-facing applications in a multi-cloud, multi-region environment (we use AWS, GCP, and Azure)
  • Experience working with distributed systems infrastructure



This is a remote position.


Equal Opportunity at Datadog:

Datadog is an Affirmative Action and Equal Opportunity Employer and is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements.


Your Privacy:

Any information you submit to Datadog as part of your application will be processed in accordance with Datadog’s Applicant and Candidate Privacy Notice.

Job ID: 2241561