Sailthru

Senior Site Reliability Engineer

Sailthru is one of the fastest-growing SaaS companies in NYC. Our retail and publishing customers are among the largest organizations worldwide, and they use our platform to provide a connected customer experience across email, web and mobile. We use data science to drive predictive marketing. We’re a technology team that:

  • Is composed of small collaborative teams across engineering, data science, data platforms, and ops
  • Solves challenges that make a real impact on the day-to-day operations of our 400 customers
  • Scales our platforms to handle billions of monthly inbound and outbound messages
  • Works in a leading-edge, technology-focused environment
  • Makes significant contributions to the scalability of our technology and has a voice in the direction of our product and operations

How we work

  • We work in 2-week sprints, focused on improving the performance and reliability of our existing distributed systems or building new monitoring and alerting capabilities
  • We are the first line of response for system-level incidents and work to quickly assess and resolve them
  • We share a 24x7 on-call rotation and we’re passionate about keeping it uneventful

Who you are

  • You have experience working at scale with distributed systems in production
  • You’re fascinated by reliability and helping others get better at it through application metrics, monitoring, alerting, logging, and building for resiliency
  • You’re interested in building products and tooling that support a diverse set of technologies
  • You’re calm under pressure, use data to make decisions, and communicate clearly

How you’ll grow

  • You’ll get to work with a production environment with hundreds of machines across physical and cloud infrastructure
  • You’ll contribute to a toolset that supports our production workflow and helps other people within Sailthru build, test and deploy
  • You’ll learn how to make and implement architecture decisions that make systems more stable and performant
  • You’ll build and help other teams build monitored, performant, and reliable applications

What we’d like

  • 3+ years of experience in an SRE or DevOps role
  • 5+ years of experience in Java required; knowledge of PHP a plus
  • 3+ years of production experience with SQL and NoSQL platforms
  • Experience with Ansible is a plus
  • Experience monitoring, analyzing, and tuning applications for scalability, performance, and resiliency

Job ID: 522464
Employment Type: Other

This job is no longer available.