Senior Site Reliability Engineer

New York

Sailthru is one of the fastest-growing SaaS companies in NYC. Our retail and publishing customers are among the largest organizations worldwide, and they use our platform to provide a connected customer experience across email, web and mobile. We use data science to drive predictive marketing. We’re a technology team that:


  • Is composed of small collaborative teams across engineering, data science, data platforms, and ops
  • Solves challenges that make a real impact on the day-to-day operations of our 400 customers
  • Scales our platforms to handle billions of monthly inbound and outbound messages
  • Works in a leading-edge, technology-focused environment
  • Makes significant contributions to the scalability of our technology and has a voice in the direction of our product and operations


How we work

  • We share a 24x7 on-call rotation and we’re passionate about keeping it uneventful
  • We work in 2-week sprints, focused on improving the performance and reliability of our existing distributed systems or building new monitoring and alerting capabilities
  • We are the first line of response for system level incidents and work to quickly assess and resolve them

Who you are

  • You have experience working at scale with distributed systems in production
  • You’re fascinated by reliability and by helping others get better at it through application metrics, monitoring, alerting, logging, and building for resiliency
  • You’re interested in building products and tooling that support a diverse set of technologies
  • You're calm under pressure, and use data to make decisions and communicate clearly


How you’ll grow

  • You’ll get to work in a production environment of hundreds of machines across physical and cloud infrastructure
  • You’ll contribute to a toolset that supports our production workflow and helps other people within Sailthru build, test and deploy
  • You’ll learn how to make and implement architecture decisions that make systems more stable and performant
  • You'll build and help other teams build monitored, performant, and reliable applications


What we’d like

  • Multiple years (3+) of experience in an SRE or DevOps role
  • Multiple years (5-7) of experience in Java and PHP
  • Multiple years (3+) of production experience with SQL and NoSQL platforms
  • Experience monitoring, analyzing, and tuning applications for scalability, performance, and resiliency
  • Experience with Ansible is a plus

