Sailthru is one of the fastest-growing SaaS companies in NYC. Our retail and publishing customers are among the largest organizations worldwide, and they use our platform to provide a connected customer experience across email, web and mobile. We use data science to drive predictive marketing. We’re a technology team that:
- Is composed of small collaborative teams across engineering, data science, data platforms, and ops
- Solves challenges that make a real impact on the day-to-day operations of our 400 customers
- Scales our platforms to handle billions of monthly inbound and outbound messages
- Works in a leading-edge, technology-focused environment
- Makes significant contributions to the scalability of our technology and has a voice in the direction of our product and operations
How we work
- We work in 2-week sprints, focused on improving the performance and reliability of our existing distributed systems or building new monitoring and alerting capabilities
- We are the first line of response for system level incidents and work to quickly assess and resolve them
- We share a 24x7 on-call rotation and we’re passionate about keeping it uneventful
Who you are
- You have experience working at scale with distributed systems in production
- You’re fascinated by reliability and helping others get better at it through application metrics, monitoring, alerting, logging, and building for resiliency
- You’re interested in building products and tooling that support a diverse set of technologies
- You’re calm under pressure, use data to make decisions, and communicate clearly
How you’ll grow
- You’ll get to work with a production environment with hundreds of machines across physical and cloud infrastructure
- You’ll contribute to a toolset that supports our production workflow and helps other people within Sailthru build, test and deploy
- You’ll learn how to make and implement architecture decisions that make systems more stable and performant
- You’ll build, and help other teams build, monitored, performant, and reliable applications
What we’d like
- 3+ years of experience in an SRE or DevOps role
- 5+ years of experience in Java required; knowledge of PHP a plus
- 3+ years of production experience with SQL and NoSQL platforms
- Experience with Ansible is a plus
- Experience monitoring, analyzing, and tuning applications for scalability, performance, and resiliency