Senior Site Reliability Engineer

Today • Flexible / Remote

This job is no longer available.

Sailthru is one of the fastest-growing SaaS companies in NYC. Our retail and publishing customers are among the largest organizations worldwide, and they use our platform to provide a connected customer experience across email, web and mobile. We use data science to drive predictive marketing. We’re a technology team that:

Is composed of small collaborative teams across engineering, data science, data platforms, and ops
Solves challenges that make a real impact on the day-to-day operations of our 400 customers
Scales our platforms to handle billions of monthly inbound and outbound messages
Works in a leading-edge, technology-focused environment
Makes significant contributions to the scalability of our technology and has a voice in the direction of our product and operations

How we work

We work in 2-week sprints, focused on improving the performance and reliability of our existing distributed systems or building new monitoring and alerting capabilities
We are the first line of response for system-level incidents and work to quickly assess and resolve them
We share a 24x7 on-call rotation and we’re passionate about keeping it uneventful

Who you are

You have experience working at scale with distributed systems in production
You’re fascinated by reliability and helping others get better at it through application metrics, monitoring, alerting, logging, and building for resiliency
You’re interested in building products and tooling that support a diverse set of technologies
You're calm under pressure, use data to make decisions, and communicate clearly

How you’ll grow

You’ll get to work in a production environment of hundreds of machines across physical and cloud infrastructure
You’ll contribute to a toolset that supports our production workflow and helps other people within Sailthru build, test and deploy
You’ll learn how to make and implement architecture decisions that make systems more stable and performant
You'll build and help other teams build monitored, performant, and reliable applications

What we’d like

3+ years of experience in an SRE or DevOps role
5+ years of experience in Java required; knowledge of PHP a plus
3+ years of production experience with SQL and NoSQL platforms
Experience with Ansible is a plus
Experience monitoring, analyzing, and tuning applications for scalability, performance, and resiliency

Client-provided location(s): Flexible / Remote

Job ID: 522464

Employment Type: OTHER

Perks and Benefits

Health and Wellness
Parental Benefits
Work Flexibility
Office Life and Perks
Vacation and Time Off
Financial and Retirement
Professional Development
Diversity and Inclusion
