Site Reliability Engineer - IT Operations

Join us at Slack! The IT Operations team is looking for its first Site Reliability Engineer to build and manage the infrastructure that supports our growing team. We are looking for an engineer who has a passion for creating streamlined systems with an emphasis on scalable growth. The strongest candidates will have both solid Linux / server management expertise and demonstrated automation chops. Slack counts on our IT Operations team to provide exceptional availability, scalability and security for our internal infrastructure. You'll be expected to learn, grow and help others along their journey as well.

Responsibilities

  • Manage the availability, scalability and performance of Slack’s corporate-facing IT platforms
  • Design and document systems, including writing and reviewing code, to automate away problems within your squad’s domain
  • Diagnose and repair network and application bottlenecks
  • Test and tune network, hardware, and cloud infrastructure configurations to maximize availability and performance
  • Deploy and manage monitoring and diagnostic tools

Requirements

  • Comfortable with scaling production systems and technologies, for example load balancing, monitoring, distributed systems, and configuration management
  • Experience with Unix systems administration including solid scripting skills in Ruby, PHP or Python
  • Experience planning, deploying and running various types of AWS infrastructure (Route 53, S3, EC2, VPC, RDS, etc.)
  • Experience with DevOps tooling (Chef, Puppet, etc.)
  • Experience with deployment and management of open source network resources and tools (OpenVPN, Logstash, Zenoss, Nagios, etc.)
  • BS/BA in a technical field, Computer Science, Mathematics or equivalent work experience
  • Ability to gracefully react to high-priority requirements with little or no notice, providing clear documentation and follow-through
  • Strong organizational and task management skills
  • Ability to balance competing priorities while working alone or within a team
  • Excellent reliability, dependability, and trustworthiness
  • Strong attention to detail and accuracy
  • Great people and communication skills are required! You should be driven to solve people's problems.

Bonus points

  • Work experience in a startup environment
  • Experience with Casper
