Senior Infrastructure Engineer, Site Reliability

The Site Reliability Engineering team is charged with ensuring and improving the availability of the site through better tools, processes, and communication. SREs are exposed to the whole technology stack, from code down to the infrastructure, so that they can drive improvements in how the stack is designed, deployed, and operated. We want to make it possible for product teams to launch in production with certainty by guiding them on how to leverage the power of our Platform to achieve a high level of quality and reliability.

What the Job Entails

  • Ensure that uptime and performance SLOs are met, and improve on them, through better tools, processes, and collaboration
  • Write tools in PHP, Scala, Python, and shell script to automate all the things
  • Drive a constant feedback cycle between product and platform teams throughout the product lifecycle to identify error-prone or inefficient processes for automation and to produce deterministic behavior in services
  • Conduct reviews of code and technical designs with an eye towards reliability, scalability, and observability
  • Lead incident resolution to reduce downtime and increase clarity in communications
  • Participate in a 24x7 on-call support rotation

Our Ideal Candidate

  • 4+ years of professional experience in server-side website development or 6+ years in a DevOps role
  • Strong foundation in OOP, design patterns, algorithms, and programming languages
  • Comfortable with operations and decision making in large-scale production environments
  • Deep understanding of Linux (we run CentOS), networks, MySQL, distributed caches, configuration management, and storage in production environments
  • Track record of root cause analysis in fast-paced production environments
  • Strong experience scaling applications and application performance
  • World-class written and verbal communicator
  • BS degree in Computer Science or a similar technical field
