Site Reliability Engineer

We are looking for a Site Reliability Engineer (SRE) to help scale our SaaS platform and improve its stability. If you have strong web application development knowledge, experience designing and implementing scalable systems, familiarity with networking fundamentals, and a disdain for tedious manual work, then we want you on our team! We deal with many different technologies, so a desire to learn and a hunger to work on challenging projects are a must. Our team troubleshoots and modifies production Java and Python web applications, interacts with RESTful APIs and SQL databases, automates common procedures, creates internal tooling, repairs systems and determines the root cause of outages, advises production teams on code optimizations, and creates meaningful alerts so that we can restore production systems before problems reach our customers. We write software solutions rather than just applying band-aids.
 
 
Responsibilities
·       Design, develop, and improve application and server logging, metrics, and monitoring
·       Identify toil, document the fix process, and automate it away
·       Triage and troubleshoot production defects across “the stack”
·       Create actionable alerts so that problems can be fixed before they cause system outages
·       Facilitate communication between Engineers, DevOps, and Business
·       Participate in on-call rotations
 
Skills
·       Comfortable reading and writing Python and Java in the context of web development
·       Knowledgeable about technical and philosophical principles of system design
·       Knows the ins and outs of RESTful APIs and web requests
·       Strong Linux background, especially use of CLI tools
·       Able to rapidly identify and mitigate problematic infrastructure and application code
·       Strong communication skills and an ability to explain production outages and the remediation efforts taken clearly and concisely to non-technical parties
 
Technologies We Use
·       Python
·       Java
·       SQL/NoSQL
·       Git (Bitbucket)
·       Ansible
·       Jenkins
·       Kafka
·       RabbitMQ
·       Kubernetes
·       AWS
·       Elasticsearch
·       Graylog
·       Logstash
·       Prometheus
·       Grafana
·       Kibana
