Lead Software Engineer, Site Reliability Engineering (IoT Kubernetes Chick-fil-A Cloud)
Chick-fil-A's Site Reliability Engineering team exists to ensure that IT products are delivered with well-engineered and reliable designs. This group of passionate engineers works alongside enterprise architects, software engineers, and security architects as members of product delivery teams. The focus of the SRE team is to be a force multiplier on software engineering teams and to ensure that they spend the maximum time delivering differentiating business functionality.
As an SRE, you will have the responsibility to influence the designs of Chick-fil-A's IT products and build self-healing capabilities into them. You will make the machines work for you and ensure that we squeeze every last drop of system performance out of the product.
See us in action:
Watch this video and read this article about how we use Kubernetes to deploy microservices at the edge in Chick-fil-A restaurants.
Responsibilities:
- Establish Error Budgets for products by monitoring SLIs, measuring them against SLOs, and publishing the results to a dashboard
- Design, build and implement software features for the product that increase reliability, availability and performance
- Own the deployment pipeline to production, including establishing and maintaining the product's CI/CD pipeline
- Identify and solve critical problems and build automation to prevent their recurrence. Drive blameless post-mortems with the product team and use the Error Budget to establish priorities for any necessary changes. The goal is to have automated responses to all known failure scenarios.
- Review architectural designs for products to ensure reliable and performant design patterns are implemented
- Ensure availability, performance, efficiency, change management, monitoring, emergency response, and capacity planning are covered for delivered products
- Develop engineering specializations (for example, networking, Linux OS, security, data persistence, or containers) and consult with software engineering teams
- Participate in on-call rotation in support of critical products
Minimum Qualifications:
- 3 years' work experience in a similar job role
- Proven software engineering experience in Java or Python
- Familiarity with running and scaling distributed software systems (load balancing, high availability, systems monitoring, etc.)
- Proven application production support experience
- Strong analytical and problem-solving skills
- Passion for automating rote tasks
- Bachelor's degree, preferably in Computer Science or related engineering field of study
Preferred Qualifications:
- Experience with the AWS ecosystem
  - Preferred services and tools: API Gateway, Boto3, Lambda, Elastic Beanstalk, RDS, DynamoDB, ECS, CloudFront, Athena
- Experience administering and/or designing SQL and NoSQL databases
- Understanding of networking: TCP, UDP, firewalls, DNS, OSI layers, etc.
- Experience with Docker
- Experience with log analysis and monitoring tools such as Splunk or Logstash