Staff Reliability Engineer - IE07KE
We're determined to make a difference and are proud to be an insurance company that goes well beyond coverages and policies. Working here means having every opportunity to achieve your goals - and to help others accomplish theirs, too. Join our team as we help shape the future.
The Central Reliability and Automation CoE team is looking for a driven and highly motivated Reliability Engineering Coach to join the team. In this role you will be responsible for guiding and coaching Reliability Engineering teams in implementing best practices related to reliability, scalability, and performance of large-scale software systems and services, both in the cloud and on premises. You will work closely with product owners, reliability engineers, development, operations, and other cross-functional teams to drive a culture of reliability and continuous improvement. As an RE Coach, you will play a crucial role in helping our teams achieve their reliability goals and deliver high-quality services to our customers.
This role will have a Hybrid work schedule, with the expectation of working in an office (Columbus, OH; Chicago, IL; Hartford, CT; or Charlotte, NC) 3 days a week (Tuesday through Thursday).
Key Responsibilities:
- Coaching and Mentoring: Provide guidance and mentorship to RE teams, helping them adopt best practices in reliability engineering, incident management, and performance optimization.
- Performance Monitoring: Oversee the implementation and use of monitoring tools and practices to ensure systems are performing optimally. Provide guidance on setting up and maintaining service-level indicators (SLIs) and service-level objectives (SLOs), and advise on service-level agreements (SLAs), using Critical User Journeys and establishing Error Budgets.
- Training and Development: Develop and enhance the skills and knowledge of RE teams. This includes creating learning materials, conducting workshops, holding open office hours, participating in RE bootcamps and scaling existing RE Teams.
- Process Improvement: Identify areas for improvement in existing processes and workflows. Implement changes to enhance efficiency, reduce downtime, and improve overall system reliability.
- Collaboration: Work closely with development, operations, and other cross-functional teams to ensure alignment on reliability goals and objectives. Facilitate communication and collaboration between teams. Lead the maturity assessment review process for RE Squads.
- Incident Management: Assist in the management of incidents, including root cause analysis, post-mortem reviews, and the implementation of corrective actions to prevent recurrence.
- Automation: Guide and build automation solutions to speed up product delivery and to eliminate toil in reliability engineers' processes.
- Documentation: Maintain comprehensive documentation of processes, best practices, and lessons learned. Ensure that all team members have access to up-to-date information and resources.
- Hands-On Development: Deliver hands-on development work as part of a working Agile team of RE coaches. This could involve improving existing observability tools and processes, providing or enhancing automated solutions, improving existing reliability engineering workflows, or enhancing in-house products and solutions.
Candidate must be authorized to work in the US without company sponsorship. The company will not support the STEM OPT I-983 Training Plan endorsement for this position.
Qualifications:
- Bachelor's degree in Computer Science, Engineering, or a related field.
- 5+ years of experience in a technical role, with at least 2 years in a Site Reliability Engineering or similar position.
- 3-5 years of experience in languages such as Python, Ruby, or Bash.
- Experience developing and administering software in a predominantly Linux environment on AWS or another cloud platform.
- Demonstrated ability to coach and mentor teams, with a focus on reliability and performance.
- Strong technical background, with experience in software development, systems administration, or a related field.
- Demonstrable cross-functional knowledge of systems, storage, networking, security, and databases.
- Excellent written and verbal communication skills.
- Proven experience in a Site Reliability Engineering role or similar engineering position.
- Strong understanding of reliability engineering principles, including incident management, performance optimization, and monitoring.
- Expertise designing, scaling, analyzing and troubleshooting large-scale distributed systems.
- Experience in monitoring infrastructure and application service-level objectives to ensure functional and performance objectives are met.
- Excellent communication and collaboration skills, with the ability to work effectively with cross-functional teams.
- Experience in developing and delivering training programs or workshops is a plus.
- Familiarity with tools and technologies commonly used in SRE, such as monitoring and logging tools, CI/CD pipelines, IaC, and cloud platforms.
- Experience with container orchestration tools and container management (Docker, Kubernetes, etc.).
- Strong problem-solving skills and the ability to think critically and analytically.
- Experience with and passion for working within a DevSecOps team culture.
Compensation
The listed annualized base pay range is primarily based on analysis of similar positions in the external market. Actual base pay could vary and may be above or below the listed range based on factors including but not limited to performance, proficiency and demonstration of competencies required for the role. The base pay is just one component of The Hartford's total compensation package for employees. Other rewards may include short-term or annual bonuses, long-term incentives, and on-the-spot recognition. The annualized base pay range for this role is:
$126,160 - $189,240
Equal Opportunity Employer/Sex/Race/Color/Veterans/Disability/Sexual Orientation/Gender Identity or Expression/Religion/Age