The Hartford

Reliability Engineering Coach (Remote)

Chicago, IL

Staff Reliability Engineer - IE07KE

We're determined to make a difference and are proud to be an insurance company that goes well beyond coverages and policies. Working here means having every opportunity to achieve your goals - and to help others accomplish theirs, too. Join our team as we help shape the future.

The Central Reliability and Automation team is looking for a driven and highly motivated Staff Reliability Engineer Coach to join the team. In this role you will be responsible for designing and maintaining a given IT solution for the CI/CD pipeline, the observability suite (monitoring, alerting and logging tools and processes) and the automation suite consumed by REs and Software Engineers. The Site Reliability Engineer will work with the consumers and stakeholders of the solution to define functional and non-functional requirements for the service. Leveraging Open Source or Commercial Off-the-Shelf (COTS) products, they will design, build and maintain the solution to meet current and future demand. They will apply key SRE tenets across the life-cycle of the solution.

A prerequisite for the role is a "build-to-manage", problem-solving and innovative mindset, applied to the design, build, testing, deployment, change and maintenance of services and drawing on deep engineering expertise. Key measures of success will include service stability, effective delivery and environment instrumentation, deployment quality, technical debt reduction, asset resiliency, risk/security compliance, cost efficiency, proactive and preventative maintenance mechanisms, and top-quartile operating norms. The Senior Site Reliability Engineer will actively contribute to sustained advancement of the SRE practice within and beyond a given area of responsibility.

Responsibilities:

  • Influence and design architecture, infrastructure, standards and methods for large-scale cloud systems
  • Engage in and improve the software development life-cycle through CI/CD; improve the build-to-deployment process to establish greater reliability and a sustainable release process; oversee release gating; establish deployment metrics (DORA)
  • Monitor and develop SLOs and SLIs based on customer user journeys; advise on SLAs; establish error budgets
  • Build observability and custom monitoring tool integrations; introduce telemetry to support SLOs
  • Automate system scalability and continually work to improve system resiliency, performance and efficiency; recommend design changes for improved reliability
  • Deploy software using high-availability practices such as rolling, blue-green or canary deployments
  • Provide mentorship to reliability engineering squads under a consistent framework for the Development, Testing and Alerting processes
  • Practice sustainable incident response through blameless RCA and postmortems
  • Advise on performance testing and capacity planning
  • Communicate proactively with colleagues and formally present work product outcomes and risk analysis to the product team and management
  • Follow the Agile/Scrum working methodologies
  • Establish dashboarding for monitoring capabilities and metrics

Qualifications:

  • 7+ years of experience in related field
  • 3+ years of experience in languages such as Python, Ruby, Bash, Perl
  • BS degree in Engineering, Computer Science, or equivalent practical experience
  • Experience in monitoring infrastructure and application service level objectives to ensure functional and performance objectives are met
  • Experience in implementing service dashboards for monitoring, objectives and metrics
  • Experience developing and/or administering software in AWS cloud infrastructure
  • System administration skills, including automation and orchestration of environments using Terraform or CloudFormation and configuration management
  • Demonstrable cross-functional knowledge with systems, storage, networking, security and databases
  • Experience with container orchestration tools and container management (Docker, Kubernetes, etc.)
  • Proficiency with continuous integration and continuous delivery tooling and practices
  • Strong analytical and troubleshooting skills; Experience with runbooks

Preferred Qualifications:

  • Expertise designing, analyzing and troubleshooting large-scale distributed systems
  • Systematic problem-solving approach coupled with strong communication skills and a sense of ownership and drive
  • Experience in implementing Infrastructure as code
  • Experience building software and maintaining systems in a highly secure, regulated or compliant industry
  • Experience and passion for working within a DevSecOps team culture

Additional Details:

  • Must be authorized to work in the US without company sponsorship.
  • This role can have a Hybrid or Remote work arrangement. Candidates who live near one of our office locations will have the expectation of working in an office 3 days a week (Tuesday through Thursday). Candidates who do not live near an office will have a remote work arrangement, with the expectation of coming into an office as business needs arise.

Compensation

The listed annualized base pay range is primarily based on analysis of similar positions in the external market. Actual base pay could vary and may be above or below the listed range based on factors including but not limited to performance, proficiency and demonstration of competencies required for the role. The base pay is just one component of The Hartford's total compensation package for employees. Other rewards may include short-term or annual bonuses, long-term incentives, and on-the-spot recognition. The annualized base pay range for this role is:

$126,160 - $189,240

Equal Opportunity Employer/Females/Minorities/Veterans/Disability/Sexual Orientation/Gender Identity or Expression/Religion/Age


Client-provided location(s): Chicago, IL, USA; Hartford, CT, USA
Job ID: hartford-R2416319
Employment Type: Full Time

Perks and Benefits

  • Health and Wellness

    • Health Insurance
    • Health Reimbursement Account
    • Dental Insurance
    • Vision Insurance
    • Life Insurance
    • Short-Term Disability
    • Long-Term Disability
    • FSA With Employer Contribution
    • HSA With Employer Contribution
    • On-Site Gym
    • Mental Health Benefits
    • Virtual Fitness Classes
    • Fitness Subsidies
  • Parental Benefits

    • Birth Parent or Maternity Leave
    • Non-Birth Parent or Paternity Leave
    • Fertility Benefits
    • Adoption Assistance Program
    • Family Support Resources
    • Adoption Leave
  • Work Flexibility

    • Hybrid Work Opportunities
    • Remote Work Opportunities
    • Flexible Work Hours
  • Office Life and Perks

    • Commuter Benefits Program
    • Casual Dress
    • On-Site Cafeteria
    • Company Outings
    • Holiday Events
  • Vacation and Time Off

    • Paid Vacation
    • Paid Holidays
    • Volunteer Time Off
    • Personal/Sick Days
  • Financial and Retirement

    • 401(K) With Company Matching
    • Stock Purchase Program
    • Performance Bonus
    • Relocation Assistance
    • Financial Counseling
    • Profit Sharing
  • Professional Development

    • Internship Program
    • Leadership Training Program
    • Associate or Rotational Training Program
    • Tuition Reimbursement
    • Promote From Within
    • Mentor Program
    • Shadowing Opportunities
    • Access to Online Courses
    • Lunch and Learns
    • Learning and Development Stipend
  • Diversity and Inclusion

    • Employee Resource Groups (ERG)
    • Diversity, Equity, and Inclusion Program