Software Engineer, SRE
- Toronto, Canada
The Site Reliability Engineering (SRE) team at Instacart helps other Engineering teams improve the reliability, security, resilience, and performance of the services they own. SREs use their expertise in how Instacart Infrastructure works to ensure other teams are getting maximum leverage from it. SRE also owns several of the processes (and related tooling) that contribute to reliability, including how we run incidents (and how we learn from them), how we make infrastructure changes safely, and how we ensure adequate capacity for future needs. SRE is a big part of our strategy of using software to solve infrastructure problems at scale.
SRE is a new team at Instacart, so this is an opportunity to define, build, evangelize, and optimize our SRE practice!
ABOUT THE JOB
- Participate in an on-call rotation to support critical Instacart services
- Participate in the definition and management of SLOs and error budgets for the Engineering teams that own services in production
- Define standard practices and build tooling around incidents, postmortems, changes, and capacity, and work with other Engineering teams to help them adopt these practices to improve their services
- Work with other Engineering teams to understand their systems and challenges, and identify how they can better leverage Instacart Infrastructure to improve the services they own
ABOUT YOU
- You have 5+ years of experience in an SRE or Infrastructure Engineering role
- You've supported services at scale in production
- You have experience solving infrastructure problems with software
- You have a big-picture perspective on systems and tools
- You can collaborate with other Engineering teams to understand their systems and help improve them
- You have strong technical knowledge of cloud infrastructure, distributed systems, and reliability practices