Responsibilities
- Triage production incidents and mitigate them or escalate to service support teams as necessary.
- This is not a helpdesk position where you are just creating tickets, but an opportunity to grow into a Site Reliability Engineering role.
- While your primary focus would be to triage and manage production incidents, there are plenty of opportunities to:
- Work with infrastructure, product and platform engineering teams to operate and deploy software services.
- Participate in capacity planning.
- Maintain sustainable reliability and scalability of software systems by improving automation to measure and monitor availability, latency and overall system health.
- Consistently evolve systems by pushing for changes that improve system reliability and release velocity.
- Practice sustainable incident response and Root Cause Analysis (RCA).
Qualifications
- BS degree in Computer Science, Computer Engineering, Electrical Engineering or a related major, with 2+ years of work experience.
- Experience with scripting languages, including but not limited to Bash and Python.
- Experience working with Unix/Linux systems, from kernel to shell and beyond.
- Experience in analyzing and debugging production issues at scale.
- Understanding of infrastructure-as-code concepts, approaches, methods, and tooling.
Preferred Qualifications
- Hands-on experience with large cloud providers such as AWS, Azure or GCP.
- Experience with infrastructure-as-code and service orchestration tools such as Terraform and Ansible.