JOB RESPONSIBILITIES
- Follows the software lifecycle, driving reliability, observability, and efficiency across product teams within your domain
- Identifies repeated toil and finds opportunities for automation and risk reduction
- Participates in an on-call rotation to respond to production incidents, and conducts blameless retros and root-cause analyses (RCAs) to drive a culture of continuous improvement
- Proactively identifies failures before they become outages, using chaos engineering techniques to exercise edge cases, failure modes, and disaster recovery (DR)
- Advises on capacity planning and provides continuous assessments of system behavior and consumption, working towards optimization and cost savings
- Works with product managers to identify and prioritize tech debt for reliability best practices (e.g. SLIs/SLOs/Error Budgets)
QUALIFICATIONS
REQUIRED
- Bachelor's Degree or equivalent in MIS, Computer Science or related field
- 2+ years of experience in software development
- Strong programming skills in one or more languages: Java, Python, Go, or Node.js
- Experience working with one of the major cloud platforms (GCP, AWS, or Azure)
- Experience supporting Mobile platforms
- Experience in one or more observability platforms: Prometheus, InfluxDB, Grafana, ELK, or APM
- Knowledge of application design patterns, event-driven architecture, database schemas, and testing strategies
- Experience with large-scale application troubleshooting and performance tuning
- Knowledge of and experience with continuous integration, continuous deployment, and test-driven development
- Experience in at least one PaaS or container platform: OpenShift, Cloud Foundry, Kubernetes, or equivalent
- Experience with one or more configuration management systems such as Chef, Ansible, or Puppet
- Good understanding of systems architecture, UNIX internals, networking topologies, multi-cluster applications, multi-tenant platforms, and systems/network security