Principal Engineer (Performance and Reliability)
Proposed role: Principal Engineer, System Reliability and Performance
This is a hands-on principal engineering role focused on improving Ascend's stability, performance, and operational maturity. The role will lead our approach to telemetry, observability, and proactive reliability engineering, helping us detect and resolve systemic issues before they impact customers.
The person in this role will work across services, databases, infrastructure, and engineering teams to improve how we measure system behaviour, diagnose performance problems, and prioritise reliability work. This is a critical role for reducing reactive incident response, improving customer experience, and supporting Ascend's growth safely and sustainably.
Title: Principal Engineer, System Telemetry and Performance
Role purpose
The Principal Engineer, System Telemetry and Performance, is a senior, hands-on technical leader responsible for improving Ascend's stability, performance, and operational maturity at scale. This role will lead the strategy and execution for telemetry, observability, performance analysis, and proactive reliability engineering across the platform, helping teams identify and resolve issues before they impact customers.
This role is critical to protecting customer experience, supporting growth, and reducing operational risk for Ascend. The successful candidate will work across application, database, infrastructure, and delivery teams to build stronger engineering discipline around system behaviour, measurement, and performance improvement.
Key responsibilities
- Lead platform-wide efforts to improve system stability, performance, telemetry, and operational visibility.
- Define and evolve Ascend's observability strategy across metrics, logs, traces, alerting, dashboards, and diagnostics.
- Investigate complex system performance issues across services, databases, infrastructure, and integrations.
- Drive a more proactive operating model by identifying risks and bottlenecks before they become customer-facing incidents.
- Partner closely with engineering teams to improve reliability and performance under production load.
- Establish standards, patterns, and engineering practices for performance testing, instrumentation, and operational readiness.
- Use telemetry and production data to guide prioritisation, root cause analysis, and continuous improvement.
- Support major incident diagnosis where needed, while reducing long-term dependence on reactive firefighting.
- Influence architecture and platform decisions to improve scalability, resilience, and transparency of system behaviour.
- Mentor engineers and technical leaders in performance thinking, observability, and reliability engineering practices.
Skills and experience
- Proven experience operating at Principal or Staff Engineer level in complex, distributed software environments.
- Deep expertise in observability, telemetry, monitoring, and performance engineering.
- Strong hands-on capability across services, databases, APIs, infrastructure, and production diagnostics.
- Demonstrated ability to identify systemic issues and implement durable engineering improvements.
- Experience improving reliability and performance in growing SaaS or enterprise platforms.
- Strong incident analysis and root cause investigation skills.
- Ability to work cross-functionally and influence multiple teams without direct authority.
- Strong communication skills, with the ability to translate technical risks and trade-offs for engineering and business leaders.
- Track record of balancing strategic leadership with practical delivery and hands-on technical problem solving.
What success looks like
- Improved visibility into system health, performance, and emerging risks.
- Reduced customer-facing incidents caused by avoidable stability and performance issues.
- Faster diagnosis and resolution of complex production problems.
- Clear telemetry, alerting, and performance standards adopted across teams.
- A stronger culture of proactive reliability engineering rather than reactive incident response.
- Measurable improvement in platform resilience as Ascend continues to grow.
Perks and Benefits
- Health and Wellness
- Parental Benefits
- Work Flexibility
- Office Life and Perks
- Vacation and Time Off
- Financial and Retirement
- Professional Development
- Diversity and Inclusion