Principal Engineer (Performance and Reliability)

Yesterday • Auckland, New Zealand

Proposed role: Principal Engineer, System Reliability and Performance
This is a hands-on principal engineering role focused on improving Ascend's stability, performance, and operational maturity. The role will lead our approach to telemetry, observability, and proactive reliability engineering, helping us detect and resolve systemic issues before they impact customers.
The person in this role will work across services, databases, infrastructure, and engineering teams to improve how we measure system behaviour, diagnose performance problems, and prioritise reliability work. This is a critical role for reducing reactive incident response, improving customer experience, and supporting Ascend's growth safely and sustainably.

Title: Principal Engineer, System Telemetry and Performance
Role purpose
The Principal Engineer, System Telemetry and Performance, is a senior hands-on technical leader responsible for improving Ascend's stability, performance, and operational maturity at scale. This role will lead the strategy and execution for telemetry, observability, performance analysis, and proactive reliability engineering across the platform, helping teams identify and resolve issues before they impact customers.
This role is critical to protecting customer experience, supporting growth, and reducing operational risk for Ascend. The successful candidate will work across application, database, infrastructure, and delivery teams to build stronger engineering discipline around system behaviour, measurement, and performance improvement.
Key responsibilities

Lead platform-wide efforts to improve system stability, performance, telemetry, and operational visibility.
Define and evolve Ascend's observability strategy across metrics, logs, traces, alerting, dashboards, and diagnostics.
Investigate complex system performance issues across services, databases, infrastructure, and integrations.
Drive a more proactive operating model by identifying risks and bottlenecks before they become customer-facing incidents.
Partner closely with engineering teams to improve reliability under load and performance in production.
Establish standards, patterns, and engineering practices for performance testing, instrumentation, and operational readiness.
Use telemetry and production data to guide prioritisation, root cause analysis, and continuous improvement.
Support major incident diagnosis where needed, while reducing long-term dependence on reactive firefighting.
Influence architecture and platform decisions to improve scalability, resilience, and transparency of system behaviour.
Mentor engineers and technical leaders in performance thinking, observability, and reliability engineering practices.

Skills and experience

Proven experience operating at Principal or Staff Engineer level in complex, distributed software environments.
Deep expertise in observability, telemetry, monitoring, and performance engineering.
Strong hands-on capability across services, databases, APIs, infrastructure, and production diagnostics.
Demonstrated ability to identify systemic issues and implement durable engineering improvements.
Experience improving reliability and performance in growing SaaS or enterprise platforms.
Strong incident analysis and root cause investigation skills.
Ability to work cross-functionally and influence multiple teams without direct authority.
Strong communication skills, with the ability to translate technical risks and trade-offs for engineering and business leaders.
Track record of balancing strategic leadership with practical delivery and hands-on technical problem solving.

What success looks like

Improved visibility into system health, performance, and emerging risks.
Reduced customer-facing incidents caused by avoidable stability and performance issues.
Faster diagnosis and resolution of complex production problems.
Clear telemetry, alerting, and performance standards adopted across teams.
A stronger culture of proactive reliability engineering rather than reactive incident response.
Measurable improvement in platform resilience as Ascend continues to grow.

Client-provided location(s): Auckland, New Zealand

Job ID: Henry_Schein-R133316

Employment Type: Full-time

Posted: 2026-03-31T19:00:46

Perks and Benefits

Health and Wellness
Parental Benefits
Work Flexibility
Office Life and Perks
Vacation and Time Off
Financial and Retirement
Professional Development
Diversity and Inclusion
