Technical Senior Principal | Site Reliability Engineer | Kubernetes

5 days ago Arlington, TX

JOB DESCRIPTION

Why GMF Technology?

GM Financial is set to change the auto finance industry and is leading the way in tech modernization. We have a startup mindset and preserve our small-company culture in a public-company environment, with financial stability and intense growth over a decade-plus history. We are data junkies and trust data and insights to advance our business objectives. We take our goal of zero emission, zero collision, zero congestion, and zero friction very seriously. We believe that, as an auto finance market leader, we are in the driver's seat to advance the GM EV mission to change the world. We are building global platforms in LATAM, Europe, China, the U.S., and Canada, and we are looking to grow our high-performing team. GMF comprises more than 10,000 team members globally. Join our fintech culture within a blue-chip company where we are changing the way we use technology to support our customers, dealers, and business.

Flexible hybrid work environment (onsite 3 days a week/2 days remote) at our Arlington (AOC1), TX office.

RESPONSIBILITIES

About the Role
As a Senior Principal SRE, you will be the technical bar‑raiser for our centralized Kubernetes platform: setting strategy, owning reliability at fleet scale, and leading cross‑org engineering to deliver a self‑service, secure, and compliant platform. You will partner with Architecture, BPS, Cloud Ops, and Cyber to turn our roadmap into durable, automated capabilities that product teams adopt with minimal toil.

Top Outcomes You Will Drive

  • Fleet‑level reliability strategy for shared and dedicated clusters, defining SLOs/SLIs and error budgets for the platform and golden patterns, with automated enforcement and reporting.
  • Self‑service at scale: deliver Namespace‑as‑a‑Service and developer‑portal workflows that shrink onboarding from weeks to hours and unlock safe autonomy for product teams.
  • Observability by default: land built‑in cluster/workload dashboards (Splunk APM + Azure Monitor/App Insights) and a robust RCA/Problem‑Management loop that closes the gap between incidents and engineering improvements.
  • Multi‑cloud readiness: guide the expansion of the centralized Kubernetes deployment to AWS and design portable patterns (identity, networking, GitOps) that remain cloud‑agnostic.
  • Secure networking & policy: lead adoption of Calico Enterprise (DNS‑based policy, honey pods, central policy management) and the staged rollout of stretched mesh/identity‑based access across clusters.
  • Path to Kubernetes-as-a-Serverless: influence the architecture that abstracts K8s, integrates pre‑connected services, and enforces governance/consistency with a service catalog and on‑demand APIs.
  • Scale the operating model: codify the RACI, reduce reactive workload, shift‑left with support enablement, and build automation that lets a small core team support a large fleet.

Core Responsibilities

  • Own multi‑cluster reliability: capacity modeling, failure domain strategy, upgrade design (blue/green, surge, or secondary‑cluster) and chaos/DR exercises across shared & dedicated environments.
  • Define and implement platform SLOs/SLIs (control plane, base stack, onboarding, GitOps, network policy propagation, secret/cert rotation) with automated alerts and error‑budget policies.
  • Lead the design/implementation of Namespace‑as‑a‑Service; measure adoption, lead time, and customer effort score.
  • Establish GitOps standards (Argo CD) for app and cluster configuration, including bootstrap, drift detection, and progressive delivery (blue/green, canary).
  • Architect and land Calico/Tigera Enterprise and/or service mesh patterns (east‑west controls, identity‑based policies, multi‑cluster traffic management), with guardrails and paved‑road configs.
  • Lead security & compliance by default: SR controls, RBAC baselines (Azure RBAC/workload identity), cert‑manager automation, patch cadence, and auditable change pipelines.
  • Serve as principal‑level incident commander and RCA owner for platform incidents; convert findings into backlog items, patterns, and training.
  • Partner with the necessary teams to scale operations and refine RACI; implement charge/show‑back models for high‑touch migrations when appropriate.
  • Mentor Staff/Principal engineers; raise the bar on design docs, ADRs, runbooks, and knowledge sharing across the platform and product teams.


QUALIFICATIONS

What makes you a dream candidate?

Knowledge and Skills

  • Deep experience with GitOps (Argo CD), service mesh (Istio/Linkerd), Calico/Tigera, cert‑manager, secret engines, and workload identity.
  • Strong IaC/automation: Terraform, Azure DevOps (YAML), CI/CD policy gates, automated security controls.
  • Observability at scale: Splunk APM, Azure Monitor, Application Insights; golden dashboards and SLO pipelines.
  • Distributed systems fundamentals: performance, scalability, capacity, and reliability.
  • Excellent communication; ability to lead across org boundaries and mentor senior engineers.

Experience and Education

  • High School Diploma or equivalent required
  • Bachelor's Degree or Associate Degree plus 2 additional years of relevant experience required
  • 12+ years in related function(s) required
  • 5-7 years of experience leading through mentorship in a related field required
  • 5-7 years of experience driving thought leadership and innovation across products required

Preferred Skills

  • Multi‑cluster and multi‑region upgrade strategies (surge/blue‑green), active‑active patterns, and zero‑downtime migrations.
  • Network policy at scale (DNS‑based policies), L7 authorization, east‑west security controls.
  • Self‑service developer portals and onboarding workflows; measuring adoption and customer effort.
  • FinOps for Kubernetes (charge/show‑back, pod‑level cost breakdown), quota guardrails, and capacity/right‑sizing automation.
  • Experience with Kubernetes platform abstraction and curated service catalogs.
  • Expertise in SRE practice: SLO/SLI design, error budgets, incident command, RCA/Problem Management, chaos/DR.

What We Offer: A generous benefits package available on day one, including 401(k) matching, bonding leave for new parents (12 weeks, 100% paid), tuition assistance, training, GM employee auto discount, community service pay, and nine company holidays.
Our Culture: Our team members define and shape our culture - an environment that welcomes innovative ideas, fosters integrity, and creates a sense of community and belonging. Here we do more than work - we thrive.
Compensation: Competitive pay and bonus eligibility
Work Life Balance: Flexible hybrid work environment, 2-days a week in office

Client-provided location(s): Arlington, TX
Job ID: GM_Financial-1151
Employment Type: Full-time
Posted: 2025-10-24T19:27:29

Perks and Benefits

  • Health and Wellness

    • Health Insurance
    • Dental Insurance
    • Vision Insurance
    • Life Insurance
    • Short-Term Disability
    • Long-Term Disability
    • FSA
    • FSA With Employer Contribution
    • HSA
    • HSA With Employer Contribution
    • Mental Health Benefits
    • Fitness Subsidies
  • Parental Benefits

    • Birth Parent or Maternity Leave
    • Non-Birth Parent or Paternity Leave
    • Adoption Leave
  • Work Flexibility

    • Remote Work Opportunities
    • Hybrid Work Opportunities
  • Office Life and Perks

    • Happy Hours
    • Company Outings
    • On-Site Cafeteria
    • Holiday Events
  • Vacation and Time Off

    • Paid Vacation
    • Paid Holidays
    • Personal/Sick Days
    • Leave of Absence
    • Volunteer Time Off
  • Financial and Retirement

    • 401(K) With Company Matching
    • Performance Bonus
    • Profit Sharing
  • Professional Development

    • Tuition Reimbursement
    • Promote From Within
    • Mentor Program
    • Shadowing Opportunities
    • Access to Online Courses
    • Lunch and Learns
    • Internship Program
    • Leadership Training Program
  • Diversity and Inclusion

    • Unconscious Bias Training
    • Employee Resource Groups (ERG)
