Principal Site Reliability Engineer

3+ months ago Alpharetta, GA

Why UKG:

At UKG, the work you do matters. The code you ship, the decisions you make, and the care you show a customer all add up to real impact. Today, tens of millions of workers start and end their days with our workforce operating platform. Helping people get paid, grow in their careers, and shape the future of their industries. That's what we do.

We never stop learning. We never stop challenging the norm. We push for better, and we celebrate the wins along the way. Here, you'll get flexibility that's real, benefits you can count on, and a team that succeeds together. Because at UKG, your work matters, and so do you.

About the Team:

Site Reliability Engineers at UKG are critical team members who have a breadth of knowledge encompassing all aspects of service delivery. They develop software solutions to enhance, harden, and support our service delivery processes. This can include building and managing CI/CD deployment pipelines, automated testing, capacity planning, performance analysis, monitoring, alerting, chaos engineering, and auto-remediation.

Site Reliability Engineers must be passionate about learning and evolving with current technology trends. They strive to innovate and are relentless in pursuing a flawless customer experience. They have an automate-everything mindset, helping us bring value to our customers by deploying services with incredible speed, consistency, and availability.

About the Role

Site Reliability Engineers (SREs) at UKG play a critical role in delivering scalable, reliable, and secure services to our customers. As Principal SRE, you will be a force multiplier, combining deep software engineering expertise with systems knowledge to build robust automation, drive operational excellence, and elevate the overall reliability of our services.

This role is highly technical and hands-on. You will design and implement solutions that eliminate toil and optimize performance, including developing automated testing frameworks, intelligent alerting systems, and self-healing mechanisms.

Responsibilities

- Architect, develop, and maintain scalable automation, internal tools, health checks, monitoring, and auto-remediation to improve service availability, reliability, latency, scalability, and system resiliency, ensuring services withstand failures and recover gracefully to maintain high availability.

- Lead incident response efforts to minimize customer impact and reduce MTTx, including leading post-incident reviews to identify root causes and implement long-term solutions.

- Provide strategic guidance and design consultation throughout the full service lifecycle, from architecture and capacity planning to production readiness, while establishing and enforcing SRE standards for system architecture, observability, incident response, and reliability metrics.

- Partner closely with product, infrastructure, and engineering teams to integrate reliability goals into the development process.

- Mentor and guide engineers across the organization on reliability principles and best practices, and serve as a reliability evangelist to drive cultural and operational changes that improve engineering velocity.

- Leverage generative AI agents and automation tools to enhance operational efficiency; automate health checks, incident detection, and resolution; and drive innovative solutions in site reliability engineering.

- Define, implement, and measure SLIs and SLOs to guide reliability-focused engineering decisions.

Basic Qualifications

- Minimum 8 years of engineering experience, including 5+ years in Site Reliability, DevOps, or Production Engineering roles.

- Advanced proficiency in one or more programming languages (e.g., Python, Go, Java, or C++) with the ability to write production-grade software.

- Strong Linux systems expertise, including scripting, performance tuning, and debugging.

- Hands-on experience operating large-scale distributed systems in public cloud environments, preferably GCP.

- Deep knowledge of Kubernetes and container orchestration patterns in production environments.

- Experience with GitHub Actions and modern CI/CD practices.

- Deep experience with SLI/SLO design, service health instrumentation, and production telemetry.

- Proven ability to build dashboards and alerts using Splunk and Grafana.

- Strong understanding of observability systems, including metrics pipelines, distributed tracing, log aggregation, alerting strategies, and incident triage.

- Familiarity with infrastructure-as-code tools (e.g., Terraform, Ansible).

- Experience building and supporting highly available, customer-facing systems.

- Experience working with generative AI agents or AI-driven automation tools to support incident management, monitoring, or operational workflows.

- Broad grounding in at least two of the following: Cloud Architecture, Nginx, Security, or Database Technologies.

- Strong troubleshooting skills for complex system issues, with proven experience leading incident response efforts.

- Excellent communication and collaboration skills, with experience mentoring and guiding engineers.

Preferred Qualifications

- Experience implementing chaos engineering, load testing, and resilience modeling.

- Google Cloud Professional Architect Certification is a plus.

- Understanding of OpenTelemetry (metrics, tracing, logs) and its integration into observability pipelines.

Company Overview:

UKG is the Workforce Operating Platform that puts workforce understanding to work. With the world's largest collection of workforce insights and people-first AI, our ability to reveal unseen ways to build trust, amplify productivity, and empower talent is unmatched. It's this expertise that equips our customers with the intelligence to solve any challenge in any industry, because great organizations know their workforce is their competitive edge. Learn more at ukg.com.

Equal Opportunity Employer

UKG is an equal opportunity employer. We evaluate qualified applicants without regard to race, color, disability, religion, sex, age, national origin, veteran status, genetic information, and other legally protected categories. View the EEO Know Your Rights poster. UKG participates in E-Verify. View the E-Verify posters here.

It is unlawful in Massachusetts to require or administer a lie detector test as a condition of employment or continued employment. An employer who violates this law shall be subject to criminal penalties and civil liability.

Disability Accommodation in the Application and Interview Process

Individuals with disabilities who need additional assistance at any point in the application and interview process can email UKGCareers@ukg.com.

The pay range for this position is $142,100 to $204,200; however, base pay offered may vary depending on skills, experience, job-related knowledge, and location. This position is also eligible for a short-term incentive and a long-term incentive as part of total compensation. Information about UKG's comprehensive benefits can be reviewed on our careers site at https://www.ukg.com/about-us/careers/benefits

Client-provided location(s): Alpharetta, GA; Lowell, MA
Job ID: ukg-893379048898
Employment Type: OTHER
Posted: 2024-12-05T11:55:15

Perks and Benefits

  • Health and Wellness

    • Health Insurance
    • Health Reimbursement Account
    • Dental Insurance
    • Vision Insurance
    • Life Insurance
    • Short-Term Disability
    • Long-Term Disability
    • FSA
    • FSA With Employer Contribution
    • HSA
    • HSA With Employer Contribution
    • Fitness Subsidies
    • On-Site Gym
    • Virtual Fitness Classes
  • Parental Benefits

    • Birth Parent or Maternity Leave
    • Non-Birth Parent or Paternity Leave
    • Adoption Assistance Program
    • Family Support Resources
    • Adoption Leave
  • Work Flexibility

    • Flexible Work Hours
    • Remote Work Opportunities
    • Hybrid Work Opportunities
  • Office Life and Perks

    • Casual Dress
    • Happy Hours
    • Company Outings
    • Holiday Events
  • Vacation and Time Off

    • Paid Vacation
    • Unlimited Paid Time Off
    • Paid Holidays
    • Personal/Sick Days
    • Volunteer Time Off
  • Financial and Retirement

    • 401(K) With Company Matching
    • Company Equity
    • Performance Bonus
    • Profit Sharing
  • Professional Development

    • Tuition Reimbursement
    • Mentor Program
    • Shadowing Opportunities
    • Access to Online Courses
    • Internship Program
  • Diversity and Inclusion