Staff Cloud Operations Specialist - Eng
Why UKG:
At UKG, the work you do matters. The code you ship, the decisions you make, and the care you show a customer all add up to real impact. Today, tens of millions of workers start and end their days with our workforce operating platform. Helping people get paid, grow in their careers, and shape the future of their industries. That's what we do.
We never stop learning. We never stop challenging the norm. We push for better, and we celebrate the wins along the way. Here, you'll get flexibility that's real, benefits you can count on, and a team that succeeds together. Because at UKG, your work matters, and so do you.
About the Team:
We are seeking a skilled and experienced Problem Manager to join our enterprise Site Operations organization. In this role you will own the end-to-end problem management lifecycle for our SaaS production environment, lead blameless root-cause investigations, and drive systemic engineering and operational improvements that increase platform availability, reliability, and customer satisfaction. You will partner closely with SRE, Engineering, Product, Customer Support, and Cloud Infrastructure teams to convert incident learnings into durable fixes and measurable reliability gains.
About the Role:
• Own and govern the problem management process: identification, triage, prioritization, remediation tracking, and closure.
• Lead facilitation of blameless post-incident reviews and structured RCA sessions (e.g., 5 Whys, Fishbone).
• Produce and maintain high-quality postmortems and remediation plans; ensure timely execution and verification of corrective actions.
• Translate operational failures into prioritized engineering work and track closure through to verification.
• Monitor and analyze incident trends and recurring failure modes; recommend and coordinate systemic mitigations.
• Align problem management with SLAs/SLOs, error budget practices, and availability targets.
• Drive cross-functional accountability and escalate material reliability risks to leadership with clear impact analysis.
• Partner with Observability, Release Engineering, and Security teams to close monitoring, testing, and dependency gaps.
• Define and track problem-management metrics and reports for leadership (e.g., problem aging, action completion rate, recurring incident rate).
• Maintain compliance with governance and change-control requirements applicable to enterprise SaaS operations.
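As a rough illustration of the problem-management metrics named above (problem aging, action completion rate, recurring incident rate), here is a minimal Python sketch. The record shape, field names, and the exact definition of each metric are assumptions made for the example; they are not taken from UKG's tooling or reporting.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Problem:
    # Hypothetical record shape; all field names are illustrative assumptions.
    opened: date
    closed: Optional[date]   # None while the problem is still open
    actions_total: int       # corrective actions assigned
    actions_done: int        # corrective actions completed and verified


def problem_aging_days(problems, today):
    """Mean age in days of problems that are still open (assumed definition)."""
    ages = [(today - p.opened).days for p in problems if p.closed is None]
    return sum(ages) / len(ages) if ages else 0.0


def action_completion_rate(problems):
    """Completed corrective actions divided by all assigned actions."""
    total = sum(p.actions_total for p in problems)
    done = sum(p.actions_done for p in problems)
    return done / total if total else 0.0


def recurring_incident_rate(incident_causes):
    """Share of incidents whose root-cause label had already appeared earlier."""
    seen = set()
    repeats = 0
    for cause in incident_causes:
        if cause in seen:
            repeats += 1
        seen.add(cause)
    return repeats / len(incident_causes) if incident_causes else 0.0
```

In practice these figures would more likely come from an ITSM export or dashboard than from hand-built records; the sketch only makes the arithmetic behind each metric explicit.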
About You:
Basic Qualifications:
- 5+ years of experience in SaaS operations, SRE, incident response, or problem management in enterprise environments.
- Bachelor's degree in Computer Science, Information Systems, Engineering, or equivalent experience.
- Demonstrated experience leading RCAs and driving cross-functional remediation in cloud-native systems (AWS, Azure, or GCP).
- Strong working knowledge of distributed systems concepts, microservices, containers (Kubernetes/Docker), CI/CD, and Infrastructure as Code.
- Proficiency with observability and incident tooling (e.g., Datadog, Prometheus, Splunk, New Relic, or equivalent) and with ITSM and reporting platforms (e.g., ServiceNow, Jira Service Management (JSM), Power BI, Appfire).
- Proven ability to influence engineering teams and stakeholders without direct authority and to manage competing priorities.
Preferred Qualifications:
- ITIL v4 certification or equivalent experience implementing ITIL-aligned practices.
- Experience in multi-region, high-availability SaaS deployments and with formal SLO/SLA/error budget management.
- Familiarity with chaos engineering, capacity planning, and reliability engineering practices.
- Experience working in regulated industries or environments with strict compliance/audit requirements.
Core Competencies:
• Analytical Rigor: Data-driven problem identification and trend analysis.
• Technical Fluency: Ability to validate technical root causes and remediation.
• Facilitation & Leadership: Run blameless reviews and cross-team follow-through.
• Risk & Stakeholder Management: Communicate impact, priorities, and mitigation plans to business leaders.
• Process Orientation: Implement and improve problem governance, controls, and metrics.
Success Metrics (KPIs):
• Reduction in recurring incident rate and repeat failures.
• Improvement in remediation action closure and verification rates.
• Reduction in average problem lifecycle time and problem aging.
• Tangible improvement in SLO/SLA attainment or error budget utilization.
• Increased quality and completeness of postmortem documentation.
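For readers less familiar with the error budget KPI above, the underlying arithmetic can be sketched as follows. This assumes a request-based availability SLO; the function name and parameters are hypothetical and chosen only for illustration.

```python
def error_budget_utilization(slo_target, total_requests, failed_requests):
    """Fraction of the error budget consumed in a measurement window.

    Assumes a request-based SLO: slo_target is the required success ratio
    (for example 0.999), so the budget is the number of failures the SLO
    still tolerates over total_requests.
    """
    allowed_failures = (1.0 - slo_target) * total_requests
    if allowed_failures == 0:
        # A 100% target (or an empty window) leaves no budget to spend.
        return float("inf") if failed_requests else 0.0
    return failed_requests / allowed_failures


# Example: a 99.9% SLO over 1,000,000 requests tolerates about 1,000 failures,
# so 250 failed requests consume roughly a quarter of the budget.
utilization = error_budget_utilization(0.999, 1_000_000, 250)
```

A value above 1.0 means the budget for the window is exhausted, which is the kind of signal that would feed the prioritization and escalation work described in this role.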
Company Overview:
UKG is the Workforce Operating Platform that puts workforce understanding to work. With the world's largest collection of workforce insights and people-first AI, our ability to reveal unseen ways to build trust, amplify productivity, and empower talent is unmatched. It's this expertise that equips our customers with the intelligence to solve any challenge in any industry - because great organizations know their workforce is their competitive edge. Learn more at ukg.com.
Equal Opportunity Employer
UKG is an equal opportunity employer. We evaluate qualified applicants without regard to race, color, disability, religion, sex, age, national origin, veteran status, genetic information, and other legally protected categories.
UKG participates in E-Verify.
It is unlawful in Massachusetts to require or administer a lie detector test as a condition of employment or continued employment. An employer who violates this law shall be subject to criminal penalties and civil liability.
Disability Accommodation in the Application and Interview Process
Individuals with disabilities who need additional assistance at any point in the application and interview process can email UKGCareers@ukg.com.
The pay range for this position is $102,300.00 to $147,050.00 USD. The actual base pay offered may vary depending on skills, experience, job-related knowledge and work location. In addition to base pay, employees may be eligible to participate in a performance-based bonus plan and to receive restricted stock unit awards as part of total compensation. Learn more about UKG's benefits and rewards at https://www.ukg.com/about-us/careers/benefits
Perks and Benefits
Health and Wellness
- Health Insurance
- Health Reimbursement Account
- Dental Insurance
- Vision Insurance
- Life Insurance
- Short-Term Disability
- Long-Term Disability
- FSA
- FSA With Employer Contribution
- HSA
- HSA With Employer Contribution
- Fitness Subsidies
- On-Site Gym
- Virtual Fitness Classes
Parental Benefits
- Birth Parent or Maternity Leave
- Non-Birth Parent or Paternity Leave
- Adoption Assistance Program
- Family Support Resources
- Adoption Leave
Work Flexibility
- Flexible Work Hours
- Remote Work Opportunities
- Hybrid Work Opportunities
Office Life and Perks
- Casual Dress
- Happy Hours
- Company Outings
- Holiday Events
Vacation and Time Off
- Paid Vacation
- Unlimited Paid Time Off
- Paid Holidays
- Personal/Sick Days
- Volunteer Time Off
Financial and Retirement
- 401(K) With Company Matching
- Company Equity
- Performance Bonus
- Profit Sharing
Professional Development
- Tuition Reimbursement
- Mentor Program
- Shadowing Opportunities
- Access to Online Courses
- Internship Program
Diversity and Inclusion