Principal Cloud Service Reliability Engineer

    • Cambridge, MA

Meet Our Team:

As a member of the Pega Service Reliability Team (SRT), you will play a key role in managing mission-critical services, hosted in a cloud environment, that support Fortune 50 clients. We encourage a culture of diversity, intellectual curiosity, problem solving and openness. Our team is staffed with people from a wide variety of backgrounds, experiences and perspectives. We encourage our team members to collaborate, think big and take risks in a blame-free environment. We consistently strive to create an environment that provides the support and mentorship needed to learn and grow.

Picture Yourself at Pega:

Cloud Service Reliability Engineers are expected to take an active role in maintaining the health and high performance of the Pega Cloud platform. This role focuses on the operational and product reliability of the Pega Cloud platform, especially operational tasks such as product upgrades, platform upgrades and troubleshooting the environment during incidents. You will play a key role in bringing an operational reliability perspective back to the core product teams and will work collaboratively with them to build more reliable and operable products. You will have the opportunity to work on complex problems and apply your expertise and experience to improve the reliability of the Pega Cloud platform.

What You'll Do at Pega:

  • Take ownership of the systems you manage and operate, and be responsible for problem discovery, analysis and resolution
  • Handle Service Requests, Incidents, Alerts and phone calls within SLA
  • Proactively monitor and review Application and Infrastructure performance
  • Create and maintain operational runbooks
  • Help triage escalated support tickets
  • Build relationships with teams within and outside the organization, and foster a culture of collaboration and transparency
  • Influence product teams on defects, feature and enhancement requests to help build scalable, reliable, observable, available and highly performant services
  • Contribute to the overall product roadmap
  • Mentor and help other team members as required
  • Identify needs and build tools to automate day-to-day operations


Who You Are:

  • Self-motivated, inquisitive and creative, with a passion for continuous improvement and excellent people skills
  • Ready to learn the Pega platform, with a mindset for solving complex challenges and implementing SRE principles
  • Able to work independently as well as with a team, lead by example, and demonstrate a commitment to quality development and operational practices

What You've Accomplished:

  • 10+ years of experience developing distributed applications in a public cloud environment such as AWS, Azure or GCP
  • Knowledge of data structures, algorithms and system design architectures
  • Experience with object-oriented programming and/or scripting. We use Java and Python
  • Basic network troubleshooting skills (TCP/IP, routing, networking topologies, etc.)
  • Experience working in cloud-based infrastructure: DB, application, OS, and network
  • Knowledge of configuration automation tools (Ansible, Chef, Puppet, etc.)
  • An understanding of ITIL processes

Pega Offers You:

  • Gartner analyst-acclaimed technology leadership across our categories of products
  • Continuous learning and development opportunities
  • An innovative, inclusive, agile, flexible, and fun work environment
  • Competitive global benefits program, including pay, bonus incentive, and employee equity in the company


Job ID: 10114

