Site Reliability Engineer

WHY WE NEED YOU

PagerDuty is the leading digital operations management platform for businesses. Our global teams work together, iterate constantly, and solve complex problems to help our 8,000+ customers deliver beautiful customer experiences powered by happy teams and healthy software. PagerDuty is a place where you can do some of the most interesting, impactful, challenging, and exciting work of your career.

At PagerDuty, we need to be up when our customers are down. The stability, performance, and resilience of our infrastructure are of paramount importance. We rely on the Site Reliability Engineering team to maintain the platforms and services that our development teams count on to deliver a four-9s experience. Whether it's provisioning, continuous integration/deployment, monitoring, or cloud platform management, SREs provide the foundation upon which the PagerDuty product is built. As a member of the SRE team, you will maintain, optimize, and troubleshoot the PagerDuty infrastructure of today while designing and architecting the platform of tomorrow.

HOW YOU CONTRIBUTE TO OUR VISION  

  • You architect, build, and automate the cloud production infrastructure on which PagerDuty runs.
  • You set up and implement security policies that protect us and our customers.
  • You partner with Engineering stakeholders to design and deliver a reliable, scalable, secure, and performant platform.
  • You continuously strive to improve the customer experience: Full lifecycle support (creation, development, deployment, retirement), observability, flexible connectivity, and monitoring.
  • You share your expertise with the entire Engineering organization.
  • You participate in a 24/7 on-call rotation. And yes, we use PagerDuty to manage our on-call schedules.

ABOUT YOU

  • You have solved multiple problems by writing code to automate your way out of them. You have replaced manual processes time and time again with your code.
  • You have been responsible for running critical services that multiple customers depend upon. You understand the importance and impact that operational optimization can have on a product and the positive ripple effects that it can have across an entire engineering organization.
  • You have pulled back the covers and know how this Internet thing works end-to-end. Networks, servers, protocols, operating systems, services, databases, query optimization, disks: To you, nothing is a black box.
  • You believe CI servers, push-button deploys, time-series datastores, metrics dashboards, and centralized logging are not just "nice to haves," they are critical pieces of infrastructure that rapidly pay for themselves. You are familiar with the tool-space and can suggest products in each of these areas.
  • You are empathetic: You take others' opinions into account and clearly communicate your thoughts to reach technical solutions quickly.
  • You consider it important to understand and appreciate your customers, and enjoy seeing your work improve the work of others.

MINIMUM QUALIFICATIONS

  • Excellent knowledge of at least one scripting language, such as Ruby, Python, or Perl
  • Experience managing an AWS-based, cloud-native infrastructure and its foundational services, including EC2, S3 and other storage options, VPCs, IAM, and more
  • Knowledge of at least one configuration management system (e.g. Puppet, Chef, Ansible or cfengine)
  • Experience running containers in a production environment
  • Applicants must be currently authorized to work in the United States on a full-time basis

PREFERRED QUALIFICATIONS

  • Experience with infrastructure as code
  • Experience with a cluster manager
  • Experience with Splunk or other log analysis platforms

BENEFITS TO GET EXCITED ABOUT

  • Competitive salaries and company equity
  • Comprehensive benefits package including medical, dental, and vision plans for you, your spouse, and family; 401(k), pre-tax commuter benefits, corporate discounts, cell phone allowance, and more!
  • Generous parental leave and paid vacation (3 weeks in your first year, 4 weeks afterwards), in addition to 12 paid holidays and ample sick leave
  • Monthly company-wide hack days
  • Catered lunch daily plus breakfast on Wednesdays, and plenty of snacks and drinks
  • Convenient, central Toronto office location, easily accessible by public transportation

PagerDuty does not discriminate based upon race, religion, color, national origin, sex (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender, gender identity, gender expression, transgender status, sexual stereotypes, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics.

PagerDuty is committed to providing reasonable accommodations for qualified individuals with disabilities in our job application process upon request.  


Meet Some of PagerDuty's Employees

Sweta A.

Engineering Manager

Managing three teams of engineers—the Front-End UI Team, the Mobile Team, and the Platform Team—Sweta balances her focus between people management, career growth for her engineers, and advancing business goals.

JD C.

Software Engineer

As a full-stack developer working with the Data Teams, JD is in a hybrid development role, splitting his time between back-end storage services and front-end customer-oriented projects.
