Vanguard

Site Reliability Lead, Specialist

Scottsdale, AZ

Provides subject matter expertise in the maintenance of a reliable site environment, to ensure the stability and security of multiple systems/platforms. Develops and implements improvements for all aspects of software reliability.

Core Responsibilities

1. Collaborates with internal teams to evaluate the health, stability and reliability of systems/platforms. Provides subject matter expertise on architecture and programming design decisions related to availability and resilience.
2. Leads the analysis of localized failure modes when new features and architecture patterns are introduced. Facilitates post-incident reviews for any client-impacting events local to the product family.
3. Leads the planning and execution of chaos experiments to meet the development and maintenance requirements of systems/platforms for the product family. Coordinates performance tests for the product family.

4. Leads product teams in triage and troubleshooting during client-impacting incidents.
5. Ensures alignment between service level indicators and objectives within the product family.
6. Maintains product-level runbooks for incident response, in collaboration with SRE Practitioners on each product team, to document the step-by-step process for recovering from failures of specific components within a system. Makes final decisions regarding usage of tools, libraries, and standards for SRE in situations where multiple options have been provided by SRE.
7. Participates in special projects and performs other duties as assigned.

Additional Details

This position is the initial role in a new Site Reliability team that will define and implement best practices for observability, establish and maintain service level indicators (SLIs) and service level objectives (SLOs), track and address toil, conduct blameless root cause post-mortems, and incorporate preventative and proactive SRE practices. This will include working with Architects, Data Engineers, and Data Analysts to identify root causes, resolve issues, optimize existing systems, enhance infrastructure, and promote automation to reduce effort and increase reliability.

Additional Responsibilities
  • Works closely with leaders to establish and iteratively implement the SRE practice.
  • Gains insight into PI CDAO operations, demonstrates and champions site reliability culture and practices, builds relationships, and influences SRE ways of working.
  • Exhibits deep proficiency in reliability, scalability, performance, security, enterprise system architecture, toil reduction, and other site reliability best practices with the ability to implement these practices within an application or platform.
  • Communicates progress, issues, and solutions to management and business clients to obtain their input or buy-in as appropriate. Provides written and verbal communication to multiple organizations and audiences within Vanguard on the status of assigned projects and issues.
  • Elevates more complex problems and client issues/concerns when necessary and follows up to ensure resolution.
  • Maintains proactive knowledge and understanding of pending elevations, enhancements, and infrastructure changes. Proactively identifies potential failure points and designs strategies to ensure that failures remain localized, preventing widespread disruption and contagion.

Qualifications
  • Minimum of eight years related experience, with at least two years of development experience.
  • Undergraduate degree or equivalent combination of training and experience. Graduate degree preferred.
  • 3-5+ years of Site Reliability Engineering experience.
  • 3-5+ years of DevOps experience.
  • Strong analytic and problem-solving skills.
  • Self-motivated individual with the ability to prioritize and manage changing priorities.
  • Extensive knowledge and understanding of working in AWS and with Python and SQL.
  • Proficiency and experience with observability and telemetry tools such as Splunk, CloudWatch, Grafana, and Datadog.

Special Factors

Sponsorship

Vanguard is not offering visa sponsorship for this position.

About Vanguard

We are Vanguard. Together, we're changing the way the world invests.

For us, investing doesn't just end in value. It starts with values. Because when you invest with courage, when you invest with clarity, and when you invest with care, you can get so much more in return. We invest with purpose - and that's how we've become a global market leader. Here, we grow by doing the right thing for the people we serve. And so can you.

We want to make success accessible to everyone. This is our opportunity. Let's make it count.

Inclusion Statement

Vanguard's continued commitment to diversity and inclusion is firmly rooted in our culture. Every decision we make to best serve our clients, crew (internally employees are referred to as crew), and communities is guided by one simple statement: "Do the right thing."

We believe that a critical aspect of doing the right thing requires building diverse, inclusive, and highly effective teams of individuals who are as unique as the clients they serve. We empower our crew to contribute their distinct strengths to achieving Vanguard's core purpose through our values.

When all crew members feel valued and included, our ability to collaborate and innovate is amplified, and we are united in delivering on Vanguard's core purpose.

Our core purpose: To take a stand for all investors, to treat them fairly, and to give them the best chance for investment success.

How We Work

Vanguard has implemented a hybrid working model for the majority of our crew members, designed to capture the benefits of enhanced flexibility while enabling in-person learning, collaboration, and connection. We believe our mission-driven and highly collaborative culture is a critical enabler to support long-term client outcomes and enrich the employee experience.

Client-provided location(s): Scottsdale, AZ, USA; Charlotte, NC, USA; Malvern, PA 19355, USA; Texas, USA
Job ID: Vanguard-160380
Employment Type: Full Time

Perks and Benefits

  • Health and Wellness

    • FSA
    • HSA
    • Health Reimbursement Account
    • Fitness Subsidies
    • On-Site Gym
    • HSA With Employer Contribution
    • Health Insurance
    • Dental Insurance
    • Vision Insurance
    • Life Insurance
    • Short-Term Disability
    • Long-Term Disability
    • Mental Health Benefits
    • Virtual Fitness Classes
    • Pet Insurance
  • Parental Benefits

    • Non-Birth Parent or Paternity Leave
    • Birth Parent or Maternity Leave
    • Fertility Benefits
    • Adoption Assistance Program
    • Family Support Resources
    • Adoption Leave
  • Work Flexibility

    • Flexible Work Hours
    • Hybrid Work Opportunities
  • Office Life and Perks

    • Company Outings
    • Commuter Benefits Program
    • Casual Dress
    • Happy Hours
    • Snacks
    • Some Meals Provided
    • On-Site Cafeteria
  • Vacation and Time Off

    • Personal/Sick Days
    • Paid Holidays
    • Paid Vacation
    • Volunteer Time Off
    • Leave of Absence
  • Financial and Retirement

    • Relocation Assistance
    • Performance Bonus
    • 401(K) With Company Matching
    • 401(K)
    • Financial Counseling
    • Profit Sharing
  • Professional Development

    • Promote From Within
    • Mentor Program
    • Shadowing Opportunities
    • Access to Online Courses
    • Tuition Reimbursement
    • Internship Program
    • Lunch and Learns
    • Leadership Training Program
  • Diversity and Inclusion

    • Diversity, Equity, and Inclusion Program
    • Employee Resource Groups (ERG)
