Site Reliability Engineer

As a Site Reliability Engineer on Vanguard's Runtime Engineering team, you'll have the opportunity to put your operational savvy and engineering skills to work! On the job you'll be ensuring the "-ilities" (availability, reliability, scalability, usability, etc.) of our private and public cloud platforms in both test and production environments. You'll respond to incidents, apply upgrades to the platform, and leverage a strategic mindset to "automate all the things" (repetitive manual work is the worst!).

Additionally, you can anticipate working with real-time monitoring and diagnostic data, analyzing trends, and planning for future infrastructure needs. As a caretaker of these platforms, you'll be collaborating and planning activities with our internal development teams to ensure that application service level objectives are met. As the name might suggest, a passion for reliability is a must!

On the job you'll be...
• Maintaining, upgrading, and patching our private and public cloud platforms in test and production environments.
• Managing communications and coordinating change events with development and support teams.
• Identifying and resolving reliability issues and implementing long-term mitigation strategies, ideally through automation.
• Responding to production incidents and availability needs.
• Facilitating and documenting platform post-mortems.
• Training and mentoring junior staff members on reliability practices, processes, and technologies.
• Participating in an off-hours on-call rotation.
• Defining future-state implementations for microservice runtime platforms.

Duties & Responsibilities:

  • Ensures reliable operation of production and test environments.
  • Diagnoses and troubleshoots availability interruptions and other production issues.
  • Plans and coordinates enterprise-wide infrastructure projects with other IT and client teams.
  • Communicates with teams to keep them apprised of status and issues. Contacts vendors to resolve technical issues.
  • Tests, installs, and migrates all software, patches, upgrades, applications, and/or hardware.
  • Develops technical standards. Tests and evaluates IT vendor products.
  • Writes documentation, including project plans, installation procedures, and troubleshooting tips. Creates diagrams, including technical topology.
  • Maintains, monitors, and tunes production system and application performance. Debugs source code and performance problems and/or provides debugging assistance to developers.
  • Identifies opportunities to improve system and application performance (e.g., automating manual system tasks).
  • Trains and mentors staff. Resolves complex issues elevated from staff with less experience.
  • Adds, updates, and closes IT Problem Management database records. Researches and resolves complex issues, and reviews related technology records to mitigate impact on assigned system.
  • Reviews numerous IT knowledge repositories to update technical knowledge.
  • Learns and understands client area business functions and requirements. Has the ability to determine the appropriate technical tool to address the client's business needs.
  • Thoroughly understands and complies with IT policies and procedures, especially those for quality and productivity standards that enable the team to meet established client service levels.
  • Thoroughly understands and complies with Information Security policies and procedures, and verifies deliverables meet Information Security and VSA requirements. Participates in special projects and performs other duties as assigned.

Education & Experience:

  • Minimum of 3 years of overall technical engineering experience
  • Bachelor's Degree preferred or equivalent technical experience
  • A deep understanding of, and practical experience managing, at least one container-based service platform (e.g., Docker, Kubernetes, AWS ECS, Cloud Foundry, OpenShift).
  • Experience maintaining and monitoring distributed systems.
  • Understanding and application of monitoring, alerting, and visualization solutions (e.g., Splunk, ELK, Nagios, Grafana).
  • Deep knowledge of Linux systems and cloud platforms/providers
  • Strong oral and written communication skills
  • Passion for problem solving and strategic thinking, and a desire to own and execute
  • Advanced understanding and application of at least one scripting language (e.g., Shell, PHP, Python)
  • Basic development experience in a language such as Java is considered a plus
  • A flexible schedule - some activities you'll be performing may require off-hours or weekend support

Special Factors
Remote and off hours support: See additional information for the specific requirements for this posting.
On call: See additional information for the specific requirements for this posting.

Vanguard is not offering visa sponsorship for this position.

Meet Some of Vanguard's Employees

Claire O.

Brokerage Investment Professional, Malvern, PA

Claire ensures that Vanguard clients have all the important and necessary industry information in order to make the best decisions for their personal investments.

Mohammad S.


Mohammad helps build digital website tools that answer important questions for company clients in an effort to eliminate lengthy phone calls for easily answerable questions.
