Site Reliability Engineer

This position is for a Site Reliability Engineer on our Retail Environments team.

The goals of this position include:

  • Continually improve monitoring, alerting, and automation to increase the environments' reliability, availability, performance, and overall system health.
  • Champion the best monitoring setup and control reporting, and create or update procedures for enterprise-wide adoption.
  • Employ analytics, and define and report trends.
  • Run canary and smoke tests, and manage chaos injection.
  • Experiment with and recommend new tools where needed.
  • Identify process gaps and implement process improvements to increase operational efficiency.
  • Work with different groups to develop and improve monitors for products and infrastructure.
  • Collaborate and lead both within the team and across the organization.
  • Work with operations and Cloud Ops teams to ensure applications and services are highly available and reliable.
  • Modernize the environments' tools and technologies to support the IT strategic goals of continuous delivery and cloud migration.
  • Automate and implement the build and maintenance of the environments for applications.

Additional details on qualifications include:

  • Knowledge of Linux systems and cloud platforms/providers
  • Strong troubleshooting skills
  • Scripting experience
  • Hands-on experience with Splunk, AppDynamics, and other monitoring tools
  • Experience with the Atlassian suite (Jira, Bitbucket, Bamboo, Confluence), Git, Maven, Jenkins, Selenium, Nexus, and Artifactory
  • Understanding of key monitoring concepts (the four golden signals, RED, USE)
  • Java development experience
  • Experience configuring (or administering) application or infrastructure monitoring tools
  • Development experience in scripting/query languages
  • Experience with dashboarding tools

Duties and Responsibilities:

1. Provides intermediate-level system analysis, design, development, and implementation of applications and databases for mainframe-, client/server-, Web-, and/or PC-based software or middleware. Integrates third-party products.
2. Translates technical specifications and/or logical and physical designs into code for new or enhancement projects for internal clients. Develops code and test artifacts that reuse subroutines or objects, are well structured, are backed by automated tests, include sufficient comments, and are easy to maintain. Writes programs, appropriate test artifacts, ad hoc queries, and reports. Employs contemporary software development techniques to ensure tests are implemented in a way that supports automation.
3. Elevates code into the development, test, and production environments on schedule. Provides follow-up production support. Submits change control requests and documents.
4. Follows software development methodology. Follows architecture standards.
5. Participates in design, code, and test inspections throughout the life cycle to identify issues. Participates in other meetings, such as those for use case creation.
6. Participates in systems analysis activities, including system requirements analysis and definition (e.g., prototyping), and logical and physical design.


  • Undergraduate degree in a related field or the equivalent combination of training and experience.
  • Minimum of 3 years of developer or systems analyst experience.

Intermediate knowledge of the following development practices and concepts:
  • Quality assurance methodology and inspections
  • Use case standards
  • Systems analysis and design techniques
  • System/subsystem requirements
  • Libraries, reusable code, and/or object-oriented standards
  • Screen, report, and query design

Vanguard is not offering sponsorship for this position.
