Site Reliability Engineer
- Atlanta, GA
VMware's Command Center team continues to grow! We are looking for the right candidate to start their journey with VMware by becoming a member of the Command Center. Our team is the focal point for the success of our enterprise-class SaaS service offerings across all of VMware. This position requires experience fostering relationships across different business units, developing integrations between monitoring and collaboration tools with service owners, and establishing processes to help vCC SREs exercise good judgment.
The Command Center Engineer is a member of the highly visible Cloud Services Operations team and a core member of VMware Engineering Services (VES). As a member of the Command Center, you will work in a world-class team, respected for its innovation, execution and collaboration, operating across a worldwide organization. The team operates 24/7/365 and ensures continuity of VMware SaaS services through any significant disruption to the normal operations of our enterprise service offerings. The Command Center is expected to provide a reliable service with an enterprise-level SLA and must strive for 100% customer support satisfaction. The primary objective of this team is to ensure that critical applications and services are available and working as expected for customers and subscribers. The secondary objective is to develop and improve existing service monitoring tools through additional integrations, automation and collaboration.
The ideal candidate serves as the focal point for the success of our enterprise-class SaaS service offerings across all of VMware, providing technical skill and knowledge to the Command Center. Along with working on complex issues where analysis of situations or data requires an in-depth evaluation of various factors, this role will be required to help level up the technical skills of the team, develop tools and automation for VMware services, and assist services in automated problem resolution. The ideal candidate will have a strong technical background in cloud computing, various operating systems (Windows, Linux) and enterprise virtualization software (vSphere), exposure to serverless and containerization, and development skills in a modern scripting language such as Python, Ruby, or Go.
- Strong communication skills with an ability to relay incident details expeditiously, concisely, and accurately
- Proficient in leading remote online collaborative meetings, adhering to project management principles and documentation
- Strong organizational skills with an extremely high level of attention to detail
- Highly motivated, quality-conscious self-starter who requires little to no supervision and is able to own tasks from start to finish
- Customer focused: investigates and resolves customer issues and inquiries (both emergency and non-emergency)
- Identify, receive, triage and act upon events and incidents coming from various SaaS services
- Consistently meets or exceeds established Command Center key performance indicators (KPIs)
- Work per escalation, notification and incident practices
- Monitor the availability of the CI/CD environments
- Working under pressure in production environments running production customer workloads and services
- Previous knowledge of, or a strong desire to learn about, crisis management
- Ability to work with geographically dispersed teams as part of a worldwide operations team
Success in this role requires very strong technical and communication skills, a broad background and understanding of every layer of the software development life cycle and SaaS ecosystems, and the ability to identify issues and escalate them to the correct DevOps team. The ability to work independently and as part of a specialized team in a diverse environment is a requirement.
- 2-3 years of experience working with production SaaS/cloud-based systems
- 2-3 years building and supporting enterprise IT solutions
- Experience with building and improving world class monitoring and alerting systems
- Domain knowledge of systems management: ITSM and ITIL framework
- Experience working with Atlassian products: Jira, Jira Service Desk, Confluence, Statuspage, etc.
- Experience working with escalation applications: PagerDuty, VictorOps, Opsgenie, etc.
- Experience working with communication tools: Slack, MS Teams, Azendoo, HipChat
- Experience working with internal or external notification tools: Statuspage.io, status.io
- Demonstrable computer science expertise (e.g., a degree, previous work experience, etc.)
Additional Qualifications - Preferred Candidates
- VMware Product Knowledge in Virtualization (vSphere, vCenter) and other VMware Product families
- Next generation application delivery and deployment methods, containerization, orchestration, etc.
- Automation background (REST APIs, Ruby, Chef and more)
- Good working knowledge of at least one public cloud such as AWS, Azure or GCP
- Experience with infrastructure configuration and/or workflow tools, e.g. Puppet, Ansible, etc.
- Knowledge of build automation and continuous integration/delivery processes and tools: Git, Gerrit, Jenkins, Docker, Nexus, Artifactory, Selenium
- Experience developing in or supporting one or more of the following languages: Python, Java, Go and/or Node.js
VMware Inc. is an equal opportunity employer and prohibits discrimination and harassment of any kind. All applicants will be treated fairly and given equal opportunity in all aspects of employment, including compensation and benefits, irrespective of their race, color, nationality, religion, sex, sexual orientation, marital status, age, disability, or ethnic origin.
Category: Engineering and Technology
Subcategory: Site Reliability
Experience: Manager and Professional
Full Time/Part Time: Full Time
Posted Date: 2020-10-08
VMware Company Overview: At VMware, we believe that software has the power to unlock new opportunities for people and our planet. We look beyond the barriers of compromise to engineer new ways to make technologies work together seamlessly. Our cloud, mobility, and security software form a flexible, consistent digital foundation for securely delivering the apps, services and experiences that are transforming business innovation around the globe. At the core of what we do are our people who deeply value execution, passion, integrity, customers, and community. Shape what's possible today at http://careers.vmware.com.
Equal Employment Opportunity Statement: VMware is an Equal Opportunity Employer and Prohibits Discrimination and Harassment of Any Kind: VMware is committed to the principle of equal employment opportunity for all employees and to providing employees with a work environment free of discrimination and harassment. All employment decisions at VMware are based on business needs, job requirements and individual qualifications, without regard to race, color, religion or belief, national, social or ethnic origin, sex (including pregnancy), age, physical, mental or sensory disability, HIV status, sexual orientation, gender identity and/or expression, marital, civil union or domestic partnership status, past or present military service, family medical history or genetic information, family or parental status, or any other status protected by the laws or regulations in the locations where we operate. VMware will not tolerate discrimination or harassment based on any of these characteristics. VMware encourages applicants of all ages. VMware will provide reasonable accommodation to employees who have protected disabilities consistent with local law.