Site Reliability Engineer - Tanzu Observability
- Los Angeles, CA
Job Description
The Elevator Pitch: Why will you enjoy this new opportunity?
The Tanzu Observability by Wavefront (TObs) Site Reliability Engineering team is growing, with a laser focus on ensuring the SaaS offering operates reliably and at scale. We are looking for someone who will play a key role in operating the world's best real-time data collection and visualization system and who will work closely with developers to provide them with systems that increase their productivity. You will support thousands of cloud instances across multiple regions at scale and share your learnings and best practices with others. You have exposure to cloud platforms and Linux systems, along with automation experience. Finally, you have experience working within a fully remote, distributed team.
What is the primary need, technical challenge, and/or problem you will be responsible for?
We need someone with considerable working knowledge of infrastructure, cloud deployment, and configuration automation, including a good grasp of infrastructure as code. The ideal candidate will contribute to the development of software that drives improvements in the availability, management, and visibility of TObs services. In this role, you will take part in the on-call rotation for running the TObs services and drive improvements that continuously increase the signal-to-noise ratio. You will also contribute to developing tools for metrics gathering, monitoring, automated remediation, and orchestration.
Additionally, we are looking for someone who is willing to obtain security clearance.
Success in the Role: What are the performance goals over the first 6-12 months you will work toward completing?
- Quickly understand the architecture and start contributing towards team goals.
- Take part in on-call rotations and help remediate alerts.
- Develop, deliver, and improve the automation projects assigned to you.
- Work with the larger team to deliver team objectives effectively.
- Deploy and maintain production services using container technologies such as Kubernetes, EKS, or ECS.
- "Infrastructure as code" is something we swear by. We use Terraform and Ansible extensively, and you are expected to understand them and help advance our automation efforts in this area.
- We also use Python and Java for some product scripting and functional scenarios; knowledge of either is a plus.
- Be eager to learn, understand, and contribute to the reliability of the Tanzu Observability service.
- Take part in discussions on automation and suggest effective improvements.
- Participate in on-call and help reduce mean time to remediation.
- Be passionate about learning new technologies and tools for managing production services, keeping SLAs and MTTR in mind at all times.
- Understand the Wavefront architecture, discover failure points, and work with other teams to prevent future issues.
- Align yourself with team objectives, ask the right questions, and be an effective contributor to overall production reliability.
The hiring manager for this role is Elisa Binette, Senior Manager of Site Reliability Engineering, who oversees the Wavefront SRE group, a critical component of the Reliability Engineering program for Tanzu products under the Run, Manage and Observe banner. Elisa recently joined VMware to add her SRE industry expertise to this fast-growing group. Before joining VMware, Elisa spent almost two decades leading engineering teams across a broad range of industries and technologies.
What are the benefits and perks of working at VMware?
You and your loved ones will be supported with a competitive and comprehensive benefits package. Below are some highlights, or you can view the complete benefits package by visiting benefits.vmware.com.
- Employee Stock Purchase Plan
- Medical Coverage, Retirement, and Parental Leave Plans for All Family Types
- Generous Time Off Programs
- 40 hours of paid time to volunteer in your community
- Rethink's Neurodiversity program to support parents raising children with learning or behavior challenges, or developmental disabilities
- Financial contributions to your ongoing development (conference participation, trainings, course work, etc.)
- Healthy, locally inspired snacks in all our pantries
This position is eligible for the TanzuChallenge referral campaign.
#TeamTanzu
VMware is an Equal Opportunity Employer and Prohibits Discrimination and Harassment of Any Kind: VMware is committed to the principle of equal employment opportunity for all employees and to providing employees with a work environment free of discrimination and harassment. All employment decisions at VMware are based on business needs, job requirements and individual qualifications, without regard to race, color, religion or belief, national, social or ethnic origin, sex (including pregnancy), age, physical, mental or sensory disability, HIV Status, sexual orientation, gender identity and/or expression, marital, civil union or domestic partnership status, past or present military service, family medical history or genetic information, family or parental status, or any other status protected by the laws or regulations in the locations where we operate. VMware will not tolerate discrimination or harassment based on any of these characteristics. VMware encourages applicants of all ages. VMware will provide reasonable accommodation to employees who have protected disabilities consistent with local law.