Site Reliability Engineer, Tanzu Observability
- Los Angeles, CA
The Elevator Pitch: Why will you enjoy this new opportunity?
The Tanzu Observability Site Reliability Engineering team is growing, with a laser focus on ensuring the SaaS offering operates reliably and at scale. We are looking for someone who will play a key role in building and operating the world's best real-time data collection and visualization system and who will work closely with developers to provide them with systems that increase their productivity. You will support thousands of cloud instances across multiple regions and share your learnings and best practices with others. You have strong experience with cloud platforms, Linux systems, and automation, along with knowledge of running workloads at scale. Finally, you are experienced in working within a fully remote and distributed team.
What is the primary need, technical challenge, and/or problem you will be responsible for?
We need someone who is passionate about automation, infrastructure as code, and configuration as code; hands-on experience with Terraform and Ansible is desirable. You can develop and deploy software that will help drive improvements in the availability, management, and visibility of our services. In this role, you will take part in the on-call rotation for running the Tanzu Observability services and implement improvements to continuously increase the signal-to-noise ratio. Additionally, you will contribute to the development of tools for metrics gathering, introspection, monitoring, automated remediation, and orchestration.
Success in the Role: What are the performance goals over the first 6-12 months you will work toward completing?
- You will demonstrate a commitment to reducing Mean Time to Resolution (MTTR), solving each technical issue with the goal of taking steps to ensure it doesn't happen again.
- You will support continuous improvements in our products by providing opinionated input in feature workstreams.
- You will drive assigned projects to completion, being clear when tradeoffs are needed and deadlines need to be adjusted to accommodate higher-priority work.
- You will demonstrate knowledge of cloud architecture, security, scaling, and management principles and have experience working with AWS, GCE, or Azure cloud infrastructure.
- You will deploy and maintain production services using container technology such as Kubernetes, EKS, or ECS.
- You will act as a strong team contributor, collaborating with the Wavefront product engineering team and demonstrating strong scoping and project execution.
- You will be passionate about learning new technologies and adopting the right tools to manage these services in production, keeping SLAs and MTTR in mind at all times.
- You will understand the Tanzu Observability architecture, discover failure points, and work with other teams to design tools and alerts that prevent issues in the future.
- You will identify and contribute to reliability improvements within the product by providing feedback to the product management team, informed by a commitment to using the Tanzu Observability service to monitor the production environment and act as customer zero.
- You will identify, scope, and build tools to reduce the operational load on engineers.
The hiring manager for this role is Elisa Binette, Senior Manager of Site Reliability Engineering, who oversees the Wavefront SRE group, a critical component of the Reliability Engineering program for Tanzu products under the Run, Manage and Observe banner. Elisa recently joined VMware to add her SRE industry expertise to this fast-growing group. Prior to this role, Elisa spent almost two decades leading engineering teams across a broad range of industries and technologies.
What are the benefits and perks of working at VMware?
You and your loved ones will be supported with a competitive and comprehensive benefits package. Below are some highlights, or you can view the complete benefits package by visiting www.benefits.vmware.com.
- Employee Stock Purchase Plan
- Medical Coverage, Retirement, and Parental Leave Plans for All Family Types
- Generous Time Off Programs
- 40 hours of paid time to volunteer in your community
- Rethink's Neurodiversity program to support parents raising children with learning or behavior challenges, or developmental disabilities
- Financial contributions to your ongoing development (conference participation, trainings, course work, etc.)
- Healthy and locally inspired snacks in all our pantries
This position is eligible for the TanzuChallenge referral campaign.
VMware is an Equal Opportunity Employer and Prohibits Discrimination and Harassment of Any Kind: VMware is committed to the principle of equal employment opportunity for all employees and to providing employees with a work environment free of discrimination and harassment. All employment decisions at VMware are based on business needs, job requirements and individual qualifications, without regard to race, color, religion or belief, national, social or ethnic origin, sex (including pregnancy), age, physical, mental or sensory disability, HIV Status, sexual orientation, gender identity and/or expression, marital, civil union or domestic partnership status, past or present military service, family medical history or genetic information, family or parental status, or any other status protected by the laws or regulations in the locations where we operate. VMware will not tolerate discrimination or harassment based on any of these characteristics. VMware encourages applicants of all ages. VMware will provide reasonable accommodation to employees who have protected disabilities consistent with local law.