
Senior SRE

Yesterday San Jose, CA

Introduction

A career in IBM Software means you'll be part of a team that transforms our customers' challenges into solutions. Seeking new possibilities and always staying curious, we are a team dedicated to creating the world's leading AI-powered, cloud-native software solutions for our customers. Our renowned legacy creates endless global opportunities for our IBMers, so the door is always open for those who want to grow their career.

IBM's product and technology landscape includes Research, Software, and Infrastructure. Entering this domain positions you at the heart of IBM, where growth and innovation thrive.

Your role and responsibilities

We are seeking a Sr. Customer Support / SRE engineer to join the team responsible for delivering Astra Streaming (Apache Pulsar as a Service). You will help our users succeed by resolving complex incidents, improving service reliability, and driving operational excellence across environments.

You will work closely with engineering, product, and customer support teams to ensure the Astra Streaming platform runs with high availability, low latency, and predictable performance, so that it meets and exceeds enterprise workload expectations.

Key Responsibilities

  • Serve as Tier 2/Tier 3 escalation point for customer-reported incidents, performance issues, and operational anomalies.
  • Troubleshoot issues across the full stack.
  • Develop and maintain runbooks, monitoring dashboards, and alerting rules.
  • Participate in and improve the on-call rotation, including leading incident response and post-mortems when necessary.
  • Collaborate with Engineering to identify root causes and drive fixes for long-term improvements.
  • Implement SLOs, SLIs, and error budgets to ensure platform reliability aligns with customer expectations.
  • Automate common tasks (toil).
  • Contribute to and lead observability and telemetry improvements (Prometheus, Grafana, Thanos, or equivalent).
  • Provide detailed and empathetic customer communication during incidents and post-incident reviews.
  • Act as a voice of the customer in reliability, scalability, and usability discussions.
  • Mentor junior support and operations engineers.

Success in this Role

In the first six months, success means:

  • Handling escalations independently and guiding complex incident responses.
  • Improving MTTR through new automation or monitoring enhancements.
  • Earning customer trust by delivering transparent communication and reliable resolution.
  • Identifying recurring failure modes and driving engineering changes to eliminate them.

Required education

High School Diploma/GED

Preferred education

Bachelor's Degree

Required technical and professional expertise

  • 5+ years of experience in SRE, DevOps, or Production Engineering for large-scale distributed systems.
  • Deep understanding of Apache Pulsar, Apache BookKeeper, or similar messaging systems (Kafka, RabbitMQ).
  • Experience operating Pulsar clusters on Kubernetes in public clouds.
  • Solid troubleshooting skills across Linux, networking, JVM-based applications, and containers / Kubernetes as a service.
  • Strong knowledge of monitoring, logging, and tracing tools (Prometheus, Grafana, Splunk, etc.).

Preferred technical and professional experience

  • Experience contributing to open-source Apache Pulsar or BookKeeper
  • Familiarity with multi-tenant architectures and managed-service operations
  • Experience with IaC and GitOps workflows

ABOUT BUSINESS UNIT

IBM Software infuses core business operations with intelligence, from machine learning to generative AI, to help make organizations more responsive, productive, and resilient. IBM Software helps clients put AI into action now to create real value with trust, speed, and confidence across digital labor, IT automation, application modernization, security, and sustainability. Critical to this is the ability to make use of all data, because AI is only as good as the data that fuels it. In most organizations data is spread across multiple clouds, on premises, in private datacenters, and at the edge. IBM's AI and data platform scales and accelerates the impact of AI with trusted data, and provides leading capabilities to train, tune, and deploy AI across the business. IBM's hybrid cloud platform is one of the most comprehensive and consistent approaches to development, security, and operations across hybrid environments: a flexible foundation for leveraging data, wherever it resides, to extend AI deep into a business.

YOUR LIFE @ IBM

In a world where technology never stands still, we understand that dedication to our clients' success, innovation that matters, and trust and personal responsibility in all our relationships live in what we do as IBMers as we strive to be the catalyst that makes the world work better.

Being an IBMer means you'll be able to learn and develop yourself and your career. You'll be encouraged to be courageous and experiment every day, all whilst having continuous trust and support in an environment where everyone can thrive, whatever their personal or professional background.

Our IBMers are growth-minded, always staying curious, open to feedback, and learning new information and skills to constantly transform themselves and our company. They are trusted to provide ongoing feedback to help other IBMers grow, and to collaborate with colleagues using a team-focused approach that includes different perspectives to drive exceptional outcomes for our customers. The courage our IBMers have to make critical decisions every day is essential to IBM becoming the catalyst for progress. They embrace challenges with the resources they have to hand and a can-do attitude, and they strive for an outcome-focused approach in everything they do.

Are you ready to be an IBMer?

ABOUT IBM

IBM's greatest invention is the IBMer. We believe that through the application of intelligence, reason and science, we can improve business, society and the human condition, bringing the power of an open hybrid cloud and AI strategy to life for our clients and partners around the world.

Restlessly reinventing since 1911, we are not only one of the largest corporate organizations in the world but also one of the biggest technology and consulting employers, with many of the Fortune 500 companies relying on the IBM Cloud to run their business.

At IBM, we pride ourselves on being an early adopter of artificial intelligence, quantum computing and blockchain. Now it's time for you to join us on our journey to being a responsible technology innovator and a force for good in the world.

IBM is proud to be an equal-opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, gender, gender identity or expression, sexual orientation, national origin, genetics, pregnancy, disability, neurodivergence, age, or other characteristics protected by the applicable law. IBM is also committed to compliance with all fair employment practices regarding citizenship and immigration status.

OTHER RELEVANT JOB DETAILS

IBM will not be providing visa sponsorship for this position now or in the future. Therefore, in order to be considered for this position, you must have the ability to work without a need for current or future visa sponsorship.

The compensation range and benefits for this position are based on a full-time schedule for a full calendar year. The salary will vary depending on your job-related skills, experience and location. Pay increment and frequency of pay will be in accordance with employment classification and applicable laws. For part time roles, your compensation and benefits will be adjusted to reflect your hours. Benefits may be pro-rated for those who start working during the calendar year.

Client-provided location(s): San Jose, CA
Job ID: IBM-69180
Employment Type: OTHER
Posted: 2025-11-04T18:52:07

Perks and Benefits

  • Health and Wellness
  • Parental Benefits
  • Work Flexibility
  • Office Life and Perks
  • Vacation and Time Off
  • Financial and Retirement
  • Professional Development
  • Diversity and Inclusion
