Cloud Site Reliability Engineer

7 days ago Mumbai, India

We are seeking a highly skilled Site Reliability Engineer (SRE) with strong experience in Kubernetes troubleshooting and incident response, deep knowledge of monitoring and alerting systems, and solid experience in CI/CD pipeline design and maintenance. You will play a key role in building and maintaining reliable infrastructure, enhancing observability, and ensuring uptime for mission-critical systems.

In this role, you will...

  • Diagnose and resolve issues in Kubernetes clusters, including deployments, pod failures, networking issues, and autoscaling.
  • Lead incident management efforts including on-call response, root cause analysis, and continuous improvement of incident playbooks.
  • Design and maintain monitoring, logging, and alerting systems using tools such as Prometheus, Grafana, and ELK (Elasticsearch, Logstash, Kibana).
  • Set up and manage Kibana dashboards and maintain the ELK stack to ensure high availability and performance of logging infrastructure.
  • Integrate metrics, logs, and traces into a unified observability platform.
  • Build and maintain alerting pipelines that reduce noise and improve the signal-to-noise ratio of alerts for production incidents.
  • Contribute to infrastructure automation using tools like Terraform and Helm.
  • Set up and support CI/CD pipelines for automated testing, deployment, and rollback across multiple environments.
  • Participate in shift rotations and continuously improve observability and response systems.

You've Got What It Takes If You Have...


  • 2+ years in an SRE, DevOps, or Infrastructure Engineer role.
  • Bachelor's degree in computer science, IT, or related technical field.
  • Hands-on experience with AWS and GCP.
  • Deep hands-on experience with Kubernetes (EKS, AKS, GKE).
  • Strong understanding of Linux internals, container orchestration, and microservice architecture.
  • Hands-on experience with monitoring/logging tools:
    • Prometheus, Grafana, InfluxDB
    • ELK stack (Elasticsearch, Logstash, Kibana)
  • Proficient in incident response and alerting tools (PagerDuty etc.).
  • Basic knowledge of:
    • Kafka: topic monitoring, consumer health
    • ElastiCache / Redis: caching patterns and troubleshooting
    • InfluxDB: time-series metrics storage
  • Experience writing and maintaining automation scripts in Bash, Python, or Go.

Client-provided location(s): Mumbai, India
Job ID: CornerstoneOnDemand-req11065
Employment Type: OTHER
Posted: 2026-03-25T20:04:38

Perks and Benefits

  • Health and Wellness
    • Health Insurance
    • Health Reimbursement Account
    • Dental Insurance
    • Vision Insurance
    • Life Insurance
    • Short-Term Disability
    • Long-Term Disability
    • FSA
    • HSA
    • HSA With Employer Contribution
    • Pet Insurance
    • Mental Health Benefits
  • Parental Benefits
    • Birth Parent or Maternity Leave
    • Non-Birth Parent or Paternity Leave
    • Fertility Benefits
    • Family Support Resources
    • Adoption Leave
  • Work Flexibility
    • Flexible Work Hours
    • Remote Work Opportunities
    • Hybrid Work Opportunities
  • Office Life and Perks
    • Casual Dress
    • Snacks
    • Company Outings
    • On-Site Cafeteria
    • Holiday Events
  • Vacation and Time Off
    • Paid Vacation
    • Unlimited Paid Time Off
    • Paid Holidays
    • Personal/Sick Days
    • Leave of Absence
    • Summer Fridays
  • Financial and Retirement
    • 401(K) With Company Matching
    • Stock Purchase Program
    • Performance Bonus
    • Relocation Assistance
    • Financial Counseling
    • Profit Sharing
  • Professional Development
    • Tuition Reimbursement
    • Promote From Within
    • Work Visa Sponsorship
    • Leadership Training Program
    • Internship Program
    • Shadowing Opportunities
    • Access to Online Courses
  • Diversity and Inclusion
    • Employee Resource Groups (ERG)
    • Unconscious Bias Training
    • Diversity, Equity, and Inclusion Program