Cloud Site Reliability Engineer
7 days ago • Mumbai, India
We are seeking a highly skilled Site Reliability Engineer (SRE) with strong experience in Kubernetes troubleshooting and incident response, deep knowledge of monitoring and alerting systems, and solid experience in CI/CD pipeline design and maintenance. You will play a key role in building and maintaining reliable infrastructure, enhancing observability, and ensuring uptime for mission-critical systems.
In this role, you will...
- Diagnose and resolve issues in Kubernetes clusters, including deployments, pod failures, networking issues, and autoscaling.
- Lead incident management efforts including on-call response, root cause analysis, and continuous improvement of incident playbooks.
- Design and maintain monitoring, logging, and alerting systems using tools such as Prometheus, Grafana, and ELK (Elasticsearch, Logstash, Kibana).
- Set up and manage Kibana dashboards and maintain the ELK stack to ensure high availability and performance of logging infrastructure.
- Integrate metrics, logs, and traces into a unified observability platform.
- Build and maintain alerting pipelines that cut noise and improve the signal-to-noise ratio for production incidents.
- Contribute to infrastructure automation using tools such as Terraform and Helm.
- Set up and support CI/CD pipelines for automated testing, deployment, and rollback across multiple environments.
- Participate in shift rotations and continuously improve observability and response systems.
You've Got What It Takes If You Have...
- 2+ years in an SRE, DevOps, or Infrastructure Engineer role.
- Bachelor's degree in computer science, IT, or related technical field.
- Hands-on experience with AWS and GCP.
- Deep hands-on experience with Kubernetes (EKS, AKS, GKE).
- Strong understanding of Linux internals, container orchestration, and microservice architecture.
- Hands-on experience with monitoring/logging tools:
  - Prometheus, Grafana, InfluxDB
  - ELK stack (Elasticsearch, Logstash, Kibana)
- Proficient in incident response and alerting tools (e.g., PagerDuty).
- Basic knowledge of:
  - Kafka: topic monitoring, consumer health
  - ElastiCache / Redis: caching patterns and troubleshooting
  - InfluxDB: time-series metrics storage
- Experience writing and maintaining automation scripts in Bash, Python, or Go.

Client-provided location(s): Mumbai, India
Job ID: CornerstoneOnDemand-req11065
Employment Type: OTHER
Posted: 2026-03-25T20:04:38
Perks and Benefits
Health and Wellness
- Health Insurance
- Health Reimbursement Account
- Dental Insurance
- Vision Insurance
- Life Insurance
- Short-Term Disability
- Long-Term Disability
- FSA
- HSA
- HSA With Employer Contribution
- Pet Insurance
- Mental Health Benefits
Parental Benefits
- Birth Parent or Maternity Leave
- Non-Birth Parent or Paternity Leave
- Fertility Benefits
- Family Support Resources
- Adoption Leave
Work Flexibility
- Flexible Work Hours
- Remote Work Opportunities
- Hybrid Work Opportunities
Office Life and Perks
- Casual Dress
- Snacks
- Company Outings
- On-Site Cafeteria
- Holiday Events
Vacation and Time Off
- Paid Vacation
- Unlimited Paid Time Off
- Paid Holidays
- Personal/Sick Days
- Leave of Absence
- Summer Fridays
Financial and Retirement
- 401(K) With Company Matching
- Stock Purchase Program
- Performance Bonus
- Relocation Assistance
- Financial Counseling
- Profit Sharing
Professional Development
- Tuition Reimbursement
- Promote From Within
- Work Visa Sponsorship
- Leadership Training Program
- Internship Program
- Shadowing Opportunities
- Access to Online Courses
Diversity and Inclusion
- Employee Resource Groups (ERG)
- Unconscious Bias Training
- Diversity, Equity, and Inclusion Program