ABOUT LAB45

Lab45 is a visionary space developing ground-breaking solutions to foster and accelerate ideation throughout Wipro. At Lab45, teams of engineers, research analysts, and scientists come together to find creative ways of incubating solutions for customers that will transform the future. It is a space filled with ambition at the vanguard of far-reaching research across cutting-edge technologies. Established with the Silicon Valley culture of free-flowing creativity, Lab45's goal is to make bold ideas a reality and to invent the future of enterprise. So come, collaborate, and see what happens when ideas are left unbound.

ROLE SUMMARY

We are seeking a highly skilled and experienced Principal Site Reliability Engineer (SRE) to join the Lab45 team at Wipro. As a Principal SRE, you will play a critical role in ensuring the reliability, availability, and performance of our systems and applications. Your expertise and leadership will be essential in driving the adoption of best practices, designing scalable architectures, and improving the overall reliability of our infrastructure.

RESPONSIBILITIES AND DUTIES

Design, implement, and maintain highly available and scalable systems, services, and architectures to support our organization's applications and infrastructure.
Lead efforts to improve system reliability, monitoring, and performance, utilizing automation and best practices for continuous integration and deployment.
Collaborate with cross-functional teams to identify and resolve performance bottlenecks, scalability issues, and architectural challenges.
Develop and implement incident response procedures, conduct post-incident analysis, and drive root cause analysis to prevent future incidents.
Define and enforce service-level objectives (SLOs) and service-level agreements (SLAs) to ensure the reliability and availability of our systems and applications.
Automate deployment and configuration processes, utilizing infrastructure-as-code and configuration management tools.
Stay up to date with the latest industry trends and technologies related to site reliability engineering, and proactively recommend and implement improvements.
Mentor and provide technical leadership to SRE and engineering teams, promoting a culture of reliability, performance, and scalability.
Collaborate with external partners, vendors, and service providers to manage and optimize our infrastructure and services.

QUALIFICATIONS AND SKILLS

Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
Extensive experience (10+ years) in site reliability engineering or a similar role, with a strong focus on designing and maintaining scalable, reliable, and high-performance systems.
Deep understanding of cloud technologies and platforms (e.g., AWS, Azure, Google Cloud) and experience with cloud-based infrastructure management.
Proficiency in scripting and programming languages (e.g., Python, Go, Java) for automation and infrastructure-as-code.
Strong knowledge of containerization and orchestration technologies (e.g., Docker, Kubernetes) and experience with microservices architectures.
Experience with monitoring and observability tools (e.g., Prometheus, Grafana, ELK stack) to ensure system health and performance.
Familiarity with incident management and response processes, including on-call rotations and post-incident analysis.
Excellent troubleshooting and problem-solving skills, with the ability to quickly diagnose and resolve complex system issues.
Strong communication and collaboration skills, with the ability to effectively interact with technical and non-technical stakeholders.
Relevant certifications (e.g., AWS Certified DevOps Engineer, Google Cloud Professional DevOps Engineer) are highly desirable.

Lab 45 Principal Site Reliability Manager
