Lead Site Reliability Engineer
3+ months ago • Zestap’oni, Georgia
This job is no longer available.
We are seeking a highly skilled Lead Site Reliability Engineer to join our team.
The ideal candidate will have a strong background in software engineering and systems engineering, with a focus on reliability and scalability in cloud environments, specifically Azure.
To discover more about the Cloud practice at EPAM Georgia, visit this page.
Experience the freedom of remote work from anywhere in Georgia, whether from the comfort of your home, our modern offices in Tbilisi and Batumi or a coworking space in Kutaisi.
Responsibilities
- Design, implement, and maintain highly available and scalable systems across multi-region Azure cloud architectures
- Ensure disaster recovery plans are in place and tested regularly
- Configure and enhance monitoring and alerting processes using Prometheus, Grafana, Alertmanager, and OpsGenie
- Develop dashboards to visualize system performance and reliability metrics
- Utilize Terraform for infrastructure provisioning and management
- Implement best practices for continuous deployment and infrastructure changes
- Work closely with the development team to support ongoing development efforts
- Communicate with the customer's DevOps team to elaborate on requirements and collaborate on implementations
- Enhance release management and CI/CD processes using Jenkins
- Improve system security based on recommendations from the security team
- Write and test runbooks to streamline operational tasks and incident response
- Manage and optimize services running in Kubernetes and Docker/Linux environments
- Handle data persistence using Cosmos DB (Mongo API & SQL API) and MS SQL Server
- Work with messaging systems such as RabbitMQ, Kafka, and Azure Event Hubs
- Utilize Azure Networking for secure and efficient communication
Requirements
- 5+ years of experience as a DevOps or SRE engineer
- Proven experience with multi-region Azure cloud architectures
- Proficiency in Kubernetes and containerization technologies
- Strong knowledge of Cosmos DB (both Mongo API & SQL API) and MS SQL Server
- Familiarity with monitoring tools like Prometheus, Grafana, Alertmanager, OpsGenie
- Experience with .NET Core and ASP.NET Core applications
- Competency in Docker and Linux environments
- Expertise in Terraform for infrastructure as code
- Experience with CI/CD tools
- Solid understanding of Azure Networking concepts
- Excellent communication skills, both verbal and written
- Strong self-motivation and ability to self-manage tasks and projects
- Experience with Azure IoT Hub and Event Hubs
We connect like-minded people:
- Delivering innovative solutions to industry leaders, making a global impact
- Enjoyable working environment, whether it is the vibrant office or the comfort of your own home
- Opportunity to work abroad for up to two months per year
- Relocation opportunities within our offices in 55+ countries
- Corporate and social events
We invest in your growth:
- Leadership development, career advising, soft skills and well-being programs
- Certifications, including GCP, Azure and AWS
- Unlimited access to LinkedIn Learning and getAbstract
- Free English classes with certified teachers
We cover it all:
- Participation in the Employee Stock Purchase Plan
- Monetary bonuses for engaging in the referral program
- Comprehensive medical & family care package
- Five trust days per year (sick leave without a medical certificate)
- Benefits package (sports activities, a variety of stores and services)
Client-provided location(s): Zestap’oni, Georgia
Job ID: EPAM-epamgdo_bltcae32f2c42939710_en-us_Other_Georgia
Employment Type: OTHER
Posted: 2024-12-28T15:32:08