EPAM Systems

Senior/Lead Site Reliability Engineer

2 months ago | Dublin, GA

EPAM is committed to providing our global team of more than 41,150 EPAMers with inspiring careers from day one. EPAMers think creatively and lead with passion and honesty. Our people are the source of our success. We value collaboration, work in partnership with our customers, and strive for the highest standards of excellence. In today's market conditions, we're supporting operations for hundreds of clients around the world remotely. No matter where you are located, you'll join a dedicated, diverse community that will help you discover your fullest potential.

We are looking for a Site Reliability Engineer with 5+ years of experience and a proven track record of implementing solutions for IT and support teams.
What You'll Do

  • Build frameworks that deliver repeatable results, and understand how to turn a manual "fix" into automated recovery
  • Develop solutions for complex custom applications in Java and Angular
  • Enhance application availability, scalability, reliability, and performance in a microservice architecture
  • Identify areas for improvement, including performance tuning, infrastructure improvements, better monitoring and proactive alerting, and reducing the time between an incident, its triage, and its resolution
  • Guide development teams working in high-availability, high-performance, cloud-native environments
  • Build complex frameworks for infrastructure and application provisioning
  • Work with distributed systems and with 24x7 Production Services/environments
What You Have
  • Engineering mindset and the ability to build frameworks that deliver repeatable results
  • 5+ years of experience developing solutions for complex custom applications in Java and Angular
  • Experience with Azure cloud (3+ years) and multiple Azure services (AKS, Virtual Machines, Resource Manager, Azure DevOps, Container Registry and more), Docker, Kubernetes, and Helm
  • Strong Linux and Java troubleshooting skills
  • Knowledge of Java, Python, Go, and related technologies and frameworks
  • Experience with Azure Monitoring, or at least with Prometheus, Grafana, or New Relic
  • Skilled in Cloud Architecture and Operations including migration, resilience, maintainability, and cost-efficiency
  • Understanding of CI/CD and SCM tools such as Git, Maven, Jenkins, Sonar, Artifactory, and Ansible
  • An operational background that adds value to the role as you bridge the Dev and Ops teams
  • Ability to enhance application availability, scalability, reliability, and performance in a microservice architecture
  • Good understanding of containerization concepts (Docker)
  • Familiarity with Continuous Integration and Continuous Deployment concepts, along with the necessary tools such as Git, Maven, Gradle, and Jenkins
  • Experience working with Azure cloud services
  • Demonstrated ability to think strategically about business, product, and technical issues
  • Strong verbal and written communication skills, with the ability to work effectively with stakeholders across the organization
  • Analytical and problem-solving skills
  • Ability to work under pressure and handle multiple priorities
  • Java, Angular, Python, Go
  • Azure
  • Docker, Kubernetes, Helm
We offer
  • Delivery of innovative solutions for the world's digital changes
  • Experience exchange with colleagues all around the world
  • Opportunities for self-realization
  • Unlimited access to LinkedIn Learning solutions
  • Friendly team and enjoyable working environment
  • Competitive compensation
  • Social package: professional and soft skills training, medical and family care programs, sports
  • Free English classes
  • Corporate and social events
  • Regular assessments and salary reviews

Client-provided location(s): Georgia, USA
Job ID: EPAM-60990