Site Reliability Engineer

Job Description
IBM Cloud Brokerage Services is IBM's solution for Hybrid Cloud Enablement, giving our clients' IT organizations visibility and governance without sacrificing speed or business agility.
Our solution is built on our recent acquisition of Gravitant. We continue to operate with a startup mentality but with access to the tremendous market reach of IBM. We are global in scale, with customers in Europe, North America, South America and Asia Pacific. We are pan-industry in scope, delivering to a client base that spans telecommunications, retail, aerospace, financial services and other industries.

IBM Cloud Brokerage is a purpose-built suite of applications that enables a self-service ability to browse, search, order and fulfill services powered by a comprehensive, curated IT as a Service catalog spanning Public, Private and Hybrid Clouds and Traditional IT providers. It is a core component of IBM's strategic investment in the IBM Services Platform with Watson (ISPW), a complete and automated IT as a Service environment powered by the unmatched cognitive capability of Watson.

The Cloud Brokerage Site Reliability Engineer will be part of a group deploying and managing complex Enterprise software solutions in the areas of cloud brokerage, cloud management, data center transformation, Enterprise Hybrid Cloud Architectures and IT Governance. Our delivery organization is made up of functional teams managing (a) Client Advocacy, (b) Client Onboarding and Transformation, (c) Client Solution Engineering and (d) Client Services and Enablement.

The Brokerage Site Reliability Engineer position is responsible for:

  • Designing, analyzing, and troubleshooting large-scale distributed systems
  • Participating in the on-call rotation
  • Engaging with product teams to fix production outages and carrying forward action items to improve ongoing reliability
  • Developing effective tooling, alerts, and responses to identify and address reliability risks, including automatic problem detection and mitigation
  • Managing end-to-end availability and performance of Cloud Brokerage services and building automation to prevent problem recurrence, with the eventual goal of automating the response to all non-exceptional service conditions
  • Designing, writing, and delivering software to improve the availability, scalability, latency, and efficiency of Cloud Brokerage services
As a Cloud Brokerage Site Reliability Engineer, you should possess the following skills:
  • DevOps mindset
  • You enjoy solving difficult engineering problems and don't mind getting your hands dirty
  • You approach troubleshooting systematically and have a deep sense of ownership for whatever you work on
  • Ability to identify the root causes of instability in a high-traffic, distributed system
  • Passion for resolving reliability issues and identifying strategies to mitigate them going forward
  • Willingness to work in an ever-changing environment
  • You are passionate about automation and innovations that improve productivity
  • Experience with Cloud Computing platforms - IBM Cloud, AWS, Azure, Google Cloud Platform - 3+ years
  • Proficiency in algorithms, data structures, complexity analysis, and software design, plus expertise in Unix/Linux systems, IP networking, and performance and application issues - 5+ years
  • Strong Linux system-level analysis capabilities - 5+ years
  • Experience in operating highly available distributed systems, in particular microservices, in a cloud environment - 1+ years
  • Experience in at least one scripting language, Python preferred - 2+ years
  • Sound understanding of CI/CD systems as well as experience in running containerized applications using tools such as Docker and Kubernetes - 1+ years
  • Experience with configuration and troubleshooting of Linux, Java/Scala, and Docker systems - 1+ years
  • Experience in operating RDBMS and NoSQL databases - 3+ years
  • Experience with Java, Elasticsearch, Kibana, Logstash, and Grafana - 2+ years
  • Understanding of large-scale complex systems from a reliability perspective - 5+ years
Additional benefits:
  • Training and certifications
  • Private medical package and insurance package
  • Multisport Card
  • Working on international projects in multicultural teams
  • "Good to be an IBMer" discounts
  • Cinema & trips for IBMers
  • Language classes
  • Summer camps for children

Required Technical and Professional Expertise

As in job description

Preferred Technical and Professional Experience

As in job description

EO Statement
IBM is committed to creating a diverse environment and is proud to be an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, disability, age, or veteran status. IBM is also committed to compliance with all fair employment practices regarding citizenship and immigration status.
