Site Reliability Engineer
Are you passionate about technology? Do you love building new things? Do you want to develop the future of IBM's Cloud offerings? If you answered YES, then we have the right opportunity for you!
The shift toward the consumption of IT as a service, i.e., the cloud, is one of the most important changes to happen to our industry in decades. At IBM, we are driven to shift our technology to an as-a-service model and to help our clients transform themselves to take full advantage of the cloud. With industry leadership in analytics, security, commerce, and cognitive computing and with unmatched hardware and software design and industrial research capabilities, no other company is as well positioned to address the full opportunity of cloud computing.
We are looking for dynamic Site Reliability Engineers in Austin, TX to join our Cloud Innovation Lab (CIL) team, which responds to market needs and delivers value to our clients in a fast-changing cloud landscape. The CIL team is dedicated to ensuring that the IBM Cloud is at the forefront of cloud technology, from data center design to network architecture to storage and compute clusters to flexible infrastructure services. We are building IBM's next-generation cloud platform to deliver performance and predictability for our customers' most demanding workloads, at global scale and with industry-leading efficiency, resiliency and security. It is an exciting time, and as a team we are driven by this incredible opportunity to thrill our clients.
In this Site Reliability Engineer role, you will work closely with the Data Center, the entire Cloud Innovation Lab development organization and IBM vendors to support, maintain and operationally improve the cloud infrastructure. This position is for either 2nd or 3rd shift. You should have excellent written and verbal communication skills, and you should be comfortable operating in a fast-paced environment. You will focus on the following key responsibilities:
- Availability to work 2nd/3rd shift and/or weekends is a requirement for this particular job
- Monitor the health of production and test systems 24x7
- Respond promptly to production issues and alerts 24x7
- Work with Engineering to:
  - Provide an initial assessment of production issues and possible workarounds
  - Troubleshoot and resolve production issues
- Work with Partners to:
  - Identify and resolve issues
  - Discuss and plan integration tasks
Required Technical and Professional Expertise
- Must have knowledge of script writing
- Must be extremely comfortable using and navigating within a Linux environment
- Experience with or knowledge of open source products
- Must understand how DNS works
- Must have knowledge of virtualization
Preferred Technical and Professional Experience
- Hands-on experience in production administration of large system environments
- Ability to debug and analyze problems by examining logs and running Unix commands
- Exposure to configuration management systems (Ansible / Chef)
- Exposure to Splunk or ELK
IBM is committed to creating a diverse environment and is proud to be an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, disability, age, or veteran status. IBM is also committed to compliance with all fair employment practices regarding citizenship and immigration status.