Site Reliability Engineer
If you are passionate about working on technologies and designing approaches that disrupt incumbent enterprise players, like to leverage Open Source, tooling, automation and agile concepts, and are interested in working at massive scale, we have just the job for you.
Site Reliability Engineering is not a new discipline in and of itself, but at Box we approach it in a way that makes it one of the most important roles within our Technology organization. As an SRE at Box, you will be responsible for managing our production services and will work closely with developers and other Ops teams to ensure the reliability, scalability and performance of the next generation of systems. As such, you will be a core driver of operational excellence for all of Box's production services.
We are looking for smart, driven engineers with strong acumen for troubleshooting technical issues and tuning distributed systems, ideally gained in a highly scalable, consumer-facing web service. Experience with best practices around monitoring, deployment and configuration management is a must, as are strong fundamentals in systems administration.
If this sounds like you, please read on for a more detailed description of the scope and requirements.
Responsibilities
- Manage the overall operations of all production services at Box.
- Collaborate with developers and other internal groups to identify, prioritize and develop service reliability and manageability improvements.
- Work with the NOC and other Ops teams to troubleshoot site issues.
- Be a change management guru! Change is the only constant on our team, and you will be responsible for walking the fine line between failing fast and maintaining SLAs.
- Develop tools and scripts that improve our ability to rapidly deploy and effectively monitor production applications in a large-scale Linux environment.
- Participate in a 24x7 on-call rotation for second-tier escalations.
Requirements
- 2+ years of Unix/Linux systems administration experience.
- Proven production service troubleshooting skills that span applications, systems and network.
- Demonstrated programming skills in one or more of: Bash, Python, Perl, PHP, Ruby, Java, C.
- 2+ years of experience in a consumer-facing, web-scale Technical Operations role.
- Solid understanding of operational principles, such as capacity planning, monitoring and incident handling.
- CS or equivalent university degree.
- Nice to have: committer status on relevant Open Source projects and prior experience working in a DevOps capacity.
- Passion for cloud technologies.
About Box: Founded in 2005, Box (NYSE:BOX) is transforming the way people and organizations work so they can achieve their greatest ambitions. As the world's leading enterprise software platform for secure content collaboration, Box helps businesses of all sizes in every industry securely access and manage their critical information in the cloud. Box is headquartered in Redwood City, CA, with offices across the United States, Europe and Asia. To learn more about Box, visit www.box.com.