Software Engineer, Site Reliability Engineering - Storage
Uber has become an integral part of people’s lives, and it is critical that the systems that power the platform are always available to everyone in the world. Uber continues to expand into new cities, and its growth and popularity present interesting challenges for matching developer agility with the scale and complexity of its infrastructure. We are looking for engineers who can apply the principles and discipline of robust distributed systems to solve infrastructure challenges while enhancing developer productivity and maintaining operational simplicity.
Our systems span physical data centers and cloud providers. You will be part of a team of engineers with a healthy mix of systems and software engineering skills, executing in a fast-paced environment. If you love working on software products that touch the physical world and want some serious experience under exponential growth conditions, this is the role for you.
A Site Reliability Engineer is a Software Engineer with strong systems and operational skills in one or more functional domains, where the domain may range from Databases to Networking. Most importantly, they have a passion for designing and maintaining highly reliable systems. Empirically, strong engineers with this skillset and passion tend to be a rare breed.
As a member of Storage SRE, you’ll be responsible for SQL and NoSQL database management systems, data caching, queuing, and distributed file systems. You will build cluster management and automation tools that enable you to provide infrastructure components as a service to fellow product engineers. Petabytes of data, self-healing, and automatic scaling are just a few of the challenges you will face. We are not about operations; we are building the systems and tooling that make operations unnecessary.
- Automation. SREs are obsessed with automation and tooling
- System Architecture, including upstream and downstream dependencies
- Deployment & Change Management, Canary and Release processes
- Resiliency strategies, such as Load and Failure testing
- Capacity Planning, Turn-ups and Turn-downs
- Performance, Efficiency & Scaling, with a focus on Availability and Latency
- Instrumentation, Monitoring, Alerting & Reporting on key metrics and SLAs
- Incident Response (improving the on-call experience, tools, and procedures) including a comprehensive Postmortem process
- Operational Readiness, such as Runbooks and other Documentation, Escalation Paths, and Incident Response Training exercises
- Partner with fellow engineers to architect and build mission-critical software and systems that can stand the test of scale and availability, while reducing operational overhead.
- Drive efficiencies in systems and processes: capacity planning, configuration management, performance tuning, monitoring and root cause analysis.
- Participate in an on-call rotation and be available for escalations.
- Grit, drive and a strong feeling of ownership.
- BS or MS in Computer Science or a related technical discipline. Equivalent practical experience is a reasonable substitute.
- Advanced knowledge of at least one of the following: MySQL, PostgreSQL, Cassandra, the ELK stack, or similar technologies
- Knowledge of caching and queuing technologies (Redis, Memcached, RabbitMQ, Apache Kafka)
- Good programming skills in at least one of the following: Go, Java, C/C++, Python, .NET, PHP, as well as good scripting skills and the ability to pick up new languages.
- Power-user Linux knowledge and willingness to explore Linux internals
- A good understanding of large-scale distributed systems in practice, including multi-tier architectures, application security, monitoring and storage systems.
- Good scripting skills in Bash, Perl, etc.
- Experience with AWS, GCP or Azure is a plus
- Configuration management knowledge is a plus
Be sure to check out the Uber Engineering Blog to learn more about the team.
- Employees are showered with Uber credits each month.
- The rare opportunity to change the world such that everyone around you is using the product you built. We’re not just another social web app; we’re moving real people and assets and reinventing transportation and logistics globally.
- Sharp, motivated co-workers in a fun office environment.
- 401(k) plan, gym reimbursement, nine paid company holidays.
- Full medical/dental/vision package to fit your needs.
- Unlimited vacation policy; work hard and take time when you need it.
We’re bringing Uber to every major city in the world. We need the brains and passion to make it happen and to make it happen in style.