Software Engineer, Site Reliability Engineering - Storage
Uber is a technology company that is changing the way the world thinks about transportation. We are building technology people use every day. Whether it's heading home from work, getting a meal delivered from a favorite restaurant, or a way to earn extra income, Uber is becoming part of the fabric of daily life.
We're making cities safer, smarter, and more connected. And we're doing it at a global scale, energizing local economies and bringing opportunity to millions of people around the world.
Uber's positive impact is tangible in the communities we operate in, and that drives us to keep moving forward.
Uber has become an integral part of people's lives, and it is critical that the systems that fuel the platform are always available to everyone in the world. Uber continues to expand into new cities, and its growth and popularity present interesting opportunities for matching developer agility with the scale and complexity of its infrastructure. We are looking for engineers who can apply the concepts and discipline of robust distributed systems to resolve infrastructure challenges, while enhancing developer productivity and maintaining operational simplicity.
Our systems span physical data centers and cloud providers. You will be part of a team of engineers with a healthy mix of systems and software engineering skills, executing in a constantly evolving work environment. If you love working on software products that touch the physical world and want some serious experience under exponential growth conditions, this is the role for you.
A Site Reliability Engineer is a Software Engineer with excellent systems and operational skills in one or more functional domains, where the domain may range from Databases to Networking. Most importantly, you have a passion for designing and maintaining highly reliable systems. Empirically, exceptional engineers with this skillset and passion tend to be a rare breed.
As a member of Storage SRE, you'll be responsible for SQL and NoSQL database management systems, data caching, queuing, and distributed file systems. You will develop cluster management and automation tools that enable you to provide infrastructure components as a service to fellow product engineers. Petabytes of data, self-healing, and automatic scaling are just a few of the engineering puzzles you will be presented with. We are not about operations; we are creating the systems and tooling that make operations unnecessary.
- Automation. SREs are passionate about automation and tooling
- System Architecture, including upstream and downstream dependencies
- Deployment & Change Management, Canary and Release processes
- Resiliency strategies, such as Load and Failure testing
- Capacity Planning, Turn-ups and Turn-downs
- Availability, Performance, Efficiency & Scaling, including Availability and Latency
- Instrumentation, Monitoring, Alerting & Reporting on key metrics and SLAs
- Incident Response (improving the on-call experience, tools, and procedures) including a comprehensive Postmortem process
- Operational Readiness, such as Runbooks and other Documentation, Escalation Paths, and Incident Response Training exercises.
- Partner with fellow engineers to architect and build mission critical software and systems that can stand the test of scale and availability, while reducing operational overhead.
- Drive efficiencies in systems and processes: capacity planning, configuration management, performance tuning, monitoring and root cause analysis.
- Participate in an on-call rotation and contribute to needed escalations.
What you'll need
- BS or MS in Computer Science or a related technical discipline. Equivalent practical experience is a reasonable substitute.
- Advanced knowledge of at least one of the following: MySQL, PostgreSQL, Cassandra, the ELK stack, or similar technologies
- Knowledge of caching and queuing technologies (Redis, Memcached, RabbitMQ, Apache Kafka)
- Grit, self-motivation and a deep feeling of ownership.
- Good programming skills in at least one of the following: Go, Java, C/C++, Python, .NET, PHP, as well as good scripting skills and the ability to pick up new languages.
- Outstanding Linux knowledge and willingness to explore Linux internals
- A good understanding of large-scale distributed systems in practice, including multi-tier architectures, application security, monitoring and storage systems.
- Good scripting skills in Bash, Perl, etc.
- Experience with AWS, GCP or Azure is a plus
- Configuration management knowledge is a plus
We're bringing Uber to every major city in the world. We need your skills and passion to help make it happen!
Be sure to check out the Uber Engineering Blog to learn more about the team.
- Employees are given Uber credits every month.
- The rare opportunity to change the way the world moves. We're not just another social web app; we're moving real people and assets and reinventing transportation and logistics globally.
- Smart, engaged co-workers.
- 401(k) plan, gym reimbursement, nine paid company holidays.
- Full medical/dental/vision package to fit your needs.
- Unlimited vacation policy; take time when you need it.
Uber is an equal opportunity employer and enthusiastically encourages people from a wide variety of backgrounds and experiences to apply. Uber does not discriminate on the basis of race, color, religion, sex (including pregnancy), gender, national origin, citizenship, age, mental or physical disability, veteran status, marital status, sexual orientation or any other basis prohibited by law.