Senior Software Engineer, Storage (SRE) - Seattle
Uber has become an integral part of many people’s lives. It is critical that the systems that power the platform are always available to everyone in the world. Uber continues to expand into new cities, and its growth and popularity present interesting challenges for matching developer agility with the scale and complexity of its infrastructure. We are looking for senior engineers who are able to apply sound engineering principles and build robust distributed systems to solve infrastructure challenges, while enhancing developer productivity and maintaining operational simplicity.
Our storage systems span physical data centers and cloud providers. You will be part of the storage site reliability engineering team, which has a healthy mix of software and systems engineering backgrounds and executes in a fast-paced environment. If you love working on software products that touch the physical world and want some serious experience under exponential growth conditions, this is the role for you.
Are you among the rare breed of software engineers with a strong systems engineering and operational background in storage and data? Do you have a passion for designing and maintaining highly reliable systems? As a Senior Software Engineer on the Storage SRE team, you’ll be responsible for one of the following areas: query engines, Elasticsearch, SQL and NoSQL database management systems, caching, queuing and distributed file systems. You will build cluster management and automation tools that enable you to provide storage infrastructure components as a service to other engineering teams. Petabytes of data, self-healing and automatic scaling are just a few of the challenges you will face. Site reliability engineering is not about operations; we are building the systems and tooling that make operations unnecessary.
- Provide technical leadership, and influence and partner with fellow engineers to architect, design and build mission-critical storage systems that can stand the test of scale and availability while reducing operational overhead.
- Drive efficiencies in systems and processes through automation: capacity planning, configuration management, performance tuning, monitoring and root cause analysis.
- Participate in a periodic on-call rotation and be available for escalations.
- Collaborate with product and security engineering teams, and enable successful use of mission-critical storage systems.
- System Architecture design, including management of upstream and downstream dependencies
- Design and deliver software tools to advance reliability of storage systems including availability, performance, efficiency & scaling.
- Automation of Deployment & Change Management, Canary and Release processes
- Resiliency strategies, such as Load and Failure testing
- Capacity planning, Turn-ups and Turn-downs
- Instrumentation, Monitoring, Alerting & Reporting on key metrics and SLAs
- Incident Response (problem solving, improving the on-call rotation experience, tools, and procedures) including a comprehensive postmortem process, and automation to prevent recurrence.
- Operational Readiness, such as Runbooks and other Documentation, Escalation Paths, and Incident Response Training exercises.
- Grit, drive and a strong feeling of ownership coupled with collaboration and leadership.
- BS or MS in Computer Science or a related technical discipline, or equivalent experience.
- 5+ years of experience building and managing distributed systems. Sound understanding of the fundamentals of distributed systems.
- Highly proficient in at least one of the following programming languages: Go, Java, C/C++, Python or C#, with good scripting skills and the ability to pick up new languages.
- Systematic problem-solving approach and knowledge of algorithms, data structures and complexity analysis.
- Extensive experience in at least one of the following storage technologies: MySQL, PostgreSQL, Cassandra, ELK stack or similar
- Knowledge of caching and queuing systems (Redis, Memcached, Varnish, RabbitMQ, Apache Kafka)
- Power-user Linux knowledge and willingness to explore Linux internals
- A good understanding of large-scale distributed systems in practice, including multi-tier architectures, application security and monitoring systems.
- Experience with AWS, Azure or GCP is a plus
- Configuration management knowledge with Chef or Puppet is a plus.
- Familiarity with Docker, Mesos and container technologies
Be sure to check out the Uber Engineering Blog to learn more about the team.
- Employees are given Uber credits every month.
- 401(k) plan, gym reimbursement, nine paid company holidays.
- Full medical/dental/vision package to fit your needs.
- Unlimited vacation policy; work hard and take time when you need it.
We’re bringing Uber to every major city in the world. We need brains and passion to make it happen and to make it happen in style.
At Uber we don’t just accept difference—we celebrate it, we support it, and we thrive on it for the benefit of our employees, our products and our community. Uber is proud to be an equal opportunity workplace and is an affirmative action employer. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity or Veteran status.