The Amazon.com Incident Response team drives software initiatives to ensure the Amazon retail website customer experience is restored as early as possible. Our teams leverage automation and machine learning models to reduce customer-impacting risks and improve overall site availability. Our software integrates existing and new telemetry and logging streams, analyzes anomalies in the Amazon retail customer experience in real time, localizes software problems, and triggers remediation actions so our customers do not perceive any downtime.
As an SDE on the Amazon.com Incident Response team, you will build and design software that analyzes anomalies detected in the retail customer experience within seconds, localizes the event that caused them and triggers immediate repairs. The scope for this role is Amazon's entire ecosystem of tens of thousands of services. Enhancing Amazon's incident response posture creates a flywheel of improvements that include adoption of best practices across all organizations.
You will thrive if you love fast-paced, startup-like work environments focused on building systems from the ground up. You will be responsible for scoping and delivering projects end-to-end, leveraging statistical evaluation, pattern recognition, and machine learning, and defining how to interconnect services. You will deliver results personally and by leading your peers. Your strong communication skills will help you present both your architecture and your software solutions to varied audiences. Your proven track record of designing and building distributed software solutions through an agile methodology will enable you to get a fast start. Your ability to dive deep into a wide variety of problems and technologies will guide the right technical decisions for the Amazon products you support. You will apply your familiarity with DevOps concepts at large scale.
• Programming experience with at least one modern language such as Java, C++, or C# including object-oriented design
• 1+ years of experience contributing to the architecture and design (architecture, design patterns, reliability and scaling) of new and current systems.
• 2+ years of non-internship professional software development experience
• Experience with Site Reliability Engineering (SRE) concepts and practices
• Experience influencing software engineers, infrastructure engineers and operators on best practices (full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations)
• Track record of being detail-oriented, with a demonstrated ability to self-motivate and follow through on projects
• Experience with capacity planning and infrastructure scaling
• Experience in communicating with users, other technical teams, and senior management to collect requirements, describe software product features, technical designs, and product strategy
• Ability to communicate effectively with both technical and non-technical individuals
• Experience with Machine Learning
• Experience building software systems that have been successfully delivered to customers
• Strong understanding of core protocols and technologies such as: TCP/IP, HTTP, DNS, load balancers, distributed file systems, caches, and distributed data stores
Amazon is committed to a diverse and inclusive workplace. Amazon is an equal opportunity employer and does not discriminate on the basis of race, national origin, gender, gender identity, sexual orientation, protected veteran status, disability, age, or other legally protected status. For individuals with disabilities who would like to request an accommodation, please visit https://www.amazon.jobs/en/disability/us.