Software Development Engineer - Website Availability (Errors)


Technical Risk Reduction is 100% focused on improving the customer experience. We seek to continuously reduce the risks associated with the development operations required to deliver the best online retail customer experience in the world, without interruption. We implement mechanisms to detect and rapidly recover from shopping experience anomalies, and we drive teams to adopt practices and mechanisms that prevent incidents from occurring in the first place. This is a high-impact, high-visibility job that comes with exciting challenges and great reward. We interface with dozens of teams and systems across the company, and our output is sent to executive management.

This position is for an engineer on the Fatals team, which is one of six teams in Technical Risk Reduction. We work closely with our sister teams: Chaos Engineering, Ever Ready, Policy Engine, Enterprise Campaign Management, and Retail Monitoring. The Fatals team is focused on eliminating fatal exception errors, which appear to the customer as "We're Sorry" pages. In the best case, the error is a minor flaw that can be fixed by refreshing the page. When the entire page is broken, the customer may abandon the website, resulting in lost revenue for the company. Our mission is to drive Fatals down to zero, thereby preserving the best possible customer experience for our shoppers.

The Fatals team has built a distributed, scalable platform to ingest the error logs from all teams that contribute code to websites around the globe. This system runs on AWS-native technologies such as Lambda, DynamoDB, Elasticsearch, S3, SNS, and SQS, using primarily Java with an AngularJS website. We provide a robust UI that tracks the top fatal-causing issues, along with the ability to deep dive into a specific signature for root cause analysis. This information is used to drive the responsible page and feature owners to fix these negative customer experiences. The Fatals team has also built two fast detection and mitigation products that automatically roll back deployments when there is a fatal regression.

We are looking for an engineer who will guide our technical design and create clever solutions that help us get better at identifying anomalies, identifying the responsible teams, and scaling with ever-increasing traffic and uniqueness of requests. Our goal is to reduce total fatal counts year over year by building better prevention and detection mechanisms that integrate with our deployment services.

This person will contribute to the technical leadership of Technical Risk Reduction, which includes a diverse group of over two dozen engineers (and growing!). The systems this person designs and guides will directly improve the customer experience across all websites around the globe.

Basic Qualifications

  • Bachelor's or Master's degree in Computer Science or a related field; alternatively, with no degree, a minimum of five years of professional software development experience.
  • 2-4+ years of work experience in software development.
  • Proficiency in at least one modern programming language such as C, C++, C# or Java.
  • Good understanding of SQL.
  • Proficiency with HTML, CSS, JavaScript and Angular.
  • Experience with all aspects of web development: databases, back end and front end.
  • Solid coding practices, including peer code reviews, unit testing, and a preference for agile development.

Preferred Qualifications

  • Strong desire to build high-performance, highly available and scalable distributed systems.
  • Highly innovative, flexible and self-directed.
  • Excellent written and verbal communication skills.
  • Ability to reason about SQL queries, with an elementary knowledge of database internals and related data structures (Oracle and MySQL databases).
  • Experience translating design mockups and prototypes into working application designs.
  • Experience with web-oriented UI/UX graphic design.
