Do you enjoy broad yet equally deep scope that impacts all AWS systems globally? Are you up for leading the software teams that improve the health of the cloud? AWS Vetting is looking for a Support Engineer with a passion for operational support of highly reliable, low-latency services used by AWS service providers (EC2, S3, DynamoDB, etc.), Data Center Operations, Hardware Engineering, and others.
We are embarking on major software programs to streamline and automate vetting of servers across the AWS fleet, automate data center provisioning operations, and collect data on tens of millions of components. It's a great time to join and help shape and deliver on this strategy.
What we do : Our systems help ensure the health and availability of AWS server hardware by testing every server across AWS data centers.
What you will do : You maintain operating systems, servers, and networks; generate metrics; make monitor configuration changes; and perform change management activities. You understand the business logic and architecture of your supported services, which enables you to regularly resolve both manually and automatically cut tickets. You are able to read and understand complex application code and make approved code fixes to resolve support issues. You use your coding skills to automate repetitive tasks and to fix bugs in the operational systems.
Why it's high-impact : Our systems ensure that AWS customers worldwide can get the capacity they need within an SLA and can run their applications on healthy hardware.
What's the challenge : Our space is fast-moving and has a large number of ambiguous and difficult challenges. You will be a support leader throughout the larger organization and will be regularly engaged to work on cross-team planning. You lead large multi-team projects and resolve the most complex support issues. You understand the business impact of support decisions. You drive the team to improve operational efficiency for all services through the identification and development of SLAs, metrics, monitors, procedures, tools, and documentation. You think proactively and work to prevent support issues before they occur.
• B.S. in Computer Science, Engineering or a related technical field or equivalent experience
• 4+ years of experience in troubleshooting, root causing, and patching 24x7 production web services, distributed systems, databases, and large-scale software systems
• 3+ years of hands-on experience with SQL, Linux administration, application servers, continuous integration/deployment tools, and service monitoring platforms
• 2+ years of programming experience with a scripting language such as Ruby or Python, or a language such as Java, C, C++, or C#
• Understanding of networking fundamentals and ability to troubleshoot issues with related protocols and tools (e.g., ICMP, traceroute, netstat, pcap)
• Knowledge of cloud computing concepts and ability to perform deep technical troubleshooting
• Strong documentation skills
Amazon is committed to a diverse and inclusive workplace. Amazon is an equal opportunity employer and does not discriminate on the basis of race, national origin, gender, gender identity, sexual orientation, protected veteran status, disability, age, or other legally protected status. For individuals with disabilities who would like to request an accommodation, please visit https://www.amazon.jobs/en/disability/us/ .