Systems Development Manager - Incident Management
- New York, NY
Amazon is seeking an experienced Systems Development Manager for Amazon's TechOps team. TechOps is a unique team that supports the AWS and Retail businesses at Amazon, and we need a strong operations leader to push the team to the next level. In TechOps, we make a difference by working across the breadth of Amazon to understand and reduce impacting incidents, and by building software to improve incident management capabilities. TechOps engineers run incidents and use data from those experiences to understand what patterns and practices enabled each incident to occur. The team then uses this information to identify trends and to champion the work required to make changes across the company. The mission of TechOps is to reduce the length of any impacting incident and to reduce the number of incidents through operational risk reduction and smart code.
As a Systems Development Manager, you will provide exceptional leadership and management for our growing team of engineers. A love for operational excellence, working with new technologies, and pushing the envelope on existing technology is required.
Define and Drive Business Priorities
You will be a key contributor and owner of the direction of the global TechOps team. This includes defining strategic goals for the team; participating in defining, planning, and documenting key projects and initiatives; tracking progress of initiative outcomes against goals; and ensuring that the team remains unblocked and focused during the regular sprint cadence.
Performance Management/Team Health
You will own all facets of performance and career management for the team. You will be expected to provide both technical and 'soft skill' mentoring in order to maintain a well-rounded, world-class organization. This includes project management, quality audits, and coordination of training sessions with senior-level engineers, as well as day-to-day oversight of the team.
Recruiting and Hiring
You will take the lead in hiring quality personnel who not only fit the needs of the current organization but will allow the team to scale with platform and service growth. You will coordinate with Amazon and external recruiting staff to evaluate potential candidates, participate in initial phone screens and provide relevant guidance and feedback during on-site interview loops. You will also be responsible for ensuring that proper training takes place for all new hires.
Cross-Site, Cross-Team Coordination
You will be responsible for coordinating with your counterparts to ensure that a clear communication channel exists between the global TechOps teams. You will also work closely with Software and Reliability product teams to create and maintain a proper tool chain and end-to-end process. Part of this process will include establishing both solid operational acceptance criteria and a concrete feedback loop for communicating critical priorities.
You will be the point of contact for enquiries regarding engagement processes and issues within the global Amazon platform during your team's coverage. Responsibilities include delegation of emergent engagement issues to team members, driving initiatives regarding improvements to existing tools & processes and providing feedback on new practices & procedures in order to scale with the rapid expansion of the Amazon platform and customer base.
As a member of the TechOps management team, you will be expected to participate in an escalation on-call rotation for all related issues, including high-impact systems and network events. You will also be expected to respond to critical issues regarding engagement and incident management on an as-needed basis, and to own root cause analysis and the corresponding process updates.
- Highly organized and very detail-oriented.
- Ability to interact with and influence people at all levels.
- Excellent written and verbal communication skills and ability to get ideas across to the team, peers and customers.
- Ability to contribute to and support long-term vision and direction regarding key systems support initiatives at Amazon.
- Experience building and managing a team of strong technical people; prior ownership of the operation of a mission-critical support team is crucial to success.
- Proven track record of delivering complex projects, including autonomously coordinating and driving issues to resolution using excellent project management skills.
- A B.S. in Computer Science or five years' equivalent experience in a large-scale enterprise environment.
- 5+ years of experience managing a team that develops custom software in a DevOps environment.
- Experience in complex systems and network administration.
- Strong understanding of fundamental operational best practices such as monitoring, alerting, deployment and change policies (ITIL a plus).
- Experience running agile frameworks or other workflow methodologies in a DevOps setting.
- Experience dealing with customers during issue resolution and operating under pressure.
- Routine communication of status to senior management
- SLA definition and refinement
- Goal-setting for reduction and elimination of customer facing defects
- Participation in post-mortem analysis, including ensuring a high quality bar for analysis and follow through of consequent action items
Amazon is committed to a diverse and inclusive workplace. Amazon is an equal opportunity employer, and does not discriminate on the basis of race, national origin, gender, gender identity, sexual orientation, disability, age, or other legally protected status.