APAC Incident Manager, Beijing
The APAC Incident Manager is responsible for leading the resolution of critical incidents affecting business-critical applications at Airbnb. The candidate will work with a global team of Incident Managers in a follow-the-sun model. Candidates must have experience in areas including, but not limited to, support operations, escalation management, and critical incident response. The ideal candidate will lead the incident response, including communications to key stakeholders and executives, restoration activities for application and system outages, and follow-up data analysis and documentation.
This role involves a significant amount of autonomy, so the candidate should be comfortable working with little direction, thrive in a fast-paced and ever-changing environment, and communicate upwards with aplomb.
The candidate will work with other IT teams, business units, and vendors to triage incidents, develop restoration plans, and execute those plans to recover applications and supporting systems.
- Ensure that technical issues are responded to and that normal service operations are restored as quickly as possible
- Coordinate all resources and vendors in triage, root cause analysis, and restoration for critical application and system incidents
- Provide on-call support for critical customer facing applications
- Coordinate with other Incident Leads and vendors and conduct formal incident handoff to ensure global coverage and continuity of restoration efforts
- Partner with and maintain relationships with all resources associated with incident response
- Own business and technical communications on incident status as well as monthly reports on overall operations health and system performance
- Provide updates to senior leadership on incidents, status, and team response
- Oversee and lead stabilization work and fine-tuning of applications
- Identify persistent or recurring problems and recommend creative solutions
- Review and revise incident management processes, policies and escalation procedures on a regular basis to drive efficiencies and effectiveness in responding to issues
- Create and update system documentation, troubleshooting guides, and other key documentation as needed
- Train, guide, and advise associates on the Major Incident Management Process
- Develop and produce metrics reports on application performance and SLAs
- Perform other administrative activities in support of application operations
- Own critical team projects and drive them through to completion in a timely manner
You may be a good fit for our team if:
- You have experience and understanding of the challenges in a global, distributed, always-on application that is continually evolving
- You have strong mentoring, recruiting and leadership ability
- You are a creative problem-solver: while experience counts a lot, it’s the ability to solve new and interesting problems in a dynamic environment that will ultimately be most important.
- You are a leader who can shape the structure and processes that will make the team successful and happy
- You have owned end-to-end availability and performance, including preventing problem recurrence
- You want to implement a world-class continuous improvement process that includes root-cause analysis, solution identification and implementation, and ongoing emphasis on auto-remediation
- You have excellent communication and interpersonal skills and can talk with non-technical business clients to clarify functional requirements