In 2006, Amazon Web Services (AWS) began offering IT infrastructure services to businesses as web services, now commonly known as cloud computing. Today, AWS provides a highly reliable, scalable, low-cost infrastructure platform in the cloud that powers hundreds of thousands of businesses in 190 countries around the world. As our customers grow their businesses, AWS continues to provide infrastructure that meets their global requirements, aiming to be the most customer-centric company on earth.
The Infrastructure Operations (Data Center) Team is the backbone of AWS, supporting the rapidly growing AWS business and its customers 24/7. We are committed to maintaining the physical infrastructure of AWS and upholding the standards for operational performance in the areas of safety, security, availability, productivity, capacity, efficiency, and cost.
As a member of the Infrastructure Operations (Data Center) Team, you will have the chance to work on the most advanced technologies in a dynamic environment with expanding opportunities. If you enjoy working in a strong, close-knit, and diverse team, the Infrastructure Operations (Data Center) Team is the place to be!
The Amazon Data Center Engineering Operations (DECO) team is seeking a strong subject matter expert (SME) who can deploy, operate, and maintain the facilities (electrical/mechanical systems, control/fire-fighting systems, etc.) of our large-scale, high-density data centers. We support our internal and external customers 24x7 all year, so work is by shift, on-call or a combination of the two.
• As the site SME and POC, plan, review, evaluate, operate, maintain, improve, and manage mission-critical facilities. This includes vendor management and day-to-day hands-on work and supervision related to rack capacity increases and decreases, ongoing or future onsite construction work, planned maintenance, and urgent or emergency changes, in line with the AWS Infrastructure Priorities.
• Participate in and take responsibility for future Capacity, Availability, and other projects at assigned sites; review, evaluate, and give feedback on designs from an Operations viewpoint to mitigate Safety, Security, and Availability risks in advance.
• Prepare and implement countermeasures for natural disasters and emergency responses to high-priority/critical incidents, including creating EOPs, training staff, and preparing appropriate tools. Respond to high-severity and large-scale events as the owner of operations. Understand the SOO and EOPs; troubleshoot, mitigate, and resolve issues; update senior leaders through regular and timely written reports; and close each issue with a complete root cause analysis.
• Review, evaluate, and proactively identify SPOF risks or vulnerabilities in data center (electrical, mechanical, control) designs, testing and commissioning programs, and construction and operations processes; then consider, plan, coordinate, and propose remediation and/or mitigation plans, negotiate with and persuade stakeholders to obtain their approval, and deliver results.
• Build sustainable and scalable mechanisms to collect, review, and report the team's regular metrics and KPIs; plan, propose, and drive kaizen based on the results.
• Understand and develop the team structure; create and document headcount requirements; help drive interviews and hire bar-raising candidates; build a strong team through delegation, development, training, directing, coaching, empowering, motivating, and promoting; and manage 6 to 10 engineers, including regular performance reviews and discussions.
• Plan and manage the budget and procurement.
• 10+ years of experience designing, building, commissioning, operating, and maintaining data centers or mission-critical facilities such as power substations, airports, hospitals, etc.
• Experience managing the full data center life cycle, from design, construction, and commissioning through operation and decommissioning
• Strong understanding of electrical systems, including the ability to read and draw an SLD (Single Line Diagram) without reference materials, with all details such as breaker open/close states and SOO. Relevant systems include power substation supply systems, transformers, switchgear, VFI-class UPS, DRUPS, PDU, ATS, STS, SLA or VRLA batteries and related systems, fuel systems for diesel/gas turbine generators, surge control circuits, active harmonic filters, battery monitoring systems, branch circuit monitoring systems, and SCADA systems
• Strong understanding of mechanical systems (CRAC/CRAH, AHU, chillers, cooling towers, storage tanks, heat exchangers, plumbing systems, pumps, valves, duct systems, fans, dampers, fire detection and extinguishing systems, drainage systems, building monitoring systems, and automatic control systems)
• Experience with and deep understanding of change management, incident management, problem management (including troubleshooting incidents as incident commander and post-failure/root cause analysis), vendor management, risk management, asset management (critical equipment and spares), energy management (PUE improvement, government reporting), BCP (business continuity planning), annual and mid- to long-term maintenance planning and management, budgeting, reporting, communicating with senior leaders, the PDCA cycle for process improvement, etc.
• Experience managing and operating multiple data center sites
• Able to represent the country in this role on global initiatives, work with cross-functional teams, and deliver expected results in a timely manner
• Able to proactively identify design SPOF or Availability risks and propose solutions or risk mitigation plans
• Able to create a mission statement and vision for a team, set goals and strategies, and build, develop, and manage a strong team
• Able to write, review, question, find mistakes in, and improve documents such as standard operating procedures, methods of procedure, emergency operating procedures, root cause analyses, promotion documents, technical proposals, vendor contracts, Basis of Design, design documents, white papers, etc.
• Able to travel outside the country and build trusting relationships with cross-functional leaders across all regions
• Able to work independently, self-start, think logically and outside the box, communicate without confrontation, dive deep into issues, and provide solutions
• Able to train others
• English and Japanese proficiency (verbal and written) sufficient to perform daily tasks without difficulty: TOEIC 800+, JLPT N2 or above, or equivalent
• Able to safely lift and/or move equipment weighing about 18 kg using lifting machines
• Able to work on-call
• Able to cover shifts when needed
• Holds Qualified Energy Manager certification and has experience reporting to the government on energy management
• Holds Type II Chief Electrical Engineer certification and has experience managing extra-high voltage electrical facilities
• Holds a PMP, PRINCE2, ITIL v2 or v3, BICSI, ASHRAE, CDCP, CDCS, CDCE, or equivalent certification
Amazon is committed to a diverse and inclusive workplace. Amazon is an equal opportunity employer and does not discriminate on the basis of race, national origin, gender, gender identity, sexual orientation, protected veteran status, disability, age, or other legally protected status. For individuals with disabilities who would like to request an accommodation, please visit https://www.amazon.jobs/en/disability/jp/