Infrastructure DevOps Engineer

at Apple

Austin, TX

At Apple, scale and innovation converge to empower world-class engineering. The HMTS Platform team is seeking a seasoned Infrastructure DevOps Engineer with a focus on HPC, Datacenter, and Global Systems. This role involves architecting, deploying, and managing high-performance compute and enterprise infrastructure across Apple's global platforms. The Engineer will bridge high-throughput technical compute with global system integration, supporting thousands of nodes, global data flows, and mission-critical services. As a key member of Apple's Hardware Methodologies, Tool, & Solutions (HMTS) Platform team, you will serve as a vital connector between conventional IT infrastructure development and operations. Your contributions will be crucial in delivering an exceptional design environment for hardware engineering, supporting Apple's commitment to leading innovation in hardware.


Description

You will be responsible for the end-to-end infrastructure lifecycle supporting large-scale datacenter and HPC environments across multiple continents. You will:

  • Design and operate scalable Slurm-based HPC clusters, distributed globally across 2,000+ nodes.
  • Lead infrastructure automation for provisioning, monitoring, and configuration management of compute and enterprise services.
  • Manage and tune high-availability services and support site-aware routing, load balancing, and DNS-based traffic distribution.
  • Serve as the expert in Active Directory integrations, trust relationships, replication latency troubleshooting, and directory service hardening.
  • Architect virtualization-based solutions where appropriate to support auxiliary services, container workloads, and hybrid edge compute nodes.
  • Oversee secure data replication strategies between sites, integrating load-balancer VIPs and geo-distributed failover configurations.
  • Provide root-cause analysis for performance bottlenecks, host instability, or data inconsistencies across global platforms.
  • Work closely with platform owners, security teams, and datacenter engineers to evolve infrastructure towards a zero-touch, self-healing architecture.

Minimum Qualifications

  • A Bachelor's degree in Computer Science with several years of relevant experience.
  • Proven experience in a DevOps role in an enterprise environment with private and public cloud exposure.
  • Proven experience in a Systems Admin or Systems/IT support role in an enterprise environment.

Preferred Qualifications

  • 7+ years of experience operating large-scale, production-grade datacenter or HPC environments (2,000+ nodes).
  • Expert-level Windows Server administration, including Active Directory, GPO, DNS, DHCP, and DFS for distributed enterprise environments.
  • Deep experience with RHEL/CentOS and infrastructure tuning for high-performance, low-latency workloads.
  • Advanced knowledge of global networking concepts: routing, DNS failover, site-aware load balancers, VIP configuration, and traffic shaping.
  • Strong hands-on experience with enterprise virtualization platforms (VMware vSphere/ESXi, Hyper-V) for production and edge workloads.
  • Proficient in infrastructure automation and scripting with PowerShell, Python, and Ansible.
  • Experience with InfiniBand fabrics and high-bandwidth data interconnects in compute environments.
  • Deep understanding of infrastructure observability using Prometheus, Grafana, Nagios, Splunk, or equivalent tools.
  • Proven success managing global replication services and multi-region compute/data platforms.
  • Excellent cross-functional communication and documentation skills, with the ability to influence and mentor across global teams.

Additional Requirements

  • Strong understanding of change management, CMDBs, and policy-based infrastructure enforcement.
  • Experience managing parallel storage systems (NetApp, BeeGFS, or Lustre) and integrating them with compute and replication workflows is beneficial.
  • Availability for emergency escalations and out-of-hours troubleshooting for priority systems.
  • Travel may be required to support infrastructure at remote datacenter or partner sites.
  • Experience supporting CAD/FEA engineering workloads, CI pipelines, and EDA is highly desirable.

Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant.

Client-provided location(s): Austin, TX, USA
Job ID: apple-200599333
Employment Type: Other
