Sr. Site Reliability Engineer - Cloud Platform Infrastructure
Palo Alto Networks® is the fastest-growing security company in history. We foster a culture of innovation, authenticity, and collaboration. This focus helps to advance our mission of protecting our way of life in the digital age. Our people make this possible. It is our everyday interactions, how we work together and treat each other, that set Palo Alto Networks apart from other organizations. If you are a motivated, intelligent, creative, and hardworking individual, then this job is for you!
We’re looking for a Cloud Platform Site Reliability Engineer to take ownership of the development efforts for our Cloud Application and Microservices Platform. This includes scalability, reliability, automation, uptime, and availability, as well as building and maintaining mission-critical infrastructure and tools as a platform. You will own development efforts in each sprint from planning to delivery and will partner with other engineering teams to provide technical vision for making their services more observable, scalable, and reliable. You will have the opportunity to gain technical breadth while sharing your cloud platform expertise with other team members.
You will not only identify problems but also develop and implement automation solutions in AWS that operate at scale. The best person for this role is someone who has a collaborative spirit and can pair seamlessly with other engineering teams to build and manage a reliable, secure, and scalable platform for microservices.
- Design, build, and maintain infrastructure in AWS to enable reliable and rapid deployment of microservices with effective monitoring and resilient operations
- Set up critical infrastructure and develop tools and frameworks to automate operational tasks and the deployment of machines, services, and apps
- Work closely with engineering teams to ensure microservices are designed for scale, operability, and performance
- Create meaningful dashboards, logging, alerting, and responses to ensure that issues are captured and addressed proactively
- Define Service Level Objectives for product(s) to constantly measure their reliability in production. Maximize service uptime and availability while ensuring that functional and performance SLAs are met
- Develop custom code or scripts to automate infrastructure and monitoring services
- Cross-functional work with engineering teams: contribute to architecture diagrams and other documentation for security reviews
- Initiate and lead scripting and automation efforts to streamline system updates and upgrades
- Deep understanding of at least one modern programming language: Java, C, C++, Python, Ruby, or C#
- Fluency in Linux, AWS services, and systems management tools (Ansible, Puppet, Chef, etc.)
- Expertise in AWS cloud infrastructure and its related services
- Fundamental understanding of distributed systems including: the CAP Theorem, Microservices, and the Twelve Factor Application
- Demonstrated ability to write programs using a high-level programming language such as C, Java, Python, or Ruby
- Hands-on operational experience in creating and managing microservices
- Experience with CI/CD automation and GitHub is a plus
- Excellent communication skills and the ability to work well in a team
- Strong skills in automating routine tasks using Python or Bash scripting
- Systematic problem-solving approach, strong customer focus, ownership, urgency, and drive to complete a task
- Demonstrated capability to provide technical leadership with both depth and breadth to agile teams
BS or MS degree in Computer Science or Engineering, with 7-10 years of coding experience in a DevOps or SRE role
We are the global cybersecurity leader, known for always challenging the security status quo. Our mission is to protect our way of life in the digital age by preventing successful cyberattacks. This has given us the privilege of safely enabling tens of thousands of organizations and their customers. Our pioneering Security Operating Platform emboldens their digital transformation with continuous innovation that seizes the latest breakthroughs in security, automation, and analytics. By delivering a true platform and empowering a growing ecosystem of change-makers like us, we provide highly effective and innovative cybersecurity across clouds, networks, and mobile devices.
Our Security Operating Platform is built for automation. It is easy to operate, with capabilities that work together, so customers can prevent successful cyberattacks. They can use analytics to automate routine tasks, so they can focus on what matters. We are known for continuously delivering innovations, and with Application Framework we extend that to an open ecosystem of developers who benefit from our customers’ existing investment in data, sensors, and enforcement points.