Principal Site Reliability Engineer

AT&T is leading the way to the future for customers, businesses and the industry. We're developing new technologies that make it easier for people to stay connected to their world. With a network that reaches 225 countries and serves more than 120 million customers, we'd say we're well on our way. Together, we've built a premier integrated communications company and an amazing place to work and grow.

The AT&T Entertainment Group's (AEG) Technology Team is at the forefront of innovation, shaping the way our customers share and enjoy video content. As part of our industry-leading team, you'll leverage our extensive networks to revolutionize the way people access their content at home or on the go. From designing, implementing and deploying the software to supporting the infrastructure that powers our video services, you'll work on our DIRECTV OTT and satellite TV platforms. We'll look to you to improve the efficiency and scalability of our systems, ensuring we continue to provide premier technology to our customers.

DIRECTV is looking for a Principal Site Reliability Engineer to join our Operations team. The Video Operations group is responsible for supporting the on-air systems that power the DIRECTV platform, including mobile streaming, VOD and satellite TV, all of which must be fast, fault-tolerant and scalable. The Operations team is responsible for the resiliency, swift incident response, performance and security of DIRECTV's production infrastructure.

  • We support the Software Engineering teams and drive best practices for DIRECTV/AT&T's products nationwide.
  • We partner with the development teams to optimize and operationalize their applications.
  • We ensure systems are properly monitored, deployed and supported to provide the best possible experience for our customers.
  • We realize that failure is inevitable, so we embrace it and plan for fast recovery.

As a Principal Site Reliability Engineer, you're curious and have deep technical knowledge. You're a problem solver and an engineer who uses ingenuity and technical leadership to solve hard problems. You foster a culture of inquisitiveness, collaboration and learning, and you empathize with others. You learn from experience and continually adapt.

We're looking for an influential decision-maker who's ready to take on a high level of ownership and responsibility: someone who anticipates problems and solves them across all of Operations.

  • Define and verify standards for configuration, monitoring, reliability and performance
  • Serve as subject matter expert for multiple proprietary and open source technologies
  • Select and develop automation tools and scripts to improve the availability, manageability, scalability and operability of services
  • Advise on software architecture designs, drawing on expert knowledge of the capabilities and limits of a combined cloud and multi-datacenter production infrastructure
  • Solve performance and stability issues and prevent their recurrence
  • Define and evangelize cloud-related optimizations and best practices to improve reliability and performance

Additional Information:

  • Advanced Unix/Linux system administration knowledge.
  • Demonstrated experience writing code (Java, C++, Groovy, etc.).
  • Solid understanding of cloud computing services (AWS, Azure), with experience writing automation for cloud platforms.
  • Minimum two years of experience with scripting languages (Python, Shell, Perl, etc.).
  • Skilled in using automation to work efficiently.
  • Minimum two years of experience working with microservices and container technologies (Docker, Kubernetes, etc.).
  • Ability to identify the root causes of instability in a high-traffic, large-scale distributed system.
  • Ability to learn rapidly and communicate the value of new technologies to technical and non-technical audiences.
  • Meticulous and careful. You identify and consider all risks, and balance them against performing the task efficiently.
  • Strong communication skills and the ability to thrive in a highly collaborative environment.

Meet Some of AT&T's Employees

Aaron O.

Architect, Entertainment Group

As a cloud architect, Aaron builds and designs different cloud environments that enable video processing. His work helps customers get whichever channels they want, on whatever device they require.

Jennifer R.

iOS Developer

Jennifer develops mobile applications for AT&T customers. She creates new, easy-to-use features for iPhones and iPads that people haven’t experienced before.
