Principal Reliability Engineer
We spend a great deal of our time online. Whether it’s for information, commerce, or entertainment, each of us has come to depend on what we research, discover, and share. Publishers – those who create and curate content – are what make the Internet great. Yet these publishers practice their craft largely alone, in silos – without reference points or a clear understanding of where they sit in the grand scheme of things. To add to the challenge, once a publisher’s content is in the wild, the task of building engagement, growing a loyal following, and enriching the engagement with that following can feel like a shot in the dark or, worse, a black box. Moreover, making money from their craft can be a complex task for any independent publisher who prioritizes generating content first and money second.
Sovrn believes that independent publishers are the Internet's vibrancy. As a partner and advocate to tens of thousands of independent publishers, Sovrn provides tools, technologies and services that help publishers (a) make money; (b) get distribution to grow their audience; and (c) access a massive data commons providing extraordinary insights.
The landscape of content networks, adtech vendors, and the myriad of buy-side / sell-side companies can be a complete maze for any reasonable person to decipher. Sovrn cuts through the noise and simplifies things with a basic, straightforward mission:
Help content creators do more of what they want to do – and less of what they don’t.
As Principal Reliability Engineer, you will lead hands-on engineering for full stack provisioning “DevOps” automation. Reliability engineering includes deployment pipelines, software configuration management, infrastructure configuration management, incident resolution tools, and performance engineering. The goal of reliability engineering is to streamline full stack delivery using automation in order to increase the velocity of new features while meeting production service-level commitments. This role leads the technical implementation, workflow transformation, standards, and policies. This role is a heavy influencer of the culture, creating collaboration and partnership across software, data, reliability, and quality engineering to enable the delivery and service management flows. The principal engineer is the expert and should have the ability to lead both informal and formal teams through DevOps transformation, implementation, and sustaining activities.
- Architect and lead implementation for full stack provisioning automation, i.e., continuous integration/deployment, software configuration, infrastructure configuration, and monitoring
- Lead technical transformation process from traditional operations to automated reliability operations leveraging DevOps workflows and automation tools
- Design and implement automation and tools to speed incident resolution, reduce production issues, and manage production change with minimal business disruption
- Design and lead monitoring framework technical implementation
- Ability to lead problem definition, solution design, and define implementation work plans
- Understanding of DevOps trends. Participation in community activities a plus.
- Develop critical software, integration, and tools needed to achieve reliability goals
- Ability to define and implement technology to track reliability metrics
- Able to work in agile delivery
- Strong communication and collaboration
- Partner with feature team engineering, data science, and external data partners to enable business deliverables
- Partner with QA automation engineering on integration
- Troubleshoot systemic issues and lead improvements
- Exposure to cluster workload and container management tools such as Kubernetes, Mesos, and Docker
- Experience with pipeline and configuration automation tools
- Bachelor's or Master's degree in Computer Science or a relevant field
- 8+ years' experience, including team leadership
- Excellent understanding of DevOps and Reliability engineering
- An analytical mindset with problem-solving skills
- Excellent communication and collaboration skills
- Ability to understand business domain and translate to reliability services
Position Reports to: Sr Director, Reliability Engineering