Sr. Director, Reliability Engineering
We spend a great deal of our time online. Whether for information, commerce, or entertainment, each of us has come to depend on what we research, discover, and share. Publishers – those who create and curate content – are what make the Internet great. Yet these publishers practice their craft largely alone, in silos, without reference points or insight into where they sit in the grand scheme of things. To add to the challenge, once a publisher's content is in the wild, building engagement, growing a loyal following, and deepening the relationship with that following can feel like shots in the dark or, worse, a black box. Moreover, making money from their craft can be complex for any independent publisher who prioritizes creating content first and money second.
Sovrn believes that independent publishers are the source of the Internet's vibrancy. As a partner and advocate for tens of thousands of independent publishers, Sovrn provides tools, technologies, and services that help publishers (a) make money; (b) get distribution to grow their audience; and (c) access a massive data commons providing extraordinary insights.
The landscape of content networks, adtech vendors, and the myriad buy-side and sell-side companies can be a maze for anyone to decipher. Sovrn cuts through the noise and simplifies things with a basic, straightforward mission:
Help content creators do more of what they want to do – and less of what they don’t.
As Sr. Director, Reliability Engineering, you will lead a team responsible for release and infrastructure engineering. The primary objectives are to deliver a highly available, resilient, scalable, and secure platform; implement continuous integration pipelines, including configuration automation, quality automation integration, and software deployment; lead release practices that encourage speed while safeguarding production; drive incident and outage resolution; instrument full-stack monitoring; implement metrics that quantify and measure Service Levels; and drive reliability engineering solutions focused on system failure modes and on reducing repetitive resolutions.

This role requires the ability to partner with software, data, and quality engineering; set clear, measurable goals; function as a strategic and technical thought leader; meet current demands while engineering improved capabilities; drive data-driven decision making; and lead incident and outage resolution across full-stack engineering teams. The Sr. Director should also be able to develop team members through coaching and direct training, as needed, so the team can achieve its goals.
- Work collaboratively with engineering leads to strategically ensure the reliability and maintainability of new and modified software, data, and infrastructure deployments
- Lead development of engineering solutions for release automation, tools, and infrastructure, with a focus on availability, performance, recovery, scalability, monitoring, and incident and outage resolution, to meet quantitative service levels, accelerate feature team delivery, and minimize service-impacting events
- Lead incident and outage response and resolution practices, leveraging automation and tools to enable quick response and effective resolution
- Consistently communicate reliability issues to internal teams, customers, and partners through a status dashboard, outbound notifications, and issue escalation. Lead triage with all parties as needed to meet Service Levels
- Set availability, performance, scalability, release automation, monitoring, and service management standards
- Integrate the team and its services into the agile release train delivery lifecycle
- Architect and integrate monitoring and metrics tools across the full stack
- Serve as primary owner of hosting provider relationships and performance
- Ability to drive improvements that optimize platform price-performance
- Experience with continuous integration, configuration automation, infrastructure architecture, monitoring instrumentation & metrics, performance tuning
- Ability to lead and motivate an engineering team
- Drive to find system problems and deliver engineering solutions that resolve them
- Strong organizational skills, attention to detail and ability to manage to deadlines
- Able to work effectively in partnership with software and data engineering teams
- 7+ years of experience in digital platform engineering and/or technical operations
- Strong knowledge of software engineering, co-location and cloud hosting, and infrastructure management
Position Reports to: VP, Platform Engineering