Senior Lead Site Reliability Engineer


We are looking to hire excellent Senior Site Reliability Engineers who enjoy managing large-scale software systems. You love learning new technologies and figuring out the most efficient ways to manage thousands of hosts. You want to maintain and improve software that scales to 8+ million transactions per second on a huge, global scale. You believe that servers should be managed like cattle, not pets.


  • Monitor the capacity of worldwide serving and data systems, and continually improve capacity measurement
  • Coordinate and automate regular software deployments
  • Monitor production alerts, investigate them, and resolve the underlying issues for both the short and long term
  • Work with developers and product owners to advocate for operational improvements in our software stack
  • Troubleshoot, investigate, and fix production issues in cloud and hosted environments, including both hardware and internal software issues
  • Investigate and fix performance issues in a variety of applications and languages
  • Design and build features to improve system and personnel scalability
  • Occasionally participate in an on-call rotation


  • 8+ years in an operational role (DevOps, system administration, SRE, etc.)
  • Multiple years of experience with configuration management software such as Chef, Puppet, Ansible, or SaltStack, or with related tools such as Consul
  • Deep experience with either AWS or physical datacenter infrastructure (such as hardware load balancers, imaging systems, out-of-band management, DNS)
  • Ability to code in at least one language
  • Knowledge of TCP/IP fundamentals
  • Experience managing a hybrid Windows and Linux environment, or an eagerness to learn one of these platforms
  • Experience with agile methodologies and a rapid development cycle
  • Experience deploying and managing monitoring tools at large scale, such as Graphite, Grafana, Nagios/Icinga, or SumoLogic
  • Database management experience is a big plus, especially Vertica or Microsoft SQL Server
  • Big plus if you are familiar with and passionate about software such as Mesos, Kubernetes, and/or Docker
  • Familiarity with ITIL is a big plus, especially Incident, Problem, and Change Management
  • A desire to seek out needless complexity and remove it

The Trade Desk does not accept unsolicited resumes from search firm recruiters. Fees will not be paid in the event a candidate submitted by a recruiter without an agreement in place is hired; such resumes will be deemed the sole property of The Trade Desk. The Trade Desk is an equal opportunity employer. All aspects of employment will be based on merit, competence, performance, and business needs. We do not discriminate on the basis of race, color, religion, marital status, age, national origin, ancestry, physical or mental disability, medical condition, pregnancy, genetic information, gender, sexual orientation, gender identity or expression, veteran status, or any other status protected under federal, state, or local law.
