ecobee

Site Reliability Engineer

We’re ecobee. Born and raised in Canada, we're making living more comfortable with simply designed smart home devices that are smart not just for our users but for our planet too. We invented the world’s first Wi-Fi-enabled smart thermostat and continue to innovate on the next generation of technology that is connecting the home. Our hive, headquartered in Toronto, is growing quickly, and we are looking for a Site Reliability Engineer for our fantastic Production Engineering team.

Who you are:

Have you worked on automating cloud services such as AWS or GCP? Perhaps you’ve set up a Kubernetes cluster, or used tools such as CloudFormation or Terraform. Maybe you've worked with on-premises hardware and are itching to learn about cloud services and how you might migrate to the cloud. Or perhaps you have read at least one W. Richard Stevens book.

What you'll do:

You’ll join our Production Engineering team, which is responsible for delivering a scalable, reliable, and secure platform upon which ecobee’s online services run. Our engineers come from diverse backgrounds and collaborate daily, because ideas come from everywhere. We nurture curiosity day in and day out and encourage you to take courses, attend conferences, and continuously challenge the status quo.

Our challenges run from automating our systems and building continuous delivery environments, to supporting millions of simultaneous TCP sockets and ingesting nearly a terabyte of telemetry data a day. We aim to do this with high reliability and security in mind. We tend to use Python and Go for solving problems and developing internal tooling, and we’re open to other languages too.

You will help us improve the automation of our systems. This could involve everything from developing automation systems, to addressing scalability challenges, to crafting a new architecture for storing customer data. You will work closely with the Backend Development team on architectural matters and systems design. 

Your responsibilities will include:

  • Automate the deployment of applications and infrastructure.
  • Develop orchestration systems and solve configuration management problems.
  • Improve the monitoring of services. Strive to improve our mean-time-to-response on incidents, while reducing alerts.
  • Investigate the root cause of complex problems and solve them so that our users have a better experience.
  • Participate in an on-call rotation. We strive to reduce alerts out of hours and to minimize alerts during business hours too. In this role, we want you to focus on building new software and systems, not responding to fires or tickets.
  • Research new technologies and influence the technical direction of the company.
  • Solve problems around tooling and the delivery of software to production.

Our Requirements:

  1. Experience with the automation of cloud services and/or configuration management and orchestration (such as Puppet, Chef, Ansible, Salt, or Terraform)
  2. Experience with Linux-based systems (file systems, processes, systems administration)
  3. Strong experience in at least one of the following areas:
       • Programming in languages such as Python, Ruby, Go, Java, C, or similar
       • TCP/IP protocols and network fundamentals
       • Network security (firewalls, routing, proxies, etc.)
       • Linux / Unix system internals (troubleshooting/debugging processes, system calls, memory layout, etc.)

Are you the one we need? If so, we would love to hear from you.

ecobee is committed to workplace diversity and will provide accommodation to applicants with disabilities throughout the hiring process.

Job ID: 681665
Employment Type: Other

This job is no longer available.