Site Reliability Engineer (New York or Toronto)
Work Market is the leading platform for on-demand workforce management. Our marketplace gives the biggest brands in the world, as well as freelance businesses, a solution that lets both sides manage an end-to-end contract engagement at scale. We are right in the middle of a rapidly growing freelance economy that is projected to make up 50% of the workforce by 2020.
Our clients can expand and contract their workforce based on market demands, which lets them stay relevant and efficient in a competitive marketplace. We provide access to on-demand subject matter experts who help companies stay productive and efficient. We offer freelancers a flexible work schedule and the ability to be selective about their project engagements while building their portfolio.
The team here is excited by the idea of redefining the labor model for the 21st century and is passionate about creating the best possible solutions for our customers. It is a large, complicated problem, and we can’t solve it without the best talent on each of our teams moving forward and thinking outside the box. Technical skills are just as important as creativity, communication, and good teamwork. We are backed by Union Square Ventures, Spark Capital, SoftBank Capital, Industry Ventures, and Silicon Valley Bank, and recently received $20 million in Series C funding that is being reinvested in hiring.
As a Site Reliability Engineer, you will work closely with engineers to make sure the apps they build can scale, stay available, stay secure, and handle failure. Giving teams insight into the health of these apps through metrics and monitoring is also critical, so that they can manage their apps themselves from local dev through production. You will help guide them through this process.
Responsibilities:
- Work with development teams to write highly available, fault-tolerant code using internal libraries as well as external ones like Netflix Hystrix.
- Create metrics that feed into our monitoring systems and dashboards to give insight to the development and operations teams.
- Define alerting thresholds, assist in troubleshooting application issues, and help lead postmortems.
- Build out infrastructure to support these apps, using Terraform to spin up AWS environments or creating Docker containers to deploy into our schedulers.
- Try to break stuff! Our apps must be battle-tested and able to withstand different failure scenarios: load/stress testing, Chaos Monkey-style experiments, anything to make sure our customers never deal with downtime or failure.
- Participate in the on-call rotation alongside developers and be available to work with our client/customer support teams if necessary.
Requirements:
- Designed, deployed, and managed large cloud infrastructures on platforms such as AWS or GCE.
- Excellent oral and written communication. Ability to convey ideas internally to co-workers as well as externally through meetups and talks.
- Solid understanding of Linux (Ubuntu or other distro) system administration, configuration and command-line tools. (Do you love/hate systemd and journalctl for example?)
- Ability to understand distributed software architectures and troubleshoot them from infrastructure through application layers.
- Experience with containers (Docker/rkt) and an understanding of how they work internally. Deployment into a production environment using a scheduler (Mesos / Kubernetes / Nomad / ECS) is a plus.
- Implemented a service discovery system for dynamic environments using tools like SmartStack, Consul, or etcd.
- Ability to write code/scripts using languages such as Python, Go, Ruby.
- Passion for technology and desire to push our tech stack forward.
- Be a team player and work closely with developers and operations.
- Experience with configuration management systems and concepts.
- Strong experience with version control software such as Git.
- Experience with monitoring, instrumentation, and performance engineering.
Nice to Have:
- Experience in a "continuous delivery/deployment" environment and with its supporting tools.
- Experience with Java and JVM in a production environment.
- Experience with messaging patterns/architecture (ActiveMQ, RabbitMQ, AMQP, Kafka).
- Ability to configure and customize monitoring tools (Prometheus, Nagios, Zenoss, New Relic, Graphite, etc.).
- Hands-on experience with server build automation.
- Deployed overlay/underlay networks to support multi-host Docker environments.
- Experience working at a startup!
Meet Some of Work Market's Employees
Director Of Product Management
As Director of Product Management, Steve acts as a bridge between the Platform Services Team and the Engineering Team, overseeing big-picture projects, like allocating additional resources for freelancers.