Site Reliability Engineer
We're looking for an engineer to build and run the large-scale production systems and application environment that keep Pinterest reliable. You'll design, build, and monitor the applications and systems infrastructure that currently handle billions of monthly page views.
What you'll do:
- Design, implement and deliver software and infrastructure to improve the scalability, availability, and efficiency of Pinterest's services.
- Influence and create new designs, architectures, standards and methods for large-scale distributed systems with a focus on operability.
- Collaborate with developers in the deployment and scaling of new product features.
- Perform deep dives into reliability issues, partnering with software and systems engineers across the organization to produce and roll out fixes.
- Evaluate and implement emerging containerization and resource management technologies such as Docker, Mesos, and Kubernetes.
What we're looking for:
- Experience in a modern web programming environment.
- Proficiency in at least one of Python, Java, Go, or C.
- Familiarity with distributed systems, including service discovery, pub/sub, and search indexing. We use ZooKeeper, Kafka, and Elasticsearch, respectively.
- Deep experience with at least one modern configuration management tool, such as Puppet, Chef, Ansible, or Salt.
- Strong knowledge of Linux/Unix/BSD internals.
Meet Some of Pinterest's Employees
As a company that relies on a huge collection of images and an elegant user interface, Pinterest needs engineers like Tracy to write the code that supports it.