Site Reliability Engineer
Leanplum is the most complete mobile marketing platform, designed for intelligent action. Our integrated solution delivers meaningful engagement across messaging and the in-app experience. Leanplum offers Messaging, Automation, App Editing, Personalization, A/B Testing, and Analytics.
Top brands such as Expedia, Tesco, and Lyft trust us to create impactful relationships with their users. We were founded in 2012 by former Google engineers with years of experience in optimization and have received over $17MM in funding from top-tier VCs like Kleiner Perkins and Shasta Ventures.
Inside the walls of Leanplum (just kidding, our space is open), you’ll meet employees from 16 countries and counting. We house a world champion air guitarist, three medalists from programming competitions, and six loyal office dogs who greet you at the door with tails wagging. Past perks have included company vacations to Mexico and Tahoe, Alfred Hitchcock movie nights, and costume parties. But most of all, we believe in gratitude, collaboration, and karma.
About This Role
Our Site Reliability Engineers are a hybrid of software and systems engineers. We code our way out of operational problems and into chocolate chip cookies.
Our current mission is to design Leanplum’s next version of the core infrastructure. We are responsible for reliability, scalability, and automation, while keeping an eye on latency, performance, and capacity.
We are seeking extraordinary talent to help fuel our distributed applications, which serve over 1 billion mobile devices and track over 1.5 billion analytical events per day. That equates to over 6,000 requests/second and generates over 1.5 TB of data per day.
Responsibilities
- Automate the server provisioning process across API, Cassandra, and Spark, with over 400 nodes.
- Influence and create new designs and architectures for a growing number of distributed systems (multi-region cloud environment).
- Plan and execute configuration management and monitoring of our platform as it grows.
- Design the system and processes that engineers use to deploy their software into production.
- Design, write, and maintain software to improve the availability, scalability, latency, and efficiency of Leanplum’s services, incorporating cloud and open source tools when available and writing software of your own when nothing else fits the bill.
- Engage in service capacity planning and demand forecasting, anticipating performance bottlenecks and provisioning new hardware as necessary.
- Run software performance analysis and system tuning.
- Participate in rotating on-call duties.
Requirements
- Fluent in one or more of: Java, Python, Scala.
- Familiarity with algorithms, data structures, and complexity analysis.
- Experience working with Unix/Linux systems from kernel to shell and beyond, with experience working with system libraries, file systems, and client-server protocols.
- Experience with network protocols and theory (TCP/IP, UDP, ICMP, MAC addresses, IP packets, DNS, OSI layers, load balancing, etc.).
- Systematic problem-solving approach.
- Expertise in designing, analyzing, and troubleshooting large-scale distributed systems.
- In-depth knowledge of operating systems (processes, threads, IPC, concurrency, locks, mutexes, semaphores, etc.).
- Strong sense of ownership and drive.
- Experience with AWS/GCP.
- Experience with performance tuning (Spark, Cassandra, Google App Engine apps).
Benefits
- Competitive salaries
- Health, vision, and dental insurance
- Unlimited vacation
- Peer bonuses
- Delicious and healthy lunches
- TGIF happy hours
- Ping pong, darts, and foosball
- Puppies galore
Build more than a Career. Create Meaning.