Senior Site Reliability Engineer
Leanplum is the mobile marketing platform built for engagement. Leanplum helps brands orchestrate multi-channel campaigns — from messaging to the in-app experience — all from a single, integrated platform. Mobile disruptors like Lyft, Tinder, Grab, TED, and Zynga rely on Leanplum to accelerate growth and build long-term customer relationships. Delivering personal engagement in the moments that matter, Leanplum captures more than 15 billion data points each day and sends more than 5 billion push notifications per month. Founded in 2012, Leanplum is based in San Francisco with offices in New York, London, Singapore, and Sofia, Bulgaria. Leanplum has received more than $46MM in funding from Canaan Partners, Kleiner Perkins, and Shasta Ventures. Leanplum has been named to Business Insider's Most Valuable Enterprise Startups of 2016, The Muse's Most Innovative Startups, and SF Business Times’ Best Places to Work. Learn more at www.leanplum.com
About This Role
Our Site Reliability Engineers are a hybrid of software and systems engineers. We code our way out of operational problems and into chocolate chip cookies.
Our current mission is to design Leanplum’s next version of the core infrastructure. We are responsible for reliability, scalability, and automation, while keeping an eye on latency, performance, and capacity.
We are seeking extraordinary talent to help fuel our distributed applications, which serve over 1 billion mobile devices and track over 6 billion analytical events per day. That equates to over 17,000 requests per second and, in the end, over 1.5 TB of data generated per day.
What We Need Your Help With
- Monitor and alert on various components across our infrastructure.
- Automate the server provisioning process across API, Cassandra, and Spark, with over 400 nodes.
- Influence and create new designs and architectures for a growing number of distributed systems (multi-region cloud environment).
- Plan and execute configuration management and monitoring of our platform as it grows.
- Design the system and processes that engineers use to deploy their software into production.
- Design, write, and maintain software to improve the availability, scalability, latency, and efficiency of Leanplum’s services, incorporating cloud and open source tools when available and writing software of your own when nothing else fits the bill.
- Engage in service capacity planning and demand forecasting, anticipating performance bottlenecks and provisioning new hardware as necessary.
- Run software performance analysis and system tuning.
- Participate in rotating on-call duties.
You’re Good At
- Fluency in one or more of Java, Python, or Scala
- Familiarity with algorithms, data structures, and complexity analysis
- Experience working with Unix/Linux systems from kernel to shell and beyond, including system libraries, file systems, and client-server protocols
- Nice to have: experience with network protocols and theory (TCP/IP, UDP, ICMP, MAC addresses, IP packets, DNS, OSI layers, load balancing, etc.)
- Systematic problem-solving approach
You Might Also Be Good At
- Expertise in designing, analyzing, and troubleshooting large-scale distributed systems
- In-depth knowledge of operating systems (processes, threads, IPC, concurrency, locks, mutexes, semaphores, etc.)
- Strong sense of ownership and drive
- Experience with AWS, GCP, or Microsoft Azure
- Experience with performance tuning (Spark, Cassandra, Google App Engine apps)
- Competitive salaries
- Health, vision, and dental insurance
- Unlimited vacation
- Peer bonuses
- Delicious lunch catered daily
- Themed happy hours every Friday!
- Ping pong, darts, and foosball
- Puppies galore
Build more than a Career. Create Meaning.