Site Reliability Engineer

    • Sunnyvale, CA

About Clover:

Every day, Clover devices handle the core card and point-of-sale processing for hundreds of thousands of merchants. Behind the scenes, we operate a cloud platform providing processing, storage and collaboration for merchants, application developers, service providers and our merchants' customers.  Our devices and platform form the backbone of millions of payment interactions between merchants and their customers daily.


The Role:

To support all of this, we have a team of engineers working around the clock to ensure our systems remain operational, safe and secure.  We are now scaling our operation further and are looking for a hands-on technologist with creative and innovative problem-solving skills.

Availability, reliability, and security are paramount.  In this role, you will help build and operate complex systems that allow our large fleet of smart payment terminals to process tens of millions of transactions a day.  We are hoping to find individuals who are a hybrid between system administrators and software engineers.



Responsibilities:

  • Act as a key contributor in forming the team’s technical strategy and aligning the team and stakeholders with it
  • Initiate large projects with complex architecture, breaking them down into the right logical components so that other engineers can contribute effectively
  • Work frequently with other teams to coordinate major changes to cross-system architectures, influencing upstream and downstream teams toward the most efficient solutions
  • Collaborate with engineering teams to propose features that address recurring patterns of customer complaints
  • Expertly design and implement scalable, distributed, fault-tolerant systems that satisfy complex requirements
  • Support services before they go live, through activities such as system design consulting, capacity planning and launch reviews
  • Maintain services once they are live by measuring and monitoring availability, latency and overall system health
  • Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity
  • Design and implement best practices for security, monitoring, and telemetry systems
  • Lead initiatives and meetings in the engineering organization and help your teammates be better engineers through better processes, practices and technical guidance



Requirements:

  • Strong CS fundamentals. BS degree in Computer Science or related technical field, or equivalent practical experience
  • Ability to manage competing priorities, a focus on shipping, and the ability to work well under pressure
  • Experience in designing, analyzing, scaling and troubleshooting large-scale distributed systems
  • A systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive
  • A passion for automation; strong coding skills in at least one modern programming language (Java/Go/Python/Ruby)
  • Super strong Linux skills and supreme troubleshooting skills
  • Experience with a variety of cloud technologies and familiarity with the industry landscape and trends
  • Some configuration management experience; the specific tool does not matter (any of Puppet, Chef, CFEngine, Fabric, Ansible, Salt is fine)
  • Willingness to be part of on-call rotations


Nice to have:

  • Experience with large-scale OLTP and OLAP deployments
  • Cloud experience: platform does not matter
  • Experience with tools like Elastic/Kibana, Jenkins, PagerDuty, Wavefront
  • Experience with software release tooling (Git, Jenkins, custom scripts)
  • Experience with algorithms, data structures, complexity analysis and software design
