Everbridge

Site Reliability Engineer II

2 months ago · Beijing, China
Everbridge is looking for a skilled member of its SaaS Operations team with functional knowledge in all areas of technology operations and site reliability, with particular emphasis on automation in a Windows environment. The ideal candidate will fulfill the critical role of ensuring our systems are healthy, monitored, and designed to scale. The successful candidate should have hands-on experience in a web-scale role with emphasis on software-as-a-service. Candidates should also have experience designing, planning, implementing, tuning, and operating technology including application servers, virtual machine and container management, large-scale monitoring/trending techniques, microservice architectures, clustering technology, configuration management, and creative scaling techniques.

What you'll do:

  • Keep people safe and businesses running.
  • Own operational availability, security, scalability, efficiency, monitoring, instrumentation, and overall service reliability of Everbridge's solutions.
  • Collaborate across Agile teams with Architects, Developers, Quality, Data, Security, and other Operations engineers on designing and implementing highly reliable solutions.
  • Embrace Site Reliability Engineering principles of proactivity, automation, cross-functional collaboration, data-driven decision making, and failing fast and safely to continually improve our technology and culture.
  • Enhance our infrastructure, tooling, and processes to extend operability as a self-service function for other groups in the engineering value stream.
  • Participate in a rotating on-call schedule to troubleshoot and resolve production escalations from our 24x7x365 NOC.
  • Have fun while we work hard to make a difference.

What you'll bring:

  • Previous experience contributing in a production Site Reliability, DevOps, or SaaS/Technical Operations role
  • Minimum of 3 years of AWS experience in a production environment
  • Experience with automation framework orchestration, configuration management, and software-defined infrastructure management techniques (SaltStack preferred; others such as Puppet, Chef, or Ansible also acceptable)
  • 1+ years of Kubernetes experience (EKS, AKS, GKE, or self-managed)
  • Ability to write code in at least one programming language (e.g. Python, Perl, Java, Ruby, Go)
  • Experience with Terraform, Jenkins, Packer, and Docker
  • Experience with large-scale production UNIX/Linux operating system, application, and security maintenance in an online service provider environment (Ubuntu and Debian GNU/Linux preferred)

Bonus if:

  • You have experience with any of the following: Spinnaker, Helm, Nomad, or Consul



Bridger Culture: 

At Everbridge, we have a mission that matters – to keep people safe and businesses running during critical events. Our “Bridgers” join Everbridge to make a positive impact on the world through their work. The core of our company culture is built around making a difference. Our people are dedicated to solving problems during difficult times and challenging situations as our software was built to save lives.
 
We are a rapidly growing organization transforming the field of critical event management and need passionate, committed and determined individuals to help us carry out our mission. Our environment is dynamic, and our culture is constantly evolving and expanding in order to provide the best employee experience.
 
Passionate about our mission? Want to #BeTheBridge? Apply to be a part of our team today!
 
Everbridge is an Equal Opportunity/Affirmative Action Employer. All qualified applicants will receive consideration for employment without regard to race, creed, color, religion, sex (including sexual orientation and gender identity), national origin, disability, protected veteran status, or any other characteristic protected by applicable federal, state, or local law.
Job ID: be9a7c76-6eba-415a-9219-869c722a358a