Site Reliability Engineer, AWS / Kubernetes

    • Oakland, CA

At Altais, we're looking for bold and curious innovators who share our passion for enabling better health care experiences and revolutionizing the healthcare system for physicians, patients, and the clinical community. Doctors today are faced with the reality of spending more time on administrative tasks than caring for patients. Physician burnout and fatigue are an epidemic, and the healthcare experience and quality suffer as a result. At Altais, we're building breakthrough clinical support tools, technology, and services to let doctors do what they do best: care for people. Come join us as an early member of our passionate and growing team as we change the game for the future of healthcare and enable the experience that people need and deserve.


Do you enjoy working with a highly motivated and talented team to deliver mission-critical solutions that change the way healthcare is delivered? Altais is growing our Site Reliability Engineering team to help deploy, manage, troubleshoot, and enhance our complex cloud-based services for our customers.

Do you want to push the limits of Amazon Web Services to drive value-based care, using Athena, EMR, Kinesis, Redshift, Glue, MQ, Neptune, Greengrass, SageMaker, Kendra, Lex, Textract, PyTorch, TensorFlow, Transcribe, Polly, and Macie?

We are looking for a highly technical, hands-on engineer with experience using several open-source projects commonly found in large-scale deployments. You will manage our Kubernetes lifecycle: deployments, upgrades, monitoring, and uptime of all K8S clusters. You will help advance the process of deploying software into Kubernetes with CodeFresh / Terraform Cloud at massive scale. Additionally, you will work toward perfecting the metrics and alerting from Datadog and Opsgenie so that all events are actionable.

Your focus will be on maximizing system uptime. All team members participate in an on-call rotation.

You will build innovative automated solutions and tools to help debug and resolve problems in production and prevent them from recurring. Further, you will proactively seek out system weaknesses and fix them before they cause production issues by using monitoring data, watching trends, and applying Chaos Engineering.

This position is based at our Montgomery Street office in San Francisco and will move to our brand-new Oakland City Center office in Fall 2020.

About the Work:

  • Keeping your assigned site or service up and running or getting it back up and running quickly when a failure occurs

  • Working closely with internal partners and teams to ensure that we ship software that meets security, SLA, and performance requirements

  • Writing and updating user documentation, including runbooks/playbooks

  • Automating work including infrastructure needs, testing, failover solutions, failure mitigation, and much more

  • Debugging complex problems across an entire stack and creating solid solutions

  • Developing CI/CD processes to improve cadence

  • Using Chaos Engineering to test what you build under real-world conditions

  • Automating yourself out of your current role every year


  • 5+ years of production-level experience with distributed applications at scale
  • Excellent communication skills, both verbal and written
  • Knows their way around a Unix/Linux shell, can write shell scripts, and understands Linux internals
  • Experience debugging complex problems
  • Experience designing, building, and operating large-scale production systems
  • Knows NodeJS, Python, Java, Go, Rust, or similar
  • A solid understanding of networking and core Internet protocols (e.g. TCP/IP, DNS, SMTP, HTTP, and distributed networks)
  • Advanced experience with Terraform and/or CloudFormation
  • Understands networking and messaging, especially between services
  • Has hands-on experience using source control (Git, GitHub) and feature branching strategies
  • Has experience with a variety of open-source databases (MySQL, Postgres, Redis, Cassandra, etc.)
  • Has a track record of embedding security into the fabric of an organization and its infrastructure
  • Understands the idea behind Chaos Engineering, even if they haven't yet implemented it themselves

  • Experience with Function-as-a-Service (Kubeless)
  • Experience with Service Mesh (Linkerd/Istio) / API Gateways (Kong/Tyk) / Message Queuing (Kafka, AWS SQS)

You Share our Mission & Values:
  • You are passionate about improving the healthcare experience and want to be part of the Altais mission.
  • You are bold and curious: willing to take risks, try new things, and be creative.
  • You take pride in your work and are accountable for the quality of everything you do, holding yourself and others to a high standard.
  • You are compassionate and are known as someone who demonstrates emotional intelligence, considers others when making decisions and always tries to do the right thing.
  • You co-create, knowing that we can be better as a team than individuals. You work well with others, collaborating and valuing diversity of thought and perspective.
  • You build trust with your colleagues and customers by demonstrating that you are someone who values honesty and transparency.

Physical Requirements

Office Environment: roles involving a part-time to full-time schedule in an office environment, based in our physical offices or a work-from-home office. Deskwork. Activity level: sedentary; frequency: most of the workday.

Altais is a wholly owned subsidiary of Blue Shield of California. Candidates hired to work for Altais, Altais Health, and Altais Clinical Services will be employed by California Physicians' Services dba Blue Shield of California. External hires must pass a background check/drug screen. Qualified applicants with arrest records and/or conviction records will be considered for employment in a manner consistent with Federal, State, and local laws, including but not limited to the San Francisco Fair Chance Ordinance. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, protected veteran status, disability status, or any other classification protected by Federal, State, and local laws.
