Senior Infrastructure Engineer, Site Reliability

The Site Reliability Engineering team is charged with ensuring and improving the availability of the site through better tools, processes, and communication. SREs are exposed to the whole technology stack, from code down to infrastructure, so that they can drive improvements in how the stack is designed, deployed, and operated. We want product teams to be able to launch in production with confidence, guided on how to use our Platform to achieve a high level of quality and reliability.

What the Job Entails

  • Develop automation and tools to reduce toil and improve the repeatability of processes.
  • Define reliability metrics and work to ensure services meet them.
  • Develop runbooks and processes to reduce MTTR in incidents.
  • Collaborate with core infrastructure and service engineers to improve service reliability, scalability, and tooling.
  • Troubleshoot issues across the entire stack: software, hardware, cloud, and networking.
  • Participate in a 24x7 on-call rotation.

Our Ideal Candidate

  • You have 4+ years of professional experience in server-side website development or 6+ years in a DevOps role.
  • A strong foundation in OOP, design patterns, algorithms, and programming languages grounds your tools development.
  • You have a deep understanding of at least two of the following: Linux internals, networking, MySQL, Docker, Kubernetes, or cloud infrastructure.
  • You’ve built tooling to improve the reliability of systems, automate remediation of issues, or improve scalability.
  • You have 4 or more years of experience working in production environments at scale, and want to improve our availability and performance.
  • Writing a script should come as second nature to you, and you should have experience with Python, Bash, Ruby, or Perl.
  • Systems often need to be reconfigured, so you should have experience with a configuration management system like Puppet, Chef, or Salt. (We use Salt.)
  • You should be able to clearly communicate technical details when speaking or writing.
  • This position is part of a well-established team, and you should be excited about working closely with both that team and product development teams.
  • Working in the cloud is a little different, so it would be great if you have some experience with AWS or GCP.
  • Our environment often has new challenges and technologies, so we want a candidate who is excited to learn.

Meet Some of Credit Karma's Employees

Kyle G.


Kyle works behind the scenes as a revenue analyst to provide Credit Karma’s members with personalized offers that help them optimize their finances.

Maria P.


As a full-stack engineer, Maria does everything from back-end to front-end testing. She works hard to build scalable, testable, and maintainable products that Credit Karma’s members find easy to use.
