
Sr. Site Reliability Engineer - Navan Cognition.ai

Tel Aviv, Israel

At NavanCognition.ai, we’re building the next generation of AI-powered workforces. As a dedicated team within Navan, our mission is to advance the state of agentic AI. We are the builders of Navan Cognition: a multi-agent AI platform that has already transformed our internal operations by handling challenging, real-world business processes with a focus on reliability and accuracy. Now, we’re taking the next step by opening this technology up to other companies. 

Joining our team means joining the frontline of AI innovation, crafting the foundation for a rapidly unfolding, AI-powered business era.

What You'll Do:

  • Design, build, and support tooling, automation, and infrastructure to maximize the reliability, scalability, and performance of Navan Cognition.
  • Proactively identify, mitigate, and resolve issues, leveraging AI-driven insights and automation where possible.
  • Develop robust monitoring, alerting, and incident response strategies; ensure actionable observability across all critical systems.
  • Drive best practices in CI/CD, Infrastructure-as-Code, environment provisioning, and disaster recovery.
  • Collaborate closely with engineering teams to build, deploy, and maintain highly available services in production.
  • Take responsibility for uptime, reliability, and the operational excellence of Navan Cognition.
  • Help define and measure SLOs/SLAs to ensure world-class service delivery.

What We’re Looking For:

  • 3+ years in Site Reliability, DevOps, or related Infrastructure Engineering roles in 24/7 production environments.
  • Deep experience operating, automating, and supporting distributed systems on AWS or similar clouds.
  • Experience with Infrastructure-as-Code (e.g., Terraform, CloudFormation) and CI/CD tooling (e.g., Jenkins, GitHub Actions).
  • Strong skills in Python, Bash, or comparable scripting languages for automation.
  • Hands-on experience with observability stacks (e.g., New Relic, Grafana, CloudWatch, Datadog) and incident response.
  • Familiarity with microservices architectures and patterns for resilience/scalability (e.g., throttling, retries, circuit breakers).
  • Experience with common data stores (MySQL/RDS, DocumentDB, Elasticsearch, Redis).
  • Working knowledge of Node.js/TypeScript backends (bonus: performance optimization and monitoring); experience with Java, Python, or Go is a plus.
  • Interest or experience in applying AI for infrastructure automation, monitoring, or optimization (a strong plus).
  • A collaborative mindset with strong communication skills, able to work independently and comfortably across teams and disciplines.
  • Ability to thrive in a fast-paced, high-growth environment and readiness to tackle complex system challenges at scale.

  • Data-driven, analytical thinker with the ability to dive into metrics, identify insights, and drive product improvements
  • Startup-ready: thrive in fast-paced, ambiguous environments; bias for learning, action, and innovation
Client-provided location(s): Tel Aviv, Israel
Job ID: 7131909
Employment Type: OTHER
Posted: 2025-09-08T18:35:05

Perks and Benefits

  • Health and Wellness
  • Parental Benefits
  • Work Flexibility
  • Office Life and Perks
  • Vacation and Time Off
  • Financial and Retirement
  • Professional Development
  • Diversity and Inclusion