Sr. Site Reliability Engineer - Navan Cognition.ai

1 month ago • Tel Aviv, Israel

At NavanCognition.ai, we’re building the next generation of AI-powered workforces. As a dedicated team within Navan, our mission is to advance the state of agentic AI. We are the builders of Navan Cognition: a multi-agent AI platform that has already transformed our internal operations by handling challenging, real-world business processes with a focus on reliability and accuracy. Now, we’re taking the next step by opening this technology up to other companies.

Joining our team means joining the frontline of AI innovation, crafting the foundation for a rapidly unfolding, AI-powered business era.

What You'll Do:

Design, build, and support tooling, automation, and infrastructure to maximize the reliability, scalability, and performance of Navan Cognition.
Proactively identify, mitigate, and resolve issues, leveraging AI-driven insights and automation where possible.
Develop robust monitoring, alerting, and incident response strategies; ensure actionable observability across all critical systems.
Drive best practices in CI/CD, Infrastructure-as-Code, environment provisioning, and disaster recovery.
Collaborate closely with engineering teams to build, deploy, and maintain highly available services in production.
Take responsibility for uptime, reliability, and the operational excellence of Navan Cognition.
Help define and measure SLOs/SLAs to ensure world-class service delivery.

What We’re Looking For:

3+ years in Site Reliability, DevOps, or related Infrastructure Engineering roles in 24/7 production environments.
Deep experience operating, automating, and supporting distributed systems on AWS or similar clouds.
Experience with Infrastructure-as-Code (e.g., Terraform, CloudFormation) and CI/CD tooling (e.g., Jenkins, GitHub Actions).
Strong skills in Python, Bash, or comparable scripting languages for automation.
Hands-on experience with observability stacks (e.g., New Relic, Grafana, CloudWatch, Datadog) and incident response.
Familiarity with microservices architectures and patterns for resilience/scalability (e.g., throttling, retries, circuit breakers).
Experience with common data stores (MySQL/RDS, DocumentDB, Elasticsearch, Redis).
Working knowledge of Node.js/TypeScript backends (bonus: performance optimization and monitoring); experience with Java, Python, or Go is a plus.
Interest or experience in applying AI for infrastructure automation, monitoring, or optimization (a strong plus).
A collaborative mindset with strong communication skills, able to work independently and comfortably across teams and disciplines.
Thrives in a fast-paced, high-growth environment and is ready to tackle complex system challenges at scale.

Data-driven, analytical thinker with the ability to dive into metrics, identify insights, and drive product improvements.
Startup-ready: thrives in fast-paced, ambiguous environments, with a bias for learning, action, and innovation.

Client-provided location(s): Tel Aviv, Israel

Job ID: 7131909

Employment Type: OTHER

Posted: 2025-09-08T18:35:05

Perks and Benefits

Health and Wellness
Parental Benefits
Work Flexibility
Office Life and Perks
Vacation and Time Off
Financial and Retirement
Professional Development
Diversity and Inclusion
