At Navan, “It’s all about the user. All of them.” We’re passionate about providing a seamless one-stop experience for business travelers, no matter how they travel, where they stay, or where they’re going.
Are you passionate about building highly reliable and scalable systems? Do you thrive on tackling exciting challenges that come with exponential growth? Navan is looking for a talented Site Reliability Engineer (SRE-2) to join our world-class team in the Bay Area, where you'll play a crucial role in ensuring our services are always available for our travelers.
At Navan, we're committed to making our systems as reliable and scalable as possible. As an SRE, you'll be instrumental in achieving this by designing and developing the tooling, automation, and infrastructure services that power the Navan platform, used daily by thousands of travelers. You'll work closely with development, release, productivity, and security teams, identifying customer needs and building innovative solutions to meet them.
Join us in shaping the future of travel technology! If you're an SRE excited by the prospect of integrating AI into robust reliability practices, we encourage you to apply.
What You'll Do:
- Design and develop cutting-edge tooling, automation, and infrastructure services that enhance the reliability and scalability of Navan's core systems.
- Leverage AI-driven insights and technologies to proactively identify potential issues, optimize system performance, and enhance our autonomous, fault-tolerant infrastructure.
- Collaborate across a diverse range of systems and technologies, with a focus on building infrastructure that is optimized for both simplicity and uptime.
- Partner with backend and frontend engineering teams to ensure product solutions are inherently scalable, efficient, and reliable from inception.
- Implement robust infrastructure to support our massive growth, ensuring we maintain the highest level of service for our global user base.
What We’re Looking For:
- 3+ years of experience in SRE, DevOps, or Infrastructure Software Engineering, with 1+ years in 24/7 production environments.
- Demonstrated interest or hands-on experience in applying AI/ML to enhance system reliability, observability, or automation.
- Proven track record of operating distributed systems in AWS or other public clouds, with strong experience in CI/CD pipelines and build tooling (e.g., Jenkins, Maven).
- Understanding of microservice architecture and resiliency patterns such as throttling, queuing, and retries.
- Proficiency in Infrastructure as Code tools like Terraform or CloudFormation.
- Strong scripting skills in Python, Bash, or Groovy, with a passion for end-to-end automation.
- Hands-on experience with observability stacks such as New Relic, Grafana, and CloudWatch.
- Working knowledge of data stores including MySQL/RDS, DocumentDB, Elasticsearch, and Redis.
- Proficiency with Java-based services; experience with Python, Node.js, or Go is a plus. Familiarity with JVM performance tuning is highly desirable.
- A collaborative mindset with strong communication skills, able to work independently and comfortably across teams and disciplines.
- Ability to thrive in a fast-paced, high-growth environment and readiness to tackle complex system challenges at scale.
The posted pay range represents the anticipated low and high end of the compensation for this position and is subject to change based on business need. To determine a successful candidate’s starting pay, we carefully consider a variety of factors, including primary work location, an evaluation of the candidate’s skills and experience, market demands, and internal parity.
For roles with on-target-earnings (OTE), the pay range includes both base salary and target incentive compensation. Target incentive compensation for some roles may include a ramping draw period. Compensation is higher for those who exceed targets. Candidates may receive more information from the recruiter.