Site Reliability Engineer, HipChat

Company Description

Love staying ahead of the growth curve and experimenting with new software and environments? Get on board as an Atlassian Site Reliability Engineer.

As part of the Site Reliability team for HipChat, you’ll build solutions to enhance HipChat availability, performance and stability. You'll also respond to HipChat alerts and issues that you can investigate and really sink your teeth into. You'll be working on development and production HipChat environments, automation, monitoring, data collection and configuration management. The best person for this role is someone who has a collaborative spirit - in our world, it’s not about being a hero and having all the answers, it’s about sometimes saying "I don't know" and working on finding solutions rather than starting with an assumption. The team needs someone who can ask questions, learn from others and turn chaos into order.

This role would be a great fit for someone who likes to see concrete results from their work, as you will be developing and delivering solutions that operate at scale - seeing your own technology efforts grow and evolve, from a proof of concept to millions of users in production.

One thing we promise: you’ll never be bored.

Must have experience:

  • Scripting and software development - we don't want you doing repetitive work! Python is required
  • Must have hands-on experience with AWS - minimum of 1 year
  • Management and troubleshooting of a continuous integration pipeline
  • Monitoring distributed systems application architectures
  • Configuration management tools - Chef is preferred
  • Serious troubleshooting skills across different levels of the stack

We'd be super excited if you have:

  • Expertise with Docker, Mesos, Cassandra, and Golang!
  • We'd dig someone with DataDog experience... And an appreciation of bad jokes
  • Understanding of ITIL terminology like incident and problem would be a plus
  • Finally: A clear head in the presence of alerts / Nerf darts
  • Awareness and insight into industry trends (technology, methods and tooling)

More about our team

Atlassian Site Reliability Engineering is a recently formed and rapidly growing group within the organisation. We are in the process of building our teams, tools and systems as part of Atlassian's mission to build the best SaaS services in the world. This is a truly exciting team to join - we are currently or are planning to be involved with every technical team across Atlassian.

We enable Atlassian to go fast by providing real-time feedback on production systems. We work side by side with the product family and platform developers to maintain and improve services and performance. We live the company values with a strong customer focus and possess a healthy sense of urgency. We are a heavily data-driven team, utilising a variety of data collection, enrichment, analytics and visualisations to learn about how complex systems behave.

We also live the 'Play, as a team' value by having a strong focus on sharing learning experiences from the front line with the development teams. So, the options for people in the team are vast. If you like mastering a domain and going deep, we need you. If you can juggle three tasks and coordinate multiple people in the heat of an incident, we need you. If you love the benefits of process and methodical improvement, you will love it here. If you want to keep your head down, headphones on and bash out code to support the team, we have a spot for you too.

Atlassian. Powered by You.

Job Description

We believe that the unique contributions of all Atlassians are the driver of our success. To make sure that our products and culture continue to incorporate everyone's perspectives and experience, we never discriminate on the basis of race, religion, national origin, gender identity or expression, sexual orientation, age, or marital, veteran, or disability status.
