Site Reliability Engineer

Job Description

Love staying ahead of the growth curve and experimenting with new software and environments? Get on board as an Atlassian Site Reliability Engineer.

As a Sr. Engineer in one of our Site Reliability Engineering teams, you’ll build solutions that enhance the availability, performance and stability of Atlassian services, and automate away repetitive work. You'll also respond to pings, pages and alerts, investigating issues in our products that you can really sink your teeth into. You'll work across non-production and production environments on monitoring, data collection and configuration management, as well as disaster recovery planning, capacity engineering, reliability improvement initiatives and platform automation. The best person for this role is someone who has a collaborative spirit - in our world, it’s not about being a hero and having all the answers, it’s about sometimes saying "I don't know" and working on finding solutions rather than starting with an assumption. The team needs someone who can ask questions, learn from others and turn chaos into order.

This role would be a great fit for someone with creative and innovative problem-solving skills and a willingness to take responsibility for the code they write, all the way to production. You will develop and implement solutions that operate at scale, seeing your own technology efforts directly improve the reliability of our products. Our teams are empowered and expected to improve our products to truly deliver a reliable experience to customers. To realise this goal, you will own development efforts in every sprint, from planning to delivery, and collaborate with other teams to review code.

One thing we promise: you’ll never be bored.

More about you

On your first day, you will have experience in:

  • Scripting and software development across one or more programming languages - we don't want you doing repetitive work! Knowledge of Java is required. 
  • Deep understanding of Linux systems
  • Monitoring distributed systems and application architectures
  • Using and maintaining configuration management and orchestration tools at scale
  • Diagnosing and troubleshooting user-facing service outages
  • Exposure to system-level and application-level telemetry for large distributed cloud architectures
  • Diagnosing and resolving problems in high-throughput web applications and network services

We'd be super excited if you have:

  • Expertise with one or more of Python, Bash, Perl and Go
  • Experience with microservices architectures and container management tools such as Docker
  • Building, automating, and maintaining infrastructure in Amazon Web Services
  • Experience monitoring cloud services with Datadog
  • Understanding of ITIL terminology for incident and problem management
  • Awareness and insight into industry trends (technology, methods and tooling)
  • Management and troubleshooting of a continuous integration pipeline
  • Experience leading teams of engineers in service outage situations

More about our team

Atlassian Site Reliability Engineering is a recently formed and rapidly growing group within the organisation. We are in the process of building our teams, tools and systems as part of Atlassian's mission to build the best SaaS services in the world. This is a truly exciting team to join - we are already involved, or are planning to be involved, with every technical team across Atlassian.

We enable Atlassian to go fast by providing real-time feedback on production systems. We work side by side with the product family and platform developers to maintain and improve services and performance. We live the company values with a strong customer focus and possess a healthy sense of urgency. We are a heavily data-driven team, utilising a variety of data collection, enrichment, analytics and visualisation techniques to learn about our complex systems.

We also live the 'Play, as a team' value by having a strong focus on sharing learning experiences from the front line with the development teams. So, the options for people in the team are vast. If you like mastering a domain and going deep, we need you. If you can juggle three tasks and coordinate multiple people in the heat of an incident, we need you. If you love the benefits of process and methodical improvement, you will love it here. If you want to keep your head down, headphones on and bash out code to support the team, we have a spot for you too.

Atlassian. Powered by You.

