Atlassian

Sr. Site Reliability Engineer, HipChat

Job Description

Excel at delivering exceptional availability and reliability for your customer-facing cloud services? Atlassian Site Reliability Engineering wants you!

As a Sr. Engineer on the Site Reliability Engineering Team for HipChat, you’ll use your technical chops and people skills to enhance HipChat availability, reliability and performance. You'll also respond to HipChat alerts and issues that you can investigate and really sink your teeth into. You'll work on disaster recovery planning, capacity engineering, reliability improvement initiatives, platform automation, and much more. The best person for this role is someone who has a collaborative spirit: it’s not about being a hero and having all the answers, it’s about working as a team to find solutions that enhance the HipChat user experience. The team needs someone who can listen, lead, and turn chaos into order.

This role would be a great fit for someone who likes to see concrete results from their work. You will develop and implement solutions that operate at scale - seeing your own technology efforts grow and evolve, from proof of concept to millions of users in production.

One thing we promise: you’ll never be bored!

More about you

On your first day, you will have experience in:

  • Scripting and software development across multiple programming languages
  • Deep understanding of Linux systems
  • Network optimization and troubleshooting: TCP/IP, UDP, ICMP, MAC addresses, DNS, OSI layers, and load balancing
  • Building, automating, and maintaining infrastructure in Amazon Web Services
  • Development and maintenance of configuration management systems responsible for thousands of hosts
  • Leading a team of engineers in troubleshooting service outages affecting millions of users
  • Implementing system and application level telemetry for large distributed cloud architectures
  • Diagnosing and resolving capacity problems in high-throughput web applications and network services

We'd be super excited if you have:

  • Expertise in Docker, Mesos, Cassandra, and Golang
  • Experience monitoring cloud services with DataDog
  • Understanding of ITIL terminology
  • Awareness and insight into industry trends (technology, methods and tooling)

More about our team

Atlassian Site Reliability Engineering is a recently formed and rapidly growing group within the organisation. We are in the process of building our teams, tools and systems as part of Atlassian's mission to build the best SaaS services in the world. This is a truly exciting team to join - we are involved, or plan to be involved, with every technical team across Atlassian.

We enable Atlassian to go fast by providing real-time feedback on production systems. We work side by side with the product family and platform developers to maintain and improve services and performance. We live the company values with a strong customer focus and possess a healthy sense of urgency.

We also live the 'Play, as a team' value by having a strong focus on sharing learning experiences from the front line with the development teams. So, the options for people in the team are vast. If you like mastering a domain and going deep, we need you. If you can juggle three tasks and coordinate multiple people in the heat of an incident, we need you. If you love the benefits of process and methodical improvement, you will love it here. If you want to keep your head down, headphones on and bash out code to support the team, we have a spot for you too.

Atlassian. Powered by You.

We believe that the unique contributions of all Atlassians are the driver of our success. To make sure that our products and culture continue to incorporate everyone's perspectives and experience, we never discriminate on the basis of race, religion, national origin, gender identity or expression, sexual orientation, age, or marital, veteran, or disability status.

Job ID: 100436604
Employment Type: Other

Perks and Benefits

  • Health and Wellness

    • Health Insurance
    • Dental Insurance
    • Vision Insurance
    • Life Insurance
    • Short-Term Disability
    • Long-Term Disability
    • FSA
    • HSA With Employer Contribution
    • Fitness Subsidies
    • Mental Health Benefits
    • On-Site Gym
    • HSA
  • Parental Benefits

    • Adoption Leave
    • Birth Parent or Maternity Leave
    • Non-Birth Parent or Paternity Leave
    • Fertility Benefits
    • Adoption Assistance Program
    • Family Support Resources
  • Work Flexibility

    • Flexible Work Hours
    • Remote Work Opportunities
    • Hybrid Work Opportunities
    • Work-From-Home Stipend
  • Office Life and Perks

    • Holiday Events
    • Casual Dress
    • Pet-friendly Office
    • Happy Hours
    • Snacks
    • Some Meals Provided
    • On-Site Cafeteria
  • Vacation and Time Off

    • Paid Vacation
    • Unlimited Paid Time Off
    • Paid Holidays
    • Personal/Sick Days
    • Volunteer Time Off
    • Sabbatical
    • Leave of Absence
  • Financial and Retirement

    • 401(K) With Company Matching
    • Company Equity
    • Performance Bonus
    • Relocation Assistance
    • Financial Counseling
  • Professional Development

    • Access to Online Courses
    • Internship Program
    • Leadership Training Program
    • Tuition Reimbursement
    • Learning and Development Stipend
    • Promote From Within
  • Diversity and Inclusion

    • Founder-led
    • Employee Resource Groups (ERG)
    • Diversity, Equity, and Inclusion Program

This job is no longer available.
