Principal Site Reliability Engineer - Advisory
As a Principal SRE at Atlassian, you will join an engineering-led company and the award-winning leader in software development and collaboration tools. With your deep understanding of modern engineering practices, your programming expertise and your operational experience, you will join a tactical taskforce that will engage with many teams across Atlassian to help reliably scale our Cloud products and platform. This is an amazing opportunity for you to impact a broad range of Atlassian teams and services, both from a technical perspective, by assessing and recommending reliability-related technical changes, and from a non-technical perspective, by enabling and empowering teams to adopt reliability best practices. If you crave variety, love building relationships with people and have a burning desire to make a difference, then this is the role for you.
More about you
On your first day, we'll expect you to have:
- Strong organizational and communication skills, with experience developing and instilling a culture of operational maturity.
- Ability to analytically select the best of a range of solutions, factoring in input from colleagues and documenting decisions along the way.
- Experience driving large, complex, cross-organisational initiatives.
- Strong stakeholder management and communication skills.
- An ability and desire to mentor and coach engineers.
- Expertise with software development in languages like Java, Go and Python.
- Hands-on experience with public cloud offerings (AWS components like EC2, CloudFormation, IAM, RDS, S3, DynamoDB, Kinesis, or equivalents, e.g. in GCP).
- Experience managing complex systems in AWS that consume many types of AWS resources.
- Experience with configuration management tools (Ansible, Chef, Puppet, Salt, etc.).
- Working knowledge of datastores (RDBMS, time-series databases, NoSQL, search, analytics).
- Experience operating software in production: building monitoring into your code, tweaking dashboards, defining alerts, writing runbooks, etc.
- Understanding of high-availability, fault-tolerant, scalable, distributed systems.
- Experience diagnosing and resolving problems in high-throughput web applications and network services.
- A "non-hero attitude": rather than celebrating heroic efforts to resolve an incident, you prefer engineering practices that prevent incidents in the first place.
The following are not required, but definite bonuses:
- Experience with containerisation technologies like Docker, Kubernetes or Mesosphere.
- Experience with agile software development methodologies and software development best practices, such as unit testing, pair programming, and continuous integration.
- Experience engaging with and building trust amongst internal customers and/or developer communities.
- Experience working with remote teams.
- Experience with incident management processes and ITIL terminology for incident and problem management.
- Experience participating in 24/7 on-call rosters.
- Ability and willingness to learn new programming languages, frameworks and paradigms. Polyglots welcome!
More about our team
Atlassian Site Reliability Engineering is a rapidly growing group within the organization. We are in the process of building our teams, tools and systems as part of Atlassian's mission to build the best SaaS services in the world. This is a truly exciting team to join - we are already involved, or plan to be involved, with every technical team across Atlassian.
We enable Atlassian to go fast by providing real-time feedback on production systems. We work side by side with the product family and platform developers to maintain and improve services and performance. We live the company values with a strong customer focus and possess a healthy sense of urgency. We are a heavily data-driven team, using a variety of data collection, enrichment, analytics and visualisation techniques to learn about our complex systems.
We also live the 'Play, as a team' value by having a strong focus on sharing learning experiences from the front line with the development teams. So, the options for people in the team are vast. If you like mastering a domain and going deep, we need you. If you can juggle three tasks and coordinate multiple people in the heat of an incident, we need you. If you love the benefits of process and methodical improvement, you will love it here. If you want to keep your head down, headphones on and bash out code to support the team, we have a spot for you too.
We believe that the unique contributions of all Atlassians are the driver of our success. To make sure that our products and culture continue to incorporate everyone's perspectives and experience, we never discriminate on the basis of race, religion, national origin, gender identity or expression, sexual orientation, age, or marital, veteran, or disability status.
All your information will be kept confidential according to EEO guidelines.