Systems Reliability Engineer

About Us
At Cloudflare, we have our eyes set on an ambitious goal: to help build a better Internet. Today, Cloudflare runs one of the world’s largest distributed networks that powers more than 10 trillion requests per month, which is nearly 10 percent of all Internet requests worldwide. Cloudflare protects and accelerates any Internet application online without adding hardware, installing software, or changing a line of code.

Our customers range from Fortune 500 companies and nonprofits to small businesses and budding entrepreneurs. We are working to create a faster, more secure, and more reliable experience for anyone online, and given the scale at which we operate, our mission is big. Our team is hard at work shaping the future of the Internet by solving some of its toughest challenges.

Come join us!

About the Role
An engineering role at Cloudflare provides an opportunity to address some big challenges at scale. We believe that with our talented team, we can solve some of the biggest security, reliability and performance problems facing the Internet. Just how big?

-We have in excess of 15 Terabits of network transit capacity
-We process 10% of the world’s Internet traffic
-We operate 153 Points-of-presence around the world
-We serve more traffic than Twitter, Amazon, Apple, Instagram, Bing, & Wikipedia combined
-Anytime we push code, it immediately affects over 200 million Internet users
-Every day, up to 20,000 new customers sign up for Cloudflare service
-Every week, the average Internet user touches us more than 500 times

We are looking for talented Systems Reliability Engineers to build and operate the platform in which Cloudflare customers place their trust. Our SREs come from a variety of technical backgrounds and have built up their knowledge working in different environments. What all of our reliability-focused engineers have in common is a passion for automation, scalability, and operational excellence. Our SRE teams monitor our network in a “follow the sun” approach with offices in Singapore, London, and San Francisco.

We are still a small team, well-funded, growing quickly and focused on building an extraordinary company. This is a superb opportunity to join a high-performing team and scale our high-growth network as Cloudflare’s business grows. You will build tools to constantly improve availability, performance, uptime and response times. You will nurture a passion for an “automate everything” approach that makes systems failure-resistant and ready-to-scale.

Cloudflare SREs work in one of these four teams:
-Core Operations
-Edge Operations
-Core Platform
-Edge Platform

The Operations teams focus on the immediate state and functionality of the Cloudflare platform around the world, leveraging an array of monitoring, alerting and diagnostics tools. The Platform teams focus on developing and enhancing the Cloudflare platform and its capabilities. The Platform and Operations teams are both “devops” teams, responsible for reliability engineering across a wide portfolio of applications and services, leveraging developer and operator patterns. Many of our SREs have had the opportunity to work at multiple offices on interim and long-term project assignments.

The ideal SRE candidate has a passionate curiosity about how the Internet fundamentally works and has a strong knowledge of DNS, Linux and TLS along with strong coding ability in Bash, Python or Go. We prefer to hire very experienced candidates; however, raw skill trumps experience, and we welcome strong junior applicants.

Required Skills
-Proven Linux systems administration experience
-Relevant Site Reliability Engineering experience
-Proven software development skills in Python, Go or SQL
-Strong skills in network services, including DNS, TLS/SSL and HTTP
-Track record in network fundamentals: DHCP, ARP, subnetting, routing, firewalls, IPv6
-Previous experience with the Linux kernel and Linux software packaging

Daily Duties
-Develop and optimise the current monitoring cluster
-Oversee the development of our database clusters (PostgreSQL and ClickHouse)
-Take part in our weekly operational on-call schedule
-Mentor and guide new engineers
-Architect and scale our current platform and ensure its reliability

What Makes Cloudflare Special?
We’re not just a highly ambitious, large-scale technology company. We’re a highly ambitious, large-scale technology company with a soul. Fundamental to our mission to help build a better Internet is protecting the free and open Internet.

Sound like something you’d like to be a part of? We’d love to hear from you!

Location: London
Apply by: April 4th 2019
Salary: Competitive
