Engineering Manager: Edge SRE
- San Francisco, CA
At Cloudflare, we have our eyes set on an ambitious goal: to help build a better Internet. Today the company runs one of the world’s largest networks that powers trillions of requests per month. Cloudflare protects and accelerates any Internet application online without adding hardware, installing software, or changing a line of code. Internet properties powered by Cloudflare have all web traffic routed through its intelligent global network, which gets smarter with every request. As a result, they see significant improvement in performance and a decrease in spam and other attacks. Cloudflare was recognized by the World Economic Forum as a Technology Pioneer and named to Entrepreneur Magazine’s Top Company Cultures list.
We realize people do not fit into neat boxes. We are looking for curious and empathetic individuals who are committed to developing themselves and learning new skills, and we are ready to help you do that. We cannot complete our mission without building a diverse and inclusive team. We hire the best people based on an evaluation of their potential and support them throughout their time at Cloudflare. Come join us!
About the department
As part of the Cloudflare Engineering organization, SREs are primarily responsible for production reliability. SREs are based in San Francisco, London, Singapore, Austin and Lisbon, and use this global distribution to enable follow-the-sun coverage, which allows work to be focused in business hours in each location.
SREs are supported by all engineering teams at Cloudflare, who participate in on-call schedules for their services. The SRE teams facilitate remediation and follow-up of production issues and mature the tooling that enables all engineering teams to self-serve in production. Incident follow-up work across all engineering teams is prioritized above product innovation, and the impact of production incidents influences the priority.
Currently SREs support two main environments. Edge SRE is focused on edge distribution, where most client traffic is served. Core SRE is focused on core services such as the control plane, the data pipeline and other supporting services.
Edge SRE project work is organized in four development areas: Platform Engineering, Production Tooling, Hardware Lifecycle and Observability.
Who you are
- You have 5+ years of software engineering, reliability, or operations experience in a customer-focused environment.
- You have 2+ years of experience managing a team of 5 or more engineers on projects in the areas of distributed systems, tooling, Linux, Internetworking, infrastructure security or infrastructure management
- You are comfortable collaborating and coordinating on cross-team projects and workflows
- You can provide a strong technical vision for systems and infrastructure teams
- You have experience building services and systems, have successfully taken projects from inception to production, and are comfortable diving in to provide leadership for major projects when needed
- You are capable of leading a discussion with upper management, and are able to tailor the level of technical detail to suit your audience
What you'll do
We are looking for an Engineering Manager to join the Edge SRE team in San Francisco. You will lead and develop a team of SREs who are responsible for Cloudflare edge production and for building the tools that let all teams understand and interact with it. You will play a lead role in driving our Observability initiatives for edge services and will be tasked with leading engineers who build tools and best practices for engineering teams to debug in production, measure availability and performance indicators, and track and report on thresholds.
- Lead a team of engineers who are working to keep the Cloudflare edge reliable and scalable
- Mentor, grow, and empower your team by giving them the skills, confidence and motivation to make decisions
- Help the individuals on your team to build and execute personal development plans that align with Cloudflare’s goals and objectives
- Take an active role in prioritizing the roadmap for the SRE Org
- Drive cross-team and cross-org alignment in engineering, infrastructure and product teams
- Partner with other Engineering Managers across Cloudflare to achieve reliability outcomes for their services
- Participate in deep technical design discussions within your team, and across partner teams, and ensure that we're building the right systems and keeping the quality high
Examples of desirable skills, knowledge and experience
- Hands-on experience with software or reliability engineering
- Experience leading and hiring a team that builds and runs tools and platforms
- Ability to plan and oversee execution to meet commitments and deliver with predictability
- Observability: Tracking and refining key customer
- Incident root cause analysis and follow-ups
- Incident management
- Comfortable managing teams/projects with deadlines and short release cycles
- Experience using observability tools such as Jaeger, OpenTracing, ELK, Prometheus, Thanos, Grafana, Clickhouse
- Experience running and maturing distributed systems
- Familiarity working with Proxies, DNS, Databases, Internet and Security
- Experience developing tools and APIs
What Makes Cloudflare Special?
We’re not just a highly ambitious, large-scale technology company. We’re a highly ambitious, large-scale technology company with a soul. Fundamental to our mission to help build a better Internet is protecting the free and open Internet.
Project Galileo: We equip politically and artistically important organizations and journalists with powerful tools to defend themselves against attacks that would otherwise censor their work. This is technology already used by Cloudflare’s enterprise customers, provided at no cost.
Athenian Project: We created the Athenian Project to ensure that state and local governments have the highest level of protection and reliability for free, so that their constituents have access to election information and voter registration.
Path Forward Partnership: Since 2016, we have partnered with Path Forward, a nonprofit organization, to create 16-week positions for mid-career professionals who want to get back to the workplace after taking time off to care for a child, parent, or loved one.
Sound like something you’d like to be a part of? We’d love to hear from you!
Cloudflare is proud to be an equal opportunity employer. We are committed to providing equal employment opportunity for all people and place great value in both diversity and inclusiveness. All qualified applicants will be considered for employment without regard to their, or any other person's, perceived or actual race, color, religion, sex, gender, gender identity, gender expression, sexual orientation, national origin, ancestry, citizenship, age, physical or mental disability, medical condition, family care status, or any other basis protected by law. We are an AA/Veterans/Disabled Employer.
Cloudflare provides reasonable accommodations to qualified individuals with disabilities. Please tell us if you require a reasonable accommodation to apply for a job. Examples of reasonable accommodations include, but are not limited to, changing the application process, providing documents in an alternate format, using a sign language interpreter, or using specialized equipment. If you require a reasonable accommodation to apply for a job, please contact us via e-mail at email@example.com or via mail at 101 Townsend St., San Francisco, CA 94107.