We want people who are passionate about building apps that you and your peers will love. As a Software Engineer on the Insights team at DigitalOcean, you will write software that powers the company-wide observability platform used by engineering teams across the organization. You will help define the next generation of cloud services and create flexible, powerful observability solutions that our internal teams will leverage to support our growing customer base. You will make developers' lives (both inside DO and in the "wild") easier by building new systems and improving the efficiency and performance of existing ones.
The Insights team is responsible for building and operating DigitalOcean's internal observability platform, handling metrics, traces, and logs at scale.
If your passion is building reliable, scalable observability systems that teams rely on every day, DigitalOcean is the right place for you.
What You'll Be Doing:
- Develop and maintain our Insights observability platform, including metrics collection, storage, visualization, and alerting systems.
- Own and drive initiatives independently from concept to production, with minimal oversight.
- Engineer solutions to meet both internal teams' and customers' observability needs.
- Create scalable services that are performant and highly reliable in a distributed environment.
- Take part in an on-call rotation and lead incident response efforts when needed.
- Operate complex distributed systems at scale with high reliability objectives.
- Maintain and improve our observability platform with a focus on enhancing reliability and performance.
- Mentor teammates and transfer knowledge through design docs, pairing sessions, and code reviews.
- Technologies we use: Kubernetes, Go, gRPC, MySQL, Redis, Kafka, Prometheus, Grafana, OpenTelemetry, and others.
What You'll Add to DigitalOcean:
- Language: Strong proficiency in Go with at least 3 years of experience in designing, building, and shipping production-grade Go applications (required, not just desired).
- Proven Site Reliability Engineering (SRE) background with experience in implementing and maintaining reliable, scalable systems.
- Hands-on experience operating and managing Kafka clusters at scale.
- Extensive experience in designing, building, and running distributed systems in production environments.
- Demonstrable experience with observability platforms and tools (e.g., Prometheus, Grafana, OpenTelemetry).
- Familiarity with SLIs/SLOs and incident response best practices.
- Experience in one or more of the following areas:
  - Distributed databases such as MongoDB, Redis, MySQL, PostgreSQL, etc.
  - Fully managed infrastructure solutions
  - Serverless components
  - Kubernetes
  - Containers and container registries
- Demonstrated ability to navigate the complexity of operating distributed systems in production.
- Ability to contribute meaningfully to discussions on architectures, implementations, design patterns, and processes, and to convey ideas succinctly to peers and mentees.
- Effective knowledge transfer skills and ability to mentor team members.
- Experience with Agile software development methodologies.
- Extensive experience working within a microservice architecture, with deep knowledge of both asynchronous, event-driven processing (particularly with Kafka) and synchronous gRPC/HTTP-based requests.
- Comfortable with rapid execution, learning from failure, and building for scale and reliability.
- Experience working effectively on teams that operate across multiple time zones.
Why You’ll Like Working for DigitalOcean:
- We innovate with purpose. You’ll be part of a cutting-edge technology company with an upward trajectory, proud to simplify cloud and AI so builders can spend more time creating software that changes the world. As a member of the team, you will be a Shark who thinks big, bold, and scrappy, like an owner with a bias for action and a powerful sense of responsibility for customers, products, employees, and decisions.
- We prioritize career development. At DO, you’ll do the best work of your career. You will work with some of the smartest and most interesting people in the industry. We are a high-performance organization that will always challenge you to think big. Our organizational development team will provide you with resources to ensure you keep growing. We provide employees with reimbursement for relevant conferences, training, and education. All employees have access to LinkedIn Learning's 10,000+ courses to support their continued growth and development.
- We care about your well-being. Regardless of your location, we will provide you with a competitive array of benefits to support you, from our Employee Assistance Program to Local Employee Meetups to our flexible time-off policy, to name a few. While the philosophy behind our benefits is the same worldwide, specific benefits may vary based on local regulations and preferences.
- We reward our employees. The salary range for this position is $120,000 - $142,700 based on market data, relevant years of experience, and skills. You may qualify for a bonus in addition to base salary; bonus amounts are determined based on company and individual performance. We also provide equity compensation to eligible employees, including equity grants upon hire and the option to participate in our Employee Stock Purchase Program.
- We value diversity and inclusion. We are an equal-opportunity employer, and recognize that diversity of thought and background builds stronger teams and products to serve our customers. We approach diversity and inclusion seriously and thoughtfully. We do not discriminate on the basis of race, religion, color, ancestry, national origin, caste, sex, sexual orientation, gender, gender identity or expression, age, disability, medical condition, pregnancy, genetic makeup, marital status, or military service.
This is a remote role.