Staff Site Reliability Engineer
San Francisco, United States
TextNow is based around a simple idea: Communication belongs to everyone. We work hard to help people stay connected with free phone service. At TextNow, we work together to solve complex and interesting problems that have a positive impact on our customers' lives.
TextNow is looking for motivated Site Reliability Engineers (SREs) to own back-end services, front-end services, infrastructure components, and everything in between. You'll also work closely with software engineers to set up continuous delivery and ensure quick, smooth feature deployment.
Join us on our mission to help people stay connected with technology that is free (or as close to free as possible).
What You'll Do:
- You will maintain and scale production services and servers for complex, high-throughput cloud services (AWS).
- You will improve scalability, service reliability, capacity, and performance.
- You will write automation code for provisioning and operating infrastructure at massive scale.
- You will build tools for internal use to support software engineering best practices.
- You are not an operator; you are an experienced software engineer focused on operations.
- You will work with development teams to make sure applications fit well within the infrastructure and that scalability, reliability, and security are designed and implemented from the start.
- You will participate in the on-call rotation, taking responsibility for TextNow uptime and supporting the infrastructure.
- You will roll up your sleeves to troubleshoot incidents, formulate theories, test your hypotheses, and narrow down possibilities to find the root cause.
Who You Are:
- B.S., M.S., or Ph.D. in Computer Science or equivalent
- 10+ years of professional experience in an operationally focused role, such as DevOps or SRE, strongly preferred
- Strong knowledge of UNIX/Linux and experience working with open source software (such as MariaDB, Redis, HAProxy, Nginx)
- Experience working with MySQL databases
- Minimum of 2 years of experience with programming languages (Python, Ruby, etc.)
- Understanding of modern web stacks and architecture (HTTP, REST)
- Experience with automation/configuration management tools such as Puppet, Chef, Ansible, Salt, Fabric, Docker, etc.
- Experience with database administration
- Experience with deploying web apps to cloud infrastructure (AWS, etc.) and working with distributed, service-oriented architecture
What We Offer:
- Strong work-life blend
- Flexible work arrangements (WFH, remote)
- Employee Stock Options
- Unlimited vacation
- Competitive pay and benefits
- Parental leave top up