Amazon

Site Reliability Engineer, Blink

Remote

DESCRIPTION

Job summary

Blink's Site Reliability Engineers (SREs) are responsible for keeping all of our user-facing services running smoothly. As an SRE you will be part of the team supporting our production infrastructure and ensuring customers have a great experience with our products.

Key job responsibilities

  • Monitor and support the AWS cloud components of the Blink home security system
  • Work with the latest AWS technology and tools, with access to all of Amazon's internal resources
  • Identify and resolve service issues, diving deep to understand their root causes
  • Develop, maintain, and improve monitoring solutions
  • Automate repetitive processes
  • Perform infrastructure maintenance and configuration
  • Work closely with the software development teams to ensure that platforms are designed with operability in mind
  • Ensure our systems are resilient and fault-tolerant
  • Assist with initiatives for upgrading and scaling our systems to improve availability and performance
  • Create and maintain operational runbooks and other documentation
  • Participate in a 24/7 on-call rotation; each engineer is on call about one week per month


BASIC QUALIFICATIONS

  • 2+ years of hands-on troubleshooting in highly available, highly scalable, mission-critical environments
  • 2+ years of experience using and configuring system health and application performance monitoring tools
  • 2+ years of experience with UNIX/Linux operating system administration
  • 2+ years of experience managing cloud infrastructure and developing operational processes
  • Proficiency in Python, Ruby, Perl, Bash, or another popular programming language
  • Clear written and verbal communication skills


PREFERRED QUALIFICATIONS

  • Experience writing Ansible playbooks and modules
  • Experience performance-tuning SQL queries and schemas
  • In-depth understanding of HTTP; experience debugging REST applications and implementing their security
  • In-depth understanding of networking and tools used to debug networking issues (tcpdump, netstat, lsof, etc.)
  • Experience using Jenkins for builds and deployments, and for automating job creation with DSL scripts
  • Experience with AWS services: EC2, ELB, ECS, Auto Scaling, IAM, S3, RDS, DynamoDB
  • Knowledge of DNS, SSH, HTTP, TCP/IP, and other common network protocols
  • Understanding of Continuous Integration / Continuous Delivery (CI/CD) and Agile software engineering practices

Job ID: Amazon-1419091