Site Reliability Engineer

About STATS
STATS collects the richest sports data in the world and transforms it through revolutionary AI to unlock the past, present and future of everything sport. The pioneer of live sports data, STATS continues to speed innovation in the industry with AutoSTATS, the first-ever AI-powered technology to collect comprehensive sports data from any television broadcast. The world's most innovative brands, technology companies, media, fantasy, gaming, teams and leagues trust STATS to provide world-class artificial intelligence solutions to engage billions of fans. STATS combines the industry's fastest and most accurate data platform with video analysis, sports content and research, player tracking, and the latest in AI and machine learning to provide unparalleled media and team performance solutions. For more information, go to www.stats.com and follow STATS on Twitter @STATS_Insights.
Job Description
Site Reliability Engineering (SRE) is an engineering discipline that combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. The SRE team is responsible for the availability, performance, monitoring, and incident response of STATS' business-critical internal systems and its customer-facing systems.

What You'll Do:

  • Engage in and improve the whole life cycle of services, from inception and design through deployment, operation and refinement
  • Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning and launch reviews
  • Maintain services once they are live by measuring and monitoring availability, latency and overall system health
  • Scale systems through sustainable mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity
  • Practice sustainable incident response and blameless postmortems
  • Establish a mindset and a set of engineering approaches for running better production systems, with a focus on optimizing existing systems, building infrastructure and eliminating manual work through automation
  • Establish a culture of diversity, intellectual curiosity, problem solving and openness to ensure team success
  • Create an environment that provides the support and mentorship needed to learn and grow
Skills & Requirements
What You'll Need:
  • B.S. in Computer Science or equivalent experience
  • Minimum of 3 years of experience with technical operations and software development
  • Solid understanding of and experience with containerization technologies such as Docker
  • Working knowledge of open source tools such as Prometheus, Grafana, Logstash, Elasticsearch
  • Solid understanding of and experience with web services, databases and related infrastructure and architectures
  • Ability to automate and manage systems using a scripting language of your choice
  • Solid understanding of IT infrastructure
  • Excellent troubleshooting skills
  • DevOps experience a plus
  • System administration experience a plus
  • AWS cloud experience a plus
  • Experience supporting enterprise-level SaaS environments a plus
  • Security experience a plus
  • Kubernetes experience a plus

STATS provides equal employment opportunities (EEO) to all employees and applicants for employment without regard to race, color, religion, sex, sexual orientation, national origin, age, disability or genetics.