Data Center Operations Engineer
ROBLOX – The Imagination Platform
ROBLOX has grown by almost 4X in the past year, and we expect to do the same over the next couple of years. Our production environment is a hybrid of AWS and physical data centers (we are achieving solid cost reductions with this mix at scale). Through this growth, we blew through our original cage, expanded into another, and are now expanding to a second physical location. Our gaming cloud spans 30+ data centers throughout North America, Europe, and Asia. We serve 30M monthly active users, and are rapidly approaching 1M peak concurrent users.
Our infrastructure management system is highly automated, enabling engineers to interact with thousands of servers across multiple environments in parallel via the command line. Our application space includes database scalability, caching, message queuing, wide-table search, recommendations, Elasticsearch, Redis, mobile, and proprietary gaming cloud management. Our stack is a mix of Windows and Linux.
Come join the ROBLOX data center team and help us grow our footprint globally.
Responsibilities:
- Maintain uptime SLAs for a 24/7/365 production environment
- Triage, troubleshoot, and solve production issues caused by failures in hardware, configuration, networking, vendor outages, and continuous software upgrades
- Quickly resolve complex problems encountered during the installation and operation of our applications, to ensure negligible impact on players and internal operations
- Maintain and upgrade BIOS, drivers, and OS specifications for all vendor hardware
- Prioritize and multitask daily responsibilities, while being flexible enough to respond to emergent high-priority issues
- Assist in implementing disaster recovery procedures and system failover efforts
- Collect, document and help manage vendor-related services
- Participate in on-call rotations
Requirements:
- 10 years of experience in production data center support and management
- Bachelor’s degree in Engineering or Computer Science, or equivalent work experience
- Extensive knowledge of Windows and Linux operating systems
- Extensive knowledge of core networking technologies, including routers, switches, firewalls, and load balancers, with demonstrable knowledge of either Juniper or Cisco equipment
- Experience with server performance and failure monitoring/diagnostics
- Ability to size hardware and plan capacity for equipment and data centers
- Knowledge of debugging in common scripting languages: PowerShell, Bash, and Python
- Experience with automation tools (Chef, Puppet, Orchestra, etc.)
- Expertise with monitoring and alerting tools
Nice To Have:
- Experience working in a physical data center, installing and maintaining hardware
Perks & Benefits:
- Robust medical, dental and vision insurance
- Flexible time-off
- Wellness reimbursement
- Free onsite parking & other commuter benefits
- Free catered lunches & a fully stocked kitchen with unlimited snacks!
- Chance to work with a top-notch team on cool and unique projects!
ROBLOX is a powerful technology platform that allows users of all ages to create games, play, and socialize in immersive 3D worlds. Over 22 million user-generated games have been produced on the ROBLOX platform, with more than 28 million players coming each month to socialize, learn, and play in worlds that stretch imaginations. ROBLOX was ranked #37 in the Consumer Products and Services category for overall revenue in Inc. Magazine’s 2016 list of the 5000 Fastest-Growing Private Companies in America.
Meet Some of Roblox's Employees
Director Of Engineering
Isaiah guides team workflow for ROBLOX's product development. He troubleshoots and streamlines the process, and supplies the tools and support to help team members thrive.