Platform Reliability Engineer
- Dallas, TX
Homecare Homebase, a subsidiary of Hearst Corporation, is a market leader in healthcare software development, providing mobile cloud-based solutions for clinical, operational, and financial improvement of homecare and hospice agencies throughout the United States. Our software enables real-time solutions for wireless information exchange and communication between office staff, field staff, and physicians.
Our success is fueled by our talented technology teams, who are driven by their passion to make a difference in patient care. Our employees work in a culture that is guided by values of caring, action, respect, excellence, and smile (a positive attitude). If you want to work in a role where your skills have a direct influence on patient care, Homecare Homebase is the next step in your career. We are hiring technologists who want to make a difference.
Platform Reliability Engineer
The Platform Reliability Engineer is a technical leader who ensures that infrastructure and application operations align with the system design and business strategy. The Platform Reliability Engineer documents the system, analyzes the impact of new requirements, and delivers the correct technical solutions in alignment with architectural and business requirements. The Platform Reliability Engineer's efforts are largely proactive and focused on efficiency, automation, and reducing costs by automating manual and repetitive tasks. Overall, the Platform Reliability Engineer has strong technical abilities, a sound understanding of modern infrastructure design and automation, and advanced knowledge and skill in implementing relevant solutions.
ESSENTIAL DUTIES AND RESPONSIBILITIES:
Responsibilities for the Platform Reliability Engineer can vary, but should include:
- Act as a technical subject matter expert and point of escalation, provide technical direction to team members, and evangelize best practices and methodologies.
- Conduct necessary analysis and design, and prepare technical documentation and runbooks for new toolsets and processes.
- Stay current on applicable industry trends and integrated technologies.
- Cultivate strong working relationships with scrum teams, engineers, architects, vendors, contractors and leadership.
- Leverage, support, and advocate the use of configuration management tools for infrastructure in a hybrid cloud model.
- Support services before they go live through activities such as system design consulting, developing runbooks, capacity planning and launch reviews.
- Maintain services once they are live by measuring and monitoring availability, latency and overall system health.
- Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.
- Practice sustainable incident response, incident management, and blameless postmortems.
- 3+ years of experience in 24x7 production environments.
- 3+ years of Windows and/or Linux administration and enterprise production experience.
- 2+ years of Kubernetes and cloud administration and automation experience.
- Proficient with scripting (e.g., Bash, PowerShell).
- Proficient with security best practices in server configuration, tool development, and access controls.
- Proficient with administration of Linux or other Unix variants (Ubuntu, CentOS, Red Hat, Solaris, etc.) in a production environment.
- Proficient with networking and troubleshooting (TCP/IP, DNS, HTTP, routing, switching, firewalls, LAN/WAN, traceroute, iperf, dig, cURL or related).
- Proficient with administration, automation, and orchestration of large-scale Windows and Linux environments using configuration management solutions (Ansible, Puppet, Chef, DSC).
- Proficient with administering and managing infrastructure as code (git, Azure DevOps, Azure CLI, ARM, Terraform, PowerShell).
- Direct experience migrating solutions to Azure (comparable cloud platforms considered).
- Proficient in administering monitoring solutions such as Splunk, Application Insights, Azure Monitor, or SCOM.
- Experience working in an industry regulated under standards such as HIPAA, PCI, or SOX.
- Experience working in an Agile and/or SAFe environment.
- Strong written, verbal, and interpersonal communication skills.
- Strong customer focus, ownership, bias for action and the ability to dive deep.
- Excellent problem-solving and analytical skills, with attention to detail and a drive to bring issues to resolution.
- Excellent ability to align business process and requirements with a technical implementation.
- DevOps mindset practitioner and change agent.
- Write engineering-level documentation and develop operationally excellent standard operating procedures and runbooks, with a bias toward automation.
- Design systems management solutions using automation and self-repair rather than relying on alarming and human intervention.
- Develop appropriate metrics and monitors to ensure operational excellence for services being supported.
- Bachelor's degree in Computer Science, Engineering, Math, or a related field (equivalent experience considered).
- Candidates with relevant certifications are preferred, including but not limited to the following:
- ITIL Foundations
- Configuration: RHCE-Ansible
- Kubernetes: CKA, KCSP
- Linux: RHCE, CompTIA Linux+, GCUX, LPI
- Microsoft: Azure Administrator, Azure DevOps Engineer, MCSE