Engineer - Guest Reliability
Description: Target as a tech company? Absolutely. We're the behind-the-scenes powerhouse that fuels Target's passion and commitment to cutting-edge innovation. We anchor every facet of one of the world's best-loved retailers with a strong technology framework that relies on the latest tools and technologies, and the brightest people, to deliver incredible value to guests online and in stores. Target Technology Services is on a mission to offer the systems, tools and support that guests and team members need and deserve. Our high-performing teams balance independence with collaboration, and we pride ourselves on being versatile, agile and creative. We drive industry-leading technologies in support of every angle of the business, and help ensure that Target operates smoothly, securely and reliably from the inside out.
As an engineer, you serve as a technical specialist delivering the engineering that powers the product. You develop keen insight into the technical architecture and design to deliver robust and scalable software components. You constantly demonstrate the depth of your expertise by solving engineering problems. You are passionate about the quality of software, and you balance the speed of delivering new features with the robustness of the software components you implement. You can handle operational issues with little or no oversight. You actively review code to ensure that software quality and functional accuracy are maintained across the team. You are keen to learn the design and architecture of the product and participate in ceremonies that can influence both. Core responsibilities of this job are described within this job description.
The Guest Reliability Engineer is responsible for driving the reliability of our applications and infrastructure so that we avoid service disruptions or, when we cannot avoid them, resolve them quickly. As a GRE at Target, you will do this through a combination of IT operational work and automating what you learn from that work. Put simply, you will substitute software for human labor in recoveries of our systems. In addition, you'll get to work on ensuring the following: availability, latency, performance, efficiency, change management, monitoring, emergency response and capacity planning for our systems.
- Design, write and deliver software/automation to improve the recoverability, availability, scalability, latency, and efficiency of products.
- Monitor and recover multiple applications within the following product groups: Stores, Supply Chain, Corporate Applications, and Infrastructure.
- Provide preventative activities, proactive monitoring, troubleshooting and quick resolution of events and incidents to ensure infrastructure and application stability.
- Prevent problem recurrence, with the goal of automating the response to all non-exceptional service conditions.
- Design, write and deliver monitors and dashboards that improve predictability and are actionable in a proactive manner.
- Day-to-day operational management, including response, incident, event and problem management activities.
- Understand different platforms, applications, hardware and infrastructure and how they interact.
- Enhance the knowledge repository that helps reduce recovery times for disruptions in service.
- Consult with architects and other senior engineers across projects or services to complete architectural and technical design deliverables.
- Provide technical oversight to others resolving high-severity hardware, operational, infrastructure and application incidents.
- Oversee preventative maintenance and troubleshooting, and quickly resolve problems to ensure infrastructure and application stability.
- Provide thought leadership within the team and to the broader community to promote re-use and develop consistent technical build, implementation and support processes.
- Lead the design, lifecycle management, and total cost of ownership of platforms, applications and infrastructure services.
- Provide input to the strategic technical roadmap for platform or infrastructure services.
- BA/BS or equivalent experience
- 1-3 years total work experience
- Has in-depth knowledge of state-of-the-art engineering approaches to design, build, testing, and debugging, as required by the domain
- Maintains technical knowledge within areas of expertise
- Stays current with new and evolving technologies via formal training and self-directed education
- Proficient in the following technology areas:
  - Java/C#/C++ Programming Languages
  - MySQL or SQL Server
  - PowerShell, Ruby, or Python
- Knowledge of scripting languages and the skills to build scripts and automation, including VBScript, Windows PowerShell, Perl, Windows Management Instrumentation, Windows Remote Management, and the Microsoft System Center suite of tools
- Technical aptitude and skills around Microsoft Windows, with a desire to build domain application knowledge and ServiceNow skills.
- Technical knowledge of operations hardware and applications.
- Excellent communication skills and ability to manage vendor partners.
- Strong reasoning, troubleshooting, problem solving and analytical skills.
- A desire to avoid repetitive activities and instead use coding skills to reduce human labor.