Site Reliability Engineer

3+ months ago · Ottawa, IL

Discover. A brighter future.

With us, you'll do meaningful work from Day 1. Our collaborative culture is built on three core behaviors: We Play to Win, We Get Better Every Day & We Succeed Together. And we mean it - we want you to grow and make a difference at one of the world's leading digital banking and payments companies. We value what makes you unique so that you have an opportunity to shine.

Come build your future, while being the reason millions of people find a brighter financial future with Discover.

Job Description

The Principal Software Engineer/Site Reliability Engineer (SRE) role is responsible for operational stability, resiliency, automation and performance from an application development perspective. SREs have an intense passion for finding and improving efficiencies in development, infrastructure and deployment automation. This role will work closely with our Architects, Software Development & Engineering teams as well as with CI/CD and Infrastructure teams to build mature, production-ready services and applications. As part of the SRE team, you will help define our standards for monitoring, alerting, scalability, and production-readiness. You will monitor and report on the uptime of our systems and services, the performance of our applications, and the capacity of our platform.

The Software Engineering teams develop and maintain full stack solutions to fit business needs. Full stack solutions require one or more of the following: front-end (user interfaces), back-end (APIs), database and DevOps development. Works directly with business partners to understand business requirements. Independently innovates on and advocates for best practices within the team, and collaborates on them across the enterprise. Carries a holistic view of all products and their interactions to design complex solutions and plan for how new projects will fit into the larger ecosystem. Solves complex technical problems. Guides the team in implementing solutions from inception to production and helps them ensure operational excellence.

Come join us in a rewarding and fast-paced position as a Software Engineer/SRE in the Card Acquisitions Value Stream. Our domain supports the new credit card application platform at Discover, which is key to supporting the business growth of the Discover Card account portfolio. Our domain delivers solutions by using Agile development methodologies to support existing and new product features.


  • Works with the team and leadership to develop the long-term Site Reliability Engineering roadmap
  • Designs and develops automated solutions and monitoring, diagnostic and debug tools to help improve troubleshooting and recovery
  • Develops and maintains complex front-ends with a focus on user experience
  • Develops and maintains backend systems
  • Uses holistic knowledge of all products in the team's ecosystem to plan how new systems will be built and integrated
  • Innovates on and advocates for best practices and improved processes within the team and with internal partners; stays up to date with technology trends and innovations; mentors team members
  • Creates and maintains DevOps processes and application infrastructure, and utilizes cloud services (including database systems and models)
  • Supports live systems to ensure business continuity

Minimum Qualifications:
  • Bachelor's Degree in Information Technology
  • 8+ years in Computer Science, Information Technology or Equivalent Experience
  • In lieu of a degree, 10+ years in Computer Science, Information Technology or Equivalent Experience

Preferred Skills:
  • Experience as part of an Agile engineering or development team
  • Experience with DevOps and SRE best practices
  • Familiar with Pair Programming (XP) methodologies
  • Experience with Java, Spring framework, and scripting languages
  • Excellent communication skills
  • Proficient in monitoring, alerting, analyzing and troubleshooting large scale distributed systems
  • Solid understanding of defining and executing high availability, disaster recovery, resiliency and chaos engineering testing
  • Experience with automation, monitoring and log analysis tools to manage operations
  • Familiar with AppDynamics or other monitoring and diagnostic tools
  • Strong understanding of object-oriented principles with an ability to write clean code
  • Strong experience working with relational and NoSQL databases
  • Strong experience with CI/CD pipelines with Jenkins or similar; Git/GitHub; Artifactory
  • Proven skills in high availability and scalability design, as well as performance monitoring
  • Experience developing and implementing API service architecture
  • Experience working in a cloud environment such as AWS, GCP or Azure
  • Understanding of messaging systems like MQ, RabbitMQ, Kafka, or Kinesis
  • Strong experience developing multithreaded applications and handling synchronization
  • Experience building secure web applications with user authentication
  • Understanding of software testing principles and methodologies

What are you waiting for? Apply today!

The same way we treat our employees is how we treat all applicants - with respect. Discover Financial Services is an equal opportunity employer (EEO is the law). We thrive on diversity & inclusion. You will be treated fairly throughout our recruiting process and without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or veteran status in consideration for a career at Discover.

Job ID: Discover-R3800