Tech Lead, Site Reliability Engineer

Job Description

Enterprise Technology & Operations (ET&O) stores 60,000 terabytes of data and handles 79 billion market messages per day, all of it critical to our customers' success.

Site Reliability Engineers (SREs) in ET&O are software and systems engineers who are responsible for ensuring the global availability of our products and services. These SREs implement the principles and practices of Google's SRE model, develop services that look and feel like AWS, and operate them using the best of ITIL.

We place a high degree of trust and responsibility in the hands of our SREs and expect them to follow DevOps practices. We recognize that SREs won't be superstars in every area. We are looking for those who have the required skills now and want to continue to learn.

In return, we ensure every individual has interesting problems to investigate, the resources they need, and ways to improve their skills and advance their careers.

Will you join us?

Are you a hands-on engineer who loves coding and thrives when working to develop new systems and improve existing ones? Do you want to build solutions that scale? Do you want to be an internal partner and contribute to the service marketplace? We need engineers with the right mix of knowledge and skills in software engineering and systems engineering who want to operate our services at scale.

Our SRE Culture

  • We partner with Development and Product Management across the firm and are strong advocates of the CALMS framework.
  • We create the right framework that enables every individual to contribute to solving problems for the team and the organization.
  • We seek to eliminate manual and repetitive operational toil by design when it is cost-effective.
  • We reuse whenever possible, exploiting open source tools and contributing to open source projects, but build new tools when required.
  • We value interpersonal skills, technical aptitude, innovative thinking, and learning ability above proficiency with a specific toolset.

The SRE Job Family

SREs evaluate services before and after production releases to prevent, identify, and fix problems in deployment, configuration, monitoring, recovery, and scaling that impact service availability. They work closely with product development groups in a collaborative DevOps environment to maintain the highest level of service uptime. They all participate in on-call rotations, acting to prevent interruptions and to recover from them when they occur. By design, SREs dedicate at least 50% of their time to 'engineering away' problems and adding features our customers require.

Required Skills for all SREs
  • You have basic programming experience in at least one language, such as Java, C#, JavaScript, Python, or Ruby.
  • You configure and administer systems using either Windows or Linux.
  • You design scalable systems and services, connecting distributed systems together using a broad range of skills and tools.
  • You apply an evidence-based approach to solving service problems in real time to provide the fastest path to recovery.

Software Engineering Skills
  • You are familiar with object-oriented design, data structures, and algorithms, and can follow coding best practices and standards.
  • You excel in an agile/lean development framework like Scrum.
  • You automate software builds and testing using tools like Jenkins that support test-driven development.
  • You design databases and understand how to optimize queries against them.
  • You understand cloud technologies and platforms such as AWS or Azure and use their APIs and configuration tools.

Systems Engineering Skills
  • You write code to test systems, generate load, instrument, analyze, profile and discover system properties and attributes.
  • You use configuration management tools such as Puppet, Chef, or Ansible to expertly manage configuration at scale.
  • You investigate system components, discovering and removing performance bottlenecks and sources of unreliability.
  • You apply the scientific method to system components to identify configuration and design changes that improve reliability, performance, and operability.
  • You select, configure, analyze, and tune network, storage, database, web application, application container, and message queuing systems.

The Financial and Risk Business of Thomson Reuters is now Refinitiv. Refinitiv equips the financial community with access to an open platform that uncovers opportunity and catalyzes change. With a dynamic combination of data, insights, technology, and news from Reuters, our customers can access solutions for every challenge, including a breadth of applications, tools, and content, all supported by human expertise. At Refinitiv, we facilitate the connections that propel people and organizations to find new possibilities to move forward.

As a global business, we rely on diversity of culture and thought to deliver on our goals. Therefore, we seek talented, qualified employees in all our operations around the world, regardless of race, color, sex/gender, including pregnancy, gender identity and expression, national origin, religion, sexual orientation, disability, age, marital status, citizen status, veteran status, or any other protected classification under country or local law. Refinitiv is proud to be an Equal Employment Opportunity/Affirmative Action Employer providing a drug-free workplace.

Intrigued by a challenge as large and fascinating as the world itself? Come join us.

St. Louis-Missouri-United States of America
