Site Reliability Engineer
Thomson Reuters (TR) is trusted for the decisions that matter most, empowering customers to act with confidence in a complex world.
Enterprise Technology & Operations (ET&O) @TR stores 60,000 terabytes of data and handles 79 billion market messages per day: critical information that must always be available to our customers.
Site Reliability Engineers in ET&O are responsible for ensuring the global availability of our mission-critical, customer-facing products and services.
We place a high degree of trust and responsibility in the hands of our SREs. While it is likely that no SRE will possess all of the skills on this page, we seek candidates who have the required core skills, will master some, and are willing to learn the vast majority.
Will you join us?
The SRE Role
- SREs are engineers with the right mix of knowledge and skills in software engineering (i.e. programming, data structures and algorithms) and systems engineering (i.e. applying scientific principles of experimentation and observation to entire systems to improve reliability, performance and operability).
- We constantly evaluate products and services before and after production releases to prevent, identify and fix problems in deployment, configuration, monitoring, recovery and scaling that impact service availability.
- We participate in on-call rotations to monitor and support our products and services, taking recovery actions before and after disruptions.
- We dedicate at least 50% of our time to 'engineering away' problems, both directly and through pairing with and coaching our team.
- We work side-by-side with SREs in our team applying software engineering principles to resolve problems impacting service uptime or our operational efficiency.
Our SRE Culture
- To accomplish our mission and continue to build our internal DevOps culture, we embrace and are strong advocates of the CALMS framework.
- We seek to eliminate manual and repetitive operations tasks at every opportunity by exploiting open source tools, contributing to open source projects and building new tools when required.
- We value technical aptitude, innovative thinking and a great learning ability above proficiency with a specific toolset.
Required Core Skills for all SREs
- Systems configuration and administration: Windows or Linux.
- Analyzing and discovering how all components of a distributed system work together using a broad range of skills and tools.
Possess or will learn quickly
- Applying an evidence-based approach to solving system problems under pressure and in real time to provide the fastest path to service recovery.
- System and software configuration management using tools such as Puppet, Chef or Ansible.
- Cloud technologies and platforms such as AWS or Azure using API or configuration tools.
Skills for SRE Specializations (SRE-SWE and SRE-SE)
SREs come from diverse backgrounds such as software development and systems administration, so their experience is often weighted towards software engineering (SRE-SWE) or systems engineering (SRE-SE). We strongly value the breadth and depth of skills and diversity of thinking this brings to our team.
- Object-Oriented design, design patterns and programming following clean coding practices.
- Agile/lean development practices such as Scrum, XP and agile design.
- Data structures and algorithms.
- Software testing frameworks that support TDD and BDD.
- Automating software build and testing using tools such as Jenkins.
- Database programming, schema design and query optimization (relational and NoSQL).
- Writing code to drive system engineering activity such as system testing, load generation, instrumentation, log analysis, performance monitoring, error simulation and deep discovery of system properties.
- Conducting investigation across any system component and related systems to discover and rectify performance bottlenecks and sources of unreliability.
- Applying scientific principles of experimentation and measurement to system components to identify improvements to the configuration and architecture which improve reliability, performance and operability.
- Network flow analysis and troubleshooting.
- Selecting, designing and tuning storage systems for reliability and performance.
- Configuring, analyzing and tuning (relational and NoSQL) database systems to improve reliability and performance.
- Configuring and tuning web servers, application containers, message queueing systems and other middleware to improve reliability and performance.
At Thomson Reuters, we believe what we do matters. We are passionate about our work, inspired by the impact it has on our business and our customers. As a team, we believe in winning as one – collaborating to reach shared goals, and developing through challenging and meaningful experiences. With over 50,000 employees in more than 100 countries, we work flexibly across boundaries and realize innovations that help shape industries around the world. Bring your ambition to make a difference. We'll bring a world of opportunities.
As a global business we rely on diversity of culture and thought to deliver on our goals. To ensure we can do that, we seek talented, qualified employees in our operations around the world regardless of race, color, sex/gender, including pregnancy, gender identity and expression, national origin, religion, sexual orientation, disability, age, marital status, citizen status, veteran status, or any other protected classification under country or local law. Thomson Reuters is proud to be an Equal Employment Opportunity Employer providing a drug-free workplace.
Intrigued by a challenge as large and fascinating as the world itself? Come join us.
To learn more about what we offer, please visit thomsonreuters.com/careers.
More information about Thomson Reuters can be found on thomsonreuters.com.