Site Reliability Engineer (SRE)
- London, United Kingdom
Thought Machine is solving one of the biggest problems in banking. Since launching in 2014, our mission has been to liberate banks from outdated legacy technology that stifles their ability to innovate. Thought Machine's core product is a cloud-native core banking engine built to run any type of bank, from established Tier 1 banks all the way to new challenger banks. To move closer to achieving our mission, we are looking for highly talented individuals to join our fast-growing team.

With a founding team drawn from Google, we have a deep culture of engineering excellence, and we believe it is this that delivers a solution compelling enough to engender a seismic shift in the banking industry. Thought Machine was ranked as the most desirable London fintech to work at by Sifted when comparing employee reviews on Glassdoor, and was named in Fintech 50 (2020).

We pride ourselves on having an excellent internal culture, where we treat cultural fit as being as important as technical fit when we make new hires. At Thought Machine, we strive to create a fast-paced, supportive and fun working environment that enables the team to produce the best technical work in the industry.

Site Reliability Engineers at Thought Machine take responsibility for deploying our software into production. As well as covering traditional DevOps responsibilities, you will focus on writing and maintaining software that automates our deployment processes.

DUTIES
- Supporting the engineering team in building highly fault-tolerant, scalable applications.
- Developing tools to ensure our services can scale and are highly available. We always try to manage our ops tasks with automation, by adopting open source tools or developing bespoke tools as required.
- Being part of the 24x7 on-call rota, helping support and maintain production systems.
- Providing day-to-day development support and monitoring of production server and network environments by developing and deploying logging and monitoring tools.
- Developing applications to increase code quality throughout our codebase.
- Supporting disaster recovery, backup, redundancy and capacity planning activities.