Software Developer - Data Infrastructure
Shopify's platform is expanding at an incredible rate. Every day we generate vast amounts of information, which can be leveraged to make better decisions, and ultimately make us a more successful company. Data Infrastructure is responsible for building and maintaining the platform that powers these decisions.
We need to deliver a rock-solid data platform. To achieve this we need deep systems experts with a software engineering mindset. We prefer automation, tooling, and scripting over hands-on administration and configuration. We leverage existing tools whenever possible, including Hadoop, Hive, Presto, and Spark, and build our own when necessary.
As a member of this team you will spend your days investigating and deploying new technology infrastructure, maintaining and growing our existing infrastructure, working with data engineers and analysts to provide the tools they need, and making data accessible across the company.
What has Data Infrastructure done at Shopify recently?
- Simplified ad hoc data access: To enable fast and efficient access to all data on the cluster (both raw and modelled), we deployed Presto and integrated it with Hue, which is accessible to everyone in the company
- Managed and built out core infrastructure: Over the past two years we have gone from an initial Hadoop cluster of 16 nodes to 180 nodes, which runs over 1000 Spark and MapReduce jobs per day
- Spark job abstraction: Built an abstraction layer using PySpark to make writing Spark jobs accessible to wider development teams and analysts
- Improved Spark join performance: Implemented a Spark join strategy that offloads random key access to Cassandra, allowing Spark to pipeline jobs more efficiently
You'll need to be:
- A systems-level thinker who can work anywhere in the stack, from right beside the OS and upwards
- A low-level generalist who is comfortable with multiple languages such as Bash, Ruby, and Python
- Proficient with configuration management tools, such as Chef
- Experienced with a variety of open source software
- Experienced with database administration: query optimization, OLTP and OLAP schema design, hardware considerations
- Experienced building large distributed systems at scale
- Someone who enjoys research and figuring out how entire pieces of the stack work
- Passionate about working on a team: collaborating on problems, asking questions, delivering feedback, and supporting others in their goals
It’d be great if you have experience with:
- Working with data at petabyte scale
- Managing or using Kafka, Spark, Presto, Hive, YARN or Hadoop
- Cloudera Hadoop Distribution and Cloudera Manager
- Networking with Arista or Juniper gear
- Amazon Web Services or Google Cloud Platform
- Docker, Kubernetes (or other container orchestration/resource management tools)
- Database administration: query optimization, OLTP and OLAP schema design, hardware considerations
- Cassandra or other NoSQL data stores
- Splunk or other log aggregation tools
You'll be working on things like:
- Ensuring that Hadoop stays online, secure, and performant
- Day-to-day system administration of the production environment
- Capacity planning
- Development of configuration management and automation tools
- Evaluating cloud and data center options for our cluster(s)
- Providing automated, on-demand test cluster deployments
- Deploying machine learning infrastructure to our cluster(s)
- Scalability, performance, and availability of the Hadoop and Spark infrastructure
- Collaborating with other Data Engineers and Data Analysts throughout your day-to-day activities
Here’s how to apply:
If you’re interested in helping us shape the future of commerce, click the “Apply now” button to submit your application. Please address your cover letter to Yandu Oppacher.
Experience comes in many forms, many skills are transferable, and passion goes a long way. If your background is close to what we’re looking for, please consider applying, even if you aren’t able to check every box above. We are dedicated to diversity and providing an inclusive workplace for all and especially encourage members of underrepresented groups to apply.