Senior Data Engineer - EC2 Capacity Data Analytics

3+ months ago - Cape Town, South Africa


The EC2 Capacity Data Analytics (CDA) team is looking for a Senior Data Engineer.

Our team is part of the EC2 Capacity Engineering organization, which is responsible for providing the elasticity EC2 customers need to scale up/down compute resources in a cost-efficient manner. We predict customer usage across thousands of configuration combinations to deliver exactly what our customers require in just the right amount of time with just the right amount of capacity.

As a Senior Data Engineer, you will lead the team in building the ETL and analytics solutions that our internal customers use to answer questions with data and drive critical improvements for the business. You will apply best practices in software engineering, data management, data storage, data compute, and distributed systems to influence the team's data architecture. You will actively mentor and develop other engineers on the team.

On any given day, we use Python, Scala, Java, SQL, Lambda, CloudFormation, Redshift, and Glue, as well as other public AWS services and a host of Amazon internal tools. We don't expect you to be an expert in, or even necessarily familiar with, all of the technologies listed above, but we do expect you to be excited to learn about them.

This position involves on-call responsibilities, typically for one week every two months. Our team is dedicated to supporting new team members. We care about your career growth and try to assign projects and tasks based on what will help each team member develop into a more well-rounded engineer and take on more complex tasks in the future.

Our team values work-life balance and we are flexible when people occasionally need to work from home.

Job Duties
Lead the team in developing and maintaining automated ETL pipelines for big data using languages and frameworks such as Scala, Spark, and SQL, and AWS services such as S3, Glue, Lambda, SNS, SQS, and KMS. Example: ETL jobs that process a continuous flow of JSON source files and output the data in a business-friendly Parquet format that can be efficiently queried via Redshift Spectrum using SQL to answer business questions.

Design and implement data architecture to support analytics use cases that yield significant data quality, availability and/or business value.

Develop and maintain automated ETL monitoring and alarming solutions using Java/Python/Scala, Spark, SQL, and AWS services such as CloudWatch and Lambda.

Implement and support reporting and analytics infrastructure for internal business customers using AWS services such as Athena, Redshift, Spectrum, EMR, and QuickSight.

Develop and maintain data security and permissions solutions for enterprise-scale data warehouse and data lake implementations, including data encryption, database user access controls, and logging.

Develop and maintain data warehouse and data lake metadata, data catalog, and user documentation for internal business customers.

Develop, test, and deploy code using internal software development toolsets. This includes the code for deploying infrastructure and solutions for secure data storage, ETL pipelines, data catalog, and data query.


• Bachelor's degree in Computer Science or related technical field, or equivalent work experience.
• 7+ years of overall work experience in Software Engineering, Data Engineering, Database Engineering, Business Intelligence, Big Data Engineering, or Data Science.
• Experience with the AWS technology stack, including Lambda, Glue, Redshift, RDS, S3, and EMR, or a similar big data solutions stack.


• Demonstrated efficiency in handling data: tracking data lineage, ensuring data quality, and improving discoverability of data.
• Demonstrable proficiency in distributed systems and data architecture, including the design and implementation of batch and stream data processing pipelines and the optimization of distribution, partitioning, and massively parallel processing (MPP) of high-level data structures.

Job ID: Amazon-1330591