Senior Data Scientist, Ice Team (Innovation)


The Innovation Program is a software development team that builds solutions to address the everyday pain points of engineers. We solicit input through surveys, interviews, internal networks, and principal engineering communities to evaluate and prioritize opportunities. We are seeking a Sr. Data Scientist to build and analyze datasets of SDE (software development engineer) output (tech survey feedback, commits, code reviews, resolved tickets, task points, deployments, phone screens, design docs) and draw correlations from the data that will shape best practices across the Amazon developer community. This information will be used to provide summary analysis to executive management and to inform the roadmaps of internal development teams. Our team will drive resolution of the top engineering pain points and design solutions that integrate seamlessly into existing development workflows at minimal cost to engineers. We quantify the impact of solutions by evaluating changes in behaviors, metrics, team composition, and future survey feedback.

In 2016, our goal was to design a measurement framework to evaluate how our solutions would affect everyday output. We began collecting SDE metrics (commits and code reviews) to establish a baseline for users and set up an experiment to measure the impact of eliminating wait times during the build and test workflow. Our hypothesis is that eliminating wait times encourages a behavioral shift from larger, less frequent commits to smaller, more iterative changes. Smaller changes require less time to review, reduce reviewer fatigue, and increase the likelihood of detecting errors.

In 2017, we will target 2-3 projects that reduce or eliminate developer pain by automating manual work, improving developer education, providing structure to manage career growth, and/or improving quality. We will expand our SDE output metrics to improve our visibility into the potential impact of our work. We will identify correlations worth deep-diving into, looking for hypotheses to validate and study. Our goal will be to isolate practices or tools used by higher-performing teams and integrate them into the default engineer workflow. Our success will be measured by our ability to deliver solutions that integrate with existing tools, deliver measurable impact, and provide continuous improvement.

As a Senior Data Scientist, you will be working in one of the world's largest and most complex data warehouse environments. You should be an expert in the architecture of enterprise data warehouse (DW) solutions using multiple platforms (RDBMS, columnar, cloud). You should excel in the design, creation, management, and business use of extremely large datasets. You should have excellent business and communication skills so that you can work with business owners to develop and define key business questions, and build datasets that answer those questions. Above all, you should be passionate about working with huge datasets and love bringing datasets together to answer business questions and drive change.

Basic Qualifications

  • A desire to work in a collaborative, intellectually curious environment.
  • Degree in Computer Science, Engineering, Mathematics, or a related field, or 7+ years of industry experience
  • Demonstrated strength in data modeling, ETL development, and data warehousing.
  • Data warehousing experience with Oracle, Redshift, Teradata, etc.
  • Experience with big data technologies (Hadoop, Hive, HBase, Pig, Spark, etc.)
  • MS in Statistics, Computer Science, or Mathematics
  • Experience in data mining, machine learning techniques, and statistics
  • The ability to distill problem definitions, models, and constraints from informal business requirements, and to deal with ambiguity and competing objectives
  • Ability to translate business requirements into solutions
  • Ability to analyze mined data and draw conclusions

Preferred Qualifications

  • Industry experience as a Data Scientist or in a related specialty (e.g., Software Engineer, Business Intelligence Engineer, Data Engineer) with a track record of manipulating, processing, and extracting value from large datasets
  • Coding proficiency in at least one modern programming language (Python, Ruby, Java, etc.)
  • Experience building and operating highly available, distributed systems for extracting, ingesting, and processing large datasets
  • Experience building data products incrementally and integrating and managing datasets from multiple sources
  • Query performance tuning skills using Unix profiling tools and SQL
  • Experience leading large-scale data warehousing and analytics projects, including using AWS technologies such as Redshift, S3, EC2, Data Pipeline, and other big data technologies
  • Experience providing technical leadership and mentoring other engineers on best practices in the data engineering space
  • Experience using Linux/UNIX, including to process large datasets
  • Experience with AWS
