Senior Big Data Architect - Search ML Data Platform

Palo Alto, CA


The Amazon Search team builds the search engine that powers Amazon's worldwide shopping experience. Whenever a customer searches or browses using an Amazon website or application, we connect them to the products and services they are looking for.

The Search ML Data Platform team is responsible for delivering high-quality, fresh ML model training data and for providing seamless access to all ML artifacts through managed Federated Data Lake infrastructure. This big-data platform provides the ML training data for Amazon search ranking, matching quality, and search economics, and also powers live-site features, including search suggestions, query understanding, spelling, search result ranking, and personalization. More than 350 teams across Amazon consume our datasets. We are located in downtown Palo Alto, a short walk from numerous shops and restaurants, and right across from the Caltrain station.

As a Senior Software Development Engineer, you will:
• Data Definition: Work with scientists to define our data products, providing a stable schema they can use to train models while modeling the ever-changing Search User Experience worldwide. You will work backwards from these customers to define the data models of all intermediate datasets, all the way to the data model of our inputs, such as the logging done by Search Experience.
• Data Organization: You will define how we organize petabytes of data on physical storage, and create appropriate indexes to make it easy to access for a wide variety of use cases such as ML training, analytics, and privacy compliance.
• Data Quality: You will work with our customers to define data quality metrics that reflect their concerns, such as freshness, precision, and completeness, and the relative priority among those concerns. You will use these to define quality metrics for all intermediate datasets, all the way to data quality metrics for our inputs. You will define data quality SLAs for our data input providers, influence our architecture to degrade gracefully with imperfect data, and provide mechanisms for our customers to understand the quality of various datasets so they can use them appropriately. You will define how we trade off concerns such as freshness, completeness, precision, and cost.
• Design & Develop: Lead the design, get your hands dirty and write code, and ultimately deploy big data and machine learning services. These services form the foundation of our search R&D processes, supporting science, product development, and production of the world's largest product search engine.
• Operational Excellence: Obsess over operational excellence: evaluate system performance and security, design system metrics, and drive quality improvements.
• Obsess over customer needs and satisfaction
In this role, you'll help establish technical standards and drive the Search Data organization's overall data architecture and engineering practices. You'll work on the hardest problems, building high-quality, architecturally sound systems that are aligned with our business needs and built to handle Amazon's worldwide scale. Your expertise is deep and broad; you're hands-on, producing both detailed technical work and high-level architectural designs.


At least 10 years of experience in all of the following:
• Software development and design.
• Writing production code using Java, Scala, and Python.
At least 5 years of recent experience in:
• Data architecture and data processing for ML and analytical applications
• Defining data and processing interfaces and getting buy-in from across the organization
• Leading delivery of projects requiring work from multiple organizations
• Data transformation/ETL tools and technologies, and an understanding of related concepts such as data cataloging and curation
• Big data infrastructure such as Hadoop, Spark, and Kubernetes
• Implementation and tuning experience in the big data ecosystem (such as Hadoop, Spark, Presto, Hive), databases (such as Oracle, MySQL, PostgreSQL, MS SQL Server), and data warehouses (such as Redshift, Teradata, Vertica)


Preferred qualifications:
• Graduate degree in computer science or a related field (MS or Ph.D.)
• Experience with public cloud infrastructure.
• AWS Certification, e.g. AWS Solutions Architect, Developer, or SysOps Associate/Professional
• Data-driven, quantitative mentality: grounded, detail-oriented, and always backs up ideas with facts
• Ability to understand complex application data flows and bridge the gap between technical and business application requirements
• Track record of implementing AWS services in a variety of businesses, such as large enterprises and start-ups