Senior Data Engineer

New York, NY

Who We Are

Yieldmo is an advertising technology company that operates a smart exchange that differentiates and enhances the value of ad inventory for buyers and sellers. As a leader in contextual analytics, real-time technology, and digital formats, we create, measure, model, and optimize campaigns for unmatched scale and performance. By understanding how each unique impression behaves and looking for patterns and performance in real time, we can drive real performance gains without relying on audience data.

What We Need

As a member of the Yieldmo Data Team, you will build innovative data pipelines for processing and analyzing our large user datasets (250+ billion events per month). A unique challenge of the role is being comfortable developing across varied technologies: building custom transformation/integration apps in Python and Java, building pipelines in Spark, Kafka, and Kinesis, and transforming and analyzing data in SQL.


  • Develop ETL (Extract, Transform, Load) data pipelines in Spark, Kinesis, Kafka, and custom Python apps to transfer massive amounts of data (over 20 TB/month) efficiently between systems
  • Engineer complex, efficient, distributed data transformation solutions using Python, Java, Scala, and SQL
  • Productionize machine learning models, utilizing resources efficiently in clustered environments
  • Research, plan, design, develop, document, test, implement, and support Yieldmo's proprietary software applications
  • Validate data analytically for the accuracy and completeness of reported business metrics
  • Take on, learn, and implement engineering projects outside of your core competency
  • Understand the business problem and engineer/architect/build an efficient, cost-effective, and scalable technology infrastructure solution
  • Monitor system performance after implementation and iteratively devise solutions to improve performance and user experience
  • Research and innovate new data product ideas to grow Yieldmo's revenue opportunities and contribute to the company's intellectual property

Must Haves

  • A passion for data
  • BS or higher degree in computer science, engineering, or another related field
  • 5+ years of object-oriented programming experience in languages such as Java, Scala, or C++; or an MA degree plus 3+ years of progressive OOP experience in such languages
  • 3+ years of experience developing in Python to transform large datasets on distributed and clustered infrastructure
  • 3+ years of experience engineering ETL data pipelines for big data systems
  • Prior experience designing and building ETL infrastructure involving streaming systems such as Kafka, Spark, and AWS Kinesis
  • Experience implementing clustered/distributed/multi-threaded infrastructure to support machine learning processing on Spark or SageMaker
  • Proficiency in SQL, including performing data transformations and data analysis
  • Comfort juggling multiple technologies and high-priority tasks

Nice to Haves

  • Experience with distributed columnar databases such as Vertica, Greenplum, Redshift, or Snowflake
  • Ad tech experience

Perks

  • White glove service to help you upgrade your home office
  • 1 Mental Escape (ME) day each month to fully unplug and recharge
  • Work-life balance, flexible PTO, and competitive compensation packages
  • A generous learning stipend and other opportunities for professional development