Senior Data Engineer

The PulsePoint Data Engineering team plays a key role in our technology company, which is experiencing exponential growth. Our data pipeline processes over 45 billion impressions a day (more than 18 TB of data, 160 TB uncompressed). This data is used to generate reports, update budgets, and drive our optimization engines. We do all this while running against extremely tight SLAs and providing stats and reports as close to real time as possible.

The most exciting part about working at PulsePoint is the enormous potential for personal and professional growth. We are always seeking new and better tools to help us meet challenges, such as adopting proven open-source technologies to make our data infrastructure more nimble, scalable and robust. Some of the cutting-edge technologies we have recently implemented are Kafka, Spark Streaming, Docker and Mesos.

What you'll be doing:

  • Design, build and maintain reliable and scalable enterprise-level distributed transactional data processing systems that scale the existing business and support new business initiatives
  • Optimize jobs to utilize Kafka, Hadoop, Vertica, Spark Streaming and Mesos resources in the most efficient way
  • Monitor and provide transparency into data quality across systems (accuracy, consistency, completeness, etc.)
  • Increase accessibility and effectiveness of data (work with analysts, data scientists, and developers to build/deploy tools and datasets that fit their use cases)
  • Collaborate within a small team with diverse technology backgrounds
  • Provide mentorship and guidance to junior team members

Team Responsibilities:

  • Installation, upkeep, maintenance and monitoring of Kafka, Hadoop, Vertica and RDBMS
  • Ingest, validate and process internal and third-party data
  • Create, maintain and monitor data flows in Hive, SQL and Vertica for consistency, accuracy and lag time
  • Maintain and enhance the framework for jobs (primarily aggregate jobs in Hive)
  • Create different consumers for data in Kafka, such as Flafka for Hadoop, Flume for Vertica and Spark Streaming for near-real-time aggregation
  • Train Developers/Analysts on tools to pull data
  • Tool evaluation/selection/implementation
  • Backups/Retention/High Availability/Capacity Planning
  • Disaster Recovery - we have all our core data services in another data center for complete business continuity
  • Review/Approval - DDL for databases, Hive framework jobs and Spark Streaming jobs, to make sure they meet our standards
  • 24x7 on-call rotation for production support

Technologies We Use:

  • Chronos - for job scheduling
  • Docker - packaged container images with all dependencies
  • Graphite/Beacon - for monitoring data flows
  • Hive - SQL data warehouse layer for data in HDFS
  • Impala - faster SQL layer on top of Hive
  • Kafka - distributed commit log storage
  • Marathon - cluster-wide init for Docker containers
  • Mesos - distributed cluster resource manager
  • Spark Streaming - near-real-time aggregation
  • SQL Server - reliable OLTP RDBMS
  • Sqoop - import/export data to RDBMS
  • Vertica - fast parallel data warehouse

Required Skills:

  • BA/BS degree in Computer Science or a related field
  • 5+ years of software engineering experience
  • Knowledge of and exposure to distributed production systems, e.g. Hadoop, is a huge plus
  • Proficiency in Linux
  • Fluency in Python; experience in Scala/Java is a huge plus
  • Strong understanding of RDBMS and SQL
  • Passion for engineering and computer science around data
  • Willingness to participate in 24x7 on-call rotation

What you’ll get:

  • Sane work hours (with flexible scheduling)
  • Competitive Salary & 401K Plan Match
  • Generous paid vacation (we consider your birthday a holiday)
  • Sabbatical at 5 years of employment
  • Health & Wellness Fairs
  • The opportunity to partake in our Office Fitness Shape-Up Program
  • Professional training and industry membership access
  • Annual Company Retreat
  • Complimentary membership to local programs like NYC CitiBike
  • Corporate Discount to New York Sports Club (NYSC)
  • Free team lunches twice a month
  • Team happy hours and beer-o-clock Fridays
  • Awesome snacks: drink bar, coffee bar, ice cream bar, candy bar & fruit bar
  • The opportunity to join our Company Basketball Team
  • Indoor dart wars, ping-pong tournaments, walking desks, annual office Olympics

Want to peek inside the PulsePoint offices? Check it out here: https://www.themuse.com/companies/pulsepoint

Meet Some of PulsePoint's Employees

Elizabeth P.

Director, Business Development & Operations, Digital Health Solutions

Elizabeth oversees business development and operations for PulsePoint Digital Health Solutions. She works with advertisers, finding new and innovative ways to target patients, physicians, and insurers in the healthcare industry.

Greg B.

Technical Account Manager

Greg’s role sits at the intersection of business and technology. He’s responsible for overseeing the needs of the Sales, Development, and Engineering Teams.

