ETL Developer Lead/Manager

Location: Washington DC Region
Category: Information Technology
Employment Type: Direct Hire

We are looking for a Senior Data Engineer/Architect to join our team! This engineer will take part in developing and testing various ETL applications. Building these applications requires teamwork: to deliver these solutions, the engineer will collaborate with an interdisciplinary team of experts in machine learning, data visualization & design, business process optimization, and software engineering.
In this role, you will be responsible for developing, maintaining, and testing data solutions on a wide variety of data platforms, including relational databases, big data platforms, and NoSQL databases. You will develop data ingestion & transformation routines to acquire data from external data sources, manage distributed crawlers to parse data from web sources, and develop APIs for the secure exchange of data. You will be involved in securing access to the data based on appropriate rights, implementing data quality routines and mechanisms to flag bad data for correction, and building QA and automation frameworks to monitor daily ingestion of data and provide alerts on errors and other problems. Ideally, this person can also tune the ETL applications under various conditions using Spark.

Primary Responsibilities:

  • Write custom ETL applications using Spark in Python/Java that follow a standard architecture.
  • Contribute to the design of new applications, setting/changing standards and architecture, and deciding on the use of new technologies.
  • Success will be defined by the ability to meet requirements/acceptance criteria, on-time delivery, number of defects, and clear documentation.
  • Perform functional testing, end-to-end testing, performance testing, and UAT of these applications and code written by other members of the team.
  • Proper documentation of the test cases used during QA will be important for success.
  • Other important responsibilities include clear communication with team members as well as timely and thorough code reviews.

Qualifications:


  • Python/Java - Python would be ideal but a solid knowledge of Java is also acceptable.
  • SQL - Writing SQL queries, stored procedures, and views
  • Linux - Common working knowledge, including navigating through the file system and simple bash scripting
  • Hadoop - Common working knowledge, including the basic ideas behind HDFS and MapReduce, and hadoop fs commands.
  • Spark - How to work with RDDs and DataFrames (with emphasis on DataFrames) to query and manipulate data.
  • Source Control Management Tool - We use Bitbucket
  • Writing code to parse JSON / HTML / JavaScript etc.
  • ETL tools (SSIS, Informatica PowerCenter, Talend or Pentaho) - Familiarity
  • 7-12 years of experience with engineering data pipelines (Strong knowledge of what works and what doesn't. This includes common pitfalls and mistakes when designing a data pipeline.)
  • Deep experience manipulating relational databases (e.g., SQL Server, PostgreSQL, or MySQL)
  • Worked/developed in a Linux or Unix environment.
  • Worked in AWS (particularly EMR).
  • Has real hands-on experience developing applications or scripts for a Hadoop environment (Cloudera, Hortonworks, MapR, Apache Hadoop). By that, we mean someone who has written significant code for at least one of these Hadoop distributions.
  • Intellectual curiosity! We are always noodling on new problems. If you see yourself as a life-long learner who enjoys tackling new challenges, learning about new approaches and tools in your area of expertise, and learning from an interdisciplinary team that encourages you to stretch outside your comfort zone... you will find a home here : )
  • Passion. (What does this mean? To us, this means you care deeply about making an impact. You take ownership of your projects and bring a "founder's mentality" to your work.)
