Data Engineer

New York, NY, United States

Every day, thousands of professionals use Work Market to support the on-demand labor needs of large companies. We produce iceberg-sized data and – if you’ll excuse the metaphor – we’re only at the tip of understanding it all.

 

In this role, you’ll work on our data team, alongside our engineering team, to support the livelihoods of thousands of independent workers and the companies that hire them. Your primary focus will be building and maintaining the data systems, servers, and information tools that power these efforts, both public- and private-facing.

 

Many data engineering projects are open-ended and ongoing; others are highly specific and discrete. Above all, we’re looking for candidates who like applying elegant, strategic solutions to complex data and system management problems.

Role Responsibilities

  • Design, build, maintain, and improve:
    • ETL systems
    • home-grown and open-source information tools
    • 3rd-party integration tools and services
  • Design and deploy data models/mappings that apply to both research- and development-scale efforts
  • Help make raw datasets more usable and accessible via APIs and well-structured data
  • Partner with engineering and product to execute data-related product initiatives
  • Other projects as needed

Qualifications

  • 2-3+ years of experience as a Data Engineer (or similar educational/professional experience)
  • Proficiency in SQL, preferably across a number of dialects (we commonly write PostgreSQL and MySQL)
  • Version control and shell commands (Git, GitHub, SSH, *nix, etc.)
  • Solid experience with Python (or another language suited to ETL/pipeline/systems work)
  • Ability to learn and assimilate technical information quickly
  • Track record of shipping work in a timely manner

Bonus points for

  • Python for Data Analysis (pandas, scipy, etc.) and/or R
  • Python for ETL (Luigi, Airflow, etc.)
  • Comfort with the AWS ecosystem (EC2, S3, RDS, Redshift, etc.)
  • Experience with…
    • BI tools (Tableau, Looker, Mode, etc.)
    • Building BI tools (e.g. using D3 to make data pretty)
    • Data warehousing (Stars and Snowflakes!)
    • Spark and/or other cluster-/HDFS-style deployments
  • Knowledge of SaaS-based businesses
  • Enterprise-grade data and systems knowledge
  • Tolerance for the occasional data cleansing




Application Challenges

Pick one (or more, if you’re feeling ambitious) of the following challenges and submit your results (and supporting explanation/code) along with your resume. We also encourage you to submit links or attachments with other work that you’d like to show off.

 

1. Counting Vitae

Take the plaintext version of your resume (or LinkedIn profile) and create a bar chart of character frequency, stripping out punctuation and whitespace.

 

For example, if your resume contained only the words “Work Market”, the results should look something like the following image. Improvements encouraged.
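As a starting point for the counting step, here is a minimal sketch in Python. It assumes that letters are folded to lowercase and that digits count as characters; the challenge does not specify either, so adjust as you see fit. The function names `char_frequency` and `text_bar_chart` are illustrative only, and the chart is rendered as plain text rather than with a plotting library.

```python
from collections import Counter


def char_frequency(text: str) -> Counter:
    """Count letters and digits, ignoring punctuation and whitespace.

    Assumption: case is folded, so 'W' and 'w' are the same character.
    """
    return Counter(ch.lower() for ch in text if ch.isalnum())


def text_bar_chart(counts: Counter, width: int = 40) -> str:
    """Render counts as a plain-text bar chart, most frequent first."""
    if not counts:
        return ""
    peak = max(counts.values())
    lines = []
    # Sort by descending count, then alphabetically for a stable order.
    for ch, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        bar = "#" * max(1, round(n / peak * width))
        lines.append(f"{ch} | {bar} {n}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(text_bar_chart(char_frequency("Work Market")))
```

Swapping `text_bar_chart` for a real plotting library is an obvious improvement; the counting logic stays the same.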

 

2. Disaster Planning

Find the data set (linked below) from NOAA.

https://www1.ncdc.noaa.gov/pub/data/swdi/stormevents/csvfiles/

 

Necessary Deliverables

  1. Write a script that takes the above URL as a shell parameter and combines all of the Storm Event Details into a single CSV or relational database table of your choosing (MySQL, PostgreSQL, SQLite if your computer can handle it…).
  2. Write a script/function that summarizes the property damage, by state and by year, into another CSV or table.
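The summary step might be sketched as follows. This is only an illustration under stated assumptions: it assumes the combined details data has columns named `STATE`, `YEAR`, and `DAMAGE_PROPERTY`, and that damage values are strings with a magnitude suffix such as `10.00K` or `1.5M`. Check both against the actual files before relying on it. The function names are hypothetical.

```python
from collections import defaultdict

# Suffixes assumed to appear in the damage column, e.g. "10.00K" or "1.5M".
MULTIPLIERS = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_damage(raw):
    """Convert a damage string such as '10.00K' into dollars as a float."""
    value = (raw or "").strip().upper()
    if not value:
        return 0.0
    has_suffix = value[-1] in MULTIPLIERS
    multiplier = MULTIPLIERS[value[-1]] if has_suffix else 1
    number = value[:-1] if has_suffix else value
    try:
        return float(number) * multiplier
    except ValueError:
        # Assumption: bare suffixes ("K") and malformed values count as zero.
        return 0.0


def summarize_damage(rows):
    """Total property damage per (state, year) from an iterable of dict rows."""
    totals = defaultdict(float)
    for row in rows:
        key = (row["STATE"], int(row["YEAR"]))
        totals[key] += parse_damage(row.get("DAMAGE_PROPERTY"))
    return dict(totals)
```

The rows can come from `csv.DictReader` over the combined file, and the resulting dictionary can be written back out with `csv.writer` or inserted into a table.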

 

Optional embellishments

  1. Write a script/small web app that publishes the summary data as JSON or XML.
  2. Show off some D3 (or similar or static) visualization.
  3. Tell us where you would build a house if you want to minimize property damage.

 

In your submission, please include ONLY your written scripts, not the source data.

 

3. Bad Zips

Find the data set (linked below) from Data.gov.

https://catalog.data.gov/dataset/consumer-complaint-database

 

Write a script that:

  1. Normalizes the postal code format.
    1. Puts all non-conformant postal codes into a file, paired with their `complaint_ID`.
  2. Filters by complaints reported in or after 2016.
  3. Writes back a new CSV with records grouped by their zip (for example: Number of Complaints By Zip, Most Common Complaint Type By Zip, etc.).
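The normalization step could be sketched like this. The rules here are assumptions, since the challenge leaves "conformant" open: a five-digit code is accepted as-is, a ZIP+4 code is reduced to its first five digits, and anything else (masked values such as `300XX`, short codes, blanks) is treated as non-conformant. The names `normalize_zip` and `split_records` are illustrative.

```python
import re

# Assumed definition of a conformant code: 5 digits, or ZIP+4 with optional hyphen.
ZIP5 = re.compile(r"\d{5}")
ZIP9 = re.compile(r"(\d{5})-?\d{4}")


def normalize_zip(raw):
    """Return a 5-digit zip string, or None if the value is non-conformant."""
    value = (raw or "").strip()
    if ZIP5.fullmatch(value):
        return value
    match = ZIP9.fullmatch(value)
    if match:
        return match.group(1)
    return None


def split_records(records):
    """Split (complaint_id, raw_zip) pairs into normalized and rejected lists."""
    valid, rejected = [], []
    for complaint_id, raw_zip in records:
        zip5 = normalize_zip(raw_zip)
        if zip5 is None:
            # Keep the original value so the reject file shows what was wrong.
            rejected.append((complaint_id, raw_zip))
        else:
            valid.append((complaint_id, zip5))
    return valid, rejected
```

Whether a four-digit code should be left-padded with a zero rather than rejected is a judgment call worth explaining in your submission.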


In your submission, please include ONLY your written scripts, not the source data.


Meet Some of Work Market's Employees

Steve N.

Director Of Product Management

As Director of Product Management, Steve acts as a bridge between the Platform Services Team and the Engineering Team, overseeing big-picture projects, like allocating additional resources for freelancers.

Marqia W.

Senior Front-end Engineer

Marqia’s role at Work Market is a hybrid of traditional front-end engineering and user experience design. She works closely with the Product Team to build the best website for freelancers.

