Hearst Magazines

Senior Data Engineer

Position at Corporate Technology

The Senior Data Engineer works within the ML/AI Group and is responsible for the ongoing development of the Data & Machine Learning Platform.

This role will participate in all aspects of the data application life cycle, including analysis, design, development, testing, production deployment and support.

The role will help formulate approaches to improve existing processes, develop opportunities and present innovative solutions on cutting-edge technologies. You will create big data accelerators to help deploy scalable solutions fast.

The Senior Data Engineer will work with Data Architects and Data Scientists to evolve the Data/ML platform, while delivering both strategic and tactical projects that provide iterative value through impactful business outcomes. This individual should be highly self-motivated, with a strong problem-solving and analytical nature, a love of learning, a passion for insight, and a desire to achieve results.

Duties and responsibilities

  • Design and build data processing pipelines for structured and unstructured data using tools and frameworks in the big data ecosystem
  • Architect, write code, complete programming and perform testing and debugging of data applications
  • Design and develop new systems and tools in a number of languages to facilitate effective data ingestion and curation, including batch processing, scalable event frameworks, streaming and real-time data analytics pipelines
  • Build data APIs and data delivery services that support critical operational and analytical applications for our internal business operations, customers and partners.
  • Implement and configure big data technologies as well as tune processes for performance at scale
  • Design, develop and maintain conceptual, logical and physical data models for big data systems through schema-on-write and schema-on-read delivery
  • Perform data analysis with the business, understanding business processes and structuring both relational and distributed data sets
  • Build and provision analytical sandboxes for data scientists and analysts. Work with data science team to bring machine learning models into production
  • Conduct timely and effective research in response to specific requests (e.g. data collection, summarization, analysis, and synthesis of relevant data and information)
  • Handle DevOps for unit, functional, integration and regression test plans. Communicate with QA and port data engineering test scripts to the QA team
  • Evaluate, benchmark and integrate cutting-edge open-source data tools and technologies
  • Monitor performance of the data platform and optimize as needed
  • Work with Machine Learning and Deep Learning pipelines
  • Work with diverse data, including images, words, video and audio

Qualifications

  • Proficient understanding of distributed computing principles
  • Programming and development on Hadoop, including an understanding of ZooKeeper, Oozie and YARN
  • Working knowledge of HDFS storage formats (such as Parquet, ORC and ORC with Snappy compression)
  • Familiarity with networking and network application programming, including HTTP/HTTPS, JSON, and REST APIs
  • Experience with at least one scripting language (e.g. Python) and one object-oriented language (e.g. Java)
  • Experience building ETL flows in Spark or PySpark; experience in a production setting is preferred
  • Knowledge of SQL-on-Hadoop engines (Presto, PolyBase, Phoenix, etc.)
  • Experience deploying with containers and virtual environments
  • Ability to solve complex problems in a fast-paced environment with limited guidance
  • An eye for quality and a willingness to do what is necessary to meet deadlines in a dynamic environment with frequent priority changes
  • Able to work efficiently in teams and/or as an individual
  • Top-notch oral and written communication skills, especially the ability to make complex data and data science techniques accessible to non-technical team members
  • B.S./M.S. in Computer Science or a related field, or equivalent experience

Desired Skills:

  • Unix/Linux OS level knowledge with Bash/Shell scripting is a plus
  • Data Analysis: Understand business processes, logical data models and relational database implementations
  • Expert knowledge of SQL, including optimizing complex queries
  • Administration of a Hadoop cluster, with all included services, and the ability to resolve performance and workload management issues on the cluster
  • Experience with writing Kafka producers and consumers is a plus
  • Experience with Spark Streaming and Spark SQL in a production setting is a plus

A Great Employee:

  • Exhibits courage
  • Navigates ambiguity
  • Relies on common sense and experience
  • Leverages their educational background
  • Communicates openly
  • Likes to laugh
  • Loves a good challenge
  • Is resilient when faced with setbacks
  • Teams well with others
  • Insists on quality results
  • Is forward-thinking
  • Adapts easily to change
  • Solves problems creatively
Job ID: 6db240e8f10141cbfb526c6a6bf6406d
Employment Type: Other

This job is no longer available.
