Software Engineer, Data Platform

San Francisco, CA (posted 2 months ago)


The data platform software engineer at SmartNews plays a key role in accelerating product and business development. We put great effort into building highly efficient and flexible data services for analytical and operational purposes. To serve internal users on the analytics and product-development teams, the mission of data engineers is to create high-level, easy-to-use data services that simplify accessing, integrating and consolidating various data sets, and to build platforms that run tasks processing massive data volumes on the order of terabytes (TB) per day. Technology drives the growth of SmartNews, so we eagerly adopt cutting-edge technologies from industry and academia, especially from the open-source community.

Responsibilities

This is a hybrid role combining data engineering and system development:

- Design, develop, set up and maintain new services, libraries, tools and frameworks for data processing and management, and investigate new algorithms that make data processing more efficient, for example in ETL, data pipelines, OLAP DBMS, real-time message and stream processing, and data synchronization between systems.
- Develop tooling for evaluating, monitoring and tuning the performance of data processing procedures and platforms; gain insight into their efficiency and stability and improve them continuously, for example by optimizing distributed query engines, computing resource management and isolation, and multi-tier storage systems.
- Own and maintain key data processing portfolios, including building and caring for the environment, troubleshooting, and taking on-call responsibility for incidents.
- Work closely with data architecture and modeling roles to understand how to implement the data services, and collaborate with the Site Reliability Engineering (SRE) team to deploy environments and drive production excellence.
- Devise systems, tooling and approaches for data privacy and security; establish access control and create processes for handling sensitive data.
- Diagnose and resolve complex technical challenges in data access and processing, using elegant and systematic rather than ad hoc methods to help other teams tune performance and improve stability.