AI/ML Lead Data Engineer - Automation/Image Processing
Join us as we embark on a journey of collaboration and innovation, where your unique skills and talents will be valued and celebrated. Together we will create a brighter future and make a meaningful difference.
As a Lead Data Engineer at JPMorganChase within the Commercial & Investment Bank, you are an integral part of an agile team that works to enhance, build, and deliver data collection, storage, access, and analytics solutions in a secure, stable, and scalable way. As a core technical contributor, you are responsible for maintaining critical data pipelines and architectures across multiple technical areas within various business functions in support of the firm's business objectives.
Job responsibilities
- Design, build, and maintain scalable, high-performance data pipelines and infrastructure to support ingestion, processing, and storage of large volumes of scanned document images across enterprise-wide workflows
- Architect end-to-end data solutions on AWS cloud services to enable seamless flow of scanned images from source systems through OCR processing, model inference, and downstream data extraction and categorization pipelines
- Develop robust image preprocessing and OCR integration pipelines that handle TIF/PNG format conversion, normalization, resolution enhancement, noise reduction, and batching to prepare scanned documents for downstream computer vision and OCR models
- Build and optimize data pipelines that integrate OCR engine outputs, extracting structured text and metadata from scanned images and routing them into databases and analytics platforms for further processing
- Design and manage data storage architectures and containerized deployments, using Oracle databases and AWS-native stores (S3, EFS) to efficiently catalog, index, and retrieve extracted text, classification labels, and metadata from processed document images
- Drive the adoption of containerized deployment strategies using AWS EKS (Elastic Kubernetes Service) to deploy and scale image processing microservices, OCR engines, and data pipeline components with high availability and fault tolerance
- Collaborate closely with data scientists and ML engineers to ensure that training datasets for computer vision and other models are properly curated, versioned, labeled, and accessible through well-structured data pipelines
- Evaluate and integrate emerging data technologies and tools to continuously improve pipeline throughput, reduce processing latency for high-volume document scanning workloads, and optimize cost efficiency
- Establish and enforce data quality, lineage, governance, and security frameworks to ensure traceability and integrity of extracted data from scanned documents throughout the entire processing lifecycle
- Partner with security and compliance teams to ensure that scanned document data, extracted PII/PHI, and sensitive content are handled in accordance with regulatory requirements, encryption standards, and access controls
- Lead and mentor a team of data engineers, establishing coding standards, peer review processes, CI/CD workflows, and best practices for building production-grade image and document processing pipelines
Required qualifications, capabilities, and skills
- Formal training or certification in data engineering concepts and 5+ years of applied experience
- Strong proficiency in Java, Groovy, and Python for building data pipelines, image preprocessing workflows, automation scripts, and backend data services
- Hands-on experience with image file handling, particularly TIF/PNG format processing, multi-page document splitting, format conversion, and integration with OCR and computer vision pipelines
- Deep hands-on experience with AWS cloud services including S3 (for image storage), Lambda, Step Functions, and CloudWatch for building and monitoring scalable data workflows
- Expertise in AWS EKS (Elastic Kubernetes Service) for deploying and managing containerized image processing, OCR, and data pipeline services using Docker and Kubernetes
- Advanced knowledge of Oracle databases including PL/SQL, performance tuning, partitioning strategies, and data modeling for storing and querying large volumes of extracted document data and classification results
- Familiarity with OCR technologies and the ability to build data pipelines that consume and structure OCR output for downstream analytics and model training
- Understanding of data requirements for training deep learning models including dataset preparation, annotation management, and feature store integration
- Experience with CI/CD pipelines (Jenkins) and infrastructure-as-code tools (Terraform, CloudFormation) for automated deployment and environment management
- Strong understanding of data governance, data quality frameworks, metadata management, and data cataloging, particularly in the context of document-centric and image-heavy data ecosystems
- Excellent leadership, communication, and stakeholder management skills with the ability to drive technical decisions across cross-functional teams
Preferred qualifications, capabilities, and skills
- Domain expertise in the healthcare industry
ABOUT US
JPMorganChase, one of the oldest financial institutions, offers innovative financial solutions to millions of consumers, small businesses and many of the world's most prominent corporate, institutional and government clients under the J.P. Morgan and Chase brands. Our history spans over 200 years and today we are a leader in investment banking, consumer and small business banking, commercial banking, financial transaction processing and asset management.
We offer a competitive total rewards package including base salary determined based on the role, experience, skill set and location. Those in eligible roles may receive commission-based pay and/or discretionary incentive compensation, paid in the form of cash and/or forfeitable equity, awarded in recognition of individual achievements and contributions. We also offer a range of benefits and programs to meet employee needs, based on eligibility. These benefits include comprehensive health care coverage, on-site health and wellness centers, a retirement savings plan, backup childcare, tuition reimbursement, mental health support, financial coaching and more. Additional details about total compensation and benefits will be provided during the hiring process.
We recognize that our people are our strength and the diverse talents they bring to our global workforce are directly linked to our success. We are an equal opportunity employer and place a high value on diversity and inclusion at our company. We do not discriminate on the basis of any protected attribute, including race, religion, color, national origin, gender, sexual orientation, gender identity, gender expression, age, marital or veteran status, pregnancy or disability, or any other basis protected under applicable law. We also make reasonable accommodations for applicants' and employees' religious practices and beliefs, as well as mental health or physical disability needs. Visit our FAQs for more information about requesting an accommodation.
JPMorgan Chase & Co. is an Equal Opportunity Employer, including Disability/Veterans
ABOUT THE TEAM
J.P. Morgan's Commercial & Investment Bank is a global leader across banking, markets, securities services and payments. Corporations, governments and institutions throughout the world entrust us with their business in more than 100 countries. The Commercial & Investment Bank provides strategic advice, raises capital, manages risk and extends liquidity in markets around the world.