Design and build distributed, scalable, and reliable data pipelines that ingest unstructured data and process it at scale and in real time. Collaborate with other teams to design and develop data tools that support both operations and product use cases. Perform offline analysis of large data sets to derive business insights. Evaluate technologies and prototype solutions to improve our data processing architecture. Design and engineer data pipelines that preprocess, chunk, convert content into vector embeddings, and store them in a vector database (illustrative sketches of such a pipeline follow this posting).

Requires a Bachelor's degree in Computer Science, Computer Engineering, or a related field of study plus three (3) years of experience in the position offered or three (3) years of experience as an Analyst, Technology; Developer; or a related occupation in the technology field.

Requires three (3) years of experience with:
- databases, including Snowflake;
- Python, PySpark, GenAI (NLP/LLM), and cloud and on-prem technology solutions for storing unstructured data;
- engineering and maintaining real-time and batch data pipelines;
- Linux; and
- Git.

Requires two (2) years of experience with:
- Generative AI Large Language Models, including embedding models (openai/text-embedding-ada-002, openai/text-embedding-3-small, and openai/text-embedding-3-large) and chat models (openai/gpt-4 and openai/gpt-4-turbo);
- pre-processing and chunking techniques for long and short content, including fixed-size, recursive, and specialized chunking;
- converting data into embeddings and storing both the vector embeddings and the text data in a vector database, injecting relevant metadata for optimal search and retrieval;
- building Retrieval-Augmented Generation (RAG) applications that perform vector similarity searches;
- indexing strategies for vector databases, including HNSW and Flat methods;
- cloud platforms, including Azure AI;
- databases, including Redis;
- implementing chat completions APIs for question-and-answer functionality;
- implementing vector and content storage APIs for different types of unstructured data, including PDFs, videos, PPTs, and audio;
- vector search and hybrid search;
- utilizing LLMs to develop data engineering solutions;
- designing and engineering pre-processing and chunking architectures and frameworks; and
- evaluating multiple vector storage technologies and prototyping solutions to improve data processing architecture.

Requires any amount of experience with: distributed systems, including Hadoop and HDFS; Dataiku; cloud platforms, including AWS; SQL; Teradata; TWS; ETL; and Spark.

Qualified Applicants: To apply, visit us at https://morganstanley.eightfold.ai/careers?source=mscom and enter JR000454 in the search field. No calls, please. EOE.

Our values - putting clients first, doing the right thing, leading with exceptional ideas, committing to diversity and inclusion, and giving back - aren't just beliefs; they guide the decisions we make every day to do what's best for our clients, communities, and more than 80,000 employees in 1,200 offices across 42 countries.

Our teams are relentless collaborators and creative thinkers, fueled by their diverse backgrounds and experiences. We are proud to support our employees and their families at every point along their work-life journey, offering some of the most attractive and comprehensive employee benefits and perks in the industry. There's also ample opportunity to move about the business for those who show passion and grit in their work. Consequently, our recruiting efforts reflect our desire to attract and retain the best and brightest from all talent pools.
We want to be the first choice for prospective employees. It is the policy of the Firm to ensure equal employment opportunity without discrimination or harassment on the basis of race, color, religion, creed, age, sex, sex stereotype, gender, gender identity or expression, transgender, sexual orientation, national origin, citizenship, disability, marital and civil partnership/union status, pregnancy, veteran or military service status, genetic information, or any other characteristic protected by law.
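
For illustration only (not part of the posting): a minimal sketch of the chunk-embed-store pipeline described in the duties above, assuming the openai Python client, a Redis Stack instance with RediSearch, and hypothetical index, key, and chunk-size choices.

```python
# Illustrative sketch: fixed-size chunking, embedding, and storage in a Redis
# vector index with injected metadata. Names like "docs_idx" and "doc:" are
# hypothetical, not taken from the posting.
import numpy as np
import redis
from openai import OpenAI
from redis.commands.search.field import TextField, VectorField
from redis.commands.search.indexDefinition import IndexDefinition, IndexType

EMBED_MODEL = "text-embedding-3-small"  # one of the embedding models named above
DIM = 1536                              # output dimension of that model

client = OpenAI()   # reads OPENAI_API_KEY from the environment
r = redis.Redis()   # assumes a local Redis Stack with RediSearch loaded

def fixed_size_chunks(text: str, size: int = 500, overlap: int = 50):
    """Fixed-size chunking with a small overlap between consecutive chunks."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def create_index():
    """HNSW index over the embedding field; a Flat index swaps in the same way."""
    r.ft("docs_idx").create_index(
        [
            TextField("text"),
            TextField("source"),  # metadata injected for filtering at query time
            VectorField("embedding", "HNSW",
                        {"TYPE": "FLOAT32", "DIM": DIM, "DISTANCE_METRIC": "COSINE"}),
        ],
        definition=IndexDefinition(prefix=["doc:"], index_type=IndexType.HASH),
    )

def ingest(doc_id: str, text: str, source: str):
    """Chunk, embed, and store both the vector and the raw text plus metadata."""
    chunks = fixed_size_chunks(text)
    resp = client.embeddings.create(model=EMBED_MODEL, input=chunks)
    for i, (chunk, item) in enumerate(zip(chunks, resp.data)):
        vec = np.asarray(item.embedding, dtype=np.float32).tobytes()
        r.hset(f"doc:{doc_id}:{i}",
               mapping={"text": chunk, "source": source, "embedding": vec})
```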
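
A companion sketch of the retrieval side: a KNN vector similarity search over the index above, with the retrieved chunks passed to a chat completions call for question answering (the basic RAG pattern named in the requirements). It reuses the client, r, and EMBED_MODEL names from the previous sketch; the prompt wording and top-k value are hypothetical.

```python
# Illustrative sketch: vector similarity search plus chat completions (RAG).
from redis.commands.search.query import Query

def answer(question: str, k: int = 4) -> str:
    # Embed the question with the same model used at ingestion time.
    qvec = client.embeddings.create(
        model=EMBED_MODEL, input=[question]).data[0].embedding
    # KNN vector similarity search. For hybrid search, the "*" prefilter can be
    # replaced with a field filter, e.g. "@source:web=>[KNN ...]".
    q = (Query(f"*=>[KNN {k} @embedding $vec AS score]")
         .sort_by("score")
         .return_fields("text", "source", "score")
         .dialect(2))
    hits = r.ft("docs_idx").search(
        q, query_params={"vec": np.asarray(qvec, dtype=np.float32).tobytes()})
    context = "\n\n".join(doc.text for doc in hits.docs)
    # Ground the chat model's answer in the retrieved chunks.
    chat = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Answer using only the provided context."},
            {"role": "user",
             "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return chat.choices[0].message.content
```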