Introduction
At IBM, work is more than a job - it's a calling: To build. To design. To code. To consult. To think along with clients and sell. To make markets. To invent. To collaborate. Not just to do something better, but to attempt things you've never thought possible. Are you ready to lead in this new era of technology and solve some of the world's most challenging problems? If so, let's talk.
Your Role and Responsibilities
As a Research Scientist on our Data and Model Factory team, your role will not just be limited to answering questions but also to questioning answers. We're not merely looking for people who can do the job; we're seeking those who can redefine it. You will:
Lead the development of groundbreaking machine learning techniques, with a focus on large language models (LLMs).
Design and implement large-scale experiments for alignment research, aiming to understand and improve model behaviors at scale.
Investigate the applicability and performance of LLMs and generative models across diverse tasks.
Analyze and manage extensive datasets, translating complex interpretability experiments into actionable insights through high-quality visualizations.
Work collaboratively with peers across IBM departments and researchers at MIT, sharing your expertise and absorbing theirs, to drive a unified research agenda that benefits the entire organization.
Required Technical and Professional Expertise
Perform GenAI research that leads to more than papers
Lead model architecture efforts, including:
Mixture-of-experts and other sparse architectures for LLMs
Expansible LLM architectures for lifelong learning
Model fusion for better LLM open-source strategies
Preferred Technical and Professional Expertise
A PhD in Machine Learning, with a specialization in generative modeling or continual learning.
A strong track record of publications in large-scale generative modeling, representation learning, and continual learning.
Hands-on experience in training large language models, preferably within multi-node cluster environments.
Knowledge of alignment training frameworks, such as DeepSpeed-Chat, demonstrating a comprehensive understanding of current LLM training methodologies.