Speech Processing Intern - Speech Enhancement

Affectiva is an MIT Media Lab spin-off focused on understanding human emotion. Our vision is that technology needs the ability to sense, adapt and respond to not just commands but also non-verbal signals. We are building artificial emotional intelligence (Emotion AI).

As you can imagine, such an ambitious vision takes a great team with a strong desire to explore and innovate. We are growing our team to improve and expand our core technologies and help solve many unique and interesting problems in sensing, understanding and adapting to human emotion.

Our first technology measures human emotion by sensing and analyzing facial expressions. This technology is already being used commercially in a number of different verticals and use cases, and has been released to the public in the form of SDKs so that developers around the world can begin to use it to create a new breed of apps, websites and experiences. Currently, we are extending our emotion sensing technology beyond the face to analyze human speech. Our goal is to build out our technology to perform emotion sensing unimodally from speech, as well as multimodally from speech and facial expressions when both channels are present.

This position is on the Science team, the team tasked with creating and refining Affectiva’s technology. We are a group of individuals with backgrounds in machine learning, computer vision, speech processing and affective computing.

We’re looking for a summer intern to work on speech enhancement and source separation, with the goal of improving emotion recognition in noisy acoustic environments. The candidate will work closely with the lead speech scientist and other members of the Science team to implement classic strategies in these areas, as well as explore novel data-driven strategies.


  • Implement existing speech enhancement and speaker diarization algorithms and test their effect on emotion recognition from speech.
  • Prototype new data-driven technology / methodologies and evaluate technical feasibility.
  • Design novel data collection or annotation strategies to build a corpus for robust learning.
  • Clearly communicate your implementations, experiments, and conclusions.


  • Pursuing a graduate degree (MS or PhD) in Electrical Engineering, Computer Science, or Mathematics with a specialization in speech recognition or signal processing.
  • Hands-on experience in speech enhancement and/or speaker diarization on real-world data.
    • Knowledgeable about beamforming, multichannel acoustic echo cancellation, blind source separation, source localization, dereverberation, and robust speech recognition in noisy environments.
    • Experience with robust speech recognition in reverberant and noisy environments using ASR engines such as HTK or Kaldi.
    • Explored recent techniques such as non-negative matrix factorization and underdetermined blind source separation/extraction.
  • Strong publication record in journals/proceedings such as ICASSP, NIPS, PAMI, InterSpeech.
  • Expertise in Python or C/C++.
  • Experience working with deep learning models (RNN, LSTM, CNN) a plus.

Meet Some of Affectiva's Employees

Abdelrahman M.

SDK Technical Lead

Abdelrahman builds machines that sense emotions and expressions for Affectiva’s software. He also helps create SDKs so any developer can integrate emotion recognition into their applications.

Brett R.

HR Manager

Brett dabbles in all areas of administrative management and personnel to keep the current team supported while she hires new employees to add to the mix.
