Machine Learning Intern - Unsupervised audio-visual data collection and annotation

Affectiva is an MIT Media Lab spin-off focused on understanding human emotion. Our vision is that technology needs the ability to sense, adapt and respond to not just commands but also non-verbal signals. We are building artificial emotional intelligence (Emotion AI).

As you can imagine, such an ambitious vision takes a great team with a strong desire to explore and innovate. We are growing our team to improve and expand our core technologies and help solve many unique and interesting problems focused around sensing, understanding and adapting to human emotion.

Our first technology measures human emotion through sensing and analyzing facial expressions. This technology is already being used commercially in a number of different verticals and use cases, and has been released to the public in the form of SDKs so that developers around the world can begin to use it to create a new breed of apps, websites and experiences. Currently, we are extending our emotion sensing technology beyond the face to leverage human speech. Our goal is to build out our technology to perform emotion sensing multi-modally from speech and facial expressions when both channels are present, and unimodally when one of the channels of information is missing.

One of the main challenges in multi-modal emotion sensing is the lack of annotated audio-video data. The audio-visual datasets currently annotated with emotion labels (e.g., RECOLA, IEMOCAP) are fairly small. To generalize across language, gender, culture and other aspects of human life that affect emotional presentation, we intend to build large emotion-annotated datasets of publicly available videos of affective interactions. Our approach to developing these large annotated datasets will include semi-supervised and unsupervised techniques that leverage recent advances in deep learning.

We are interested in hiring a summer intern with interest, experience and expertise in either of two broad areas: 1) collecting large datasets of affective interactions and 2) automatically annotating those datasets with emotion tags. We are especially interested in candidates with hands-on experience tackling these subproblems. For example, you may have collected audio-video data via crowdsourcing tasks or by leveraging the large quantities of user-generated tags (e.g., hashtags) available on the public web, or you may have used learning-based approaches for automatic data annotation, such as bootstrapping labels from one channel to a parallel channel, autonomous learning, collaborative learning, or other innovative semi-supervised and unsupervised approaches.

The candidate will work closely with members of the Science team, the team tasked with creating and refining Affectiva’s technology. The Science team is a group of individuals with backgrounds in machine learning, computer vision, speech processing and affective computing. The team does everything from initial prototyping of state-of-the-art algorithms to producing models that can be included in our cloud and mobile products.

Responsibilities


  • Run data annotation experiments related to:
    • Bootstrapping labels from the video channel to the audio channel using Affectiva’s face-based classifiers
    • Autonomous learning paired with collaborative learning-based approaches
    • Other weakly supervised or unsupervised approaches
  • Design, implement and evaluate crowdsourcing tasks for collecting datasets of affective interactions
  • Clearly communicate your implementations, experiments, and conclusions

Qualifications


  • Pursuing a graduate degree (MS or PhD) in Electrical Engineering or Computer Science, with a specialization in speech processing or computer vision
  • Hands-on experience developing methodologies for automatic data acquisition and data annotation problems
  • Experience using deep learning techniques (CNN, RNN, LSTM) on computer vision or speech processing tasks
  • Experience working with deep learning frameworks (e.g., TensorFlow, Theano, Caffe), including implementing custom layers
  • Strong publication record in machine learning, speech, or computer vision journals or conference proceedings
  • Good presentation and communication skills

Meet Some of Affectiva's Employees

Abdelrahman M.

SDK Technical Lead

Abdelrahman builds machines that sense emotions and expressions for Affectiva’s software. He also helps create SDKs so any developer can integrate emotion recognition into their applications.

Brett R.

HR Manager

Brett dabbles in all areas of administrative management and personnel to keep the current team supported while she hires new employees to add to the mix.
