Software Engineer (Acoustic Modeling), Oculus

At Oculus, we're developing the future of Augmented Reality (AR), Virtual Reality (VR) and Mixed Reality (MR). We are currently seeking a talented Machine Learning Engineer for the Human Understanding Pittsburgh team. Human Understanding is tasked with building next-generation technology and experiences centered around Virtual Humans, self-presence and social presence, achieved through tracking, capture and reconstruction of faces, eyes, hands and body. Our team is addressing a variety of technical challenges in real-time 3D hand, face, eye and body tracking, leveraging deep convolutional and recurrent neural networks. The right candidate thrives in a cross-functional environment, as the team works in close collaboration with Research, Hardware, Firmware and Software teams in multiple geographic locations. This position involves working with early technology paths, with a focus on execution and on working collaboratively to manage system maturation and risk, while communicating clearly and effectively across the team.
We're looking for candidates who share a passion for exploring and solving complex, unsolved problems at the intersection of VR and machine learning. This position is based in Pittsburgh.


  • Work closely with speech recognition and computer vision experts and researchers on implementing algorithms
  • Optimize acoustic model training procedures to use large amounts of audio data in different languages
  • Adapt machine learning techniques from Vision, Speech, & Graphics to our work
  • Apply software development skills to a wide range of ML-related projects
  • M.S. or Ph.D. in Computer Science, Electrical Engineering, or similar
  • Experience working collaboratively in cross-functional teams
  • Track record of increasing responsibilities and decision-making experience
  • 3+ years of experience developing deep learning architectures in Python and developing software in C/C++
  • Experience with time-series modeling (convolutional networks, auto-regressive models, RNNs) and with large quantities of data
  • Experience with self-supervised or unsupervised machine learning models for speech or vision applications
  • Experience modeling multiple sensing modalities
