Research Scientist, Virtual Humans - Speech Synthesis (PhD)
- Pittsburgh, PA
Facebook's mission is to give people the power to build community and bring the world closer together. Through our family of apps and services, we're building a different kind of company that connects billions of people around the world, gives them ways to share what matters most to them, and helps bring people closer together. Whether we're creating new products or helping a small business expand its reach, people at Facebook are builders at heart. Our global teams are constantly iterating, solving problems, and working together to empower people around the world to build community and connect in meaningful ways. Together, we can help people build stronger communities - we're just getting started.
Facebook Reality Labs (FRL) brings together a world-class R&D team of researchers, developers, and engineers to build the future of connection within virtual and augmented reality. We're developing all the technologies needed to enable breakthrough AR glasses and VR headsets. At the Pittsburgh lab, we aspire to a vision of social VR and AR in which people can interact with each other across distances in a way that is indistinguishable from in-person interaction. As a Research Scientist at FRL Pittsburgh, you will solve challenges at the forefront of computer vision and machine learning. You will work alongside top researchers, engineers, and artists, collaborating on the innovation necessary to make that vision a reality. You will be expected to push forward the frontiers of research and publish your work at leading venues in the field. We want people who work well across disciplines, can brainstorm big ideas, and are excited to work in new technology areas. The ideal candidate will have research experience in machine learning applied to waveform/speech synthesis, acoustic modeling, and audio-visual learning. Knowledge of spatial audio processing is a plus.
- Prototype novel algorithms for speech synthesis leveraging deep learning and multimodal data captured using our state-of-the-art capture systems
- Develop robust algorithms and systems for integrating multiple sensors and modalities
- Collaborate with team members across a variety of domains including signal processing, acoustic engineering, computer vision and computer graphics
- Attend scientific conferences and publish and present papers at international conferences and in journals
- Currently holds, or is in the process of obtaining, a PhD degree, or is completing a postdoctoral assignment, in Computer Science, Machine Learning, Artificial Intelligence, or a related field
- Research experience in one or more of the following areas: neural speech synthesis, multimodal learning, or similar
- 3+ years of experience in machine learning, audio processing, or computer vision
- 3+ years of experience with standard AI, CV, and ML libraries such as PyTorch, Torch, TensorFlow, or Keras
- Interpersonal skills: experience with cross-group and cross-cultural collaboration
- Must obtain work authorization in country of employment at the time of hire, and maintain ongoing work authorization during employment
- 3+ years of programming experience in C++ or Python
- Understanding of spatial audio, room acoustics, and multichannel audio array processing
- Proven track record of achieving significant results, as demonstrated by grants, fellowships, and patents, as well as first-authored publications at leading workshops or conferences such as CVPR, ECCV/ICCV, ICASSP, NeurIPS, SIGGRAPH, Interspeech, ICLR, or similar
- Demonstrated software engineering experience via an internship, work experience, coding competitions, or widely used contributions to open source repositories (e.g., GitHub)