Senior Research Engineer, Neural Text-to-Speech (TTS)
- Paris, France
At SoundHound Inc., we believe every brand should have a voice. As the leading innovator of conversational technologies, we’re trusted by top brands around the globe. Houndify, our independent Voice AI platform, with 70,000+ users, allows brands to create custom voice assistants that deliver results with unprecedented speed and accuracy.
Our mission is to enable humans to interact with the things around them in the same way we interact with each other: by speaking naturally. We’re making that a reality through our SoundHound music discovery app and Hound voice assistant and through our strategic partnerships with brands like Mercedes-Benz, Hyundai, Deutsche Telekom, and Pandora. Today, our customized voice AI solutions allow people to talk to phones, cars, smart speakers, mobile apps, coffee machines, and every other part of the emerging ‘voice-first’ world.
Our diverse team of engineers, UX/UI designers, writers, data scientists, and linguists is passionate about creating a world with more conversations. With more than 14 years of expertise in voice technology, we have hundreds of millions of end users and a worldwide team in six countries building solutions for a voice-first world.
About the Role:
- Conduct research and development of neural TTS systems
- Improve the stability and reliability of neural TTS generation
- Innovate on methods of emphasis and prosody control
- Build systems for multiple languages
- Opportunity to work on production deployment of TTS
- Opportunity to attend the relevant deep learning conferences and publish
- Success in this role: building world-class production neural TTS
Requirements:
- 5+ years of relevant industry experience
- MS / PhD in Computer Science, Electrical Engineering, Physics, or equivalent
- Experience with designing, developing, and training deep neural networks
- Experience with a deep learning library such as PyTorch, TensorFlow, or Keras
- Strong programming skills on Linux using Python and/or C++
- Fluency in written and spoken English
Nice to Haves:
- Previous work in text-to-speech, either in building neural TTS or in concatenative / unit-selection / parametric approaches
- Experience with automatic speech recognition (ASR), such as neural acoustic or language modeling
- Published work in deep learning