At Interspeech, one of the largest and most prominent conferences in the field of speech and language processing, a wide range of speech technology tasks is presented and discussed. Here are some of the most frequent speech-related tasks covered at the conference:
1. Automatic Speech Recognition (ASR)
— End-to-End ASR: End-to-end models that map speech directly to text using deep learning architectures such as transformers, sequence-to-sequence models, or RNNs (a minimal usage sketch follows this list).
— Multilingual ASR: Building ASR systems that can handle multiple languages, often addressing low-resource languages.
— Robust ASR: Systems designed to perform well in noisy environments or with accents, dialects, and spontaneous speech.
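To make the end-to-end ASR idea concrete, here is a minimal sketch using the Hugging Face transformers pipeline with a pretrained Whisper checkpoint. The model name and the audio file path are assumptions chosen purely for illustration; Interspeech papers cover many different architectures and toolkits.

```python
# Minimal end-to-end ASR sketch.
# Assumes: transformers, torch, and ffmpeg are installed, and "sample.wav"
# is a short mono recording you supply yourself (hypothetical file name).
from transformers import pipeline

# "openai/whisper-small" is just one example of a pretrained end-to-end model;
# any ASR checkpoint compatible with this pipeline would work.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")

result = asr("sample.wav")
print(result["text"])  # the recognized transcript
```

The same pipeline object can be pointed at multilingual checkpoints, which is how many multilingual and low-resource ASR experiments are bootstrapped in practice.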
2. Speech Synthesis (Text-to-Speech — TTS)
— Neural TTS: Neural network-based models like WaveNet, Tacotron, and FastSpeech for generating high-quality synthetic speech (see the sketch after this list).
— Multilingual and Emotional TTS: Systems that can synthesize speech in different languages or with varying emotional expression.
— Low-resource TTS: Developing TTS systems for languages with limited training data.
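As a concrete illustration of neural TTS, below is a minimal sketch using torchaudio's pretrained Tacotron 2 + WaveRNN bundle; the specific bundle, sample sentence, and output filename are assumptions for illustration only. It follows the typical two-stage pipeline of models like Tacotron: text to mel spectrogram, then spectrogram to waveform via a neural vocoder.

```python
# Minimal neural TTS sketch with torchaudio's pretrained Tacotron 2 + WaveRNN bundle
# (one illustrative choice; Interspeech work spans many other TTS architectures).
import torch
import torchaudio

bundle = torchaudio.pipelines.TACOTRON2_WAVERNN_CHAR_LJSPEECH
processor = bundle.get_text_processor()    # characters -> token IDs
tacotron2 = bundle.get_tacotron2().eval()  # tokens -> mel spectrogram
vocoder = bundle.get_vocoder().eval()      # mel spectrogram -> waveform

text = "Interspeech covers many speech technology tasks."
with torch.inference_mode():
    tokens, lengths = processor(text)
    spec, spec_lengths, _ = tacotron2.infer(tokens, lengths)
    waveforms, _ = vocoder(spec, spec_lengths)

# Save the first (and only) synthesized utterance as a WAV file.
torchaudio.save("tts_output.wav", waveforms[0:1].cpu(), sample_rate=vocoder.sample_rate)
```

Swapping in a different bundle or checkpoint is how such a sketch would be adapted for multilingual or low-resource settings, typically by fine-tuning on whatever data is available.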
3. Speech Enhancement
— Noise Suppression: Techniques for improving speech intelligibility by removing background noise.
— Dereverberation: Reducing reverberation effects, particularly in environments with…