Research Areas
Exploring the frontiers of AI through applied research and innovation
Generative Speech Modeling
Researching novel approaches to neural speech synthesis with emotional expressiveness and natural prosody.
Zero-shot voice cloning, Emotion-aware TTS, Cross-lingual synthesis
3 publications
Agentic AI Systems
Developing autonomous LLM agents with reasoning, memory, and tool-use capabilities.
Multi-step reasoning, Memory architectures, Function calling
2 publications
Multimodal Representation Learning
Unified representations across speech, text, and visual modalities.
Cross-modal alignment, Joint embeddings, Multimodal reasoning
1 publication
Current Research Focus
DIS-Vector Framework
Open-source framework for extracting speaker embeddings, emotional features, and prosodic patterns from minimal audio samples.
Real-time Conversational Agents
End-to-end systems combining streaming ASR, LLM reasoning, and expressive TTS for human-like interaction.
Few-shot Voice Adaptation
Techniques for adapting TTS systems to new voices from less than 5 seconds of reference audio.
Emotion-aware Synthesis
Modeling prosody and emotional expression in neural TTS systems.