Research Areas

Exploring the frontiers of AI through applied research and innovation

Generative Speech Modeling

Researching novel approaches to neural speech synthesis with emotional expressiveness and natural prosody.

Zero-shot voice cloning, Emotion-aware TTS, Cross-lingual synthesis
3 publications

Agentic AI Systems

Developing autonomous LLM agents with reasoning, memory, and tool-use capabilities.

Multi-step reasoning, Memory architectures, Function calling
2 publications

Multimodal Representation Learning

Learning unified representations across speech, text, and visual modalities.

Cross-modal alignment, Joint embeddings, Multimodal reasoning
1 publication

Current Research Focus

DIS-Vector Framework

Open-source framework for extracting speaker embeddings, emotional features, and prosodic patterns from minimal audio samples.

Real-time Conversational Agents

End-to-end systems combining streaming ASR, LLM reasoning, and expressive TTS for human-like interaction.

Few-shot Voice Adaptation

Techniques for adapting TTS systems to new voices with under 5 seconds of audio.

Emotion-aware Synthesis

Modeling prosody and emotional expression in neural TTS systems.