Research Areas

Exploring the frontiers of AI through applied research and innovation

Researching novel approaches to neural speech synthesis with emotional expressiveness and natural prosody.

Zero-shot voice cloning, emotion-aware TTS, cross-lingual synthesis

3 publications

Developing autonomous LLM agents with reasoning, memory, and tool-use capabilities.

Multi-step reasoning, memory architectures, function calling

2 publications

Unified representations across speech, text, and visual modalities.

Cross-modal alignment, joint embeddings, multimodal reasoning

1 publication

Current Research Focus

Open-source framework for extracting speaker embeddings, emotional features, and prosodic patterns from minimal audio samples.

End-to-end systems combining streaming ASR, LLM reasoning, and expressive TTS for human-like interaction.

Techniques for adapting TTS systems to new voices with under 5 seconds of audio.

Modeling prosody and emotional expression in neural TTS systems.