Available for opportunities

AI / ML Engineer


4+ years designing and deploying production AI systems - from sub-500ms voice pipelines to enterprise RAG platforms serving millions of queries. Specialized in Speech AI, LLMs, and Vision-Language Models.

PyTorch, XTTS-v2, vLLM, FastAPI, AI Agents, RAG Systems, Multi-Agent Systems

View My Projects Download CV Get In Touch

By the numbers

Delivered at Production Scale

AI Systems Built

GitHub Stars

Countries Deployed

Years in AI / ML

Core competencies

Technical Depth

Built through 4+ years of hands-on research and production deployments

Speech AI

End-to-end voice intelligence pipeline design

TTS (XTTS-v2, StyleTTS2, Bark): 92%

ASR (Whisper, wav2vec2): 90%

Voice Cloning & Speaker Embed: 87%

Prosody & Emotion Control: 80%

LLMs & GenAI

Production-grade language model deployment

LLaMA-3 / Mistral Fine-tuning: 90%

RAG & Vector Databases: 92%

Agentic AI (LangGraph, CrewAI): 85%

Prompt Engineering & Evals: 88%

Vision AI

Multimodal perception & scene understanding

Object Detection (YOLO, RT-DETR): 85%

VLMs (LLaVA, InternVL): 82%

Image Segmentation (SAM): 80%

Video Analytics Pipeline: 78%

MLOps & Infra

Scalable model serving at production scale

FastAPI / Streaming APIs: 93%

Docker / Kubernetes: 82%

AWS SageMaker / GCP Vertex: 78%

WebSockets & Real-Time: 88%

Tools & Frameworks

PyTorch, HuggingFace, LangChain, LangGraph, LlamaIndex, Qdrant, Weaviate, FAISS, FastAPI, Triton Inference, ONNX, TensorRT, Docker, Kubernetes, AWS SageMaker, GCP Vertex AI, Ray Serve, Redis, PostgreSQL, Neo4j

Portfolio

Production AI Systems

Real-world AI systems with measurable impact - not demos, not notebooks

Open Source

DIS-Vector

Open-Source Voice Intelligence Framework

End-to-end speaker embedding extraction, verification, and voice cloning. Achieves 97.3% speaker verification accuracy using contrastive learning on 10K+ hours of speech data.

PyTorch, Whisper, XTTS-v2, FAISS, FastAPI

97.3% speaker verification acc.

View Details

Production

Conversational AI Agent

Streaming ASR → LLM → TTS Pipeline

Conversational agent with sub-500ms end-to-end latency. Full-duplex voice streaming with LLaMA-3 reasoning, intent detection, and XTTS neural speech synthesis over WebSockets.

LLaMA-3, FastAPI, WebSockets, LangGraph, Redis

<500ms end-to-end latency

View Details

Live Demo

Multimodal Vision RAG

Vision-Language Retrieval System

CLIP-powered multimodal embedding pipeline with LLaVA reasoning. Processes images + text queries against a Qdrant vector store for enterprise document intelligence.

CLIP, LLaVA, Qdrant, YOLO, Transformers

2.1M vectors indexed

View Details

Deployed

Multilingual TTS Engine

Low-Latency Synthesis at Scale

Production TTS engine supporting 12 languages with sub-200ms first-token latency. Custom prosody model trained on 5K hours of studio-quality speech.

StyleTTS2, Bark, Coqui, Triton, ONNX

<200ms first-token latency

View Details

Enterprise

Enterprise RAG Platform

Knowledge Graph + Vector Hybrid

Hybrid-search RAG with knowledge graph traversal. Combines dense retrieval (BGE-M3) with Neo4j graph reasoning, improving answer relevance by 40% over naive RAG.

LangChain, Weaviate, BGE-M3, GPT-4o

+40% relevance vs baseline RAG

View Details

Production

Real-Time Object Tracker

Edge-Optimized Vision Pipeline

ONNX-quantized YOLOv9 + ByteTrack deployed on edge hardware at 30fps. Used in retail analytics for customer flow mapping across 50+ stores.

YOLOv9, ONNX, ByteTrack, TensorRT, OpenCV

30fps on edge hardware

View Details

See All Projects

Let's build something great

Looking to hire an AI Engineer?

I'm open to full-time roles and contract engagements in AI/ML, Speech AI, and LLM infrastructure.

Start a Conversation View on GitHub