Available for opportunities

AI / ML Engineer


4+ years designing and deploying production AI systems, from sub-500ms voice pipelines to enterprise RAG platforms serving millions of queries. Specializing in Speech AI, LLMs, and Vision-Language Models.

PyTorch, XTTS-v2, vLLM, FastAPI, AI Agents, RAG Systems, Multi-Agent Systems

By the numbers

Delivered at Production Scale

0+
AI Systems Built
0+
GitHub Stars
0+
Countries Deployed
4+
Years in AI / ML

Core competencies

Technical Depth

Built through 4+ years of hands-on research and production deployments

Speech AI

End-to-end voice intelligence pipeline design

TTS (XTTS-v2, StyleTTS2, Bark): 92%
ASR (Whisper, wav2vec2): 90%
Voice Cloning & Speaker Embed: 87%
Prosody & Emotion Control: 80%

LLMs & GenAI

Production-grade language model deployment

LLaMA-3 / Mistral Fine-tuning: 90%
RAG & Vector Databases: 92%
Agentic AI (LangGraph, CrewAI): 85%
Prompt Engineering & Evals: 88%

Vision AI

Multimodal perception & scene understanding

Object Detection (YOLO, RT-DETR): 85%
VLMs (LLaVA, InternVL): 82%
Image Segmentation (SAM): 80%
Video Analytics Pipeline: 78%

MLOps & Infra

Scalable model serving at production scale

FastAPI / Streaming APIs: 93%
Docker / Kubernetes: 82%
AWS SageMaker / GCP Vertex: 78%
WebSockets & Real-Time: 88%

Tools & Frameworks

PyTorch, HuggingFace, LangChain, LangGraph, LlamaIndex, Qdrant, Weaviate, FAISS, FastAPI, Triton Inference, ONNX, TensorRT, Docker, Kubernetes, AWS SageMaker, GCP Vertex AI, Ray Serve, Redis, PostgreSQL, Neo4j

Portfolio

Production AI Systems

Real-world AI systems with measurable impact: not demos, not notebooks

Open Source

DIS-Vector

Open-Source Voice Intelligence Framework

End-to-end speaker embedding extraction, verification, and voice cloning. Achieves 97.3% speaker verification accuracy using contrastive learning on 10K+ hours of speech data.

PyTorch, Whisper, XTTS-v2, FAISS, FastAPI
97.3% speaker verification accuracy
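
To make the verification step concrete, here is a minimal sketch of how two speaker embeddings might be compared. It assumes cosine similarity with a fixed decision threshold; the function names, the 0.7 threshold, and the toy two-dimensional vectors are illustrative assumptions, not DIS-Vector's actual API or settings.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def same_speaker(enrolled: Sequence[float], probe: Sequence[float],
                 threshold: float = 0.7) -> bool:
    """Accept the probe if it is close enough to the enrolled embedding.

    The threshold is a hypothetical value; in practice it would be tuned
    on held-out verification trials.
    """
    return cosine_similarity(enrolled, probe) >= threshold
```

In a real system the embeddings would come from a trained speaker encoder and have hundreds of dimensions; the comparison logic stays the same.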
Production

Conversational AI Agent

Streaming ASR → LLM → TTS Pipeline

Conversational agent with sub-500ms end-to-end latency. Full-duplex voice streaming with LLaMA-3 reasoning, intent detection, and XTTS neural speech synthesis over WebSockets.

LLaMA-3, FastAPI, WebSockets, LangGraph, Redis
<500ms end-to-end latency
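
The ASR → LLM → TTS flow can be pictured as a chain of streaming stages. The sketch below uses plain Python generators with stub stages; every function here is a hypothetical stand-in (no real ASR, LLM, or TTS model is called, and the WebSocket transport is omitted). It only illustrates how output can start flowing before the full reply is generated.

```python
from typing import Iterable, Iterator


def asr_stage(audio_chunks: Iterable[bytes]) -> Iterator[str]:
    """Hypothetical ASR stub: yields one transcript fragment per audio chunk."""
    for chunk in audio_chunks:
        yield chunk.decode("utf-8")  # stand-in for real speech recognition


def llm_stage(fragments: Iterable[str]) -> Iterator[str]:
    """Hypothetical LLM stub: waits for the full utterance, then streams tokens."""
    utterance = " ".join(fragments)
    for token in f"You said: {utterance}".split():
        yield token


def tts_stage(tokens: Iterable[str]) -> Iterator[bytes]:
    """Hypothetical TTS stub: emits one audio packet per token as it arrives."""
    for token in tokens:
        yield token.encode("utf-8")  # stand-in for synthesized audio


def voice_pipeline(audio_chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Chain the stages lazily; nothing runs until the output is consumed."""
    return tts_stage(llm_stage(asr_stage(audio_chunks)))


if __name__ == "__main__":
    for packet in voice_pipeline([b"hello", b"world"]):
        print(packet)
```

Because the stages are lazy, the first synthesized packet is available as soon as the first reply token exists, which is the property that matters for perceived latency.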
Live Demo

Multimodal Vision RAG

Vision-Language Retrieval System

CLIP-powered multimodal embedding pipeline with LLaVA reasoning. Processes images + text queries against a Qdrant vector store for enterprise document intelligence.

CLIP, LLaVA, Qdrant, YOLO, Transformers
2.1M vectors indexed
Deployed

Multilingual TTS Engine

Low-Latency Synthesis at Scale

Production TTS engine supporting 12 languages with sub-200ms first-token latency. Custom prosody model trained on 5K hours of studio-quality speech.

StyleTTS2, Bark, Coqui, Triton, ONNX
<200ms first-token latency
Enterprise

Enterprise RAG Platform

Knowledge Graph + Vector Hybrid

Hybrid search RAG with knowledge graph traversal. Combines dense retrieval (BGE-M3) with Neo4j graph reasoning for a 40% improvement in answer relevance over naive RAG.

LangChain, Weaviate, BGE-M3, GPT-4o
+40% relevance vs baseline RAG
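
One common way to merge a dense-retrieval ranking with a graph-traversal ranking is reciprocal rank fusion (RRF). The sketch below is an illustration of that general technique, not necessarily the fusion method this platform uses; the document IDs and the constant `k = 60` are assumptions chosen for the example.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]],
                           k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    so documents ranked well by multiple retrievers rise to the top.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for a deterministic order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    dense_hits = ["a", "b", "c"]   # hypothetical dense-retriever results
    graph_hits = ["b", "d", "a"]   # hypothetical graph-traversal results
    print(reciprocal_rank_fusion([dense_hits, graph_hits]))
```

RRF needs only ranks, not comparable scores, which makes it convenient when the two retrievers produce scores on different scales.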
Production

Real-Time Object Tracker

Edge-Optimized Vision Pipeline

ONNX-quantized YOLOv9 + ByteTrack deployed on edge hardware at 30fps. Used in retail analytics for customer flow mapping across 50+ stores.

YOLOv9, ONNX, ByteTrack, TensorRT, OpenCV
30fps on edge hardware

Let's build something great

Looking to hire an AI Engineer?

I'm open to full-time roles and contract engagements in AI/ML, Speech AI, and LLM infrastructure.