About
Senior engineer with 12+ years shipping production systems across mobile, web, and cloud. I design and run reliable LLM services with clear SLOs, CI/CD, MLflow registries, cost controls, and observability.
Currently building personal R&D projects to learn enterprise MLOps patterns: Threads-Agent (GenAI platform), ROI-Agent (multimodal automation), and Achievement Collector. I bring production discipline from EPAM/GlobalLogic to AI platforms.
Key Achievements
Requests/month processed by deployed systems
Average cost reduction achieved for clients
Uptime maintained across production deployments
Model rollback time in production
Experience Highlights
Deployed and scaled large language models serving millions of users with sub-300ms p95 latency
Built end-to-end ML pipelines with automated training, evaluation, and deployment using MLflow
Reduced token costs by 30-50% through vLLM deployment and intelligent caching strategies
Implemented RAG systems and AI solutions for Fortune 500 companies with SOC 2 compliance
LLM Deployment
- vLLM, TGI, Ollama
- Model quantization & optimization
- Multi-GPU scaling
- Inference cost analysis
MLOps & Infrastructure
- MLflow, Kubeflow, DVC
- Kubernetes, Docker
- CI/CD with GitHub Actions
- Monitoring with Prometheus/Grafana
AI/ML Frameworks
- PyTorch, Hugging Face
- LangChain, LlamaIndex
- Vector databases (Pinecone, Weaviate)
- FastAPI, Redis, PostgreSQL
Ready to Work Together?
Download my detailed resume or get in touch to discuss how I can help optimize your AI/ML systems for production scale.