About

Senior engineer with 12+ years shipping production systems across mobile, web, and cloud. I design and operate reliable LLM services with clear SLOs, CI/CD, MLflow registries, cost controls, and observability.

Currently building personal R&D projects to learn enterprise MLOps patterns: Threads-Agent (GenAI platform), ROI-Agent (multimodal automation), and Achievement Collector. I bring production discipline from EPAM/GlobalLogic to AI platforms.

Key Achievements

10M+

Requests/month processed by deployed systems

45%

Average cost reduction achieved for clients

99.9%

Uptime maintained across production deployments

<2min

Model rollback time in production

Experience Highlights

LLM Production Systems

Deployed and scaled large language models serving millions of users with sub-300ms p95 latency

MLOps Infrastructure

Built end-to-end ML pipelines on MLflow with automated training, evaluation, and deployment

Cost Optimization

Reduced token costs by 30-50% through vLLM deployment and intelligent caching strategies

Enterprise Integration

Implemented RAG systems and AI solutions for Fortune 500 companies with SOC 2 compliance

Technical Expertise

LLM Deployment

  • vLLM, TGI, Ollama
  • Model quantization & optimization
  • Multi-GPU scaling
  • Inference cost analysis

MLOps & Infrastructure

  • MLflow, Kubeflow, DVC
  • Kubernetes, Docker
  • CI/CD with GitHub Actions
  • Monitoring with Prometheus/Grafana

AI/ML Frameworks

  • PyTorch, HuggingFace
  • LangChain, LlamaIndex
  • Vector databases (Pinecone, Weaviate)
  • FastAPI, Redis, PostgreSQL

Ready to Work Together?

Download my detailed resume or get in touch to discuss how I can help optimize your AI/ML systems for production scale.