<project>
Pet Project · Live
Local LLM Service — Saiga Nemo 12B
ML Engineer · DevOps · Solution Architect
A Russian-language LLM served locally on three NVIDIA P104-100 mining GPUs (8 GB VRAM each). The model is quantised to GGUF Q4_K_M and split across the cards with llama.cpp tensor splitting. It generates 22–28 tokens/sec with 99%+ uptime on commodity hardware. The service is multi-platform: a web interface and a Telegram bot share synchronised state.
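One way the synchronised state between the web interface and the Telegram bot could be organised is a single conversation store that both frontends read and write. The sketch below is illustrative only: `ConversationStore`, its methods, the `source` field and the in-memory storage are assumptions, not the project's actual code, and a production setup would more likely keep this in a database or Redis.

```python
import asyncio
from collections import defaultdict


class ConversationStore:
    """Hypothetical in-memory chat history shared by all frontends."""

    def __init__(self, max_messages: int = 20) -> None:
        # Assumed cap on stored messages per user (must be >= 1).
        self._max = max_messages
        self._history: dict[str, list[dict[str, str]]] = defaultdict(list)
        # One lock per user so concurrent web and Telegram requests
        # for the same user cannot interleave their writes.
        self._locks: dict[str, asyncio.Lock] = defaultdict(asyncio.Lock)

    async def append(self, user_id: str, role: str, content: str, source: str) -> None:
        async with self._locks[user_id]:
            history = self._history[user_id]
            history.append({"role": role, "content": content, "source": source})
            # Keep only the newest max_messages entries.
            del history[:-self._max]

    async def get(self, user_id: str) -> list[dict[str, str]]:
        async with self._locks[user_id]:
            return list(self._history[user_id])


async def demo() -> list[dict[str, str]]:
    store = ConversationStore(max_messages=3)
    # The same user talks through Telegram first, then continues on the web.
    await store.append("user-1", "user", "Привет!", source="telegram")
    await store.append("user-1", "assistant", "Здравствуйте!", source="telegram")
    await store.append("user-1", "user", "Продолжим здесь.", source="web")
    return await store.get("user-1")


history = asyncio.run(demo())
print([message["source"] for message in history])
```

Because both frontends key the history by the same user identifier, a conversation started in Telegram can continue in the browser with the same context.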
Stack
Saiga Nemo 12B
llama.cpp
GGUF Q4_K_M
NVIDIA P104-100 ×3
Tensor splitting
Python
FastAPI
aiogram
Docker
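As a rough illustration of how the stack fits together, the sketch below builds a `llama-server` command line that offloads all layers to the GPUs and splits the model evenly across the three cards. The model filename, context size, host and port are placeholder assumptions, not the project's real configuration; only the flags themselves (`--model`, `--n-gpu-layers`, `--split-mode`, `--tensor-split`, `--ctx-size`, `--host`, `--port`) come from llama.cpp.

```python
import shlex


def build_server_command(
    model_path: str,
    tensor_split: tuple[float, ...] = (1, 1, 1),
    ctx_size: int = 8192,
    host: str = "127.0.0.1",
    port: int = 8080,
) -> list[str]:
    """Build an argv list for llama.cpp's llama-server (values are assumptions)."""
    return [
        "llama-server",
        "--model", model_path,
        # A value larger than the layer count offloads every layer to the GPUs.
        "--n-gpu-layers", "99",
        # Distribute whole layers across devices.
        "--split-mode", "layer",
        # Relative share of the model per GPU: equal thirds for three 8 GB cards.
        "--tensor-split", ",".join(str(share) for share in tensor_split),
        "--ctx-size", str(ctx_size),
        "--host", host,
        "--port", str(port),
    ]


# Hypothetical model filename, used for illustration only.
command = build_server_command("models/saiga_nemo_12b.Q4_K_M.gguf")
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.Popen` or used as the container command in a Docker image. Uneven ratios such as `(3, 3, 2)` would leave headroom on one card, for example a card that also holds other buffers.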
Performance
Parameters: 12B
Context: 128K
Tokens/sec: 22–28 (peak 28)
Uptime: 99%+
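To put the throughput figure in user-facing terms, the short calculation below estimates how long a reply takes to generate at the reported 22–28 tokens/sec. The 300-token reply length is an assumed example, and the estimate ignores prompt processing time.

```python
def reply_seconds(n_tokens: int, tokens_per_sec: float) -> float:
    """Generation time for a reply, ignoring prompt processing."""
    return n_tokens / tokens_per_sec


REPLY_TOKENS = 300  # assumed reply length, for illustration only

fastest = reply_seconds(REPLY_TOKENS, 28)  # upper end of the reported range
slowest = reply_seconds(REPLY_TOKENS, 22)  # lower end of the reported range

print(f"{fastest:.1f}-{slowest:.1f} s")  # prints "10.7-13.6 s"
```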