Local LLM Service — Saiga Nemo 12B

Pet Project · Live

ML Engineer · DevOps · Solution Architect

Local Russian-language LLM deployed on 3× NVIDIA P104-100 mining GPUs (8 GB VRAM each, 24 GB in total), using llama.cpp tensor splitting across the cards and Q4_K_M GGUF quantisation. Achieves 22–28 tokens/sec at 99%+ uptime on commodity hardware. Multi-platform architecture: a web interface and a Telegram bot with synchronised state.

Stack

Saiga Nemo 12B · llama.cpp · GGUF Q4_K_M · NVIDIA P104-100 ×3 · Tensor splitting · Python · FastAPI · aiogram · Docker
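The page names llama.cpp and tensor splitting but not the exact launch command. The following minimal sketch assumes the model is served with llama.cpp's bundled `llama-server` binary. `--model`, `--n-gpu-layers`, `--tensor-split`, `--ctx-size`, `--host` and `--port` are real `llama-server` options. The helper names, model filename, context size and port are hypothetical. The split is proportional to each card's VRAM, so three identical 8 GB P104-100s receive an even share.

```python
import shlex


def tensor_split(vram_gb: list[float]) -> str:
    """Return a --tensor-split value proportional to each GPU's VRAM."""
    total = sum(vram_gb)
    if total <= 0:
        raise ValueError("total VRAM must be positive")
    return ",".join(f"{v / total:.3f}" for v in vram_gb)


def build_server_cmd(model_path: str, vram_gb: list[float], ctx_size: int = 8192,
                     host: str = "127.0.0.1", port: int = 8080) -> list[str]:
    """Hypothetical helper: assemble a llama-server command line as an argv list."""
    return [
        "llama-server",
        "--model", model_path,
        # 99 exceeds the model's layer count, so every layer is offloaded to GPU.
        "--n-gpu-layers", "99",
        "--tensor-split", tensor_split(vram_gb),
        "--ctx-size", str(ctx_size),  # assumed value; KV cache size grows with it
        "--host", host,
        "--port", str(port),
    ]


if __name__ == "__main__":
    # Hypothetical filename for the Q4_K_M build of Saiga Nemo 12B.
    cmd = build_server_cmd("saiga_nemo_12b.Q4_K_M.gguf", [8, 8, 8])
    print(shlex.join(cmd))
```

Returning an argv list rather than a shell string means the command can be passed directly to `subprocess.run` or reused in a Docker `CMD` without quoting issues. The context size matters on 8 GB cards because the KV cache is allocated on the GPUs alongside the weights.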

Performance

12B · Parameters

128K · Context

22–28 · Tokens/sec

99%+ · Uptime
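The page does not say how the throughput and uptime figures were measured. The following minimal sketch shows one plausible method. It assumes throughput is generated tokens divided by wall-clock generation time, and uptime is the share of periodic health checks that succeeded. `tokens_per_second` and `uptime_percent` are hypothetical helpers, and the sample numbers are illustrative rather than the project's own logs.

```python
def tokens_per_second(generated_tokens: int, elapsed_s: float) -> float:
    """Generation throughput: tokens produced divided by wall-clock seconds."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return generated_tokens / elapsed_s


def uptime_percent(health_checks: list[bool]) -> float:
    """Share of successful health checks, as a percentage."""
    if not health_checks:
        raise ValueError("need at least one health check")
    return 100.0 * sum(health_checks) / len(health_checks)


if __name__ == "__main__":
    # Illustrative numbers only: 500 tokens generated in 20 s,
    # and 99 successful checks out of 100.
    print(tokens_per_second(500, 20.0))
    print(uptime_percent([True] * 99 + [False]))
```

A tokens/sec figure depends on whether prompt processing is included in the timing. llama.cpp reports prompt-processing speed and generation speed separately, so it is worth stating which one a headline number refers to.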
