Local LLM VRAM Calculator

Check if your GPU can run a local LLM. Select model, quantisation, and GPU — get instant VRAM estimates.


Context length: 512 · 4K · 32K · 128K
It fits!

Comfortable fit — room for other applications

Model weights          4.8 GB
KV cache (4,096 ctx)   0.6 GB
Runtime overhead       0.5 GB
Total VRAM needed      5.9 GB
GPU VRAM available     16 GB
Usage: 37% of 16 GB

Formula

VRAM ≈ (params × bits_per_param / 8)
     + KV_cache(ctx_len, params)
     + overhead(~0.5-1.0 GB)
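The formula above can be sketched in a few lines of Python. The parameter values in the example call are illustrative assumptions, not taken from the calculator: an 8B-parameter model at roughly 4.8 bits per weight (about a Q4_K_M quantisation), with Llama-style KV-cache dimensions (32 layers, 8 KV heads, head dimension 128, fp16 cache).

```python
def estimate_vram_gb(params_b: float, bits_per_param: float,
                     ctx_len: int, n_layers: int, n_kv_heads: int,
                     head_dim: int, kv_bytes: int = 2,
                     overhead_gb: float = 0.5) -> float:
    """Approximate VRAM needed, in GB, for a quantised LLM."""
    # Weights: params × bits_per_param / 8 bytes
    weights_gb = params_b * 1e9 * bits_per_param / 8 / 1e9
    # KV cache: 2 tensors (K and V) per layer,
    # each ctx_len × n_kv_heads × head_dim × kv_bytes
    kv_gb = 2 * n_layers * n_kv_heads * head_dim * ctx_len * kv_bytes / 1e9
    return weights_gb + kv_gb + overhead_gb

# Assumed example: 8B model, ~4.8 bits/param, 4,096-token context
total = estimate_vram_gb(params_b=8, bits_per_param=4.8, ctx_len=4096,
                         n_layers=32, n_kv_heads=8, head_dim=128)
print(f"{total:.1f} GB")  # ≈ 5.8 GB
```

With these assumptions the estimate lands close to the 5.9 GB shown above; real runtimes vary slightly depending on quantisation format and allocator overhead, which is why the overhead term is given as a 0.5-1.0 GB range.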

Want a self-hosted AI stack?

Our Local AI Stack includes Ollama, FastAPI wrapper, model management, and production deployment configs.

View Product →