Ollama in my homelab


Using Ollama and Open WebUI, I've harnessed the power of a repurposed GeForce GTX 980 Ti graphics card in my Proxmox server, keeping all AI processing inside my controlled environment without relying on cloud services. Power usage is also reasonable: about 70 W idle and roughly 300 W while processing prompts. I don't know what OpenAI or Anthropic burns on a prompt from a single user, but it's most probably much more.
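The "controlled environment" part is easy to demonstrate: Ollama exposes a plain REST API on the LAN, so nothing has to leave the house. A minimal sketch, assuming the default port and using a hypothetical hostname for the Proxmox guest:

```python
# Minimal sketch: prompt the local Ollama instance over its REST API.
# "ollama.lan" is a hypothetical hostname for the Proxmox guest;
# 11434 is Ollama's default port.
import requests

resp = requests.post(
    "http://ollama.lan:11434/api/generate",
    json={"model": "llama3.1:8b", "prompt": "Say hello.", "stream": False},
    timeout=120,
)
print(resp.json()["response"])
```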

By leveraging modern quantization techniques with Q4 (4-bit parameters), I've successfully run large language models (LLMs) with up to 8B or 9B parameters - previously out of reach on a machine with "just" 6 GB of GPU RAM.
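A quick back-of-the-envelope calculation shows why that fits. The bits-per-weight and overhead figures below are my rough assumptions, not exact numbers for any particular quant:

```python
# Rough VRAM estimate for a Q4-quantized model. ~4.5 bits per weight
# approximates common Q4-style quants (a few tensors stay at higher
# precision); the 1 GB overhead for KV cache/buffers is a guess.
def est_vram_gb(params_b: float, bits_per_weight: float = 4.5,
                overhead_gb: float = 1.0) -> float:
    weights_gb = params_b * 1e9 * bits_per_weight / 8 / 1e9
    return weights_gb + overhead_gb

for n in (8, 9):
    print(f"{n}B model: ~{est_vram_gb(n):.1f} GB")
# 8B -> ~5.5 GB, 9B -> ~6.1 GB: tight on a 6 GB card, and Ollama can
# offload layers to the CPU when a model doesn't quite fit.
```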

I've only tested a few models so far, but they seem up to the task for code discussions and code generation (see the sketch below the screenshot). Llama 3.1 8B is quite good at both general knowledge and coding. Below is an image of the start screen.

[Image: Open WebUI start screen]
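For code discussions specifically, Open WebUI is a frontend: underneath, it talks to the same Ollama chat API anyone can call directly. A minimal sketch of asking a coding question, assuming the model was pulled as llama3.1:8b and Ollama listens on its default port:

```python
# Minimal sketch: a coding question via Ollama's /api/chat endpoint.
# Assumes `ollama pull llama3.1:8b` has been run and Ollama listens
# on its default port, 11434.
import requests

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3.1:8b",
        "messages": [{
            "role": "user",
            "content": "Write a Python function that reverses a linked list.",
        }],
        "stream": False,
    },
    timeout=300,
)
print(resp.json()["message"]["content"])
```

The chat endpoint takes the whole message history on each call, so follow-up questions about the generated code are just a matter of appending to the messages list.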