Getting Started with GPU

This is the standard vAquila setup when your host has NVIDIA GPU support available for Docker.

Prerequisites

Docker Desktop or Docker Engine
NVIDIA drivers
Docker GPU support (the --gpus all flag, which on Linux typically requires the NVIDIA Container Toolkit)
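Before continuing, it can help to confirm that each prerequisite is actually in place. The following is a minimal sketch, not part of vAquila: the helper names have_cmd, check_prereqs, and gpu_smoke_test are hypothetical. It assumes a POSIX shell and that nvidia-smi is on the PATH wherever the NVIDIA drivers are installed.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the named command is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: print one status line per prerequisite command.
check_prereqs() {
  for cmd in docker nvidia-smi; do
    if have_cmd "$cmd"; then
      echo "ok: $cmd"
    else
      echo "missing: $cmd"
    fi
  done
}

# Hypothetical helper: end-to-end check that a container can see the GPU.
# The ubuntu image is only an example; with working Docker GPU support,
# the container should print the same GPU table as the host.
gpu_smoke_test() {
  docker run --rm --gpus all ubuntu nvidia-smi
}
```

Run check_prereqs first, then gpu_smoke_test once both commands are reported as ok.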

Configure

Set a Hugging Face cache path that the Docker daemon can read, along with the vLLM images to use:

VAQ_HF_CACHE_HOST_PATH=/absolute/path/to/huggingface/cache
VAQ_VLLM_IMAGE=vllm/vllm-openai:latest
VAQ_VLLM_CPU_IMAGE=vllm/vllm-openai-cpu:latest
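One way to keep these settings in one place is a .env file, which Docker Compose reads for ${...} substitution. The sketch below assumes a POSIX shell; is_absolute_path and print_vaq_env are hypothetical helpers, not vAquila commands. It rejects relative paths because the examples on this page use an absolute host path for the cache.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the argument is an absolute path.
is_absolute_path() {
  case "$1" in
    /*) return 0 ;;
    *)  return 1 ;;
  esac
}

# Hypothetical helper: print the three settings in KEY=value form.
# The image tags are the ones shown above; change them as needed.
print_vaq_env() {
  cache_path="$1"
  if ! is_absolute_path "$cache_path"; then
    echo "error: cache path must be absolute: $cache_path" >&2
    return 1
  fi
  printf 'VAQ_HF_CACHE_HOST_PATH=%s\n' "$cache_path"
  printf 'VAQ_VLLM_IMAGE=%s\n' "vllm/vllm-openai:latest"
  printf 'VAQ_VLLM_CPU_IMAGE=%s\n' "vllm/vllm-openai-cpu:latest"
}
```

For example, print_vaq_env "$HOME/.cache/huggingface" > .env writes the file next to a Compose file, assuming that directory is where your cache lives.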

Use the official latest image directly

docker pull ghcr.io/xschahl/vaquila:latest
docker run --rm ghcr.io/xschahl/vaquila:latest --help

CLI with GPU

docker run --rm \
  --gpus all \
  -e VAQ_HF_CACHE_HOST_PATH=/absolute/path/to/huggingface/cache \
  -v /absolute/path/to/huggingface/cache:/root/.cache/huggingface \
  ghcr.io/xschahl/vaquila:latest \
  run Qwen/Qwen3-0.6B --gpu 0 --port 8000

Web UI with GPU

Use this mode when you want GPU telemetry in the Web UI and want to launch GPU-backed models from it:

docker run --rm \
  --gpus all \
  -p 8787:8787 \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -e VAQ_HF_CACHE_HOST_PATH=/absolute/path/to/huggingface/cache \
  -e VAQ_VLLM_IMAGE=vllm/vllm-openai:latest \
  -e VAQ_VLLM_CPU_IMAGE=vllm/vllm-openai-cpu:latest \
  -e NVIDIA_VISIBLE_DEVICES=all \
  -e NVIDIA_DRIVER_CAPABILITIES=compute,utility \
  -v /absolute/path/to/huggingface/cache:/root/.cache/huggingface \
  ghcr.io/xschahl/vaquila:latest \
  ui --host 0.0.0.0 --port 8787

Then open http://localhost:8787 in your browser.
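If the page does not load immediately, the container may still be starting. The sketch below polls the URL until it answers. It assumes curl is installed and that the UI answers plain HTTP on the root path; wait_for_url is a hypothetical helper, not a vAquila command.

```shell
#!/bin/sh
# Hypothetical helper: poll a URL until it returns any HTTP response.
# $1 = URL, $2 = maximum number of attempts (defaults to 30).
wait_for_url() {
  url="$1"
  tries="${2:-30}"
  i=1
  while [ "$i" -le "$tries" ]; do
    # curl exits 0 as soon as the server sends any HTTP response.
    if curl -s -o /dev/null --max-time 2 "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

For example: wait_for_url http://localhost:8787 && echo "UI is reachable".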

Use the latest image as a Dockerfile base

ARG VAQ_IMAGE=ghcr.io/xschahl/vaquila:latest
FROM ${VAQ_IMAGE}

Build it:

docker build -t vaquila-local .

Run the CLI:

docker run --rm vaquila-local --help

Run the Web UI:

docker run --rm -p 8787:8787 vaquila-local ui --host 0.0.0.0 --port 8787

Use Docker Compose with the latest image

Compose for CLI usage

services:
  vaq:
    image: ghcr.io/xschahl/vaquila:latest
    volumes:
      - ${VAQ_HF_CACHE_HOST_PATH}:/root/.cache/huggingface
    environment:
      VAQ_HF_CACHE_HOST_PATH: ${VAQ_HF_CACHE_HOST_PATH}
      VAQ_VLLM_IMAGE: vllm/vllm-openai:latest
      VAQ_VLLM_CPU_IMAGE: vllm/vllm-openai-cpu:latest

Launch a model on GPU 0:

docker compose run --rm vaq run Qwen/Qwen3-0.6B --gpu 0 --port 8000

Compose for Web UI with GPU

services:
  vaq-webui:
    image: ghcr.io/xschahl/vaquila:latest
    gpus: all
    command: ["ui", "--host", "0.0.0.0", "--port", "8787"]
    ports:
      - "8787:8787"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - ${VAQ_HF_CACHE_HOST_PATH}:/root/.cache/huggingface
    environment:
      VAQ_HF_CACHE_HOST_PATH: ${VAQ_HF_CACHE_HOST_PATH}
      VAQ_VLLM_IMAGE: vllm/vllm-openai:latest
      VAQ_VLLM_CPU_IMAGE: vllm/vllm-openai-cpu:latest
      NVIDIA_VISIBLE_DEVICES: all
      NVIDIA_DRIVER_CAPABILITIES: compute,utility

Start the UI:

docker compose up -d
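Because -d starts the service in the background, nothing is printed if startup fails. The wrappers below are a small sketch around standard Docker Compose subcommands. The function names are hypothetical, and the service name vaq-webui is taken from the Compose file above.

```shell
#!/bin/sh
# Hypothetical wrappers; run them from the directory that holds the Compose file.

# Show whether the Web UI service is running.
ui_status() { docker compose ps vaq-webui; }

# Print the last 50 log lines of the Web UI service.
ui_logs() { docker compose logs --tail 50 vaq-webui; }

# Stop and remove the containers created by "docker compose up".
ui_stop() { docker compose down; }
```

Check ui_status and ui_logs first if http://localhost:8787 does not respond, and use ui_stop when you are done.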

Example files

docs/examples/ghcr/docker-compose.yml
docs/examples/ghcr/Dockerfile
docs/examples/webui/docker-compose.yml
