Self-hosted API gateway for Ollama with multi-user support, API key management, rate limiting, and real-time usage tracking.
# Clone and run with Docker
$ git clone https://github.com/edoardoted99/llamapass.git
$ cd llamapass
$ cp .env.example .env
$ docker compose up --build

# Create your first admin user
$ docker compose exec web python manage.py createsuperuser

# Ready at http://localhost:8000
A complete gateway between your users and Ollama
Create, revoke, and manage keys with expiration, per-key model restrictions, and rate limits.
30-day analytics with charts for requests, tokens, latency, errors, and model breakdown per key.
Configurable per-key limits backed by Redis. Live monitoring shows how close each key is to its limit.
Transparent async proxy to Ollama. Full streaming support for chat and generate endpoints.
Test Chat, Generate, and Embeddings endpoints directly from the browser. No curl needed.
Works with OpenAI SDKs out of the box. Just point the SDK at your base URL and use your LlamaPass API key.
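The per-key limits above boil down to a counter per key per time window. A minimal in-memory sketch of the idea (the real gateway backs this with Redis; the class and names here are illustrative, not LlamaPass's actual code):

```python
import time
from collections import defaultdict


class FixedWindowLimiter:
    """Illustrative fixed-window rate limiter (in-memory stand-in for Redis)."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        # (api_key, window index) -> request count in that window
        self.counts = defaultdict(int)

    def allow(self, api_key: str, now=None) -> bool:
        """Return True and count the request if the key is under its limit."""
        now = time.time() if now is None else now
        bucket = (api_key, int(now // self.window))
        if self.counts[bucket] >= self.limit:
            return False
        self.counts[bucket] += 1
        return True


limiter = FixedWindowLimiter(limit=3, window_seconds=60)
print([limiter.allow("oah_demo", now=0) for _ in range(4)])
# → [True, True, True, False]
```

In production a Redis `INCR` with a TTL plays the role of `counts`, so the state survives restarts and is shared across gateway workers.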
Use any HTTP client or the OpenAI SDK
curl https://llamapass.org/ollama/api/chat \
  -H "Authorization: Bearer oah_your_key" \
  -d '{
    "model": "gemma3:1b",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ],
    "stream": false
  }'
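With `"stream": true`, responses arrive as newline-delimited JSON objects, as in Ollama's native chat API. A sketch of a streaming consumer, assuming the gateway passes Ollama's NDJSON chunks through unchanged (the function names here are illustrative):

```python
import json
import urllib.request


def chunk_text(line: bytes) -> str:
    """Extract the content delta from one NDJSON chunk of a /api/chat stream."""
    data = json.loads(line)
    return data.get("message", {}).get("content", "")


def stream_chat(url: str, key: str, model: str, prompt: str) -> str:
    """POST a streaming chat request and print tokens as they arrive."""
    payload = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }).encode()
    req = urllib.request.Request(
        url,
        data=payload,
        headers={"Authorization": f"Bearer {key}",
                 "Content-Type": "application/json"},
    )
    pieces = []
    with urllib.request.urlopen(req) as resp:
        for line in resp:  # one JSON object per line (NDJSON)
            if line.strip():
                piece = chunk_text(line)
                print(piece, end="", flush=True)
                pieces.append(piece)
    return "".join(pieces)


# Example usage (uncomment with a live gateway and a real key):
# stream_chat("https://llamapass.org/ollama/api/chat", "oah_your_key",
#             "gemma3:1b", "Hello!")

# The parser alone, on a sample chunk:
sample = b'{"model":"gemma3:1b","message":{"role":"assistant","content":"Hi"},"done":false}'
print(chunk_text(sample))  # → Hi
```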
from openai import OpenAI

client = OpenAI(
    base_url="https://llamapass.org/ollama/v1",
    api_key="oah_your_key",
)

response = client.chat.completions.create(
    model="gemma3:1b",
    messages=[
        {"role": "user", "content": "Hello!"}
    ],
)
print(response.choices[0].message.content)
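The Embeddings endpoint can be exercised the same way. A hedged sketch, assuming the gateway mirrors Ollama's `/api/embeddings` request shape (`prompt` in, `embedding` out) and that an embedding model such as `nomic-embed-text` is pulled; the cosine-similarity helper shows a typical use of the vectors:

```python
import json
import math
import urllib.request


def get_embedding(url: str, key: str, model: str, text: str) -> list:
    """Fetch one embedding vector; assumes Ollama's /api/embeddings shape."""
    payload = json.dumps({"model": model, "prompt": text}).encode()
    req = urllib.request.Request(
        url,
        data=payload,
        headers={"Authorization": f"Bearer {key}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embedding"]


def cosine(a: list, b: list) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm


# Example usage (requires a live gateway and a real key):
# v1 = get_embedding("https://llamapass.org/ollama/api/embeddings",
#                    "oah_your_key", "nomic-embed-text", "hello")

# The similarity helper alone:
print(cosine([1.0, 0.0], [1.0, 0.0]))  # → 1.0
```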
Deploy in minutes. Self-hosted. Open source.