Skip to content

Ollama Cloud

Run large language models without local GPU using Ollama's cloud service.

Quick Start

1. Get API Key

  • Create account at ollama.com
  • Generate API key from your profile

2. Configure .env

OLLAMA_API_KEY=your_api_key_here
OLLAMA_API_BASE=https://ollama.com
CAI_MODEL=ollama_cloud/gpt-oss:120b

3. Run

cai

Available Models

View in CAI with /model-show under "Ollama Cloud" category:

  • ollama_cloud/gpt-oss:120b - General purpose 120B model
  • ollama_cloud/llama3.3:70b - Llama 3.3 70B
  • ollama_cloud/qwen2.5:72b - Qwen 2.5 72B
  • ollama_cloud/deepseek-v3:671b - DeepSeek V3 671B

More models at ollama.com/library.

Model Selection

# By name
CAI> /model ollama_cloud/gpt-oss:120b

# By number (after /model-show)
CAI> /model 3

Local vs Cloud

Feature Local Cloud
Prefix ollama/ ollama_cloud/
API Key Not required Required
Endpoint http://localhost:8000/v1 https://ollama.com/v1
GPU Required Not required

Troubleshooting

Unauthorized error: Verify OLLAMA_API_KEY is set correctly

Path not found: Ensure OLLAMA_API_BASE=https://ollama.com (without /v1)

Model not listed: Check model prefix is ollama_cloud/, not ollama/

Validation

Test connection with curl:

curl https://ollama.com/v1/chat/completions \
  -H "Authorization: Bearer $OLLAMA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-oss:120b", "messages": [{"role": "user", "content": "test"}]}'

References