CAI API Backend
The cai --api mode exposes a stateful HTTP backend built with FastAPI. It uses per-session agents to keep conversation state and REST routes to run REPL commands or send prompts to the model.
Start the server
cai --api --api-host 0.0.0.0 --api-port 8080
# If 8080 (or your chosen port) is busy, the server auto-picks
# the next free port and prints it in the console.
CLI flags and environment variables:
| Flag | Env | Description |
|---|---|---|
| --api | CAI_API_MODE | Enable the HTTP backend. |
| --api-host | CAI_API_HOST | Bind host/interface (default 127.0.0.1). |
| --api-port | CAI_API_PORT | Bind port (default 8000). |
| --api-reload | CAI_API_RELOAD | Dev autoreload. |
| --api-workers | CAI_API_WORKERS | Worker processes (ignored with reload). |
Interactive docs at /api/docs and OpenAPI spec at /api/openapi.json.
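A client can derive its base URL from the same variables. The sketch below assumes the client process sees `CAI_API_HOST` and `CAI_API_PORT` (a convention chosen for this example, not something the server requires); `api_base_url` is a hypothetical helper name. If the requested port was busy, the server may be listening on a different port than the one configured here.

```python
import os

DEFAULT_HOST = "127.0.0.1"  # documented default for --api-host
DEFAULT_PORT = 8000         # documented default for --api-port


def api_base_url(env=None):
    """Build the /api/v1 base URL from CAI_API_HOST / CAI_API_PORT (hypothetical helper)."""
    env = os.environ if env is None else env
    host = env.get("CAI_API_HOST", DEFAULT_HOST)
    # A server bound to all interfaces is reached through loopback on the same machine.
    if host == "0.0.0.0":
        host = "127.0.0.1"
    port = int(env.get("CAI_API_PORT", DEFAULT_PORT))
    return f"http://{host}:{port}/api/v1"
```

Passing an explicit dict instead of relying on `os.environ` keeps the helper easy to test.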
Authentication
- The API uses the client’s `ALIAS_API_KEY` as the secret. Set `ALIAS_API_KEY` and send it in the `X-CAI-API-Key` header (customizable via `CAI_API_KEY_HEADER`).
- If `ALIAS_API_KEY` is not set, the API is unprotected (local dev only). For compatibility, `CAI_API_KEY` is accepted as a fallback.
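The rules above can be mirrored on the client side. This is a minimal sketch, assuming the client shares the server's environment variables; `auth_headers` is a hypothetical helper, and applying the `CAI_API_KEY` fallback on the client is an assumption made for symmetry with the server.

```python
import os


def auth_headers(env=None):
    """Return the auth header dict, or {} when no key is configured (hypothetical helper)."""
    env = os.environ if env is None else env
    # ALIAS_API_KEY is the primary secret; CAI_API_KEY is the documented fallback.
    key = env.get("ALIAS_API_KEY") or env.get("CAI_API_KEY")
    # The header name can be overridden through CAI_API_KEY_HEADER.
    header = env.get("CAI_API_KEY_HEADER") or "X-CAI-API-Key"
    return {header: key} if key else {}
```

An empty dict means requests go out without a key, which only works against an unprotected local server.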
Verbose/auth logging
- Server log level: set CAI_API_LOG_LEVEL to debug (or trace) before running cai --api.
- Request logging (method/path/headers/body preview): CAI_API_LOG_REQUESTS=true.
- Authentication decisions (why 401): CAI_API_LOG_AUTH=true.
- Dev autoreload: CAI_API_RELOAD=true.
Example:
ALIAS_API_KEY="your_key" \
CAI_API_LOG_LEVEL=debug \
CAI_API_LOG_REQUESTS=true \
CAI_API_LOG_AUTH=true \
CAI_API_RELOAD=true \
cai --api --api-host 0.0.0.0 --api-port 8080
Content types
- JSON for request/response payloads.
- Server-Sent Events (SSE) for the streaming endpoints (`text/event-stream`).
Endpoints
Below are the endpoints with request/response examples and headers. For authenticated calls, include:
X-CAI-API-Key: $ALIAS_API_KEY
Quick index
- GET /api/v1/health
- GET /api/v1/commands
- POST /api/v1/commands/{command}
- POST /api/v1/sessions
- GET /api/v1/sessions
- GET /api/v1/sessions/{id}
- DELETE /api/v1/sessions/{id}
- POST /api/v1/sessions/{id}/reset
- POST /api/v1/sessions/{id}/messages
- POST /api/v1/sessions/{id}/messages/stream
- POST /api/v1/sessions/{id}/messages/stream_tokens
- GET /api/v1/sessions/{id}/history
- POST /api/v1/sessions/{id}/interrupt
- POST /api/v1/sessions/{id}/reload
- GET /api/v1/agents
- GET /api/v1/models
- POST /api/v1/sessions/{id}/ux/final_message/stream_tokens
- POST /api/v1/ux/title
- POST /api/v1/ux/summarize
GET /api/v1/health
- Description: Liveness check. No auth required.
- Response 200:
{"status":"ok","version":"<semver or dev>"}
GET /api/v1/commands
- Description: List all REPL commands (names, aliases, subcommands).
- Headers: `X-CAI-API-Key`
- Response 200:
{
"commands": [
{"name":"/memory","description":"memory ops","aliases":[],"subcommands":["show"]},
{"name":"/help","description":"display help","aliases":["/h"],"subcommands":[]}
]
}
POST /api/v1/commands/{command}
- Description: Execute a REPL command.
- Headers: `X-CAI-API-Key`, `Content-Type: application/json`
- Body:
{"args": ["show"], "auto_correct": true}
- Response 200:
{"handled": true, "suggested_command": null, "stdout": "...", "stderr": "", "exit_code": null}
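As a rough sketch, a client might wrap this endpoint in two small helpers. Both names are hypothetical. Dropping the leading slash follows the curl example in this page, which addresses `/memory` as `/commands/memory`; reading `suggested_command` as a "did you mean" hint is an assumption.

```python
def command_request(command, args=None, auto_correct=True):
    """Return (path, json_body) for POST /api/v1/commands/{command}."""
    # "/memory" is addressed as /commands/memory, so strip the leading slash.
    name = command.lstrip("/")
    body = {"args": list(args or []), "auto_correct": auto_correct}
    return f"/api/v1/commands/{name}", body


def command_output(response):
    """Pick the text worth showing from a CommandResponse dict."""
    if not response.get("handled"):
        hint = response.get("suggested_command")
        return f"not handled; did you mean {hint}?" if hint else "not handled"
    return response.get("stdout", "")
```

The path is relative to the server root, so it can be appended to any host and port.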
POST /api/v1/sessions
- Description: Create a new stateful session with its own agent instance and memory.
- Headers: `X-CAI-API-Key`, `Content-Type: application/json`
- Body:
{"agent": "redteam_agent", "model": "alias1", "stateful": true, "metadata": {}}
- Response 201 (SessionDetailModel): includes summary + empty history initially.
GET /api/v1/sessions
- Description: List active sessions (summaries).
- Headers: `X-CAI-API-Key`
- Response 200:
{"sessions": [{"id":"<uuid>","agent":"redteam_agent","model":"alias1","stateful":true,"history_length":0, "created_at":"...","updated_at":"...","metadata":{}}]}
GET /api/v1/sessions/{id}
- Description: Get session detail (summary + full history).
- Headers: `X-CAI-API-Key`
DELETE /api/v1/sessions/{id}
- Description: Delete a session.
- Headers: `X-CAI-API-Key`
- Response: 204 No Content
POST /api/v1/sessions/{id}/reset
- Description: Reset the session agent and clear history.
- Headers: `X-CAI-API-Key`
- Response 200: SessionDetailModel
POST /api/v1/sessions/{id}/messages
- Description: Non-streamed inference. Runs the agent and returns the final result.
- Headers: `X-CAI-API-Key`, `Content-Type: application/json`
- Body (InferenceRequest):
{"input": "List current risks", "context": {"org": "acme"}, "max_turns": 8}
- Response 200 (InferenceResponse):
{
"session": {"id": "<uuid>", ...},
"result": {
"messages": [/* semantic items: messages, tool calls, outputs, ... */],
"history": [/* updated message list */],
"final_output": {/* typed final output if agent uses an output schema, else string */},
"text_output": "<assistant final text, if any>",
"input_guardrails": [],
"output_guardrails": []
}
}
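Because `final_output` may be a string, a structured object, or absent, clients usually want a single accessor. The following is a minimal sketch with a hypothetical helper name; preferring `text_output` over `final_output` is a design choice of this example, not documented server behavior.

```python
import json


def final_text(response):
    """Return a printable final answer from an InferenceResponse dict."""
    result = response.get("result", {})
    text = result.get("text_output")
    if text:
        return text
    final = result.get("final_output")
    if final is None:
        return ""
    # Structured outputs are serialized so callers always receive a string.
    return final if isinstance(final, str) else json.dumps(final, sort_keys=True)
```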
POST /api/v1/sessions/{id}/messages/stream (SSE)
- Description: Stream high-level reasoning steps live (no token streaming) and a final summary. Under the hood the API performs non-streaming model calls and streams steps via server-side hooks (tools, handoffs, messages).
- Headers: `X-CAI-API-Key`, `Content-Type: application/json`, `Accept: text/event-stream`
- Body (InferenceRequest): same as non-streamed.
- Stream format: Server-Sent Events with two event types:
  - `event: reasoning_step`: one event per step with JSON `data` (examples below).
  - `event: final`: final event with `{ steps, final_message, final_output }`.
Reasoning step payloads (no token deltas):
// Message generated by the assistant
{"type":"message","agent":"Red Team","text":"...full assistant message..."}
// Tool call
{"type":"tool_call","agent":"Red Team","tool":"nmap_scan","arguments":{"target":"10.0.0.5"}}
// Tool output
{"type":"tool_output","agent":"Red Team","output":"open ports: 22,80"}
// Agent switch (handoff)
{"type":"handoff","from_agent":"Coordinator","to_agent":"Exploiter"}
// Explicit agent switch signal
{"type":"agent_switched","agent":"Exploiter"}
Final event payload:
{
"steps": [ /* the same reasoning steps emitted during the stream */ ],
"final_message": "...last assistant message (if any)...",
"final_output": {/* structured output if present, else string/null */}
}
Example with curl (SSE):
curl -N \
-H "Accept: text/event-stream" \
-H "Content-Type: application/json" \
-H "X-CAI-API-Key: $ALIAS_API_KEY" \
-d '{"input": "List current risks"}' \
http://localhost:8080/api/v1/sessions/<SESSION_ID>/messages/stream
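To consume this stream programmatically, the `event:` and `data:` lines have to be grouped into events. The sketch below follows standard SSE framing (a blank line ends an event, lines starting with `:` are comments) and assumes every `data` payload is JSON, as in the examples above. `parse_sse` is a hypothetical helper; flushing a trailing event without a blank line is a leniency of this example.

```python
import json


def parse_sse(lines):
    """Yield (event_name, data) pairs from an iterable of decoded SSE lines."""
    event, data_lines = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_lines:
                yield event, json.loads("\n".join(data_lines))
            event, data_lines = "message", []
        elif line.startswith(":"):
            continue  # SSE comment, often used as a keep-alive
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].lstrip(" "))
    if data_lines:  # stream ended without a trailing blank line
        yield event, json.loads("\n".join(data_lines))
```

The function accepts any iterable of strings, so it can be fed from a streaming HTTP response or from a recorded transcript.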
POST /api/v1/sessions/{id}/messages/stream_tokens (SSE)
- Description: Token-level streaming (plus reasoning steps). This endpoint enables provider streaming internally and emits token deltas as they arrive. Use this only if you need character/token granularity.
- Headers: `X-CAI-API-Key`, `Content-Type: application/json`, `Accept: text/event-stream`
- Body (InferenceRequest): same as non-streamed.
- Stream events:
  - `event: token` with data `{ "type": "token_delta", "text": "..." }` for each emitted text delta.
  - `event: token` with data `{ "type": "message_start" }` and `{ "type": "message_end" }` to mark boundaries.
  - `event: reasoning_step` for high-level steps (same schema as /messages/stream).
  - `event: final` with the same summary payload as /messages/stream.
Notes
- Token streaming can be quite chatty; ensure your client handles backpressure and uses streaming-friendly APIs.
- For iOS, prefer URLSession streaming (see sample below); Safari’s EventSource cannot set custom headers.
curl example (tokens):
curl -N \
-H "Accept: text/event-stream" \
-H "Content-Type: application/json" \
-H "X-CAI-API-Key: $ALIAS_API_KEY" \
-d '{"input": "Write a haiku about ports"}' \
http://localhost:8080/api/v1/sessions/<SESSION_ID>/messages/stream_tokens
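On the client, the `token` events have to be stitched back into whole messages. This is a minimal sketch that takes already-parsed `(event_name, data)` pairs; `assemble_messages` is a hypothetical helper, and tolerating a missing `message_start` or `message_end` is a defensive choice of this example rather than documented server behavior.

```python
def assemble_messages(events):
    """Rebuild complete messages from (event_name, data) pairs of a token stream."""
    messages, current = [], None
    for name, data in events:
        if name != "token":
            continue  # reasoning_step and final events are handled elsewhere
        kind = data.get("type")
        if kind == "message_start":
            current = []
        elif kind == "token_delta":
            if current is None:  # tolerate a missing message_start
                current = []
            current.append(data.get("text", ""))
        elif kind == "message_end" and current is not None:
            messages.append("".join(current))
            current = None
    if current:  # stream was cut off before message_end
        messages.append("".join(current))
    return messages
```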
iOS (Swift) streaming example (tokens)
import Foundation

let sid = "<SESSION_ID>"
var req = URLRequest(url: URL(string: "http://127.0.0.1:8080/api/v1/sessions/\(sid)/messages/stream_tokens")!)
req.httpMethod = "POST"
req.addValue("text/event-stream", forHTTPHeaderField: "Accept")
req.addValue("application/json", forHTTPHeaderField: "Content-Type")
req.addValue(ProcessInfo.processInfo.environment["ALIAS_API_KEY"] ?? "", forHTTPHeaderField: "X-CAI-API-Key")
req.httpBody = try JSONSerialization.data(withJSONObject: ["input": "Hi"], options: [])
// streamTask(with:) does not accept a URLRequest (it takes a host/port or NetService).
// URLSession.bytes(for:) (iOS 15+ / macOS 12+) delivers the response body incrementally.
// Run this inside an async, throwing context such as a Task.
let (bytes, _) = try await URLSession.shared.bytes(for: req)
for try await line in bytes.lines {
    // parse SSE lines: event: <name> / data: <json>
    print(line)
}
Implementation notes (for curious devs)
- /messages/stream never enables OpenAI chat completions token streaming. Instead:
  - We run the agent with non-streaming model calls and emit events via RunHooks (tool start/end, handoffs, agent switches).
  - We add one message step after each assistant turn (full text, no token deltas).
- This guarantees that model streaming stays off on that endpoint while still providing live step updates. Token deltas are only available from /messages/stream_tokens.
Schemas (request/response fields)
- HealthResponse
  - status: string
  - version: string
- CommandMetadata
  - name: string (e.g., "/memory")
  - description: string
  - aliases: string[] (e.g., ["/h"])
  - subcommands: string[] (e.g., ["show"])
- CommandsResponse
  - commands: CommandMetadata[]
- CommandRequest
  - args: string[] (optional)
  - auto_correct: boolean (default true)
- CommandResponse
  - handled: boolean
  - suggested_command: string | null
  - stdout: string
  - stderr: string
  - exit_code: number | null
- CreateSessionRequest
  - agent: string (optional; default from CAI_AGENT_TYPE)
  - model: string (optional; default from CAI_MODEL)
  - stateful: boolean (default true)
  - metadata: object (optional)
- SessionSummary
  - id: string (UUID)
  - agent: string
  - model: string
  - stateful: boolean
  - created_at: ISO8601 string
  - updated_at: ISO8601 string
  - history_length: number
  - metadata: object
- SessionDetail
  - All SessionSummary fields, plus:
  - history: ResponseInputItem[] (OpenAI Responses input items list: user/system/assistant/tool items)
- SessionsResponse
  - sessions: SessionSummary[]
- InferenceRequest
  - input: string | ResponseInputItem[]
  - context: object (optional)
  - max_turns: number (optional)
- RunResultPayload
  - messages: Item[] (list of semantic items generated during the run; see below)
  - history: ResponseInputItem[] (original input plus generated items, suitable to continue)
  - final_output: any (typed result if the agent defines an output schema; otherwise text or null)
  - text_output: string | null (last assistant text message, if any)
  - input_guardrails: object[] (guardrail outputs for input)
  - output_guardrails: object[] (guardrail outputs for final output)
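Typed clients can map these schemas onto small classes. The sketch below models only SessionSummary; the class and its `from_dict` constructor are hypothetical, and the exact timestamp format is an assumption (any ISO8601 string accepted by `datetime.fromisoformat`, with a trailing `Z` normalized first).

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class SessionSummary:
    id: str
    agent: str
    model: str
    stateful: bool
    created_at: datetime
    updated_at: datetime
    history_length: int
    metadata: dict = field(default_factory=dict)

    @classmethod
    def from_dict(cls, raw):
        """Build a summary from one entry of SessionsResponse.sessions."""
        def ts(value):
            # fromisoformat() before Python 3.11 rejects a trailing "Z".
            return datetime.fromisoformat(value.replace("Z", "+00:00"))

        return cls(
            id=raw["id"], agent=raw["agent"], model=raw["model"],
            stateful=raw["stateful"],
            created_at=ts(raw["created_at"]), updated_at=ts(raw["updated_at"]),
            history_length=raw["history_length"],
            metadata=raw.get("metadata") or {},
        )
```

Parsing the timestamps up front makes it straightforward to sort sessions or find stale ones by `updated_at`.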
Item: messages[] entry (non-streamed endpoint)
- Common envelope:
  - type: string (e.g., "message_output_item", "tool_call_item", "tool_call_output_item", "handoff_output_item")
  - agent: string | null (agent name that produced it)
  - payload: object (raw Pydantic model dump for the underlying output/input item)
  - output: any (only present for tool_call_output_item; the structured tool return value)
- message_output_item
  - payload: ResponseOutputMessage (OpenAI Responses message with content array)
  - text extraction: text_output consolidates last text chunk
- tool_call_item
  - payload: ResponseFunctionToolCall | ResponseComputerToolCall | ResponseFileSearchToolCall
  - typical fields (function call): name, arguments
- tool_call_output_item
  - output: any (decoded tool result)
- handoff_output_item
  - payload: handoff input item
  - Includes implicit source/target agent names in the envelope (agent + payload content)
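As an illustration of walking the envelope, the sketch below collects the function tool calls from a RunResultPayload. `tool_calls` is a hypothetical helper. It assumes function-call payloads carry `name` and `arguments` as listed above, and that `arguments` may arrive as a JSON-encoded string, in which case it is decoded when possible.

```python
import json


def tool_calls(result):
    """List (agent, tool_name, arguments) for every tool_call_item in result["messages"]."""
    calls = []
    for item in result.get("messages", []):
        if item.get("type") != "tool_call_item":
            continue
        payload = item.get("payload") or {}
        args = payload.get("arguments")
        # Arguments may be a JSON string; fall back to the raw value if decoding fails.
        if isinstance(args, str):
            try:
                args = json.loads(args)
            except ValueError:
                pass
        calls.append((item.get("agent"), payload.get("name"), args))
    return calls
```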
Streaming events (reasoning_step)
- Emitted from /messages/stream; one SSE event per step.
- step.type values and fields:
  - message
    - agent: string | null
    - text: string (full assistant message; no token deltas)
  - tool_call
    - agent: string | null
    - tool: string (tool/function name)
    - arguments: object | string (as available)
  - tool_output
    - agent: string | null
    - output: any (structured tool output)
  - handoff
    - from_agent: string | null
    - to_agent: string | null
  - agent_switched
    - agent: string | null (new active agent)
Final event (event: final)
- steps: the array of emitted reasoning_step payloads
- final_message: string | null
- final_output: any
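A client that shows progress to a user typically dispatches on `type`. The following sketch renders each reasoning_step payload as a single log line; `describe_step` and the output format are illustrative choices, not part of the API.

```python
def describe_step(step):
    """Turn one reasoning_step payload into a single log line."""
    kind = step.get("type")
    agent = step.get("agent") or "?"
    if kind == "message":
        return f"[{agent}] {step.get('text', '')}"
    if kind == "tool_call":
        return f"[{agent}] call {step.get('tool')}({step.get('arguments')})"
    if kind == "tool_output":
        return f"[{agent}] output: {step.get('output')}"
    if kind == "handoff":
        return f"handoff: {step.get('from_agent')} -> {step.get('to_agent')}"
    if kind == "agent_switched":
        return f"active agent: {agent}"
    # Unknown types are surfaced rather than dropped, in case the server adds new ones.
    return f"unknown step: {step!r}"
```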
Errors and status codes
- 401 Unauthorized: missing/invalid `X-CAI-API-Key` when auth is enabled
  - {"detail":"Invalid or missing API key"}
- 404 Not Found: e.g., unknown session id
  - {"detail":"Session not found"}
- 422 Unprocessable Entity: malformed request body
  - Standard FastAPI validation error
- 500 Internal Server Error: unexpected agent execution failure
  - {"detail":"Agent execution failed: ..."}
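One way to handle these statuses is to map them onto typed exceptions. This is a sketch only: every class and function name below is hypothetical and belongs to the example, not to CAI.

```python
class CaiApiError(Exception):
    """Base class for the hypothetical client-side errors below."""

    def __init__(self, status, detail):
        super().__init__(f"{status}: {detail}")
        self.status, self.detail = status, detail


class AuthError(CaiApiError): pass
class NotFoundError(CaiApiError): pass
class ValidationError(CaiApiError): pass
class ServerError(CaiApiError): pass


_ERRORS = {401: AuthError, 404: NotFoundError, 422: ValidationError, 500: ServerError}


def check_response(status, body):
    """Return body for 2xx statuses; raise a typed error otherwise."""
    if 200 <= status < 300:
        return body
    # Error bodies carry a "detail" field; keep the raw body if it is not a dict.
    detail = body.get("detail") if isinstance(body, dict) else body
    raise _ERRORS.get(status, CaiApiError)(status, detail)
```

Callers can then catch `NotFoundError` to recreate a session, or `AuthError` to prompt for a key, without inspecting status codes everywhere.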
Building a client (quick recipes)
Python (requests; SSE via iter_lines)
import json
import os
import requests

BASE = "http://127.0.0.1:8080/api/v1"
HEADERS = {"X-CAI-API-Key": os.environ.get("ALIAS_API_KEY", ""), "Content-Type": "application/json"}

# 1) Create session
sess = requests.post(f"{BASE}/sessions", headers=HEADERS, json={"agent":"redteam_agent","model":"alias1","stateful":True}).json()
sid = sess["id"]

# 2) Non-streamed
res = requests.post(f"{BASE}/sessions/{sid}/messages", headers=HEADERS, json={"input":"List current risks"}).json()
print(res["result"]["text_output"])  # final message

# 3) Streaming (SSE); the dict union operator "|" needs Python 3.9+
stream_headers = HEADERS | {"Accept": "text/event-stream"}
evt = None  # name taken from the most recent "event:" line
with requests.post(f"{BASE}/sessions/{sid}/messages/stream", headers=stream_headers, json={"input":"List current risks"}, stream=True) as r:
    for line in r.iter_lines(decode_unicode=True):
        if not line:
            continue
        if line.startswith("event:"):
            evt = line.split(":", 1)[1].strip()
        elif line.startswith("data:"):
            data = json.loads(line.split(":", 1)[1].strip())
            if evt == "reasoning_step":
                print("step:", data)
            elif evt == "final":
                print("final:", data)
Browser (EventSource)
// EventSource always issues a GET request and cannot set custom headers, so it can
// neither POST the prompt nor send X-CAI-API-Key by itself. Use it only behind a proxy
// that translates the request and injects the key; otherwise use fetch + ReadableStream.
const sid = "<SESSION_ID>"; // create via POST /sessions
const es = new EventSource(`http://localhost:8080/api/v1/sessions/${sid}/messages/stream`, {
  withCredentials: false
});
es.addEventListener('reasoning_step', ev => console.log('step', JSON.parse(ev.data)));
es.addEventListener('final', ev => console.log('final', JSON.parse(ev.data)));
Node (fetch + ReadableStream; set auth header)
import fetch from 'node-fetch';
const key = process.env.ALIAS_API_KEY;
const sid = process.env.SID;
const resp = await fetch(`http://localhost:8080/api/v1/sessions/${sid}/messages/stream`, {
  method: 'POST',
  headers: { 'Content-Type':'application/json', 'Accept':'text/event-stream', 'X-CAI-API-Key': key },
  body: JSON.stringify({ input: 'List current risks' })
});
for await (const chunk of resp.body) {
  const s = chunk.toString();
  // parse SSE lines: event: <name> / data: <json>
  process.stdout.write(s);
}
Best practices
- Always include Accept: text/event-stream for streaming.
- Expect multiple reasoning_step events, then exactly one final event.
- On /messages/stream no token deltas are emitted; each message step contains the full assistant message text. Use /messages/stream_tokens when you need token deltas.
- Tool calls can be frequent; handle backpressure in your client.
- Keep your connection timeouts relaxed for long runs.
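The ordering rule for /messages/stream can be checked on the client, for instance in tests or when debugging a proxy that may truncate streams. `stream_is_well_formed` is a hypothetical helper; accepting a stream with zero reasoning_step events is an assumption of this sketch.

```python
def stream_is_well_formed(event_names):
    """True when the names are zero or more reasoning_step events followed by one final."""
    names = list(event_names)
    if not names or names[-1] != "final":
        return False
    return all(name == "reasoning_step" for name in names[:-1])
```

This check does not apply as written to the token-streaming endpoints, which interleave `token` events.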
Request examples (quick copy/paste)
# Healthcheck
curl -s http://localhost:8080/api/v1/health
# List agents
curl -s -H "X-CAI-API-Key: $ALIAS_API_KEY" http://localhost:8080/api/v1/agents | jq .
# List models
curl -s -H "X-CAI-API-Key: $ALIAS_API_KEY" http://localhost:8080/api/v1/models | jq .
# List commands
curl -s -H "X-CAI-API-Key: $ALIAS_API_KEY" http://localhost:8080/api/v1/commands
# Run a command
curl -s -X POST http://localhost:8080/api/v1/commands/memory \
-H 'Content-Type: application/json' \
-H "X-CAI-API-Key: $ALIAS_API_KEY" \
-d '{"args": ["show"]}'
# Create a session
curl -s -X POST http://localhost:8080/api/v1/sessions \
-H 'Content-Type: application/json' \
-H "X-CAI-API-Key: $ALIAS_API_KEY" \
-d '{"agent": "redteam_agent", "model": "alias1", "stateful": true}'
# Interrupt and reload
curl -s -X POST -H "X-CAI-API-Key: $ALIAS_API_KEY" \
http://localhost:8080/api/v1/sessions/<SESSION_ID>/interrupt
curl -s -X POST -H "Content-Type: application/json" -H "X-CAI-API-Key: $ALIAS_API_KEY" \
-d '{"preserve_history": true}' \
http://localhost:8080/api/v1/sessions/<SESSION_ID>/reload
# Send a non-streamed prompt
curl -s -X POST http://localhost:8080/api/v1/sessions/<SESSION_ID>/messages \
-H 'Content-Type: application/json' \
-H "X-CAI-API-Key: $ALIAS_API_KEY" \
-d '{"input": "List current risks"}'
# Stream reasoning steps (SSE)
curl -N -X POST http://localhost:8080/api/v1/sessions/<SESSION_ID>/messages/stream \
-H 'Content-Type: application/json' \
-H 'Accept: text/event-stream' \
-H "X-CAI-API-Key: $ALIAS_API_KEY" \
-d '{"input": "List current risks"}'
# Reset and delete session
curl -s -X POST -H "X-CAI-API-Key: $ALIAS_API_KEY" http://localhost:8080/api/v1/sessions/<SESSION_ID>/reset
curl -s -X DELETE -H "X-CAI-API-Key: $ALIAS_API_KEY" http://localhost:8080/api/v1/sessions/<SESSION_ID>
Example CLIs
- examples/cai_api_cli.py: minimal loop (prompts → responses).
- examples/cai_api_tester.py: interactive menu that covers all endpoints including streaming.
GET /api/v1/agents
- Description: List available agents and patterns in the runtime (from `cai.agents`).
- Headers: `X-CAI-API-Key`
- Response 200 (AgentsResponse):
{
"agents": [
{
"name": "redteam_agent",
"description": "...",
"type": "agent",
"pattern_type": null,
"tools": [
{"name": "nmap_scan", "description": "Scan a host or subnet"},
{"name": "http_get", "description": "Fetch a URL"}
]
},
{
"name": "swarm_pattern",
"description": "Swarm agentic pattern",
"type": "pattern",
"pattern_type": "swarm",
"tools": []
}
]
}
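A client might use this response to pick an agent for a task. The sketch below filters an AgentsResponse dict by tool and by entry type; both helper names are hypothetical.

```python
def agents_with_tool(response, tool_name):
    """Names of entries in AgentsResponse.agents that expose the given tool."""
    return [
        agent["name"]
        for agent in response.get("agents", [])
        if any(tool.get("name") == tool_name for tool in agent.get("tools", []))
    ]


def patterns(response):
    """Map pattern name -> pattern_type for entries whose type is "pattern"."""
    return {
        agent["name"]: agent.get("pattern_type")
        for agent in response.get("agents", [])
        if agent.get("type") == "pattern"
    }
```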
GET /api/v1/models
- Description: List known models by combining the predefined model catalog and `pricings/pricing.json` if present.
- Headers: `X-CAI-API-Key`
- Response 200 (ModelsResponse):
{
"models": [
{
"name": "alias1",
"provider": "OpenAI",
"category": "Alias",
"description": "Best model for Cybersecurity AI tasks",
"input_cost": 0.50,
"output_cost": 0.50,
"pricing": {
"input_cost_per_token": 0.000005,
"output_cost_per_token": 0.000005,
"max_tokens": 128000,
"max_input_tokens": 200000,
"max_output_tokens": 128000,
"supports_function_calling": true,
"supports_vision": true,
"supports_response_schema": true,
"supports_tool_choice": true
}
}
]
}
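The `pricing` block can feed a rough cost estimate. This sketch assumes that `input_cost_per_token` and `output_cost_per_token` are per-token rates in one currency and that cost scales linearly with token counts; the page does not state the unit, so treat the result as indicative. `estimate_cost` is a hypothetical helper.

```python
def estimate_cost(model, input_tokens, output_tokens):
    """Estimate the cost of one call from a ModelsResponse entry, or None without pricing."""
    pricing = model.get("pricing") or {}
    in_rate = pricing.get("input_cost_per_token")
    out_rate = pricing.get("output_cost_per_token")
    if in_rate is None or out_rate is None:
        return None
    return input_tokens * in_rate + output_tokens * out_rate
```

Returning `None` rather than zero keeps "unknown pricing" distinguishable from "free".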
POST /api/v1/sessions/{id}/interrupt
- Description: Interrupt the currently running work (if any) for the given session. Cancels the active server-side run task.
- Headers: `X-CAI-API-Key`
- Response 200:
{"interrupted": true}
POST /api/v1/sessions/{id}/reload
- Description: Recreate the session’s agent. Optionally preserve message history.
- Headers: `X-CAI-API-Key`, `Content-Type: application/json`
- Body:
{"preserve_history": true}
- Response 200: SessionDetailModel
POST /api/v1/sessions/{id}/ux/final_message/stream_tokens (SSE)
- Description: Stream a final assistant message (token-level) that explains to the user what just happened. Your app calls this after a task completes, sending a prompt (tone/instructions) and optionally the steps you observed client-side; if you omit steps, the backend uses server-side steps.
- Headers: `X-CAI-API-Key`, `Content-Type: application/json`, `Accept: text/event-stream`
- Body (FinalMessageRequest):
{
"prompt": "Explain to the user what we found and next steps.",
"steps": [ /* optional: client-collected steps; otherwise server uses session.last_steps */ ],
"include_history": true,
"max_turns": 8
}
- Stream events:
  - `event: token` with `{ "type": "message_start" }`
  - `event: token` with `{ "type": "token_delta", "text": "..." }` repeated
  - `event: token` with `{ "type": "message_end" }`
  - `event: reasoning_step` may appear if the UX agent emits steps
  - `event: final` with `{ "steps": [...], "final_message": "...", "final_output": ... }`
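Building the request body is the only endpoint-specific part on the client. The sketch below omits `steps` when none were collected, so the backend falls back to its own record. `final_message_request` is a hypothetical helper, and the defaults for `include_history` and `max_turns` are copied from the example body above rather than from documented server defaults.

```python
def final_message_request(prompt, steps=None, include_history=True, max_turns=8):
    """Build a FinalMessageRequest body; leave out steps to let the server use its own."""
    body = {"prompt": prompt, "include_history": include_history, "max_turns": max_turns}
    if steps is not None:
        body["steps"] = list(steps)
    return body
```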
Notes for iOS: the client-side notes for this endpoint are listed at the end of this page.
POST /api/v1/ux/title
- Description: Generates a concise title through a single tool call on the `alias1` model via LiteLLM. Does not use sessions.
- Headers: `X-CAI-API-Key`, `Content-Type: application/json`
- Body:
{
"messages": [
{"role": "user", "content": "Analyze CVE-2024-..."}
],
"title_hint": "(optional)"
}
- Response 200:
{"title": "Analyzing CVE-2024-..."}
POST /api/v1/ux/summarize
- Description: Returns a one-line summary using a single tool call on `alias1` via LiteLLM. Does not use sessions.
- Headers: `X-CAI-API-Key`, `Content-Type: application/json`
- Body:
{
"messages": [
{"role": "user", "content": "Scan 10.0.0.5"}
],
"steps": [
{"type": "tool_call", "agent": "Red Team", "tool": "nmap_scan", "arguments": {"target": "10.0.0.5"}},
{"type": "tool_output", "agent": "Red Team"}
],
"max_len": 100
}
- Response 200:
{"summary_text": "Tool output processed by Red Team"}
Implementation notes
- Both endpoints force tool_choice: required with a single function, produce_title_and_summary, and always use model: alias1 with the Alias api_base and ALIAS_API_KEY.
- The server neither stores nor reads session state.
Notes for iOS (POST /api/v1/sessions/{id}/ux/final_message/stream_tokens)
- Call this to stream the “final message” of a task. Use a UX prompt tuned to your voice (“Explain briefly in a friendly tone, with next steps”).
- If you already collected steps client-side, pass them; otherwise the backend uses session.last_steps.
- Render arriving token_delta chunks into the chat bubble; close on message_end/final.