LLM Council

The LLM Council is a multi-model consensus system: it queries several AI models, has them evaluate each other's responses, and has a chairman synthesize a final answer. Drawing on diverse model perspectives is intended to improve accuracy on complex decisions.

How It Works

The council operates in three stages:

User Query: "What's the best approach for this security task?"
│
├─ Stage 1: Council Members (parallel)
│  ├─ Agent + gpt-4o           → Text response only
│  ├─ Agent + gpt-5            → Text response only
│  └─ Agent + claude-sonnet-4-5 → Text response only
│
├─ Stage 2: Rankings (parallel)
│  ├─ Agent + gpt-4o           → Ranks anonymized responses
│  ├─ Agent + gpt-5            → Ranks anonymized responses
│  └─ Agent + claude-sonnet-4-5 → Ranks anonymized responses
│
└─ Stage 3: Chairman
   └─ Active Agent + TOOLS
      ├─ Synthesizes best answer
      └─ Can execute operations if requested

Key Points:

Stages 1 and 2: Council members return text-only responses (no tool execution)
Stage 3: The chairman (your active agent) can use all available tools
All members use the current agent's instructions and context
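The three stages above can be sketched in plain Python. This is an illustrative simulation, not CAI's implementation: `fake_model`, `fake_ranking`, `run_council`, the "Response A/B/C" labels, and the "shorter is better" ranking rule are all hypothetical stand-ins for real model calls.

```python
import asyncio
from statistics import mean


async def fake_model(model: str, prompt: str) -> str:
    """Hypothetical stand-in for a real model call; returns canned text."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return f"{model} answer to: {prompt}"


async def run_council(models: list[str], query: str) -> dict:
    # Stage 1: every member answers in parallel (text only, no tools).
    answers = await asyncio.gather(*(fake_model(m, query) for m in models))

    # Anonymize: reviewers see "Response A", "Response B", ... not model names.
    labels = [f"Response {chr(ord('A') + i)}" for i in range(len(models))]
    text_by_label = dict(zip(labels, answers))
    model_by_label = dict(zip(labels, models))

    async def fake_ranking(reviewer: str) -> list[str]:
        """Placeholder reviewer: prefers shorter responses, best first."""
        await asyncio.sleep(0)
        return sorted(labels, key=lambda label: (len(text_by_label[label]), label))

    # Stage 2: every member ranks the anonymized responses in parallel.
    rankings = await asyncio.gather(*(fake_ranking(m) for m in models))

    # Aggregate: average position of each response (1 = best) across reviewers.
    average_rank = {
        model_by_label[label]: mean(r.index(label) + 1 for r in rankings)
        for label in labels
    }

    # Stage 3: the chairman synthesizes a final answer. This stub just
    # returns the best-ranked response instead of calling a model with tools.
    best = min(average_rank, key=average_rank.get)
    return {"final": answers[models.index(best)], "average_rank": average_rank}
```

Running `asyncio.run(run_council(["gpt-4o", "gpt-5", "claude-sonnet-4-5"], "..."))` exercises all three stages without any API keys, which makes the data flow easy to inspect.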

Quick Start

# Configure council models
export CAI_COUNCIL="gpt-4o,gpt-5,claude-sonnet-4-20250514"

# In CAI REPL, load an agent first
CAI> /agent redteam

# Use the council command
CAI> /council What are the best practices for API security?

# Or use the short alias
CAI> /c How should I approach this vulnerability assessment?

Configuration

Environment Variables

Variable	Description	Default
`CAI_COUNCIL`	Comma-separated list of council member models	`gpt-4o,gpt-4o-mini`
`CAI_COUNCIL_AUTO`	Auto-convene setting: `false` (never), `true`/`1` (every interaction), or an integer N (every N interactions)	`false`
`CAI_COUNCIL_PROMPT`	Custom prompt for auto-council reviews	See below
`CAI_COUNCIL_DEBUG`	Enable debug output (`1`, `true`, `yes`)	`false`

# Example configuration
export CAI_COUNCIL="gpt-4o,gpt-5,claude-sonnet-4-20250514"
export CAI_COUNCIL_AUTO="5"
export CAI_COUNCIL_PROMPT="Review the current progress and recommend the best approach."
export CAI_COUNCIL_DEBUG="1"
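As a rough illustration of how these variables could be read from Python, the sketch below parses `CAI_COUNCIL` and `CAI_COUNCIL_DEBUG`. The helper names `council_models` and `debug_enabled` are hypothetical, and the whitespace trimming and case-insensitive matching are assumptions rather than documented CAI behavior.

```python
import os

DEFAULT_COUNCIL = "gpt-4o,gpt-4o-mini"  # default listed in the table above


def council_models(env=os.environ) -> list[str]:
    """Split CAI_COUNCIL into a list of model names (hypothetical helper)."""
    raw = env.get("CAI_COUNCIL", DEFAULT_COUNCIL)
    # Assumption: surrounding whitespace and empty entries are ignored.
    return [name.strip() for name in raw.split(",") if name.strip()]


def debug_enabled(env=os.environ) -> bool:
    """True when CAI_COUNCIL_DEBUG is one of 1, true, yes (hypothetical helper)."""
    # Assumption: the comparison is case-insensitive.
    return env.get("CAI_COUNCIL_DEBUG", "false").strip().lower() in {"1", "true", "yes"}
```

Passing a plain dictionary as `env` makes the helpers easy to try without touching the real environment.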

API Keys

Ensure you have the appropriate API keys set for your council models:

export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export ALIAS_API_KEY="..."
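A quick preflight check can catch a missing key before the council runs. The sketch below is not part of CAI; `missing_keys` is a hypothetical helper, and you would pass only the variable names for the providers your council actually uses.

```python
import os

# Variable names taken from the export lines above.
PROVIDER_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "ALIAS_API_KEY")


def missing_keys(names=PROVIDER_KEYS, env=os.environ) -> list[str]:
    """Return the names that are unset or blank (hypothetical helper)."""
    return [name for name in names if not env.get(name, "").strip()]


if __name__ == "__main__":
    for name in missing_keys():
        print(f"warning: {name} is not set")
```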

Verified Model Names

Use exact model names as shown in the /model command:

Provider	Models
OpenAI	`gpt-5`, `gpt-4o`, `gpt-4o-mini`, `o3-mini`
Anthropic	`claude-sonnet-4-20250514`, `claude-3-5-sonnet-20240620`
Alias	`alias1`
DeepSeek	`deepseek-v3`, `deepseek-r1`

Manual Usage

The /council command (alias /c) invokes the council manually:

# Load an agent
CAI> /agent redteam

# Ask the council
CAI> /council What vulnerabilities should I look for in this web application?

The council uses the active agent's:

Instructions/system prompt
Available tools (chairman only)
Guardrails

Auto-Council Mode

When CAI_COUNCIL_AUTO is enabled, the council convenes automatically at specified intervals during agent execution.

Configuration Options

false - Never auto-convene (use /council manually)
true or 1 - Convene at every agent interaction
5, 10, etc. - Convene every N interactions
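The three options above reduce to a single interval value. The following sketch shows one way to normalize the setting; `auto_interval` is a hypothetical helper, and rejecting zero or negative numbers is an assumption, since this document does not say how CAI treats them.

```python
from typing import Optional


def auto_interval(value: str) -> Optional[int]:
    """Map a CAI_COUNCIL_AUTO string to an interval, or None when disabled."""
    text = value.strip().lower()
    if text in {"", "false"}:
        return None  # never auto-convene
    if text == "true":
        return 1  # same as "1": convene at every interaction
    interval = int(text)  # raises ValueError for non-numeric input
    if interval < 1:
        # Assumption: non-positive intervals are treated as invalid.
        raise ValueError(f"interval must be >= 1, got {interval}")
    return interval
```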

Example: Every Interaction

export CAI_COUNCIL_AUTO="1"

CAI> run ps aux, then analyze the results, then check for vulnerabilities

🏛️ COUNCIL (auto-invoked at interaction [1])
[Stage 1, 2, 3 run...]
[1] Agent: "I'll run ps aux" → executes command

🏛️ COUNCIL (auto-invoked at interaction [2])
[Stage 1, 2, 3 run...]
[2] Agent: "Analyzing results..." → analyzes output

🏛️ COUNCIL (auto-invoked at interaction [3])
[Stage 1, 2, 3 run...]
[3] Agent: "Checking for vulnerabilities..." → performs check

Example: Every 5 Interactions

export CAI_COUNCIL_AUTO="5"

CAI> perform a comprehensive security audit

[1] Agent executes first task
[2] Agent executes second task
[3] Agent executes third task
[4] Agent executes fourth task

🏛️ COUNCIL (auto-invoked at interaction [5])
[Stage 1, 2, 3 run...]
[5] Agent executes fifth task

[6] Agent continues...
[7] Agent continues...
[8] Agent continues...
[9] Agent continues...

🏛️ COUNCIL (auto-invoked at interaction [10])
[Stage 1, 2, 3 run...]
[10] Agent executes tenth task
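The two traces above are consistent with a simple rule: count interactions from 1 and convene whenever the count is a multiple of the interval. The sketch below encodes that reading; `council_turns` is a hypothetical helper, and the rule is inferred from the examples rather than taken from CAI's source.

```python
def council_turns(interval: int, total_interactions: int) -> list[int]:
    """List the interaction numbers at which the council would convene."""
    if interval < 1:
        raise ValueError("interval must be >= 1")
    # Interactions are numbered from 1, matching the [1], [2], ... labels above.
    return [i for i in range(1, total_interactions + 1) if i % interval == 0]


if __name__ == "__main__":
    print(council_turns(1, 3))   # every interaction
    print(council_turns(5, 10))  # every fifth interaction
```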

Programmatic Usage

You can use the council directly in Python code:

import asyncio

from cai.council import run_full_council_agents, CouncilAgentConfig
from cai.sdk.agents import Agent


async def ask_council(my_agent: Agent, question: str) -> None:
    # Run all three stages, using an existing agent as the base
    stage1, stage2, stage3, metadata = await run_full_council_agents(
        base_agent=my_agent,
        user_query=question,
    )

    # Access results
    print(stage3["response"])  # Final answer
    print(metadata["aggregate_rankings"])  # Model rankings
    print(metadata["council_cost"])  # Total cost
    print(metadata["council_input_tokens"])  # Input tokens
    print(metadata["council_output_tokens"])  # Output tokens


# run_full_council_agents must be awaited, so call it from async code or via:
# asyncio.run(ask_council(my_agent, "Your question here"))

Return Values

stage1_results: List[Dict]  # Individual responses from each model
stage2_results: List[Dict]  # Rankings from each model
stage3_result: Dict         # Final synthesized answer
metadata: Dict              # Rankings, cost, tokens

Metadata Structure

metadata = {
    "aggregate_rankings": [
        {"model": "gpt-4o", "average_rank": 1.33, "rankings_count": 3},
        {"model": "gpt-5", "average_rank": 2.0, "rankings_count": 3},
    ],
    "council_cost": 0.032,
    "council_input_tokens": 5000,
    "council_output_tokens": 2500,
}
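Given a dictionary shaped like the one above, a short helper can pull out the headline numbers. This is a sketch under two assumptions: `summarize` is a hypothetical name, and a lower `average_rank` is taken to mean a better-ranked model.

```python
def summarize(metadata: dict) -> str:
    """Build a one-line summary from council metadata (hypothetical helper)."""
    # Assumption: lower average_rank is better (rank 1 = best).
    ranked = sorted(metadata["aggregate_rankings"], key=lambda r: r["average_rank"])
    best = ranked[0]
    total_tokens = metadata["council_input_tokens"] + metadata["council_output_tokens"]
    cost_per_1k = metadata["council_cost"] / total_tokens * 1000
    return (
        f"top: {best['model']} (avg rank {best['average_rank']:.2f}), "
        f"{total_tokens} tokens, ${cost_per_1k:.4f} per 1k tokens"
    )


# Sample values copied from the structure shown above.
sample = {
    "aggregate_rankings": [
        {"model": "gpt-4o", "average_rank": 1.33, "rankings_count": 3},
        {"model": "gpt-5", "average_rank": 2.0, "rankings_count": 3},
    ],
    "council_cost": 0.032,
    "council_input_tokens": 5000,
    "council_output_tokens": 2500,
}

if __name__ == "__main__":
    print(summarize(sample))
```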

Visual Display

During execution, the council shows an animated panel with progress:

╭───────────────────────  Alias Council  ────────────────────────╮
│                                                                 │
│  👑 Chairman: Red Team Agent (gpt-4o)                          │
│                                                                 │
│  ⠋ Stage 1: Collecting responses from council members          │
│      ██████████░░░░░░░░░░ 2/3                                  │
│       ✓   gpt-4o                                               │
│       ✓   gpt-5                                                │
│       ⠋   alias1                                               │
│                                                                 │
│  ○ Stage 2: Waiting...                                         │
│                                                                 │
│  💰 $0.012 (1.2k in / 800 out) ⏱ 15.2s                        │
│                                                                 │
╰─────────────────────────────────────────────────────────────────╯

Performance Considerations

Metric	Single Query	Council (3 models)
API Calls	1	7 (2N + 1 for N members)
Cost	1x	3-4x
Latency	1x	2-3x
Accuracy	Base	Improved
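The 2N + 1 figure in the table follows from the three stages: N Stage 1 responses, N Stage 2 rankings, and one chairman call. A small sketch makes the growth explicit; `council_api_calls` is a hypothetical helper, and it assumes the chairman makes exactly one model call, which may not hold if the chairman goes on to use tools.

```python
def council_api_calls(n_members: int) -> int:
    """Model calls for one council run: N answers + N rankings + 1 chairman."""
    if n_members < 1:
        raise ValueError("a council needs at least one member")
    return 2 * n_members + 1


if __name__ == "__main__":
    for n in (2, 3, 5):
        print(f"{n} members: {council_api_calls(n)} calls")
```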

When to Use Council

Use the council when accuracy matters more than speed or cost. It's particularly valuable for:

Complex security decisions
Architecture recommendations
Vulnerability assessments
Strategic planning tasks

Troubleshooting

Debug Mode

Enable detailed logging to diagnose issues:

export CAI_COUNCIL_DEBUG=1

Common Issues

"All models failed to respond"

Verify API keys are set correctly
Check model names with /model command
Check for rate limiting

Council hangs on Stage 1

Model name might be incorrect (verify with /model)
API key invalid or missing
Network connectivity issues
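When calling the council from Python, one generic way to avoid waiting indefinitely is to wrap the call in a timeout. The sketch below uses only the standard library's `asyncio.wait_for`; `with_timeout` is a hypothetical helper, not a CAI feature, and it assumes the wrapped coroutine responds to cancellation.

```python
import asyncio


async def with_timeout(coro, seconds: float):
    """Await coro, returning None if it does not finish within `seconds`."""
    try:
        return await asyncio.wait_for(coro, timeout=seconds)
    except asyncio.TimeoutError:
        # wait_for cancels the coroutine before raising, so nothing is left running.
        return None


# In real use, `coro` would be something like
# run_full_council_agents(base_agent=my_agent, user_query="...").
```

A `None` result then signals that it is worth checking the model names, keys, and network items listed above.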

"Temperature not supported"

Handled automatically for GPT-5, o1, and o3 models (temperature is set to 1)

Test Individual Models

Before using the council, verify that each model works on its own:

CAI> /model gpt-4o
CAI> What is 2+2?

Minimal Configuration

If you are experiencing issues, try a minimal setup:

export CAI_COUNCIL="gpt-4o,gpt-4o-mini"

Credits

Inspired by llm-council by Andrej Karpathy.