LLM Council
The LLM Council is a multi-model consensus system that queries multiple AI models, has them evaluate each other's responses, and synthesizes a final answer through a chairman. This approach improves accuracy for complex decisions by leveraging diverse model perspectives.
How It Works
The council operates in three stages:
User Query: "What's the best approach for this security task?"
│
├─ Stage 1: Council Members (parallel)
│ ├─ Agent + gpt-4o → Text response only
│ ├─ Agent + gpt-5 → Text response only
│ └─ Agent + claude-sonnet-4-5 → Text response only
│
├─ Stage 2: Rankings (parallel)
│ ├─ Agent + gpt-4o → Ranks anonymized responses
│ ├─ Agent + gpt-5 → Ranks anonymized responses
│ └─ Agent + claude-sonnet-4-5 → Ranks anonymized responses
│
└─ Stage 3: Chairman
└─ Active Agent + TOOLS
├─ Synthesizes best answer
└─ Can execute operations if requested
Key Points:
- Stage 1 & 2: Council members provide text-only responses (no tool execution)
- Stage 3: The chairman (your active agent) can use all available tools
- All members use the current agent's instructions and context
Quick Start
# Configure council models
export CAI_COUNCIL="gpt-4o,gpt-5,claude-sonnet-4-20250514"
# In CAI REPL, load an agent first
CAI> /agent redteam
# Use the council command
CAI> /council What are the best practices for API security?
# Or use the short alias
CAI> /c How should I approach this vulnerability assessment?
Configuration
Environment Variables
| Variable | Description | Default |
|---|---|---|
CAI_COUNCIL |
Comma-separated list of council member models | gpt-4o,gpt-4o-mini |
CAI_COUNCIL_AUTO |
Auto-convene setting: false, true/1, or interval number |
false |
CAI_COUNCIL_PROMPT |
Custom prompt for auto-council reviews | See below |
CAI_COUNCIL_DEBUG |
Enable debug output (1, true, yes) |
false |
# Example configuration
export CAI_COUNCIL="gpt-4o,gpt-5,claude-sonnet-4-20250514"
export CAI_COUNCIL_AUTO="5"
export CAI_COUNCIL_PROMPT="Review the current progress and recommend the best approach."
export CAI_COUNCIL_DEBUG="1"
API Keys
Ensure you have the appropriate API keys set for your council models:
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export ALIAS_API_KEY="..."
Verified Model Names
Use exact model names as shown in the /model command:
| Provider | Models |
|---|---|
| OpenAI | gpt-5, gpt-4o, gpt-4o-mini, o3-mini |
| Anthropic | claude-sonnet-4-20250514, claude-3-5-sonnet-20240620 |
| Alias | alias1 |
| DeepSeek | deepseek-v3, deepseek-r1 |
Manual Usage
The /council command (alias /c) invokes the council manually:
# Load an agent
CAI> /agent redteam
# Ask the council
CAI> /council What vulnerabilities should I look for in this web application?
The council uses the active agent's:
- Instructions/system prompt
- Available tools (chairman only)
- Guardrails
Auto-Council Mode
When CAI_COUNCIL_AUTO is enabled, the council convenes automatically at specified intervals during agent execution.
Configuration Options
false- Never auto-convene (use/councilmanually)trueor1- Convene at every agent interaction5,10, etc. - Convene every N interactions
Example: Every Interaction
export CAI_COUNCIL_AUTO="1"
CAI> run ps aux, then analyze the results, then check for vulnerabilities
🏛️ COUNCIL (auto-invoked at interaction [1])
[Stage 1, 2, 3 run...]
[1] Agent: "I'll run ps aux" → executes command
🏛️ COUNCIL (auto-invoked at interaction [2])
[Stage 1, 2, 3 run...]
[2] Agent: "Analyzing results..." → analyzes output
🏛️ COUNCIL (auto-invoked at interaction [3])
[Stage 1, 2, 3 run...]
[3] Agent: "Checking for vulnerabilities..." → performs check
Example: Every 5 Interactions
export CAI_COUNCIL_AUTO="5"
CAI> perform a comprehensive security audit
[1] Agent executes first task
[2] Agent executes second task
[3] Agent executes third task
[4] Agent executes fourth task
🏛️ COUNCIL (auto-invoked at interaction [5])
[Stage 1, 2, 3 run...]
[5] Agent executes fifth task
[6] Agent continues...
[7] Agent continues...
[8] Agent continues...
[9] Agent continues...
🏛️ COUNCIL (auto-invoked at interaction [10])
[Stage 1, 2, 3 run...]
[10] Agent executes tenth task
Programmatic Usage
You can use the council directly in Python code:
from cai.council import run_full_council_agents, CouncilAgentConfig
from cai.sdk.agents import Agent
# With an existing agent
stage1, stage2, stage3, metadata = await run_full_council_agents(
base_agent=my_agent,
user_query="Your question here",
)
# Access results
print(stage3["response"]) # Final answer
print(metadata["aggregate_rankings"]) # Model rankings
print(metadata["council_cost"]) # Total cost
print(metadata["council_input_tokens"]) # Input tokens
print(metadata["council_output_tokens"]) # Output tokens
Return Values
stage1_results: List[Dict] # Individual responses from each model
stage2_results: List[Dict] # Rankings from each model
stage3_result: Dict # Final synthesized answer
metadata: Dict # Rankings, cost, tokens
Metadata Structure
metadata = {
"aggregate_rankings": [
{"model": "gpt-4o", "average_rank": 1.33, "rankings_count": 3},
{"model": "gpt-5", "average_rank": 2.0, "rankings_count": 3},
],
"council_cost": 0.032,
"council_input_tokens": 5000,
"council_output_tokens": 2500,
}
Visual Display
During execution, the council shows an animated panel with progress:
╭─────────────────────── Alias Council ────────────────────────╮
│ │
│ 👑 Chairman: Red Team Agent (gpt-4o) │
│ │
│ ⠋ Stage 1: Collecting responses from council members │
│ ██████████░░░░░░░░░░ 2/3 │
│ ✓ gpt-4o │
│ ✓ gpt-5 │
│ ⠋ alias1 │
│ │
│ ○ Stage 2: Waiting... │
│ │
│ 💰 $0.012 (1.2k in / 800 out) ⏱ 15.2s │
│ │
╰─────────────────────────────────────────────────────────────────╯
Performance Considerations
| Metric | Single Query | Council (3 models) |
|---|---|---|
| API Calls | 1 | ~7 (2N + 1) |
| Cost | 1x | 3-4x |
| Latency | 1x | 2-3x |
| Accuracy | Base | Improved |
When to Use Council
Use the council when accuracy matters more than speed or cost. It's particularly valuable for:
- Complex security decisions
- Architecture recommendations
- Vulnerability assessments
- Strategic planning tasks
Troubleshooting
Debug Mode
Enable detailed logging to diagnose issues:
export CAI_COUNCIL_DEBUG=1
Common Issues
"All models failed to respond"
- Verify API keys are set correctly
- Check model names with
/modelcommand - Check for rate limiting
Council hangs on Stage 1
- Model name might be incorrect (verify with
/model) - API key invalid or missing
- Network connectivity issues
"Temperature not supported"
- Handled automatically for GPT-5/O1/O3 models (temperature set to 1)
Test Individual Models
Before using council, verify each model works independently:
CAI> /model gpt-4o
CAI> What is 2+2?
Minimal Configuration
If experiencing issues, try a minimal setup:
export CAI_COUNCIL="gpt-4o,gpt-4o-mini"
Credits
Inspired by llm-council by Andrej Karpathy.