18 Commits

Author SHA1 Message Date
tgrosinger 2163ea45d2 OpenCode: Add OpenCode as a new provider
The OpenCode provider allows using a variety of models with an agent
harness that can gather more information from the codebase as required
(like with claude-code, codex, or gemini-cli).

This is an alternative to using OpenRouter directly, where the api
provider is more like a chatbot and cannot gather any additional context
beyond what was handed to it.
2026-05-29 16:19:13 -07:00
tgrosinger e4790ac77e Allow specifying tmp dir when preparing prompt 2026-05-29 16:19:13 -07:00
tgrosinger f642e58070 Claude: Use default xhigh effort
With opus-4.8, Claude defaults to "high". Bump up one level for review.
2026-05-28 10:24:02 -07:00
tgrosinger d666f7e08b Codex: Restrict permissions 2026-05-28 10:22:55 -07:00
tgrosinger 9e7989671a Add a flag to disable jokes 2026-05-28 10:22:53 -07:00
tgrosinger a8578beacd OpenRouter: Add OpenRouter as a new provider 2026-05-28 10:22:51 -07:00
tgrosinger 823333a4f5 Claude: Remove dangerously-skip-permissions
Instead, hard-code a list of allowed tools for claude that gives it
general read access.
2026-05-27 20:26:09 -07:00
Li Liu cafd72bcd5 fix: claude-code provider explicitly passes --effort max
settings.json effortLevel="max" gets silently demoted to "xhigh" by the
schema validator; the CLI flag form is honored correctly. Pass --effort max
explicitly so every claude-code invocation (reviewer / analyzer / summarizer /
audit) actually runs at the real max effort tier rather than the demoted
xhigh tier from settings.json fallback.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 00:20:32 +00:00
Li Liu d2ec538dbf feat: audit reads house-rules from ~/.magpie/house-rules/<owner>_<repo>.md
Per-repo conventions now live at a stable user-level path instead of being
read from cwd. Audit extracts owner/repo from the PR URL in taskPrompt and
looks up ~/.magpie/house-rules/<owner>_<repo>.md. Works for both bot mode
and CLI mode without anyone needing to stage files into the worktree.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 00:07:39 +00:00
Li Liu 30be792070 fix: assign verifyIssues return value back to parsedIssues 2026-05-26 23:23:11 +00:00
Li Liu e3fd28c0f0 feat: omniscient audit + tightened reviewer/structurizer/analyzer prompts
Major changes:

1. Audit (verifyIssues) rewrite — now THE final judge instead of a severity-recalibrator:
   - Inputs structured issues + Task line with PR URL (audit fetches diff itself via
     `gh pr diff` + Read/Grep/Glob). Does NOT consume reviewer chat or pre-stuffed diff.
   - Output schema extended: verdict (keep/rewrite/drop/new), body, evidence, reason.
   - Can DELETE false positives (not just downgrade), REWRITE weak descriptions, ADD
     missed issues — especially cross-file pattern repetition.
   - Optional .magpie-house-rules.md picked up from cwd as authoritative repo conventions.
   - New config block `audit:` with claude-opus-4-7 + max effort by default.

2. Reviewer prompts (Round 1 + Round 2):
   - Add severity vocabulary at reviewer stage (was only at structurizer before).
   - Add reverse rubric: do NOT report build script polish, missing comments, forward-
     compat hypotheticals, style preferences, theoretical-but-impossible cases.
   - Require file:line + code quote + failure scenario for every issue.
   - Drop "Review EVERY file / don't stop early" — brevity over completeness.
   - Round 2: drop "find what others MISSED" anti-pattern; agreeing is fine.

3. Structurizer:
   - line field now REQUIRED (drop issues that can't be anchored to a hunk line).
   - Description must capture WHY + FAILURE scenario + FIX (so audit has basis to verify).
   - Drop "STRICT — choose LOWER" severity bias.

4. Analyzer: add 6th "建议的 review 重点" ("Suggested review focus") section; parseFocusAreas now matches
   English + Chinese headings, with-/no-space, bold variant; handles `-` `*` `•`
   `·` ①-⑳ `1.`/`1、`/`1)` bullets.

5. Convergence judge: fix parse bug (verdict swallowed by trailing punctuation);
   explicit one-word verdict format constraint.

Schema additions:
- MergedIssue: verdict, body, evidence, auditReason
- MagpieConfig: audit?: ReviewerConfig
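
The bullet styles listed in item 4 can be illustrated with a small sketch. This is not the actual `parseFocusAreas` code; `stripBullet` and `BULLET_PREFIX` are hypothetical names, and the pattern only covers the markers named in the commit message (`-`, `*`, `•`, `·`, ①-⑳, `1.`, `1、`, `1)`).

```typescript
// Hypothetical helper: strips one leading bullet marker from a line.
// The circled digits 1-20 are the code points U+2460 through U+2473.
// `\*(?!\*)` avoids eating the opening of a Markdown bold marker (`**`).
const BULLET_PREFIX = /^\s*(?:[-•·]|\*(?!\*)|[\u2460-\u2473]|\d+[.、)])\s*/

function stripBullet(line: string): string {
  return line.replace(BULLET_PREFIX, '').trim()
}
```

A line with no recognised marker is returned unchanged apart from trimming, so the helper can be applied to every line under a matched heading.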

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 22:34:33 +00:00
Li Liu 6862947368 test: drop obsolete summaries assertion after summary-step removal
The per-reviewer summary step was removed in 0f03726, dropping the
summaries field from DebateResult, but this test still asserted on it
and failed. Remove the stale assertion.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 14:30:06 -07:00
Li Liu 629ed8b00e feat: add --fail-fast option to abort review/discuss on any reviewer failure
By default the orchestrator is resilient: a single reviewer (or context
gatherer) failure is logged and the round continues with the survivors,
aborting only when all reviewers fail.

The new --fail-fast flag flips to strict mode — any reviewer or
context-gathering failure re-throws immediately and terminates the
whole flow. Wired through the review and discuss commands via
OrchestratorOptions.failFast, with a regression test and README docs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 14:30:06 -07:00
Li Liu da7097c1b6 fix: use stream-json output to prevent false inactivity timeout
Claude CLI in -p mode only outputs text to stdout when generating the
final response. During tool-heavy reviews (reading files, running
commands), no stdout/stderr is produced, causing the 900s inactivity
timeout to kill actively-working Claude processes.

Switch runClaudeStream to --output-format stream-json --verbose, which
emits JSON events for every tool call, tool result, and assistant
message. This keeps lastActivity alive during tool execution. The final
result text is extracted from the "result" event.
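
A minimal sketch of the idea, not the actual `runClaudeStream` code: every chunk of output refreshes the activity timestamp, and the final text is taken from the "result" event. The names `StreamState` and `consumeChunk` are hypothetical, and the event shape `{ type: 'result', result: string }` is an assumption about the stream-json output.

```typescript
interface StreamState {
  lastActivity: number   // timestamp of the most recent output
  buffer: string         // holds a partial line between chunks
  result: string | null  // final text, once the "result" event arrives
}

function consumeChunk(state: StreamState, chunk: string, now: number): void {
  state.lastActivity = now // any output, including tool events, counts as activity
  state.buffer += chunk
  let nl: number
  while ((nl = state.buffer.indexOf('\n')) >= 0) {
    const line = state.buffer.slice(0, nl).trim()
    state.buffer = state.buffer.slice(nl + 1)
    if (!line) continue
    try {
      const event = JSON.parse(line)
      // Assumed shape: the final event carries the response text in `result`.
      if (event && event.type === 'result' && typeof event.result === 'string') {
        state.result = event.result
      }
    } catch {
      // Ignore lines that are not valid JSON.
    }
  }
}
```

Passing `now` in explicitly keeps the sketch deterministic; a caller would typically pass `Date.now()` from the stdout `data` handler.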

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-06 12:43:28 +00:00
Li Liu 20d5434e13 fix: reset CLI session on error to prevent stale session reuse
When chat/chatStream throws (timeout, crash, etc.), the session ID was
left intact, causing subsequent rounds to --resume a dead session and
fail with "Session ID already in use". Now all 4 CLI providers reset
to a fresh session ID on error.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-28 03:04:41 +00:00
Li Liu 577121675c docs: update README for code-aware review pipeline
Reflect the new architecture: code-aware reviewers, integrated
verify+audit, multi-language context gathering, --no-conclusion flag,
and updated review dimensions (compatibility, extensibility,
feature interaction).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-24 11:01:51 +00:00
Li Liu afaa4d8f90 feat: major review pipeline overhaul — code-aware reviewers, integrated verify+audit
- Reviewers now fetch diff and read code themselves (CLI providers)
  instead of receiving pre-embedded diff text. Enables verification
  of issues against actual code context during review.
- Merge audit into magpie as verifyIssues() with Read/Grep/Glob tools,
  replacing the separate downstream audit step in li-bot.
- Add --no-conclusion flag to skip summarize step (bot mode).
- Context gatherer: support Go/C++/Proto/Python/Java/Scala symbol
  extraction and multi-language grep (was JS/TS only).
- Structurizer: standardize categories to 12 enums, add strict severity
  definitions, simplify description template for direct GitHub posting.
- Add isCliModel() helper to detect CLI vs API providers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-24 10:59:19 +00:00
Li Liu a28009101b fix: capture gemini CLI stderr for diagnostics
Stream mode was discarding stderr (_data), making it impossible to
diagnose exit code 1 failures. Now buffers stderr (capped at 10KB)
and appends the last 500 chars to error messages on crash or timeout.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-24 06:29:13 +00:00
25 changed files with 1474 additions and 202 deletions
+101 -27
@@ -1,14 +1,15 @@
# Magpie
Multi-AI adversarial PR review tool. Let different AI models review your code like Linus Torvalds, generating more comprehensive reviews through debate.
Multi-AI adversarial code review tool. Multiple AI models independently review your PR, debate their findings, then a code-aware verifier audits each issue against the actual codebase.
## Core Concepts
- **Same Perspective, Different Models**: All reviewers use the same prompt (Linus-style), but are powered by different AI models
- **Natural Adversarial**: Differences between models naturally create disagreements and debates
- **Anti-Sycophancy**: Explicitly tells AI they're debating with other AIs, preventing mutual agreement bias
- **Fair Debate Model**: All reviewers in the same round see identical information - no unfair advantage from execution order
- **Parallel Execution**: Same-round reviewers run concurrently for faster reviews
- **Code-Aware Review**: CLI-based reviewers (Claude Code, Codex, Gemini CLI) read the actual source files via tools — not just the diff text. They can grep for callers, read surrounding context, and verify their findings before reporting.
- **Multi-Dimensional Review**: Beyond correctness/security, reviewers check compatibility (rolling upgrade risks, breaking changes), feature interaction (shared state, cross-feature conflicts), and extensibility.
- **Natural Adversarial**: Different AI models naturally create disagreements and cross-validation through debate.
- **Integrated Verify+Audit**: After issues are extracted, a tool-equipped verifier reads the actual code to confirm each issue, filter false positives, and re-calibrate severity — all within magpie's pipeline.
- **Fair Debate Model**: All reviewers in the same round see identical information — no unfair advantage from execution order.
- **Parallel Execution**: Same-round reviewers run concurrently for faster reviews.
## Supported AI Providers
@@ -17,11 +18,13 @@ Multi-AI adversarial PR review tool. Let different AI models review your code li
| `claude-code` | CLI | Claude Code CLI (uses your subscription, no API key) |
| `codex-cli` | CLI | OpenAI Codex CLI (uses your subscription, no API key) |
| `gemini-cli` | CLI | Gemini CLI (uses Google account login, no API key) |
| `opencode-cli` | CLI | OpenCode CLI — runs any model (typically via OpenRouter) as a code-aware agent (requires backing provider's API key) |
| `qwen-code` | CLI | Alibaba Qwen Code CLI (uses OAuth login, no API key) |
| `claude-*` | API | Anthropic API (requires ANTHROPIC_API_KEY) |
| `gpt-*` | API | OpenAI API (requires OPENAI_API_KEY) |
| `gemini-*` | API | Google Gemini API (requires GOOGLE_API_KEY) |
| `minimax` | API | MiniMax API (requires MINIMAX_API_KEY) |
| `openrouter/*` | API | OpenRouter API, OpenAI-compatible (requires OPENROUTER_API_KEY) |
| `mock` | Debug | Mock provider for testing (no API key, see [Debug Mode](#debug-mode)) |
**Recommended**: Use CLI providers (claude-code, codex-cli, gemini-cli, qwen-code) - they're free with your subscriptions and don't require API keys.
@@ -40,6 +43,47 @@ providers:
base_url: https://my-proxy.example.com
```
### OpenRouter
OpenRouter exposes hundreds of models through a single OpenAI-compatible API. Magpie routes any model whose ID starts with `openrouter/` through OpenRouter:
```yaml
providers:
openrouter:
api_key: ${OPENROUTER_API_KEY}
# base_url: https://openrouter.ai/api/v1 # optional, this is the default
reviewers:
sonnet:
model: openrouter/anthropic/claude-3.5-sonnet
prompt: |
...
llama:
model: openrouter/meta-llama/llama-3-70b-instruct
prompt: |
...
```
The portion after `openrouter/` is sent to OpenRouter verbatim, so use any model ID listed at https://openrouter.ai/models.
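
As a rough illustration of the prefix rule (not Magpie's actual factory code), the sketch below splits a reviewer's `model` string. `toOpenRouterModelId` is a hypothetical name.

```typescript
const OPENROUTER_PREFIX = 'openrouter/'

// Returns the ID to send to OpenRouter, or null when the model string
// is not an OpenRouter model (or has nothing after the prefix).
function toOpenRouterModelId(model: string): string | null {
  if (!model.startsWith(OPENROUTER_PREFIX)) return null
  const id = model.slice(OPENROUTER_PREFIX.length)
  return id.length > 0 ? id : null
}
```

With the config above, `openrouter/meta-llama/llama-3-70b-instruct` would yield `meta-llama/llama-3-70b-instruct`.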
### OpenCode CLI
Models routed through `openrouter/*` reach the model purely as a chat completion — the reviewer sees only the diff and prompt and cannot read source files. To get a code-aware agent on top of OpenRouter (or any other backing provider), use the `opencode-cli` provider, which wraps the [OpenCode](https://opencode.ai/) CLI:
```yaml
providers:
openrouter:
api_key: ${OPENROUTER_API_KEY}
reviewers:
sonnet-agent:
model: opencode-cli:openrouter/anthropic/claude-sonnet-4
prompt: |
...
```
The portion after `opencode-cli:` is passed verbatim to opencode's `-m provider/model` flag. Reviewers run with a read-only tool allowlist (Read, Grep, Glob, plus `gh`/`git`/`rg`) — matching the claude-code provider's permissions. API keys from `providers.openrouter.api_key` (and `anthropic`/`openai`/`google` if configured) are forwarded into opencode's environment, so you don't need a second copy of your keys.
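
One way the key forwarding could look is sketched below. It is not the provider's actual code: `buildAgentEnv` and `ENV_VAR_BY_PROVIDER` are hypothetical names, and mapping each provider to the environment variable listed in the provider table is an assumption.

```typescript
interface ProviderConfig { api_key?: string }
type ProviderName = 'openrouter' | 'anthropic' | 'openai' | 'google'
type Providers = Partial<Record<ProviderName, ProviderConfig>>

// Assumed mapping from provider name to environment variable.
const ENV_VAR_BY_PROVIDER: Record<ProviderName, string> = {
  openrouter: 'OPENROUTER_API_KEY',
  anthropic: 'ANTHROPIC_API_KEY',
  openai: 'OPENAI_API_KEY',
  google: 'GOOGLE_API_KEY',
}

// Copies the base environment and adds a variable for every configured key.
function buildAgentEnv(
  providers: Providers,
  baseEnv: Record<string, string>,
): Record<string, string> {
  const env: Record<string, string> = { ...baseEnv }
  for (const name of Object.keys(ENV_VAR_BY_PROVIDER) as ProviderName[]) {
    const key = providers[name]?.api_key
    if (key) env[ENV_VAR_BY_PROVIDER[name]] = key
  }
  return env
}
```

The returned object could then be passed as the `env` option when spawning the child process; providers without a configured key add nothing.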
## Installation
```bash
@@ -102,33 +146,30 @@ reviewers:
claude:
model: claude-code
prompt: |
You are a senior engineer reviewing this PR. Be direct and concise like Linus Torvalds,
but constructive rather than harsh.
You are a senior engineer reviewing this PR. Be precise and evidence-based.
Review dimensions: Correctness, Security, Compatibility (rolling upgrade,
breaking changes), Feature Interaction (shared state, cross-feature conflicts),
Extensibility, Architecture, Performance & Resources.
Use Read/Grep tools to verify findings against actual code.
Focus on:
1. **Correctness** - Will this code work? Edge cases?
2. **Security** - Any vulnerabilities? Input validation?
3. **Architecture** - Does this fit the overall design? Any coupling issues?
4. **Simplicity** - Is this the simplest solution? Over-engineering?
gemini:
model: gemini-cli
codex:
model: codex-cli
prompt: |
# Same as above...
# Same dimensions as above
# Analyzer - PR analysis (before debate)
analyzer:
model: claude-code
prompt: |
You are a senior engineer providing PR context analysis.
Analyze this PR and provide:
1. What this PR does
2. Architecture/design decisions
3. Purpose
4. Trade-offs
5. Things to note
3. Affected interfaces/APIs (flag breaking changes)
4. Compatibility risks (rolling upgrade, serialization changes)
5. Feature interaction risks (callers, shared state)
6. Suggested review focus (specific files + line ranges)
# Summarizer - final conclusion
# Summarizer - final conclusion + verify+audit
summarizer:
model: claude-code
prompt: |
@@ -177,6 +218,8 @@ Options:
--git-remote <remote> Git remote for PR URL detection (default: origin)
--skip-context Skip context gathering phase
--no-post Skip post-processing (GitHub comment flow)
--no-conclusion Skip final conclusion generation (for bot/CI use)
--fail-fast Abort the entire review immediately if any reviewer fails
--plan-only Generate review plan without executing
--reanalyze Force re-analyze features (ignore cache)
@@ -206,6 +249,7 @@ Options:
--reviewers <ids> Comma-separated reviewer IDs
-a, --all Use all configured reviewers
-d, --devil-advocate Add a Devil's Advocate to challenge consensus
--fail-fast Abort the entire discussion immediately if any reviewer fails
--list List all discuss sessions
--resume <id> Resume a discuss session with follow-up question
```
@@ -324,24 +368,33 @@ Discussion features:
```
1. Context Gathering (if enabled)
│ Collects: affected modules, related PRs, call chains
│ Supports: Go, C++, Python, Java, Scala, TS/JS, Rust, Proto
2. Analyzer analyzes PR
│ Outputs: summary, interface changes, compatibility risks,
│ interaction risks, specific review focus areas
3. [Interactive] Post-analysis Q&A (ask specific reviewers)
4. Multi-round debate
├─ Round 1: All reviewers give INDEPENDENT opinions (parallel)
No reviewer sees others' responses yet
CLI reviewers fetch diff + read code via tools
│ ↓
├─ Convergence check: Did reviewers reach consensus?
│ ↓
├─ Round 2+: Reviewers see ALL previous rounds (parallel)
Each reviewer responds to others' points
│ Same-round reviewers see identical information
Cross-validate findings, challenge weak arguments
│ ↓
└─ ... (repeat until max rounds or convergence)
5. Summarizer produces final conclusion from full debate history
5. Structurizer extracts issues into structured JSON
6. Verify+Audit (tool-equipped)
│ For each issue: Read/Grep actual code to verify
│ Filters: false positives, by-design patterns, pre-existing issues
│ Re-calibrates severity based on evidence
7. [Optional] Summarizer produces final conclusion (--no-conclusion to skip)
```
### Fair Debate Model
@@ -363,7 +416,7 @@ Before the review begins, Magpie automatically gathers system-level context to h
- **Affected Modules**: Identifies which parts of the system are impacted (core, moderate, low)
- **Related PRs**: Finds relevant past PRs from project history
- **Call Chain Analysis**: Traces how changed code connects to the rest of the system
- **Call Chain Analysis**: Traces how changed code connects to the rest of the system (supports Go, C++, Python, Java, Scala, TypeScript, Rust, Proto)
```
┌─ System Context ─────────────────────────────────────────┐
@@ -436,6 +489,20 @@ magpie review 12345 --no-converge
Set `defaults.check_convergence: false` in config to disable by default.
### Failure Handling
By default, Magpie is **resilient**: if a single reviewer fails (network error, rate limit, model unavailable), the round continues with the surviving reviewers and only aborts if *all* reviewers fail. The failed reviewer's slot shows `[Review failed: ...]` and is excluded from subsequent rounds.
Use `--fail-fast` to flip to strict mode — any single reviewer failure (or context-gathering failure) immediately terminates the entire flow with an error:
```bash
# Strict mode: abort the moment anything fails
magpie review 12345 --fail-fast
magpie discuss "Should we use microservices?" --fail-fast
```
Useful when you want to guarantee every configured reviewer participated, or when you're debugging provider/auth issues and don't want failures swallowed.
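
The difference between the two modes can be sketched as follows. This is an illustration under simplifying assumptions, not the orchestrator's actual code; `runRound` and `ReviewFn` are hypothetical names, and each reviewer is modelled as a function returning a promise.

```typescript
type ReviewFn = () => Promise<string>

async function runRound(
  reviewers: Record<string, ReviewFn>,
  failFast: boolean,
): Promise<Record<string, string>> {
  const ids = Object.keys(reviewers)
  const results: Record<string, string> = {}

  if (failFast) {
    // Strict mode: Promise.all rejects as soon as any reviewer rejects.
    const outputs = await Promise.all(ids.map(id => reviewers[id]()))
    ids.forEach((id, i) => { results[id] = outputs[i] })
    return results
  }

  // Resilient mode: wait for every reviewer, then keep the survivors.
  const settled = await Promise.allSettled(ids.map(id => reviewers[id]()))
  let failures = 0
  settled.forEach((outcome, i) => {
    if (outcome.status === 'fulfilled') {
      results[ids[i]] = outcome.value
    } else {
      failures++
      results[ids[i]] = `[Review failed: ${String(outcome.reason)}]`
    }
  })
  if (ids.length > 0 && failures === ids.length) {
    throw new Error('All reviewers failed')
  }
  return results
}
```

Both modes start all reviewers concurrently; they differ only in whether the first rejection ends the round.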
### Markdown Rendering
All outputs (analysis, reviewer comments, final conclusion) are rendered with proper markdown formatting in terminal - headers, bold, tables, code blocks all display correctly.
@@ -462,6 +529,13 @@ While waiting for AI reviewers, enjoy programmer jokes:
⠋ claude is thinking... | Why do programmers confuse Halloween and Christmas? Because Oct 31 = Dec 25
```
Disable them via config if you prefer a quieter spinner:
```yaml
defaults:
show_jokes: false
```
### Post-Review Discussion Phase (Interactive Mode)
In interactive mode (`-i`), after the debate concludes, you can enter a **discussion phase** to chat with any role (reviewers, analyzer, or summarizer) before the comment posting step:
+12 -7
@@ -199,6 +199,7 @@ interface DiscussOptions {
list?: boolean
resume?: string
devilAdvocate?: boolean
failFast?: boolean
}
async function runDiscussion(
@@ -230,6 +231,7 @@ async function runDiscussion(
const isSoloDiscussion = reviewers.length === 1
const maxRounds = isSoloDiscussion ? 1 : parseInt(options.rounds, 10)
const checkConvergence = !isSoloDiscussion && options.converge !== false && (config.defaults.check_convergence !== false)
const showJokes = config.defaults.show_jokes !== false
const summarizer: Reviewer = {
id: 'summarizer',
@@ -302,6 +304,7 @@ async function runDiscussion(
checkConvergence,
language: lang,
interruptState,
failFast: !!options.failFast,
onWaiting: (reviewerId) => {
flushBuffer()
if (spinnerRef.spinner) spinnerRef.spinner.stop()
@@ -320,29 +323,30 @@ async function runDiscussion(
`${reviewerId} is thinking`
const updateSpinner = () => {
const joke = getRandomJoke()
if (spinnerRef.spinner) {
if (!spinnerRef.spinner) return
const jokeSuffix = showJokes ? ` ${chalk.dim(`| ${getRandomJoke()}`)}` : ''
if (spinnerRef.parallelStatuses && isParallelRound) {
const round = parseInt(reviewerId.split('-')[1])
const statusLine = formatParallelStatus(round, spinnerRef.parallelStatuses)
spinnerRef.spinner.text = `${statusLine} ${chalk.dim(`| ${joke}`)}`
spinnerRef.spinner.text = `${statusLine}${jokeSuffix}`
} else {
spinnerRef.spinner.text = `${baseLabel}... ${chalk.dim(`| ${joke}`)}`
}
spinnerRef.spinner.text = `${baseLabel}...${jokeSuffix}`
}
}
spinnerRef.parallelStatuses = null
spinnerRef.spinner = ora({ text: `${baseLabel}...`, discardStdin: false }).start()
updateSpinner()
if (showJokes) {
spinnerRef.interval = setInterval(updateSpinner, 8000)
}
},
onParallelStatus: (round, statuses) => {
spinnerRef.parallelStatuses = statuses
if (spinnerRef.spinner) {
const joke = getRandomJoke()
const jokeSuffix = showJokes ? ` ${chalk.dim(`| ${getRandomJoke()}`)}` : ''
const statusLine = formatParallelStatus(round, statuses)
spinnerRef.spinner.text = `${statusLine} ${chalk.dim(`| ${joke}`)}`
spinnerRef.spinner.text = `${statusLine}${jokeSuffix}`
}
},
onMessage: (reviewerId, chunk) => {
@@ -443,6 +447,7 @@ export const discussCommand = new Command('discuss')
.option('--reviewers <ids>', 'Comma-separated reviewer IDs')
.option('-a, --all', 'Use all reviewers')
.option('-d, --devil-advocate', "Add a Devil's Advocate to challenge consensus")
.option('--fail-fast', 'Abort the entire discussion immediately if any reviewer fails')
.option('--list', 'List all discuss sessions')
.option('--resume <id>', 'Resume a discuss session')
.action(async (topic: string | undefined, options: DiscussOptions) => {
+1
@@ -79,6 +79,7 @@ export const initCommand = new Command('init')
if (r.provider === 'anthropic') envVars.add('ANTHROPIC_API_KEY')
if (r.provider === 'openai') envVars.add('OPENAI_API_KEY')
if (r.provider === 'google') envVars.add('GOOGLE_API_KEY')
if (r.provider === 'openrouter') envVars.add('OPENROUTER_API_KEY')
})
envVars.forEach(v => console.log(` - ${v}`))
}
+54 -20
@@ -3,7 +3,7 @@ import chalk from 'chalk'
import ora from 'ora'
import { execSync } from 'child_process'
import { loadConfig } from '../config/loader.js'
import { createProvider } from '../providers/factory.js'
import { createProvider, isCliModel } from '../providers/factory.js'
import { DebateOrchestrator } from '../orchestrator/orchestrator.js'
import type { Reviewer, ReviewerStatus } from '../orchestrator/types.js'
import { createInterface } from 'readline'
@@ -55,6 +55,8 @@ export const reviewCommand = new Command('review')
.option('--export <file>', 'Export completed review to markdown')
.option('--skip-context', 'Skip context gathering phase')
.option('--no-post', 'Skip post-processing (GitHub comment flow)')
.option('--no-conclusion', 'Skip final conclusion generation (bot mode)')
.option('--fail-fast', 'Abort the entire review immediately if any reviewer (or context gatherer) fails')
.action(async (pr: string | undefined, options) => {
const spinner = ora('Loading configuration...').start()
@@ -224,10 +226,35 @@ export const reviewCommand = new Command('review')
}
}
// Pre-fetch PR diff and info so all reviewers (including API-only models) get the code
let prDiff = ''
// Fetch PR metadata (title/body) — always needed
let prTitle = ''
let prBody = ''
try {
const prInfo = JSON.parse(execSync(`gh pr view ${prUrl} --json title,body`, { encoding: 'utf-8', timeout: 30000 }))
prTitle = prInfo.title || ''
prBody = prInfo.body || ''
} catch {
// Non-fatal: reviewers can still work without metadata
}
// Check if all reviewers (+ analyzer) are CLI-based.
// CLI providers can fetch diff and read code themselves via tools.
// API providers need the diff pre-fetched and embedded in the prompt.
const allModels = [
...Object.values(config.reviewers).map(r => r.model),
config.analyzer.model,
config.summarizer.model,
]
const allCli = allModels.every(m => isCliModel(m))
let prPrompt: string
if (allCli) {
// CLI mode: reviewers fetch diff and read code themselves
console.log(chalk.dim(` CLI-only reviewers detected — reviewers will fetch diff and read code directly`))
prPrompt = `Please review ${prUrl}.\n\nTitle: ${prTitle}\n\nDescription:\n${prBody}\n\nYou have full access to the repository. Use \`gh pr diff ${prUrl}\` to get the diff, then use Read/Grep tools to examine the actual source files for context. Review every changed file and function systematically.`
} else {
// API mode: pre-fetch diff and embed in prompt
let prDiff = ''
let diffTruncationNote = ''
try {
prDiff = execSync(`gh pr diff ${prUrl}`, { encoding: 'utf-8', timeout: 60000, maxBuffer: 10 * 1024 * 1024 })
@@ -270,17 +297,11 @@ export const reviewCommand = new Command('review')
console.error(chalk.yellow(`Warning: Could not pre-fetch PR diff: ${errMsg.slice(0, 100)}`))
}
}
try {
const prInfo = JSON.parse(execSync(`gh pr view ${prUrl} --json title,body`, { encoding: 'utf-8', timeout: 30000 }))
prTitle = prInfo.title || ''
prBody = prInfo.body || ''
} catch {
// Non-fatal: reviewers can still work with just the diff
}
const prPrompt = prDiff
prPrompt = prDiff
? `Please review ${prUrl}.\n\nTitle: ${prTitle}\n\nDescription:\n${prBody}${diffTruncationNote}\n\nHere is the PR diff:\n\n\`\`\`diff\n${prDiff}\`\`\`\n\nAnalyze these changes and provide your feedback. You already have the complete diff above — do NOT attempt to fetch it again.`
: `Please review ${prUrl}. Get the PR details and diff using any method available to you, then analyze the changes.`
}
target = {
type: 'pr',
@@ -361,6 +382,16 @@ export const reviewCommand = new Command('review')
systemPrompt: config.analyzer.prompt
}
// Create auditor (final judge). Uses config.audit if present; else falls back
// to summarizer (caller side just passes undefined so the orchestrator default kicks in).
const auditor: Reviewer | undefined = config.audit
? {
id: 'auditor',
provider: createProvider(config.audit.model, config),
systemPrompt: config.audit.prompt
}
: undefined
// Create context gatherer (if enabled)
let contextGatherer: ContextGatherer | undefined
const contextEnabled = !options.skipContext && (config.contextGatherer?.enabled !== false)
@@ -382,6 +413,7 @@ export const reviewCommand = new Command('review')
const maxRounds = isSoloReview ? 1 : parseInt(options.rounds, 10)
// Convergence: disable for solo review; otherwise default from config, CLI can override with --no-converge
const checkConvergence = !isSoloReview && options.converge !== false && (config.defaults.check_convergence !== false)
const showJokes = config.defaults.show_jokes !== false
console.log()
console.log(chalk.bgBlue.white.bold(` ${target.label} Review `))
@@ -433,6 +465,8 @@ export const reviewCommand = new Command('review')
checkConvergence,
language: config.defaults.language,
interruptState,
skipConclusion: options.conclusion === false,
failFast: !!options.failFast,
onWaiting: (reviewerId) => {
// Flush previous reviewer's buffer before showing spinner
flushBuffer()
@@ -460,31 +494,31 @@ export const reviewCommand = new Command('review')
// Show spinner with a joke (and parallel status if available)
const updateSpinner = () => {
const joke = getRandomJoke()
if (spinnerRef.spinner) {
if (!spinnerRef.spinner) return
const jokeSuffix = showJokes ? ` ${chalk.dim(`| ${getRandomJoke()}`)}` : ''
if (spinnerRef.parallelStatuses && isParallelRound) {
const round = parseInt(reviewerId.split('-')[1])
const statusLine = formatParallelStatus(round, spinnerRef.parallelStatuses)
spinnerRef.spinner.text = `${statusLine} ${chalk.dim(`| ${joke}`)}`
spinnerRef.spinner.text = `${statusLine}${jokeSuffix}`
} else {
spinnerRef.spinner.text = `${baseLabel}... ${chalk.dim(`| ${joke}`)}`
}
spinnerRef.spinner.text = `${baseLabel}...${jokeSuffix}`
}
}
spinnerRef.parallelStatuses = null // Reset for new waiting phase
spinnerRef.spinner = ora({ text: `${baseLabel}...`, discardStdin: false }).start()
updateSpinner()
// Update joke every 15 seconds
if (showJokes) {
spinnerRef.interval = setInterval(updateSpinner, 15000)
}
},
onParallelStatus: (round, statuses) => {
spinnerRef.parallelStatuses = statuses
// Immediately update spinner to show new status
if (spinnerRef.spinner) {
const joke = getRandomJoke()
const jokeSuffix = showJokes ? ` ${chalk.dim(`| ${getRandomJoke()}`)}` : ''
const statusLine = formatParallelStatus(round, statuses)
spinnerRef.spinner.text = `${statusLine} ${chalk.dim(`| ${joke}`)}`
spinnerRef.spinner.text = `${statusLine}${jokeSuffix}`
}
},
onMessage: (reviewerId, chunk) => {
@@ -606,7 +640,7 @@ export const reviewCommand = new Command('review')
console.log(marked(fixMarkdown(context.summary)))
}
}
}, contextGatherer)
}, contextGatherer, auditor)
const result = await orchestrator.runStreaming(target.label, target.prompt)
+26 -2
@@ -9,7 +9,7 @@ export interface ReviewerOption {
model: string
description: string
needsApiKey: boolean
provider?: 'anthropic' | 'openai' | 'google'
provider?: 'anthropic' | 'openai' | 'google' | 'openrouter'
}
export const AVAILABLE_REVIEWERS: ReviewerOption[] = [
@@ -34,6 +34,14 @@ export const AVAILABLE_REVIEWERS: ReviewerOption[] = [
description: 'Uses your Gemini CLI (Google account, no API key needed)',
needsApiKey: false
},
{
id: 'opencode-cli',
name: 'OpenCode (via OpenRouter)',
model: 'opencode-cli:openrouter/anthropic/claude-3.5-sonnet',
description: 'Runs any OpenRouter model as a code-aware agent via the OpenCode CLI (requires OPENROUTER_API_KEY)',
needsApiKey: true,
provider: 'openrouter'
},
{
id: 'claude-api',
name: 'Claude Sonnet 4.5',
@@ -57,6 +65,14 @@ export const AVAILABLE_REVIEWERS: ReviewerOption[] = [
description: 'Uses Google AI API (requires GOOGLE_API_KEY)',
needsApiKey: true,
provider: 'google'
},
{
id: 'openrouter',
name: 'OpenRouter (Claude 3.5 Sonnet)',
model: 'openrouter/anthropic/claude-3.5-sonnet',
description: 'Uses OpenRouter API (requires OPENROUTER_API_KEY). Change the model field to any OpenRouter-supported ID.',
needsApiKey: true,
provider: 'openrouter'
}
]
@@ -98,6 +114,7 @@ export function generateConfig(selectedReviewerIds: string[]): string {
const needsAnthropic = selectedReviewers.some(r => r.provider === 'anthropic')
const needsOpenai = selectedReviewers.some(r => r.provider === 'openai')
const needsGoogle = selectedReviewers.some(r => r.provider === 'google')
const needsOpenrouter = selectedReviewers.some(r => r.provider === 'openrouter')
// Build providers section
let providersSection = '# AI Provider API Keys (use environment variables)\nproviders:'
@@ -116,7 +133,13 @@ export function generateConfig(selectedReviewerIds: string[]): string {
google:
api_key: \${GOOGLE_API_KEY}`
}
if (!needsAnthropic && !needsOpenai && !needsGoogle) {
if (needsOpenrouter) {
providersSection += `
openrouter:
api_key: \${OPENROUTER_API_KEY}
# base_url: https://openrouter.ai/api/v1 # optional, this is the default`
}
if (!needsAnthropic && !needsOpenai && !needsGoogle && !needsOpenrouter) {
providersSection += ' {}' // Empty providers if only CLI tools are used
}
@@ -142,6 +165,7 @@ defaults:
max_rounds: 5
output_format: markdown
check_convergence: true # Stop early when reviewers reach consensus
show_jokes: true # Show rotating programmer jokes in the spinner while waiting
${reviewersSection}
+4
@@ -83,6 +83,10 @@ function validateConfig(config: MagpieConfig): void {
}
validateReviewerConfig('analyzer', config.analyzer)
if (config.audit) {
validateReviewerConfig('audit', config.audit)
}
// Warn (don't throw) if API keys look empty — CLI providers don't need them
if (!config.providers) return
for (const [name, prov] of Object.entries(config.providers)) {
+4
@@ -15,6 +15,7 @@ export interface DefaultsConfig {
check_convergence: boolean
language?: string // Output language (e.g., 'zh', 'en', 'ja')
diff_exclude?: string[] // Glob patterns for files to exclude from diff (e.g., '*.pb.go', '*generated*')
show_jokes?: boolean // Show rotating programmer jokes in spinner text while waiting (default: true)
}
export interface ContextGathererConfigOptions {
@@ -41,13 +42,16 @@ export interface MagpieConfig {
google?: ProviderConfig
'claude-code'?: { enabled: boolean }
'codex-cli'?: { enabled: boolean }
'opencode-cli'?: { enabled: boolean }
'qwen-code'?: { enabled: boolean }
minimax?: ProviderConfig
openrouter?: ProviderConfig
}
mock?: boolean
defaults: DefaultsConfig
reviewers: Record<string, ReviewerConfig>
summarizer: ReviewerConfig
analyzer: ReviewerConfig
audit?: ReviewerConfig // Omniscient final judge; falls back to summarizer if absent
contextGatherer?: ContextGathererConfigOptions
}
@@ -3,35 +3,86 @@ import { spawnSync } from 'child_process'
import type { RawReference } from '../types.js'
/**
* Extract function/class names from diff
* Common keywords to exclude from symbol extraction (language-spanning)
*/
const STOP_SYMBOLS = new Set([
// JS/TS
'get', 'set', 'new', 'for', 'if', 'do', 'var', 'let', 'const', 'return',
'else', 'case', 'break', 'continue', 'switch', 'while', 'try', 'catch',
'throw', 'typeof', 'void', 'delete', 'import', 'export', 'default', 'from',
'async', 'await', 'yield', 'class', 'extends', 'super', 'this',
// Go
'func', 'type', 'struct', 'interface', 'map', 'chan', 'range', 'defer',
'select', 'nil', 'err', 'error', 'string', 'bool', 'int', 'int32', 'int64',
'uint', 'uint32', 'uint64', 'float32', 'float64', 'byte', 'rune', 'len',
'cap', 'make', 'append', 'copy', 'close', 'panic', 'recover', 'println',
'true', 'false', 'init', 'main',
// C/C++
'void', 'int', 'char', 'bool', 'auto', 'long', 'short', 'unsigned',
'signed', 'float', 'double', 'size_t', 'nullptr', 'static', 'const',
'virtual', 'override', 'inline', 'explicit', 'template', 'typename',
'namespace', 'using', 'public', 'private', 'protected',
// Proto
'message', 'service', 'rpc', 'enum', 'oneof', 'optional', 'repeated',
'required', 'reserved', 'returns', 'option',
// Python
'def', 'self', 'cls', 'None', 'True', 'False', 'pass', 'with', 'lambda',
// Java/Scala
'public', 'private', 'protected', 'static', 'final', 'abstract', 'synchronized',
'val', 'var', 'object', 'trait', 'extends', 'with', 'override',
])
/**
* Extract function/class/struct names from diff (multi-language)
*/
export function extractSymbolsFromDiff(diff: string): string[] {
const symbols: Set<string> = new Set()
// Match function definitions: function name(, async function name(, const name = (, etc.
const functionPatterns = [
const patterns: RegExp[] = [
// JS/TS: function name(, async function name(
/^\+.*(?:function|async function)\s+([a-zA-Z_][a-zA-Z0-9_]*)\s*\(/gm,
// JS/TS: const name = (, const name = async (
/^\+.*(?:const|let|var)\s+([a-zA-Z_][a-zA-Z0-9_]*)\s*=\s*(?:async\s*)?\(/gm,
// JS/TS: const name = (...) =>
/^\+.*(?:const|let|var)\s+([a-zA-Z_][a-zA-Z0-9_]*)\s*=\s*(?:async\s*)?\([^)]*\)\s*=>/gm,
// JS/TS: class Name
/^\+.*class\s+([a-zA-Z_][a-zA-Z0-9_]*)/gm,
// JS/TS: method definitions in classes
/^\+\s+(?:async\s+)?([a-zA-Z_][a-zA-Z0-9_]*)\s*\([^)]*\)\s*[:{]/gm,
// JS/TS: export declarations
/^\+.*export\s+(?:const|let|var|function|class|async function)\s+([a-zA-Z_][a-zA-Z0-9_]*)/gm,
// Go: func Name(, func (receiver) Name(
/^\+.*func\s+(?:\([^)]*\)\s+)?([A-Z][a-zA-Z0-9_]*)\s*\(/gm,
// Go: type Name struct/interface
/^\+.*type\s+([A-Z][a-zA-Z0-9_]*)\s+(?:struct|interface)\b/gm,
// C/C++: return-type FunctionName(
/^\+.*(?:void|int|bool|char|auto|Status|string|std::string|size_t|int32_t|int64_t|uint32_t|uint64_t|float|double)\s+([A-Z][a-zA-Z0-9_]*)\s*\(/gm,
// C/C++: ClassName::MethodName( (the \b keeps the greedy prefix from truncating the class name)
/^\+.*\b([A-Z][a-zA-Z0-9_]*)::\s*([A-Z][a-zA-Z0-9_]*)\s*\(/gm,
// C/C++: class/struct Name
/^\+.*(?:class|struct)\s+([A-Z][a-zA-Z0-9_]*)/gm,
// Proto: message Name, service Name, rpc Name
/^\+\s*(?:message|service)\s+([A-Z][a-zA-Z0-9_]*)/gm,
/^\+\s*rpc\s+([A-Z][a-zA-Z0-9_]*)\s*\(/gm,
// Python: def name(, class Name
/^\+\s*def\s+([a-zA-Z_][a-zA-Z0-9_]*)\s*\(/gm,
/^\+\s*class\s+([A-Z][a-zA-Z0-9_]*)/gm,
// Java/Scala: public/private type Name(
/^\+\s*(?:public|private|protected)?\s*(?:static\s+)?(?:def|void|int|boolean|String|long|double|float|[A-Z][a-zA-Z0-9_<>]*)\s+([a-zA-Z_][a-zA-Z0-9_]*)\s*\(/gm,
]
// Match class definitions
const classPattern = /^\+.*class\s+([a-zA-Z_][a-zA-Z0-9_]*)/gm
// Match method definitions in classes
const methodPattern = /^\+\s+(?:async\s+)?([a-zA-Z_][a-zA-Z0-9_]*)\s*\([^)]*\)\s*[:{]/gm
// Match exported names
const exportPattern = /^\+.*export\s+(?:const|let|var|function|class|async function)\s+([a-zA-Z_][a-zA-Z0-9_]*)/gm
for (const pattern of [...functionPatterns, classPattern, methodPattern, exportPattern]) {
for (const pattern of patterns) {
let match
while ((match = pattern.exec(diff)) !== null) {
const name = match[1]
// Filter out common keywords and short names
if (name && name.length > 2 && !['get', 'set', 'new', 'for', 'if', 'do'].includes(name)) {
// For C++ ClassName::MethodName pattern, capture both parts
const name = match[2] || match[1]
if (name && name.length > 2 && !STOP_SYMBOLS.has(name)) {
symbols.add(name)
}
// Also add the class name for Class::Method patterns
if (match[2] && match[1] && match[1].length > 2 && !STOP_SYMBOLS.has(match[1])) {
symbols.add(match[1])
}
}
}
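The stop list and the two-capture handling above can be tried in isolation. What follows is a reduced sketch, not the project's implementation: `STOP`, `PATTERNS`, `extractSymbols`, and `sampleDiff` are hypothetical names, only three patterns modelled on the ones above are kept, and the C++ pattern carries a `\b` so the greedy prefix cannot split a camel-case class name.

```typescript
// Reduced stop list (assumption: the real STOP_SYMBOLS set is much larger).
const STOP = new Set(['get', 'set', 'new', 'main', 'init'])

// Three patterns modelled on the diff: Go funcs, C++ Class::Method, Python defs.
const PATTERNS: RegExp[] = [
  /^\+.*func\s+(?:\([^)]*\)\s+)?([A-Z][a-zA-Z0-9_]*)\s*\(/gm,
  /^\+.*\b([A-Z][a-zA-Z0-9_]*)::\s*([A-Z][a-zA-Z0-9_]*)\s*\(/gm,
  /^\+\s*def\s+([a-zA-Z_][a-zA-Z0-9_]*)\s*\(/gm,
]

function extractSymbols(diff: string): string[] {
  const symbols = new Set<string>()
  for (const pattern of PATTERNS) {
    pattern.lastIndex = 0 // global regexes keep state between calls
    let match: RegExpExecArray | null
    while ((match = pattern.exec(diff)) !== null) {
      // Every captured group is a candidate (class and method for C++).
      for (const name of [match[1], match[2]]) {
        if (name && name.length > 2 && !STOP.has(name)) symbols.add(name)
      }
    }
  }
  return Array.from(symbols).sort()
}

// Only lines starting with "+" (added lines) are considered.
const sampleDiff = [
  '+func (s *Server) HandleRequest(w http.ResponseWriter) {',
  '+Status SegmentWriter::Flush() {',
  '+def main():',
  ' func Untouched() {',
].join('\n')

console.log(extractSymbols(sampleDiff))
```

In this sample, `main` is filtered by the stop list and `Untouched` is skipped because its line is context rather than an addition.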
@@ -53,7 +104,8 @@ export function findReferences(symbols: string[], cwd: string = process.cwd()):
'-n', '-H', '--no-heading',
'-F',
'-e', symbol,
'--type', 'ts', '--type', 'js',
'--type-add', 'code:*.{go,cpp,cc,cxx,h,hpp,hxx,c,py,java,scala,ts,tsx,js,jsx,rs,proto,cs}',
'--type', 'code',
], { cwd, encoding: 'utf-8', maxBuffer: 5 * 1024 * 1024 })
const output = result.stdout || ''
+34 -6
@@ -105,16 +105,44 @@ export function deduplicateIssues(
/**
* Extract suggested review focus areas from analyzer output.
* Looks for a "## Suggested Review Focus" section with bullet points.
* Matches the focus section heading in several flavors:
* - "## Suggested Review Focus" (English heading)
* - "## 建议的 review 重点" (Chinese heading with space)
* - "## 建议的review重点" (Chinese heading no space)
* - "**建议的 review 重点**" (bold variant)
* - "**Suggested Review Focus**" (English bold variant)
* Reads until the next heading (##, **bold heading**) or end of section.
*/
export function parseFocusAreas(analysis: string): string[] {
const match = analysis.match(/## Suggested Review Focus\s*\n([\s\S]*?)(?=\n##|\n*$)/)
// Heading pattern: either a markdown heading (##) or a standalone bold line (**...**)
// Title text matches Chinese or English variants.
const titlePattern = '(?:Suggested\\s+Review\\s+Focus|建议的\\s*review\\s*重点)'
// Optional leading numbering like "6.", "6、", "6)" before the title (analyzer may inline-number sections).
const numberPrefix = '(?:\\d+[\\.、\\)]\\s*)?'
const headingRegex = new RegExp(
// Either: line starting with ## (optional number prefix), then title (optionally wrapped in **)
// Or: a standalone bold line **title** (with optional number prefix inside)
`(?:^|\\n)(?:#{1,6}\\s*${numberPrefix}\\*{0,2}${titlePattern}\\*{0,2}|\\*\\*${numberPrefix}${titlePattern}\\*\\*)[^\\n]*\\n([\\s\\S]*?)(?=\\n#{1,6}\\s|\\n\\*\\*[^\\n*]+\\*\\*\\s*\\n|$)`,
'i'
)
const match = analysis.match(headingRegex)
if (!match) return []
const lines = match[1].trim().split('\n')
return lines
.map(line => line.replace(/^[-*]\s*/, '').trim())
.filter(line => line.length > 0)
const body = match[1].trim()
if (!body) return []
// Pull out lines that look like bulleted items.
// Supported markers: "-", "*", "·", "•", "1.", "1)", "1、", "①" through "⑳", and full-width digit variants such as "1.".
const bulletRegex = /^\s*(?:[-*·•]|[①-⑳]|[\d0-9]+[.)、.)])\s+/u
const lines = body.split('\n')
const items: string[] = []
for (const raw of lines) {
if (!bulletRegex.test(raw)) continue
const stripped = raw.replace(bulletRegex, '').trim()
if (stripped.length === 0) continue
items.push(stripped)
}
return items
}
const STOP_WORDS = new Set(['the', 'a', 'in', 'of', 'is', 'to', 'and', 'for', 'with', 'this', 'that', 'it'])
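To make the heading and bullet handling concrete, here is a small sketch of the same idea run against a sample analysis. It is a simplified stand-in rather than the function from the diff: `TITLE`, `NUM`, `HEADING`, `parseFocus`, and `sampleAnalysis` are hypothetical names, and the bullet matcher only covers "-", "*", "1." and "1)".

```typescript
// Title text in English or Chinese, with an optional "6." style number prefix.
const TITLE = '(?:Suggested\\s+Review\\s+Focus|建议的\\s*review\\s*重点)'
const NUM = '(?:\\d+[.、)]\\s*)?'
const HEADING = new RegExp(
  `(?:^|\\n)(?:#{1,6}\\s*${NUM}\\*{0,2}${TITLE}\\*{0,2}|\\*\\*${NUM}${TITLE}\\*\\*)[^\\n]*\\n` +
    `([\\s\\S]*?)(?=\\n#{1,6}\\s|\\n\\*\\*[^\\n*]+\\*\\*\\s*\\n|$)`,
  'i'
)

function parseFocus(analysis: string): string[] {
  const match = analysis.match(HEADING)
  if (!match) return []
  // Simplified bullet matcher (assumption: ASCII markers only).
  const bullet = /^\s*(?:[-*]|\d+[.)])\s+/
  return match[1]
    .split('\n')
    .filter(line => bullet.test(line))
    .map(line => line.replace(bullet, '').trim())
    .filter(item => item.length > 0)
}

const sampleAnalysis = [
  '## Summary',
  'Looks fine overall.',
  '',
  '## 6. Suggested Review Focus',
  '- Error handling in the retry loop',
  '* Lock ordering in the cache',
  '1. Input validation for the new flag',
  'This trailing sentence is not a bullet.',
  '',
  '## Appendix',
  '- unrelated item',
].join('\n')

console.log(parseFocus(sampleAnalysis))
```

The non-bullet sentence and everything under the next heading are left out, so only the three bulleted items come back.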
+290 -30
@@ -79,16 +79,20 @@ export class DebateOrchestrator {
private taskPrompt: string = '' // Original task prompt (contains PR number, etc.)
private lastSeenIndex: Map<string, number> = new Map() // Track what each reviewer has seen
private auditor: Reviewer // Final judge. Falls back to summarizer if not configured.
constructor(
reviewers: Reviewer[],
summarizer: Reviewer,
analyzer: Reviewer,
options: OrchestratorOptions,
contextGatherer?: ContextGatherer
contextGatherer?: ContextGatherer,
auditor?: Reviewer
) {
this.reviewers = reviewers
this.summarizer = summarizer
this.analyzer = analyzer
this.auditor = auditor || summarizer
this.contextGatherer = contextGatherer || null
this.options = options
}
@@ -187,7 +191,14 @@ Reviews from Round ${roundsCompleted}:
${messagesText}
First, provide a brief reasoning (2-3 sentences) explaining your judgment.
Then on the LAST line, respond with EXACTLY one word: CONVERGED or NOT_CONVERGED`
Output your verdict on the LAST LINE with EXACTLY this format (no punctuation, no extra words):
CONVERGED
or
NOT_CONVERGED`
const messages: Message[] = [{ role: 'user', content: prompt }]
const response = await this.summarizer.provider.chat(
@@ -197,8 +208,10 @@ Then on the LAST line, respond with EXACTLY one word: CONVERGED or NOT_CONVERGED
// Parse response - extract verdict from last line, rest is reasoning
const lines = response.trim().split('\n')
const lastLine = lines[lines.length - 1].trim().toUpperCase()
const verdict = lastLine.split(/\s+/)[0]
const lastLine = lines[lines.length - 1].trim()
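// Strip everything except letters and underscores, then uppercase, so decorated
// verdicts such as "CONVERGED.", "converged", and "**CONVERGED**" all match.
// A prefixed line like "Verdict: converged" does NOT match and counts as not converged.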
const verdict = lastLine.replace(/[^A-Za-z_]/g, '').toUpperCase()
const isConverged = verdict === 'CONVERGED'
// Extract reasoning (everything except the last line)
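If prefixed or spaced verdicts need to be accepted too, a token search on the last line is one option. The sketch below is an illustration under that assumption and not part of this change: `Verdict` and `parseVerdict` are hypothetical names, and a caller would still have to decide how to treat `UNKNOWN` (treating it as not converged matches the current behavior).

```typescript
type Verdict = 'CONVERGED' | 'NOT_CONVERGED' | 'UNKNOWN'

function parseVerdict(response: string): Verdict {
  const lines = response.trim().split('\n')
  const lastLine = lines[lines.length - 1].toUpperCase()
  // Require CONVERGED as its own token: "UNCONVERGED" must not count.
  if (!/(?:^|[^A-Z])CONVERGED\b/.test(lastLine)) return 'UNKNOWN'
  // Any standalone NOT (or the NOT_ prefix) on the line flips the verdict,
  // so "not yet converged" is never read as CONVERGED.
  return /\bNOT(?:\b|_)/.test(lastLine) ? 'NOT_CONVERGED' : 'CONVERGED'
}

console.log(parseVerdict('Reviewers agree on all points.\nVerdict: converged'))
console.log(parseVerdict('Open disagreement on locking.\n**NOT_CONVERGED**'))
```

Checking the negative form on the whole line is deliberately conservative: a false "not converged" costs one extra round, while a false "converged" ends the debate early.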
@@ -298,15 +311,24 @@ Then on the LAST line, respond with EXACTLY one word: CONVERGED or NOT_CONVERGED
this.checkInterrupt()
// Get final conclusion directly from conversation history
const finalConclusion = await this.getFinalConclusion()
let finalConclusion = ''
let verifiedConclusion: string | undefined
// Verify the conclusion against the actual PR/code
const verifiedConclusion = await this.verifyConclusion(finalConclusion)
if (!this.options.skipConclusion) {
finalConclusion = await this.getFinalConclusion()
}
// End summarizer session for clean JSON extraction call
this.summarizer.provider.endSession?.()
const parsedIssues = await this.extractIssues()
let parsedIssues = await this.extractIssues()
if (parsedIssues.length > 0) {
parsedIssues = await this.verifyIssues(parsedIssues)
}
if (finalConclusion && !this.options.skipConclusion) {
verifiedConclusion = await this.verifyConclusion(finalConclusion)
}
return {
prNumber: label,
@@ -352,6 +374,9 @@ Then on the LAST line, respond with EXACTLY one word: CONVERGED or NOT_CONVERGED
const diff = this.extractDiffFromPrompt(prompt)
this.gatheredContext = await this.contextGatherer!.gather(diff, label, 'main')
} catch (error) {
if (this.options.failFast) {
throw new Error(`Context gathering failed (fail-fast): ${error instanceof Error ? error.message : String(error)}`)
}
logger.warn('Context gathering failed:', error)
}
})()
@@ -489,6 +514,10 @@ Then on the LAST line, respond with EXACTLY one word: CONVERGED or NOT_CONVERGED
duration: (endTime - startTime) / 1000
}
this.options.onParallelStatus?.(round, statuses)
if (this.options.failFast) {
// Re-throw so Promise.all rejects immediately and aborts the whole flow
throw new Error(`Reviewer ${reviewer.id} failed in round ${round} (fail-fast): ${err instanceof Error ? err.message : String(err)}`)
}
logger.warn(`Reviewer ${reviewer.id} failed in round ${round}:`, err)
return { reviewer, fullResponse: '', inputText: '', failed: true as const, error: err }
}
@@ -536,18 +565,34 @@ Then on the LAST line, respond with EXACTLY one word: CONVERGED or NOT_CONVERGED
}
this.checkInterrupt()
this.options.onWaiting?.('summarizer')
const finalConclusion = await this.getFinalConclusion()
// Verify the conclusion against the actual PR/code
this.options.onWaiting?.('verifier')
const verifiedConclusion = await this.verifyConclusion(finalConclusion)
let finalConclusion = ''
let verifiedConclusion: string | undefined
if (!this.options.skipConclusion) {
this.options.onWaiting?.('summarizer')
finalConclusion = await this.getFinalConclusion()
}
// End summarizer session before structurization so it gets a clean,
// non-session call. The session context (convergence + conclusion) would
// pollute the JSON extraction and --resume ignores custom system prompts.
this.summarizer.provider.endSession?.()
const parsedIssues = await this.extractIssues()
let parsedIssues = await this.extractIssues()
// Verify+Audit: check each issue against actual code using tools.
// This replaces both the old text-only verifyConclusion and the
// downstream audit step in li-bot.
if (parsedIssues.length > 0) {
this.options.onWaiting?.('verifier')
parsedIssues = await this.verifyIssues(parsedIssues)
}
// Legacy: if conclusion was generated and skipConclusion is false,
// also verify conclusion text (for CLI interactive mode)
if (finalConclusion && !this.options.skipConclusion) {
verifiedConclusion = await this.verifyConclusion(finalConclusion)
}
return {
prNumber: label,
@@ -609,9 +654,32 @@ ${contextSection}${focusSection}${callChainSection}Here is the analysis:
${this.analysis}
You are [${currentReviewerId}]. Review EVERY changed file and EVERY changed function/block — do not skip any.
For each change, check: correctness, security, performance, error handling, edge cases, maintainability.
If you reviewed a file and found no issues, say so briefly. Do not stop early.${this.langSuffix}`
You are [${currentReviewerId}]. Review the PR systematically.
For every issue you raise, you MUST include:
1. The specific \`file:line\` — only lines inside diff hunks (lines outside hunks are wasted, GitHub can't anchor them)
2. A quote of the offending code (1-3 lines max)
3. The concrete failure scenario — what input or state triggers it, what happens, what the user/system experiences as a result
4. A self-assessed severity (use these definitions exactly):
- critical = data corruption, security hole, guaranteed crash on common input
- high = will trigger under realistic conditions, observable user-facing breakage
- medium = edge case with plausible trigger, missing error handling
- low = code quality, minor concern
- nitpick = style-only preference (won't be posted)
DO NOT REPORT:
- Build script / CI polish (LD_PATH ordering, include order, dead asserts in build helpers, etc.)
- Missing comments / docstrings unless load-bearing for correctness
- "Forward-compat risk" / "if someone later adds X" without a concrete trigger
- Dead code unless it carries real risk
- Style preferences (naming, formatting, brace style)
- Issues outside the diff hunk unless severity >= high
- Theoretically-correct-but-impossible cases (e.g., int64 * byte_width overflow on 64-bit systems)
If a file has nothing meaningful wrong, skip it. Do NOT produce filler.
Brevity is a feature — 5 well-evidenced issues > 20 weak ones.
Use \`gh pr diff\` and Read/Grep to verify your claims before reporting.${this.langSuffix}`
return [{ role: 'user', content: prompt }]
}
@@ -654,21 +722,20 @@ If you reviewed a file and found no issues, say so briefly. Do not stop early.${
return [{
role: 'user',
content: `You are [${currentReviewerId}]. Here's what others said in the previous round:\n\n${newContent}\n\nDo three things:\n1. Continue your own exhaustive review — are there changed files or functions you haven't covered yet? Cover them now.\n2. Point out what the other reviewers MISSED — which files or changes did they skip or gloss over?\n3. Respond to their points — agree where valid, challenge where you disagree.${this.langSuffix}`
content: `You are [${currentReviewerId}]. Here's what others said in the previous round:\n\n${newContent}\n\nDo this:\n1. If the others' findings are correct and you have nothing substantive to add, say "I agree with [reviewer]'s findings, no additional issues." That is a fine outcome — do not pad.\n2. If you disagree with any of their claims, challenge with code evidence — quote the line that disproves their concern.\n3. ONLY add new issues if they are concrete (file:line + code quote + failure scenario) AND genuinely missed by the others. Do not manufacture issues to look productive — padding hurts review quality.${this.langSuffix}`
}]
}
// Non-session mode: full context with all previous rounds
const debateContext = `You are [${currentReviewerId}] in a code review debate with [${otherReviewerIds.join('], [')}].
Your shared goal: find ALL real issues in the code — leave nothing uncovered.
Your shared goal: converge on the real issues — quality over quantity.
IMPORTANT:
- You are [${currentReviewerId}], the other reviewer${otherReviewerIds.length > 1 ? 's are' : ' is'} [${otherReviewerIds.join('], [')}]
- Continue your own exhaustive review — cover any changed files or functions you haven't addressed yet
- Point out what others MISSED — which files or changes did they skip or gloss over?
- Challenge weak arguments - don't agree just to be polite
- Acknowledge good points and build on them
- If you disagree, explain why with evidence`
- If the others' findings are correct and you have nothing substantive to add, say "I agree with [reviewer]'s findings, no additional issues." That is a fine outcome — do not pad.
- If you disagree with any claim, challenge with code evidence — quote the line that disproves the concern.
- ONLY add new issues if they are concrete (file:line + code quote + failure scenario) AND genuinely missed by the others. Do not manufacture issues to look productive — padding hurts review quality.
- Acknowledge good points and build on them.`
let prompt = `Task: ${this.taskPrompt}
@@ -770,11 +837,11 @@ Output ONLY a JSON block (no other text):
"issues": [
{
"severity": "critical|high|medium|low|nitpick",
"category": "security|performance|error-handling|style|correctness|architecture",
"category": "correctness|security|performance|concurrency|resource-leak|error-handling|build|testing|documentation|architecture|compatibility|style",
"file": "path/to/file",
"line": 42,
"title": "One-line summary",
"description": "Detailed markdown explanation (see rules below)",
"description": "Concise explanation for GitHub PR comment (see rules below)",
"suggestedFix": "Brief one-line fix summary",
"raisedBy": ["reviewer-id-1", "reviewer-id-2"]
}
@@ -784,11 +851,18 @@ Output ONLY a JSON block (no other text):
Rules:
- Include every issue mentioned by any reviewer
- The "description" field will be posted as a GitHub PR comment. Make it comprehensive markdown covering: (1) What the problem is, (2) Why it matters (impact/risk), (3) The original problematic code quoted in a code block, (4) The suggested fix shown as code, (5) Why the fix is correct
- "description" field — write this as if you were a senior engineer leaving an inline PR comment. Must capture: (1) WHAT — the problem with a brief code quote (1-3 lines) anchored at line, (2) WHY — what makes this a bug / what assumption is broken / what invariant is violated (this is critical — audit will judge against this), (3) FAILURE — concrete scenario that triggers it and what the user/system experiences, (4) FIX — suggested fix if non-obvious. 1-3 sentences total, no boilerplate headers, no severity labels, no "raised by [X]" metadata. Plain prose only.
- "category" MUST be one of the 12 values listed above. Choose the closest match, do not invent new categories.
- If multiple reviewers mention the same issue, list all their IDs in raisedBy
- Use the exact reviewer IDs: ${reviewerIds}
- If a file path or line number is mentioned, include it; otherwise omit the field
- Severity: critical = blocks merge, high = should fix, medium = worth fixing, low = minor, nitpick = style only${changedFilesConstraint}${this.options.language ? `\n- Write the "title", "description", and "suggestedFix" fields in ${this.options.language}. Keep JSON keys and severity/category values in English.` : ''}`
- "line" field: REQUIRED for every issue. If the reviewer's text doesn't pin a specific line but anchors to a function or block, look at the diff hunk and pick the most representative line yourself. If you genuinely cannot anchor an issue to any line in the diff hunk, DROP that issue (don't emit it). Issues without lines cannot be posted as inline comments and waste reader attention.
- Severity — use the rubric exactly. Do NOT bias systematically low or high:
critical = data corruption, security hole, guaranteed crash on common input
high = will trigger under realistic conditions, observable user-facing breakage
medium = edge case with plausible trigger, missing error handling
low = code quality, minor concern
nitpick = style-only preference
If the reviewer's reasoning supports a higher severity, use the higher one.${changedFilesConstraint}${this.options.language ? `\n- Write the "title", "description", and "suggestedFix" fields in ${this.options.language}. Keep JSON keys and severity/category values in English.` : ''}`
const systemPrompt = 'You extract structured issues from code review text. Output only valid JSON.'
const chatOpts = { disableTools: true }
@@ -898,4 +972,190 @@ Then provide your **Verified Final Conclusion** that:
this.trackTokens('summarizer', prompt + (systemPrompt || ''), response)
return response
}
/**
* Audit (omniscient final judge): for every reviewer-flagged issue, verify against
* actual code (Read/Grep/Glob + `gh pr diff`); rewrite weak descriptions; drop false
* positives; add issues reviewers missed (especially cross-file pattern repetition).
* Returns the post-audit issue list.
*/
private async verifyIssues(issues: MergedIssue[]): Promise<MergedIssue[]> {
const issuesText = issues.map((iss, i) =>
`### Issue ${i} [severity: ${iss.severity}] [category: ${iss.category}]\nfile: ${iss.file}${iss.line ? `:${iss.line}` : ''}\ntitle: ${iss.title}\ndescription: ${iss.description}${iss.suggestedFix ? `\nsuggestedFix: ${iss.suggestedFix}` : ''}`
).join('\n\n')
// Optional repo-specific conventions file at ~/.magpie/house-rules/<owner>_<repo>.md.
// Parse owner/repo from the PR URL embedded in taskPrompt.
let houseRules = ''
try {
const { readFileSync, existsSync } = await import('fs')
const { join } = await import('path')
const { homedir } = await import('os')
const repoMatch = this.taskPrompt.match(/github\.com\/([^/\s]+)\/([^/\s]+)\/pull\//)
if (repoMatch) {
const owner = repoMatch[1]
const repo = repoMatch[2]
const hrPath = join(homedir(), '.magpie', 'house-rules', `${owner}_${repo}.md`)
if (existsSync(hrPath)) {
houseRules = readFileSync(hrPath, 'utf-8').trim()
logger.info(`Audit using house-rules from ${hrPath}`)
}
}
} catch { /* no house rules — that's fine */ }
const prompt = `${this.taskPrompt}
You have access to Read, Grep, Glob, and Bash. Run \`gh pr diff\` (the URL is in the task above) to see the actual changes, then Read the touched files. **Read the code before judging — never guess.**
## Issues raised by reviewers
${issuesText}
${houseRules ? `\n## Repository conventions (MUST respect — these override reviewer claims)\n\n${houseRules}\n` : ''}
## Your job
### Task 1: Verify each issue above
For every numbered issue, decide a verdict:
- **keep** — issue is real and the description is fine as-is. You may adjust severity.
- **rewrite** — issue is real but the description is weak (machine-sounding, missing evidence, vague, or includes decoration). Write a clean replacement.
- **drop** — false positive. Must give a \`reason\` (one of):
* \`codebase-convention\` — violates repo idiom (e.g. AssertInfo throws, doesn't abort; assert in writer_c.cpp is invariant not input validation)
* \`pre-existing\` — not introduced by this PR and unrelated to PR touch
* \`theoretically-correct-but-impossible\` — true in theory but real-world impossible (e.g. int64*byte_width overflow on 64-bit)
* \`style-out-of-scope\` — pure style, PR doesn't touch that concern
* \`false-claim\` — reviewer misread the code
For every keep/rewrite you MUST include \`evidence\` quoting the actual code you Read (file:line + the line itself).
### Task 2: Find issues reviewers MISSED
After verifying, scan the diff yourself:
a) **Coverage** — did reviewers skip files or functions in the diff? Read what they didn't.
b) **Cross-file pattern repetition** — for every kept/rewritten issue, grep the entire diff for the same pattern in other files. New occurrence = new issue.
c) **Architecture** — does this fix break an abstraction, introduce coupling, violate a pattern visible elsewhere?
d) **Orthogonal interactions** — grep callers/consumers of touched interfaces; flag any module that should be updated together.
New issues use \`verdict: "new"\`. Same evidence rules apply.
## Output JSON (only this — no narrative, no preamble)
\`\`\`json
{
"verifiedIssues": [
{
"verdict": "keep" | "rewrite" | "drop" | "new",
"originalIndex": 0,
"file": "internal/...",
"line": 42,
"severity": "critical" | "high" | "medium" | "low" | "nitpick",
"category": "correctness",
"body": "Plain prose, 1-3 sentences.",
"evidence": "at file.cpp:118 saw \`if (!p) goto cleanup\` — confirms ...",
"reason": "codebase-convention"
}
]
}
\`\`\`
## Hard rules
- For verdict=keep/rewrite/drop: \`originalIndex\` is REQUIRED (references the issue number above).
- For verdict=keep/rewrite/new: \`file\` + \`line\` + \`severity\` + \`category\` + \`body\` + \`evidence\` are REQUIRED. \`line\` MUST be inside a diff hunk — run \`gh pr diff\` and verify.
- For verdict=drop: \`reason\` is REQUIRED. Other fields ignored.
- For verdict=keep: \`body\` may be omitted (signals "original description is fine"). If you set it, that replaces the original.
- \`body\` must be plain prose. NO emoji decorations, NO \`[meta]\` tags, NO "Severity: X" labels, NO "raised by Y" suffix. Write like a senior engineer leaves an inline comment.
- No evidence = no issue. Don't ship anything you didn't verify with code reads.
- If repository conventions above conflict with a reviewer claim, conventions win.
- Every issue must appear in \`verifiedIssues\` (every keep/rewrite/drop + any new).${this.langSuffix}`
const messages: Message[] = [{ role: 'user', content: prompt }]
const systemPrompt = this.withLang(this.auditor.systemPrompt)
try {
const response = await this.auditor.provider.chat(messages, systemPrompt)
this.trackTokens('verifier', prompt + (systemPrompt || ''), response)
// Parse the audit result
const jsonMatch = response.match(/```json\s*([\s\S]*?)\s*```/)
const jsonStr = jsonMatch?.[1] || response
const match = jsonStr.match(/\{[\s\S]*"verifiedIssues"\s*:\s*\[[\s\S]*?\]\s*\}/)
if (!match) {
logger.warn('Audit returned unparseable output; keeping original issues')
return issues
}
const parsed = JSON.parse(match[0])
if (!Array.isArray(parsed.verifiedIssues)) {
logger.warn('Audit verifiedIssues field is not an array; keeping originals')
return issues
}
type V = {
verdict: 'keep' | 'rewrite' | 'drop' | 'new'
originalIndex?: number
file?: string
line?: number
severity?: MergedIssue['severity']
category?: string
body?: string
evidence?: string
reason?: string
}
const result: MergedIssue[] = []
const droppedOrigIdx = new Set<number>()
let dropCount = 0, rewriteCount = 0, newCount = 0
for (const v of parsed.verifiedIssues as V[]) {
if (v.verdict === 'drop') {
if (typeof v.originalIndex === 'number') droppedOrigIdx.add(v.originalIndex)
dropCount++
continue
}
if (v.verdict === 'new') {
if (!v.file || typeof v.line !== 'number' || !v.body || !v.evidence) continue
result.push({
severity: (v.severity || 'low'),
category: v.category || 'general',
file: v.file,
line: v.line,
title: v.body.split(/[.!?\n]/)[0].slice(0, 100),
description: v.body,
raisedBy: ['auditor'],
descriptions: [v.body],
verdict: 'new',
body: v.body,
evidence: v.evidence
})
newCount++
continue
}
// keep or rewrite
if (typeof v.originalIndex !== 'number' || v.originalIndex < 0 || v.originalIndex >= issues.length) {
continue
}
const orig = issues[v.originalIndex]
if (droppedOrigIdx.has(v.originalIndex)) continue // already dropped, skip duplicate
const merged: MergedIssue = {
...orig,
severity: v.severity || orig.severity,
file: v.file || orig.file,
line: typeof v.line === 'number' ? v.line : orig.line,
verdict: v.verdict,
body: v.body, // undefined for keep-no-change is fine
evidence: v.evidence
}
if (v.verdict === 'rewrite') rewriteCount++
result.push(merged)
}
logger.info(`Audit: ${result.length - newCount} kept/rewritten (${rewriteCount} rewrites), ${dropCount} dropped, ${newCount} new`)
return result
} catch (err) {
logger.warn('Audit failed; returning original issues:', err)
return issues
}
}
}
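The house-rules lookup in `verifyIssues` reduces to deriving a file name from the PR URL in the task prompt. A minimal sketch of that step follows; `houseRulesFileName` is a hypothetical helper and the example URL is made up. The real code then joins the name under `~/.magpie/house-rules/` and reads the file if it exists.

```typescript
// Hypothetical helper: returns "<owner>_<repo>.md" for the first GitHub PR URL
// found in the prompt, or null when the prompt contains no such URL.
function houseRulesFileName(taskPrompt: string): string | null {
  const m = taskPrompt.match(/github\.com\/([^/\s]+)\/([^/\s]+)\/pull\//)
  return m ? `${m[1]}_${m[2]}.md` : null
}

console.log(houseRulesFileName('Review https://github.com/acme/widgets/pull/42 carefully'))
console.log(houseRulesFileName('Review the local changes'))
```

Returning `null` when there is no PR URL mirrors the diff's behavior of silently running the audit without house rules.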
+8
@@ -56,6 +56,8 @@ export interface OrchestratorOptions {
onPostAnalysisQA?: () => Promise<{ target: string; question: string } | null>
onContextGathered?: (context: GatheredContext) => void // Context gathering complete callback
interruptState?: { interrupted: boolean } // External interrupt signal (e.g., Ctrl+C)
skipConclusion?: boolean // Skip getFinalConclusion + old verifyConclusion (bot mode)
failFast?: boolean // Abort the entire flow as soon as any reviewer (or context gatherer) fails
}
/** Structured issue from a reviewer */
@@ -83,4 +85,10 @@ export interface ReviewerOutput {
export interface MergedIssue extends ReviewIssue {
raisedBy: string[] // reviewer IDs who found this issue
descriptions: string[] // each reviewer's description
// Populated by the audit stage (verifyIssues). Absent if audit didn't run.
verdict?: 'keep' | 'rewrite' | 'drop' | 'new'
body?: string // Audit-authored post text (replaces description for posting). Plain prose.
evidence?: string // Audit's cited code reference (file:line + quote)
auditReason?: string // For verdict=drop: drop reason category
}
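A consumer of these fields has to choose between `body` and `description` when posting. The sketch below shows one way that choice could look; it is an assumption about downstream code that this diff does not include, and `AuditedIssue`, `textToPost`, and `postable` are hypothetical names.

```typescript
// Minimal stand-in for the audit-related part of MergedIssue.
interface AuditedIssue {
  description: string
  verdict?: 'keep' | 'rewrite' | 'drop' | 'new'
  body?: string
}

// Text to post for one issue: the audit-authored body when present,
// otherwise the original description (audit did not run, or verdict=keep
// with no replacement text).
function textToPost(issue: AuditedIssue): string {
  return issue.body && issue.body.trim() ? issue.body : issue.description
}

// Issues worth posting: anything the audit did not explicitly drop.
function postable(issues: AuditedIssue[]): AuditedIssue[] {
  return issues.filter(issue => issue.verdict !== 'drop')
}

const audited: AuditedIssue[] = [
  { description: 'original text', verdict: 'keep' },
  { description: 'weak text', verdict: 'rewrite', body: 'Clean replacement.' },
  { description: 'false positive', verdict: 'drop' },
]

console.log(postable(audited).map(textToPost))
```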
+43 -6
@@ -4,6 +4,12 @@ import { CliSessionHelper } from './session-helper.js'
import { preparePromptForCli } from '../utils/prompt-file.js'
import { withRetry } from '../utils/retry.js'
// Tools magpie reviewers are pre-approved to use without prompting.
// Read-only file/code access plus the specific Bash commands needed
// to inspect PRs (gh), git history, and search (rg). General Bash,
// Edit, and Write are intentionally NOT included.
const ALLOWED_TOOLS = 'Read,Grep,Glob,Bash(gh:*),Bash(git:*),Bash(rg:*)'
export class ClaudeCodeProvider implements AIProvider {
name = 'claude-code'
private cwd: string
@@ -37,17 +43,29 @@ export class ClaudeCodeProvider implements AIProvider {
const prompt = this.session.shouldSendFullHistory()
? this.session.buildPrompt(messages, systemPrompt)
: this.session.buildPromptLastOnly(messages)
try {
const result = await withRetry(() => this.runClaude(prompt, systemPrompt, options))
this.session.markMessageSent()
return result
} catch (err) {
this.session.start(this.session.sessionName)
throw err
}
}
async *chatStream(messages: Message[], systemPrompt?: string): AsyncGenerator<string, void, unknown> {
const prompt = this.session.shouldSendFullHistory()
? this.session.buildPrompt(messages, systemPrompt)
: this.session.buildPromptLastOnly(messages)
try {
yield* this.runClaudeStream(prompt, systemPrompt)
this.session.markMessageSent()
} catch (err) {
// Reset to a fresh session ID so the next round doesn't try to --resume
// or --session-id a dead/stuck session
this.session.start(this.session.sessionName)
throw err
}
}
// Spawn env: clear CLAUDECODE to avoid nested session detection when run from Claude Code
@@ -62,8 +80,7 @@ export class ClaudeCodeProvider implements AIProvider {
return new Promise((resolve, reject) => {
// Build args based on session state
// Use --dangerously-skip-permissions to allow network access (e.g., gh commands)
const args = ['-p', '-', '--dangerously-skip-permissions']
const args = ['-p', '-', '--effort', 'xhigh', '--allowed-tools', ALLOWED_TOOLS]
if (this.cliModel) {
args.push('--model', this.cliModel)
}
@@ -125,9 +142,11 @@ export class ClaudeCodeProvider implements AIProvider {
private async *runClaudeStream(prompt: string, systemPrompt?: string): AsyncGenerator<string, void, unknown> {
const { prompt: stdinPrompt, cleanup } = preparePromptForCli(prompt)
// Build args based on session state
// Use --dangerously-skip-permissions to allow network access (e.g., gh commands)
const args = ['-p', '-', '--dangerously-skip-permissions']
// Build args based on session state.
// Use --output-format stream-json --verbose so that tool activity (Read, Bash, etc.)
// produces stdout events, preventing the inactivity timeout from killing Claude
// while it's actively investigating code.
const args = ['-p', '-', '--allowed-tools', ALLOWED_TOOLS, '--effort', 'xhigh', '--output-format', 'stream-json', '--verbose']
if (this.cliModel) {
args.push('--model', this.cliModel)
}
@@ -153,6 +172,7 @@ export class ClaudeCodeProvider implements AIProvider {
let done = false
let error: Error | null = null
let lastActivity = Date.now()
let lineBuf = ''
// Timeout checker - kill if no activity for too long
const timeoutChecker = this.timeout > 0 ? setInterval(() => {
@@ -173,13 +193,30 @@ export class ClaudeCodeProvider implements AIProvider {
child.stdout.on('data', (data) => {
lastActivity = Date.now()
const chunk = data.toString()
// Parse stream-json: each line is a JSON event.
// Every event (tool_use, tool_result, assistant, etc.) updates lastActivity.
// We only yield the final result text to the caller.
lineBuf += data.toString()
let idx
while ((idx = lineBuf.indexOf('\n')) !== -1) {
const line = lineBuf.slice(0, idx).trim()
lineBuf = lineBuf.slice(idx + 1)
if (!line) continue
try {
const event = JSON.parse(line)
if (event.type === 'result' && typeof event.result === 'string') {
const chunk = event.result
if (resolveNext) {
resolveNext({ chunk })
resolveNext = null
} else {
chunks.push(chunk)
}
}
} catch {
// Not valid JSON, ignore
}
}
})
let stderrOutput = ''
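The stream-json handling in the hunk above comes down to a line-buffered NDJSON parser: chunks from stdout are appended to a buffer, complete lines are split off and parsed, and only `result` events are forwarded. A minimal standalone sketch of that pattern follows. The helper name `createResultLineParser` and its callback shape are hypothetical and do not appear in the PR; only the `type`/`result` fields are taken from the diff.

```typescript
// Hypothetical helper (not part of the PR): buffers stdout chunks and calls
// onResult with the text of every complete `result` event.
type ResultHandler = (text: string) => void

function createResultLineParser(onResult: ResultHandler): (chunk: string) => void {
  let lineBuf = ''
  return (chunk: string): void => {
    lineBuf += chunk
    let idx: number
    // Only complete lines are parsed; a trailing partial line stays buffered.
    while ((idx = lineBuf.indexOf('\n')) !== -1) {
      const line = lineBuf.slice(0, idx).trim()
      lineBuf = lineBuf.slice(idx + 1)
      if (!line) continue
      try {
        const event: unknown = JSON.parse(line)
        if (typeof event === 'object' && event !== null) {
          const e = event as { type?: unknown; result?: unknown }
          if (e.type === 'result' && typeof e.result === 'string') {
            onResult(e.result)
          }
        }
      } catch {
        // Not valid JSON (for example a stray log line): ignore it.
      }
    }
  }
}

// Usage: a JSON event split across two chunks is still parsed once complete.
const results: string[] = []
const feed = createResultLineParser((text) => { results.push(text) })
feed('{"type":"assistant"}\n{"type":"res')
feed('ult","result":"LGTM"}\n')
```

Keeping the buffer outside the per-chunk handler is what makes split events safe, since a chunk boundary can fall anywhere inside a JSON line.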
+18 -1
@@ -41,21 +41,38 @@ export class CodexCliProvider implements AIProvider {
const prompt = this.sessionEnabled && !this.session.shouldSendFullHistory()
? this.session.buildPromptLastOnly(messages)
: this.session.buildPrompt(messages, systemPrompt)
try {
const result = await withRetry(() => this.runCodex(prompt))
this.session.markMessageSent()
return result
} catch (err) {
this.startSession(this.session.sessionName)
throw err
}
}
async *chatStream(messages: Message[], systemPrompt?: string): AsyncGenerator<string, void, unknown> {
const prompt = this.sessionEnabled && !this.session.shouldSendFullHistory()
? this.session.buildPromptLastOnly(messages)
: this.session.buildPrompt(messages, systemPrompt)
try {
yield* this.runCodexStream(prompt)
this.session.markMessageSent()
} catch (err) {
this.startSession(this.session.sessionName)
throw err
}
}
private buildArgs(): string[] {
const baseArgs = ['--json', '--dangerously-bypass-approvals-and-sandbox']
// workspace-write (not read-only) because codex's read-only sandbox
// also blocks network, which breaks `gh pr diff` for reviewers.
const baseArgs = [
'--json',
'--sandbox', 'workspace-write',
'-c', 'approval_policy="never"',
'-c', 'sandbox_workspace_write.network_access=true',
]
if (this.cliModel) {
baseArgs.push('--model', this.cliModel)
}
+53 -2
@@ -7,6 +7,7 @@ import { ClaudeCodeProvider } from './claude-code.js'
import { CodexCliProvider } from './codex-cli.js'
import { GeminiCliProvider } from './gemini-cli.js'
import { GeminiProvider } from './gemini.js'
import { OpencodeCliProvider } from './opencode-cli.js'
import { QwenCodeProvider } from './qwen-code.js'
import { MiniMaxProvider } from './minimax.js'
import { MockProvider } from './mock.js'
@@ -14,9 +15,20 @@ import { checkCliBinary } from './cli-check.js'
// Parse CLI model string: 'gemini-cli:gemini-2.5-pro' → { provider: 'gemini-cli', cliModel: 'gemini-2.5-pro' }
// Plain 'gemini-cli' → { provider: 'gemini-cli', cliModel: undefined }
const CLI_PROVIDERS = ['claude-code', 'codex-cli', 'gemini-cli', 'qwen-code'] as const
const CLI_PROVIDERS = ['claude-code', 'codex-cli', 'gemini-cli', 'opencode-cli', 'qwen-code'] as const
type CliProviderName = typeof CLI_PROVIDERS[number]
const OPENROUTER_PREFIX = 'openrouter/'
const DEFAULT_OPENROUTER_BASE_URL = 'https://openrouter.ai/api/v1'
// OpenRouter model IDs look like 'openrouter/<vendor>/<model>',
// e.g. 'openrouter/anthropic/claude-3.5-sonnet'. The prefix routes to
// the OpenAI client (OpenRouter is OpenAI-compatible); the rest is the
// model ID the OpenRouter API expects.
function stripOpenRouterPrefix(model: string): string {
return model.slice(OPENROUTER_PREFIX.length)
}
export function parseCliModel(model: string): { provider: string; cliModel?: string } {
for (const cli of CLI_PROVIDERS) {
if (model === cli) {
@@ -29,7 +41,16 @@ export function parseCliModel(model: string): { provider: string; cliModel?: str
return { provider: model }
}
export function getProviderForModel(model: string): 'anthropic' | 'openai' | 'google' | 'claude-code' | 'codex-cli' | 'gemini-cli' | 'qwen-code' | 'minimax' | 'mock' {
/** Check if a model string maps to a CLI-based provider (has tool access / can read files) */
export function isCliModel(model: string): boolean {
const { provider } = parseCliModel(model)
return (CLI_PROVIDERS as readonly string[]).includes(provider)
}
export function getProviderForModel(model: string): 'anthropic' | 'openai' | 'google' | 'claude-code' | 'codex-cli' | 'gemini-cli' | 'opencode-cli' | 'qwen-code' | 'minimax' | 'mock' | 'openrouter' {
if (model.startsWith(OPENROUTER_PREFIX)) {
return 'openrouter'
}
const { provider } = parseCliModel(model)
if ((CLI_PROVIDERS as readonly string[]).includes(provider)) {
return provider as CliProviderName
@@ -86,11 +107,41 @@ export function createProvider(model: string, config: MagpieConfig): AIProvider
return new QwenCodeProvider({ cliModel })
}
// OpenCode CLI is the one CLI provider that needs upstream API keys —
// it routes to OpenRouter (or another provider) for the actual model call.
// We forward whatever keys magpie already has configured.
if (providerName === 'opencode-cli') {
checkCliBinary('opencode', 'OpenCode')
return new OpencodeCliProvider({ cliModel, config })
}
// Mock provider for debug mode — no API key needed
if (providerName === 'mock') {
return new MockProvider()
}
// OpenRouter is OpenAI-compatible: route through the OpenAI client,
// strip the 'openrouter/' prefix from the model, and point at OpenRouter's API.
if (providerName === 'openrouter') {
const openRouterModel = stripOpenRouterPrefix(model).trim()
if (!openRouterModel) {
throw new Error(`Invalid OpenRouter model "${model}": must include a model ID after "${OPENROUTER_PREFIX}" (e.g. "openrouter/anthropic/claude-3.5-sonnet").`)
}
const providerConfig = config.providers['openrouter']
const apiKey = providerConfig?.api_key || process.env.OPENROUTER_API_KEY || ''
if (!apiKey) {
throw new Error('OpenRouter API key is required. Set OPENROUTER_API_KEY env var or providers.openrouter.api_key in config.')
}
// NOTE: the returned provider's `.name` will be 'openai', not 'openrouter',
// because OpenRouter requests are dispatched through the OpenAI client.
// Logs/UI keyed on provider name will show 'openai' for OpenRouter traffic.
return new OpenAIProvider({
apiKey,
model: openRouterModel,
baseURL: providerConfig?.base_url || DEFAULT_OPENROUTER_BASE_URL,
})
}
// MiniMax uses API key from config or env
if (providerName === 'minimax') {
const providerConfig = config.providers['minimax']
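The OpenRouter branch above resolves three values before building the OpenAI client: the model ID with the `openrouter/` prefix stripped, an API key taken from config or the environment, and a base URL that defaults to OpenRouter's endpoint. A rough standalone sketch of that resolution is shown here. The function `resolveOpenRouter` and its two interfaces are hypothetical names introduced for illustration; the constants and the precedence order mirror the diff.

```typescript
const OPENROUTER_PREFIX = 'openrouter/'
const DEFAULT_OPENROUTER_BASE_URL = 'https://openrouter.ai/api/v1'

// Hypothetical shapes, introduced only for this sketch.
interface OpenRouterSettings { api_key?: string; base_url?: string }
interface ResolvedOpenRouter { apiKey: string; model: string; baseURL: string }

// Hypothetical helper: config values take precedence over the env fallback.
function resolveOpenRouter(
  model: string,
  settings: OpenRouterSettings | undefined,
  envKey: string | undefined,
): ResolvedOpenRouter {
  if (!model.startsWith(OPENROUTER_PREFIX)) {
    throw new Error(`Not an OpenRouter model: "${model}"`)
  }
  const stripped = model.slice(OPENROUTER_PREFIX.length).trim()
  if (!stripped) {
    throw new Error(`Invalid OpenRouter model "${model}": must include a model ID`)
  }
  const apiKey = settings?.api_key || envKey || ''
  if (!apiKey) {
    throw new Error('OpenRouter API key is required')
  }
  return {
    apiKey,
    model: stripped,
    baseURL: settings?.base_url || DEFAULT_OPENROUTER_BASE_URL,
  }
}
```

Passing the environment value in as a parameter, rather than reading `process.env` inside the function, is a choice made for this sketch so the logic can be exercised without stubbing the environment.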
+18 -3
@@ -41,17 +41,27 @@ export class GeminiCliProvider implements AIProvider {
const prompt = this.sessionEnabled && !this.session.shouldSendFullHistory()
? this.session.buildPromptLastOnly(messages)
: this.session.buildPrompt(messages, systemPrompt)
try {
const result = await withRetry(() => this.runGemini(prompt))
this.session.markMessageSent()
return result
} catch (err) {
this.startSession(this.session.sessionName)
throw err
}
}
async *chatStream(messages: Message[], systemPrompt?: string): AsyncGenerator<string, void, unknown> {
const prompt = this.sessionEnabled && !this.session.shouldSendFullHistory()
? this.session.buildPromptLastOnly(messages)
: this.session.buildPrompt(messages, systemPrompt)
try {
yield* this.runGeminiStream(prompt)
this.session.markMessageSent()
} catch (err) {
this.startSession(this.session.sessionName)
throw err
}
}
private runGemini(prompt: string): Promise<string> {
@@ -135,6 +145,7 @@ export class GeminiCliProvider implements AIProvider {
let error: Error | null = null
let lastActivity = Date.now()
let lineBuf = '' // Buffer for NDJSON line parsing
let stderrBuf = ''
// Timeout checker - kill if no activity for too long
const timeoutChecker = this.timeout > 0 ? setInterval(() => {
@@ -146,7 +157,8 @@ export class GeminiCliProvider implements AIProvider {
}, 5000)
forceKill.unref()
done = true
error = new Error(`Gemini CLI timed out after ${this.timeout / 1000}s of inactivity`)
const stderr = stderrBuf.trim()
error = new Error(`Gemini CLI timed out after ${this.timeout / 1000}s of inactivity${stderr ? ': ' + stderr.slice(-500) : ''}`)
if (resolveNext) {
resolveNext({ chunk: null })
}
@@ -186,8 +198,10 @@ export class GeminiCliProvider implements AIProvider {
}
})
child.stderr.on('data', (_data) => {
child.stderr.on('data', (data) => {
lastActivity = Date.now() // Activity on stderr also counts
stderrBuf += data.toString()
if (stderrBuf.length > 10000) stderrBuf = stderrBuf.slice(-10000)
})
child.on('close', (code) => {
@@ -208,7 +222,8 @@ export class GeminiCliProvider implements AIProvider {
}
done = true
if (code !== 0 && !error) {
error = new Error(`Gemini CLI exited with code ${code}`)
const stderr = stderrBuf.trim()
error = new Error(`Gemini CLI exited with code ${code}${stderr ? ': ' + stderr.slice(-500) : ''}`)
}
if (resolveNext) {
resolveNext({ chunk: null })
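The Gemini changes above keep a capped stderr buffer and append its tail to timeout and exit-code errors. The same idea can be expressed as two small pure functions, sketched below. The helper names `appendBounded` and `withStderrTail` are hypothetical; the 10000-character cap and the 500-character tail are the values used in the diff.

```typescript
const STDERR_BUFFER_LIMIT = 10000 // max characters retained, as in the diff
const STDERR_MESSAGE_TAIL = 500   // characters appended to error messages

// Hypothetical helper: append a chunk, keeping only the most recent `limit` chars.
function appendBounded(buf: string, chunk: string, limit: number = STDERR_BUFFER_LIMIT): string {
  const next = buf + chunk
  return next.length > limit ? next.slice(-limit) : next
}

// Hypothetical helper: suffix a message with the trimmed stderr tail, if any.
function withStderrTail(message: string, stderrBuf: string, tail: number = STDERR_MESSAGE_TAIL): string {
  const stderr = stderrBuf.trim()
  return stderr ? `${message}: ${stderr.slice(-tail)}` : message
}
```

Keeping the end of the buffer rather than the start reflects the assumption that the most recent stderr output is the most relevant to a failure.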
+355
@@ -0,0 +1,355 @@
import { spawn } from 'child_process'
import type { AIProvider, Message, CliProviderOptions, ChatOptions } from './types.js'
import type { MagpieConfig } from '../config/types.js'
import { CliSessionHelper } from './session-helper.js'
import { preparePromptForCli } from '../utils/prompt-file.js'
import { withRetry } from '../utils/retry.js'
// Read-only tool allowlist for opencode reviewers, mirroring claude-code's
// ALLOWED_TOOLS. Injected via the OPENCODE_CONFIG_CONTENT env var so we don't
// touch the user's own opencode.json. With --dangerously-skip-permissions,
// explicit "deny" entries still block — unspecified categories auto-allow,
// which keeps us forward-compatible with new opencode tools.
//
// IMPORTANT — bash rule order: opencode applies the LAST matching pattern,
// not the most specific one. The catch-all `'*': 'deny'` MUST come first,
// followed by the specific allows, or every gh/git/rg call gets denied and
// opencode drops the bash tool from the model's available tool list entirely.
const PERMISSION_CONFIG = JSON.stringify({
$schema: 'https://opencode.ai/config.json',
permission: {
read: 'allow',
grep: 'allow',
glob: 'allow',
list: 'allow',
todowrite: 'allow',
edit: 'deny',
task: 'deny',
webfetch: 'deny',
websearch: 'deny',
// Large prompts (>100KB) are materialized to a file via preparePromptForCli
// and we pass tmpDir: this.cwd so that file lives inside --dir <cwd>.
// That keeps external_directory denied: a prompt injection cannot trick
// the reviewer into reading ~/.ssh, /etc/passwd, or anything else outside
// the repo.
external_directory: 'deny',
bash: {
'*': 'deny',
'gh *': 'allow',
'git *': 'allow',
'rg *': 'allow',
},
},
})
// Magpie provider key → opencode env var. Forwarded so the user only needs
// to configure each key once (in magpie's config) rather than also exporting
// it to opencode's environment.
const API_KEY_FORWARDS: Array<{ env: string; providerKey: 'openrouter' | 'anthropic' | 'openai' | 'google' }> = [
{ env: 'OPENROUTER_API_KEY', providerKey: 'openrouter' },
{ env: 'ANTHROPIC_API_KEY', providerKey: 'anthropic' },
{ env: 'OPENAI_API_KEY', providerKey: 'openai' },
{ env: 'GOOGLE_API_KEY', providerKey: 'google' },
]
export interface OpencodeCliProviderOptions extends CliProviderOptions {
/** MagpieConfig is needed so we can forward API keys to opencode's env. */
config?: MagpieConfig
}
export class OpencodeCliProvider implements AIProvider {
name = 'opencode-cli'
private cwd: string
private timeout: number // ms, 0 = no timeout
private cliModel?: string
private config?: MagpieConfig
private session = new CliSessionHelper()
// Like codex-cli: opencode generates its own session id and returns it in
// the first response's event stream. We never pre-generate one — that
// would risk telling opencode to "continue" a session it has never seen.
private sessionEnabled = false
get sessionId() { return this.session.sessionId }
constructor(options?: OpencodeCliProviderOptions) {
this.cwd = process.cwd()
this.timeout = 15 * 60 * 1000 // 15 minutes
this.cliModel = options?.cliModel
this.config = options?.config
}
setCwd(cwd: string) {
this.cwd = cwd
}
startSession(name?: string): void {
this.sessionEnabled = true
this.session.start(name)
this.session.sessionId = undefined // Captured from the first response, not pre-generated
}
endSession(): void {
this.sessionEnabled = false
this.session.end()
}
async chat(messages: Message[], systemPrompt?: string, _options?: ChatOptions): Promise<string> {
const prompt = this.session.shouldSendFullHistory()
? this.session.buildPrompt(messages, systemPrompt)
: this.session.buildPromptLastOnly(messages)
try {
const result = await withRetry(() => this.runOpencode(prompt))
this.session.markMessageSent()
return result
} catch (err) {
if (this.sessionEnabled) this.startSession(this.session.sessionName)
throw err
}
}
async *chatStream(messages: Message[], systemPrompt?: string): AsyncGenerator<string, void, unknown> {
const prompt = this.session.shouldSendFullHistory()
? this.session.buildPrompt(messages, systemPrompt)
: this.session.buildPromptLastOnly(messages)
try {
yield* this.runOpencodeStream(prompt)
this.session.markMessageSent()
} catch (err) {
if (this.sessionEnabled) this.startSession(this.session.sessionName)
throw err
}
}
private spawnEnv(): NodeJS.ProcessEnv {
const env: NodeJS.ProcessEnv = { ...process.env, OPENCODE_CONFIG_CONTENT: PERMISSION_CONFIG }
if (this.config) {
for (const { env: envKey, providerKey } of API_KEY_FORWARDS) {
const pc = this.config.providers[providerKey] as { api_key?: string } | undefined
if (pc?.api_key) {
env[envKey] = pc.api_key
}
}
}
return env
}
private buildArgs(): string[] {
// opencode run reads stdin and concatenates with positional args, so we
// can deliver the prompt via stdin like the other CLI providers.
// --dangerously-skip-permissions auto-allows unspecified categories;
// explicit "deny" entries in PERMISSION_CONFIG still block.
const args = [
'run',
'--format', 'json',
'--dir', this.cwd,
'--dangerously-skip-permissions',
]
if (this.cliModel) {
args.push('-m', this.cliModel)
}
// Pass the captured session id on follow-up turns. We never use
// --continue (which resumes opencode's globally-last session and would
// race when multiple magpie reviewers run concurrently), and we never
// pass an unseen id on the first turn (opencode generates the id).
if (this.sessionEnabled && this.session.sessionId && !this.session.isFirstMessage) {
args.push('--session', this.session.sessionId)
}
return args
}
// Event schema (verified against opencode 1.15.11):
// {type:"step_start", sessionID:"ses_...", part:{...}}
// {type:"text", sessionID:"ses_...", part:{type:"text", text:"..."}}
// Each model turn emits one consolidated `text` event — no streaming deltas.
// Tool-use events are ignored for text extraction.
private extractEventText(event: unknown): string {
if (!event || typeof event !== 'object') return ''
const e = event as { type?: unknown; sessionID?: unknown; part?: { type?: unknown; text?: unknown } }
if (this.sessionEnabled && !this.session.sessionId && typeof e.sessionID === 'string') {
this.session.sessionId = e.sessionID
}
if (e.type === 'text' && e.part?.type === 'text' && typeof e.part.text === 'string') {
return e.part.text
}
return ''
}
private parseJsonOutput(output: string): string {
let text = ''
for (const line of output.split('\n')) {
const trimmed = line.trim()
if (!trimmed) continue
try {
text += this.extractEventText(JSON.parse(trimmed))
} catch {
// not JSON — ignore
}
}
return text
}
private runOpencode(prompt: string): Promise<string> {
// Write the spilled prompt file inside --dir <cwd> so the read tool can
// reach it; external_directory: 'deny' would otherwise block /tmp paths.
const { prompt: stdinPrompt, cleanup } = preparePromptForCli(prompt, { tmpDir: this.cwd })
return new Promise((resolve, reject) => {
const args = this.buildArgs()
const child = spawn('opencode', args, {
cwd: this.cwd,
stdio: ['pipe', 'pipe', 'pipe'],
env: this.spawnEnv(),
})
let output = ''
let error = ''
child.stdout.on('data', (data) => {
output += data.toString()
})
child.stderr.on('data', (data) => {
error += data.toString()
})
child.on('close', (code) => {
cleanup()
if (code !== 0) {
reject(new Error(`OpenCode CLI exited with code ${code}: ${error}`))
} else {
resolve(this.parseJsonOutput(output).trim())
}
})
child.on('error', (err) => {
cleanup()
reject(new Error(`Failed to run opencode CLI: ${err.message}`))
})
child.stdin.on('error', () => {})
child.stdin.write(stdinPrompt)
child.stdin.end()
})
}
private async *runOpencodeStream(prompt: string): AsyncGenerator<string, void, unknown> {
// Write the spilled prompt file inside --dir <cwd> so the read tool can
// reach it; external_directory: 'deny' would otherwise block /tmp paths.
const { prompt: stdinPrompt, cleanup } = preparePromptForCli(prompt, { tmpDir: this.cwd })
const args = this.buildArgs()
const child = spawn('opencode', args, {
cwd: this.cwd,
stdio: ['pipe', 'pipe', 'pipe'],
env: this.spawnEnv(),
})
const chunks: string[] = []
let resolveNext: ((value: { chunk: string | null }) => void) | null = null
let done = false
let error: Error | null = null
let lastActivity = Date.now()
let lineBuf = ''
let stderrOutput = ''
const timeoutChecker = this.timeout > 0 ? setInterval(() => {
if (Date.now() - lastActivity > this.timeout) {
child.kill('SIGTERM')
const forceKill = setTimeout(() => {
try { child.kill('SIGKILL') } catch {}
}, 5000)
forceKill.unref()
done = true
error = new Error(`OpenCode CLI timed out after ${this.timeout / 1000}s of inactivity`)
if (resolveNext) {
resolveNext({ chunk: null })
}
}
}, 10000) : null
const pushChunk = (chunk: string) => {
if (!chunk) return
if (resolveNext) {
resolveNext({ chunk })
resolveNext = null
} else {
chunks.push(chunk)
}
}
child.stdout.on('data', (data) => {
lastActivity = Date.now()
lineBuf += data.toString()
let idx
while ((idx = lineBuf.indexOf('\n')) !== -1) {
const line = lineBuf.slice(0, idx).trim()
lineBuf = lineBuf.slice(idx + 1)
if (!line) continue
try {
const event = JSON.parse(line) as Record<string, unknown>
const piece = this.extractEventText(event)
if (piece) pushChunk(piece)
} catch {
// Not JSON, ignore
}
}
})
child.stderr.on('data', (data) => {
lastActivity = Date.now()
stderrOutput += data.toString()
})
child.on('close', (code) => {
cleanup()
if (timeoutChecker) clearInterval(timeoutChecker)
if (lineBuf.trim()) {
try {
const event = JSON.parse(lineBuf.trim()) as Record<string, unknown>
const piece = this.extractEventText(event)
if (piece) pushChunk(piece)
} catch {}
}
done = true
if (code !== 0 && !error) {
error = new Error(`OpenCode CLI exited with code ${code}${stderrOutput ? ': ' + stderrOutput.slice(0, 500) : ''}`)
}
if (resolveNext) {
resolveNext({ chunk: null })
}
})
child.on('error', (err) => {
cleanup()
if (timeoutChecker) clearInterval(timeoutChecker)
done = true
error = new Error(`Failed to run opencode CLI: ${err.message}`)
if (resolveNext) {
resolveNext({ chunk: null })
}
})
child.stdin.on('error', () => {})
child.stdin.write(stdinPrompt)
child.stdin.end()
while (!done || chunks.length > 0) {
if (chunks.length > 0) {
yield chunks.shift()!
} else if (!done) {
const result = await new Promise<{ chunk: string | null }>((resolve) => {
resolveNext = resolve
})
if (result.chunk !== null) {
yield result.chunk
}
}
}
if (error) {
throw error
}
}
}
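The bash rule-order comment on `PERMISSION_CONFIG` in this file says the last matching pattern wins, which is why the catch-all `'*': 'deny'` is listed first. The sketch below illustrates last-match-wins evaluation in isolation. It is not opencode's implementation: the `decide` function, the simplified glob matching (only `*` is treated as a wildcard), and the `deny` fallback are all assumptions made for the example.

```typescript
type Action = 'allow' | 'deny'

// Simplified glob for illustration: `*` matches any run of characters and
// everything else is literal. opencode's real matcher may behave differently.
function globToRegExp(pattern: string): RegExp {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, '\\$&')
  return new RegExp('^' + escaped.replace(/\*/g, '.*') + '$')
}

// Hypothetical evaluator: walks rules in insertion order and lets every later
// match override earlier ones, so the last matching pattern decides.
function decide(rules: Record<string, Action>, command: string, fallback: Action = 'deny'): Action {
  let decision: Action = fallback
  for (const [pattern, action] of Object.entries(rules)) {
    if (globToRegExp(pattern).test(command)) {
      decision = action
    }
  }
  return decision
}

// Catch-all first, specific allows after it (the order used in the diff).
const catchAllFirst: Record<string, Action> = { '*': 'deny', 'gh *': 'allow', 'git *': 'allow' }
// Same rules with the catch-all last: it overrides every allow.
const catchAllLast: Record<string, Action> = { 'gh *': 'allow', 'git *': 'allow', '*': 'deny' }
```

Under this model, `decide(catchAllFirst, 'gh pr diff 1')` yields `allow` while `decide(catchAllLast, 'gh pr diff 1')` yields `deny`, which matches the failure mode the comment warns about.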
+10
@@ -37,17 +37,27 @@ export class QwenCodeProvider implements AIProvider {
const prompt = this.session.shouldSendFullHistory()
? this.session.buildPrompt(messages, systemPrompt)
: this.session.buildPromptLastOnly(messages)
try {
const result = await withRetry(() => this.runQwen(prompt, systemPrompt, options))
this.session.markMessageSent()
return result
} catch (err) {
this.session.start(this.session.sessionName)
throw err
}
}
async *chatStream(messages: Message[], systemPrompt?: string): AsyncGenerator<string, void, unknown> {
const prompt = this.session.shouldSendFullHistory()
? this.session.buildPrompt(messages, systemPrompt)
: this.session.buildPromptLastOnly(messages)
try {
yield* this.runQwenStream(prompt, systemPrompt)
this.session.markMessageSent()
} catch (err) {
this.session.start(this.session.sessionName)
throw err
}
}
private runQwen(prompt: string, systemPrompt?: string, options?: ChatOptions): Promise<string> {
+13 -2
@@ -23,14 +23,25 @@ export interface PreparedPrompt {
cleanup: () => void
}
export function preparePromptForCli(prompt: string): PreparedPrompt {
export interface PreparePromptOptions {
/**
* Directory to materialize the prompt file in when it exceeds the size
* threshold. Defaults to os.tmpdir(). Override when the consuming CLI
* cannot read outside a specific root — e.g. opencode-cli denies reads
* outside its --dir, so the prompt file must live inside the repo.
*/
tmpDir?: string
}
export function preparePromptForCli(prompt: string, options?: PreparePromptOptions): PreparedPrompt {
if (Buffer.byteLength(prompt, 'utf-8') <= PROMPT_SIZE_THRESHOLD) {
return { prompt, cleanup: () => {} }
}
registerExitHandler()
const tmpFile = join(tmpdir(), `magpie_prompt_${Date.now()}_${Math.random().toString(36).slice(2)}.txt`)
const dir = options?.tmpDir ?? tmpdir()
const tmpFile = join(dir, `magpie_prompt_${Date.now()}_${Math.random().toString(36).slice(2)}.txt`)
writeFileSync(tmpFile, prompt, 'utf-8')
activeTempFiles.add(tmpFile)
+32 -1
@@ -102,7 +102,7 @@ describe('deduplicateIssues', () => {
})
describe('parseFocusAreas', () => {
it('should extract focus areas from analysis', () => {
it('should extract focus areas from English analysis', () => {
const analysis = `## What this PR does
Some analysis here.
@@ -114,6 +114,37 @@ Some analysis here.
const focus = parseFocusAreas(analysis)
expect(focus).toHaveLength(3)
expect(focus[0]).toContain('Security')
expect(focus[1]).toContain('Performance')
expect(focus[2]).toContain('Error handling')
})
it('should extract focus areas from Chinese analysis', () => {
const analysis = `## 这个 PR 做了什么
一些分析内容。
## 建议的 review 重点
- 安全性:登录处理函数的输入校验
- 性能:新增的数据库查询
- 错误处理:异步路径缺少 try/catch`
const focus = parseFocusAreas(analysis)
expect(focus).toHaveLength(3)
expect(focus[0]).toContain('安全性')
expect(focus[1]).toContain('性能')
expect(focus[2]).toContain('错误处理')
})
it('should support bold-heading variant with Chinese title', () => {
const analysis = `**建议的 review 重点**
1. src/auth.ts 的鉴权改动
2. 新增的并发逻辑
其他段落...`
const focus = parseFocusAreas(analysis)
expect(focus).toHaveLength(2)
expect(focus[0]).toContain('src/auth.ts')
expect(focus[1]).toContain('并发')
})
it('should return empty array if no focus section', () => {
@@ -63,4 +63,26 @@ describe('DebateOrchestrator resilience', () => {
await expect(orchestrator.runStreaming('test', 'Review this code'))
.rejects.toThrow('All reviewers failed')
})
it('should abort immediately when failFast is enabled and any reviewer fails', async () => {
const goodProvider = makeProvider('good', 'LGTM, no issues found.')
const badProvider = makeFailingProvider('bad')
const reviewers = [
makeReviewer('good-reviewer', goodProvider),
makeReviewer('bad-reviewer', badProvider),
]
const summarizer = makeReviewer('summarizer', makeProvider('sum', 'Final conclusion.'))
const analyzer = makeReviewer('analyzer', makeProvider('analyzer', 'Analysis done.'))
const orchestrator = new DebateOrchestrator(reviewers, summarizer, analyzer, {
maxRounds: 1,
interactive: false,
checkConvergence: false,
failFast: true,
})
await expect(orchestrator.runStreaming('test', 'Review this code'))
.rejects.toThrow(/bad-reviewer.*fail-fast/)
})
})
-1
@@ -50,7 +50,6 @@ describe('DebateOrchestrator', () => {
expect(result.prNumber).toBe('123')
expect(result.analysis).toBe('PR analysis result')
expect(result.messages.length).toBe(4) // 2 reviewers * 2 rounds
expect(result.summaries.length).toBe(2)
expect(result.finalConclusion).toBe('Final conclusion')
})
+49 -1
@@ -1,9 +1,13 @@
// tests/providers/factory.test.ts
import { describe, it, expect } from 'vitest'
import { describe, it, expect, vi, afterEach } from 'vitest'
import { createProvider, getProviderForModel } from '../../src/providers/factory.js'
import type { MagpieConfig } from '../../src/config/types.js'
describe('Provider Factory', () => {
afterEach(() => {
vi.unstubAllEnvs()
})
const mockConfig: MagpieConfig = {
providers: {
anthropic: { api_key: 'ant-key' },
@@ -39,6 +43,17 @@ describe('Provider Factory', () => {
it('should return codex-cli for codex-cli model', () => {
expect(getProviderForModel('codex-cli')).toBe('codex-cli')
})
it('should return opencode-cli for opencode-cli model (with and without :model suffix)', () => {
expect(getProviderForModel('opencode-cli')).toBe('opencode-cli')
expect(getProviderForModel('opencode-cli:openrouter/anthropic/claude-sonnet-4')).toBe('opencode-cli')
})
it('should return openrouter for openrouter/ prefixed models', () => {
expect(getProviderForModel('openrouter/anthropic/claude-3.5-sonnet')).toBe('openrouter')
expect(getProviderForModel('openrouter/meta-llama/llama-3-70b-instruct')).toBe('openrouter')
expect(getProviderForModel('openrouter/openai/gpt-4o')).toBe('openrouter')
})
})
describe('createProvider', () => {
@@ -76,6 +91,16 @@ describe('Provider Factory', () => {
expect(provider.name).toBe('codex-cli')
})
it('should create opencode-cli provider with no extra config', () => {
const provider = createProvider('opencode-cli', mockConfig)
expect(provider.name).toBe('opencode-cli')
})
it('should create opencode-cli provider with a model suffix', () => {
const provider = createProvider('opencode-cli:openrouter/anthropic/claude-sonnet-4', mockConfig)
expect(provider.name).toBe('opencode-cli')
})
it('should pass base_url through to API providers', () => {
const configWithBaseUrl: MagpieConfig = {
...mockConfig,
@@ -95,5 +120,28 @@ describe('Provider Factory', () => {
const provider = createProvider('claude-sonnet-4-20250514', mockConfig)
expect(provider.name).toBe('anthropic')
})
it('should create openrouter provider (via openai client) with api key from config', () => {
const configWithOpenrouter: MagpieConfig = {
...mockConfig,
providers: { ...mockConfig.providers, openrouter: { api_key: 'or-key' } }
}
const provider = createProvider('openrouter/anthropic/claude-3.5-sonnet', configWithOpenrouter)
// OpenRouter is routed through the OpenAI client, so .name === 'openai'
expect(provider.name).toBe('openai')
})
it('should pick up OPENROUTER_API_KEY env var when config is absent', () => {
vi.stubEnv('OPENROUTER_API_KEY', 'env-or-key')
const provider = createProvider('openrouter/anthropic/claude-3.5-sonnet', mockConfig)
expect(provider.name).toBe('openai')
})
it('should throw when OpenRouter has no api key configured', () => {
vi.stubEnv('OPENROUTER_API_KEY', '')
expect(() =>
createProvider('openrouter/anthropic/claude-3.5-sonnet', mockConfig)
).toThrow(/OpenRouter API key/)
})
})
})
+53 -1
@@ -1,15 +1,21 @@
import { describe, it, expect, vi } from 'vitest'
import { OpenAIProvider } from '../../src/providers/openai'
import { createProvider } from '../../src/providers/factory'
import type { MagpieConfig } from '../../src/config/types'
let lastConstructorOptions: Record<string, unknown> = {}
let lastCreateOptions: Record<string, unknown> = {}
vi.mock('openai', () => ({
default: class MockOpenAI {
chat = {
completions: {
create: vi.fn().mockResolvedValue({
create: vi.fn().mockImplementation((opts: Record<string, unknown>) => {
lastCreateOptions = opts
return Promise.resolve({
choices: [{ message: { content: 'Mock response' } }]
})
})
}
}
constructor(options: Record<string, unknown>) {
@@ -40,3 +46,49 @@ describe('OpenAIProvider', () => {
expect(lastConstructorOptions.baseURL).toBeUndefined()
})
})
describe('OpenRouter via OpenAI client', () => {
const baseConfig: MagpieConfig = {
providers: {},
defaults: { max_rounds: 3, output_format: 'markdown' },
reviewers: {},
summarizer: { model: 'openrouter/anthropic/claude-3.5-sonnet', prompt: '' },
analyzer: { model: 'openrouter/anthropic/claude-3.5-sonnet', prompt: '' }
}
it('strips the openrouter/ prefix from the model and defaults baseURL to OpenRouter', async () => {
const config: MagpieConfig = {
...baseConfig,
providers: { openrouter: { api_key: 'or-key' } }
}
const provider = createProvider('openrouter/anthropic/claude-3.5-sonnet', config)
expect(lastConstructorOptions.apiKey).toBe('or-key')
expect(lastConstructorOptions.baseURL).toBe('https://openrouter.ai/api/v1')
// Invoke chat() so the stripped model reaches chat.completions.create
await provider.chat([{ role: 'user', content: 'hi' }])
expect(lastCreateOptions.model).toBe('anthropic/claude-3.5-sonnet')
})
it('honors a custom base_url from config and forwards the stripped model', async () => {
const config: MagpieConfig = {
...baseConfig,
providers: {
openrouter: { api_key: 'or-key', base_url: 'https://my-openrouter-proxy.example.com/v1' }
}
}
const provider = createProvider('openrouter/meta-llama/llama-3-70b-instruct', config)
expect(lastConstructorOptions.baseURL).toBe('https://my-openrouter-proxy.example.com/v1')
await provider.chat([{ role: 'user', content: 'hi' }])
expect(lastCreateOptions.model).toBe('meta-llama/llama-3-70b-instruct')
})
it('throws when the model is just "openrouter/" with no ID after it', () => {
const config: MagpieConfig = {
...baseConfig,
providers: { openrouter: { api_key: 'or-key' } }
}
expect(() => createProvider('openrouter/', config)).toThrow(/must include a model ID/)
})
})
+109
@@ -0,0 +1,109 @@
// Verifies the JSON event parser against captured opencode 1.15.11 output.
// The schema is internal to opencode; if it changes, these tests fail loudly
// rather than the provider silently returning empty reviewer responses.
import { describe, it, expect } from 'vitest'
import { OpencodeCliProvider } from '../../src/providers/opencode-cli.js'
// Real captures from `opencode run --format json -m openrouter/openai/gpt-4o-mini`.
const STEP_START_EVENT = '{"type":"step_start","timestamp":1780089625130,"sessionID":"ses_abc","part":{"id":"prt_1","messageID":"msg_1","sessionID":"ses_abc","type":"step-start"}}'
const TEXT_EVENT = '{"type":"text","timestamp":1780089625396,"sessionID":"ses_abc","part":{"id":"prt_2","messageID":"msg_1","sessionID":"ses_abc","type":"text","text":"ok","time":{"start":1780089625131,"end":1780089625393}}}'
// Access private parser methods. They're pure logic and worth testing directly;
// extracting them into a separate module just for visibility would be churn.
type ParserHandle = {
extractEventText(event: unknown): string
parseJsonOutput(output: string): string
}
function asParser(p: OpencodeCliProvider): ParserHandle {
return p as unknown as ParserHandle
}
describe('OpencodeCliProvider parser', () => {
  describe('extractEventText', () => {
    it('returns the text from a text-part event', () => {
      const parser = asParser(new OpencodeCliProvider())
      expect(parser.extractEventText(JSON.parse(TEXT_EVENT))).toBe('ok')
    })
    it('returns empty for a step_start event', () => {
      const parser = asParser(new OpencodeCliProvider())
      expect(parser.extractEventText(JSON.parse(STEP_START_EVENT))).toBe('')
    })
    it('returns empty for an unknown event type', () => {
      const parser = asParser(new OpencodeCliProvider())
      expect(parser.extractEventText({ type: 'tool.use', sessionID: 'ses_abc', tool: 'read' })).toBe('')
    })
    it('returns empty for non-object inputs', () => {
      const parser = asParser(new OpencodeCliProvider())
      expect(parser.extractEventText(null)).toBe('')
      expect(parser.extractEventText('text')).toBe('')
      expect(parser.extractEventText(42)).toBe('')
    })
  })
  describe('parseJsonOutput', () => {
    it('concatenates text across multiple events, ignoring others', () => {
      const provider = new OpencodeCliProvider()
      const output = [
        STEP_START_EVENT,
        '{"type":"text","sessionID":"ses_abc","part":{"type":"text","text":"hello "}}',
        '{"type":"tool.use","sessionID":"ses_abc"}',
        '{"type":"text","sessionID":"ses_abc","part":{"type":"text","text":"world"}}',
      ].join('\n')
      expect(asParser(provider).parseJsonOutput(output)).toBe('hello world')
    })
    it('skips blank lines and malformed JSON', () => {
      const provider = new OpencodeCliProvider()
      const output = [
        '',
        'not valid json',
        TEXT_EVENT,
        '{ partial',
        '',
      ].join('\n')
      expect(asParser(provider).parseJsonOutput(output)).toBe('ok')
    })
    it('returns empty when no text events are present', () => {
      const provider = new OpencodeCliProvider()
      expect(asParser(provider).parseJsonOutput(STEP_START_EVENT)).toBe('')
    })
  })
  describe('session id capture', () => {
    it('does not capture sessionID when sessions are disabled', () => {
      const provider = new OpencodeCliProvider()
      asParser(provider).parseJsonOutput(TEXT_EVENT)
      expect(provider.sessionId).toBeUndefined()
    })
    it('captures sessionID from the first event after startSession', () => {
      const provider = new OpencodeCliProvider()
      provider.startSession('reviewer-1')
      expect(provider.sessionId).toBeUndefined() // not pre-generated
      asParser(provider).parseJsonOutput(STEP_START_EVENT)
      expect(provider.sessionId).toBe('ses_abc')
    })
    it('does not overwrite a captured sessionID with a later event', () => {
      const provider = new OpencodeCliProvider()
      provider.startSession('reviewer-1')
      asParser(provider).parseJsonOutput(STEP_START_EVENT)
      const laterEvent = '{"type":"text","sessionID":"ses_different","part":{"type":"text","text":"x"}}'
      asParser(provider).parseJsonOutput(laterEvent)
      expect(provider.sessionId).toBe('ses_abc')
    })
    it('clears sessionID on endSession', () => {
      const provider = new OpencodeCliProvider()
      provider.startSession('reviewer-1')
      asParser(provider).parseJsonOutput(TEXT_EVENT)
      expect(provider.sessionId).toBe('ses_abc')
      provider.endSession()
      expect(provider.sessionId).toBeUndefined()
    })
  })
})
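For orientation, the behaviour these tests pin down can be sketched as a small newline-delimited JSON parser. This is not the provider's actual source: `SketchParser`, `OpencodeEvent`, and `asRecord` are hypothetical names, and the event shape is inferred only from the two captured events above.

```typescript
// Hypothetical stand-in for the parser under test; not the real OpencodeCliProvider.
type OpencodeEvent = {
  type?: unknown
  sessionID?: unknown
  part?: { type?: unknown; text?: unknown }
}

// Narrow an arbitrary parsed value to an object we can inspect, or null.
function asRecord(value: unknown): OpencodeEvent | null {
  return typeof value === 'object' && value !== null ? (value as OpencodeEvent) : null
}

class SketchParser {
  sessionId: string | undefined
  private sessionsEnabled = false

  startSession(_name: string): void {
    this.sessionsEnabled = true
    this.sessionId = undefined // the id is captured from output, not pre-generated
  }

  endSession(): void {
    this.sessionsEnabled = false
    this.sessionId = undefined
  }

  // Only events of type "text" carrying a text part contribute output.
  extractEventText(event: unknown): string {
    const e = asRecord(event)
    if (e === null || e.type !== 'text') return ''
    const part = e.part
    if (!part || part.type !== 'text') return ''
    return typeof part.text === 'string' ? part.text : ''
  }

  // One JSON event per line; blank and malformed lines are skipped.
  parseJsonOutput(output: string): string {
    let text = ''
    for (const line of output.split('\n')) {
      const trimmed = line.trim()
      if (trimmed === '') continue
      let event: unknown
      try {
        event = JSON.parse(trimmed)
      } catch {
        continue
      }
      // Capture the first sessionID seen, and never overwrite it afterwards.
      const id = asRecord(event)?.sessionID
      if (this.sessionsEnabled && this.sessionId === undefined && typeof id === 'string') {
        this.sessionId = id
      }
      text += this.extractEventText(event)
    }
    return text
  }
}
```

Keeping `extractEventText` total (it returns `''` for anything unrecognised) is what lets `parseJsonOutput` ignore new event types without throwing, which is the failure mode the header comment warns about.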
+22 -1
@@ -1,5 +1,7 @@
import { describe, it, expect } from 'vitest'
import { existsSync } from 'fs'
import { existsSync, mkdtempSync, rmSync } from 'fs'
import { tmpdir } from 'os'
import { join, dirname } from 'path'
import { preparePromptForCli } from '../../src/utils/prompt-file.js'
describe('preparePromptForCli', () => {
@@ -23,4 +25,23 @@ describe('preparePromptForCli', () => {
    result.cleanup()
    expect(existsSync(tmpPath)).toBe(false)
  })
  it('writes the spilled prompt into the supplied tmpDir', () => {
    const customDir = mkdtempSync(join(tmpdir(), 'magpie-tmpdir-test-'))
    try {
      const largePrompt = 'y'.repeat(200 * 1024)
      const result = preparePromptForCli(largePrompt, { tmpDir: customDir })
      const pathMatch = result.prompt.match(/\/.*magpie_prompt_\S+/)
      expect(pathMatch).toBeTruthy()
      const tmpPath = pathMatch![0]
      expect(dirname(tmpPath)).toBe(customDir)
      expect(existsSync(tmpPath)).toBe(true)
      result.cleanup()
      expect(existsSync(tmpPath)).toBe(false)
    } finally {
      rmSync(customDir, { recursive: true, force: true })
    }
  })
})
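The contract exercised by the new test can be illustrated with a minimal sketch. It is an assumption-laden stand-in, not the code in `src/utils/prompt-file.js`: `preparePromptSketch`, the 100 KiB `SPILL_THRESHOLD_BYTES` cutoff, the `.txt` suffix, and the replacement prompt wording are all hypothetical. Only the `magpie_prompt_` file prefix, the `tmpDir` option, and the `prompt`/`cleanup` result fields come from the tests.

```typescript
import { randomUUID } from 'node:crypto'
import { existsSync, mkdtempSync, rmSync, writeFileSync } from 'node:fs'
import { tmpdir } from 'node:os'
import { dirname, join } from 'node:path'

// Assumed cutoff; the real threshold is not visible in this diff.
const SPILL_THRESHOLD_BYTES = 100 * 1024
const POINTER_PREFIX = 'Read the full prompt from '

interface PreparedPrompt {
  prompt: string
  cleanup: () => void
}

// Hypothetical sketch: small prompts pass through, large ones spill to a file.
function preparePromptSketch(prompt: string, opts: { tmpDir?: string } = {}): PreparedPrompt {
  if (Buffer.byteLength(prompt, 'utf8') <= SPILL_THRESHOLD_BYTES) {
    return { prompt, cleanup: () => {} }
  }
  const dir = opts.tmpDir ?? tmpdir() // fall back to the OS temp dir
  const path = join(dir, `magpie_prompt_${randomUUID()}.txt`)
  writeFileSync(path, prompt, 'utf8')
  return {
    prompt: `${POINTER_PREFIX}${path}`,
    cleanup: () => rmSync(path, { force: true }),
  }
}

// Usage: spill a 200 KiB prompt into a caller-owned directory, then clean up.
const customDir = mkdtempSync(join(tmpdir(), 'magpie-sketch-'))
const prepared = preparePromptSketch('y'.repeat(200 * 1024), { tmpDir: customDir })
const spilledPath = prepared.prompt.slice(POINTER_PREFIX.length)
const spilledDir = dirname(spilledPath)
const existedBeforeCleanup = existsSync(spilledPath)
prepared.cleanup()
const existsAfterCleanup = existsSync(spilledPath)
rmSync(customDir, { recursive: true, force: true })
```

Letting the caller supply the directory presumably matters for providers that run with restricted file access, since the spilled file then lands somewhere the agent is allowed to read.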