Human-in-the-Loop Guide
Human-in-the-Loop (HITL) ensures that high-stakes AI agent decisions are reviewed by human operators before execution. This guide covers the approval flow, confidence scoring, timeout behavior, escalation, and notification channels.
How HITL Works
Agent produces output
          |
          v
  Confidence scoring
          |
     +----+----+
     |         |
  >= 0.85    < 0.85
     |         |
     v         v
   Auto-    Pause for
  approve   human review
               |
          +----+----+
          |         |
       Approve    Reject
          |         |
          v         v
      Continue   Retry or
      pipeline   terminate
- An agent task completes and produces output.
- A meta-model evaluates the output against the prompt and assigns a confidence score (0.0 to 1.0).
- If confidence >= threshold, the task is auto-approved and the pipeline continues.
- If confidence < threshold, the task is paused and a notification is sent to human reviewers.
- A reviewer can approve (optionally with edits), reject (optionally with retry), or let the review time out.
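The steps above can be read as a small decision function. The following is a minimal sketch, not the platform's implementation: the type names and `nextAction` are hypothetical, and timeout handling is left out because it is covered in the Timeout and Escalation section.

```typescript
// Hypothetical types for illustration; none of these names come from the SDK.
type ReviewOutcome =
  | { kind: 'approve'; modifiedOutput?: string }
  | { kind: 'reject'; retry: boolean };

type PipelineAction = 'continue' | 'await-review' | 'retry' | 'terminate';

// Decide what the pipeline does next for one completed agent task.
function nextAction(
  confidence: number,
  threshold: number,
  review?: ReviewOutcome,
): PipelineAction {
  // At or above the threshold, the task is auto-approved.
  if (confidence >= threshold) return 'continue';
  // Below the threshold with no decision yet, the task stays paused.
  if (review === undefined) return 'await-review';
  // An approval (with or without edits) resumes the pipeline.
  if (review.kind === 'approve') return 'continue';
  // A rejection either re-queues the task or ends it.
  return review.retry ? 'retry' : 'terminate';
}

// Usage: a 0.72 score against the default 0.85 threshold waits for a reviewer.
console.log(nextAction(0.72, 0.85));
```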
Confidence Scoring
The confidence score reflects the meta-model's assessment of output quality across four dimensions:
| Dimension | Weight | What It Measures |
|---|---|---|
| Factual Consistency | 30% | Are claims internally consistent and supported? |
| Prompt Adherence | 25% | Does the output address all parts of the prompt? |
| Completeness | 25% | Is the response thorough or does it miss aspects? |
| Formatting | 20% | Does the output match the requested format? |
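To make the weights concrete, here is a worked example. It assumes the overall score is a plain weighted sum of per-dimension scores in the range 0.0 to 1.0; this guide lists the weights but does not state the formula, so treat `combineScores` and the `DimensionScores` shape as hypothetical.

```typescript
// Assumption: the overall score is a weighted sum of per-dimension scores.
interface DimensionScores {
  factualConsistency: number;
  promptAdherence: number;
  completeness: number;
  formatting: number;
}

// Weights taken from the table above.
const WEIGHTS: DimensionScores = {
  factualConsistency: 0.30,
  promptAdherence: 0.25,
  completeness: 0.25,
  formatting: 0.20,
};

function combineScores(scores: DimensionScores): number {
  let total = 0;
  for (const key of Object.keys(WEIGHTS) as (keyof DimensionScores)[]) {
    total += WEIGHTS[key] * scores[key];
  }
  return total;
}

// 0.30*0.9 + 0.25*0.8 + 0.25*0.7 + 0.20*1.0
const example = combineScores({
  factualConsistency: 0.9,
  promptAdherence: 0.8,
  completeness: 0.7,
  formatting: 1.0,
});
console.log(example.toFixed(3));
```

Under this assumption the example lands at roughly 0.845, just below the default 0.85 threshold, so a perfectly formatted output with middling completeness would still be routed to a reviewer.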
Score Interpretation
| Range | Meaning | Typical Action |
|---|---|---|
| 0.95 - 1.00 | Very high confidence | Auto-approve |
| 0.85 - 0.94 | High confidence | Auto-approve (default threshold) |
| 0.70 - 0.84 | Moderate confidence | Route to reviewer |
| 0.50 - 0.69 | Low confidence | Route to senior reviewer |
| 0.00 - 0.49 | Very low confidence | Route to senior reviewer with priority flag |
Confidence scoring is a triage signal, not a correctness guarantee. A high confidence score means the output appears well-structured and prompt-adherent, not that it is factually correct.
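The interpretation table can be expressed as a lookup. This sketch hard-codes the default bands from the table; `routeForScore` and the `ReviewRoute` type are hypothetical names, and a custom `confidenceThreshold` would shift the auto-approve boundary.

```typescript
// Hypothetical routing result; the bands mirror the Score Interpretation table.
type ReviewRoute =
  | { action: 'auto-approve' }
  | { action: 'review'; reviewerTier: 'standard' | 'senior'; priority: boolean };

function routeForScore(score: number): ReviewRoute {
  if (Number.isNaN(score) || score < 0 || score > 1) {
    throw new RangeError(`score must be between 0 and 1, got ${score}`);
  }
  if (score >= 0.85) return { action: 'auto-approve' };
  if (score >= 0.70) return { action: 'review', reviewerTier: 'standard', priority: false };
  if (score >= 0.50) return { action: 'review', reviewerTier: 'senior', priority: false };
  // Very low confidence: senior reviewer with the priority flag set.
  return { action: 'review', reviewerTier: 'senior', priority: true };
}

console.log(routeForScore(0.72));
```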
Configuration
Task-Level HITL
const task = await client.createTask({
  prompt: 'Analyze this contract for liability exposure...',
  model: 'opus',
  hitl: {
    required: true,            // Always require review
    confidenceThreshold: 0.90, // Override default threshold
    timeoutMs: 7200_000,       // 2 hours before escalation
  },
});
Plan-Level HITL
const execution = await client.executePlan(planId, {
  input: { contract: contractText },
  hitl: {
    requiredFor: ['analyze-risk', 'generate-report'],
    confidenceThreshold: 0.85,
    timeoutMs: 3600_000,
    notifyChannels: ['email', 'dashboard', 'slack'],
    escalation: {
      afterMs: 1800_000,        // Escalate after 30 minutes
      to: 'senior-legal-team',  // Escalation target
    },
  },
});
Notification Channels
When a task requires approval, notifications are sent through configured channels:
| Channel | Configuration | Latency |
|---|---|---|
| Dashboard | Always enabled | Real-time (WebSocket) |
| Email | User's registered email | 1-2 minutes |
| Slack | Webhook URL in org settings | Real-time |
| Webhook | Custom URL | Real-time |
Slack Integration
# Configure Slack webhook in org settings
devteam config set slack-webhook https://hooks.slack.com/services/T.../B.../xxx
# Or per deployment
devteam templates deploy contract-review-v1 \
  --input contract=@./contract.pdf \
  --hitl analyze-risk \
  --hitl-notify slack,email
Custom Webhook
const execution = await client.executePlan(planId, {
  hitl: {
    requiredFor: ['critical-step'],
    notifyChannels: ['webhook'],
    webhookUrl: 'https://your-app.com/api/devteam/approvals',
    webhookSecret: process.env.WEBHOOK_SECRET,
  },
});
Webhook payload:
{
  "event": "hitl.pending",
  "taskId": "dt_task_abc123",
  "planExecutionId": "dt_exec_xyz789",
  "stepId": "critical-step",
  "confidence": 0.72,
  "summary": "Risk analysis identified 3 high-severity issues in sections 4.2, 5.1, and 8.3",
  "output": "...",
  "reviewUrl": "https://devteam.marsala.dev/approvals/dt_task_abc123",
  "timestamp": "2026-02-20T10:00:00Z"
}
Timeout and Escalation
If a task is not reviewed within the configured timeout:
- Default behavior: The task remains paused indefinitely (no timeout).
- With timeout: After timeoutMs, the task is auto-rejected or escalated.
- With escalation: After escalation.afterMs, the task is reassigned to the escalation target.
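The three cases above can be sketched as a pure function of elapsed time. This is one possible reading, not the platform's documented behavior: it assumes `escalation.afterMs` triggers reassignment and `timeoutMs` acts as the final deadline, both measured from the moment the task starts waiting. `resolvePendingState` and its types are hypothetical.

```typescript
// Hypothetical subset of the hitl options shown in this guide.
interface HitlTiming {
  timeoutMs?: number;
  escalation?: { afterMs: number; to: string };
}

type PendingState =
  | { state: 'pending' }
  | { state: 'escalated'; to: string }
  | { state: 'auto-rejected' };

// Assumption: both durations are measured from when the task began waiting.
function resolvePendingState(elapsedMs: number, cfg: HitlTiming): PendingState {
  // With a timeout configured, passing it ends the review.
  if (cfg.timeoutMs !== undefined && elapsedMs >= cfg.timeoutMs) {
    return { state: 'auto-rejected' };
  }
  // With escalation configured, passing afterMs reassigns the task.
  if (cfg.escalation !== undefined && elapsedMs >= cfg.escalation.afterMs) {
    return { state: 'escalated', to: cfg.escalation.to };
  }
  // Default: no timeout, so the task simply stays paused.
  return { state: 'pending' };
}

// Usage with the plan-level values from earlier in this guide, 45 minutes in.
const planTiming: HitlTiming = {
  timeoutMs: 3600_000,
  escalation: { afterMs: 1800_000, to: 'senior-legal-team' },
};
console.log(resolvePendingState(45 * 60_000, planTiming));
```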
Task awaiting approval
          |
     timeout (2h)
          |
     +----+----+
     |         |
 No escal.  Escalation configured
     |         |
     v         v
   Auto-    Reassign to
  reject    escalation target
               |
          timeout (4h)
               |
               v
          Auto-reject
Escalation Chain
hitl: {
  requiredFor: ['analyze-risk'],
  confidenceThreshold: 0.85,
  escalation: {
    afterMs: 1800_000,        // 30 min: escalate to team lead
    to: 'legal-team-lead',
    secondary: {
      afterMs: 3600_000,      // 1 hr: escalate to department head
      to: 'legal-dept-head',
    },
    finalTimeout: 7200_000,   // 2 hr: auto-reject
    finalAction: 'reject',    // 'reject' | 'approve' | 'skip'
  },
},
Reviewer Assignment
Automatic Assignment
Tasks are assigned based on:
- Reviewer availability (not at capacity)
- Tag-based routing (legal tasks to legal reviewers)
- Queue-based routing (specific reviewers per queue)
- Round-robin within a reviewer group
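As an illustration of how these criteria might combine, the sketch below filters by tag and capacity, then rotates through the group. It is not the platform's scheduler: `ReviewerPool`, the `Reviewer` fields, and the capacity model are all hypothetical, and queue-based routing is omitted.

```typescript
// Hypothetical reviewer record; field names are illustrative only.
interface Reviewer {
  id: string;
  tags: string[];
  activeReviews: number;
  capacity: number;
}

class ReviewerPool {
  private cursor = 0; // index where the next round-robin scan starts
  private readonly reviewers: Reviewer[];

  constructor(reviewers: Reviewer[]) {
    this.reviewers = reviewers;
  }

  // Return the next reviewer who matches the tag and has spare capacity.
  assign(taskTag: string): Reviewer | undefined {
    const n = this.reviewers.length;
    for (let i = 0; i < n; i++) {
      const idx = (this.cursor + i) % n;
      const candidate = this.reviewers[idx];
      const matchesTag = candidate.tags.includes(taskTag);
      const hasCapacity = candidate.activeReviews < candidate.capacity;
      if (matchesTag && hasCapacity) {
        candidate.activeReviews += 1;
        this.cursor = (idx + 1) % n; // continue after this reviewer next time
        return candidate;
      }
    }
    return undefined; // nobody eligible; the task would wait or escalate
  }
}

const pool = new ReviewerPool([
  { id: 'reviewer_jane', tags: ['legal'], activeReviews: 0, capacity: 1 },
  { id: 'reviewer_bob', tags: ['legal'], activeReviews: 0, capacity: 2 },
  { id: 'reviewer_amy', tags: ['finance'], activeReviews: 0, capacity: 2 },
]);
console.log(pool.assign('legal')?.id);
```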
Manual Assignment
// Assign a specific reviewer when creating the task
const task = await client.createTask({
  prompt: '...',
  hitl: {
    required: true,
    assignTo: 'reviewer_jane',
  },
});
Approval with Modifications
Reviewers can approve with edits to the agent's output:
await client.approveTask('dt_task_abc123', {
  comment: 'Good analysis but Section 4.2 risk was understated.',
  modifiedOutput: 'Corrected output with Section 4.2 risk elevated to HIGH...',
  reviewerId: 'reviewer_jane',
});
The modified output replaces the agent's original output in the pipeline. Downstream steps receive the reviewer's version.
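The replacement rule is simple enough to state in code. The sketch below uses a hypothetical `Approval` shape that mirrors the fields passed to `approveTask` above; `outputForDownstream` is an illustrative helper, not an SDK function.

```typescript
// Hypothetical shape mirroring the approveTask options shown above.
interface Approval {
  reviewerId: string;
  comment?: string;
  modifiedOutput?: string;
}

// Downstream steps see the reviewer's edit when present, else the agent's output.
function outputForDownstream(agentOutput: string, approval: Approval): string {
  return approval.modifiedOutput ?? agentOutput;
}

console.log(
  outputForDownstream('Section 4.2 risk: MEDIUM', {
    reviewerId: 'reviewer_jane',
    modifiedOutput: 'Section 4.2 risk: HIGH',
  }),
);
```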
Rejection with Retry
When a reviewer rejects a task, they can optionally trigger a retry with additional guidance:
await client.rejectTask('dt_task_abc123', {
  reason: 'Missed indemnification clause in Section 7.',
  retry: true,
  retryPrompt: `Re-analyze the contract with special attention to:
1. Section 7 indemnification terms
2. Liability caps in Section 5.2
Previous analysis missed critical exposure.`,
  reviewerId: 'reviewer_jane',
});
The task is re-queued with the enhanced prompt and the rejection feedback appended as context.
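One way to picture the re-queued prompt is sketched below. The exact format the platform uses to append feedback is not specified in this guide, so the layout, the fallback to the original prompt, and the `buildRetryPrompt` helper are all assumptions.

```typescript
// Hypothetical shape mirroring the rejectTask options shown above.
interface Rejection {
  reason: string;
  retry: boolean;
  retryPrompt?: string;
  reviewerId: string;
}

// Returns the prompt for the retried task, or null when no retry was requested.
function buildRetryPrompt(originalPrompt: string, rejection: Rejection): string | null {
  if (!rejection.retry) return null;
  // Assumption: fall back to the original prompt if no retryPrompt is given.
  const base = rejection.retryPrompt ?? originalPrompt;
  // Assumption: feedback is appended as a trailing context paragraph.
  return `${base}\n\nReviewer feedback (${rejection.reviewerId}): ${rejection.reason}`;
}

console.log(
  buildRetryPrompt('Analyze this contract for liability exposure...', {
    reason: 'Missed indemnification clause in Section 7.',
    retry: true,
    retryPrompt: 'Re-analyze the contract with special attention to Section 7.',
    reviewerId: 'reviewer_jane',
  }),
);
```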
Dashboard Integration
The DevTeam Dashboard provides a dedicated approval interface:
- Real-time queue of pending approvals
- Side-by-side view of prompt and output
- Inline editing for modifications
- One-click approve/reject with optional comments
- Confidence score visualization
- Approval history and audit trail
Audit Trail
All HITL actions are logged for compliance:
const history = await client.getApprovalHistory('dt_task_abc123');
// [
//   { action: 'pending', timestamp: '...', confidence: 0.72 },
//   { action: 'escalated', timestamp: '...', from: 'reviewer_bob', to: 'reviewer_jane' },
//   { action: 'approved', timestamp: '...', reviewer: 'reviewer_jane', comment: '...' },
// ]
Next Steps
- SDK HITL Reference -- API methods for HITL management
- DAG Workflows -- Combining HITL with DAG pipelines