Human-in-the-Loop Guide

Human-in-the-Loop (HITL) ensures that high-stakes AI agent decisions are reviewed by human operators before execution. This guide covers the approval flow, confidence scoring, timeout behavior, escalation, and notification channels.

How HITL Works

  Agent produces output
         |
         v
  Confidence scoring
         |
    +----+----+
    |         |
  >= 0.85   < 0.85
    |         |
    v         v
  Auto-     Pause for
  approve   human review
              |
         +----+----+
         |         |
       Approve   Reject
         |         |
         v         v
       Continue  Retry or
       pipeline  terminate

An agent task completes and produces output.
A meta-model evaluates the output against the prompt and assigns a confidence score (0.0 to 1.0).
If confidence >= threshold, the task is auto-approved and the pipeline continues.
If confidence < threshold, the task is paused and a notification is sent to human reviewers.
A reviewer can approve the task (optionally with edits) or reject it (optionally with a retry). If no reviewer acts, the timeout and escalation settings determine what happens next.
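The flow above can be modeled as a small state machine. This is a minimal sketch, not DevTeam's implementation. The state and event names and the `gate` and `onReview` functions are hypothetical, and the default threshold of 0.85 is taken from the diagram.

```typescript
// Hypothetical model of the HITL lifecycle; state and event names are
// illustrative and not part of the DevTeam SDK.
type HitlState =
  | { kind: "auto-approved" }
  | { kind: "awaiting-review" }
  | { kind: "approved"; edited: boolean }
  | { kind: "rejected"; retry: boolean }
  | { kind: "timed-out" };

type ReviewEvent =
  | { type: "approve"; modifiedOutput?: string }
  | { type: "reject"; retry?: boolean }
  | { type: "timeout" };

// Gate a freshly scored output: at or above the threshold it is
// auto-approved, otherwise it pauses for human review.
function gate(confidence: number, threshold = 0.85): HitlState {
  return confidence >= threshold
    ? { kind: "auto-approved" }
    : { kind: "awaiting-review" };
}

// Apply a reviewer decision (or an expired timer) to a paused task.
// Events on any other state are ignored and the state is returned as-is.
function onReview(state: HitlState, event: ReviewEvent): HitlState {
  if (state.kind !== "awaiting-review") return state;
  if (event.type === "approve") {
    return { kind: "approved", edited: event.modifiedOutput !== undefined };
  }
  if (event.type === "reject") {
    return { kind: "rejected", retry: event.retry ?? false };
  }
  // Timed out: the timeout and escalation settings decide what follows.
  return { kind: "timed-out" };
}
```

Keeping the gate separate from review handling mirrors the diagram: scoring happens once, while review events only apply to paused tasks.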

Confidence Scoring

The confidence score reflects the meta-model's assessment of output quality across four dimensions:

Dimension	Weight	What It Measures
Factual Consistency	30%	Are claims internally consistent and supported?
Prompt Adherence	25%	Does the output address all parts of the prompt?
Completeness	25%	Is the response thorough or does it miss aspects?
Formatting	20%	Does the output match the requested format?
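One plausible reading of the weights is a weighted sum of per-dimension scores. The sketch below assumes each dimension is scored on the same 0.0 to 1.0 scale and combined linearly; this guide does not state how the meta-model aggregates them, so `DimensionScores` and `combineConfidence` are hypothetical.

```typescript
// Hypothetical per-dimension scores, each assumed to be in [0, 1].
interface DimensionScores {
  factualConsistency: number;
  promptAdherence: number;
  completeness: number;
  formatting: number;
}

// Weights from the table above; they sum to 1.0.
const WEIGHTS: DimensionScores = {
  factualConsistency: 0.3,
  promptAdherence: 0.25,
  completeness: 0.25,
  formatting: 0.2,
};

// Weighted sum of the four dimensions, clamped to [0, 1].
function combineConfidence(scores: DimensionScores): number {
  const keys = Object.keys(WEIGHTS) as (keyof DimensionScores)[];
  const total = keys.reduce((sum, k) => sum + WEIGHTS[k] * scores[k], 0);
  return Math.min(1, Math.max(0, total));
}

// Example: a weak factual-consistency score pulls the total down even
// when formatting is perfect.
const score = combineConfidence({
  factualConsistency: 0.6,
  promptAdherence: 0.9,
  completeness: 0.8,
  formatting: 1.0,
});
```

With these example inputs the combined score is about 0.805, below the default 0.85 threshold, so the output would be routed to a reviewer.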

Score Interpretation

Range	Meaning	Typical Action
0.95 - 1.00	Very high confidence	Auto-approve
0.85 - 0.94	High confidence	Auto-approve (0.85 is the default threshold)
0.70 - 0.84	Moderate confidence	Route to reviewer
0.50 - 0.69	Low confidence	Route to senior reviewer
0.00 - 0.49	Very low confidence	Route to senior reviewer with priority flag
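The table can be read as a cascade of lower bounds. The sketch below assumes how the bands compose with a custom `confidenceThreshold`: anything at or above the threshold is auto-approved and the remaining bands apply unchanged. `routeByConfidence` and the route names are hypothetical.

```typescript
// Hypothetical routing helper mirroring the "Score Interpretation" table.
type Route =
  | "auto-approve"
  | "reviewer"
  | "senior-reviewer"
  | "senior-reviewer-priority";

// Bands are treated as half-open intervals so that values such as 0.945,
// which fall between the table's printed ranges, still get a route.
function routeByConfidence(confidence: number, threshold = 0.85): Route {
  if (Number.isNaN(confidence) || confidence < 0 || confidence > 1) {
    throw new RangeError(`confidence must be in [0, 1], got ${confidence}`);
  }
  if (confidence >= threshold) return "auto-approve";
  if (confidence >= 0.7) return "reviewer";
  if (confidence >= 0.5) return "senior-reviewer";
  return "senior-reviewer-priority";
}
```

Raising the threshold to 0.90, as in the task-level example below, would send a 0.87 score to a reviewer instead of auto-approving it.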

Confidence scoring is a triage signal, not a correctness guarantee. A high confidence score means the output appears well-structured and prompt-adherent, not that it is factually correct.

Configuration

Task-Level HITL

const task = await client.createTask({
  prompt: 'Analyze this contract for liability exposure...',
  model: 'opus',
  hitl: {
    required: true,                // Always require review
    confidenceThreshold: 0.90,     // Override default threshold
    timeoutMs: 7_200_000,          // 2 hours before escalation
  },
});
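Timeouts are given in milliseconds, so values such as two hours (7,200,000 ms) are easy to mistype. A minimal sketch of readable duration helpers; `minutes` and `hours` are hypothetical local helpers, not part of the DevTeam SDK.

```typescript
// Hypothetical helpers for writing HITL durations; not part of the SDK.
const MS_PER_MINUTE = 60_000;

function minutes(n: number): number {
  return n * MS_PER_MINUTE;
}

function hours(n: number): number {
  return minutes(n * 60);
}

// Same values as the task-level example above, written readably.
const taskHitl = {
  required: true,
  confidenceThreshold: 0.9,
  timeoutMs: hours(2), // 7_200_000 ms
};
```

The same helpers work anywhere the examples use a millisecond literal, for example `escalation: { afterMs: minutes(30), to: '...' }`.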

Plan-Level HITL

const execution = await client.executePlan(planId, {
  input: { contract: contractText },
  hitl: {
    requiredFor: ['analyze-risk', 'generate-report'],
    confidenceThreshold: 0.85,
    timeoutMs: 3_600_000,
    notifyChannels: ['email', 'dashboard', 'slack'],
    escalation: {
      afterMs: 1_800_000,         // Escalate after 30 minutes
      to: 'senior-legal-team',    // Escalation target
    },
  },
});

Notification Channels

When a task requires approval, notifications are sent through configured channels:

Channel	Configuration	Latency
Dashboard	Always enabled	Real-time (WebSocket)
Email	User's registered email	1-2 minutes
Slack	Webhook URL in org settings	Real-time
Webhook	Custom URL	Real-time
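Because the dashboard is always enabled, the effective channel list is `dashboard` plus whatever `notifyChannels` lists. A minimal sketch of that rule; `Channel` and `resolveChannels` are hypothetical names.

```typescript
// Hypothetical channel names matching the table above.
type Channel = "dashboard" | "email" | "slack" | "webhook";

// The dashboard is always enabled, so it is added even when not listed.
// Duplicates are dropped while preserving the configured order.
function resolveChannels(configured: Channel[]): Channel[] {
  return Array.from(new Set<Channel>(["dashboard", ...configured]));
}
```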

Slack Integration

# Configure Slack webhook in org settings
devteam config set slack-webhook https://hooks.slack.com/services/T.../B.../xxx
 
# Or configure HITL review and notifications per deployment
devteam templates deploy contract-review-v1 \
  --input contract=@./contract.pdf \
  --hitl analyze-risk \
  --hitl-notify slack,email

Custom Webhook

const execution = await client.executePlan(planId, {
  hitl: {
    requiredFor: ['critical-step'],
    notifyChannels: ['webhook'],
    webhookUrl: 'https://your-app.com/api/devteam/approvals',
    webhookSecret: process.env.WEBHOOK_SECRET,
  },
});

Webhook payload:

{
  "event": "hitl.pending",
  "taskId": "dt_task_abc123",
  "planExecutionId": "dt_exec_xyz789",
  "stepId": "critical-step",
  "confidence": 0.72,
  "summary": "Risk analysis identified 3 high-severity issues in sections 4.2, 5.1, and 8.3",
  "output": "...",
  "reviewUrl": "https://devteam.marsala.dev/approvals/dt_task_abc123",
  "timestamp": "2026-02-20T10:00:00Z"
}
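A receiving endpoint has to treat the body as untrusted input. The sketch below narrows a raw request body to the payload shape shown above. Field names come from the example; the `HitlPendingEvent` type, the `parseHitlPending` function, and the choice to treat every field as required are assumptions. Signature verification with `webhookSecret` is not shown because this guide does not describe the signing scheme.

```typescript
// Shape of the hitl.pending payload shown above (assumed all required).
interface HitlPendingEvent {
  event: "hitl.pending";
  taskId: string;
  planExecutionId: string;
  stepId: string;
  confidence: number;
  summary: string;
  output: string;
  reviewUrl: string;
  timestamp: string;
}

const STRING_FIELDS = [
  "taskId", "planExecutionId", "stepId", "summary", "output", "reviewUrl", "timestamp",
];

// Parse and validate a raw JSON body; returns null if it does not match.
function parseHitlPending(body: string): HitlPendingEvent | null {
  let data: unknown;
  try {
    data = JSON.parse(body);
  } catch {
    return null; // not JSON at all
  }
  if (typeof data !== "object" || data === null) return null;
  const d = data as Record<string, unknown>;
  if (d.event !== "hitl.pending") return null;
  if (!STRING_FIELDS.every((f) => typeof d[f] === "string")) return null;
  const confidence = d.confidence;
  if (typeof confidence !== "number" || confidence < 0 || confidence > 1) return null;
  if (Number.isNaN(Date.parse(d.timestamp as string))) return null;
  return d as unknown as HitlPendingEvent;
}
```

Once parsed, a receiver might, for example, use `confidence` with the Score Interpretation bands to choose which internal queue shows the `reviewUrl`.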

Timeout and Escalation

What happens to a task that no one reviews depends on its configuration:

Default behavior: With no timeoutMs set, the task remains paused indefinitely.
With timeout: After timeoutMs elapses, the task is auto-rejected, or escalated if escalation is configured.
With escalation: After escalation.afterMs elapses, the task is reassigned to the escalation target.

  Task awaiting approval
         |
    timeout (2h)
         |
    +----+----+
    |         |
  No escal. Escalation configured
    |         |
    v         v
  Auto-     Reassign to
  reject    escalation target
              |
         timeout (4h)
              |
              v
          Auto-reject

Escalation Chain

const execution = await client.executePlan(planId, {
  hitl: {
    requiredFor: ['analyze-risk'],
    confidenceThreshold: 0.85,
    escalation: {
      afterMs: 1_800_000,      // 30 min: escalate to team lead
      to: 'legal-team-lead',
      secondary: {
        afterMs: 3_600_000,    // 1 hr: escalate to department head
        to: 'legal-dept-head',
      },
      finalTimeout: 7_200_000, // 2 hr: apply finalAction (auto-reject here)
      finalAction: 'reject',   // 'reject' | 'approve' | 'skip'
    },
  },
});
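The escalation chain can be read as a lookup on elapsed time. This sketch assumes each `afterMs` and `finalTimeout` is measured from the moment the task first paused (the comments in the example suggest this, but the guide does not say so explicitly). `resolveEscalation`, the `originalReviewer` parameter, and the fallback to `'reject'` when `finalAction` is omitted are hypothetical.

```typescript
// Hypothetical types mirroring the escalation block above.
type FinalAction = "reject" | "approve" | "skip";

interface EscalationStage {
  afterMs: number;
  to: string;
}

interface EscalationChain extends EscalationStage {
  secondary?: EscalationStage;
  finalTimeout?: number;
  finalAction?: FinalAction;
}

type Outcome =
  | { kind: "assigned"; to: string }
  | { kind: "final"; action: FinalAction };

// Decide where a paused task stands after `elapsedMs`, checking the
// latest stage first so the most advanced applicable stage wins.
function resolveEscalation(
  chain: EscalationChain,
  originalReviewer: string,
  elapsedMs: number,
): Outcome {
  if (chain.finalTimeout !== undefined && elapsedMs >= chain.finalTimeout) {
    // Assumed fallback: reject when no finalAction is configured.
    return { kind: "final", action: chain.finalAction ?? "reject" };
  }
  if (chain.secondary && elapsedMs >= chain.secondary.afterMs) {
    return { kind: "assigned", to: chain.secondary.to };
  }
  if (elapsedMs >= chain.afterMs) {
    return { kind: "assigned", to: chain.to };
  }
  return { kind: "assigned", to: originalReviewer };
}
```

Checking the latest stage first keeps the function correct even if a scheduler only evaluates the task occasionally, for example once at 90 minutes rather than at every threshold.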

Reviewer Assignment

Automatic Assignment

Tasks are assigned based on:

Reviewer availability (not at capacity)
Tag-based routing (legal tasks to legal reviewers)
Queue-based routing (specific reviewers per queue)
Round-robin within a reviewer group
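The rules above can be combined in several ways. This sketch assumes tag matching and capacity act as filters and round-robin picks among the reviewers that remain. `Reviewer`, `ReviewerGroup`, and the `openTasks`/`capacity` fields are hypothetical, and queue-based routing is left out for brevity.

```typescript
// Hypothetical reviewer record; fields are illustrative.
interface Reviewer {
  id: string;
  tags: string[];
  openTasks: number;
  capacity: number;
}

// Round-robin over reviewers that carry all of the task's tags and are
// below capacity. The cursor advances past each pick so successive
// calls rotate through the group.
class ReviewerGroup {
  private cursor = 0;
  private readonly reviewers: Reviewer[];

  constructor(reviewers: Reviewer[]) {
    this.reviewers = reviewers;
  }

  assign(taskTags: string[]): Reviewer | null {
    const n = this.reviewers.length;
    for (let i = 0; i < n; i++) {
      const idx = (this.cursor + i) % n;
      const r = this.reviewers[idx];
      const matchesTags = taskTags.every((t) => r.tags.includes(t));
      if (matchesTags && r.openTasks < r.capacity) {
        this.cursor = (idx + 1) % n;
        r.openTasks += 1; // mutates the record to track load
        return r;
      }
    }
    return null; // nobody eligible: leave the task in the queue
  }
}
```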

Manual Assignment

// Assign a specific reviewer when creating the task
const task = await client.createTask({
  prompt: '...',
  hitl: {
    required: true,
    assignTo: 'reviewer_jane',
  },
});

Approval with Modifications

Reviewers can approve with edits to the agent's output:

await client.approveTask('dt_task_abc123', {
  comment: 'Good analysis but Section 4.2 risk was understated.',
  modifiedOutput: 'Corrected output with Section 4.2 risk elevated to HIGH...',
  reviewerId: 'reviewer_jane',
});

The modified output replaces the agent's original output in the pipeline. Downstream steps receive the reviewer's version.
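The replacement rule is simple enough to state as code. In this sketch, `Approval` mirrors the fields passed to `approveTask` above, while `StepOutput`, its `source` tag, and `outputForDownstream` are hypothetical; tagging the source is one way to keep the substitution visible to downstream steps.

```typescript
// Fields mirror the approveTask example above.
interface Approval {
  comment?: string;
  modifiedOutput?: string;
  reviewerId: string;
}

// Hypothetical record of a step's output as seen by downstream steps.
interface StepOutput {
  text: string;
  source: "agent" | "reviewer";
}

// If the reviewer supplied modifiedOutput, it replaces the agent's output;
// otherwise the agent's output passes through unchanged.
function outputForDownstream(agentOutput: string, approval: Approval): StepOutput {
  return approval.modifiedOutput !== undefined
    ? { text: approval.modifiedOutput, source: "reviewer" }
    : { text: agentOutput, source: "agent" };
}
```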

Rejection with Retry

When a reviewer rejects a task, they can optionally trigger a retry with additional guidance:

await client.rejectTask('dt_task_abc123', {
  reason: 'Missed indemnification clause in Section 7.',
  retry: true,
  retryPrompt: `Re-analyze the contract with special attention to:
    1. Section 7 indemnification terms
    2. Liability caps in Section 5.2
    Previous analysis missed critical exposure.`,
  reviewerId: 'reviewer_jane',
});

The task is re-queued with the enhanced prompt and the rejection feedback appended as context.
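The guide says the rejection feedback is appended as context but does not specify the layout. This sketch assumes `retryPrompt` replaces the original prompt when present and that the reviewer's `reason` is appended under a short heading; `Rejection` mirrors the `rejectTask` fields above, and `buildRetryPrompt` and the heading text are hypothetical.

```typescript
// Fields mirror the rejectTask example above.
interface Rejection {
  reason: string;
  retry: boolean;
  retryPrompt?: string;
  reviewerId: string;
}

// Build the prompt for the re-queued task, or null if no retry was requested.
function buildRetryPrompt(originalPrompt: string, rejection: Rejection): string | null {
  if (!rejection.retry) return null; // rejected without retry: nothing to re-queue
  const prompt = rejection.retryPrompt ?? originalPrompt;
  return [
    prompt,
    "",
    "Reviewer feedback from previous attempt:", // assumed heading text
    rejection.reason,
  ].join("\n");
}
```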

Dashboard Integration

The DevTeam Dashboard provides a dedicated approval interface:

Real-time queue of pending approvals
Side-by-side view of prompt and output
Inline editing for modifications
One-click approve/reject with optional comments
Confidence score visualization
Approval history and audit trail

Audit Trail

All HITL actions are logged for compliance:

const history = await client.getApprovalHistory('dt_task_abc123');
// [
//   { action: 'pending', timestamp: '...', confidence: 0.72 },
//   { action: 'escalated', timestamp: '...', from: 'reviewer_bob', to: 'reviewer_jane' },
//   { action: 'approved', timestamp: '...', reviewer: 'reviewer_jane', comment: '...' },
// ]
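Because the history is an ordered list of typed entries, summarizing it for a compliance report is straightforward. The entry shapes below are inferred from the example output; the `rejected` variant, `HistorySummary`, and `summarizeHistory` are assumptions.

```typescript
// Entry shapes inferred from the example above. The "rejected" variant is
// an assumption; the example history only shows pending, escalated, approved.
type ApprovalEvent =
  | { action: "pending"; timestamp: string; confidence: number }
  | { action: "escalated"; timestamp: string; from: string; to: string }
  | { action: "approved"; timestamp: string; reviewer: string; comment?: string }
  | { action: "rejected"; timestamp: string; reviewer: string; comment?: string };

interface HistorySummary {
  escalations: number;
  // Most recent approve/reject decision, or null if still pending.
  decision: { action: "approved" | "rejected"; reviewer: string } | null;
}

// Count escalations and keep the latest decision, scanning in log order.
function summarizeHistory(history: ApprovalEvent[]): HistorySummary {
  const summary: HistorySummary = { escalations: 0, decision: null };
  for (const entry of history) {
    if (entry.action === "escalated") {
      summary.escalations += 1;
    } else if (entry.action === "approved" || entry.action === "rejected") {
      summary.decision = { action: entry.action, reviewer: entry.reviewer };
    }
  }
  return summary;
}
```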

Next Steps

SDK HITL Reference -- API methods for HITL management
DAG Workflows -- Combining HITL with DAG pipelines
