Guides
Human-in-the-Loop

Human-in-the-Loop Guide

Human-in-the-Loop (HITL) ensures that high-stakes AI agent decisions are reviewed by human operators before execution. This guide covers the approval flow, confidence scoring, timeout behavior, escalation, and notification channels.

How HITL Works

  Agent produces output
         |
         v
  Confidence scoring
         |
    +----+----+
    |         |
  >= 0.85   < 0.85
    |         |
    v         v
  Auto-     Pause for
  approve   human review
              |
         +----+----+
         |         |
       Approve   Reject
         |         |
         v         v
       Continue  Retry or
       pipeline  terminate

  1. An agent task completes and produces output.
  2. A meta-model evaluates the output against the prompt and assigns a confidence score (0.0 to 1.0).
  3. If confidence >= threshold, the task is auto-approved and the pipeline continues.
  4. If confidence < threshold, the task is paused and a notification is sent to human reviewers.
  5. A reviewer can approve (optionally with edits), reject (optionally with retry), or let it time out.
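The threshold decision in steps 3 and 4 can be sketched as a simple routing function (illustrative only; the real decision is made server-side, and the names here are not SDK APIs):

```typescript
type Route = 'auto-approve' | 'human-review';

// Step 3: at or above the threshold, the pipeline continues unattended.
// Step 4: below the threshold, the task pauses and reviewers are notified.
function routeTask(confidence: number, threshold = 0.85): Route {
  return confidence >= threshold ? 'auto-approve' : 'human-review';
}
```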

Confidence Scoring

The confidence score reflects the meta-model's assessment of output quality across four dimensions:

Dimension            Weight  What It Measures
Factual Consistency  30%     Are claims internally consistent and supported?
Prompt Adherence     25%     Does the output address all parts of the prompt?
Completeness         25%     Is the response thorough or does it miss aspects?
Formatting           20%     Does the output match the requested format?
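Assuming the overall score is the weighted sum of the four per-dimension scores (each 0.0 to 1.0) using the weights above — an assumption for illustration; the actual aggregation is internal to the meta-model:

```typescript
const WEIGHTS = {
  factualConsistency: 0.30,
  promptAdherence: 0.25,
  completeness: 0.25,
  formatting: 0.20,
} as const;

type DimensionScores = Record<keyof typeof WEIGHTS, number>;

// Weighted sum of per-dimension scores; result is in [0.0, 1.0].
function overallConfidence(scores: DimensionScores): number {
  return (Object.keys(WEIGHTS) as (keyof typeof WEIGHTS)[]).reduce(
    (sum, dim) => sum + WEIGHTS[dim] * scores[dim],
    0,
  );
}
```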

Score Interpretation

Range        Meaning               Typical Action
0.95 - 1.00  Very high confidence  Auto-approve
0.85 - 0.94  High confidence       Auto-approve (default threshold)
0.70 - 0.84  Moderate confidence   Route to reviewer
0.50 - 0.69  Low confidence        Route to senior reviewer
0.00 - 0.49  Very low confidence   Route to senior reviewer with priority flag

Confidence scoring is a triage signal, not a correctness guarantee. A high confidence score means the output appears well-structured and prompt-adherent, not that it is factually correct.
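The interpretation bands above can be written as a lookup; the band boundaries come from the table, while the function and action names are illustrative:

```typescript
type TriageAction =
  | 'auto-approve'
  | 'route-to-reviewer'
  | 'route-to-senior-reviewer'
  | 'route-to-senior-reviewer-priority';

// Maps a confidence score to the "Typical Action" column of the table.
function typicalAction(score: number): TriageAction {
  if (score >= 0.85) return 'auto-approve';              // 0.85 - 1.00
  if (score >= 0.70) return 'route-to-reviewer';         // 0.70 - 0.84
  if (score >= 0.50) return 'route-to-senior-reviewer';  // 0.50 - 0.69
  return 'route-to-senior-reviewer-priority';            // 0.00 - 0.49
}
```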

Configuration

Task-Level HITL

const task = await client.createTask({
  prompt: 'Analyze this contract for liability exposure...',
  model: 'opus',
  hitl: {
    required: true,                // Always require review
    confidenceThreshold: 0.90,     // Override default threshold
    timeoutMs: 7200_000,           // 2 hours before escalation
  },
});

Plan-Level HITL

const execution = await client.executePlan(planId, {
  input: { contract: contractText },
  hitl: {
    requiredFor: ['analyze-risk', 'generate-report'],
    confidenceThreshold: 0.85,
    timeoutMs: 3600_000,
    notifyChannels: ['email', 'dashboard', 'slack'],
    escalation: {
      afterMs: 1800_000,          // Escalate after 30 minutes
      to: 'senior-legal-team',    // Escalation target
    },
  },
});

Notification Channels

When a task requires approval, notifications are sent through configured channels:

Channel    Configuration                Latency
Dashboard  Always enabled               Real-time (WebSocket)
Email      User's registered email      1-2 minutes
Slack      Webhook URL in org settings  Real-time
Webhook    Custom URL                   Real-time

Slack Integration

# Configure Slack webhook in org settings
devteam config set slack-webhook https://hooks.slack.com/services/T.../B.../xxx
 
# Or per deployment
devteam templates deploy contract-review-v1 \
  --input contract=@./contract.pdf \
  --hitl analyze-risk \
  --hitl-notify slack,email

Custom Webhook

const execution = await client.executePlan(planId, {
  hitl: {
    requiredFor: ['critical-step'],
    notifyChannels: ['webhook'],
    webhookUrl: 'https://your-app.com/api/devteam/approvals',
    webhookSecret: process.env.WEBHOOK_SECRET,
  },
});

Webhook payload:

{
  "event": "hitl.pending",
  "taskId": "dt_task_abc123",
  "planExecutionId": "dt_exec_xyz789",
  "stepId": "critical-step",
  "confidence": 0.72,
  "summary": "Risk analysis identified 3 high-severity issues in sections 4.2, 5.1, and 8.3",
  "output": "...",
  "reviewUrl": "https://devteam.marsala.dev/approvals/dt_task_abc123",
  "timestamp": "2026-02-20T10:00:00Z"
}
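Since a webhookSecret is configured, the receiving endpoint should verify deliveries before trusting the payload. The sketch below assumes an HMAC-SHA256 signature computed over the raw request body — the header name and signing scheme are assumptions for illustration; check your delivery settings for the actual verification scheme:

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Constant-time comparison of the delivered signature against the expected
// HMAC-SHA256 hex digest of the raw body.
function verifySignature(rawBody: string, signature: string, secret: string): boolean {
  const expected = createHmac('sha256', secret).update(rawBody).digest('hex');
  const a = Buffer.from(expected);
  const b = Buffer.from(signature);
  return a.length === b.length && timingSafeEqual(a, b);
}
```

Reject requests that fail verification before parsing the JSON, and always sign the raw body bytes rather than a re-serialized object.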

Timeout and Escalation

Review timeouts are opt-in; what happens to an unreviewed task depends on configuration:

  1. Default: with no timeout configured, the task remains paused indefinitely.
  2. With timeout: after timeoutMs elapses, the task is auto-rejected, or escalated if escalation is configured.
  3. With escalation: after escalation.afterMs, the task is reassigned to the escalation target.

  Task awaiting approval
         |
    timeout (2h)
         |
    +----+----+
    |         |
  No escal. Escalation configured
    |         |
    v         v
  Auto-     Reassign to
  reject    escalation target
              |
         timeout (4h)
              |
              v
          Auto-reject

Escalation Chain

hitl: {
  requiredFor: ['analyze-risk'],
  confidenceThreshold: 0.85,
  escalation: {
    afterMs: 1800_000,      // 30 min: escalate to team lead
    to: 'legal-team-lead',
    secondary: {
      afterMs: 3600_000,    // 1 hr: escalate to department head
      to: 'legal-dept-head',
    },
    finalTimeout: 7200_000,  // 2 hr: auto-reject
    finalAction: 'reject',   // 'reject' | 'approve' | 'skip'
  },
},
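The chain above resolves to a single owner (or final action) at any elapsed time. A hypothetical helper, using the field names from the config but not itself part of the SDK:

```typescript
interface Escalation {
  afterMs: number;
  to: string;
  secondary?: { afterMs: number; to: string };
  finalTimeout: number;
  finalAction: 'reject' | 'approve' | 'skip';
}

// Returns who owns the pending task after elapsedMs, or 'final:<action>'
// once the final timeout has passed. Later stages take precedence.
function ownerAt(elapsedMs: number, initialReviewer: string, esc: Escalation): string {
  if (elapsedMs >= esc.finalTimeout) return `final:${esc.finalAction}`;
  if (esc.secondary && elapsedMs >= esc.secondary.afterMs) return esc.secondary.to;
  if (elapsedMs >= esc.afterMs) return esc.to;
  return initialReviewer;
}
```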

Reviewer Assignment

Automatic Assignment

Tasks are assigned based on:

  1. Reviewer availability (not at capacity)
  2. Tag-based routing (legal tasks to legal reviewers)
  3. Queue-based routing (specific reviewers per queue)
  4. Round-robin within a reviewer group
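Rules 1 and 4 together amount to "skip anyone at capacity, then rotate." A toy sketch of that combination (tag- and queue-based routing, rules 2 and 3, happen server-side and are omitted here):

```typescript
interface Reviewer { id: string; load: number; capacity: number }

// Picks the next available reviewer round-robin; returns null when the
// whole group is at capacity. The cursor advances on each assignment.
function nextReviewer(
  group: Reviewer[],
  cursor: number,
): { id: string; cursor: number } | null {
  const available = group.filter(r => r.load < r.capacity);
  if (available.length === 0) return null;
  const pick = available[cursor % available.length];
  return { id: pick.id, cursor: cursor + 1 };
}
```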

Manual Assignment

// Assign a specific reviewer when creating the task
const task = await client.createTask({
  prompt: '...',
  hitl: {
    required: true,
    assignTo: 'reviewer_jane',
  },
});

Approval with Modifications

Reviewers can approve with edits to the agent's output:

await client.approveTask('dt_task_abc123', {
  comment: 'Good analysis but Section 4.2 risk was understated.',
  modifiedOutput: 'Corrected output with Section 4.2 risk elevated to HIGH...',
  reviewerId: 'reviewer_jane',
});

The modified output replaces the agent's original output in the pipeline. Downstream steps receive the reviewer's version.

Rejection with Retry

When a reviewer rejects a task, they can optionally trigger a retry with additional guidance:

await client.rejectTask('dt_task_abc123', {
  reason: 'Missed indemnification clause in Section 7.',
  retry: true,
  retryPrompt: `Re-analyze the contract with special attention to:
    1. Section 7 indemnification terms
    2. Liability caps in Section 5.2
    Previous analysis missed critical exposure.`,
  reviewerId: 'reviewer_jane',
});

The task is re-queued with the enhanced prompt and the rejection feedback appended as context.
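The exact re-queue format is internal to the service, but the composition described above might look something like this sketch (names are illustrative):

```typescript
// Appends the rejection reason and the reviewer's extra guidance to the
// original prompt before the task is retried.
function buildRetryContext(
  originalPrompt: string,
  rejectionReason: string,
  retryPrompt: string,
): string {
  return [
    originalPrompt,
    '--- Reviewer feedback (rejection) ---',
    rejectionReason,
    '--- Additional guidance ---',
    retryPrompt,
  ].join('\n');
}
```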

Dashboard Integration

The DevTeam Dashboard provides a dedicated approval interface:

  • Real-time queue of pending approvals
  • Side-by-side view of prompt and output
  • Inline editing for modifications
  • One-click approve/reject with optional comments
  • Confidence score visualization
  • Approval history and audit trail

Audit Trail

All HITL actions are logged for compliance:

const history = await client.getApprovalHistory('dt_task_abc123');
// [
//   { action: 'pending', timestamp: '...', confidence: 0.72 },
//   { action: 'escalated', timestamp: '...', from: 'reviewer_bob', to: 'reviewer_jane' },
//   { action: 'approved', timestamp: '...', reviewer: 'reviewer_jane', comment: '...' },
// ]

Next Steps