# Multi-Agent Orchestration
DevTeam Orchestrator enables distributed multi-agent workflows where different agents run on different machines, use different models, and collaborate through shared task queues and Temporal workflows.
## Architecture

```text
                +-------------------+
                |   Orchestrator    |
                |  API + Temporal   |
                +---------+---------+
                          |
          +---------------+--------------+
          |               |              |
  +-------+------+ +------+-----+ +------+------+
  |  GPU Worker  | | CPU Worker | | CPU Worker  |
  |   RTX 5080   | |  Surface   | |    GR-14    |
  | ollama local | |  8GB RAM   | |  16GB RAM   |
  |  Queue: gpu  | | Queue: cpu | | Queue: gr14 |
  +-------+------+ +------+-----+ +------+------+
          |               |              |
  +-------+------+ +------+-----+ +------+------+
  |   Models:    | |  Models:   | |  Models:    |
  |  - opus      | |  - sonnet  | |  - haiku    |
  |  - local LLM | |  - haiku   | |  - sonnet   |
  |  - 70B params| |  - fast    | |  - batch    |
  +--------------+ +------------+ +-------------+
```

## Queue-Based Routing
Route tasks to specific workers based on model requirements and hardware capabilities:
```typescript
// GPU-intensive task → GPU worker
await client.createTask({
  prompt: 'Complex reasoning task requiring large model...',
  model: 'opus',
  queue: 'gpu-queue',
});

// Fast batch processing → CPU workers
await client.createTask({
  prompt: 'Quick classification task...',
  model: 'haiku',
  queue: 'cpu-queue',
});

// Local model (no API cost) → GPU worker with Ollama
await client.createTask({
  prompt: 'Process this with local model...',
  model: 'ollama:llama3.3',
  queue: 'gpu-queue',
});
```

## Queue Configuration
```yaml
worker:
  id: worker-asus-gpu
  queues:
    - name: gpu-queue
      concurrency: 4
      models:
        - opus
        - sonnet
        - ollama:*
    - name: default
      concurrency: 2
      models:
        - sonnet
        - haiku
  resources:
    gpu: true
    gpuModel: RTX 5080
    ramGB: 24
    cpuCores: 8
  heartbeat:
    intervalMs: 30000
    endpoint: https://devteam.marsala.dev/api/workers/heartbeat
```

## Worker Registration
Workers auto-register when they connect:
```typescript
import { DevTeamWorker } from 'devteam-worker';

const worker = new DevTeamWorker({
  apiUrl: 'https://devteam.marsala.dev',
  apiKey: process.env.DEVTEAM_API_KEY,
  temporalAddress: 'temporal.marsala.dev:7233',
  workerId: 'worker-asus-gpu',
  queues: ['gpu-queue', 'default'],
  concurrency: 4,
  capabilities: {
    gpu: true,
    gpuModel: 'RTX 5080',
    localModels: ['llama3.3:70b', 'dolphin3:8b'],
  },
});

worker.on('task', async (task) => {
  console.log(`Processing task ${task.id} with model ${task.model}`);
});

worker.on('error', (error) => {
  console.error('Worker error:', error);
});

await worker.start();
// Worker worker-asus-gpu registered
// Listening on queues: gpu-queue, default (concurrency: 4)
```

## Agent Specialization
Assign specialized system prompts and tools to different agents:
```typescript
const plan = await client.createPlan({
  name: 'research-pipeline',
  steps: [
    {
      id: 'researcher',
      prompt: 'Research the topic: {{input.topic}}',
      model: 'sonnet',
      queue: 'research-queue',
      systemPrompt: `You are a research analyst with access to web search
                     and document retrieval tools. Gather comprehensive
                     information from multiple sources.`,
    },
    {
      id: 'analyst',
      prompt: 'Analyze research findings: {{researcher.output}}',
      model: 'opus',
      queue: 'gpu-queue',
      systemPrompt: `You are a senior analyst specializing in quantitative
                     analysis and pattern recognition. Identify key insights,
                     trends, and anomalies in the research data.`,
      dependsOn: ['researcher'],
    },
    {
      id: 'writer',
      prompt: 'Write a report based on analysis: {{analyst.output}}',
      model: 'sonnet',
      queue: 'default',
      systemPrompt: `You are a professional report writer. Create clear,
                     well-structured reports with executive summaries,
                     data tables, and actionable recommendations.`,
      dependsOn: ['analyst'],
    },
  ],
});
```

## Load Balancing
The orchestrator distributes tasks across workers using configurable strategies:
| Strategy | Description | Best For |
|---|---|---|
| `round-robin` | Distribute evenly across workers | Homogeneous workers |
| `least-loaded` | Route to worker with fewest active tasks | Heterogeneous workers |
| `capability-match` | Route based on model/GPU requirements | Mixed GPU/CPU clusters |
| `locality` | Prefer workers close to data source | RAG-heavy workloads |
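For intuition, the `least-loaded` strategy amounts to picking the worker with the fewest active tasks that still has spare capacity. The sketch below is illustrative only; the `WorkerInfo` shape and `pickLeastLoaded` helper are not part of the DevTeam API.

```typescript
// Illustrative least-loaded selection (not part of the DevTeam API).
interface WorkerInfo {
  id: string;
  activeTasks: number;
  concurrency: number;
}

function pickLeastLoaded(workers: WorkerInfo[]): WorkerInfo | undefined {
  // Skip workers already running at their concurrency limit,
  // then take the one with the fewest active tasks.
  return workers
    .filter((w) => w.activeTasks < w.concurrency)
    .sort((a, b) => a.activeTasks - b.activeTasks)[0];
}

const fleet: WorkerInfo[] = [
  { id: 'worker-asus-gpu', activeTasks: 4, concurrency: 4 }, // saturated
  { id: 'worker-surface', activeTasks: 1, concurrency: 2 },
  { id: 'worker-gr14', activeTasks: 2, concurrency: 4 },
];

console.log(pickLeastLoaded(fleet)?.id); // worker-surface
```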
```typescript
const client = new DevTeamClient({
  loadBalancing: {
    strategy: 'capability-match',
    preferences: {
      'opus': ['gpu-queue'],
      'ollama:*': ['gpu-queue'],
      'haiku': ['cpu-queue', 'default'],
    },
  },
});
```

## Health Monitoring
Workers report health via heartbeat:
```typescript
// Check worker status
const workers = await client.getWorkers();

workers.forEach((w) => {
  console.log(`${w.id}: ${w.status} (${w.activeTasks}/${w.concurrency} tasks)`);
  console.log(`  Queues: ${w.queues.join(', ')}`);
  console.log(`  Last heartbeat: ${w.lastHeartbeat}`);
  console.log(`  Uptime: ${(w.uptimeMs / 3600000).toFixed(1)}h`);
});
```

## Auto-Recovery
If a worker stops sending heartbeats:

- After 60 seconds: worker is marked as `unhealthy`
- After 120 seconds: active tasks are re-queued to other workers
- After 300 seconds: worker is deregistered
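The timeline above can be sketched as a pure function of how long a worker has been silent. The state names and function below are illustrative, not the orchestrator's actual internals; the thresholds are the defaults from this section.

```typescript
// Illustrative sketch: derive a worker's state from heartbeat silence,
// using the default thresholds (60s / 120s / 300s).
type HealthState = 'healthy' | 'unhealthy' | 'rebalancing' | 'deregistered';

function healthState(msSinceHeartbeat: number): HealthState {
  if (msSinceHeartbeat >= 300_000) return 'deregistered'; // worker removed
  if (msSinceHeartbeat >= 120_000) return 'rebalancing';  // tasks re-queued
  if (msSinceHeartbeat >= 60_000) return 'unhealthy';     // flagged, no action yet
  return 'healthy';
}

console.log(healthState(45_000));  // healthy
console.log(healthState(90_000));  // unhealthy
console.log(healthState(150_000)); // rebalancing
console.log(healthState(400_000)); // deregistered
```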
```yaml
worker:
  heartbeat:
    intervalMs: 30000
    unhealthyAfterMs: 60000
    rebalanceAfterMs: 120000
    deregisterAfterMs: 300000
```

When tasks are re-queued after a worker failure, they restart from the beginning. Design tasks to be idempotent to prevent duplicate side effects.
## Scaling Patterns

### Horizontal Scaling

Add more workers to handle increased load:
```shell
# Start additional workers on new machines
ssh worker-node-5 'devteam-worker start --queue cpu-queue --concurrency 8'
ssh worker-node-6 'devteam-worker start --queue cpu-queue --concurrency 8'
```

### Vertical Scaling
Increase concurrency on existing workers:
```shell
devteam-worker config set concurrency 8
devteam-worker restart
```

### Auto-Scaling (Kubernetes)
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: devteam-worker
  namespace: workers
spec:
  replicas: 3
  selector:
    matchLabels:
      app: devteam-worker
  template:
    metadata:
      labels:
        app: devteam-worker
    spec:
      containers:
        - name: worker
          image: matwal/devteam-worker:v1
          env:
            - name: DEVTEAM_API_URL
              value: "https://devteam.marsala.dev"
            - name: DEVTEAM_QUEUE
              value: "cpu-queue"
            - name: DEVTEAM_CONCURRENCY
              value: "4"
          resources:
            requests:
              cpu: "2"
              memory: "4Gi"
            limits:
              cpu: "4"
              memory: "8Gi"
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: devteam-worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: devteam-worker
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: External
      external:
        metric:
          name: devteam_queue_depth
        target:
          type: AverageValue
          averageValue: "5"
```

## Next Steps
- RAG Integration -- Connect agents to vector search
- Deployment Guide -- Production deployment patterns