RAG Integration
DevTeam Orchestrator integrates with Weaviate vector search to provide Retrieval-Augmented Generation (RAG) capabilities. This allows agent tasks to query a knowledge base before generating responses, improving factual accuracy and domain specificity.
How RAG Works in DevTeam
Task received
|
v
[RAG Step: Search Weaviate]
|
Retrieve relevant documents
|
v
[Agent Step: Generate with context]
|
Model receives: prompt + retrieved docs
|
v
Output with grounded references

- A task or plan step includes a RAG configuration.
- Before the model is called, the orchestrator queries Weaviate for semantically similar documents.
- Retrieved documents are injected into the prompt as context.
- The model generates a response grounded in the retrieved information.
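The injection step can be sketched as follows. This is illustrative only: the orchestrator's internals are not part of the public API, and the `augmentPrompt` helper and `RetrievedDoc` shape here are hypothetical, not DevTeam types.

```typescript
// Hypothetical sketch of how retrieved documents might be injected
// into the prompt; the real prompt format may differ.
interface RetrievedDoc {
  content: string;
  certainty: number;
}

function augmentPrompt(prompt: string, docs: RetrievedDoc[]): string {
  // Join retrieved documents into a single context block, then prepend it.
  const context = docs.map((d) => d.content).join('\n---\n');
  return `Context:\n${context}\n\n${prompt}`;
}
```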
Configuration
Weaviate Connection
const client = new DevTeamClient({
  apiUrl: 'https://devteam.marsala.dev',
  apiKey: process.env.DEVTEAM_API_KEY,
  rag: {
    provider: 'weaviate',
    endpoint: 'http://localhost:8080',
    defaultCollection: 'DocChunk',
    defaultLimit: 10,
    defaultCertainty: 0.75,
  },
});

Environment Variables
.env
WEAVIATE_URL=http://localhost:8080
WEAVIATE_DEFAULT_COLLECTION=DocChunk
WEAVIATE_CERTAINTY_THRESHOLD=0.75

Using RAG in Tasks
Simple RAG Task
const result = await client.createTask({
  prompt: 'What are the tax implications of forming an LLC in Delaware?',
  model: 'sonnet',
  rag: {
    collection: 'USIRSTaxRule',
    query: 'LLC formation Delaware tax implications',
    limit: 5,
    certainty: 0.80,
  },
});

The orchestrator will:
- Search USIRSTaxRule for documents matching "LLC formation Delaware tax implications"
- Retrieve the top 5 results with certainty >= 0.80
- Prepend them to the prompt as Context: [retrieved documents]
- Send the augmented prompt to the model
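The retrieval selection described above can be sketched as a filter plus a cutoff; the `SearchResult` shape and `selectContext` helper are hypothetical names for illustration, mirroring the task's `limit` and `certainty` options:

```typescript
// Illustrative only: keep results at or above the certainty threshold,
// best matches first, capped at `limit`.
interface SearchResult {
  content: string;
  certainty: number;
}

function selectContext(
  results: SearchResult[],
  opts: { limit: number; certainty: number },
): SearchResult[] {
  return results
    .filter((r) => r.certainty >= opts.certainty)
    .sort((a, b) => b.certainty - a.certainty)
    .slice(0, opts.limit);
}
```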
RAG in Plan Steps
steps:
  - id: research
    prompt: |
      Based on the following context documents, answer:
      {{input.question}}
      Context will be provided automatically via RAG.
    model: sonnet
    rag:
      collection: DocChunk
      query: "{{input.question}}"
      limit: 10
      certainty: 0.75
      filters:
        - path: ["project"]
          operator: Equal
          valueText: "legal-templates"
  - id: synthesize
    prompt: |
      Using the research findings, provide a comprehensive answer:
      Research: {{research.output}}
      Question: {{input.question}}
    model: opus
    dependsOn: [research]

Advanced RAG Patterns
Multi-Collection Search
Query multiple collections and merge results:
const result = await client.createTask({
  prompt: 'Analyze the legal and tax implications of this corporate structure.',
  model: 'opus',
  rag: {
    searches: [
      {
        collection: 'USIRSTaxRule',
        query: 'corporate structure tax implications',
        limit: 5,
      },
      {
        collection: 'USCourtOpinion',
        query: 'corporate veil piercing liability',
        limit: 3,
      },
      {
        collection: 'SGRegulation',
        query: 'Singapore company compliance requirements',
        limit: 3,
      },
    ],
    mergeStrategy: 'interleave', // 'interleave' | 'sequential' | 'deduplicate'
  },
});

Hybrid Search
Combine vector similarity with keyword matching:
rag: {
  collection: 'DocChunk',
  query: 'contract indemnification clause',
  limit: 10,
  searchType: 'hybrid',
  hybridConfig: {
    alpha: 0.75, // 0 = pure keyword, 1 = pure vector
  },
}

Filtered Search
Apply metadata filters to narrow results:
rag: {
  collection: 'DocChunk',
  query: 'risk assessment methodology',
  limit: 10,
  filters: [
    { path: ['project'], operator: 'Equal', valueText: 'legal-templates' },
    { path: ['category'], operator: 'Equal', valueText: 'risk-management' },
    { path: ['createdAt'], operator: 'GreaterThan', valueDate: '2025-01-01' },
  ],
}

RAG Pipeline Template
A reusable template for RAG-powered Q&A:
rag-qa-v1.yaml
name: RAG Question Answering
description: Answer questions using vector-searched knowledge base context.
version: "1.0.0"
industry: custom
tags: [rag, qa, knowledge-base]
inputSchema:
  type: object
  required: [question, collection]
  properties:
    question:
      type: string
    collection:
      type: string
      description: Weaviate collection to search
    filters:
      type: object
      description: Optional metadata filters
steps:
  - id: retrieve
    prompt: |
      Search the knowledge base and return the most relevant context
      for answering: {{input.question}}
    model: haiku
    rag:
      collection: "{{input.collection}}"
      query: "{{input.question}}"
      limit: 10
      certainty: 0.75
  - id: answer
    prompt: |
      Answer the following question using ONLY the provided context.
      If the context does not contain enough information, say so explicitly.
      Question: {{input.question}}
      Context:
      {{retrieve.output}}
      Rules:
      1. Cite specific sources from the context
      2. Do not fabricate information
      3. If uncertain, state your confidence level
    model: sonnet
    dependsOn: [retrieve]
    temperature: 0.3
  - id: verify
    prompt: |
      Verify the answer against the context. Check for:
      1. Claims not supported by the context (hallucination)
      2. Missing information that the context contains
      3. Accuracy of citations
      Answer: {{answer.output}}
      Context: {{retrieve.output}}
      Return JSON: { "verified": true/false, "issues": [...], "confidence": 0.0-1.0 }
    model: haiku
    dependsOn: [answer, retrieve]

Collection Management
List Collections
const collections = await client.rag.listCollections();
collections.forEach((c) => {
  console.log(`${c.name}: ${c.objectCount} objects, vectorizer: ${c.vectorizer}`);
});

Ingest Documents
await client.rag.ingest({
  collection: 'DocChunk',
  documents: [
    {
      content: 'Full text of the document...',
      metadata: {
        title: 'Contract Template - NDA',
        project: 'legal-templates',
        category: 'contracts',
        source: 'internal',
      },
    },
  ],
  chunkSize: 500,
  chunkOverlap: 50,
});

CLI Ingestion
# Ingest a file
devteam rag ingest --collection DocChunk --file ./document.pdf --project legal
# Ingest a directory
devteam rag ingest --collection DocChunk --dir ./documents/ --project legal --recursive
# Search
devteam rag search --collection DocChunk --query "indemnification clause" --limit 5

For large-scale ingestion (1000+ documents), use the batch ingestion API, which processes documents in parallel and reports progress. See the API Reference for the /rag/ingest/batch endpoint.
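The chunkSize and chunkOverlap options from the ingest example control how documents are split before embedding. A minimal character-based sketch of overlapping chunking (the actual ingester may split by tokens and respect sentence boundaries; `chunkText` is a hypothetical helper, not a DevTeam API):

```typescript
// Character-based sketch of sliding-window chunking with overlap.
// Each chunk shares `overlap` characters with the previous chunk.
function chunkText(text: string, chunkSize: number, overlap: number): string[] {
  if (overlap >= chunkSize) throw new Error('overlap must be smaller than chunkSize');
  const chunks: string[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last chunk reached the end
  }
  return chunks;
}
```

The overlap keeps a sentence that straddles a chunk boundary visible in both neighboring chunks, which is why the tuning table below recommends 10-20% of the chunk size.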
Performance Tuning
| Parameter | Recommendation |
|---|---|
| limit | Start with 5-10. More context improves accuracy but increases token cost. |
| certainty | 0.75 for broad search, 0.85+ for precise matching. |
| chunkSize | 300-500 tokens for most use cases. Smaller for Q&A, larger for analysis. |
| chunkOverlap | 10-20% of chunk size to maintain context across boundaries. |
| alpha (hybrid) | 0.75 for semantic-heavy queries, 0.25 for keyword-heavy queries. |
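For intuition on the alpha parameter: Weaviate fuses the keyword and vector result sets internally with its own normalization, so this is only a conceptual sketch of how alpha weights the two contributions, not the actual fusion algorithm:

```typescript
// Conceptual only: alpha = 1 means pure vector similarity,
// alpha = 0 means pure keyword (BM25-style) score.
function hybridScore(vectorScore: number, keywordScore: number, alpha: number): number {
  return alpha * vectorScore + (1 - alpha) * keywordScore;
}
```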
Next Steps
- Deployment Guide -- Set up Weaviate in production
- API Reference -- RAG API endpoints