The Knowledge Base system enables semantic search over custom documents using vector embeddings. Upload your own research papers, documentation, or datasets to create a private knowledge base that agents can query during literature search.
```sql
-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Create documents table
CREATE TABLE IF NOT EXISTS documents (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  title TEXT NOT NULL,
  content TEXT NOT NULL,
  metadata JSONB DEFAULT '{}',
  embedding VECTOR(1024), -- Adjust dimensions based on model
  created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

-- Create index for vector similarity search
CREATE INDEX IF NOT EXISTS documents_embedding_idx
  ON documents USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 100);

-- Create full-text search index
CREATE INDEX IF NOT EXISTS documents_content_idx
  ON documents USING gin(to_tsvector('english', content));

-- Vector similarity search function
CREATE OR REPLACE FUNCTION match_documents(
  query_embedding VECTOR(1024),
  match_threshold FLOAT DEFAULT 0.5,
  match_count INT DEFAULT 10
)
RETURNS TABLE (
  id UUID,
  title TEXT,
  content TEXT,
  metadata JSONB,
  similarity FLOAT
)
LANGUAGE sql
STABLE
AS $$
  SELECT
    id,
    title,
    content,
    metadata,
    1 - (embedding <=> query_embedding) AS similarity
  FROM documents
  WHERE 1 - (embedding <=> query_embedding) > match_threshold
  ORDER BY embedding <=> query_embedding
  LIMIT match_count;
$$;
```
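To make the semantics of `match_documents` concrete, here is a minimal in-memory TypeScript sketch of the same computation. It assumes embeddings are plain `number[]` arrays; `DocRow`, `Match`, `cosineDistance`, and `matchDocuments` are hypothetical names for illustration, not part of the codebase. pgvector's `<=>` operator returns cosine distance (1 minus cosine similarity), so the sketch converts it back to a similarity, keeps rows above `match_threshold`, and returns the closest `match_count` rows.

```typescript
// Hypothetical in-memory mirror of the match_documents SQL function (illustration only).
interface DocRow {
  id: string;
  title: string;
  content: string;
  metadata: Record<string, unknown>;
  embedding: number[];
}

interface Match extends Omit<DocRow, "embedding"> {
  similarity: number;
}

// Cosine distance, as returned by pgvector's <=> operator: 1 - cos(a, b).
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function matchDocuments(
  rows: DocRow[],
  queryEmbedding: number[],
  matchThreshold = 0.5,
  matchCount = 10,
): Match[] {
  return rows
    // similarity = 1 - distance, as in the SELECT list
    .map(({ embedding, ...rest }) => ({
      ...rest,
      similarity: 1 - cosineDistance(embedding, queryEmbedding),
    }))
    // WHERE similarity > match_threshold
    .filter((m) => m.similarity > matchThreshold)
    // ORDER BY distance ascending == similarity descending
    .sort((a, b) => b.similarity - a.similarity)
    // LIMIT match_count
    .slice(0, matchCount);
}
```

Sorting by descending similarity is equivalent to the SQL `ORDER BY embedding <=> query_embedding` (ascending distance). In the database, it is that `ORDER BY ... LIMIT` form that lets pgvector use the IVFFlat index.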
The knowledge base is loaded from KNOWLEDGE_DOCS_PATH by initKnowledgeBase and is then queried automatically during literature searches:
src/agents/literature/knowledge.ts
```typescript
export async function initKnowledgeBase() {
  const docsPath = process.env.KNOWLEDGE_DOCS_PATH;
  if (!docsPath) {
    logger.info("KNOWLEDGE_DOCS_PATH not set, skipping knowledge base initialization");
    return;
  }

  const vectorSearch = new VectorSearchWithReranker();
  const processor = new DocumentProcessor();

  // Process all documents in directory
  const documents = await processor.processDirectory(docsPath);

  if (documents.length === 0) {
    logger.warn("No documents found in KNOWLEDGE_DOCS_PATH");
    return;
  }

  // Add to vector database
  await vectorSearch.addDocuments(documents);

  logger.info(`Knowledge base initialized with ${documents.length} documents`);
}
```
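`initKnowledgeBase` hands the processed documents to `vectorSearch.addDocuments`, which stores them in the `documents` table, whose `embedding` column is `VECTOR(1024)`. If the embedding model produces a different number of dimensions, pgvector rejects the insert, which is why the schema comment says to adjust the dimension. The sketch below shows one way to catch that before writing. `toPgVectorLiteral` and `EMBEDDING_DIMENSIONS` are hypothetical names, and the sketch assumes embeddings are sent to Postgres in pgvector's `[x,y,z]` text format.

```typescript
// Hypothetical guard: must match the VECTOR(n) column in the schema.
const EMBEDDING_DIMENSIONS = 1024;

// Validates an embedding and formats it as a pgvector text literal, e.g. "[0.1,0.2,0.3]".
function toPgVectorLiteral(
  embedding: number[],
  dims: number = EMBEDDING_DIMENSIONS,
): string {
  if (embedding.length !== dims) {
    throw new Error(
      `Embedding has ${embedding.length} dimensions, but the column is VECTOR(${dims}); ` +
        "update the schema or switch embedding models",
    );
  }
  // pgvector does not accept NaN or infinite values in a vector.
  if (!embedding.every(Number.isFinite)) {
    throw new Error("Embedding contains NaN or Infinity");
  }
  return `[${embedding.join(",")}]`;
}
```

The resulting string can be bound as a query parameter and cast with `$1::vector` in the `INSERT` statement.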
Usage in Chat/Deep Research:
src/routes/chat.ts
```typescript
// Knowledge base is queried if KNOWLEDGE_DOCS_PATH is configured
if (process.env.KNOWLEDGE_DOCS_PATH) {
  const knowledgePromise = literatureAgent({
    objective: task.objective,
    type: "KNOWLEDGE",
  }).then((result) => {
    if (result.count && result.count > 0) {
      task.output += `${result.output}\n\n`;
    }
  });
  literaturePromises.push(knowledgePromise);
}
```
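The route code only pushes `knowledgePromise` onto `literaturePromises`; how that array is awaited is not shown. Here is a minimal sketch, assuming the route waits for every literature source before continuing. `Task`, `LiteratureResult`, `LiteratureAgent`, and `runLiteratureSources` are hypothetical stand-ins. Using `Promise.allSettled` rather than `Promise.all` is a design choice made here and is not confirmed by the source.

```typescript
// Hypothetical stand-ins for the route's task and literatureAgent types.
interface Task {
  objective: string;
  output: string;
}

interface LiteratureResult {
  count?: number;
  output: string;
}

type LiteratureAgent = (args: { objective: string; type: string }) => Promise<LiteratureResult>;

async function runLiteratureSources(
  task: Task,
  agent: LiteratureAgent,
  types: string[],
): Promise<void> {
  // Start every source concurrently; each appends its own non-empty output.
  const literaturePromises: Promise<void>[] = types.map((type) =>
    agent({ objective: task.objective, type }).then((result) => {
      if (result.count && result.count > 0) {
        task.output += `${result.output}\n\n`;
      }
    }),
  );
  // allSettled: one failing source does not reject the whole batch;
  // failures are logged and the remaining outputs are kept.
  const outcomes = await Promise.allSettled(literaturePromises);
  for (const outcome of outcomes) {
    if (outcome.status === "rejected") {
      console.warn("literature source failed:", outcome.reason);
    }
  }
}
```

Because each source appends to `task.output` inside its own `.then`, sections appear in completion order rather than in the order the sources were started. If a stable order matters, collect the results first and concatenate them afterwards.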
```bash
# Set documents path
export KNOWLEDGE_DOCS_PATH=/path/to/docs

# Add documents
mkdir -p /path/to/docs
cp research_paper.pdf /path/to/docs/
cp protocol.md /path/to/docs/

# Restart server to reindex
bun run dev
```
```sql
-- IVFFlat index for approximate nearest neighbor search
CREATE INDEX documents_embedding_idx
  ON documents USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 100); -- Increase for larger datasets
```
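"Increase for larger datasets" can be made more specific. The pgvector README suggests starting with `lists` of about `rows / 1000` for up to 1M rows and `sqrt(rows)` beyond that, and with `ivfflat.probes` of about `sqrt(lists)` at query time. The TypeScript helpers below capture those starting points; the function names are hypothetical, and the values are only starting points to tune against your own recall and latency.

```typescript
// Starting-point heuristics from the pgvector README (not hard rules).
function suggestIvfflatLists(rowCount: number): number {
  if (rowCount <= 1_000_000) {
    // rows / 1000, but never fewer than one list
    return Math.max(1, Math.round(rowCount / 1000));
  }
  // sqrt(rows) for tables above 1M rows
  return Math.round(Math.sqrt(rowCount));
}

// Query-time probes: start around sqrt(lists); higher improves recall, lower is faster.
function suggestIvfflatProbes(lists: number): number {
  return Math.max(1, Math.round(Math.sqrt(lists)));
}
```

By this heuristic, `lists = 100` suits roughly 100,000 documents. Probes are set per session with `SET ivfflat.probes = 10;`. pgvector also recommends building an IVFFlat index after the table holds data, because the list centroids are computed from the rows present at build time.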
No results returned
Possible causes:
Similarity threshold too high (lower SIMILARITY_THRESHOLD)
No documents indexed (check getStats())
Query too specific (broaden search terms)
Solution:
```sql
-- Check document count
SELECT COUNT(*) FROM documents;
```

```bash
# Lower threshold
SIMILARITY_THRESHOLD=0.3
```
Poor search quality
Possible causes:
Weak embedding model
Reranking disabled
Insufficient vector search candidates
Solution:
```bash
# Use better embedding model
EMBEDDING_PROVIDER=voyage
TEXT_EMBEDDING_MODEL=voyage-3

# Enable reranking
USE_RERANKING=true

# Increase candidates for reranking
VECTOR_SEARCH_LIMIT=50
RERANK_FINAL_LIMIT=10
```
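The last three settings form a two-stage pipeline. Vector search fetches up to `VECTOR_SEARCH_LIMIT` candidates, then the reranker reorders them and keeps `RERANK_FINAL_LIMIT`. The sketch below illustrates that flow and assumes the candidates arrive already sorted by vector similarity. `retrieve`, `readIntEnv`, the `Reranker` type, the fallback defaults, and the behavior when reranking is off are all assumptions, not taken from `VectorSearchWithReranker`.

```typescript
// Hypothetical two-stage retrieval; env var names come from the config above,
// while the defaults and the reranker interface are illustrative assumptions.
interface Candidate {
  id: string;
  content: string;
  similarity: number;
}

// Returns one relevance score per candidate, in the same order as the input.
type Reranker = (query: string, candidates: Candidate[]) => number[];

function readIntEnv(env: Record<string, string | undefined>, name: string, fallback: number): number {
  const parsed = Number.parseInt(env[name] ?? "", 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
}

function retrieve(
  query: string,
  rankedByVector: Candidate[], // assumed sorted by similarity, best first
  rerank: Reranker,
  env: Record<string, string | undefined>,
): Candidate[] {
  const searchLimit = readIntEnv(env, "VECTOR_SEARCH_LIMIT", 50);
  const finalLimit = readIntEnv(env, "RERANK_FINAL_LIMIT", 10);
  // Stage 1: only the top searchLimit vector hits are eligible for reranking.
  const pool = rankedByVector.slice(0, searchLimit);
  if (env.USE_RERANKING !== "true") {
    return pool.slice(0, finalLimit);
  }
  // Stage 2: reorder the pool by reranker score and keep the best finalLimit.
  const scores = rerank(query, pool);
  return pool
    .map((c, i) => ({ c, score: scores[i] }))
    .sort((a, b) => b.score - a.score)
    .slice(0, finalLimit)
    .map(({ c }) => c);
}
```

A document that vector search ranks below `VECTOR_SEARCH_LIMIT` never reaches the reranker, which is why raising that limit can improve quality when reranking is enabled. In Node or Bun, `process.env` can be passed as `env`.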
Slow queries
Possible causes:
Missing vector index
Index needs tuning
Cold cache
Solution:
```sql
-- Rebuild index with more lists
DROP INDEX documents_embedding_idx;
CREATE INDEX documents_embedding_idx
  ON documents USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 200);

-- Analyze table
ANALYZE documents;
```