Horizontal Scaling

BioAgents workers can scale horizontally across multiple servers with no coordination required. All workers connect to the same Redis queue and automatically share the workload.

Scaling Architecture

Multi-Server Deployment

Server Roles

API Servers

Handle HTTP/WebSocket requests, enqueue jobs, broadcast notifications

Worker Servers

Process jobs from queue, execute AI workflows, update database

Redis

Central message broker for job queue and pub/sub

Setup Strategy

Deploy Redis - Use managed service (Upstash, ElastiCache) for high availability
Deploy API servers - Scale based on HTTP traffic and WebSocket connections
Deploy workers - Scale based on queue depth and job processing needs

Worker Deployment

Prerequisites

Each worker server needs:

Docker 20.10+
Access to Redis (via REDIS_URL)
Access to Supabase database
LLM API keys (OpenAI, Anthropic, etc.)

Deploy to New Server

Install Docker

curl -fsSL https://get.docker.com | sh

Clone Repository

git clone https://github.com/bio-xyz/bioagents-agentkit.git
cd bioagents-agentkit

Configure Environment

cp .env.worker.example .env
nano .env

Required variables:

# External Redis (shared across all workers)
REDIS_URL=rediss://default:password@your-redis.upstash.io:6379

# Database
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_ANON_KEY=eyJ...

# LLM API Keys
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_API_KEY=AIza...

Start Workers

# Start 2 worker containers
docker-compose -f docker-compose.worker.yml up -d --scale worker=2

# Verify workers are running
docker-compose -f docker-compose.worker.yml ps

Monitor Logs

docker-compose -f docker-compose.worker.yml logs -f

Look for:

redis_publisher_connected
chat_queue_initialized
deep_research_queue_initialized
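
The log check above can also be scripted. This sketch assumes the three marker strings appear verbatim in the log output; the function names are hypothetical.

```python
import subprocess

# Startup markers this guide says to look for in the worker logs
EXPECTED_MARKERS = (
    "redis_publisher_connected",
    "chat_queue_initialized",
    "deep_research_queue_initialized",
)

def missing_markers(log_text: str) -> list:
    """Return the startup markers that do not appear in the log text."""
    return [marker for marker in EXPECTED_MARKERS if marker not in log_text]

def fetch_logs() -> str:
    """Collect worker logs through docker-compose (requires running workers)."""
    result = subprocess.run(
        ["docker-compose", "-f", "docker-compose.worker.yml", "logs", "--no-color"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `missing_markers(fetch_logs())` on a healthy worker host should return an empty list.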

Worker Configuration

services:
  worker:
    build: .
    command: ["bun", "run", "src/worker.ts"]
    
    environment:
      # Enable queue mode
      - USE_JOB_QUEUE=true
      
      # External Redis
      - REDIS_URL=${REDIS_URL}
      
      # Database
      - SUPABASE_URL=${SUPABASE_URL}
      - SUPABASE_ANON_KEY=${SUPABASE_ANON_KEY}
      
      # LLM API Keys
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - GOOGLE_API_KEY=${GOOGLE_API_KEY}
      
      # Worker concurrency
      - CHAT_QUEUE_CONCURRENCY=${CHAT_QUEUE_CONCURRENCY:-5}
      - DEEP_RESEARCH_QUEUE_CONCURRENCY=${DEEP_RESEARCH_QUEUE_CONCURRENCY:-3}
      
      # Production
      - NODE_ENV=production
    
    restart: unless-stopped
    
    # Allow long-running jobs to complete
    stop_grace_period: 8h
    
    # Resource limits
    deploy:
      resources:
        limits:
          memory: 2G
        reservations:
          memory: 512M

Scaling Strategies

Scale Based on Queue Depth

Monitor queue depth and scale workers accordingly:

# Check waiting jobs (BullMQ keeps them in each queue's "wait" list)
redis-cli -u "$REDIS_URL" LLEN bull:deep-research:wait
redis-cli -u "$REDIS_URL" LLEN bull:chat:wait

Scaling guidelines:

Queue Depth	Recommended Workers	Response Time
0-10 jobs	2 workers	< 5 minutes
11-30 jobs	4 workers	< 10 minutes
31-50 jobs	6 workers	< 15 minutes
51+ jobs	8+ workers	< 20 minutes

With the default concurrency settings, each worker container processes up to ~3 deep research jobs and ~5 chat jobs at the same time.
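
The scaling guidelines can be expressed as a small lookup. This is a sketch under one assumption: each range's upper bound is treated as inclusive, so a depth of exactly 10 still maps to 2 workers. The function name is hypothetical.

```python
def recommended_workers(queue_depth: int) -> int:
    """Map total waiting jobs to the worker count from the guideline table."""
    if queue_depth < 0:
        raise ValueError("queue_depth cannot be negative")
    if queue_depth <= 10:
        return 2
    if queue_depth <= 30:
        return 4
    if queue_depth <= 50:
        return 6
    return 8  # the table says "8+", so treat this as a floor
```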

Auto-Scaling with Monitoring

Implement auto-scaling based on queue metrics:

import subprocess
import time

import redis

REDIS_URL = "redis://your-redis-host:6379"
COMPOSE = ["docker-compose", "-f", "docker-compose.worker.yml"]
MIN_WORKERS = 2
MAX_WORKERS = 10
SCALE_UP_THRESHOLD = 20
SCALE_DOWN_THRESHOLD = 5

# Reuse one connection instead of reconnecting on every check
r = redis.from_url(REDIS_URL)

def get_queue_depth():
    # BullMQ keeps waiting jobs in each queue's "wait" list
    chat_waiting = r.llen("bull:chat:wait")
    research_waiting = r.llen("bull:deep-research:wait")
    return chat_waiting + research_waiting

def get_current_workers():
    result = subprocess.run(
        COMPOSE + ["ps", "-q", "worker"],
        capture_output=True,
        text=True,
        check=True,
    )
    # splitlines() returns [] for empty output, so no containers counts as 0
    return len(result.stdout.splitlines())

def scale_workers(count):
    # Clamp the target to the configured bounds before scaling
    count = max(MIN_WORKERS, min(MAX_WORKERS, count))
    subprocess.run(COMPOSE + ["up", "-d", "--scale", f"worker={count}"], check=True)
    print(f"Scaled to {count} workers")

while True:
    depth = get_queue_depth()
    current = get_current_workers()

    if depth > SCALE_UP_THRESHOLD and current < MAX_WORKERS:
        scale_workers(current + 2)
    elif depth < SCALE_DOWN_THRESHOLD and current > MIN_WORKERS:
        scale_workers(current - 1)

    time.sleep(60)  # Check every minute
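
The scaling decision can be pulled out of the loop into a pure function, which makes it testable without Redis or Docker. This is a sketch that reuses the thresholds above; the function name `next_worker_count` is hypothetical.

```python
MIN_WORKERS = 2
MAX_WORKERS = 10
SCALE_UP_THRESHOLD = 20
SCALE_DOWN_THRESHOLD = 5

def next_worker_count(depth: int, current: int) -> int:
    """Return the worker count to scale to for a given queue depth."""
    if depth > SCALE_UP_THRESHOLD:
        target = current + 2   # add capacity quickly under load
    elif depth < SCALE_DOWN_THRESHOLD:
        target = current - 1   # shed capacity one worker at a time
    else:
        target = current       # inside the dead band: leave as is
    return max(MIN_WORKERS, min(MAX_WORKERS, target))
```

Scaling up by two and down by one means the fleet grows faster than it shrinks, and the gap between the two thresholds keeps it from oscillating when depth hovers around a single value.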

Concurrency Tuning

Adjust concurrency per worker based on server resources:

Low Memory (1GB)

CHAT_QUEUE_CONCURRENCY=2
DEEP_RESEARCH_QUEUE_CONCURRENCY=1

2 chat jobs + 1 research job = ~1.5GB peak memory
Conservative but reliable

Medium Memory (2GB)

CHAT_QUEUE_CONCURRENCY=5
DEEP_RESEARCH_QUEUE_CONCURRENCY=3

Default configuration
Balanced throughput and stability

High Memory (4GB+)

CHAT_QUEUE_CONCURRENCY=10
DEEP_RESEARCH_QUEUE_CONCURRENCY=5

Maximum throughput
Requires monitoring to prevent OOM
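
The three tiers can be captured in a helper that picks concurrency values from available memory. This is a sketch; the cut-offs at 2GB and 4GB are an assumption read off the tier names, and the function name is hypothetical.

```python
def concurrency_for_memory(memory_gb: float) -> dict:
    """Pick the concurrency settings for a worker with the given memory."""
    if memory_gb >= 4:
        chat, research = 10, 5   # high memory: maximum throughput
    elif memory_gb >= 2:
        chat, research = 5, 3    # medium memory: the defaults
    else:
        chat, research = 2, 1    # low memory: conservative
    return {
        "CHAT_QUEUE_CONCURRENCY": chat,
        "DEEP_RESEARCH_QUEUE_CONCURRENCY": research,
    }
```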

Multi-Region Deployment

Deploy workers in multiple regions for global coverage.

Multi-region deployments require:

Low-latency Redis (use Upstash Global or regional replicas)
Database replication or read replicas
Careful handling of cross-region network latency

Resource Planning

Worker Server Sizing

Hetzner Cloud

Plan	vCPU	RAM	Workers	Cost/mo
CX22	2	4GB	2	$6
CX32	4	8GB	4	$12
CX42	8	16GB	8	$24
CX52	16	32GB	16	$48

DigitalOcean

Droplet	vCPU	RAM	Workers	Cost/mo
Basic	2	4GB	2	$24
General Purpose	4	8GB	4	$48
CPU-Optimized	8	16GB	8	$96

AWS EC2

Instance	vCPU	RAM	Workers	Cost/mo
t3.medium	2	4GB	2	$30
t3.large	2	8GB	4	$60
c6i.xlarge	4	8GB	4	$122
c6i.2xlarge	8	16GB	8	$244
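
To turn the sizing tables into a server count, multiply workers per server by per-worker concurrency and divide the target load by that capacity. This sketch assumes the "Workers" column means worker containers per server; the function name is hypothetical.

```python
import math

def servers_needed(target_concurrent_jobs: int,
                   workers_per_server: int,
                   concurrency_per_worker: int) -> int:
    """Return how many servers are needed to run the target number of jobs at once."""
    if workers_per_server <= 0 or concurrency_per_worker <= 0:
        raise ValueError("workers and concurrency must be positive")
    capacity = workers_per_server * concurrency_per_worker
    return math.ceil(target_concurrent_jobs / capacity)
```

For example, 30 concurrent deep research jobs on 4-worker servers at the default concurrency of 3 gives a capacity of 12 per server, so `servers_needed(30, 4, 3)` rounds up to 3.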

Cost Optimization

Spot Instances

Use spot instances for burst capacity:

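# AWS EC2 Spot (WORKER_AMI_ID is a placeholder for your worker image's AMI ID)
aws ec2 run-instances \
  --image-id "$WORKER_AMI_ID" \
  --instance-type c6i.xlarge \
  --instance-market-options 'MarketType=spot,SpotOptions={SpotInstanceType=one-time}' \
  --user-data file://worker-init.sh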

Benefits:

60-90% cost savings
Good for non-critical workers

Risks:

Can be terminated with 2-minute notice
Workers should handle graceful shutdown
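
One way to react to the 2-minute notice is to poll the EC2 instance metadata service for a Spot interruption notice and stop the worker as soon as one appears. The sketch below uses the IMDSv2 token flow and the `spot/instance-action` metadata path; the function names are hypothetical, and what to do once a notice arrives (for example, running `docker stop` on the worker) is left to the caller.

```python
import json
import urllib.error
import urllib.request
from datetime import datetime, timezone
from typing import Optional

IMDS = "http://169.254.169.254/latest"

def seconds_until_interruption(body: str, now: datetime) -> float:
    """Parse an instance-action document and return seconds until the action."""
    notice = json.loads(body)
    when = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ")
    return (when.replace(tzinfo=timezone.utc) - now).total_seconds()

def fetch_interruption_notice(timeout: float = 2.0) -> Optional[str]:
    """Return the instance-action body, or None if no interruption is scheduled."""
    token_request = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    token = urllib.request.urlopen(token_request, timeout=timeout).read().decode()
    request = urllib.request.Request(
        f"{IMDS}/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token},
    )
    try:
        return urllib.request.urlopen(request, timeout=timeout).read().decode()
    except urllib.error.HTTPError as err:
        if err.code == 404:  # no interruption scheduled
            return None
        raise
```

Two minutes is far shorter than a typical deep research job, so a job still running at termination is left unfinished and has to be picked up again by another worker.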

Reserved Instances

Reserve minimum capacity for predictable workloads:

1-year commitment: ~30% savings
3-year commitment: ~50% savings

Strategy:

Reserve minimum worker capacity (e.g., 2 workers)
Use on-demand/spot for scaling above baseline

Autoscaling

Scale workers based on time of day:

# Cron job: Scale up during business hours
0 9 * * 1-5 /opt/bioagents/scale-workers.sh 8

# Scale down at night
0 18 * * 1-5 /opt/bioagents/scale-workers.sh 2
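
Cron only acts at the two trigger times, so a server that reboots mid-day keeps whatever worker count it starts with. A function that computes the target count for any moment lets a script reconcile at startup. This sketch mirrors the schedule above (8 workers on weekdays from 09:00 to 18:00, otherwise 2); the function name is hypothetical.

```python
from datetime import datetime

BUSINESS_HOURS_WORKERS = 8
OFF_HOURS_WORKERS = 2

def scheduled_workers(now: datetime) -> int:
    """Return the worker count the cron schedule implies at the given local time."""
    is_weekday = now.weekday() < 5          # Monday=0 ... Friday=4
    in_business_hours = 9 <= now.hour < 18  # scale up at 09:00, down at 18:00
    if is_weekday and in_business_hours:
        return BUSINESS_HOURS_WORKERS
    return OFF_HOURS_WORKERS
```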

High Availability

Worker Redundancy

Always run at least 2 workers to prevent single point of failure:

# Minimum HA setup
docker-compose -f docker-compose.worker.yml up -d --scale worker=2

If one worker crashes, the other continues processing jobs. BullMQ automatically reassigns stalled jobs.

Graceful Shutdown

Workers use stop_grace_period: 8h to finish long-running jobs:

services:
  worker:
    stop_grace_period: 8h  # Allow deep research jobs to complete

Shutdown behavior:

Docker sends SIGTERM to worker
Worker stops accepting new jobs
Worker continues processing active jobs
After 8 hours, Docker sends SIGKILL (force stop)

Never use docker-compose down without checking for active jobs. Use Bull Board to verify queue is empty first.
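
The shutdown sequence above follows a common pattern: on SIGTERM, stop taking new work, let active jobs finish, then exit. The sketch below illustrates that pattern in Python; it is not the BioAgents worker implementation (which runs on Bun), and the class and function names are hypothetical.

```python
import signal
import threading

class GracefulWorker:
    """Tracks active jobs and refuses new ones once shutdown is requested."""

    def __init__(self):
        self.accepting = True
        self.active_jobs = 0
        self._lock = threading.Lock()

    def handle_sigterm(self, signum=None, frame=None):
        # Step 2: stop accepting new jobs; active jobs keep running
        self.accepting = False

    def try_start_job(self) -> bool:
        with self._lock:
            if not self.accepting:
                return False
            self.active_jobs += 1
            return True

    def finish_job(self):
        with self._lock:
            self.active_jobs -= 1

    def can_exit(self) -> bool:
        # Safe to exit once shutdown was requested and no job is running
        with self._lock:
            return not self.accepting and self.active_jobs == 0

def install_handler(worker: GracefulWorker) -> None:
    """Register the SIGTERM handler (must be called from the main thread)."""
    signal.signal(signal.SIGTERM, worker.handle_sigterm)
```

A process built this way exits on its own once `can_exit()` is true, so the 8-hour grace period only acts as an upper bound.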

Redis Failover

Use managed Redis with automatic failover:

Upstash

REDIS_URL=rediss://default:password@your-redis.upstash.io:6379

Features:

Multi-region replication
Automatic failover
TLS encryption
Pay-per-use pricing

AWS ElastiCache

REDIS_URL=redis://master.cluster.abc123.use1.cache.amazonaws.com:6379

Features:

Automatic failover with Redis Cluster
Multi-AZ deployment
Automated backups
CloudWatch monitoring

Redis Sentinel

REDIS_URL=redis://sentinel-1:26379,sentinel-2:26379,sentinel-3:26379

Features:

Self-hosted HA solution
Automatic failover
Lower cost than managed services
Requires more maintenance
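
The REDIS_URL examples above differ in scheme: `rediss://` means TLS and `redis://` means plaintext. A quick check before deploying can catch a missing `s`. This sketch uses only the standard library and handles single-host URLs; it does not parse the comma-separated Sentinel form. The function name is hypothetical.

```python
from urllib.parse import urlparse

def describe_redis_url(url: str) -> dict:
    """Summarize a single-host Redis URL without exposing the password."""
    parsed = urlparse(url)
    if parsed.scheme not in ("redis", "rediss"):
        raise ValueError(f"unexpected scheme: {parsed.scheme!r}")
    return {
        "tls": parsed.scheme == "rediss",
        "host": parsed.hostname,
        "port": parsed.port or 6379,  # 6379 is the default Redis port
        "has_password": parsed.password is not None,
    }
```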

Monitoring & Observability

Queue Metrics

Export queue metrics to monitoring systems:

const client = require('prom-client');
const { getChatQueue, getDeepResearchQueue } = require('./queue/queues');

const queueDepthGauge = new client.Gauge({
  name: 'bioagents_queue_depth',
  help: 'Number of jobs in each queue, by state',
  labelNames: ['queue', 'state']
});

async function updateMetrics() {
  const chatQueue = getChatQueue();
  const researchQueue = getDeepResearchQueue();

  // Fetch both queues' counts in parallel
  const [chatCounts, researchCounts] = await Promise.all([
    chatQueue.getJobCounts(),
    researchQueue.getJobCounts()
  ]);

  queueDepthGauge.set({ queue: 'chat', state: 'waiting' }, chatCounts.waiting);
  queueDepthGauge.set({ queue: 'chat', state: 'active' }, chatCounts.active);
  queueDepthGauge.set({ queue: 'deep-research', state: 'waiting' }, researchCounts.waiting);
  queueDepthGauge.set({ queue: 'deep-research', state: 'active' }, researchCounts.active);
}

// Refresh every 10 seconds; log failures instead of leaving the rejection unhandled
setInterval(() => {
  updateMetrics().catch((err) => console.error('Failed to update queue metrics:', err));
}, 10000);

Alerting

Set up alerts for queue health:

groups:
  - name: bioagents
    rules:
      - alert: HighQueueDepth
        expr: bioagents_queue_depth{state="waiting"} > 50
        for: 10m
        annotations:
          summary: "Queue depth is high"
          description: "{{ $labels.queue }} has {{ $value }} waiting jobs"

      # absent() fires when no worker target reports up == 1,
      # including the case where every target has disappeared
      - alert: NoActiveWorkers
        expr: absent(up{job="bioagents-worker"} == 1)
        for: 1m
        annotations:
          summary: "No workers are running"
          description: "All workers are down - jobs will not be processed"

      # Requires a bioagents_job_failures_total counter to be exported
      - alert: HighJobFailureRate
        expr: rate(bioagents_job_failures_total[5m]) > 0.1
        for: 5m
        annotations:
          summary: "Job failure rate is high"
          description: "{{ $value }} jobs/sec are failing"

Troubleshooting

Workers Not Picking Up Jobs

Check Redis connection:

docker-compose -f docker-compose.worker.yml logs | grep -i redis

Expected output:

redis_publisher_connected
chat_queue_initialized
deep_research_queue_initialized

Verify Redis URL:

docker-compose -f docker-compose.worker.yml exec worker env | grep REDIS

Uneven Load Distribution

Symptom: Some workers process many jobs while others sit idle.

Cause: Different worker start times or concurrency settings.

Fix: Ensure all workers have identical configuration:

# Restart all workers together (check Bull Board for active jobs first)
docker-compose -f docker-compose.worker.yml down
docker-compose -f docker-compose.worker.yml up -d --scale worker=4

Memory Leaks

Monitor memory over time:

docker stats --no-stream
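
To flag containers that are close to their memory limit, the `docker stats` output can be parsed by a script. This sketch assumes the command is run as `docker stats --no-stream --format "{{.Name}} {{.MemPerc}}"`, which prints one `name percent%` pair per line; the 80% threshold and the function name are illustrative choices.

```python
def over_threshold(stats_output: str, limit_percent: float = 80.0) -> list:
    """Return container names whose memory usage is at or above the limit."""
    flagged = []
    for line in stats_output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        name, mem = parts
        try:
            percent = float(mem.rstrip("%"))
        except ValueError:
            continue  # skip values that are not a percentage, such as "--"
        if percent >= limit_percent:
            flagged.append(name)
    return flagged
```

Running this periodically and recording the results over several hours shows whether usage keeps climbing, which is what separates a leak from a temporary spike.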

Implement periodic restarts:

# Cron job: Rolling restart every 24 hours
0 3 * * * /opt/bioagents/rolling-restart.sh

rolling-restart.sh

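#!/bin/bash
# Restart workers one at a time to maintain capacity
set -euo pipefail

# "docker-compose restart" takes a service name, not a container name,
# so restart each container of the worker service by ID instead
for id in $(docker-compose -f docker-compose.worker.yml ps -q worker); do
  echo "Restarting worker container $id..."
  docker restart "$id"
  sleep 60  # Wait 1 minute between restarts
done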

Best Practices

Next Steps

Job Queue

Learn about BullMQ architecture and configuration

Docker Setup

Deploy with docker-compose
