Hybrid Search

NEW in v0.8.6: See Hybrid Mode below for BM25 + Vector search fused with Reciprocal Rank Fusion (RRF).

Use your Long Term Core (LTC) as a high-level index for RAG-like consciousness. Search for relevant memories, then re-hydrate full context from git history.

Why Semantic Search?

Vant generates the LTC by pruning: a “distilled” consciousness that keeps only the important learnings and decisions. Sometimes, though, you need the full context.

This is like having a map (LTC) + archive (git history). Query the map, retrieve from the archive!

How It Works

Query → Search LTC → Identify topics → Re-hydrate from git → Full context
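The flow above can be sketched with in-memory stand-ins. This is a minimal illustration, not Vant's implementation: the `ltc` and `archive` objects, the `topic` field, and the helpers `searchEntries` and `rehydrateTopics` are all hypothetical, and relevance is assumed to be a simple count of matching query terms.

```javascript
// Hypothetical stand-ins for the LTC index and the git-backed archive.
const ltc = {
    learnings: [
        { topic: 'python', summary: 'Use requests library for HTTP in python' },
        { topic: 'auth', summary: 'Rotate tokens regularly' },
    ],
    decisions: [
        { topic: 'python', summary: 'Prefer venv for python projects' },
    ],
};
const archive = {
    python: 'Full notes on python tooling...',
    auth: 'Full notes on authentication...',
};

// Query -> Search LTC: count how many query terms each summary contains.
function searchEntries(query, limit = 5) {
    const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
    const hits = [];
    for (const [type, entries] of Object.entries(ltc)) {
        for (const entry of entries) {
            const text = entry.summary.toLowerCase();
            const relevance = terms.filter((t) => text.includes(t)).length;
            if (relevance > 0) hits.push({ type, ...entry, relevance });
        }
    }
    return hits.sort((a, b) => b.relevance - a.relevance).slice(0, limit);
}

// Identify topics -> Re-hydrate: pull the full text for each distinct topic.
function rehydrateTopics(hits) {
    const topics = [...new Set(hits.map((h) => h.topic))];
    return topics.map((t) => `=== ${t} ===\n${archive[t] ?? ''}`).join('\n');
}

const hits = searchEntries('python');
const context = rehydrateTopics(hits);
console.log(hits.length, context);
```

The real module reads from git history rather than an object literal; the sketch only shows how a small index can point at a larger archive.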

Usage

const search = require('vant').search;

// Get search summary
const summary = search.getSummary();
console.log(summary);
// { hasLTC: true, learnings: 10, decisions: 5, ... }

// Search LTC
const results = search.searchLTC('python', { limit: 5 });
console.log(results);
// [{ type: 'learnings', summary: '...', relevance: 3 }]

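// Query + Re-hydrate (run inside an async function)
// Renamed on destructuring: `results` is already declared above
const { results: matches, context } = await search.query('authentication');
console.log(context);  // Full matching content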

Query Options

All query modes:

// Full search + rehydrate (default)
const full = await search.query('python');
// full.results, full.context

// Compact mode: summaries only (faster)
const compact = await search.query('python', { compact: true });
// compact.context: "- Learned X\n- Decided Y..."

// Hybrid search with caching
const r1 = await search.hybrid('python');   // First call
const r2 = await search.hybrid('python');   // Cached!

// Cache management
console.log(search.getCacheStats());  // { size: 1, max: 50 }
search.clearCache();               // Clear session cache

// HyDE transform (awaited, as in the Unified API section)
const hydeQuery = await search.hyde('python');

Session Caching

Hybrid search results are cached per session (up to 50 entries, keyed by MD5 hash):

// First call - actual search
const r1 = await search.hybrid('python');

// Second call - cached (instant)
const r2 = await search.hybrid('python');

// Cache stats
search.getCacheStats();  // { size: 1, max: 50 }
search.clearCache();  // Clear all cached
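A bounded, MD5-keyed cache of this kind could look roughly like the sketch below. The names `cacheKey` and `cachedSearch` are hypothetical, and two details are assumptions rather than documented behavior: the key is taken to be the MD5 hash of the raw query string, and the oldest entry is evicted once the cache is full.

```javascript
const crypto = require('crypto');

const MAX_ENTRIES = 50;      // documented session limit
const cache = new Map();     // a Map iterates in insertion order

// Assumption: the cache key is the MD5 hex digest of the query string.
function cacheKey(query) {
    return crypto.createHash('md5').update(query).digest('hex');
}

// Wrap any async search function with the bounded cache.
async function cachedSearch(query, searchFn) {
    const key = cacheKey(query);
    if (cache.has(key)) return cache.get(key);
    const result = await searchFn(query);
    if (cache.size >= MAX_ENTRIES) {
        // Assumed eviction policy: drop the oldest inserted entry.
        cache.delete(cache.keys().next().value);
    }
    cache.set(key, result);
    return result;
}

function getCacheStats() {
    return { size: cache.size, max: MAX_ENTRIES };
}

function clearCache() {
    cache.clear();
}
```

Because the cache lives in process memory, it disappears when the session ends, which matches the per-session behavior described above.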

Re-hydrate

Re-hydrate fetches full matching files from git history:

const { context } = await search.rehydrate(results);
// Returns: combined context up to 50KB

Security limits:

Max 50KB re-hydrated content
Only reads from models/vX/ directory
No arbitrary command execution
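The first two limits can be illustrated with a short sketch. It is not Vant's code: `isInside` and `capContext` are hypothetical helpers, and stopping at the first chunk that would exceed the budget is an assumed policy.

```javascript
const path = require('path');

const MAX_BYTES = 50 * 1024;   // 50KB cap from the limits above

// True only if `file` resolves to a location strictly inside `baseDir`.
function isInside(baseDir, file) {
    const base = path.resolve(baseDir);
    const target = path.resolve(base, file);
    const rel = path.relative(base, target);
    return rel !== ''
        && rel !== '..'
        && !rel.startsWith('..' + path.sep)
        && !path.isAbsolute(rel);
}

// Concatenate chunks until adding the next one would exceed the byte budget.
function capContext(chunks, maxBytes = MAX_BYTES) {
    let out = '';
    let used = 0;
    for (const chunk of chunks) {
        const size = Buffer.byteLength(chunk, 'utf8');
        if (used + size > maxBytes) break;
        out += chunk;
        used += size;
    }
    return out;
}

console.log(isInside('models/v1', 'learnings/python.md'));  // true
console.log(isInside('models/v1', '../../etc/passwd'));     // false
```

Measuring with `Buffer.byteLength` rather than string length keeps the cap in bytes, which matters for non-ASCII content.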

LTC Structure

Generated by pruning at models/vX/_core.json:

{
    "version": "1.0",
    "updated": "2026-05-05T12:00:00Z",
    "core": {
        "learnings": [...],
        "decisions": [...],
        "preferences": {...}
    },
    "stats": {
        "pruned": 10,
        "kept": 5
    }
}
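As an illustration of how a summary like the one returned by `getSummary()` could be derived from this file, here is a minimal sketch. `summarizeCore` is a hypothetical helper, and the entry shape (objects with a `summary` field) and the sample entries are assumptions made for the example.

```javascript
// Build a summary object from a parsed _core.json document.
function summarizeCore(doc) {
    const core = doc && doc.core;
    if (!core) return { hasLTC: false, learnings: 0, decisions: 0 };
    return {
        hasLTC: true,
        learnings: Array.isArray(core.learnings) ? core.learnings.length : 0,
        decisions: Array.isArray(core.decisions) ? core.decisions.length : 0,
    };
}

// Sample document following the structure above (entries are made up).
const sample = JSON.parse(`{
    "version": "1.0",
    "updated": "2026-05-05T12:00:00Z",
    "core": {
        "learnings": [{ "summary": "Use requests library" }],
        "decisions": [{ "summary": "Prefer venv" }, { "summary": "Pin versions" }],
        "preferences": {}
    },
    "stats": { "pruned": 10, "kept": 5 }
}`);

console.log(summarizeCore(sample));
// { hasLTC: true, learnings: 1, decisions: 2 }
```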

Use Cases

Deep research: Search LTC, re-hydrate full context
Decision review: Find all decisions on topic
Context recovery: Need more than LTC provides
RAG pipeline: Build AI responses with full context

Integration with AI

// In agent/loop.js
const search = require('vant').search;

async function answerWithContext(query) {
    // 1. Quick LTC check
    const ltc = search.getSummary();
    if (!ltc.hasLTC) {
        return 'No LTC available - run prune first';
    }
    
    // 2. Search + re-hydrate
    const { results, context } = await search.query(query);
    
    if (!context) {
        // 3. Fallback to current brain only
        return null;
    }
    
    // 4. Use context in AI prompt
    return context;
}

Caveats

Requires LTC (run prune first)
Limited to pruned versions
50KB max prevents context overflow

Rerank (RAG)

Keyword reranking and compression for LLM context

Rerank is separate from search. It focuses on keyword scoring and token optimization:

Rerank: Score memories by keyword match to query
Compress: Strip markdown fluff, truncate to token budget
Pipeline: Rerank + compress in sequence

See Rerank Guide for full documentation.

vant rerank "lessons learned"          # Rerank
vant rerank pipeline "security" -t 4000  # Rerank + compress
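The three stages can be sketched in a few lines. This is an illustration under stated assumptions, not the Rerank module itself: the function names are hypothetical, scoring is a plain count of query-term occurrences, "markdown fluff" is reduced to a handful of marker characters, and a token is approximated as four characters.

```javascript
// Rerank: score each memory by how often the query terms occur in it.
function rerank(query, memories) {
    const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
    return memories
        .map((text) => {
            const lower = text.toLowerCase();
            const score = terms.reduce(
                (sum, t) => sum + (lower.split(t).length - 1), 0);
            return { text, score };
        })
        .filter((m) => m.score > 0)
        .sort((a, b) => b.score - a.score);
}

// Compress: strip markdown markers, then keep memories while budget remains.
// Assumption: roughly 4 characters per token.
function compress(ranked, maxTokens = 2000) {
    const budget = maxTokens * 4;
    const kept = [];
    let used = 0;
    for (const { text } of ranked) {
        const plain = text.replace(/[#*`>]/g, '').replace(/\s+/g, ' ').trim();
        if (used + plain.length > budget) break;
        kept.push(plain);
        used += plain.length;
    }
    return kept;
}

// Pipeline: rerank, then compress.
const pipeline = (query, memories, maxTokens) =>
    compress(rerank(query, memories), maxTokens);

console.log(pipeline('security', [
    '# Security\n**Rotate** keys for security',
    'Unrelated note',
    'security audit',
], 4000));
// [ 'Security Rotate keys for security', 'security audit' ]
```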

Search vs Rerank

Feature	Search	Rerank
Type	Semantic (BM25+Vector)	Keyword
Use case	Find relevant memories	Prepare for LLM
Input	Query	Query + memories
Output	Candidates	Ranked + compressed

Use search to find candidates, rerank to optimize for LLM context.

Usage

const search = require('vant').search;

const results = await search.hybrid('herbalism plants');
// { sparse: [], dense: [], fused: [], sources: [] }

CLI

vant search "query"           # Default hybrid
vant search --hybrid "q"   # Explicit hybrid
vant search --hyde "q"      # HyDE
vant search --stats        # Index stats
vant search "q" -r         # Search + rerank
vant search "q" -r -t 4000 # Search + rerank w/ 4000 tokens

Rerank Integration

Use -r or --rerank to pipeline search results through rerank:

# Search hybrid, then keyword rerank + compress
vant search "lessons" -r -t 4000

# Pipeline: Hybrid search → Rerank → Compress
# Returns keyword-scored, token-compressed memories

Flag	Description	Default
-r, --rerank	Rerank results	false
-t, --max-tokens	Max tokens	2000

All three search modes support --rerank: hybrid (the default, also selectable with --mode hybrid), --mode basic, and --mode rag.

Integration

Use with query transformation:

const query = require('./lib/query');
const citations = require('./lib/citations');

const result = await query.hyde('herbalism');
// Uses HyDE: fake answer → search real

citations.addSource(result.sources[0].commit);

MCP Tool

Search is available via MCP as vant_search with three modes:

{
  name: 'vant_search',
  arguments: {
    query: 'search term',
    mode: 'basic' | 'rag' | 'hybrid',  // default: 'basic'
    files: ['file.md'],                // basic mode only
    limit: 5                           // rag mode, default: 5; also passed in hybrid mode
  }
}

Basic Mode

Fast text search. Default mode.

// Search all files
await vant_search({ query: 'lessons', mode: 'basic' });

// Search specific files
await vant_search({ query: 'lessons', mode: 'basic', files: ['identity.md', 'goals.md'] });

Returns:

{
  "mode": "basic",
  "query": "lessons",
  "filesSearched": 24,
  "hits": 5,
  "results": [
    { "file": "identity.md", "line": 91, "text": "3. **Load brain**" },
    { "file": "lessons.md", "line": 1, "text": "# PROJECT LESSONS" }
  ]
}
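To make the response shape concrete, here is a rough sketch of a line-by-line text search that produces the same fields. It is an assumption-laden illustration: `basicSearch` is a hypothetical function, files are passed in as an in-memory object instead of being read from disk, and matching is a case-insensitive substring test.

```javascript
// Return one hit per line that contains the query (case-insensitive).
function basicSearch(query, files) {
    const needle = query.toLowerCase();
    const results = [];
    for (const [file, content] of Object.entries(files)) {
        content.split('\n').forEach((text, index) => {
            if (text.toLowerCase().includes(needle)) {
                results.push({ file, line: index + 1, text });  // 1-based lines
            }
        });
    }
    return {
        mode: 'basic',
        query,
        filesSearched: Object.keys(files).length,
        hits: results.length,
        results,
    };
}

const response = basicSearch('lessons', {
    'lessons.md': '# PROJECT LESSONS\nnothing here',
    'goals.md': 'Ship v1',
});
console.log(response.hits, response.results[0]);
```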

RAG Mode

Semantic search using LTC. Searches indexed learnings and decisions, then rehydrates full context.

// Semantic search
await vant_search({ query: 'python', mode: 'rag', limit: 3 });

Returns:

{
  "mode": "rag",
  "query": "python",
  "results": 2,
  "hits": [
    { "type": "learnings", "summary": "Use requests library" },
    { "type": "decisions", "summary": "Prefer venv" }
  ],
  "context": "=== learnings/python.md ===\n...",
  "compressed": "[COMPRESSED:12345]...",
  "ltc": { "hasLTC": true, "learnings": 42, "decisions": 15 }
}

Compression: RAG mode applies compression when the re-hydrated context exceeds 5KB (configurable via COMPRESSION_THRESHOLD).

Hybrid Mode

BM25 + Vector + RRF for full-text capability. Best for general search.

// Hybrid search (BM25 + Vector + RRF via unified lib)
await vant_search({ query: 'lessons', mode: 'hybrid', limit: 5 });

Returns:

{
  "mode": "hybrid",
  "query": "lessons",
  "sparse": 10,
  "dense": 5,
  "fused": 8,
  "results": [
    { "id": "abc123", "rrf": "0.815", "content": "Project lessons learned..." }
  ]
}

Best for: General queries where you want both keyword and semantic matches.
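For readers unfamiliar with Reciprocal Rank Fusion, the sketch below shows the textbook form: each ranked list contributes 1 / (k + rank) to a document's score. The function name `rrf` and the constant k = 60 are assumptions (60 is a commonly used default); the sample response above shows a score on a different scale, so Vant's actual scoring may be normalized or weighted differently.

```javascript
// Reciprocal Rank Fusion: each list contributes 1 / (k + rank) per document.
function rrf(rankedLists, k = 60) {
    const scores = new Map();
    for (const list of rankedLists) {
        list.forEach((id, index) => {
            const rank = index + 1;   // ranks are 1-based
            scores.set(id, (scores.get(id) || 0) + 1 / (k + rank));
        });
    }
    return [...scores.entries()]
        .map(([id, score]) => ({ id, rrf: score }))
        .sort((a, b) => b.rrf - a.rrf);
}

const sparse = ['a', 'b', 'c'];   // e.g. BM25 order
const dense = ['b', 'd', 'a'];    // e.g. vector-similarity order
const fused = rrf([sparse, dense]);
console.log(fused.map((r) => r.id));
// [ 'b', 'a', 'd', 'c' ]
```

Documents that appear in both lists ('a' and 'b') rise above those found by only one retriever, which is why fusion suits queries that need both keyword and semantic matches.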

Unified API

All modes available via single lib/search.js:

const search = require('vant').search;

// Basic: text search
const basic = await search.searchLTC('query', { limit: 5 });

// RAG: semantic + rehydrate
const { results, context } = await search.query('query', { limit: 5 });

// Hybrid: BM25 + Vector + RRF
const hybrid = await search.hybrid('query');

// HyDE: query transformation
const hyde = await search.hyde('query');

// Get settings
const settings = search.getSettings();

CLI

# Basic search
vant search --mode basic "lessons"

# RAG search
vant search --mode rag "python"
vant search --mode rag --limit 3 "context"

Settings

Configure search via settings.ini:

# Search (RAG)
REHYDRATE_MAX_SIZE=51200    # bytes, max 1MB (default 50KB)
COMPRESSION_THRESHOLD=5120  # bytes trigger for compression hint (default 5KB)
RAG_LIMIT_MAX=20             # max results from LTC search (default 20)

Setting	Default	Description
REHYDRATE_MAX_SIZE	50KB	Max bytes returned in RAG context
COMPRESSION_THRESHOLD	5KB	When to show compression hint
RAG_LIMIT_MAX	20	Max results from LTC query
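A loader for these three keys might look like the following sketch. `parseSettings` is a hypothetical helper, not Vant's parser; it assumes plain KEY=VALUE lines with `#` comments, falls back to the defaults from the table, and applies the documented 1MB ceiling to REHYDRATE_MAX_SIZE.

```javascript
const DEFAULTS = {
    REHYDRATE_MAX_SIZE: 51200,
    COMPRESSION_THRESHOLD: 5120,
    RAG_LIMIT_MAX: 20,
};

// Parse KEY=VALUE lines, ignoring blank lines and `#` comments.
function parseSettings(text) {
    const settings = { ...DEFAULTS };
    for (const raw of text.split('\n')) {
        const line = raw.split('#')[0].trim();   // drop inline comments
        if (!line) continue;
        const [key, value] = line.split('=').map((s) => s.trim());
        const num = Number.parseInt(value, 10);
        const known = Object.prototype.hasOwnProperty.call(DEFAULTS, key);
        if (known && Number.isInteger(num)) settings[key] = num;
    }
    // Documented ceiling: re-hydrated content may not exceed 1MB.
    settings.REHYDRATE_MAX_SIZE = Math.min(settings.REHYDRATE_MAX_SIZE, 1024 * 1024);
    return settings;
}

console.log(parseSettings('RAG_LIMIT_MAX=10   # fewer results'));
// { REHYDRATE_MAX_SIZE: 51200, COMPRESSION_THRESHOLD: 5120, RAG_LIMIT_MAX: 10 }
```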

RAG response includes current settings:

{
  "mode": "rag",
  "settings": {
    "rehydrateMaxSize": 51200,
    "compressionThreshold": 5120,
    "ragLimitMax": 20
  }
}

Security

Path traversal: Blocked in basic mode
Null bytes: Content rejected
Absolute paths: Blocked in files filter
Query length: Max 500 chars via vaf
RAG limit bounds: 1-20 enforced
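The checks listed above can be sketched as small validators. These are hypothetical helpers written for illustration: the names are made up, null-byte rejection is applied here to queries and file names as an assumption, and the fallback limit of 5 is taken from the MCP tool's documented default.

```javascript
const path = require('path');

const MAX_QUERY_LENGTH = 500;

// Reject empty, over-long, or null-byte-containing queries.
function validateQuery(query) {
    if (typeof query !== 'string' || query.length === 0) return false;
    if (query.length > MAX_QUERY_LENGTH) return false;
    return !query.includes('\0');
}

// Accept only relative file names that do not climb out of the search root.
function validateFile(file) {
    if (typeof file !== 'string' || file.includes('\0')) return false;
    if (path.isAbsolute(file)) return false;
    const normalized = path.normalize(file);
    return normalized !== '..' && !normalized.startsWith('..' + path.sep);
}

// Clamp the RAG limit into the enforced 1-20 range.
function clampLimit(limit, max = 20) {
    const n = Number.isInteger(limit) ? limit : 5;
    return Math.min(Math.max(n, 1), max);
}

console.log(validateFile('identity.md'), validateFile('../secret.md'));  // true false
console.log(clampLimit(100), clampLimit(0));                             // 20 1
```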

Mode Comparison

Feature	Basic	RAG/LTC	Hybrid
Speed	~1ms	~5ms	~10ms
Type	Text match	Semantic	RRF
Context	Current brain	Git history	Full
Compression	N/A	>5KB trigger	N/A
Requires LTC	No	Yes	Optional
Tokens	Full file	Distilled	Full

Future: Islands + vpatch

Combining with Islands architecture:

Islands: Componentized brain (lazy-load on trigger)
vpatch: Compact diff format vs full file
Benefit: Smaller token context, faster hydration

Potential workflow:

Query triggers island(s) → lazy-hydrate only needed components
Return compressed vpatch diffs → smaller context
Result: Faster RAG with lower tokens

See: Islands for architecture.

Related

Islands - Componentized brain
Audit - Activity logging
Citations - Git-backed citations
Hybrid Sync - Public/Private split
