Features

Everything GENAPA does.

Browse the capabilities below, or jump to a specific feature from the sidebar.

Data model

The Knowledge Graph

Every piece of knowledge in GENAPA lives as one of four node types in a directed graph.

Sources: Your original files. Each source gets an AI-generated summary, description, and set of factual claims.
References: Named things extracted from sources -- people, systems, concepts. Each reference links back to the source it was found in.
Entities: Canonical, deduplicated records. When multiple sources mention the same thing under different names, GENAPA merges the references into one entity.
Synths: Conceptual groupings built by the Weaver agent. Synths organize sources into a structured outline with generated narrative descriptions.

Technical detail

Four node types form a directed acyclic graph. Sources link to references via extraction. References link to entities via disambiguation. Synths link to sources via the Weaver agent. All relationships are stored in PostgreSQL with full provenance tracking.
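The structure above can be sketched as a small in-memory graph. Node names, fields, and edge labels here are illustrative, not GENAPA's actual schema; the point is that every non-source node can be traced back to the sources it came from.

```python
# Minimal sketch of the four node types and their directed edges.
# Names and fields are illustrative, not GENAPA's actual schema.

nodes = {
    "src-1": {"type": "source",    "name": "auth_service.cs"},
    "ref-1": {"type": "reference", "name": "AuthService"},
    "ent-1": {"type": "entity",    "name": "AuthService (canonical)"},
    "syn-1": {"type": "synth",     "name": "Authentication subsystem"},
}

# Directed edges, each labeled with the relationship that produced it.
edges = [
    ("src-1", "ref-1", "extraction"),      # source -> reference
    ("ref-1", "ent-1", "disambiguation"),  # reference -> entity
    ("syn-1", "src-1", "weaver"),          # synth -> source
]

def trace_to_sources(node_id):
    """Follow incoming edges back until we reach source nodes (provenance)."""
    if nodes[node_id]["type"] == "source":
        return [node_id]
    found = []
    for a, b, _ in edges:
        if b == node_id:
            found += trace_to_sources(a)
    return found
```

Provenance is a graph walk: asking `trace_to_sources("ent-1")` follows disambiguation and extraction edges back to the originating file.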

Why it matters:

This is not a flat file search. It is a structured graph where you can follow relationships between files, concepts, people, and systems -- and always trace back to the original source.

Pipeline

The Processing Pipeline

A multi-stage pipeline that transforms raw files into structured, searchable knowledge.

01 Chunking and embedding: Files are split into sections sized for AI processing. Vector embeddings are generated and stored for fast similarity search.
02 Source field extraction: AI generates a summary, description, and claims for each file using domain-specific prompt configurations.
03 Reference extraction: AI identifies named entities within each source and creates linked reference nodes.
04 Entity disambiguation: Two-stage matching -- vector similarity to find candidates, then AI evaluation with domain-specific rules to merge duplicates.
05 Synth extraction (Weaver): An autonomous AI agent reads source content, groups semantically similar files, and builds a structured outline. It performs differential updates when content changes instead of rebuilding from scratch.
06 Synthesis (Oracle): Generates detailed narrative descriptions for each synth node, incorporating content from linked sources and child synths.
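Step 01 can be sketched as a character-window chunker with overlap, so no section boundary loses context. The sizes and the strategy below are illustrative; GENAPA's actual chunker may differ.

```python
def chunk_text(text, max_chars=1000, overlap=100):
    """Split text into overlapping chunks sized for embedding.
    A simplified stand-in for GENAPA's actual chunking strategy."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # overlap preserves context across boundaries
    return chunks
```

Each chunk is then embedded and stored, so later similarity search can land on a section rather than a whole file.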

Technical detail

Embeddings stored in PostgreSQL using pgvector with HNSW indexes. Domain-specific extraction prompts defined in YAML. Entity disambiguation uses two-stage matching: vector similarity candidates then LLM evaluation. The Weaver performs differential updates -- when sources change, only affected synths are re-processed.
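The two-stage disambiguation described above can be sketched as a cheap vector shortlist followed by an expensive judgment call. The threshold, field names, and the `llm_judge` callable are all illustrative stand-ins.

```python
def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)

def disambiguate(reference, entities, llm_judge, threshold=0.8):
    """Stage 1: vector similarity shortlists candidate entities.
    Stage 2: an LLM call (stubbed via llm_judge) applies the
    domain-specific rules and confirms or rejects the merge."""
    candidates = [e for e in entities
                  if cosine(reference["embedding"], e["embedding"]) >= threshold]
    for entity in candidates:
        if llm_judge(reference, entity):
            return entity   # merge into existing entity
    return None             # no match: a new entity would be created
```

The split matters for cost: vector search prunes the archive down to a handful of candidates, so the expensive LLM evaluation runs only a few times per reference.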

Why it matters:

The pipeline handles deduplication, conceptual organization, and synthesis automatically. When files change, only the affected parts are re-processed -- not the whole archive.

Agents

Four Specialized AI Agents

Four purpose-built agents, each with a specific role in the knowledge workflow.

Seeker: Iterative search agent. Breaks your question into multiple searches, runs them, evaluates results, and refines across passes.
Oracle: Deep exploration agent. Uses Seeker to discover relevant content, then produces a written synthesis with citations -- every claim links to the source file it came from.
Weaver: Graph construction agent. Builds and maintains the conceptual hierarchy. The only agent that modifies the knowledge graph structure.
Omni: Chat agent. Ask questions about your archive and get answers backed by specific source files. Can delegate to other agents for deeper research.

Technical detail

Agents use a tool-calling loop pattern with persistent session memory. Omni can spawn sub-agents with configurable recursion depth. Cross-agent memory linking preserves the chain of discovery.
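The tool-calling loop pattern has a common shape, sketched below with a stubbed model. The `model` and `tools` interfaces are hypothetical; GENAPA's agents follow this general loop with persistent session memory layered on top.

```python
def agent_loop(prompt, tools, model, max_steps=8):
    """Generic tool-calling loop: at each step the model either
    requests a tool call or returns a final answer. 'model' is a
    stand-in for an LLM client, 'tools' maps names to callables."""
    history = [{"role": "user", "content": prompt}]
    for _ in range(max_steps):
        reply = model(history)
        if reply.get("tool") is None:
            return reply["content"]  # final answer, loop ends
        # Execute the requested tool and feed the result back in.
        result = tools[reply["tool"]](**reply["args"])
        history.append({"role": "tool", "name": reply["tool"], "content": result})
    return None  # step budget exhausted
```

Composition falls out naturally: an agent like Omni can expose Seeker or Oracle as just another entry in `tools`, with a recursion-depth limit guarding the nesting.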

Why it matters:

These are not generic chatbots. Each agent is built for a specific kind of knowledge work, and they compose -- Omni can use Seeker, Oracle, or Weaver as tools.

Visualization

Real-Time Graph Visualization

See the shape of your knowledge -- which files are connected, which concepts cluster together, which entities span multiple subsystems.

  • Color-coded nodes by type: sources, references, entities, synths
  • Physics-based layout with incremental updates -- new nodes animate into position without restarting the layout
  • Click any node to see its claims, metadata, and source content
  • Live updates as the archive changes

Technical detail

Two persistent Sigma.js v3 WebGL graph renderers (chat canvas and explorer canvas) using graphology. ForceAtlas2 physics layout with incremental updates. Diff-based graph updates via SignalR.
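A diff-based update means the client never re-downloads the whole graph: the server pushes only what changed between two snapshots. A minimal sketch of that diff, with a hypothetical payload shape:

```python
def graph_diff(old, new):
    """Compute the node-level diff between two graph snapshots --
    the kind of incremental payload pushed to the canvases.
    Keys are node ids; values are node attribute dicts."""
    added   = {k: v for k, v in new.items() if k not in old}
    removed = [k for k in old if k not in new]
    changed = {k: v for k, v in new.items() if k in old and old[k] != v}
    return {"added": added, "removed": removed, "changed": changed}
```

Applying only `added`, `removed`, and `changed` to the renderer is what lets new nodes animate into an already-running physics layout instead of forcing a full restart.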

Why it matters:

The knowledge graph is a visual, interactive object that updates in real time as the archive changes -- not a static diagram you generate on demand.

Configuration

Domain-Configurable Extraction

The same GENAPA instance can process source code, legal documents, research papers, and architecture specs -- each with extraction rules tailored to that content type.

  • Built-in domains: source code (per-language for C#, Python, JavaScript, SQL, and more), technical documentation, general content
  • Each domain type specifies extraction rules, quality constraints, worked examples, claim scaling, and disambiguation guidance
  • Add new domains by writing a YAML configuration file -- no code changes required
  • Supports frontmatter-first extraction where files can provide their own metadata

Technical detail

Each domain type carries extraction guidance, disambiguation rules, worked examples, field specifications with quality constraints, claim scaling rules, and synth type declarations. All defined in YAML.
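To make the shape concrete, here is what a domain definition might look like once loaded, with a simple extension-based lookup. Every field name below is illustrative; GENAPA's actual YAML keys may differ.

```python
# Illustrative shape of a loaded domain configuration. Field names
# are hypothetical, not GENAPA's actual YAML schema.
DOMAINS = {
    "python-source": {
        "extraction_guidance": "Extract classes, functions, and modules.",
        "disambiguation_rules": "Treat the fully qualified name as canonical.",
        "claim_scaling": {"min": 3, "max": 15},      # claims per source
        "field_specs": {"summary": {"max_words": 60}},
        "synth_types": ["module-group", "subsystem"],
    },
}

GENERAL = {
    "extraction_guidance": "General content.",
    "claim_scaling": {"min": 1, "max": 10},
}

def domain_for(filename):
    """Pick a domain config by file extension; fall back to general."""
    if filename.endswith(".py"):
        return DOMAINS["python-source"]
    return GENERAL
```

Adding support for a new content type is one more entry in the mapping -- which is the point of keeping it in configuration rather than code.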

Why it matters:

You do not need to retrain anything. Add a YAML configuration file and GENAPA knows how to read your specific content type.

Integration

MCP Integration

Your knowledge archive becomes a tool that AI assistants in your editor can use. Search, retrieve, explore, and run agents -- all from Claude Code, GitHub Copilot, or any MCP-compatible client.

  • Search the archive, retrieve nodes, and traverse the synth hierarchy
  • Run Seeker and Oracle agents directly from your editor
  • Manage memories and collections
  • Monitor jobs and file watchers

Technical detail

Model Context Protocol server over Streamable HTTP transport at the /mcp endpoint. API-key authenticated. Full tool suite: semantic search, node retrieval with children loading, BFS synth traversal, Seeker and Oracle agent invocation, memory CRUD.
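MCP requests are JSON-RPC 2.0 messages; a tool invocation uses the spec's `tools/call` method. The sketch below only builds the request body -- the tool name, endpoint path, and auth header handling for a given GENAPA instance are assumptions to check against its docs.

```python
import json

def mcp_tool_call(tool_name, arguments, request_id=1):
    """Build an MCP tools/call request body (JSON-RPC 2.0, per the
    Model Context Protocol spec). The tool name passed in is whatever
    the server advertises via tools/list."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }

# "semantic_search" is an illustrative tool name, not necessarily
# what a GENAPA server advertises.
body = mcp_tool_call("semantic_search", {"query": "entity disambiguation"})
payload = json.dumps(body)  # POST to the instance's /mcp endpoint with your API key
```

In practice an MCP-capable client (Claude Code, Copilot, etc.) constructs these for you; the sketch is only to show there is no magic in the wire format.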

Why it matters:

You do not have to context-switch to the portal to use the archive. Your existing AI workflow gains a persistent knowledge backend.

Filesystem

GENAPA Link

The archive stays current with your filesystem automatically. Edit a file, and GENAPA re-processes it without manual intervention.

  • File watchers detect changes in real time -- create, modify, rename, delete
  • Event coalescing: rapid changes are buffered and merged (a save that triggers delete and create is treated as one update)
  • Debounced batch processing with configurable accumulation periods
  • Automatic reconnection with exponential backoff

GENAPA Link is currently a Windows service. The Forge Portal and API are accessible from any browser regardless of operating system.

Technical detail

Local Windows service using FileSystemWatcher with System.Reactive event coalescing. Shared watchers, atomic write detection (delete+create as single update), rename coalescing, configurable debounce periods. SignalR bi-directional streaming with exponential-backoff reconnection.
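The coalescing logic can be sketched as a fold over a burst of raw events into net per-path updates. This is a simplified stand-in for the Reactive pipeline described above; the event names are illustrative.

```python
def coalesce(events):
    """Merge a burst of raw file events into net per-path updates.
    Mirrors atomic-write detection: a delete followed by a create on
    the same path becomes a single 'modify'. Simplified sketch."""
    net = {}
    for path, kind in events:
        prev = net.get(path)
        if prev == "delete" and kind == "create":
            net[path] = "modify"   # atomic save pattern: one update
        elif prev == "create" and kind == "delete":
            net.pop(path)          # transient temp file: net no-op
        else:
            net[path] = kind
    return net
```

Run over a debounce window, this is why a single editor save never triggers two processing jobs, and a temp file that appears and vanishes triggers none.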

Why it matters:

Zero-maintenance archive currency. Your knowledge base reflects your actual working files without manual re-import.

Portal

The Forge Portal

Everything happens in one interface. Chat, search, graph exploration, job monitoring, and system configuration.

  • Chat pane with streaming AI conversation, visible thinking, tool calls, and context
  • Explorer pane for searching and browsing the archive, viewing any node's content, claims, and relationships
  • Two graph canvases for visual navigation of the knowledge graph
  • Job dashboard for real-time monitoring of processing jobs, file watchers, and automations
  • System configuration: LLM provider setup, embedding configuration, secrets management
  • Setup wizard for guided onboarding of new instances

Why it matters:

You do not need separate tools for chat, search, graph exploration, and system administration.

Memory

Nothing Gets Lost

Searches, syntheses, and agent sessions persist over time and stay linked to the specific archive content they referenced.

  • Multiple memory types: generic, search results, agent sessions, node mappings
  • Cross-memory linking -- an Oracle memory links to the Seeker memories it used
  • Human-friendly node mapping: sequential integer IDs instead of GUIDs during sessions
  • You control how long memories are kept

Technical detail

Typed memory system with in-memory cache and YAML file persistence. Configurable retention periods. Cross-memory linking preserves the chain of discovery across agent sessions.
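The human-friendly node mapping mentioned above is a small bidirectional table, sketched here with hypothetical names: GUIDs get stable, sequential integers for the duration of a session.

```python
class NodeMap:
    """Map archive node GUIDs to small sequential integers for a
    session, so the conversation can say 'node 2' instead of a GUID.
    Illustrative sketch, not GENAPA's actual implementation."""
    def __init__(self):
        self._by_guid = {}
        self._by_id = {}

    def id_for(self, guid):
        """Return the session id for a GUID, assigning one if new."""
        if guid not in self._by_guid:
            next_id = len(self._by_guid) + 1
            self._by_guid[guid] = next_id
            self._by_id[next_id] = guid
        return self._by_guid[guid]

    def guid_for(self, node_id):
        """Resolve a session id back to the underlying GUID."""
        return self._by_id[node_id]
```

Because the mapping is persisted with the session memory, "node 2" still resolves to the same archive node when you pick the conversation up later.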

Why it matters:

A search you ran yesterday, a synthesis the Oracle produced last week -- they are still there, linked to the nodes they referenced.