Everything GENAPA does.
Browse the capabilities below, or jump to a specific feature from the sidebar.
The Knowledge Graph
Every piece of knowledge in GENAPA lives as one of four node types in a directed graph.
Technical detail
Four node types form a directed acyclic graph. Sources link to references via extraction. References link to entities via disambiguation. Synths link to sources via the Weaver agent. All relationships are stored in PostgreSQL with full provenance tracking.
This is not a flat-file search. It is a structured graph where you can follow relationships between files, concepts, people, and systems -- and always trace back to the original source.
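The node and edge relationships described above can be sketched as follows. This is an illustrative model only -- the type names, edge kinds, and provenance field are assumptions, not GENAPA's actual schema:

```python
from dataclasses import dataclass

# Hypothetical sketch of the four node types and the edges between them.
# Names are illustrative, not GENAPA's actual PostgreSQL schema.
NODE_TYPES = {"source", "reference", "entity", "synth"}

# Edge kinds from the description above: extraction, disambiguation, weaving.
EDGE_KINDS = {
    ("source", "reference"): "extraction",
    ("reference", "entity"): "disambiguation",
    ("synth", "source"): "weave",
}

@dataclass
class Node:
    id: str
    kind: str    # one of NODE_TYPES
    label: str

@dataclass
class Edge:
    src: Node
    dst: Node
    provenance: str  # what created this relationship, and when

    @property
    def kind(self) -> str:
        return EDGE_KINDS[(self.src.kind, self.dst.kind)]

# Every edge carries provenance, so any entity can be traced
# back to the source file it came from.
doc = Node("s1", "source", "auth-flow.md")
ref = Node("r1", "reference", "token validation")
edge = Edge(ref, Node("e1", "entity", "OAuth"), "disambiguator run")
```

Because edge kinds are derived from the node types at each end, the graph stays acyclic by construction in this sketch: each edge kind only ever points "downstream" from one type to another.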
The Processing Pipeline
A multi-stage pipeline that transforms raw files into structured, searchable knowledge.
Technical detail
Embeddings stored in PostgreSQL using pgvector with HNSW indexes. Domain-specific extraction prompts defined in YAML. Entity disambiguation uses two-stage matching: vector-similarity candidate retrieval followed by LLM evaluation. The Weaver performs differential updates -- when sources change, only affected synths are re-processed.
The pipeline handles deduplication, conceptual organization, and synthesis automatically. When files change, only the affected parts are re-processed -- not the whole archive.
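The two-stage disambiguation step can be sketched as below. The `llm_confirms` callback and the candidate threshold are placeholders for real model calls and tuned values:

```python
# Illustrative two-stage entity disambiguation: stage 1 narrows the field
# cheaply by vector similarity, stage 2 asks an LLM to confirm each
# surviving candidate. Thresholds and helpers are invented for the sketch.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb)

def disambiguate(mention_vec, entities, llm_confirms, k=3, threshold=0.8):
    # Stage 1: vector similarity produces a short candidate list.
    scored = sorted(entities, key=lambda e: cosine(mention_vec, e["vec"]),
                    reverse=True)
    candidates = [e for e in scored[:k]
                  if cosine(mention_vec, e["vec"]) >= threshold]
    # Stage 2: an LLM judges each candidate; the first confirmed match wins.
    for entity in candidates:
        if llm_confirms(entity):
            return entity
    return None  # no match: a new entity node would be created
```

The point of the two stages is cost: vector search over an HNSW index is fast enough to run against every mention, while the expensive LLM call only sees a handful of pre-filtered candidates.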
Semantic Search
Search by meaning, not keywords. A search for "authentication flow" finds files about login, identity brokering, and token validation -- even if those exact words do not appear.
A search for "regulatory compliance obligations" finds relevant material even when different documents use different terminology.
- Searches across file names, descriptions, factual claims, and content simultaneously
- Two scoring strategies: best single-field match or balanced scoring across all fields
- Filter by node type, domain, collection, or job
- Automatically expands results to include related entities and synth connections
Technical detail
Powered by pgvector with HNSW approximate nearest-neighbor indexes. Searches against Label, About, Claim, Chunk, and Synthesis fields simultaneously with configurable per-field weighting.
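The two scoring strategies can be sketched as below. Field names mirror the list above; the default weights are made up for illustration, not GENAPA's actual values:

```python
# Sketch of the two scoring strategies: "best" takes the maximum
# per-field similarity, "balanced" takes a weighted mean across fields.

FIELDS = ("label", "about", "claim", "chunk", "synthesis")

def score(per_field: dict, strategy: str = "best", weights: dict = None) -> float:
    """per_field maps a field name to its cosine similarity for one node."""
    if strategy == "best":
        return max(per_field.get(f, 0.0) for f in FIELDS)
    # Balanced: weighted average, so a node must do well across fields.
    w = weights or {f: 1.0 for f in FIELDS}
    total = sum(w[f] for f in FIELDS)
    return sum(per_field.get(f, 0.0) * w[f] for f in FIELDS) / total
```

"Best" favors nodes with one very strong match (a file whose name nails the query); "balanced" favors nodes that are moderately relevant everywhere.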
You find what you need by describing it, not by guessing which keywords the author used.
Four Specialized AI Agents
Four purpose-built agents, each with a specific role in the knowledge workflow.
Technical detail
Agents use a tool-calling loop pattern with persistent session memory. Omni can spawn sub-agents with configurable recursion depth. Cross-agent memory linking preserves the chain of discovery.
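The tool-calling loop with session memory can be sketched as follows. The message shapes and the `model` callback are assumptions standing in for a real LLM client:

```python
# Minimal tool-calling loop in the pattern described above. `model` is a
# stand-in for an LLM call that returns either a final answer or a tool
# request; a real agent would stream output and handle errors.

def run_agent(model, tools, prompt, max_steps=10):
    memory = [{"role": "user", "content": prompt}]  # persistent session memory
    for _ in range(max_steps):
        reply = model(memory)
        if reply.get("tool") is None:
            # No tool requested: this is the final answer.
            memory.append({"role": "assistant", "content": reply["content"]})
            return reply["content"], memory
        # Execute the requested tool and feed the result back into memory.
        result = tools[reply["tool"]](**reply.get("args", {}))
        memory.append({"role": "tool", "name": reply["tool"], "content": result})
    raise RuntimeError("agent exceeded max_steps")
```

Composition falls out of the same shape: an Omni-style agent would simply register other agents (Seeker, Oracle, Weaver) as entries in `tools`.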
These are not generic chatbots. Each agent is built for a specific kind of knowledge work, and they compose -- Omni can use Seeker, Oracle, or Weaver as tools.
Real-Time Graph Visualization
See the shape of your knowledge -- which files are connected, which concepts cluster together, which entities span multiple subsystems.
- Color-coded nodes by type: sources, references, entities, synths
- Physics-based layout with incremental updates -- new nodes animate into position without restarting the layout
- Click any node to see its claims, metadata, and source content
- Live updates as the archive changes
Technical detail
Two persistent Sigma.js v3 WebGL graph renderers (chat canvas and explorer canvas) using graphology. ForceAtlas2 physics layout with incremental updates. Diff-based graph updates via SignalR.
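The diff-based update model can be sketched as below. The diff format is invented for illustration; in GENAPA the diffs arrive over SignalR and Sigma.js/graphology handle the actual rendering:

```python
# Toy graph-diff applier. Instead of replacing the whole graph on every
# change, the client applies small diffs, which is what lets new nodes
# animate into position without restarting the physics layout.

def apply_diff(graph, diff):
    """graph: {'nodes': {id: attrs}, 'edges': {(src, dst): attrs}}"""
    for node_id, attrs in diff.get("added_nodes", {}).items():
        graph["nodes"][node_id] = attrs
    for edge, attrs in diff.get("added_edges", {}).items():
        graph["edges"][edge] = attrs
    for node_id in diff.get("removed_nodes", []):
        graph["nodes"].pop(node_id, None)
        # Drop any edge touching the removed node.
        graph["edges"] = {e: a for e, a in graph["edges"].items()
                          if node_id not in e}
    return graph
```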
The knowledge graph is a visual, interactive object that updates in real time as the archive changes -- not a static diagram you generate on demand.
Domain-Configurable Extraction
The same GENAPA instance can process source code, legal documents, research papers, and architecture specs -- each with extraction rules tailored to that content type.
- Built-in domains: source code (per-language for C#, Python, JavaScript, SQL, and more), technical documentation, general content
- Each domain type specifies extraction rules, quality constraints, worked examples, claim scaling, and disambiguation guidance
- Add new domains by writing a YAML configuration file -- no code changes required
- Supports frontmatter-first extraction where files can provide their own metadata
Technical detail
Each domain type carries extraction guidance, disambiguation rules, worked examples, field specifications with quality constraints, claim scaling rules, and synth type declarations. All defined in YAML.
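A domain definition might look something like the fragment below. Every key name here is a hypothetical illustration of the pieces listed above (extraction guidance, worked examples, quality constraints, claim scaling, disambiguation rules, synth types), not GENAPA's actual configuration schema:

```yaml
# Hypothetical domain definition -- field names are illustrative,
# not GENAPA's actual schema.
domain: legal-documents
extraction:
  guidance: >
    Extract obligations, parties, effective dates, and defined terms.
  examples:
    - input: "The Supplier shall deliver within 30 days."
      claims:
        - "The Supplier has a 30-day delivery obligation."
quality:
  min_claims_per_source: 3
claim_scaling:
  per_1000_words: 5
disambiguation:
  prefer: canonical-party-names
synth_types:
  - contract-summary
```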
You do not need to retrain anything. Add a YAML configuration file and GENAPA knows how to read your specific content type.
MCP Integration
Your knowledge archive becomes a tool that AI assistants in your editor can use. Search, retrieve, explore, and run agents -- all from Claude Code, GitHub Copilot, or any MCP-compatible client.
- Search the archive, retrieve nodes, and traverse the synth hierarchy
- Run Seeker and Oracle agents directly from your editor
- Manage memories and collections
- Monitor jobs and file watchers
Technical detail
Model Context Protocol server over Streamable HTTP transport at the /mcp endpoint. API-key authenticated. Full tool suite: semantic search, node retrieval with children loading, BFS synth traversal, Seeker and Oracle agent invocation, memory CRUD.
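Since MCP is JSON-RPC 2.0 under the hood, a tool invocation is just a POST of a `tools/call` request to the `/mcp` endpoint. The tool name `search` and its arguments below are assumptions about GENAPA's tool suite, not a documented contract:

```python
import json

# Sketch of the request body an MCP-compatible client sends over
# Streamable HTTP. The tool name and argument names are assumptions.

def mcp_tool_call(tool: str, arguments: dict, request_id: int = 1) -> str:
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(payload)

# A real client would POST this body to the /mcp endpoint with the
# API key in an auth header.
body = mcp_tool_call("search", {"query": "authentication flow"})
```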
You do not have to context-switch to the portal to use the archive. Your existing AI workflow gains a persistent knowledge backend.
GENAPA Link
The archive stays current with your filesystem automatically. Edit a file, and GENAPA re-processes it without manual intervention.
- File watchers detect changes in real time -- create, modify, rename, delete
- Event coalescing: rapid changes are buffered and merged (a save that triggers delete and create is treated as one update)
- Debounced batch processing with configurable accumulation periods
- Automatic reconnection with exponential backoff
GENAPA Link is currently a Windows service. The Forge Portal and API are accessible from any browser regardless of operating system.
Technical detail
Local Windows service using FileSystemWatcher with System.Reactive event coalescing. Shared watchers, atomic write detection (delete+create as single update), rename coalescing, configurable debounce periods. SignalR bi-directional streaming with exponential-backoff reconnection.
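The atomic-write rule described above can be sketched as a toy coalescer. The event shapes are invented for illustration; the real service uses FileSystemWatcher with System.Reactive:

```python
# Toy event coalescer: within one debounce window, a delete followed by a
# create of the same path collapses into a single modify -- the signature
# of an editor performing an atomic save.

def coalesce(events):
    """events: ordered list of (path, kind) gathered in one debounce window."""
    by_path = {}
    for path, kind in events:
        if by_path.get(path) == "delete" and kind == "create":
            by_path[path] = "modify"   # atomic write: delete+create -> one update
        else:
            by_path[path] = kind       # later events override earlier ones
    return by_path
```

Folding events per path this way is what keeps a burst of editor activity from triggering a flurry of redundant re-processing jobs.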
The archive stays current with zero maintenance. Your knowledge base reflects your actual working files without manual re-import.
The Forge Portal
Everything happens in one interface. Chat, search, graph exploration, job monitoring, and system configuration.
- Chat pane with streaming AI conversation, visible thinking, tool calls, and context
- Explorer pane for searching and browsing the archive, viewing any node's content, claims, and relationships
- Two graph canvases for visual navigation of the knowledge graph
- Job dashboard for real-time monitoring of processing jobs, file watchers, and automations
- System configuration: LLM provider setup, embedding configuration, secrets management
- Setup wizard for guided onboarding of new instances
You do not need separate tools for chat, search, graph exploration, and system administration.
Nothing Gets Lost
Searches, syntheses, and agent sessions persist across restarts and stay linked to the specific archive content they referenced.
- Multiple memory types: generic, search results, agent sessions, node mappings
- Cross-memory linking -- an Oracle memory links to the Seeker memories it used
- Human-friendly node mapping: sequential integer IDs instead of GUIDs during sessions
- You control how long memories are kept
Technical detail
Typed memory system with in-memory cache and YAML file persistence. Configurable retention periods. Cross-memory linking preserves the chain of discovery across agent sessions.
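The human-friendly node mapping can be sketched as a small bidirectional table. Class and method names are illustrative:

```python
import uuid

# Sketch of the session node mapping: stable sequential integers stand in
# for GUIDs, so an agent can say "node 3" instead of quoting a full GUID.

class NodeMap:
    def __init__(self):
        self._to_int, self._to_guid = {}, {}

    def key_for(self, guid):
        """Return the session-local integer for a GUID, assigning one
        in discovery order (1, 2, 3, ...) on first sight."""
        if guid not in self._to_int:
            n = len(self._to_int) + 1
            self._to_int[guid] = n
            self._to_guid[n] = guid
        return self._to_int[guid]

    def guid_for(self, n):
        """Resolve a session integer back to the underlying GUID."""
        return self._to_guid[n]
```

Because the mapping is persisted with the memory, "node 3" still resolves to the same archive node when the session is revisited later.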
A search you ran yesterday, a synthesis the Oracle produced last week -- they are still there, linked to the nodes they referenced.
