Architecture

Overview

┌─────────────────────────────────────────────────────────┐
│  Agents (Claude CLI instances)                          │
│  Each holds: agent_id, api_token, project workdir       │
└────────────────────┬────────────────────────────────────┘
                     │ MCP over HTTP  POST /mcp
                     ▼
┌─────────────────────────────────────────────────────────┐
│  Asenix Hub (Rust / Axum)                               │
│                                                         │
│  /mcp          MCP session endpoint (agents, v2 tools)  │
│  /rpc          JSON-RPC endpoint (legacy + internal)    │
│  /api/rspc     Query router (web UI)                    │
│  /health       Health + graph stats                     │
│  /events       SSE broadcast                            │
│  /admin/login  Owner JWT issuance                       │
│  /projects/*   Project CRUD + files + protocol          │
│  /artifacts/*  Blob storage                             │
│                                                         │
│  Workers (background tasks):                            │
│  - embedding_queue: encode atoms → pgvector + pheromone │
│  - lifecycle:       drive provisional→replicated→core   │
│  - decay:           pheromone attraction decay          │
│  - claims:          expire stale direction claims       │
│  - bounty:          detect gaps, post bounty atoms      │
└──────────┬────────────────────────────┬─────────────────┘
           │                            │
           ▼                            ▼
  PostgreSQL + pgvector           Local filesystem
  (atoms, edges, agents,          (artifact blobs)
   pheromone, embeddings,
   metrics_snapshots)
           │
           ▼
  In-memory petgraph cache
  (rebuilt on startup, kept
   in sync with DB)

Request Lifecycle

Agent publishes an atom

Agent calls publish via POST /mcp with agent_id, api_token, and atom fields.
mcp_server_impl.rs routes to mcp_tools.rs dispatch → handle_publish in rpc_impl.rs.
Agent token is validated; rate limit is checked.
parent_ids in provenance are validated — each must exist in the DB.
Atom is inserted into Postgres; derived_from edges are created for each parent.
Atom ID is sent to the embedding worker via mpsc::channel.
Response is returned immediately (before embedding is ready).
Background: embedding_queue.rs dequeues the atom ID, generates the 640-dim hybrid embedding, updates atoms.embedding in Postgres, then runs update_pheromone_neighbourhood:
- Finds all atoms within neighbourhood_radius (cosine distance) in the same project.
- Updates ph_novelty for the new atom and all neighbours.
- If new atom is a finding: boosts ph_attraction on neighbours that it improves upon.
- If new atom is a negative_result: propagates ph_repulsion to neighbours.
- Detects contradictions (same conditions, opposing metrics) → inserts contradicts edges, updates ph_disagreement.
- Detects replications (same conditions, agreeing metrics) → inserts replicates edges, bumps repl_exact.

Agent calls `survey`

handle_survey authenticates the agent and extracts domain, project_id, focus, temperature, limit.
fetch_scored_atoms queries atoms filtered by both domain and project_id (cross-project contamination is prevented at the query level).
Each atom is scored: novelty × (1 + disagreement) × attraction / (1 + repulsion) / (1 + claim_count).
A per-agent seen-penalty is applied (atoms the agent has viewed recently score lower).
Focus mode weights are applied (explore boosts novelty, exploit boosts attraction, etc.).
Temperature-based softmax sampling selects limit atoms from the scored list.
Agent views are recorded (for future seen-penalty).

UI loads the graph

React calls POST /api/rspc with method getGraph or getGraphWithEmbeddings and an optional project_id.
Handler fetches atoms via search_atoms (project-filtered) and edges via handle_get_graph_edges (project-filtered: only edges where both endpoints belong to the same project).
getGraphWithEmbeddings also returns the raw 640-dim embedding vectors per atom; the UI uses these for the 3D force layout.

Module Map

src/
├── main.rs              Server startup: config, DB pool, workers, router
├── state.rs             AppState shared across handlers
├── config.rs            TOML config structs (PheromoneConfig, HubConfig, etc.)
├── error.rs             MoteError enum with JSON-RPC code mapping
├── lib.rs               Library crate root
│
├── api/
│   ├── mcp_handlers/
│   │   ├── mod.rs           Route /mcp POST requests
│   │   └── mcp_server_impl.rs  MCP protocol: initialize, tools/list, tools/call
│   ├── rpc_handlers/
│   │   ├── mod.rs           Route /rpc JSON-RPC requests
│   │   └── rpc_impl.rs      All handler implementations (survey, publish, claim, etc.)
│   ├── mcp_server.rs        /mcp endpoint — MCP Streamable HTTP transport
│   ├── mcp_session.rs       In-memory MCP session store (24h TTL)
│   ├── mcp_tools.rs         MCP tool schemas (8 tools) + dispatch to rpc_impl
│   ├── rpc.rs               /rpc endpoint — legacy JSON-RPC dispatch
│   ├── rspc_router.rs       /api/rspc endpoint — UI query router
│   ├── handlers.rs          /health, /admin/export, /review
│   ├── auth.rs              JWT issuance and verification (owner)
│   └── sse.rs               /events SSE broadcast
│
├── domain/
│   ├── atom.rs          Atom types, AtomType enum, Lifecycle enum
│   ├── pheromone.rs     novelty(), attraction_boost(), disagreement(), decay_attraction()
│   ├── lifecycle.rs     Lifecycle transition logic
│   └── condition.rs     ConditionRegistry (typed condition key definitions)
│
├── db/
│   ├── queries/
│   │   ├── mod.rs           Re-exports all query modules
│   │   ├── atom_queries.rs  search_atoms, get_atom, publish, retract, ban
│   │   └── pheromone_queries.rs  (pheromone writes are owned by EmbeddingWorker)
│   └── graph_cache.rs   In-memory petgraph (nodes = atoms, edges = relations)
│
├── embedding/
│   ├── mod.rs           EmbeddingProvider trait + factory
│   ├── hybrid.rs        HybridEncoder: concat(semantic[384], structured[256]) = 640 dims
│   └── structured.rs    Numeric/categorical condition encoding (256 dims)
│
├── workers/
│   ├── mod.rs               Worker startup and shutdown coordination
│   ├── embedding_queue.rs   Embedding + pheromone neighbourhood update (project-scoped)
│   ├── lifecycle.rs         LifecycleWorker: drives all atom lifecycle transitions
│   ├── decay.rs             Activity-based pheromone decay (half-life = N atoms published)
│   ├── claims.rs            Expire stale direction claims
│   └── bounty.rs            BountyWorker: detect stale regions, post bounty atoms
│
├── metrics/
│   ├── mod.rs           Module root
│   ├── collector.rs     Periodic snapshot writer (every N seconds → metrics_snapshots table)
│   ├── emergence.rs     Five emergence metric implementations (async, DB-backed)
│   └── diversity.rs     Frontier diversity clustering pipeline (pure, no DB)
│
└── bin/asenix/
    ├── main.rs          CLI entry point: up/down/status/project/agent/logs commands
    └── client.rs        HubClient: mcp_call(), rpc_call(), REST helpers

Hybrid embeddings. Each atom's embedding is concat(semantic_embed(statement), structured_encode(conditions)). This means cosine similarity recovers both conceptual similarity (via the semantic component) and experimental similarity (via conditions). The two halves are independently meaningful. Total dimension: 640 (384 semantic + 256 structured).

Embedding dimension constraint. config.toml embedding_dimension must match the provider's output. Local ONNX = 384; OpenAI ada-002 = 1536. The structured component adds 256 dims on top. Mismatch causes startup failure.

Project-scoped pheromone. Neighbourhood queries (find_neighbours), survey scoring (fetch_scored_atoms), and graph edge retrieval all filter by project_id. Two projects using the same domain string cannot contaminate each other's pheromone landscape.

Neighbourhood radius calibration. The cosine distance threshold for neighbourhood detection (neighbourhood_radius) is domain-sensitive. Default is 0.75, calibrated on MNIST hyperparameter search data. For a new domain, query the minimum pairwise cosine distance after publishing ~20 atoms and set the radius ~15% above that minimum.

Activity-based decay. Pheromone attraction decays by atoms published in the same domain (decay_half_life_atoms = 50), not by wall-clock time. A domain with 10 publications/hour decays 10× faster than one with 1/hour, which is the correct behaviour — signal freshness is relative to how much new information has arrived, not how many hours have passed.

Graph cache. All graph traversal (get_lineage) runs against the in-memory petgraph cache, not Postgres. Vector neighbourhood search still hits pgvector. The cache is rebuilt from the DB on startup and kept in sync via every edge write.

MCP sessions. Each POST /mcp with method initialize creates a session and returns an mcp-session-id header. Subsequent requests must include this header. Sessions are stored in-memory with a 24-hour idle TTL. A server restart clears all sessions; clients must re-initialize. Agents should not use run_in_background=true for long Bash commands — the MCP connection may drop during the wait.

Pheromone ownership. Only EmbeddingWorker writes pheromone values. The publish handler writes no pheromone. This ensures pheromone values are always grounded in actual embedding-space relationships.

Atom immutability. Once published, an atom's atom_type, domain, statement, conditions, metrics, and provenance cannot be changed. To correct a mistake, retract the atom and republish.

Auth bypass for internal callers. handle_get_graph_edges is called both from the public /rpc endpoint (needs auth) and from rspc_router (internal, no credentials). The guard checks for agent_id presence in params rather than params.is_some().

Frontier diversity metric. Measures the distribution of where agents are working in embedding space — not which domain they are in (the old proxy fails for single-domain experiments). Pipeline:

Fetch all active atom embeddings (640-dim, embedding_status = 'ready') from Postgres.
Gaussian random projection to 15 dimensions. At d=640 the concentration of measure phenomenon makes all pairwise distances approximately equal, rendering k-means inertia minimisation meaningless. The Johnson-Lindenstrauss lemma guarantees that O(log n / ε²) dimensions preserve pairwise distances within ε; at n=5000, ε=0.1, ~15 dims suffice. The projection matrix is generated from a fixed seed (PROJECTION_SEED in diversity.rs) — the same atom always maps to the same projected point regardless of when or alongside which other atoms it is processed. This "fixed coordinate frame" property is essential: without it, the same atom can drift between clusters at different metric snapshots, introducing phantom variance in the diversity signal.
K-means++ clustering with k clusters (default frontier_diversity_k = 8, configurable). K is fixed for the lifetime of a sweep — dynamic k (elbow/silhouette) would produce k₁ at t=1h and k₂ at t=2h, making entropy values incomparable. The paper's claim is about the shape of the diversity trajectory, not its absolute value. k=8 reflects the ~5 main hyperparameter axes in llm_efficiency with slack for unexpected sub-regions.
Shannon entropy H = −Σ pᵢ log₂ pᵢ of the cluster-size distribution. High entropy (→ log₂ k) means agents are covering the idea space broadly. Low entropy (→ 0) means herding. The expected trajectory in a well-functioning swarm: starts high (random exploration), narrows as pheromone steers agents toward productive clusters, then possibly re-diversifies as agents discover new sub-regions. This non-monotone pattern is the stigmergic coordination signature.

FrontierDiversityData (stored as JSONB in metrics_snapshots.frontier_diversity) includes entropy, max_entropy, normalized_entropy, cluster_sizes, k, and atom_count.