Search
Endpoints for retrieving memories. The full search runs query expansion, co-retrieval, reranking, and an LLM repair loop. The fast path skips the LLM repair loop and cross-encoder reranking to hit a sub-200ms target.
Base URL: http://localhost:3050
All request bodies are JSON (Content-Type: application/json). Field names on the raw HTTP prototype surface use snake_case.
POST /v1/memories/search
Full search with query expansion, co-retrieval, reranking, and LLM repair loop.
Request:
{
"user_id": "ethan",
"query": "What is their tech stack?",
"source_site": "claude",
"limit": 5,
"as_of": "2026-04-01T00:00:00Z",
"retrieval_mode": "flat",
"token_budget": 4000,
"namespace_scope": "work"
}
| Field | Type | Required | Notes |
|---|---|---|---|
| user_id | string | yes | User identifier |
| query | string | yes | Search query |
| source_site | string | no | Filter by source platform |
| limit | number | no | Max results (1–100, default: server config) |
| as_of | string | no | ISO timestamp for temporal filtering |
| retrieval_mode | string | no | flat, tiered, or abstract-aware |
| token_budget | number | no | Token budget for results (100–50,000) |
| namespace_scope | string | no | Restrict search to a namespace |
| workspace_id | string | no | Scope search to a workspace (requires agent_id) |
| agent_id | string | no | Scope search to an agent within the workspace |
| agent_scope | string | no | Agent visibility scope: all, self, or others |
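The ranges and enumerated values in the table can be checked on the client before a request is sent. The following is a minimal sketch, not part of the prototype: the helper name build_search_payload is hypothetical, and the server remains the authority on what it accepts.

```python
from typing import Any, Dict

# Allowed values copied from the request table above.
RETRIEVAL_MODES = {"flat", "tiered", "abstract-aware"}
AGENT_SCOPES = {"all", "self", "others"}


def build_search_payload(user_id: str, query: str, **options: Any) -> Dict[str, Any]:
    """Build a /v1/memories/search body (hypothetical client-side helper)."""
    if not user_id or not query:
        raise ValueError("user_id and query are required")

    limit = options.get("limit")
    if limit is not None and not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")

    budget = options.get("token_budget")
    if budget is not None and not 100 <= budget <= 50_000:
        raise ValueError("token_budget must be between 100 and 50,000")

    mode = options.get("retrieval_mode")
    if mode is not None and mode not in RETRIEVAL_MODES:
        raise ValueError("retrieval_mode must be flat, tiered, or abstract-aware")

    agent_scope = options.get("agent_scope")
    if agent_scope is not None and agent_scope not in AGENT_SCOPES:
        raise ValueError("agent_scope must be all, self, or others")

    # Omit unset options so the server applies its own defaults.
    payload: Dict[str, Any] = {"user_id": user_id, "query": query}
    payload.update({k: v for k, v in options.items() if v is not None})
    return payload
```

For example, build_search_payload("docs-demo", "Do they have any food allergies?", limit=3) yields the same body as the curl example for this endpoint.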
Scope resolution follows the platform's scope contract: when workspace_id + agent_id are omitted, the scope is { kind: 'user', userId }; when both are present, the scope is { kind: 'workspace', userId, workspaceId, agentId, agentScope } and the response echoes it back. Workspace visibility is enforced against each memory's stored visibility column — it is not a caller-provided filter.
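The resolution rule above can be sketched as a small function. This is an illustration only: resolve_scope is a hypothetical name, the prototype's actual implementation is not shown in this document, and raising an error when only one of workspace_id and agent_id is supplied is an assumption.

```python
from typing import Any, Dict, Optional


def resolve_scope(
    user_id: str,
    workspace_id: Optional[str] = None,
    agent_id: Optional[str] = None,
    agent_scope: Optional[str] = None,
) -> Dict[str, Any]:
    """Mirror the documented scope contract (hypothetical helper)."""
    if workspace_id is None and agent_id is None:
        # Neither field supplied: user-level scope.
        return {"kind": "user", "userId": user_id}
    if workspace_id is not None and agent_id is not None:
        # Both supplied: workspace scope, echoed back in the response.
        return {
            "kind": "workspace",
            "userId": user_id,
            "workspaceId": workspace_id,
            "agentId": agent_id,
            "agentScope": agent_scope,
        }
    # Assumption: a half-specified scope is treated as a caller error.
    raise ValueError("workspace_id and agent_id must be supplied together")
```

Comparing the result with the scope field of a response is one way to confirm that a request was resolved at the intended level.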
Response (captured from running prototype):
{
"count": 3,
"retrieval_mode": "flat",
"scope": { "kind": "user", "userId": "docs-demo" },
"memories": [
{
"id": "7a52eec6-ede8-4904-8bfd-e393bf83f279",
"content": "User is allergic to peanuts and avoids all tree nuts.",
"similarity": 0.5556080705511763,
"score": 2.8112157186128353,
"importance": 0.7,
"source_site": "chatgpt",
"created_at": "2026-04-05T03:21:42.748Z"
},
{
"id": "f4e240d1-293d-4b58-a72a-401d26dbd09d",
"content": "The v2 launch deadline is April 15, 2026 and it is a hard deadline we cannot move.",
"similarity": 0.3323145702287046,
"score": 2.2646287539301384,
"importance": 0.6,
"source_site": "chatgpt",
"created_at": "2026-04-05T03:21:30.380Z"
},
{
"id": "3fa330cb-ee9e-4614-825c-1ce27539d24d",
"content": "Our production stack is TypeScript, React, PostgreSQL with pgvector, and we deploy on Fly.io.",
"similarity": 0.3731595762622564,
"score": 2.246318742363717,
"importance": 0.5,
"source_site": "claude",
"created_at": "2026-04-05T03:21:28.152Z"
}
],
"injection_text": "### Subject: site/chatgpt\n- [2026-04-05] [answer] The v2 launch deadline is April 15, 2026 and it is a hard deadline we cannot move.\n- [2026-04-05] [context] User is allergic to peanuts and avoids all tree nuts.\n\n### Subject: site/claude\n- [2026-04-05] [context] Our production stack is TypeScript, React, PostgreSQL with pgvector, and we deploy on Fly.io.",
"citations": [
"7a52eec6-ede8-4904-8bfd-e393bf83f279",
"f4e240d1-293d-4b58-a72a-401d26dbd09d",
"3fa330cb-ee9e-4614-825c-1ce27539d24d"
],
"observability": {
"retrieval": { "stages": [/* ... */] },
"packaging": { "tier_budget_tokens": 4000 },
"assembly": { "injection_tokens": 312 }
}
}
| Response Field | Notes |
|---|---|
| scope | Echoes the resolved MemoryScope: { kind: 'user', userId } or { kind: 'workspace', userId, workspaceId, agentId, ... } |
| memories[] | Ranked results with cosine similarity, composite score, and importance |
| injection_text | Pre-formatted markdown grouped by ### Subject: site/<source> with date-stamped [context]/[answer] bullets |
| citations | Memory IDs referenced in injection_text, matching memories[].id order |
| tier_assignments | Present when retrieval_mode is tiered |
| expand_ids | IDs for follow-up /v1/memories/expand calls |
| lesson_check | Safety check against learned lessons |
| consensus | Conflict resolution stats when multiple memories conflict |
| observability | Optional trace payload (retrieval / packaging / assembly sub-objects), present when the runtime produced per-stage summaries. See observability for the schema. |
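A client will usually want to join citations back to the full memory records and tolerate the optional fields being absent. The sketch below assumes only the field names listed in the table; the helper names are hypothetical.

```python
from typing import Any, Dict, List

# Fields the table describes as conditional or optional.
OPTIONAL_FIELDS = ("tier_assignments", "expand_ids", "lesson_check", "consensus", "observability")


def memories_by_citation(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the memory records in citation order, skipping unknown IDs."""
    by_id = {m["id"]: m for m in response.get("memories", [])}
    return [by_id[cid] for cid in response.get("citations", []) if cid in by_id]


def present_optional_fields(response: Dict[str, Any]) -> List[str]:
    """List which optional fields this particular response carries."""
    return [name for name in OPTIONAL_FIELDS if name in response]
```

Reading optional fields through membership checks, rather than indexing them directly, keeps the client working when a field such as observability is absent.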
Example:
curl -X POST http://localhost:3050/v1/memories/search \
-H 'Content-Type: application/json' \
-d '{"user_id": "docs-demo", "query": "Do they have any food allergies?", "limit": 3}'
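The same call can be made from Python using only the standard library. This is a minimal sketch that assumes the prototype is listening on the base URL given at the top of this page; make_search_request and search are hypothetical helper names, and error handling is left out.

```python
import json
import urllib.request
from typing import Any, Dict

BASE_URL = "http://localhost:3050"  # base URL from this page


def make_search_request(
    payload: Dict[str, Any], path: str = "/v1/memories/search"
) -> urllib.request.Request:
    """Build a JSON POST request without sending it."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(payload: Dict[str, Any], timeout: float = 10.0) -> Dict[str, Any]:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(make_search_request(payload), timeout=timeout) as resp:
        return json.load(resp)
```

Calling search({"user_id": "docs-demo", "query": "Do they have any food allergies?", "limit": 3}) sends the same body as the curl command.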
POST /v1/memories/search/fast
Latency-optimized search (sub-200ms target). Skips the LLM repair loop and cross-encoder reranking.
Request:
| Field | Type | Required | Notes |
|---|---|---|---|
| user_id | string | yes | User identifier |
| query | string | yes | Search query |
| source_site | string | no | Filter by source platform |
| limit | number | no | Max results (1–100) |
| namespace_scope | string | no | Restrict to namespace |
| workspace_id | string | no | Scope search to a workspace (requires agent_id) |
| agent_id | string | no | Scope search to an agent within the workspace |
| agent_scope | string | no | Agent visibility scope: all, self, or others |
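The fast endpoint's table lists fewer fields than the full search: as_of, retrieval_mode, and token_budget do not appear. This page does not say whether the server ignores or rejects those fields on the fast path, so the sketch below assumes the safest client behaviour is to leave them out. The helper name to_fast_payload is hypothetical.

```python
from typing import Any, Dict

# Field names taken from the /v1/memories/search/fast request table.
FAST_FIELDS = (
    "user_id",
    "query",
    "source_site",
    "limit",
    "namespace_scope",
    "workspace_id",
    "agent_id",
    "agent_scope",
)


def to_fast_payload(full_payload: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the fields documented for the fast endpoint."""
    return {k: full_payload[k] for k in FAST_FIELDS if k in full_payload}
```

Applied to the full-search request example earlier on this page, this keeps user_id, query, source_site, limit, and namespace_scope.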
Response (captured from running prototype). The schema is the same as /v1/memories/search, including scope and the optional observability payload; the capture below shows neither:
{
"count": 3,
"retrieval_mode": "flat",
"memories": [
{
"id": "7a52eec6-ede8-4904-8bfd-e393bf83f279",
"content": "User is allergic to peanuts and avoids all tree nuts.",
"similarity": 0.3755667878488398,
"score": 4.951131619541372,
"importance": 0.7,
"source_site": "chatgpt",
"created_at": "2026-04-05T03:21:42.748Z"
},
{
"id": "f4e240d1-293d-4b58-a72a-401d26dbd09d",
"content": "The v2 launch deadline is April 15, 2026 and it is a hard deadline we cannot move.",
"similarity": 0.4432517683214816,
"score": 4.153163511162466,
"importance": 0.6,
"source_site": "chatgpt",
"created_at": "2026-04-05T03:21:30.380Z"
},
{
"id": "3fa330cb-ee9e-4614-825c-1ce27539d24d",
"content": "Our production stack is TypeScript, React, PostgreSQL with pgvector, and we deploy on Fly.io.",
"similarity": 0.6176793662882996,
"score": 3.568684490548965,
"importance": 0.5,
"source_site": "claude",
"created_at": "2026-04-05T03:21:28.152Z"
}
],
"injection_text": "### Subject: site/chatgpt\n- [2026-04-05] [answer] The v2 launch deadline is April 15, 2026 and it is a hard deadline we cannot move.\n- [2026-04-05] [context] User is allergic to peanuts and avoids all tree nuts.\n\n### Subject: site/claude\n- [2026-04-05] [context] Our production stack is TypeScript, React, PostgreSQL with pgvector, and we deploy on Fly.io.",
"citations": [
"7a52eec6-ede8-4904-8bfd-e393bf83f279",
"f4e240d1-293d-4b58-a72a-401d26dbd09d",
"3fa330cb-ee9e-4614-825c-1ce27539d24d"
]
}
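Note: composite score values from /v1/memories/search/fast are higher than those from /v1/memories/search because the fast path applies different boosting (recency, access frequency) without the LLM repair normalization, so scores are not directly comparable across the two endpoints.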
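Because the two endpoints produce scores on different scales, one way to compare their output for the same query is by rank position instead of raw score. The sketch below assumes memories[] arrives in ranked order, as the response table describes; rank_by_id and rank_shifts are hypothetical names.

```python
from typing import Any, Dict


def rank_by_id(response: Dict[str, Any]) -> Dict[str, int]:
    """Map each memory ID to its 1-based position in the ranked list."""
    return {m["id"]: pos for pos, m in enumerate(response.get("memories", []), start=1)}


def rank_shifts(full_response: Dict[str, Any], fast_response: Dict[str, Any]) -> Dict[str, int]:
    """Fast-path rank minus full-search rank for memories present in both."""
    full_ranks = rank_by_id(full_response)
    fast_ranks = rank_by_id(fast_response)
    return {mid: fast_ranks[mid] - full_ranks[mid] for mid in full_ranks if mid in fast_ranks}
```

A shift of 0 means both paths placed the memory at the same position, whatever the scores were.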
Example:
curl -X POST http://localhost:3050/v1/memories/search/fast \
-H 'Content-Type: application/json' \
-d '{"user_id": "docs-demo", "query": "What is their tech stack?", "limit": 3}'