Search

Endpoints for retrieving memories. The full search runs query expansion, co-retrieval, reranking, and an LLM repair loop. The fast path skips the LLM repair loop and cross-encoder reranking to hit a sub-200ms target.

Base URL: http://localhost:3050

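All request bodies are JSON (Content-Type: application/json). Field names on the raw HTTP prototype surface use snake_case. The echoed scope object is an exception: its keys are camelCase (userId, workspaceId, agentId, agentScope).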

POST /v1/memories/search

Full search with query expansion, co-retrieval, reranking, and LLM repair loop.

Request:

{
  "user_id": "ethan",
  "query": "What is their tech stack?",
  "source_site": "claude",
  "limit": 5,
  "as_of": "2026-04-01T00:00:00Z",
  "retrieval_mode": "flat",
  "token_budget": 4000,
  "namespace_scope": "work"
}

Field	Type	Required	Notes
`user_id`	string	yes	User identifier
`query`	string	yes	Search query
`source_site`	string	no	Filter by source platform
`limit`	number	no	Max results (1–100, default: server config)
`as_of`	string	no	ISO 8601 timestamp for temporal filtering (e.g. `2026-04-01T00:00:00Z`)
`retrieval_mode`	string	no	`flat`, `tiered`, or `abstract-aware`
`token_budget`	number	no	Token budget for results (100–50,000)
`namespace_scope`	string	no	Restrict search to a namespace
`workspace_id`	string	no	Scope search to a workspace (requires `agent_id`)
`agent_id`	string	no	Scope search to an agent within the workspace
`agent_scope`	string	no	Agent visibility scope: `all`, `self`, or `others`
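
The constraints in the table above can be checked on the client before a request is sent. The following is a minimal sketch, not the server's validation logic: `validate_search_body` is a hypothetical helper, and treating `limit` and `token_budget` as plain numeric range checks is an assumption drawn only from the Notes column.

```python
from typing import Any

# Allowed values copied from the request table.
RETRIEVAL_MODES = {"flat", "tiered", "abstract-aware"}
AGENT_SCOPES = {"all", "self", "others"}


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_search_body(body: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means no problem was found."""
    problems: list[str] = []
    for field in ("user_id", "query"):
        if not isinstance(body.get(field), str):
            problems.append(f"{field} is required and must be a string")
    limit = body.get("limit")
    if limit is not None and not (_is_number(limit) and 1 <= limit <= 100):
        problems.append("limit must be between 1 and 100")
    budget = body.get("token_budget")
    if budget is not None and not (_is_number(budget) and 100 <= budget <= 50_000):
        problems.append("token_budget must be between 100 and 50000")
    mode = body.get("retrieval_mode")
    if mode is not None and mode not in RETRIEVAL_MODES:
        problems.append("retrieval_mode must be flat, tiered, or abstract-aware")
    agent_scope = body.get("agent_scope")
    if agent_scope is not None and agent_scope not in AGENT_SCOPES:
        problems.append("agent_scope must be all, self, or others")
    # The table marks workspace_id as requiring agent_id.
    if "workspace_id" in body and "agent_id" not in body:
        problems.append("workspace_id requires agent_id")
    return problems
```

Returning a list of problems instead of raising on the first one lets a caller report every issue in a single pass.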

Scope resolution follows the platform's scope contract. When workspace_id and agent_id are both omitted, the scope is { kind: 'user', userId }. When both are present, the scope is { kind: 'workspace', userId, workspaceId, agentId, agentScope }. In either case the response echoes the resolved scope back. Workspace visibility is enforced against each memory's stored visibility column; it is not a caller-provided filter.
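
The two branches of the scope contract can be sketched as a small function. This is an illustration under assumptions, not the platform's implementation: `resolve_scope` is a hypothetical name, raising an error when only one of `workspace_id` / `agent_id` is supplied is a guess (the document does not say how the server handles that case), and `agentScope` is simply passed through from `agent_scope` with no default applied.

```python
from typing import Any, Optional


def resolve_scope(body: dict[str, Any]) -> dict[str, Optional[str]]:
    """Map a snake_case request body to the camelCase scope object."""
    workspace_id = body.get("workspace_id")
    agent_id = body.get("agent_id")

    # Neither field supplied: user scope.
    if workspace_id is None and agent_id is None:
        return {"kind": "user", "userId": body["user_id"]}

    # Assumption: a half-specified workspace scope is rejected.
    if workspace_id is None or agent_id is None:
        raise ValueError("workspace_id and agent_id must be supplied together")

    # Both supplied: workspace scope.
    return {
        "kind": "workspace",
        "userId": body["user_id"],
        "workspaceId": workspace_id,
        "agentId": agent_id,
        "agentScope": body.get("agent_scope"),  # None when the caller omits it
    }
```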

Response (captured from running prototype):

{
  "count": 3,
  "retrieval_mode": "flat",
  "scope": { "kind": "user", "userId": "docs-demo" },
  "memories": [
    {
      "id": "7a52eec6-ede8-4904-8bfd-e393bf83f279",
      "content": "User is allergic to peanuts and avoids all tree nuts.",
      "similarity": 0.5556080705511763,
      "score": 2.8112157186128353,
      "importance": 0.7,
      "source_site": "chatgpt",
      "created_at": "2026-04-05T03:21:42.748Z"
    },
    {
      "id": "f4e240d1-293d-4b58-a72a-401d26dbd09d",
      "content": "The v2 launch deadline is April 15, 2026 and it is a hard deadline we cannot move.",
      "similarity": 0.3323145702287046,
      "score": 2.2646287539301384,
      "importance": 0.6,
      "source_site": "chatgpt",
      "created_at": "2026-04-05T03:21:30.380Z"
    },
    {
      "id": "3fa330cb-ee9e-4614-825c-1ce27539d24d",
      "content": "Our production stack is TypeScript, React, PostgreSQL with pgvector, and we deploy on Fly.io.",
      "similarity": 0.3731595762622564,
      "score": 2.246318742363717,
      "importance": 0.5,
      "source_site": "claude",
      "created_at": "2026-04-05T03:21:28.152Z"
    }
  ],
  "injection_text": "### Subject: site/chatgpt\n- [2026-04-05] [answer] The v2 launch deadline is April 15, 2026 and it is a hard deadline we cannot move.\n- [2026-04-05] [context] User is allergic to peanuts and avoids all tree nuts.\n\n### Subject: site/claude\n- [2026-04-05] [context] Our production stack is TypeScript, React, PostgreSQL with pgvector, and we deploy on Fly.io.",
  "citations": [
    "7a52eec6-ede8-4904-8bfd-e393bf83f279",
    "f4e240d1-293d-4b58-a72a-401d26dbd09d",
    "3fa330cb-ee9e-4614-825c-1ce27539d24d"
  ],
  "observability": {
    "retrieval": { "stages": [/* ... */] },
    "packaging": { "tier_budget_tokens": 4000 },
    "assembly": { "injection_tokens": 312 }
  }
}

Response Field	Notes
`scope`	Echoes the resolved `MemoryScope` — `{ kind: 'user', userId }` or `{ kind: 'workspace', userId, workspaceId, agentId, ... }`
`memories[]`	Ranked results with cosine `similarity`, composite `score`, and `importance`
`injection_text`	Pre-formatted markdown grouped by `### Subject: site/<source>` with date-stamped `[context]`/`[answer]` bullets
`citations`	Memory IDs referenced in `injection_text`, matching `memories[].id` order
`tier_assignments`	Present when `retrieval_mode` is `tiered`
`expand_ids`	IDs for follow-up `/v1/memories/expand` calls
`lesson_check`	Safety check against learned lessons
`consensus`	Conflict resolution stats when multiple memories conflict
`observability`	Optional trace payload (`retrieval` / `packaging` / `assembly` sub-objects) — present when the runtime produced per-stage summaries. See observability for the schema.
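
A client may want to read `injection_text` back into structured groups, or confirm that `citations` lines up with `memories[]`. The sketch below assumes the text always follows the `### Subject: site/<source>` heading and `- [date] [context|answer] text` bullet layout seen in the capture; both function names are hypothetical.

```python
import re
from typing import Any

HEADING_PREFIX = "### Subject: site/"
# Matches bullets such as "- [2026-04-05] [context] Some memory text."
BULLET = re.compile(r"^- \[(\d{4}-\d{2}-\d{2})\] \[(context|answer)\] (.*)$")


def group_injection_text(text: str) -> dict[str, list[tuple[str, ...]]]:
    """Group (date, kind, content) bullets under their source site."""
    groups: dict[str, list[tuple[str, ...]]] = {}
    current = None
    for line in text.splitlines():
        if line.startswith(HEADING_PREFIX):
            current = line[len(HEADING_PREFIX):]
            groups.setdefault(current, [])
        elif current is not None:
            match = BULLET.match(line)
            if match:
                groups[current].append(match.groups())
    return groups


def citations_follow_memory_order(response: dict[str, Any]) -> bool:
    """True when citations lists the same IDs, in the same order, as memories[]."""
    return response["citations"] == [m["id"] for m in response["memories"]]
```

Lines that match neither pattern (such as the blank separators between groups) are skipped rather than treated as errors.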

Example:

curl -X POST http://localhost:3050/v1/memories/search \
  -H 'Content-Type: application/json' \
  -d '{"user_id": "docs-demo", "query": "Do they have any food allergies?", "limit": 3}'
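
For callers who prefer Python to curl, the same request can be built with the standard library. This is a minimal sketch: `build_search_request` and `search` are hypothetical helper names, and the 10-second timeout is an arbitrary choice, not a documented value.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3050"


def build_search_request(body: dict, fast: bool = False) -> urllib.request.Request:
    """Build the POST request without sending it."""
    path = "/v1/memories/search/fast" if fast else "/v1/memories/search"
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(body: dict, fast: bool = False, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_search_request(body, fast)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the prototype running locally, `search({"user_id": "docs-demo", "query": "Do they have any food allergies?", "limit": 3})` issues the same call as the curl command.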

POST /v1/memories/search/fast

Latency-optimized search (sub-200ms target). Skips the LLM repair loop and cross-encoder reranking.

Request:

Field	Type	Required	Notes
`user_id`	string	yes	User identifier
`query`	string	yes	Search query
`source_site`	string	no	Filter by source platform
`limit`	number	no	Max results (1–100)
`namespace_scope`	string	no	Restrict to namespace
`workspace_id`	string	no	Scope search to a workspace (requires `agent_id`)
`agent_id`	string	no	Scope search to an agent within the workspace
`agent_scope`	string	no	Agent visibility scope: `all`, `self`, or `others`
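
The fast-path table lists fewer fields than the full search table: `as_of`, `retrieval_mode`, and `token_budget` do not appear. Assuming those fields are not accepted by the fast path (the document does not state what the server does with them), a caller reusing a full-search body could strip them first. `to_fast_body` is a hypothetical helper.

```python
from typing import Any

# Fields listed in the /v1/memories/search/fast request table.
FAST_FIELDS = {
    "user_id", "query", "source_site", "limit",
    "namespace_scope", "workspace_id", "agent_id", "agent_scope",
}


def to_fast_body(body: dict[str, Any]) -> tuple[dict[str, Any], list[str]]:
    """Return (body restricted to fast-path fields, sorted names of dropped fields)."""
    fast = {key: value for key, value in body.items() if key in FAST_FIELDS}
    dropped = sorted(key for key in body if key not in FAST_FIELDS)
    return fast, dropped
```

Returning the dropped field names makes it easy to log when a request loses temporal filtering or a token budget by switching endpoints.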

Response (captured from running prototype). The schema is the same as /v1/memories/search, including scope and optional observability, although this capture shows neither field:

{
  "count": 3,
  "retrieval_mode": "flat",
  "memories": [
    {
      "id": "7a52eec6-ede8-4904-8bfd-e393bf83f279",
      "content": "User is allergic to peanuts and avoids all tree nuts.",
      "similarity": 0.3755667878488398,
      "score": 4.951131619541372,
      "importance": 0.7,
      "source_site": "chatgpt",
      "created_at": "2026-04-05T03:21:42.748Z"
    },
    {
      "id": "f4e240d1-293d-4b58-a72a-401d26dbd09d",
      "content": "The v2 launch deadline is April 15, 2026 and it is a hard deadline we cannot move.",
      "similarity": 0.4432517683214816,
      "score": 4.153163511162466,
      "importance": 0.6,
      "source_site": "chatgpt",
      "created_at": "2026-04-05T03:21:30.380Z"
    },
    {
      "id": "3fa330cb-ee9e-4614-825c-1ce27539d24d",
      "content": "Our production stack is TypeScript, React, PostgreSQL with pgvector, and we deploy on Fly.io.",
      "similarity": 0.6176793662882996,
      "score": 3.568684490548965,
      "importance": 0.5,
      "source_site": "claude",
      "created_at": "2026-04-05T03:21:28.152Z"
    }
  ],
  "injection_text": "### Subject: site/chatgpt\n- [2026-04-05] [answer] The v2 launch deadline is April 15, 2026 and it is a hard deadline we cannot move.\n- [2026-04-05] [context] User is allergic to peanuts and avoids all tree nuts.\n\n### Subject: site/claude\n- [2026-04-05] [context] Our production stack is TypeScript, React, PostgreSQL with pgvector, and we deploy on Fly.io.",
  "citations": [
    "7a52eec6-ede8-4904-8bfd-e393bf83f279",
    "f4e240d1-293d-4b58-a72a-401d26dbd09d",
    "3fa330cb-ee9e-4614-825c-1ce27539d24d"
  ]
}

Note: composite score values from /search/fast are higher than those from /search (about 3.6 to 5.0 in this capture, versus 2.2 to 2.8 in the full-search capture) because the fast path applies different boosting (recency, access frequency) without the LLM repair normalization.
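
Since the two endpoints score on different scales, a client that handles results from both might compare rank positions instead of raw `score` values. The sketch below uses the IDs and scores from the two captures, rounded to three decimals; `ranked_ids` is a hypothetical helper, and the captures are not necessarily responses to the same query.

```python
def ranked_ids(memories: list[dict]) -> list[str]:
    """Return memory IDs ordered by descending composite score."""
    ordered = sorted(memories, key=lambda m: m["score"], reverse=True)
    return [m["id"] for m in ordered]


# Scores rounded from the /v1/memories/search capture.
full = [
    {"id": "7a52eec6-ede8-4904-8bfd-e393bf83f279", "score": 2.811},
    {"id": "f4e240d1-293d-4b58-a72a-401d26dbd09d", "score": 2.265},
    {"id": "3fa330cb-ee9e-4614-825c-1ce27539d24d", "score": 2.246},
]
# Scores rounded from the /v1/memories/search/fast capture.
fast = [
    {"id": "7a52eec6-ede8-4904-8bfd-e393bf83f279", "score": 4.951},
    {"id": "f4e240d1-293d-4b58-a72a-401d26dbd09d", "score": 4.153},
    {"id": "3fa330cb-ee9e-4614-825c-1ce27539d24d", "score": 3.569},
]

same_order = ranked_ids(full) == ranked_ids(fast)
# Every fast score exceeds every full-search score, so a shared threshold would mislead.
scales_overlap = min(m["score"] for m in fast) <= max(m["score"] for m in full)
```

In these captures the ordering is identical even though the score ranges do not overlap.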

Example:

curl -X POST http://localhost:3050/v1/memories/search/fast \
  -H 'Content-Type: application/json' \
  -d '{"user_id": "docs-demo", "query": "What is their tech stack?", "limit": 3}'
