Turn PDFs and images into governed graph rows using your AnythingGraph entity schema or an installed playbook. Fetch the extraction contract over REST or MCP, run your vision model, then ingest structured JSON.
AnythingGraph does not run OCR or vision models inside the platform. You provide the document; your VLM (GPT-4o, Claude, Gemini, or an on-prem model) returns JSON that matches your entity structure. The graph stores validated rows, applies playbook mappings, and exposes the same data to the dashboard and MCP agents.
Each entity field supports AI metadata: description, example, and extraction_hint (where to find the value on a page). Playbooks package record types, relationships, connector mappings, and ingest instructions.
invoice-records-structured: assumes extraction happens outside AnythingGraph; the playbook validates and graphs invoice facts.
./start-all.sh (data-layer :8182, dashboard API :5180, MCP HTTP :3333/mcp when started separately).
GET /api/entities/:id, GET /api/playbooks/:id (REST).
get_entity before vision (MCP).
Install e.g. invoice-records-structured, procure-to-pay, or document-registry. Playbook JSON lives under dashboard/backend/src/playbook/playbooks/ and defines entities, fields, and example payloads.
In the dashboard, open Entity structure → create or edit a record type → expand AI metadata on each field. Hints such as “top-right corner, labeled Invoice #” improve VLM accuracy.
Before calling your VLM, assemble a machine-readable spec: entity names, field names, types, required flags, descriptions, examples, and extraction hints. Use either REST (direct HTTP) or MCP (agent host invokes tools).
| Method | Best for | Primary calls |
|---|---|---|
| REST | Custom pipelines, server-side ETL, dashboard API | GET /api/entities/:id, GET /api/playbooks/:id |
| MCP | Agent + VLM in one session (no manual copy/paste) | list_entities → get_entity |
get_graph_query_context and anythinggraph://schema-summary are for graph Q&A, not rich extraction prompts. Always use get_entity (MCP) or GET /api/entities/:id (REST) for field-level metadata.
List entities, then fetch each definition (includes extraction_hint):
curl -s http://127.0.0.1:5180/api/entities
curl -s http://127.0.0.1:5180/api/entities/1
# Playbook catalog + install status (field defs from entities after install)
curl -s http://127.0.0.1:5180/api/playbooks/invoice-records-structured
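The same REST calls can feed a small spec-building step. The sketch below is illustrative only: the response shape (a `name` plus a `fields` list whose items carry `field_name`, `type`, `required`, `description`, `example`, and `extraction_hint`) is an assumption inferred from the metadata names in this guide, and `fetch_entity` and `build_extraction_spec` are hypothetical helper names. Check the actual payload from your instance before relying on it.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:5180"  # dashboard API port used in this guide

# Keys copied into the spec. These key names are assumptions, not a documented contract.
SPEC_KEYS = ("field_name", "type", "required", "description", "example", "extraction_hint")


def fetch_entity(entity_id, base_url=BASE_URL):
    """GET /api/entities/:id and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/api/entities/{entity_id}") as resp:
        return json.load(resp)


def build_extraction_spec(entities):
    """Reduce entity definitions to the per-field metadata a VLM prompt needs."""
    spec = {"entities": []}
    for entity in entities:
        fields = [
            {key: field.get(key) for key in SPEC_KEYS}  # missing keys become None
            for field in entity.get("fields", [])
        ]
        spec["entities"].append({"name": entity.get("name"), "fields": fields})
    return spec
```

Usage would look like `build_extraction_spec([fetch_entity(1)])`. Keeping the fetch and the transformation separate makes the spec builder easy to test against a saved response.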
Connect Cursor, Claude Desktop, or your orchestrator to http://127.0.0.1:3333/mcp (cd mcp-service && npm run start:http). Tool sequence:
health_check → get_graph_query_context with optional playbook_id → get_entity for each entity in scope
{
"mcpServers": {
"anythinggraph": {
"url": "http://127.0.0.1:3333/mcp"
}
}
}
Pass the generated extraction spec plus the document (image or PDF pages) to your vision model.
{
"task": "Extract structured business records from the attached document.",
"rules": [
"Output a JSON object with a records array.",
"Use field_name keys exactly as defined in the schema.",
"Use null for missing optional fields; do not invent values.",
"Dates: prefer ISO-8601 (YYYY-MM-DD) when possible."
],
"schema": { "...": "from Step 2 — entities[].fields[]" }
}
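One way to assemble that instruction payload in code is sketched below. The task string and rules are copied from the example above; `build_vlm_instructions` is a hypothetical helper name, and how the resulting text is attached to a request depends entirely on your vision model's API, which this sketch does not cover.

```python
import json

TASK = "Extract structured business records from the attached document."

RULES = [
    "Output a JSON object with a records array.",
    "Use field_name keys exactly as defined in the schema.",
    "Use null for missing optional fields; do not invent values.",
    "Dates: prefer ISO-8601 (YYYY-MM-DD) when possible.",
]


def build_vlm_instructions(schema):
    """Serialize task, rules, and the fetched schema into one prompt string."""
    payload = {"task": TASK, "rules": RULES, "schema": schema}
    # ensure_ascii=False keeps non-ASCII hints (for example vendor names) readable.
    return json.dumps(payload, indent=2, ensure_ascii=False)
```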
Example model output for invoice-records-structured:
{
"records": [
{
"invoice_number": "INV-2024-0042",
"vendor_name": "Acme Supplies Ltd",
"total_amount": 1250.0,
"invoice_date": "2024-03-15"
}
]
}
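Model output is worth checking before it is sent on, since a VLM can return malformed JSON or skip a required field. The following is a minimal client-side sketch, not the platform's validation logic: `check_records` is a hypothetical helper, and the caller supplies the required and date field names from the fetched entity definitions.

```python
import json
from datetime import date


def check_records(raw_output, required, date_fields=()):
    """Return (records, problems) for a raw model response string."""
    try:
        payload = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        return [], [f"not valid JSON: {exc.msg}"]

    records = payload.get("records") if isinstance(payload, dict) else None
    if not isinstance(records, list):
        return [], ["top-level object has no records array"]

    problems = []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            problems.append(f"record {i}: not a JSON object")
            continue
        for name in required:
            if rec.get(name) is None:  # absent or explicitly null
                problems.append(f"record {i}: missing required field {name}")
        for name in date_fields:
            value = rec.get(name)
            if value is None:
                continue
            try:
                date.fromisoformat(value)
            except (TypeError, ValueError):
                problems.append(f"record {i}: {name} is not an ISO-8601 date")
    return records, problems
```

Returning the problems as a list, rather than raising on the first one, lets the orchestrator decide whether to retry the extraction or pass the records along and let the connector route failures.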
After playbook install:
POST http://127.0.0.1:5180/api/playbooks/invoice-records-structured/webhook
Content-Type: application/json
{
"records": [
{
"invoice_number": "INV-2024-0042",
"vendor_name": "Acme Supplies Ltd",
"total_amount": 1250,
"invoice_date": "2024-03-15"
}
]
}
The connector validates required fields, applies field mappings, routes rows to entities, and sends failures to the landing zone.
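The webhook call can be issued from Python with the standard library alone. This is a sketch under assumptions: the URL and body mirror the request shown above, `build_webhook_request` and `post_records` are hypothetical helper names, and because the response format is not documented here the body is returned as raw text.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:5180"  # dashboard API port used in this guide


def build_webhook_request(playbook_id, records, base_url=BASE_URL):
    """Build the POST request for /api/playbooks/:id/webhook without sending it."""
    body = json.dumps({"records": records}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/playbooks/{playbook_id}/webhook",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_records(playbook_id, records, base_url=BASE_URL):
    """Send the records and return (HTTP status, raw response text)."""
    request = build_webhook_request(playbook_id, records, base_url)
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(request) as resp:
        return resp.status, resp.read().decode("utf-8")
```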
create_entity_row(
entity_id=<from list_entities>,
values_json='{"invoice_number":"INV-2024-0042",...}'
)
client.dashboard.playbook_webhook("invoice-records-structured", {"records": [...]})
Query the ingested rows with ask_graph or SPARQL (sync_rdf_cache, then run_sparql).
┌─────────────┐     REST or MCP     ┌──────────────────┐
│ Orchestrator│ ──────────────────► │  AnythingGraph   │
│ (your app)  │ ◄── schema / spec   │ data-layer + MCP │
└──────┬──────┘                     └────────▲─────────┘
       │                                     │
       │ document + spec                     │ JSON records
       ▼                                     │
┌─────────────┐                       ┌──────┴───────┐
│     VLM     │                       │ Connector /  │
│ (vision API)│                       │ webhook      │
└─────────────┘                       └──────────────┘
The VLM never talks to LMDB directly. Your orchestrator owns the loop: fetch schema → extract → ingest → optional graph queries.
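The loop described above can be sketched as a function that takes each stage as a callable. All names here (`run_pipeline` and its parameters) are hypothetical; the point is only the ordering and the fact that the extraction stage receives the document and the spec, nothing else.

```python
def run_pipeline(document, fetch_spec, extract, ingest, query=None):
    """Orchestrate fetch schema -> extract -> ingest -> optional graph query.

    fetch_spec: () -> spec              (REST or MCP against AnythingGraph)
    extract:    (document, spec) -> records   (your VLM call)
    ingest:     (records) -> result     (webhook or create_entity_row)
    query:      optional () -> answer   (ask_graph or SPARQL)
    """
    spec = fetch_spec()
    records = extract(document, spec)  # the VLM sees only the document and spec
    result = {"ingest": ingest(records)}
    if query is not None:
        result["query"] = query()
    return result
```

Passing the stages in as callables keeps the orchestrator independent of whether REST or MCP is used, and makes it possible to run the loop against stubs before any service is running.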
| Issue | What to check |
|---|---|
| REST fetch fails (network) | Dashboard API running on :5180; CORS enabled on dashboard; open this page via http:// not file:// |
| Playbook entities not found | Install the playbook first; entity names in LMDB must match playbook entities[].name |
| MCP tools missing | cd mcp-service && npm run start:http; data-layer on :8182 |
| Ingest validation errors | Landing zone; required fields; field mappings if VLM keys differ from schema |
| Empty extraction hints | Playbook catalog JSON may omit hints; add them on entity fields in the dashboard or use GET /api/entities/:id after editing AI metadata |
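When ingest validation fails because the VLM's keys differ from the schema, one option is to normalize keys in the orchestrator before posting. The sketch below is a client-side illustration, separate from the playbook's own field mappings; `remap_keys` and the alias `invoice_no` are hypothetical.

```python
def remap_keys(record, aliases, allowed):
    """Rename aliased keys and report keys that match no schema field.

    aliases: VLM key -> schema field_name
    allowed: set of valid schema field_name values
    """
    remapped = {}
    unknown = []
    for key, value in record.items():
        target = aliases.get(key, key)  # unaliased keys pass through unchanged
        if target in allowed:
            remapped[target] = value
        else:
            unknown.append(key)
    return remapped, unknown
```

For example, `remap_keys({"invoice_no": "INV-2024-0042"}, {"invoice_no": "invoice_number"}, {"invoice_number"})` yields the record keyed by `invoice_number` and an empty list of unknown keys.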