AnythingLLM: workspace-based RAG with any LLM

Install via Docker

docker run -d \
    --name anythingllm \
    --restart unless-stopped \
    -p 127.0.0.1:3001:3001 \
    -v anythingllm-data:/app/server/storage \
    --cap-add SYS_ADMIN \
    -e STORAGE_DIR="/app/server/storage" \
    mintplexlabs/anythingllm:latest

--cap-add SYS_ADMIN is required for the bundled headless Chromium that AnythingLLM uses for website scraping. Drop the flag if you don't need URL ingestion; only sites won't import.

Reverse proxy

# Caddy
chat.example.com {
    reverse_proxy 127.0.0.1:3001
    request_body { max_size 100MB }
}

First-run setup

Browse to the URL. The setup wizard asks for:

LLM provider — OpenAI, Anthropic, Azure OpenAI, Google Gemini, AWS Bedrock, Mistral, Groq, HuggingFace, Ollama (local), LM Studio (local), LocalAI. Provide API key + pick a default model.
Embedding model — defaults to the bundled "AnythingLLM Native Embedder" (works locally without an extra service); OpenAI / Ollama / Cohere / Voyage / LMStudio are alternatives. For everything-stays-local, pair with Ollama running nomic-embed-text or mxbai-embed-large.
Vector database — defaults to LanceDB (embedded, file-backed); pgvector (see that tutorial), Pinecone, Weaviate, Chroma, Qdrant, Milvus, Astra DB also supported.
Account creation — you become the admin. Multi-user mode can be enabled later for SSO + per-user workspaces.

Create a workspace

Click "+ New Workspace" — name it (e.g. "Engineering Docs"). Drop documents in via the upload pane. Supported formats:

PDF, DOCX, ODT, RTF, TXT, Markdown
HTML / scraped websites (paste URL; AnythingLLM crawls)
JSON, CSV
Audio + video files (transcribed via OpenAI Whisper or local whisper.cpp)
Confluence, Notion (export), GitHub repos, GitLab repos, YouTube videos (transcript), Drupal, web sitemaps

Each document is chunked (configurable chunk size + overlap), embedded, and stored in the vector DB associated with that workspace. Chunks per workspace are isolated — chats only retrieve from the workspace they're in.

Chat with the workspace

Open the chat for the workspace; type a question. AnythingLLM:

Embeds the question.
Retrieves the top-K most similar chunks from the workspace's vector store.
Sends the question + the chunks to the configured LLM with a system prompt instructing it to answer from the provided context and cite sources.
Renders the response with inline citation chips that link to the source document + chunk.

Click a citation chip to jump to the source. This is the part that separates RAG from "the LLM made something up" — provenance is visible.

Per-workspace settings

Each workspace has its own:

LLM model (workspace A uses Claude; workspace B uses local Ollama)
Embedding model
Temperature, top-K retrieval count, similarity threshold
System prompt ("You're a customer-support assistant; answer only from the provided documents; if you can't find the answer, say so")
Suggested message templates ("Show me", "Compare", "Summarize")

This lets one instance host very different agents — legal review, customer support, internal docs — with independent configurations.

Agent skills

AnythingLLM 1.x adds "agent skills" — tool-call-style functions the LLM can invoke. Built-in skills: web browsing, web search, code execution, RAG over linked workspaces, save-to-document. Per-workspace toggle.

Custom skills are written in JavaScript and dropped in /app/server/storage/plugins/agent-skills/ — each is a self-contained JS module declaring its arguments, schema, and handler. Useful for "agent can query our internal Jira" or "agent can write a row to NocoDB" without writing a full integration plugin.

Multi-user + SSO

Settings → Users → enable multi-user. Roles: admin, manager, user. Each user gets their own workspaces and chats; admins can share workspaces across users. SAML / OIDC SSO supported via the Pro / Enterprise tier; the community edition does email + password, plus basic API-key auth for integrations.

Embedded chat widget

Per workspace, AnythingLLM generates a JavaScript snippet that drops a chat widget on any website — the widget talks to the workspace and answers from its documents. Useful for "stick the docs assistant on the docs site."

Backups

Everything lives under ./storage/:

The SQLite (or external) database with users + workspaces + settings
The embedded LanceDB (vectors) — or remote vector DB if configured externally
Uploaded source documents

Snapshot the directory nightly via restic (see that tutorial). If using an external vector DB, back it up via its own tooling.

AnythingLLM vs LibreChat vs Open WebUI

LibreChat — best for a polished multi-model chat UI; less workspace/RAG-centric.
AnythingLLM — best for "I want documents + chat to feel like one product"; workspaces are the organizing concept.
Open WebUI — best for Ollama-first local-only setups; the smallest scope.

For team knowledge bases or customer-support agents that need provenance and per-context isolation, AnythingLLM is the most direct fit in the self-hosted LLM ecosystem in 2026.