Karakeep: a bookmark archiver that actually saves the content

Install via docker compose

# docker-compose.yml
services:
  karakeep:
    image: ghcr.io/karakeep-app/karakeep:release
    container_name: karakeep
    restart: unless-stopped
    ports:
      - "127.0.0.1:3000:3000"
    volumes:
      - ./data:/data
    environment:
      DATA_DIR: /data
      NEXTAUTH_URL: https://karakeep.example.com
      NEXTAUTH_SECRET: ${NEXTAUTH_SECRET}        # openssl rand -base64 36
      MEILI_ADDR: http://meilisearch:7700
      MEILI_MASTER_KEY: ${MEILI_MASTER_KEY}
      BROWSER_WEB_URL: http://chrome:9222

      # Optional: AI tagging (LLM provider)
      OPENAI_API_KEY: ${OPENAI_API_KEY}
      OPENAI_BASE_URL: https://api.openai.com/v1   # or your LiteLLM (see /tutorials/litellm-llm-gateway-proxy.html)
      INFERENCE_TEXT_MODEL: gpt-5-mini
      INFERENCE_IMAGE_MODEL: gpt-5-mini

    depends_on: [ meilisearch, chrome ]

  meilisearch:
    image: getmeili/meilisearch:latest   # consider pinning a version tag; Meilisearch data files are tied to the engine version
    restart: unless-stopped
    environment:
      MEILI_NO_ANALYTICS: "true"
      MEILI_MASTER_KEY: ${MEILI_MASTER_KEY}
    volumes:
      - ./meili-data:/meili_data

  chrome:
    image: gcr.io/zenika-hub/alpine-chrome:123
    restart: unless-stopped
    # The image's entrypoint already runs `chromium-browser --headless`,
    # so only the extra flags are passed as the command.
    command:
      [
        --no-sandbox,
        --disable-gpu,
        --disable-dev-shm-usage,
        --remote-debugging-address=0.0.0.0,
        --remote-debugging-port=9222,
        --hide-scrollbars,
      ]
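
The compose file reads NEXTAUTH_SECRET and MEILI_MASTER_KEY through ${...} substitution, which Docker Compose resolves from the shell environment or from a .env file next to docker-compose.yml. A minimal sketch for generating both values, assuming a fresh install; the helper names are illustrative, and the output matches what running openssl rand -base64 36 twice would give you:

```python
import base64
import secrets


def random_secret(num_bytes: int = 36) -> str:
    """Return num_bytes of randomness, base64-encoded (like `openssl rand -base64 36`)."""
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


def render_env() -> str:
    """Build the text of a .env file holding the two secrets the compose file expects."""
    names = ("NEXTAUTH_SECRET", "MEILI_MASTER_KEY")
    return "".join(f"{name}={random_secret()}\n" for name in names)
```

Write the result of render_env() to .env once and keep it. Regenerating NEXTAUTH_SECRET later will most likely sign everyone out, since it is the key sessions are signed with.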

docker compose up -d
docker compose logs -f karakeep

Reverse proxy

# Caddy
karakeep.example.com {
    reverse_proxy 127.0.0.1:3000
}

First-run

Browse to the URL and register the first user, who becomes admin. Once the first batch of users has been created, disable open signups by setting DISABLE_SIGNUPS: "true" on the karakeep service and recreating the container.

How content is captured

When you save a URL, Karakeep runs (in background):

Crawl — fetch the page; if the URL is dynamic, the headless Chromium executes JavaScript first.
Extract — pull the article text via a Mozilla Readability-style extraction.
Screenshot — capture a full-page rendered screenshot for visual reference.
OCR — for image-heavy pages or saved images, run OCR for searchability.
Index — push extracted text + metadata into MeiliSearch (see that tutorial) for fast full-text search.
AI tag (optional) — if LLM keys are configured, generate tags + a one-line summary.

The article text is now searchable even if the original site goes down. Click the screenshot in the UI to see the page as it rendered.
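
To make the extract and index steps concrete, here is a toy model of the idea in plain Python. It is not Karakeep's code: the real pipeline uses a Readability-style extractor and a dedicated search engine, while this sketch strips tags with the standard library and keeps an in-memory word index. All names are illustrative.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the bodies of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def build_index(pages: dict) -> dict:
    """Map each lowercase word to the set of URLs whose extracted text contains it."""
    index = defaultdict(set)
    for url, html in pages.items():
        for word in re.findall(r"[a-z0-9]+", extract_text(html).lower()):
            index[word].add(url)
    return index


def search(index: dict, query: str) -> set:
    """Return the URLs that contain every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    return set.intersection(*(index.get(w, set()) for w in words))
```

Because the index is built from the stored text rather than the live page, a search keeps working after the source URL stops resolving.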

Mobile apps + browser extensions

Mobile (iOS + Android) — native apps. Share sheet integration: tap "Share" on any link from any app → Karakeep → saved.
Browser extension — Chrome / Firefox / Safari extensions. Click the icon to save the current page; right-click any link to save without navigating.
Web — paste any URL into the home page; same result.
API — POST to /api/v1/bookmarks with a JSON payload from any script.
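
A minimal sketch of the API route using only the Python standard library. The endpoint path comes from the list above; the payload shape ({"type": "link", "url": ...}) and Bearer-token authentication are assumptions to verify against the API documentation for your version, and the function names are illustrative.

```python
import json
import urllib.request


def build_save_request(base_url: str, api_key: str, url: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST to /api/v1/bookmarks for one link."""
    payload = {"type": "link", "url": url}  # assumed payload shape
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v1/bookmarks",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


def save_bookmark(base_url: str, api_key: str, url: str, timeout: float = 30.0) -> dict:
    """Send the request and return the decoded JSON response."""
    req = build_save_request(base_url, api_key, url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Calling save_bookmark("https://karakeep.example.com", api_key, "https://example.org/post") from a cron job or a shell alias covers the "from any script" case.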

Lists + tags

Bookmarks can be organized in:

Lists — manual folders ("Articles to read on the plane")
Tags — flat labels; auto-generated by the LLM if enabled, plus manual additions
Smart lists — query-based; "all articles tagged 'machine learning' with no read marker"

The combination of LLM auto-tagging + smart lists is the magic; "show me everything I've ever saved about Rust async" works without you ever manually tagging anything.
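
Conceptually, a smart list is a saved rule that is re-evaluated against your bookmarks rather than a fixed set of members. The sketch below models the "tagged X with no read marker" example; the Bookmark class and its fields are hypothetical and say nothing about Karakeep's actual schema or query syntax.

```python
from dataclasses import dataclass, field


@dataclass
class Bookmark:
    url: str
    tags: set = field(default_factory=set)
    read: bool = False


def unread_with_tag(bookmarks: list, tag: str) -> list:
    """Evaluate the rule 'has this tag and is not marked read' against the current bookmarks."""
    return [b for b in bookmarks if tag in b.tags and not b.read]
```

Since the rule runs on demand, a bookmark that the LLM tags tomorrow shows up in the list without anyone filing it.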

Highlights + notes

Per-bookmark: highlight text snippets, attach personal notes. Notes are markdown; highlights are stored alongside the source text for "what stood out from this article."

Auto-archive vs link-only

For sites that are unfriendly to crawlers (e.g. NYT paywall, Discord links), Karakeep stores the link only. For everything it can fetch, it fetches + archives.

For archived content, the saved view doesn't require re-fetching from the source. So when the original site dies, the article still lives in Karakeep.

Backups

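Primary data lives under ./data/, with the search index in its own volume:

SQLite database — bookmarks + metadata
Asset storage — screenshots + extracted images + OCR text
Meilisearch index data (./meili-data/) — rebuildable from primary data but worth backing up too

Restic on ./data (see the Restic tutorial) covers the primary data; include ./meili-data as well to avoid a reindex after a restore. Stop the stack or snapshot the SQLite database before the backup runs, since copying a database file mid-write can produce an inconsistent copy. If the index is lost, it can be rebuilt from the SQLite database.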
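
A file-level backup can catch a live SQLite database mid-write. One way around that is to take a snapshot with SQLite's online backup API first and back up the snapshot. A minimal sketch, assuming the database is a single file under ./data (the db.db name in the usage note is an assumption; check your data directory):

```python
import sqlite3
from pathlib import Path


def snapshot_sqlite(src: Path, dest: Path) -> None:
    """Copy a SQLite database to dest using the online backup API."""
    # mode=rw makes the open fail if src does not exist instead of creating an empty file.
    source = sqlite3.connect(src.resolve().as_uri() + "?mode=rw", uri=True)
    target = sqlite3.connect(dest)
    try:
        source.backup(target)
    finally:
        target.close()
        source.close()
```

For example, snapshot_sqlite(Path("data/db.db"), Path("backup/db-snapshot.db")) before the Restic run, then point Restic at the snapshot plus the asset directory.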

RSS feed import

For "save every new post from this blog automatically," configure RSS feeds; Karakeep polls them and auto-saves new entries. Pair with FreshRSS (see that tutorial) for "RSS reader for active reading, Karakeep for permanent archive."
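
The polling idea is simple: fetch the feed, pick out entries not seen before, save those. A toy sketch of the "what is new" step for an RSS 2.0 feed, using only the standard library; it illustrates the technique and is not Karakeep's implementation (Atom feeds, for one, use a different element layout):

```python
import xml.etree.ElementTree as ET


def new_links(feed_xml: str, seen: set) -> list:
    """Return <item><link> URLs that are not in `seen`, in feed order, without duplicates."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        if link and link not in seen and link not in fresh:
            fresh.append(link)
    return fresh
```

A poller would call this on each fetch, save every returned link, and add them to the seen set so the next run skips them.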

Karakeep vs alternatives

Pocket — Mozilla's hosted read-it-later service, shut down in 2025. It was the closest hosted equivalent for "save for later with article extraction"; Karakeep fills that role self-hosted.
Wallabag — older PHP-based self-hosted equivalent. Less polished UI, no AI tagging, no full-page screenshots.
raindrop.io — commercial hosted service; rich features; not self-hosted.
linkding — lightweight self-hosted bookmark manager; just links + tags, no content archiving.
ArchiveBox — CLI-first; deep archival (full warc/png/pdf/screenshot per URL); heavier; less UI-friendly.

For "Pocket-shaped UX self-hosted, with content saved + LLM-tagged for free-text recall," Karakeep is the right pick in 2026.

The "research vault" workflow

Combine Karakeep with AnythingLLM (see that tutorial) or your favorite LLM tool: export Karakeep's archived content into a RAG workspace; the model can now answer questions grounded in everything you've ever saved. Personal-knowledge-base level of useful, without leaking your reading history to a third party.
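
Most RAG tools ingest a folder of text or markdown files, so the export step can be as small as writing one file per bookmark. A sketch under stated assumptions: each bookmark is a dict with hypothetical keys title, url, tags and text, and how you obtain them (API, database query) is left open.

```python
import re
from pathlib import Path


def slugify(title: str) -> str:
    """Lowercase the title and collapse anything non-alphanumeric into single hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug or "untitled"


def to_markdown(bookmark: dict) -> str:
    """Render one bookmark as a small markdown document with its source and tags."""
    tags = ", ".join(sorted(bookmark.get("tags", [])))
    return (
        f"# {bookmark['title']}\n\n"
        f"Source: {bookmark['url']}\n"
        f"Tags: {tags}\n\n"
        f"{bookmark['text'].strip()}\n"
    )


def write_corpus(bookmarks: list, out_dir: Path) -> list:
    """Write one markdown file per bookmark and return the paths written."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for i, bookmark in enumerate(bookmarks):
        path = out_dir / f"{i:04d}-{slugify(bookmark['title'])}.md"
        path.write_text(to_markdown(bookmark), encoding="utf-8")
        written.append(path)
    return written
```

Keeping the source URL and tags in each file lets the model cite where an answer came from.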