The stack
- paperless-ngx (webserver) — Django app with the UI and API.
- paperless-ngx (consumer) — not a separate service: the same webserver container also runs the consumer, which watches the consume folder and processes new files, alongside the Celery task worker and scheduler.
- redis — task broker for those background tasks.
- postgres — metadata: titles, tags, correspondents, dates.
- gotenberg + tika (optional) — Tika extracts text from Office documents and emails; Gotenberg converts them to PDF before ingest.
docker compose
# docker-compose.yml (adapted from the upstream template)
services:
  broker:
    image: docker.io/library/redis:7
    restart: unless-stopped
    volumes:
      - redisdata:/data

  db:
    image: docker.io/library/postgres:16
    restart: unless-stopped
    environment:
      POSTGRES_DB: paperless
      POSTGRES_USER: paperless
      POSTGRES_PASSWORD: ${DB_PASSWORD}
    volumes:
      - pgdata:/var/lib/postgresql/data

  webserver:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    restart: unless-stopped
    depends_on: [db, broker, gotenberg, tika]
    ports:
      - "127.0.0.1:8000:8000"
    volumes:
      - data:/usr/src/paperless/data
      - media:/usr/src/paperless/media
      - ./export:/usr/src/paperless/export
      - ./consume:/usr/src/paperless/consume
    environment:
      PAPERLESS_REDIS: redis://broker:6379
      PAPERLESS_DBHOST: db
      PAPERLESS_DBUSER: paperless
      PAPERLESS_DBPASS: ${DB_PASSWORD}
      PAPERLESS_OCR_LANGUAGE: eng+fra
      PAPERLESS_TIME_ZONE: America/Toronto
      PAPERLESS_URL: https://docs.example.com
      PAPERLESS_SECRET_KEY: ${SECRET_KEY}
      PAPERLESS_TIKA_ENABLED: 1
      PAPERLESS_TIKA_GOTENBERG_ENDPOINT: http://gotenberg:3000
      PAPERLESS_TIKA_ENDPOINT: http://tika:9998

  gotenberg:
    image: docker.io/gotenberg/gotenberg:8
    restart: unless-stopped
    command:
      - "gotenberg"
      - "--chromium-disable-javascript=true"
      - "--chromium-allow-list=file:///tmp/.*"

  tika:
    image: docker.io/apache/tika:latest
    restart: unless-stopped

volumes:
  data:
  media:
  pgdata:
  redisdata:
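.env (Compose reads this file literally and does not run $(...) commands, so generate it from the shell):
cat > .env <<EOF
DB_PASSWORD=$(openssl rand -base64 36 | tr -d '\n')
SECRET_KEY=$(openssl rand -base64 60 | tr -d '\n')
EOF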
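Where openssl isn't handy, the same two values can be produced with Python's standard library. This is a minimal sketch: the byte counts mirror the openssl commands above, and the function names and the 0600 file mode are choices made for this example, not anything Paperless requires.

```python
import base64
import secrets
from pathlib import Path


def random_b64(num_bytes: int) -> str:
    # Same shape as `openssl rand -base64 N | tr -d '\n'`:
    # N cryptographically random bytes, base64-encoded, no newlines.
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


def render_env() -> str:
    # Variable names match the ${DB_PASSWORD} / ${SECRET_KEY} references in the compose file.
    return f"DB_PASSWORD={random_b64(36)}\nSECRET_KEY={random_b64(60)}\n"


def write_env(path: Path = Path(".env")) -> None:
    # Hypothetical helper: write the file and make it readable by the owner only.
    path.write_text(render_env())
    path.chmod(0o600)
```

Call write_env() once from the directory that holds docker-compose.yml; running it again rotates both secrets, which would lock the webserver out of an already-initialised database.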
Bring it up:
docker compose up -d
docker compose logs -f webserver
First start runs database migrations; then create the admin user:
docker compose exec webserver \
python manage.py createsuperuser
The consume folder
Anything dropped into ./consume/ on the host (mounted at /usr/src/paperless/consume in the container) is picked up automatically: PDFs, JPGs, PNGs, TIFFs, and (with Tika enabled) .docx, .odt, .eml, etc. The consumer:
- OCRs the document with Tesseract if needed
- Detects the date, correspondent, tags using existing matching rules
- Stores the original under media/documents/originals/ and records its checksum, so exact duplicates are detected
- Generates a thumbnail
- Indexes the OCR'd text into the search engine
The original file is moved out of ./consume/ on success.
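Duplicate detection rests on a content checksum rather than on the file name. The sketch below illustrates the idea with MD5 (the checksum Paperless's document model records); the DuplicateChecker class and its in-memory bookkeeping are a simplified, hypothetical stand-in, not the consumer's actual code.

```python
import hashlib
from typing import Dict, Optional


def md5_hex(data: bytes) -> str:
    # Content checksum: identical bytes give identical digests, whatever the file is called.
    return hashlib.md5(data).hexdigest()


class DuplicateChecker:
    """Hypothetical in-memory stand-in for a checksum lookup in the database."""

    def __init__(self) -> None:
        self._seen: Dict[str, str] = {}  # checksum -> name of the first file with it

    def register(self, name: str, data: bytes) -> Optional[str]:
        """Return the name of an earlier file with the same content, else remember this one."""
        digest = md5_hex(data)
        if digest in self._seen:
            return self._seen[digest]
        self._seen[digest] = name
        return None
```

For files on disk, call checker.register(path.name, path.read_bytes()). Renaming a scan does not get it past the check, but re-scanning the same paper does, because the bytes differ.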
Reverse proxy
# Caddy
docs.example.com {
    reverse_proxy 127.0.0.1:8000
    request_body {
        max_size 500MB
    }
}
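Keep PAPERLESS_URL=https://docs.example.com (already set in the compose file) in sync with the public hostname: Paperless derives its trusted origins for CSRF checks from it and uses it to generate correct absolute and redirect URLs.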
How to actually use it
Three patterns:
- Drag-and-drop in the web UI. Simplest; works for occasional uploads.
- Mobile scan apps — Paperless Mobile for Android, several iOS clients via the REST API. Snap a photo, auto-crop, auto-deskew, upload: the same workflow as Dropbox's Scan feature, but landing on your own server.
- Scanner with network destination. Many network document scanners (e.g. Brother ADS-2700W, Fujitsu ScanSnap, Epson WorkForce) support scan-to-SMB or scan-to-email. Point them at a network share of the consume folder, or at a mailbox Paperless polls.
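The mobile clients all go through the same REST API, so any script can upload too. A minimal standard-library sketch, assuming you already have an API token (from the web UI or POST /api/token/) and using the /api/documents/post_document/ endpoint; the helper name build_upload_request is hypothetical, and only the "document" and "title" form fields are shown.

```python
import urllib.request
import uuid
from typing import Optional


def build_upload_request(
    base_url: str,
    token: str,
    filename: str,
    data: bytes,
    title: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST for one document."""
    boundary = uuid.uuid4().hex
    parts = []
    if title is not None:
        # Optional plain form field.
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="title"\r\n\r\n'
                f"{title}\r\n"
            ).encode("utf-8")
        )
    # The file itself goes in the "document" field.
    header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="document"; filename="{filename}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    parts.append(header + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))

    return urllib.request.Request(
        base_url.rstrip("/") + "/api/documents/post_document/",
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Token {token}",  # DRF-style token authentication
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Send it with urllib.request.urlopen(req). The upload is queued rather than processed inline, so the response identifies a consumption task, not a finished document.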
Tags, correspondents, and document types
These are Paperless's three classification dimensions:
- Tag — a multi-valued label (taxes, healthcare, warranty, important).
- Correspondent — the issuer / sender of the document (the city, the bank, the landlord).
- Document type — a single-valued classification (Invoice, Receipt, Statement, Letter, Contract).
For each, you can set matching rules: a tag whose matching algorithm is "Any" with the match text "invoice facture" (words separated by spaces) is applied automatically to every document whose OCR'd text contains either word. Alternatively, choose the "Auto" algorithm: Paperless's built-in machine-learning classifier then learns from your manual classifications and, after a few hundred documents, starts suggesting and applying them itself.
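To make the three dimensions and "Any"-style matching concrete, here is a simplified sketch. Tags accumulate while correspondent and document type hold a single value; the word matching is case-insensitive and whole-word. The Rule and Document classes, the "first match wins" policy for single-valued fields, and the function names are assumptions for illustration, not Paperless's implementation.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Rule:
    kind: str   # "tag", "correspondent" or "document_type"
    name: str
    match: str  # space-separated words, as with the "Any" algorithm


@dataclass
class Document:
    content: str                                 # OCR'd text
    tags: Set[str] = field(default_factory=set)  # multi-valued
    correspondent: Optional[str] = None          # single-valued
    document_type: Optional[str] = None          # single-valued


def matches_any(match: str, text: str) -> bool:
    # True if any listed word appears as a whole word, ignoring case.
    return any(
        re.search(rf"\b{re.escape(word)}\b", text, re.IGNORECASE)
        for word in match.split()
    )


def classify(doc: Document, rules: List[Rule]) -> Document:
    for rule in rules:
        if not matches_any(rule.match, doc.content):
            continue
        if rule.kind == "tag":
            doc.tags.add(rule.name)  # every matching tag is applied
        elif rule.kind == "correspondent" and doc.correspondent is None:
            doc.correspondent = rule.name  # first match wins in this sketch
        elif rule.kind == "document_type" and doc.document_type is None:
            doc.document_type = rule.name
    return doc
```

Whole-word matching is why "invoice" does not fire on "invoiced"; a rule that should catch both needs both words in its match text.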
Email ingest
Settings → Mail → Add IMAP account, then add one or more mail rules for it. Paperless polls the mailbox, downloads attachments that match a rule's filters (sender, subject, attachment filename, maximum age), ingests them, and then applies the rule's action, such as marking the source email as read or moving it to a different folder. Useful for "Bills" folders that auto-route into the archive.
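The filtering step of a mail rule can be pictured with Python's email package. This sketch only decides which attachments of one already-downloaded message would be ingested; the MailRuleSketch fields are hypothetical simplifications of the rule options in the UI, and polling over IMAP is left out.

```python
import fnmatch
from dataclasses import dataclass
from email.message import EmailMessage
from typing import List, Optional


@dataclass
class MailRuleSketch:
    # Hypothetical fields, loosely modelled on the rule filters in the UI.
    sender_contains: Optional[str] = None
    subject_contains: Optional[str] = None
    attachment_glob: str = "*.pdf"


def attachments_to_ingest(msg: EmailMessage, rule: MailRuleSketch) -> List[str]:
    """Return the attachment file names this rule would hand to the consumer."""
    sender = str(msg.get("From", "")).lower()
    subject = str(msg.get("Subject", "")).lower()
    if rule.sender_contains and rule.sender_contains.lower() not in sender:
        return []
    if rule.subject_contains and rule.subject_contains.lower() not in subject:
        return []
    names = []
    # iter_attachments() skips the message body and yields only attached parts.
    for part in msg.iter_attachments():
        name = part.get_filename()
        if name and fnmatch.fnmatchcase(name.lower(), rule.attachment_glob.lower()):
            names.append(name)
    return names
```

A message that fails the sender or subject filter contributes nothing. One that passes still only yields the attachments matching the filename pattern, which keeps logos and signature images out of the archive.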
Backups
Three things to capture:
- data/ — index, configuration
- media/ — the original document files (the actual data, in their original format)
- The Postgres database — metadata, tags, correspondents
For exportable, restorable backups, Paperless ships a one-command "document exporter":
docker compose exec webserver \
document_exporter ../export --no-progress-bar --compare-checksums
This dumps the entire archive plus its JSON metadata into ./export/ in a format Paperless can re-import later (document_importer). Combined with a restic job (see restic + S3) on that directory, you get versioned, offsite, encrypted backups of the whole archive.
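A nightly job just chains the two steps. The sketch below is a thin Python wrapper, assuming it runs from the compose project directory and that restic's credentials (RESTIC_PASSWORD, and the S3 keys for an S3 repository) are already in the environment. The repository string is a placeholder, and -T is added to the exporter command so it works without a terminal (cron, systemd timers).

```python
import subprocess
from typing import List


def backup_commands(restic_repo: str, export_dir: str = "./export") -> List[List[str]]:
    """The two steps of the backup, as argv lists (no shell involved)."""
    return [
        # Same exporter call as above; -T disables pseudo-TTY allocation for unattended runs.
        ["docker", "compose", "exec", "-T", "webserver",
         "document_exporter", "../export", "--no-progress-bar", "--compare-checksums"],
        # Versioned, encrypted snapshot of the export directory.
        ["restic", "-r", restic_repo, "backup", export_dir],
    ]


def run_backup(restic_repo: str, dry_run: bool = True) -> List[List[str]]:
    cmds = backup_commands(restic_repo)
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops before restic if the export failed.
            subprocess.run(cmd, check=True)
    return cmds
```

Running the export first and aborting on failure matters: a failed export followed by a successful restic run would snapshot a stale or half-written directory and make it look current.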
Storage growth
OCR adds a separate searchable PDF/A version of each document next to the original. Rule of thumb: 1.2–1.5× storage of the originals. For a household archive of multiple years, plan on a few GB; for a small business, tens of GB is normal.
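The rule of thumb turns into a quick capacity estimate. This sketch assumes the 1.2–1.5× factor describes the total footprint (originals plus archive copies) relative to the originals alone; the input is whatever size your own originals add up to.

```python
from typing import Tuple


def estimate_total_gb(originals_gb: float, low: float = 1.2, high: float = 1.5) -> Tuple[float, float]:
    """Range of total media storage for a given size of original files."""
    if originals_gb < 0:
        raise ValueError("size must be non-negative")
    return originals_gb * low, originals_gb * high


low_gb, high_gb = estimate_total_gb(4.0)
print(f"plan for {low_gb:.1f}-{high_gb:.1f} GB")
```

With 4 GB of originals this prints "plan for 4.8-6.0 GB". Leave extra headroom for the ./export copy and the Postgres volume, which are not part of this estimate.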
What it isn't
Paperless-ngx is an archive, not a collaboration tool. No real-time co-editing, no versioned edits, no shared workspaces (beyond per-document permissions in 2.x). For "team workflows on living documents," use Outline, BookStack, or Nextcloud. For "save the PDF the bank sent and find it again in three years," Paperless is exactly the right tool.