Healthchecks: self-hosted cron-job monitoring

Install via docker compose

# docker-compose.yml
services:
  healthchecks:
    image: healthchecks/healthchecks:latest
    container_name: healthchecks
    restart: unless-stopped
    ports:
      - "127.0.0.1:8000:8000"
    volumes:
      - ./data:/data
    environment:
      DEBUG: "False"
      SECRET_KEY: ${SECRET_KEY}
      ALLOWED_HOSTS: "ping.example.com"
      DEFAULT_FROM_EMAIL: "healthchecks@example.com"
      EMAIL_HOST: smtp.example.com
      EMAIL_PORT: 587
      EMAIL_HOST_USER: healthchecks@example.com
      EMAIL_HOST_PASSWORD: ${SMTP_PASSWORD}
      EMAIL_USE_TLS: "True"
      REGISTRATION_OPEN: "False"      # keep "False" except while creating the first account
      SITE_ROOT: https://ping.example.com
      SITE_NAME: "Healthchecks"
      DB: sqlite                       # or postgres for >a few users
      DB_NAME: /data/hc.sqlite

docker compose up -d
docker compose logs -f healthchecks

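For first-run signup, set REGISTRATION_OPEN: "True", create your account, then set it back to "False" and run docker compose up -d again so the container is recreated with the new value. A plain docker compose restart reuses the existing container and does not pick up the changed environment.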
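
The ${SECRET_KEY} and ${SMTP_PASSWORD} references in the compose file are filled in by docker compose from a .env file placed beside docker-compose.yml. A minimal sketch for generating one follows; the helper names gen_secret and write_env and the placeholder password are assumptions made for illustration, not part of Healthchecks:

```sh
# gen_secret: print 32 random bytes as 64 hex characters
gen_secret() {
    head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# write_env FILE: write SECRET_KEY and SMTP_PASSWORD to FILE
# (the umask makes a newly created file readable by its owner only)
write_env() {
    (
        umask 077
        {
            printf 'SECRET_KEY=%s\n' "$(gen_secret)"
            # placeholder (assumption): replace with the real SMTP password
            printf 'SMTP_PASSWORD=%s\n' 'change-me'
        } > "$1"
    )
}

# Usage: write_env .env
```

Keeping the secrets in .env rather than in docker-compose.yml lets the compose file be committed to version control without them.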

Reverse proxy

# Caddy
ping.example.com {
    reverse_proxy 127.0.0.1:8000
}

Healthchecks needs to be HTTPS-reachable from every host that will ping it.

Create your first check

In the UI, click "+ Add Check":

Name: "Nightly DB backup"
Schedule: choose Simple with period "1 day" + grace "1 hour" (allow up to 1 day + 1h between pings before alerting). Or choose Cron and paste the exact cron expression for an irregular schedule.
Tags: optional grouping ("prod", "backup", "weekly").
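
The period and grace arithmetic for a Simple schedule can be sketched as below. This is a simplified model rather than Healthchecks' own code; the function name check_state and the epoch timestamps are hypothetical:

```sh
# check_state NOW LAST_PING PERIOD GRACE  (all in seconds)
# Prints "up" until the period has elapsed since the last ping,
# "late" during the grace time, and "down" once period + grace has passed.
check_state() {
    now=$1 last=$2 period=$3 grace=$4
    if [ "$now" -lt $((last + period)) ]; then
        echo up
    elif [ "$now" -lt $((last + period + grace)) ]; then
        echo late
    else
        echo down
    fi
}

# Usage, with period "1 day" (86400 s) and grace "1 hour" (3600 s):
#   check_state 1700088000 1700000000 86400 3600   # prints "late"
#   check_state 1700090000 1700000000 86400 3600   # prints "down"
```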

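The check page shows its unique ping URL: https://ping.example.com/ping/<uuid>. (A self-hosted instance serves pings under SITE_ROOT/ping/ unless PING_ENDPOINT is set to something else; always copy the URL exactly as the check page displays it.)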

Ping from the job

At the end of a successful job, hit the ping URL:

# crontab entry (keep it on one line; crontab has no backslash line continuation)
15 3 * * *  /usr/local/bin/db-backup.sh && curl -fsS -m 10 --retry 5 -o /dev/null https://ping.example.com/ping/<uuid>

# Or as a systemd ExecStartPost (see /tutorials/systemd-timers-cron-replacement.html)
[Service]
# oneshot: ExecStartPost runs only after ExecStart has exited successfully
Type=oneshot
ExecStart=/usr/local/bin/db-backup.sh
ExecStartPost=/usr/bin/curl -fsS -m 10 --retry 5 -o /dev/null \
    https://ping.example.com/ping/<uuid>

The && means the ping fires only on success (exit 0). If the job fails, no ping is sent; once the period plus the grace time has passed without a ping, Healthchecks marks the check as down and alerts.

Ping success / failure / start / log

# Plain success
curl https://ping.example.com/ping/<uuid>

# Explicit failure ping (alerts immediately instead of waiting for the timeout)
curl https://ping.example.com/ping/<uuid>/fail

# Mark a long-running job as started, then success
curl https://ping.example.com/ping/<uuid>/start
./run-job.sh
curl https://ping.example.com/ping/<uuid>

# Include log output as the ping body (shown in the UI).
# This pipeline reports success whatever the job's exit status.
./run-job.sh 2>&1 | curl --data-binary @- https://ping.example.com/ping/<uuid>
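
The start, success, failure and log pings can be combined in one small wrapper. The sketch below relies on the ping API accepting the job's exit status as the last path segment (0 is success, anything else is failure). The function name hc_run and the 10000-byte cap on the log body are choices made for this example, not Healthchecks defaults you should rely on:

```sh
# hc_run PING_URL COMMAND [ARGS...]
# Sends a start ping, runs COMMAND, then reports its exit status
# together with the tail of its combined stdout/stderr.
hc_run() {
    ping_url=$1
    shift

    # Start ping; a monitoring outage must never block the job itself
    curl -fsS -m 10 --retry 5 -o /dev/null "$ping_url/start" || true

    # Run the job, keeping its output and exit status
    status=0
    output=$("$@" 2>&1) || status=$?

    # Report: PING_URL/0 means success, any other status means failure
    printf '%s' "$output" | tail -c 10000 |
        curl -fsS -m 10 --retry 5 -o /dev/null \
            --data-binary @- "$ping_url/$status" || true

    return "$status"
}

# Usage, with PING_URL set to the URL shown on the check page:
#   hc_run "$PING_URL" /usr/local/bin/db-backup.sh
```

The wrapper returns the job's own exit status, so cron mail or a systemd unit still sees failures as before.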

Wrap any command with runitor

runitor wraps any command, sending start + success/failure pings around it automatically:

runitor -uuid <uuid> -api-url https://ping.example.com/ping -- /usr/local/bin/db-backup.sh

One binary, drop-in around any cron command. Handles start, success, failure, stdout/stderr capture, exit-code reporting.

Notification channels

Under Integrations, configure each channel once; then, on each check, attach the channels that should alert:

Email (already-configured SMTP)
Slack / Discord / Microsoft Teams webhooks
PagerDuty / Opsgenie / Splunk On-Call (formerly VictorOps)
Pushover / Pushbullet / Gotify
SMS via Twilio
Generic webhook (POST to any URL)
ntfy.sh / self-hosted ntfy
Matrix (see that tutorial)
Signal (via signal-cli)

The webhook channel + n8n (see that tutorial) covers any custom integration.

Status pages

Per project, you can optionally publish a public status page that surfaces the checks: green/yellow/red status and uptime history. This is useful for transparency with stakeholders ("yes, the nightly export ran"). Publishing is an explicit opt-in; projects that require authentication expose nothing.

API for programmatic management

The Healthchecks Management API lets you create / update / delete checks programmatically:

curl -X POST https://ping.example.com/api/v3/checks/ \
    -H "X-Api-Key: <api-key>" \
    -H "Content-Type: application/json" \
    -d '{
      "name": "Nightly backup — web-01",
      "schedule": "0 3 * * *",
      "grace": 3600,
      "tags": "prod backup",
      "channels": "*"
    }'

Useful for Ansible / Terraform / config-management that creates a check per host or per service automatically.
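
As a rough illustration of the per-host pattern without a config-management tool, the loop below posts one check per hostname. The "unique" field asks the API to update an existing check with the same name instead of creating a duplicate. The function names and host list are hypothetical, and the JSON is built with printf, so hostnames must not contain quotes or backslashes:

```sh
# check_payload HOST: print the JSON body for that host's backup check
check_payload() {
    printf '{"name": "Nightly backup - %s", "schedule": "0 3 * * *", "grace": 3600, "tags": "prod backup", "channels": "*", "unique": ["name"]}\n' "$1"
}

# create_checks API_KEY HOST...: create (or update) one check per host
create_checks() {
    api_key=$1
    shift
    for host in "$@"; do
        check_payload "$host" |
            curl -fsS -m 10 -X POST \
                -H "X-Api-Key: $api_key" \
                -H "Content-Type: application/json" \
                --data-binary @- \
                https://ping.example.com/api/v3/checks/
        echo    # newline after each JSON response
    done
}

# Usage: create_checks "<api-key>" web-01 web-02 db-01
```

Each response includes the new check's ping URL, which the calling script can hand to the host that runs the job.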

Backups for Healthchecks itself

The SQLite database at ./data/hc.sqlite holds everything: accounts, checks, ping history and integrations. A nightly restic backup of the ./data directory (see that tutorial) covers it. For larger setups, switch to Postgres and back it up separately.
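
Copying a SQLite file while the application is writing to it can capture a half-written state. One way to avoid that, assuming the sqlite3 command-line tool is installed on the host and can read the data directory, is to take a consistent snapshot first and back up the snapshot. The function name snapshot_db and the destination path are illustrative:

```sh
# snapshot_db SRC DEST: write a consistent copy of the SQLite database SRC
# to DEST using sqlite3's .backup command (DEST must not contain a single quote)
snapshot_db() {
    src=$1 dest=$2
    sqlite3 "$src" ".backup '$dest'"
}

# Usage: snapshot first, then point restic (or any file backup) at the copy
#   snapshot_db ./data/hc.sqlite ./backup/hc-snapshot.sqlite
```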

Where this catches things

The script that broke six months ago because a referenced binary moved — cron silently fails; Healthchecks notices.
The home internet that goes down and takes the backup VPN with it; the cloud-side health check stops getting pings.
The host that ran out of disk and stopped writing to the database the backup expected to read.
The certbot renewal that hasn't fired in 80 days because the .timer was disabled by accident.

The passive nature is the key insight: "if you don't hear from me, alert." Active monitoring (Beszel; see that tutorial) catches different failures, so the two complement each other. Run both, and silent failures lose most of their hiding places.