The Collector's pipeline model

Three component classes:

  • Receivers — accept telemetry from sources. OTLP/gRPC, OTLP/HTTP, Jaeger, Zipkin, Prometheus scrape, statsd, syslog, journald, kubeletstats, host metrics, file logs.
  • Processors — modify telemetry in flight. Batch, memory_limiter, attributes (add/remove/redact tags), filter, tail_sampling, transform.
  • Exporters — send telemetry out. OTLP/gRPC, Prometheus remote_write, Loki, Tempo, Jaeger, Elasticsearch, Datadog, Honeycomb, AWS X-Ray, file.

A pipeline wires receivers → processors → exporters per signal type (traces, metrics, logs).

Two distributions

  • otelcol (core) — the minimal set of receivers/exporters, ~30 MB binary.
  • otelcol-contrib — the core components plus everything maintained in the contrib repository; bigger binary but covers nearly every vendor + niche protocol.

For "I want to send anything to anything," use otelcol-contrib. For a hardened production deployment, build a custom binary with only the components you actually need via the ocb (OpenTelemetry Collector Builder).
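As a rough sketch of what an ocb build manifest can look like, the fragment below selects a handful of the components used later in this section. The file name builder-config.yaml, the dist name, and the output path are illustrative assumptions; the module versions are pinned to the same 0.114.0 release used in the install step.

```yaml
# builder-config.yaml (hypothetical file name)
# Build with: ocb --config builder-config.yaml   (needs a Go toolchain)
dist:
  name: otelcol-custom                 # assumed binary name
  description: Collector with only the components this host needs
  output_path: ./otelcol-custom        # assumed output directory

receivers:
  - gomod: go.opentelemetry.io/collector/receiver/otlpreceiver v0.114.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/hostmetricsreceiver v0.114.0

processors:
  - gomod: go.opentelemetry.io/collector/processor/memorylimiterprocessor v0.114.0
  - gomod: go.opentelemetry.io/collector/processor/batchprocessor v0.114.0

exporters:
  - gomod: go.opentelemetry.io/collector/exporter/otlphttpexporter v0.114.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusremotewriteexporter v0.114.0

extensions:
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/extension/healthcheckextension v0.114.0
```

A config that references a component left out of the manifest will fail at startup, so the manifest and the runtime config have to be kept in step.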

Install

OCV=0.114.0
curl -L -o /tmp/otelcol-contrib.tar.gz \
    "https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v${OCV}/otelcol-contrib_${OCV}_linux_amd64.tar.gz"
tar -xzf /tmp/otelcol-contrib.tar.gz -C /tmp
sudo mv /tmp/otelcol-contrib /usr/local/bin/

sudo useradd -r -s /sbin/nologin otelcol
sudo mkdir -p /etc/otelcol
sudo chown otelcol:otelcol /etc/otelcol

A working config: app traces and logs + host metrics, exported to Tempo, Loki, and Prometheus

# /etc/otelcol/config.yaml
receivers:
  otlp:
    protocols:
      grpc: { endpoint: 0.0.0.0:4317 }
      http: { endpoint: 0.0.0.0:4318 }

  hostmetrics:
    collection_interval: 30s
    scrapers:
      cpu:        {}
      memory:     {}
      disk:       {}
      filesystem: {}
      network:    {}
      load:       {}
      processes:  {}

  prometheus:
    config:
      scrape_configs:
        - job_name: 'self'
          scrape_interval: 30s
          static_configs:
            - targets: ['localhost:8888']        # the collector's own metrics

processors:
  memory_limiter:
    check_interval: 5s
    limit_percentage: 75
    spike_limit_percentage: 25

  resourcedetection:
    detectors: [env, system, docker]
    timeout: 2s

  batch:
    timeout: 1s
    send_batch_size: 1024

  # Regex redaction is done with the transform processor (OTTL); the
  # attributes processor's update action cannot rewrite part of a value.
  transform/redact:
    error_mode: ignore
    trace_statements:
      - context: span
        statements:
          - replace_pattern(attributes["http.url"], "token=[^&]+", "token=REDACTED")
    log_statements:
      - context: log
        statements:
          - replace_pattern(attributes["http.url"], "token=[^&]+", "token=REDACTED")

  tail_sampling:
    decision_wait: 10s
    policies:
      - name: errors
        type: status_code
        status_code: { status_codes: [ERROR] }
      - name: slow
        type: latency
        latency: { threshold_ms: 500 }
      - name: random-10pct
        type: probabilistic
        probabilistic: { sampling_percentage: 10 }

exporters:
  otlphttp/tempo:
    endpoint: http://tempo.lab:4318

  prometheusremotewrite:
    endpoint: http://prometheus.lab:9090/api/v1/write

  otlphttp/loki:
    endpoint: http://loki.lab:3100/otlp

extensions:
  health_check:    { endpoint: 0.0.0.0:13133 }
  pprof:           { endpoint: 0.0.0.0:1777 }
  zpages:          { endpoint: 0.0.0.0:55679 }

service:
  extensions: [health_check, pprof, zpages]

  pipelines:
    traces:
      receivers:  [otlp]
      processors: [memory_limiter, resourcedetection, tail_sampling, transform/redact, batch]
      exporters:  [otlphttp/tempo]

    metrics:
      receivers:  [otlp, hostmetrics, prometheus]
      processors: [memory_limiter, resourcedetection, batch]
      exporters:  [prometheusremotewrite]

    logs:
      receivers:  [otlp]
      processors: [memory_limiter, resourcedetection, transform/redact, batch]
      exporters:  [otlphttp/loki]

Three pipelines, one per signal type, share the memory_limiter, resourcedetection, and batch processors. Traces additionally get tail-based sampling (keep all errors, all requests slower than 500 ms, and 10% of the rest); traces and logs both pass through the redaction processor.

systemd unit

# /etc/systemd/system/otelcol.service
[Unit]
Description=OpenTelemetry Collector
After=network-online.target
Wants=network-online.target

[Service]
User=otelcol
Group=otelcol
ExecStart=/usr/local/bin/otelcol-contrib --config /etc/otelcol/config.yaml
Restart=always

[Install]
WantedBy=multi-user.target

Reload systemd, start the service, and check that it is up:

sudo systemctl daemon-reload
sudo systemctl enable --now otelcol
curl http://localhost:13133/         # health check
curl http://localhost:55679/debug/servicez   # zpages introspection (or open it in a browser)

Pointing apps at the Collector

Most OTel SDKs default to http://localhost:4318 with OTLP/HTTP, or localhost:4317 with OTLP/gRPC. So an app on the same host just works:

OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
OTEL_SERVICE_NAME=my-app
OTEL_RESOURCE_ATTRIBUTES="deployment.environment=prod"

For containers or remote hosts, point at the Collector's network address. The Collector handles network buffering, retries, and back-pressure; apps don't need to know if Tempo is up.
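The buffering and retry behaviour lives on the exporter side of the Collector. As a sketch, the fragment below spells out the retry and queue settings on the Tempo exporter from the config above; the values shown are illustrative and should be tuned to your traffic rather than read as recommendations.

```yaml
exporters:
  otlphttp/tempo:
    endpoint: http://tempo.lab:4318
    retry_on_failure:
      enabled: true
      initial_interval: 5s      # wait before the first retry
      max_interval: 30s         # cap on the backoff between retries
      max_elapsed_time: 300s    # give up on a batch after this long
    sending_queue:
      enabled: true
      num_consumers: 10         # parallel senders draining the queue
      queue_size: 1000          # batches held while the backend is unreachable
```

The queue is held in memory unless persistent storage is configured, so a Collector restart during a backend outage loses whatever was queued.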

Two deployment shapes

  1. Agent — one Collector per host (on K8s, one per node via a DaemonSet, or one per pod as a sidecar). Receives from local apps, forwards to a central Collector. Adds host-level enrichment (hostname, K8s labels, cloud region) close to the source.
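  2. Gateway — a centralized Collector cluster behind a load balancer. Heavy processing (tail sampling, complex routing) lives here. Agents talk to it. For tail sampling to see whole traces, every span of a trace has to reach the same gateway instance.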

The two-tier agent-then-gateway shape is the standard production layout.
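A minimal sketch of the agent side of that layout, assuming a gateway reachable at the hypothetical address otel-gateway.lab:4317 over plaintext gRPC on a trusted network. The agent only enriches and batches; sampling and backend-specific exporters would sit in the gateway's own config.

```yaml
# Agent config on each host (gateway address is an assumption)
receivers:
  otlp:
    protocols:
      grpc: { endpoint: localhost:4317 }
      http: { endpoint: localhost:4318 }

processors:
  memory_limiter:
    check_interval: 5s
    limit_percentage: 75
    spike_limit_percentage: 25
  resourcedetection:
    detectors: [env, system]
  batch: {}

exporters:
  otlp/gateway:
    endpoint: otel-gateway.lab:4317   # hypothetical gateway address
    tls:
      insecure: true                  # assumes a trusted network; use TLS otherwise

service:
  pipelines:
    traces:
      receivers:  [otlp]
      processors: [memory_limiter, resourcedetection, batch]
      exporters:  [otlp/gateway]
```

With more than one gateway replica doing tail sampling, the agents would need trace-aware routing (the contrib loadbalancing exporter with routing_key: traceID is one option) rather than a plain otlp exporter behind a round-robin load balancer.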

Tail-based sampling is the killer feature

Head-based sampling (deciding whether to keep a trace at the moment it starts) is cheap but blind: you don't know yet whether the request will error. Tail-based sampling buffers each trace for decision_wait and then decides: "keep this one because it errored" or "drop this one because it's a routine 5ms healthcheck." The Collector's tail_sampling processor implements it; the cost is RAM proportional to the number of in-flight traces, which is roughly the rate of new traces × decision_wait.
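To make the memory cost concrete, here is a sizing sketch for the tail_sampling processor. The traffic figures (2,000 new traces per second, about 20 KB per buffered trace) are assumptions chosen for the arithmetic, not measurements.

```yaml
processors:
  tail_sampling:
    decision_wait: 10s
    # Assumed load: 2,000 new traces/s, ~20 KB per buffered trace.
    #   in-flight traces ≈ 2,000/s × 10 s   = 20,000
    #   buffer memory    ≈ 20,000 × 20 KB   ≈ 400 MB
    num_traces: 50000                  # upper bound on traces held in memory (2.5× headroom here)
    expected_new_traces_per_sec: 2000  # sizing hint for internal allocation
    policies:
      - name: errors
        type: status_code
        status_code: { status_codes: [ERROR] }
```

Doubling decision_wait roughly doubles the buffer, so it is worth setting it only as long as your slowest traces of interest actually take.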

Backend pairings

  • Traces — Grafana Tempo (object-storage-backed, very cheap), Jaeger, Honeycomb, Datadog APM, AWS X-Ray.
  • Metrics — Prometheus (see that tutorial), VictoriaMetrics, Mimir, Datadog Metrics, CloudWatch.
  • Logs — Loki (see that tutorial), Elasticsearch / OpenSearch, Datadog Logs, CloudWatch Logs.

The Collector pattern means swapping any of these is one config change; the apps don't notice.
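As an illustration of such a swap, the fragment below sends traces to a Jaeger instance over OTLP/gRPC instead of Tempo. The host jaeger.lab and the plaintext connection are assumptions; only the exporter definition and the pipeline's exporters list change.

```yaml
exporters:
  otlp/jaeger:
    endpoint: jaeger.lab:4317   # hypothetical Jaeger host accepting OTLP/gRPC
    tls:
      insecure: true            # assumes a trusted network

service:
  pipelines:
    traces:
      # receivers and processors stay as they are
      exporters: [otlp/jaeger]  # was [otlphttp/tempo]
```

Listing both exporters in the pipeline for a while is a simple way to run the old and new backends side by side during a migration.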

OTel vs Vector vs Fluent Bit

For log shipping specifically, Vector and Fluent Bit are smaller binaries with similar capabilities, so either is the slimmer choice for a logs-only pipeline. The OTel Collector wins when you also want first-class trace and metric pipelines in the same tool.