
Turn log queries into status signals.

Observer reads from Loki and Elasticsearch. An error rate or a specific pattern becomes a number, and that number drives your status page like any other metric.

[Animation: a live stream of log lines (tail -f /var/log/checkout.log) from the auth, billing, workers, and checkout services passes through the filter count_over_time({app="checkout"} |= "ERROR" [5m]). The resulting count of errors per 5 minutes is compared against a threshold, and the verdict for the checkout service reads Operational or Degraded.]
Thousands of lines in, one number out. When the count crosses its threshold, the verdict flips.

Observer is not a log aggregation platform, and it does not try to be. Keep Loki and Elasticsearch for searching, retention, and forensics. Observer reads only the one signal inside them that your customers actually feel: a query that returns a number. It turns that number into status.

Sources

Where the query runs.

Two log stores, one contract: a query that aggregates many lines into a single value. That value is all Observer carries forward.

Grafana Loki
LogQL queries for error rates, pattern frequency, and event counts. The aggregation runs where the logs already live.

Elasticsearch & OpenSearch
Query DSL with aggregations. A bucket count or a metric aggregation collapses to one number Observer can read.
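As a rough illustration of that contract, the sketch below collapses each store's response to a single float. It is not Observer's implementation. It assumes the response shapes that Loki's instant-query endpoint and the Elasticsearch search API return for aggregations, and the function names loki_scalar and es_scalar are hypothetical.

```python
from typing import Any


def loki_scalar(response: dict[str, Any]) -> float:
    """Collapse a Loki instant-query response to one number.

    Assumes a metric query, so data.resultType is "vector" and every
    sample carries value = [timestamp, "number-as-string"]. Samples are
    summed so a result with several streams still yields one value.
    """
    data = response["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']!r}")
    return float(sum(float(sample["value"][1]) for sample in data["result"]))


def es_scalar(response: dict[str, Any], agg_name: str) -> float:
    """Read one number from a named aggregation in a search response.

    Single-value metric aggregations expose "value"; a single-bucket
    aggregation such as a filter exposes "doc_count".
    """
    agg = response["aggregations"][agg_name]
    if "value" in agg:
        # Some metric aggregations return null when no documents match.
        # Treating that as 0.0 is an assumption made for this sketch.
        return 0.0 if agg["value"] is None else float(agg["value"])
    return float(agg["doc_count"])
```

Summing across streams is one way to guarantee a single value; writing the LogQL query with an outer sum() so the store returns one sample is the other.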
How it works

A query, then nothing special.

The work is in writing a query that returns one number. After that, a log-derived metric is just a metric.

  1. Point at your log store

    Give the agent read-only access to a Loki or Elasticsearch URL. Nothing is copied out; the query runs against your cluster.

  2. Write a query that returns a number

    A LogQL range aggregation or an ES aggregation. The result has to be a single value, not a page of log lines.

  3. The agent runs it on schedule

    Every 10 to 30 seconds the agent executes the query and reads back the one number it returns.

  4. The number behaves like any metric

    Same threshold, same dwell, same SLO and incident logic. A log-derived signal is indistinguishable downstream.
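The last step can be pictured with a small state machine. This is a minimal sketch, not Observer's actual logic: the Verdict class is hypothetical, "crossing" is assumed to mean strictly greater than the threshold, and dwell is modeled as the number of consecutive samples that must disagree with the current status before it flips.

```python
from dataclasses import dataclass, field


@dataclass
class Verdict:
    threshold: float
    dwell: int  # consecutive disagreeing samples required to flip (assumed meaning)
    status: str = "Operational"
    _streak: int = field(default=0, init=False, repr=False)

    def observe(self, value: float) -> str:
        """Feed one polled value and return the current status."""
        wanted = "Degraded" if value > self.threshold else "Operational"
        if wanted == self.status:
            # The sample agrees with the current status, so any streak resets.
            self._streak = 0
        else:
            self._streak += 1
            if self._streak >= self.dwell:
                self.status = wanted
                self._streak = 0
        return self.status


verdict = Verdict(threshold=100, dwell=3)
for count in [50, 120, 130, 90, 120, 130, 140]:
    status = verdict.observe(count)
# The first two breaches are cancelled by the 90; the next three in a row flip it.
print(status)  # Degraded
```

With a poll every 10 to 30 seconds, a dwell of 3 under this model means a brief spike in the error count never reaches the status page, while a sustained one does within about a minute and a half at most.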

In practice

Signals buried in logs, surfaced.

The events worth a status change are usually already in your logs. Observer is the part that watches for them.

  • 5xx rate from access logs becomes the service-health verdict on a public status page.

  • A specific exception pattern crosses its frequency threshold and drafts an incident for on-call.

  • Successful login count dropping toward zero surfaces as an early security signal.

  • Background job completion rate falling below target reads as a reliability metric, not a buried log line.