SigNoz — How It Works

How SigNoz processes telemetry through its OTel-native pipeline, stores data in ClickHouse, and provides unified observability.

Data Pipeline

Ingestion Flow

flowchart LR
    subgraph Sources["Data Sources"]
        APP["App + OTel SDK"]
        PROM["Prometheus"]
        JAEG["Jaeger / Zipkin"]
        FB["FluentBit / FluentD"]
    end

    subgraph Collector["SigNoz OTel Collector"]
        Recv["Receivers\n(OTLP, Jaeger, Zipkin,\nPrometheus)"]
        Proc["Processors\n(batch, memory_limiter,\nattribute, tail_sampling)"]
        Exp["Exporters\n(ClickHouse)"]
    end

    subgraph Backend["SigNoz Backend"]
        QS["Query Service\n(Go API)"]
        FE["React Frontend"]
        Rule["Ruler /\nAlertmanager"]
        OpAMP["OpAMP Server\n(dynamic config)"]
    end

    subgraph CH["ClickHouse Cluster"]
        T["signoz_traces"]
        L["signoz_logs"]
        M["signoz_metrics"]
    end

    Sources --> Recv --> Proc --> Exp --> CH
    QS --> CH
    Rule --> CH
    QS --> FE
    OpAMP -.->|reconfigure| Collector

OTel Collector Distribution

SigNoz ships a custom OpenTelemetry Collector distribution that includes:

| Component | Purpose |
|---|---|
| OTLP Receiver | Primary ingestion (gRPC + HTTP) |
| Prometheus Receiver | Scrapes Prometheus targets |
| Jaeger/Zipkin Receiver | Legacy trace format support |
| FluentForward Receiver | FluentBit/FluentD log ingestion |
| Batch Processor | Batches data for efficient ClickHouse writes |
| Memory Limiter | Prevents OOM under load |
| Tail Sampling | Samples traces based on latency/error criteria |
| ClickHouse Exporter | Writes all signals to ClickHouse |
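The components in the table above are wired together through the collector's standard YAML configuration. A minimal illustrative sketch of how such a pipeline could be assembled (endpoints, scrape targets, and limits here are placeholders, not SigNoz's shipped defaults):

```yaml
# Illustrative collector pipeline sketch -- not SigNoz's actual shipped config.
receivers:
  otlp:
    protocols:
      grpc:
      http:
  prometheus:
    config:
      scrape_configs:
        - job_name: example-app          # placeholder scrape target
          static_configs:
            - targets: ["localhost:9100"]

processors:
  memory_limiter:                        # runs first to shed load before OOM
    check_interval: 1s
    limit_mib: 1024
  batch:                                 # batches rows for efficient CH inserts
    send_batch_size: 10000
    timeout: 5s

exporters:
  clickhouse:
    endpoint: tcp://clickhouse:9000      # placeholder endpoint

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [clickhouse]
    metrics:
      receivers: [otlp, prometheus]
      processors: [memory_limiter, batch]
      exporters: [clickhouse]
```

Note the processor order: `memory_limiter` is conventionally placed before `batch` so backpressure is applied before data accumulates.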

OpAMP (Open Agent Management Protocol)

SigNoz uses OpAMP for dynamic reconfiguration of the OTel Collector:

  • Log pipelines: Add/modify log processing rules without collector restart
  • Sampling rules: Adjust tail sampling dynamically
  • Collector health: Monitor collector instances from the SigNoz UI
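The tail-sampling rules that OpAMP adjusts decide, after all of a trace's spans have arrived, whether to keep the whole trace. A minimal sketch of that latency/error policy logic (illustrative only; the collector's real `tail_sampling` processor is configured declaratively, not written by hand):

```python
# Sketch of a tail-sampling decision: keep a trace if any span errored,
# or if the trace as a whole exceeded a latency threshold.
from dataclasses import dataclass

@dataclass
class Span:
    duration_nano: int
    status_code: str  # "OK" or "ERROR"

def keep_trace(spans: list[Span], latency_threshold_nano: int = 500_000_000) -> bool:
    if any(s.status_code == "ERROR" for s in spans):
        return True  # error policy: always keep failed traces
    # latency policy: the root span's duration bounds the trace duration
    return max(s.duration_nano for s in spans) >= latency_threshold_nano
```

Because the decision needs the complete trace, tail sampling buffers spans in the collector until the trace is deemed finished, which is why it trades memory for smarter sampling than head-based approaches.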

Storage Schema (ClickHouse)

Traces

-- signoz_traces.signoz_index_v2
-- Core trace/span index with columnar storage
-- Columns: traceID, spanID, serviceName, name, kind, durationNano,
--          statusCode, httpMethod, httpRoute, resourceAttributes, ...
-- Engine: MergeTree, partitioned by toDate(timestamp)
-- TTL: Configurable (default 7 days self-hosted, 15 days cloud)

Logs

-- signoz_logs.logs
-- Columnar log storage with full-text indexing
-- Columns: timestamp, body, severityText, severityNumber,
--          traceID, spanID, resourceAttributes, logAttributes
-- Engine: MergeTree, partitioned by toDate(timestamp)
-- Supports: JSON expansion, attribute indexing

Metrics

-- signoz_metrics.samples_v4
-- Time-series samples with metric metadata
-- Columns: metric_name, fingerprint, timestamp_ms, value,
--          labels (Map), temporality, type
-- Engine: MergeTree, partitioned by toDate(timestamp_ms)
-- Query: PromQL translated to ClickHouse SQL
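The `fingerprint` column keys each time series by its label set, so the same series always maps to the same rows regardless of label ordering. A sketch of how such a fingerprint could be derived (SigNoz's actual hashing scheme may differ; this just shows the idea):

```python
# Sketch: derive a stable series fingerprint from metric name + labels.
# Sorting the labels makes the hash independent of insertion order.
import hashlib

def fingerprint(metric_name: str, labels: dict[str, str]) -> int:
    canonical = metric_name + "".join(
        f"\x00{k}\x01{v}" for k, v in sorted(labels.items())
    )
    return int.from_bytes(hashlib.sha256(canonical.encode()).digest()[:8], "big")
```

Separator bytes between keys and values guard against collisions like `{"ab": "c"}` vs `{"a": "bc"}`.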

Query Execution

Dual Query Language Support

| Signal | Query Language | How It Works |
|---|---|---|
| Metrics | PromQL | Translated to ClickHouse SQL by the query service |
| Logs | ClickHouse SQL | Direct columnar queries with filter pushdown |
| Traces | ClickHouse SQL | Span-level queries with attribute filtering |
| All | Query Builder | Visual query builder generates optimized CH SQL |

Query Builder → ClickHouse Translation

The React frontend's visual query builder generates structured query payloads that the Go query service translates into optimized ClickHouse SQL:

  1. User builds query visually (aggregation, filters, group-by)
  2. Frontend sends structured JSON payload to API
  3. Query Service compiles to ClickHouse SQL with proper materialized column usage
  4. ClickHouse executes with columnar vectorized processing
  5. Results returned as time-series or table data
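Steps 2–3 above can be sketched as a small compiler from the builder's structured payload to ClickHouse SQL. The payload shape and compilation here are illustrative (the real query service handles materialized columns, time ranges, and many more operators); table and column names follow the trace schema shown earlier:

```python
# Sketch: compile a query-builder payload (aggregation, filters, group-by)
# into ClickHouse SQL against the trace index table.
def compile_query(payload: dict) -> str:
    where = " AND ".join(
        f"{f['column']} = '{f['value']}'" for f in payload.get("filters", [])
    ) or "1"  # no filters -> match everything
    group = ", ".join(payload.get("group_by", []))
    select = f"{group + ', ' if group else ''}{payload['aggregation']}(durationNano) AS value"
    sql = f"SELECT {select} FROM signoz_traces.signoz_index_v2 WHERE {where}"
    if group:
        sql += f" GROUP BY {group}"
    return sql

# Usage: average latency per route for one service
compile_query({
    "aggregation": "avg",
    "filters": [{"column": "serviceName", "value": "checkout"}],
    "group_by": ["httpRoute"],
})
```

(A production compiler would parameterize filter values rather than interpolating strings; the inline form is only for readability here.)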

Cross-Signal Correlation

SigNoz enables correlation between signals using shared identifiers:

flowchart LR
    Trace["Trace\n(traceID)"] <-->|traceID in log| Log["Log\n(traceID, spanID)"]
    Trace <-->|service + timestamp| Metric["Metric\n(service, operation)"]
    Log <-->|service + timestamp| Metric

  • Trace → Log: Click a span to see logs with matching traceID
  • Log → Trace: Click a log with traceID to jump to the trace waterfall
  • Metric → Trace: Drill down from a latency spike to exemplar traces
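The trace-to-log hop works because log rows carry the trace context columns (`traceID`, `spanID`) from the schema above. A minimal sketch of that correlation, indexing log records by traceID so a span click can surface its logs (record shapes here are illustrative):

```python
# Sketch: index log records by traceID for trace->log correlation.
# Logs emitted outside any span have no traceID and are skipped.
from collections import defaultdict

def logs_for_traces(logs: list[dict]) -> dict[str, list[dict]]:
    by_trace: dict[str, list[dict]] = defaultdict(list)
    for rec in logs:
        if tid := rec.get("traceID"):
            by_trace[tid].append(rec)
    return by_trace

# Usage
logs = [
    {"traceID": "abc123", "body": "payment failed"},
    {"body": "startup complete"},  # no trace context
]
index = logs_for_traces(logs)
```

In practice this lookup is a ClickHouse filter on the `traceID` column rather than an in-memory index, but the join key is the same.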

Alerting Pipeline

flowchart LR
    Rule["Alert Rule\n(PromQL / CH SQL)"] --> Eval["Ruler\n(periodic eval)"]
    Eval -->|threshold breach| AM["Alertmanager"]
    AM --> Slack["Slack"]
    AM --> PD["PagerDuty"]
    AM --> WH["Webhook"]
    AM --> Email["Email"]
    AM --> MST["MS Teams"]

  • Rules can be defined on any signal type (metrics, logs, traces)
  • Anomaly detection available for automated threshold learning
  • Alert history tracked with state transitions
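One pass of the ruler's evaluation loop can be sketched as: run the rule's query, compare against the threshold, and notify Alertmanager only on state transitions. Names and shapes here are illustrative; the real ruler also handles for-durations, grouping, and silences:

```python
# Sketch of a single rule evaluation: query -> compare -> state transition.
from typing import Callable

def evaluate_rule(query: Callable[[], float], threshold: float,
                  prev_state: str) -> tuple[str, bool]:
    value = query()
    state = "firing" if value > threshold else "ok"
    notify = state != prev_state  # only notify when the state changes
    return state, notify

# Usage: error rate 0.9 crosses a 0.5 threshold from an "ok" state
evaluate_rule(lambda: 0.9, threshold=0.5, prev_state="ok")
```

Tracking `prev_state` is what produces the alert-history state transitions mentioned above, and suppressing repeat notifications while a rule stays firing.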

Sources