# SigNoz — How It Works
How SigNoz processes telemetry through its OTel-native pipeline, stores data in ClickHouse, and provides unified observability.
## Data Pipeline

### Ingestion Flow
```mermaid
flowchart LR
  subgraph Sources["Data Sources"]
    APP["App + OTel SDK"]
    PROM["Prometheus"]
    JAEG["Jaeger / Zipkin"]
    FB["FluentBit / FluentD"]
  end
  subgraph Collector["SigNoz OTel Collector"]
    Recv["Receivers\n(OTLP, Jaeger, Zipkin,\nPrometheus)"]
    Proc["Processors\n(batch, memory_limiter,\nattribute, tail_sampling)"]
    Exp["Exporters\n(ClickHouse)"]
  end
  subgraph Backend["SigNoz Backend"]
    QS["Query Service\n(Go API)"]
    FE["React Frontend"]
    Rule["Ruler /\nAlertmanager"]
    OpAMP["OpAMP Server\n(dynamic config)"]
  end
  subgraph CH["ClickHouse Cluster"]
    T["signoz_traces"]
    L["signoz_logs"]
    M["signoz_metrics"]
  end
  Sources --> Recv --> Proc --> Exp --> CH
  QS --> CH
  Rule --> CH
  QS --> FE
  OpAMP -.->|reconfigure| Collector
```
### OTel Collector Distribution
SigNoz ships a custom OpenTelemetry Collector distribution that includes:
| Component | Purpose |
|---|---|
| OTLP Receiver | Primary ingestion (gRPC + HTTP) |
| Prometheus Receiver | Scrape Prometheus targets |
| Jaeger/Zipkin Receiver | Legacy trace format support |
| FluentForward Receiver | FluentBit/FluentD log ingestion |
| Batch Processor | Batches data for efficient ClickHouse writes |
| Memory Limiter | Prevents OOM under load |
| Tail Sampling | Samples traces based on latency/error criteria |
| ClickHouse Exporter | Writes all signals to ClickHouse |
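As a rough sketch, a minimal pipeline wiring these components together might look like the following. This is illustrative only, not the shipped SigNoz defaults; in particular, the exporter name and datasource value vary by collector version, so check the configuration bundled with your deployment:

```yaml
# Illustrative collector config sketch -- not the shipped SigNoz defaults.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  memory_limiter:        # runs first, sheds load before it becomes an OOM
    check_interval: 1s
    limit_percentage: 80
  batch:                 # larger batches = fewer, bigger ClickHouse inserts
    send_batch_size: 10000
    timeout: 1s

exporters:
  clickhousetraces:      # exporter name is version-dependent
    datasource: tcp://clickhouse:9000/signoz_traces

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [clickhousetraces]
```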
### OpAMP (Open Agent Management Protocol)
SigNoz uses OpAMP for dynamic reconfiguration of the OTel Collector:
- Log pipelines: Add/modify log processing rules without collector restart
- Sampling rules: Adjust tail sampling dynamically
- Collector health: Monitor collector instances from the SigNoz UI
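The restart-free reconfiguration pattern can be illustrated with a toy sketch. This shows the general remote-config loop only; it is not the real OpAMP protobuf message schema, and every name here is hypothetical:

```python
import hashlib

def apply_remote_config(agent_state: dict, server_response: dict) -> bool:
    """Toy OpAMP-style loop body: apply a new config only when its hash changes.

    Returns True if the collector would be reconfigured in place (no restart).
    """
    body = server_response["config_body"]
    new_hash = hashlib.sha256(body.encode()).hexdigest()
    if agent_state.get("config_hash") == new_hash:
        return False  # config unchanged; nothing to do
    agent_state["config_hash"] = new_hash
    agent_state["effective_config"] = body  # hot-swap pipelines in place
    return True

state = {}
changed = apply_remote_config(state, {"config_body": "processors: [batch]"})
unchanged = apply_remote_config(state, {"config_body": "processors: [batch]"})
```

The hash comparison is why pushing the same config twice is a no-op: only a genuinely new revision triggers a pipeline reload.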
## Storage Schema (ClickHouse)

### Traces
```sql
-- signoz_traces.signoz_index_v2
-- Core trace/span index with columnar storage
-- Columns: traceID, spanID, serviceName, name, kind, durationNano,
--          statusCode, httpMethod, httpRoute, resourceAttributes, ...
-- Engine: MergeTree, partitioned by toDate(timestamp)
-- TTL: Configurable (default 7 days self-hosted, 15 days cloud)
```
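To make the index table concrete, here is a hand-written sketch of the kind of query it serves, built from the column names above. This is an illustration, not SQL emitted by the query service:

```python
def p99_by_service_sql(start_iso: str, end_iso: str) -> str:
    """Example query sketch: p99 span latency per service over a time window,
    using the columns documented for signoz_traces.signoz_index_v2."""
    return (
        "SELECT serviceName, quantile(0.99)(durationNano) / 1e6 AS p99_ms "
        "FROM signoz_traces.signoz_index_v2 "
        f"WHERE timestamp BETWEEN '{start_iso}' AND '{end_iso}' "
        "GROUP BY serviceName ORDER BY p99_ms DESC"
    )

sql = p99_by_service_sql("2024-01-01 00:00:00", "2024-01-01 01:00:00")
```

Because the table is partitioned by `toDate(timestamp)`, the time-range predicate lets ClickHouse skip whole partitions before scanning any columns.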
### Logs
```sql
-- signoz_logs.logs
-- Columnar log storage with full-text indexing
-- Columns: timestamp, body, severityText, severityNumber,
--          traceID, spanID, resourceAttributes, logAttributes
-- Engine: MergeTree, partitioned by toDate(timestamp)
-- Supports: JSON expansion, attribute indexing
```
### Metrics
```sql
-- signoz_metrics.samples_v4
-- Time-series samples with metric metadata
-- Columns: metric_name, fingerprint, timestamp_ms, value,
--          labels (Map), temporality, type
-- Engine: MergeTree, partitioned by toDate(timestamp_ms)
-- Query: PromQL translated to ClickHouse SQL
```
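The PromQL-to-SQL step can be sketched with a toy translator for one simple aggregation. This is a deliberately simplified illustration over the documented `samples_v4` columns; the real query service also handles temporality, staleness, and metadata lookups:

```python
def translate_sum_by(metric: str, label: str, step_ms: int = 60_000) -> str:
    """Toy translation of `sum by (<label>) (<metric>)` into ClickHouse SQL,
    bucketing samples into fixed steps the way a range query would."""
    return (
        f"SELECT labels['{label}'] AS {label}, "
        f"intDiv(timestamp_ms, {step_ms}) * {step_ms} AS ts, "
        "sum(value) AS v "
        "FROM signoz_metrics.samples_v4 "
        f"WHERE metric_name = '{metric}' "
        f"GROUP BY {label}, ts ORDER BY ts"
    )

sql = translate_sum_by("http_requests_total", "service")
```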
## Query Execution

### Dual Query Language Support
| Signal | Query Language | How It Works |
|---|---|---|
| Metrics | PromQL | Translated to ClickHouse SQL by the query service |
| Logs | ClickHouse SQL | Direct columnar queries with filter pushdown |
| Traces | ClickHouse SQL | Span-level queries with attribute filtering |
| All | Query Builder | Visual query builder generates optimized CH SQL |
### Query Builder → ClickHouse Translation
The React frontend's visual query builder generates structured query payloads that the Go query service translates into optimized ClickHouse SQL:
1. User builds the query visually (aggregation, filters, group-by)
2. Frontend sends a structured JSON payload to the API
3. Query Service compiles it to ClickHouse SQL with proper materialized column usage
4. ClickHouse executes with columnar vectorized processing
5. Results are returned as time-series or table data
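The compile step can be sketched as a toy payload-to-SQL translator. The payload shape here is hypothetical, invented for illustration; it is not SigNoz's actual API contract:

```python
def compile_builder_query(payload: dict) -> str:
    """Toy compiler from a query-builder-style JSON payload to ClickHouse SQL.

    Supports one aggregation, equality-style filters, and group-by columns.
    """
    agg = payload["aggregation"]          # e.g. {"fn": "count", "column": "*"}
    filters = payload.get("filters", [])
    group_by = payload.get("group_by", [])
    where = " AND ".join(
        f"{f['column']} {f['op']} '{f['value']}'" for f in filters
    ) or "1"
    select = f"{agg['fn']}({agg['column']}) AS value"
    if group_by:
        select = ", ".join(group_by) + ", " + select
    sql = f"SELECT {select} FROM {payload['table']} WHERE {where}"
    if group_by:
        sql += " GROUP BY " + ", ".join(group_by)
    return sql

sql = compile_builder_query({
    "table": "signoz_traces.signoz_index_v2",
    "aggregation": {"fn": "count", "column": "*"},
    "filters": [{"column": "serviceName", "op": "=", "value": "checkout"}],
    "group_by": ["httpRoute"],
})
```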
## Cross-Signal Correlation
SigNoz enables correlation between signals using shared identifiers:
```mermaid
flowchart LR
  Trace["Trace\n(traceID)"] <-->|traceID in log| Log["Log\n(traceID, spanID)"]
  Trace <-->|service + timestamp| Metric["Metric\n(service, operation)"]
  Log <-->|service + timestamp| Metric
```
- Trace → Log: Click a span to see logs with a matching traceID
- Log → Trace: Click a log with a traceID to jump to the trace waterfall
- Metric → Trace: Drill down from a latency spike to exemplar traces
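The trace-to-log hop reduces to a join on the shared identifier. A minimal in-memory sketch with made-up sample data:

```python
def logs_for_span(span: dict, logs: list[dict]) -> list[dict]:
    """Toy trace→log correlation: select log records that carry the span's
    traceID, mirroring how shared identifiers link the two signals."""
    return [log for log in logs if log.get("traceID") == span["traceID"]]

span = {"traceID": "abc123", "spanID": "s1", "serviceName": "checkout"}
logs = [
    {"traceID": "abc123", "body": "payment authorized"},
    {"traceID": "zzz999", "body": "unrelated"},
]
matched = logs_for_span(span, logs)
```

In production the same join runs inside ClickHouse, since both `signoz_traces` and `signoz_logs` store the traceID column.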
## Alerting Pipeline
```mermaid
flowchart LR
  Rule["Alert Rule\n(PromQL / CH SQL)"] --> Eval["Ruler\n(periodic eval)"]
  Eval -->|threshold breach| AM["Alertmanager"]
  AM --> Slack["Slack"]
  AM --> PD["PagerDuty"]
  AM --> WH["Webhook"]
  AM --> Email["Email"]
  AM --> MST["MS Teams"]
```
- Rules can be defined on any signal type (metrics, logs, traces)
- Anomaly detection available for automated threshold learning
- Alert history tracked with state transitions
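The ruler's evaluate-and-transition loop can be sketched as a small state machine. This is a simplified illustration (no "for" duration, single rule, synthetic values), not SigNoz's ruler implementation:

```python
def evaluate(threshold: float, samples: list[float], state: dict) -> list[str]:
    """Toy ruler loop: evaluate a threshold rule over successive samples and
    record state transitions (inactive -> firing -> resolved)."""
    transitions = []
    for value in samples:
        breaching = value > threshold
        if breaching and state.get("status") != "firing":
            state["status"] = "firing"
            transitions.append("firing")    # would be sent to Alertmanager
        elif not breaching and state.get("status") == "firing":
            state["status"] = "resolved"
            transitions.append("resolved")  # resolve notification
    return transitions

history = evaluate(100.0, [50, 120, 130, 80], {})
```

Recording only the transitions, not every evaluation, is what makes the alert history readable: a sustained breach produces one "firing" entry rather than one per evaluation tick.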