LGTM Stack
Summary
LGTM is the Grafana Labs open-source observability stack, named after its four core components: Loki (Logs), Grafana (visualization), Tempo (Traces), and Mimir (Metrics). A fifth component, Pyroscope (Profiles), adds the fourth signal type and is frequently included, sometimes expanding the acronym to LGTMP or referring to it as "big tent" observability.
The stack is purpose-built so each backend is independently scalable, uses object storage (S3/GCS/Azure Blob) as its primary persistence layer, and speaks OpenTelemetry natively. Grafana sits at the center as the single pane of glass, correlating across all signals.
| Component | Signal | Query Language | GitHub Stars | Latest Version |
|---|---|---|---|---|
| Grafana Mimir | Metrics | PromQL | ~5k ⭐ | 3.0.5 |
| Grafana Loki | Logs | LogQL | ~27.9k ⭐ | 3.7.1 |
| Grafana Tempo | Traces | TraceQL | ~4k ⭐ | 2.10.1 (3.0 in dev) |
| Grafana Pyroscope | Profiles | FlameQL | ~10k ⭐ | 1.20.2 |
| Grafana | Visualization | — | ~73.1k ⭐ | 12.4.2 |
| Grafana Alloy | Collection | Alloy configuration syntax (formerly River, HCL-inspired) | — | 1.15.0 |
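Because each backend in the table speaks its own query language, it can help to see one query per signal side by side. The following is a minimal sketch, not a reference client: the hostnames and ports are hypothetical, the endpoint paths assume each backend's default HTTP API layout, and the label names and queries are purely illustrative.

```python
from urllib.parse import urlencode

# Assumption: hypothetical in-cluster hostnames and default HTTP ports.
# Each entry is (query endpoint, name of the URL parameter carrying the query).
BACKENDS = {
    "metrics": ("http://mimir:8080/prometheus/api/v1/query", "query"),  # PromQL
    "logs": ("http://loki:3100/loki/api/v1/query_range", "query"),      # LogQL
    "traces": ("http://tempo:3200/api/search", "q"),                    # TraceQL
}

# Illustrative queries; the "checkout" service and its labels are made up.
EXAMPLE_QUERIES = {
    "metrics": 'sum(rate(http_requests_total{service="checkout"}[5m]))',
    "logs": '{service="checkout"} |= "timeout"',
    "traces": '{ resource.service.name = "checkout" && duration > 500ms }',
}


def build_query_url(signal: str) -> str:
    """Return a URL-encoded query URL for the given signal type."""
    endpoint, param = BACKENDS[signal]
    return f"{endpoint}?{urlencode({param: EXAMPLE_QUERIES[signal]})}"


if __name__ == "__main__":
    for signal in BACKENDS:
        print(signal, build_query_url(signal))
```

In practice Grafana issues these queries through its data sources, so hand-built URLs like these are mostly useful for smoke tests and debugging.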
Evaluation
- Why it's better: The only fully open-source stack that covers all four observability pillars (metrics, logs, traces, profiles) with cross-signal correlation in a single UI. Each backend is optimized for its signal type and uses cheap object storage, making the stack 3–10x cheaper than Datadog at scale.
- When it fits (Applicability):
- Organizations with platform engineering capacity to operate multiple backends
- Teams standardizing on OpenTelemetry who want no vendor lock-in
- Cloud-native (Kubernetes) environments needing horizontal scalability
- Mixed environments with heterogeneous data sources
- Budget-conscious organizations needing enterprise-grade observability at open-source cost
- Pros and Cons:
| Pros | Cons |
|---|---|
| Each component best-of-breed for its signal type | Operational complexity — 4+ backends to manage |
| Object-storage-first = dramatically reduced cost | Requires solid Kubernetes & DevOps expertise |
| OpenTelemetry-native, no vendor lock-in | Signal correlation requires careful config |
| Massive community, battle-tested at scale | Query languages differ per signal (PromQL, LogQL, TraceQL, FlameQL) |
| Independent horizontal scaling per component | Multi-tenancy requires auth proxy setup |
| All-in-one Docker image for dev (`grafana/otel-lgtm`) | Production setup requires 6+ Helm charts |
| Cross-signal correlation (exemplars, derived fields) | Label cardinality is the #1 operational pitfall |
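The label-cardinality pitfall in the last row is easiest to see with arithmetic: the worst-case number of distinct series (Mimir) or streams (Loki) is the product of the number of distinct values each label can take. The sketch below is a back-of-the-envelope estimate only; the label names and value counts are invented for illustration.

```python
from math import prod


def series_upper_bound(distinct_values_per_label: dict[str, int]) -> int:
    """Worst-case count of label combinations for one metric or log source."""
    return prod(distinct_values_per_label.values())


# Hypothetical label sets: bounded labels only, then the same plus one unbounded label.
bounded = {"namespace": 10, "service": 50, "status_code": 5}
with_user_id = {**bounded, "user_id": 100_000}

print(series_upper_bound(bounded))       # 2500
print(series_upper_bound(with_user_id))  # 250000000
```

A single high-cardinality label multiplies everything else, which is why identifiers such as user or request IDs usually belong in the log line or span attributes rather than in labels.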
- Common Use Cases:
- Full-stack Kubernetes observability — metrics, logs, traces, and profiles from all workloads in one view
- Centralized enterprise observability platform — multi-tenant, shared infrastructure for multiple teams (Maersk, DHL, Salesforce pattern)
- Cost-effective log aggregation — replacing Elasticsearch with Loki for 10–100x cost reduction
- Distributed tracing at scale — Tempo handles 100M+ spans/day on object storage alone
- AI/ML pipeline observability — tracking model inference latency, GPU utilization, and training metrics
- IoT and industrial telemetry — high-volume metric ingestion via Mimir
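Daily volumes such as the 100M spans/day figure above are easier to reason about as a sustained rate. This small sketch only does the unit conversion; it averages evenly over the day, which is a simplifying assumption since real traffic has peaks.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate_per_second(events_per_day: int) -> float:
    """Convert a daily event count into an average per-second rate."""
    return events_per_day / SECONDS_PER_DAY


print(round(average_rate_per_second(100_000_000)))  # 1157
```

Peak rates are typically a multiple of this average, so size ingestion for the peak rather than the mean.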
- Licensing & Commercial Use:
- Grafana, Loki, Tempo: AGPL-3.0
- Mimir: AGPL-3.0
- Pyroscope: AGPL-3.0
- Alloy: Apache 2.0
- All components are free to self-host. If you modify the source and offer it as SaaS, you must release modifications under AGPL-3.0.
- Grafana Cloud provides fully managed LGTM: Free ($0), Pro ($19/mo + usage), Enterprise ($25k+/yr)
- Ecosystem & Data Connections:
- Ingestion protocols: OTLP (gRPC/HTTP), Prometheus remote_write, Jaeger, Zipkin, Syslog, Fluent Bit
- Collection: Grafana Alloy (primary), OpenTelemetry Collector, Prometheus, Promtail (legacy)
- Storage: S3, GCS, Azure Blob Storage, MinIO (self-hosted)
- IaC: Helm charts, Terraform provider, Jsonnet/Tanka, Ansible
- Instrumentation: OpenTelemetry SDKs (Go, Java, Python, Node.js, .NET, Rust), auto-instrumentation agents, eBPF
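To make the OTLP/HTTP ingestion path listed above concrete, here is a minimal sketch that builds (but does not send) a single log record in OTLP's JSON encoding using only the standard library. The endpoint URL is an assumption (an OTLP/HTTP receiver listening locally on port 4318), and `build_otlp_log_request` is a hypothetical helper; real applications would normally use an OpenTelemetry SDK rather than hand-built payloads.

```python
import json
import time
import urllib.request
from typing import Optional

# Assumption: an OTLP/HTTP receiver (e.g. Alloy or a collector) on localhost:4318.
OTLP_LOGS_URL = "http://localhost:4318/v1/logs"


def build_otlp_log_request(
    service_name: str, message: str, now_ns: Optional[int] = None
) -> urllib.request.Request:
    """Build a POST request carrying one log record in OTLP JSON encoding."""
    timestamp = str(now_ns if now_ns is not None else time.time_ns())
    payload = {
        "resourceLogs": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service_name}},
            ]},
            "scopeLogs": [{
                "logRecords": [{
                    "timeUnixNano": timestamp,  # 64-bit ints are strings in OTLP JSON
                    "severityText": "INFO",
                    "body": {"stringValue": message},
                }],
            }],
        }],
    }
    return urllib.request.Request(
        OTLP_LOGS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_otlp_log_request("checkout", "payment authorized")
    print(req.get_method(), req.full_url)
    # To actually send it (requires a running receiver):
    # urllib.request.urlopen(req)
```

The same resource attributes (here `service.name`) are what later allow Grafana to correlate logs with traces and metrics from the same service.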
- Compatibility & Requirements:
- Runs on Kubernetes (recommended), Docker, or bare metal Linux
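- Min dev setup: `docker run -p 3000:3000 -p 4317:4317 -p 4318:4318 grafana/otel-lgtm` (single container with all components; the published ports expose Grafana and the OTLP gRPC/HTTP receivers)
- Production requires: Kubernetes cluster, object storage, PostgreSQL (for Grafana metadata), Redis (for sessions)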
- Object storage is mandatory for Mimir, Loki, and Tempo in production
- Alternatives:
- Datadog — All-in-one SaaS, highest cost, lowest ops burden
- SigNoz — Open-source, OTel-native, ClickHouse-backed, unified single-binary
- ELK Stack — Mature for logs, weaker for metrics/traces
- New Relic — SaaS, generous free tier, proprietary
- Splunk Observability — Enterprise, very expensive
- OpenObserve — Open-source, Rust-based, single binary
- Migration & Lock-in Risks:
- Low lock-in on individual components — each backend uses open storage formats
- Moderate lock-in on query languages — PromQL is universal, but LogQL, TraceQL, and FlameQL are Grafana-specific (well-documented, but not portable)
- Gradual migration is supported — run old and new stacks in parallel, move one signal at a time
- Migration from ELK: KQL/Lucene → LogQL requires query rewriting; Elasticsearch → Loki is a fundamental architecture shift (full-text index → label-only index)
- Migration from Prometheus + Jaeger: Mimir accepts remote_write directly; Tempo accepts Jaeger protocol directly — both are near-drop-in replacements
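The ELK-to-Loki shift noted above (full-text index to label-only index) changes the shape of every query: a few low-cardinality fields become label matchers in the stream selector, and everything else becomes a line filter applied at query time. The sketch below illustrates that split with a hypothetical helper, `to_logql`; it is not a general Lucene/KQL translator, and the field names are invented.

```python
def _escape(value: str) -> str:
    """Escape backslashes and double quotes for a LogQL string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def to_logql(labels: dict[str, str], text_terms: list[str]) -> str:
    """Build a LogQL query from label matchers plus free-text line filters.

    labels:     fields that map to Loki labels (indexed)
    text_terms: free-text terms, matched against the log line at query time
    """
    if not labels:
        raise ValueError("a LogQL stream selector needs at least one label matcher")
    selector = ", ".join(f'{k}="{_escape(v)}"' for k, v in sorted(labels.items()))
    filters = "".join(f' |= "{_escape(term)}"' for term in text_terms)
    return "{" + selector + "}" + filters


# Roughly what a Lucene query like  app:checkout AND env:prod AND "timeout"  becomes:
print(to_logql({"app": "checkout", "env": "prod"}, ["timeout"]))
# {app="checkout", env="prod"} |= "timeout"
```

Deciding which fields become labels is the real migration work; anything with many distinct values is usually better left in the line and filtered, not promoted to a label.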
- Community Health & Support:
- Combined GitHub stars across components: 120k+ (Grafana 73k, Loki 28k, Mimir 5k, Tempo 4k, Pyroscope 10k)
- Battle-tested at: Maersk, DHL Express, Dutch Tax Office, Salesforce, and thousands of organizations
- Enterprise SLAs via Grafana Labs
- Active community forums, Slack, regular GrafanaCON conferences
Notes In This Folder
Related Topics
- Grafana — the visualization layer and hub of the LGTM stack
- Victoria Stack — competing full-stack (VictoriaMetrics + VictoriaLogs + VictoriaTraces), Apache 2.0, lower resource footprint
- LGTM vs Victoria Stack — canonical comparison note
- OpenTelemetry — the industry-standard telemetry collection framework used to feed the LGTM stack
- Observability Stacks Comparison — 6-way comparison including Coroot, SigNoz, SkyWalking, OpenObserve
Assets
Store local images, diagrams, and PDFs in the _assets/ subfolder. Prefer Mermaid for inline diagrams.
Next Actions
- Deep dive into Grafana Adaptive Metrics and Adaptive Logs (cost optimization features)
- ~~Research LGTM vs SigNoz comparison note~~ → covered in Observability Stacks Comparison
- Benchmark object storage costs across S3, GCS, and Azure Blob for LGTM workloads