LGTM Stack

Summary

LGTM is the Grafana Labs open-source observability stack, named after its four core components: Loki (logs), Grafana (visualization), Tempo (traces), and Mimir (metrics). A fifth component, Pyroscope (profiles), is frequently included, sometimes expanding the acronym to LGTMP or being described as "big tent" observability.

The stack is purpose-built so each backend is independently scalable, uses object storage (S3/GCS/Azure Blob) as its primary persistence layer, and speaks OpenTelemetry natively. Grafana sits at the center as the single pane of glass, correlating across all signals.
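
As a rough illustration of what cross-signal correlation involves, the sketch below pulls a trace ID out of a log line with a regular expression, which is the same idea Grafana's Loki derived fields rely on to link a log line to a trace in Tempo. The log format, the `trace_id=` key, and the helper name `extract_trace_id` are assumptions for illustration, not part of any Grafana API.

```python
import re
from typing import Optional

# Assumed log convention: a 32-hex-character trace ID after "trace_id=".
TRACE_ID_PATTERN = re.compile(r"trace_id=([0-9a-f]{32})\b")


def extract_trace_id(log_line: str) -> Optional[str]:
    """Return the trace ID embedded in a log line, or None if absent."""
    match = TRACE_ID_PATTERN.search(log_line)
    return match.group(1) if match else None


line = 'level=error msg="payment failed" trace_id=4bf92f3577b34da6a3ce929d0e0e4736'
print(extract_trace_id(line))
```

In Grafana itself this matching is configured on the Loki data source rather than written as code; the snippet only shows the mechanism.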

| Component | Signal | Query Language | GitHub Stars | Latest Version |
| --- | --- | --- | --- | --- |
| Grafana Mimir | Metrics | PromQL | ~5k ⭐ | 3.0.5 |
| Grafana Loki | Logs | LogQL | ~27.9k ⭐ | 3.7.1 |
| Grafana Tempo | Traces | TraceQL | ~4k ⭐ | 2.10.1 (3.0 in dev) |
| Grafana Pyroscope | Profiles | FlameQL | ~10k ⭐ | 1.20.2 |
| Grafana | Visualization | n/a | 73.1k ⭐ | 12.4.2 |
| Grafana Alloy | Collection | Alloy config syntax (formerly River, HCL-inspired) | | 1.15.0 |

Evaluation

  • Why it's better: One of the few fully open-source stacks that covers four observability signals (metrics, logs, traces, profiles) with cross-signal correlation in a single UI. Each backend is optimized for its signal type and uses cheap object storage, which can make the stack roughly 3–10x cheaper than Datadog at scale.

  • When it fits (Applicability):

  • Organizations with platform engineering capacity to operate multiple backends
  • Teams standardizing on OpenTelemetry who want no vendor lock-in
  • Cloud-native (Kubernetes) environments needing horizontal scalability
  • Mixed environments with heterogeneous data sources
  • Budget-conscious organizations needing enterprise-grade observability at open-source cost

  • Pros and Cons:

| Pros | Cons |
| --- | --- |
| Each component best-of-breed for its signal type | Operational complexity: 4+ backends to manage |
| Object-storage-first = dramatically reduced cost | Requires solid Kubernetes & DevOps expertise |
| OpenTelemetry-native, no vendor lock-in | Signal correlation requires careful config |
| Massive community, battle-tested at scale | Query languages differ per signal (PromQL, LogQL, TraceQL, FlameQL) |
| Independent horizontal scaling per component | Multi-tenancy requires auth proxy setup |
| All-in-one Docker image for dev (grafana/otel-lgtm) | Production setup requires 6+ Helm charts |
| Cross-signal correlation (exemplars, derived fields) | Label cardinality is the #1 operational pitfall |

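
To see why label cardinality is listed above as the #1 operational pitfall, note that the worst-case number of series (or log streams) is the product of the distinct values of every label. The sketch below computes that upper bound; the label names and value counts are made-up illustration values, not measurements.

```python
from math import prod


def max_series(label_value_counts: dict[str, int]) -> int:
    """Upper bound on distinct series: product of per-label value counts."""
    return prod(label_value_counts.values())


# Hypothetical label sets for a single metric.
bounded = {"service": 20, "env": 3, "status_code": 5}
unbounded = {**bounded, "user_id": 100_000}  # one high-cardinality label

print(max_series(bounded))    # 300
print(max_series(unbounded))  # 30000000
```

A single unbounded label such as a user or request ID multiplies everything else, which is why such values usually belong in the log line or span attributes rather than in labels.
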
  • Common Use Cases:
  • Full-stack Kubernetes observability — metrics, logs, traces, and profiles from all workloads in one view
  • Centralized enterprise observability platform — multi-tenant, shared infrastructure for multiple teams (Maersk, DHL, Salesforce pattern)
  • Cost-effective log aggregation — replacing Elasticsearch with Loki for 10–100x cost reduction
  • Distributed tracing at scale — Tempo handles 100M+ spans/day on object storage alone
  • AI/ML pipeline observability — tracking model inference latency, GPU utilization, and training metrics
  • IoT and industrial telemetry — high-volume metric ingestion via Mimir
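
For the multi-tenant pattern above, Loki, Mimir, and Tempo identify the tenant through the `X-Scope-OrgID` HTTP header, which is why an auth proxy usually sits in front of them to set it. A minimal sketch of building such a request against Loki's `query_range` endpoint follows; the base URL, tenant name, and helper name `tenant_query_request` are assumptions, and the request is only constructed, never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def tenant_query_request(base_url: str, tenant: str, logql: str) -> Request:
    """Build (but do not send) a Loki range query scoped to one tenant."""
    params = urlencode({"query": logql, "limit": 100})
    url = f"{base_url}/loki/api/v1/query_range?{params}"
    # In production an auth proxy would normally inject this header
    # after authenticating the caller.
    return Request(url, headers={"X-Scope-OrgID": tenant})


req = tenant_query_request(
    "http://loki.example.internal:3100",  # hypothetical address
    "team-payments",                      # hypothetical tenant ID
    '{app="checkout"} |= "error"',
)
print(req.full_url)
print(dict(req.header_items()))
```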

  • Licensing & Commercial Use:

  • Grafana, Loki, Tempo, Mimir, Pyroscope: AGPL-3.0
  • Alloy: Apache 2.0
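  • All components are free to self-host. For the AGPL-licensed components, if you modify the source and let users interact with it over a network (for example as SaaS), you must make the modified source available to those users under AGPL-3.0.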
  • Grafana Cloud provides fully managed LGTM: Free ($0), Pro ($19/mo + usage), Enterprise ($25k+/yr)

  • Ecosystem & Data Connections:

  • Ingestion protocols: OTLP (gRPC/HTTP), Prometheus remote_write, Jaeger, Zipkin, Syslog, Fluent Bit
  • Collection: Grafana Alloy (primary), OpenTelemetry Collector, Prometheus, Promtail (legacy)
  • Storage: S3, GCS, Azure Blob Storage, MinIO (self-hosted)
  • IaC: Helm charts, Terraform provider, Jsonnet/Tanka, Ansible
  • Instrumentation: OpenTelemetry SDKs (Go, Java, Python, Node.js, .NET, Rust), auto-instrumentation agents, eBPF
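
As a sketch of what OTLP over HTTP looks like on the wire, the block below builds a single-span OTLP/JSON trace payload using only the standard library. In practice an OpenTelemetry SDK or collector generates this for you; the service name, span name, scope name, and the helper `build_otlp_span` are illustrative assumptions. Such a payload would be POSTed to an OTLP/HTTP receiver's `/v1/traces` path with `Content-Type: application/json` (port 4318 by convention).

```python
import json
import secrets
import time


def build_otlp_span(service: str, name: str, duration_ms: int) -> dict:
    """Build an OTLP/JSON trace export request containing one span."""
    end_ns = time.time_ns()
    start_ns = end_ns - duration_ms * 1_000_000
    return {
        "resourceSpans": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service}},
            ]},
            "scopeSpans": [{
                "scope": {"name": "manual-example"},  # hypothetical scope name
                "spans": [{
                    "traceId": secrets.token_hex(16),  # 32 hex characters
                    "spanId": secrets.token_hex(8),    # 16 hex characters
                    "name": name,
                    "kind": 1,  # SPAN_KIND_INTERNAL
                    # 64-bit integers are encoded as strings in OTLP/JSON.
                    "startTimeUnixNano": str(start_ns),
                    "endTimeUnixNano": str(end_ns),
                }],
            }],
        }]
    }


payload = build_otlp_span("checkout", "charge-card", duration_ms=42)
print(json.dumps(payload, indent=2))
```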

  • Compatibility & Requirements:

  • Runs on Kubernetes (recommended), Docker, or bare metal Linux
  • Minimal dev setup: docker run -p 3000:3000 -p 4317:4317 -p 4318:4318 grafana/otel-lgtm (single container with all components; Grafana on port 3000, OTLP on 4317/4318)
  • Typical production setup: Kubernetes cluster, object storage, PostgreSQL (for Grafana metadata), Redis (for sessions)
  • Object storage is mandatory for Mimir, Loki, and Tempo in production

  • Alternatives:

  • Datadog — All-in-one SaaS, highest cost, lowest ops burden
  • SigNoz — Open-source, OTel-native, ClickHouse-backed, unified single-binary
  • ELK Stack — Mature for logs, weaker for metrics/traces
  • New Relic — SaaS, generous free tier, proprietary
  • Splunk Observability — Enterprise, very expensive
  • OpenObserve — Open-source, Rust-based, single binary

  • Migration & Lock-in Risks:

  • Low lock-in on individual components — each backend uses open storage formats
  • Moderate lock-in on query languages — PromQL is universal, but LogQL, TraceQL, and FlameQL are Grafana-specific (well-documented, but not portable)
  • Gradual migration is supported — run old and new stacks in parallel, move one signal at a time
  • Migration from ELK: KQL/Lucene → LogQL requires query rewriting; Elasticsearch → Loki is a fundamental architecture shift (full-text index → label-only index)
  • Migration from Prometheus + Jaeger: Mimir accepts remote_write directly; Tempo accepts Jaeger protocol directly — both are near-drop-in replacements
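
To make the ELK-to-Loki index shift concrete: in Loki only labels are indexed, so a field-based Elasticsearch query splits into a label selector plus filters that are evaluated at query time. The sketch below shows that split for simple `field:value` terms. The helper `to_logql`, the set of indexed labels, and the assumption that log lines are JSON are all hypothetical, and this is not a general Lucene translator.

```python
def to_logql(terms: dict[str, str], indexed_labels: set[str],
             free_text: str = "") -> str:
    """Split simple field terms into a LogQL selector plus filters."""
    selector = ", ".join(
        f'{k}="{v}"' for k, v in sorted(terms.items()) if k in indexed_labels
    )
    if not selector:
        # LogQL needs at least one label matcher in the stream selector.
        raise ValueError("at least one term must map to an indexed label")
    query = "{" + selector + "}"
    if free_text:
        query += f' |= "{free_text}"'  # line filter: substring scan, no index
    extracted = {k: v for k, v in sorted(terms.items())
                 if k not in indexed_labels}
    if extracted:
        query += " | json"  # parse fields from the log line at query time
        for k, v in extracted.items():
            query += f' | {k}="{v}"'
    return query


# Rough equivalent of: app:checkout AND env:prod AND status:500 AND "timeout"
print(to_logql({"app": "checkout", "env": "prod", "status": "500"},
               indexed_labels={"app", "env"}, free_text="timeout"))
# {app="checkout", env="prod"} |= "timeout" | json | status="500"
```

Only `app` and `env` narrow the set of streams Loki reads; the text match and the `status` filter are applied while scanning log lines, which is the trade-off behind the label-only index.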

  • Community Health & Support:

  • Combined GitHub stars across components: ~120k (Grafana 73k, Loki 28k, Mimir 5k, Tempo 4k, Pyroscope 10k)
  • Battle-tested at: Maersk, DHL Express, Dutch Tax Office, Salesforce, and thousands of organizations
  • Enterprise SLAs via Grafana Labs
  • Active community forums, Slack, regular GrafanaCON conferences

Notes In This Folder

  • Grafana — the visualization layer and hub of the LGTM stack
  • Victoria Stack — competing full-stack (VictoriaMetrics + VictoriaLogs + VictoriaTraces), Apache 2.0, lower resource footprint
  • LGTM vs Victoria Stack — canonical comparison note
  • OpenTelemetry — the industry-standard telemetry collection framework used to feed the LGTM stack
  • Observability Stacks Comparison — 6-way comparison including Coroot, SigNoz, SkyWalking, OpenObserve

Assets

Store local images, diagrams, and PDFs in the _assets/ subfolder. Prefer Mermaid for inline diagrams.

Next Actions

  • Deep dive into Grafana Adaptive Metrics and Adaptive Logs (cost optimization features)
  • ~~Research LGTM vs SigNoz comparison note~~ → covered in Observability Stacks Comparison
  • Benchmark object storage costs across S3, GCS, and Azure Blob for LGTM workloads