Grafana Architecture

Architecture

Grafana Server Components

The Grafana server is a largely stateless web application: persistent state lives in its database and cache, and the server itself is organized into the following internal layers:

flowchart TB
    subgraph Frontend["Frontend (TypeScript / React)"]
        direction LR
        DashUI["Dashboard UI"]
        ExploreUI["Explore"]
        AlertUI["Alerting UI"]
        PluginUI["Panel & App Plugins"]
    end

    subgraph Backend["Backend (Go)"]
        direction LR
        API["HTTP API Server"]
        Auth["Auth & RBAC"]
        QEngine["Query Engine"]
        AlertEng["Alert Rule Evaluator"]
        Prov["Provisioning Engine"]
        PluginMgr["Plugin Manager<br/>(gRPC subprocess host)"]
    end

    subgraph State["State Layer"]
        DB["Database<br/>(PostgreSQL / MySQL / SQLite)"]
        Cache["Session Cache<br/>(Redis / Memcached)"]
    end

    subgraph External["External Data Sources"]
        Prom["Prometheus / Mimir"]
        LokiDS["Loki"]
        TempoDS["Tempo"]
        SQL["MySQL / PostgreSQL"]
        ES["Elasticsearch"]
        CW["CloudWatch"]
    end

    Frontend --> API
    API --> Auth
    API --> QEngine
    API --> AlertEng
    API --> Prov
    QEngine --> PluginMgr
    PluginMgr -->|queries via plugins| External
    Auth --> DB
    AlertEng --> DB
    Prov --> DB
    Auth --> Cache

    style Frontend fill:#ff6600,color:#fff
    style Backend fill:#2a2d3e,color:#fff
    style State fill:#1a1d2e,color:#fff
    style External fill:#0d7377,color:#fff

Key Architectural Properties

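| Property | Detail |
| --- | --- |
| Stateless server | Persistent state lives in the external database and cache, so server instances can be scaled horizontally |
| Plugin isolation | Backend plugins run as separate processes that communicate with the Grafana server over gRPC |
| Provisioning | Dashboards, data sources, and alert rules are loaded from YAML/JSON files at startup |
| Multi-org | A single Grafana instance hosts multiple isolated organizations |
| API-first | UI operations have corresponding HTTP (REST) API endpoints |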
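
To make the API-first property concrete, here is a minimal sketch that builds (but does not send) a dashboard search request against Grafana's HTTP API. The base URL, the token value, and the helper name `build_dashboard_search` are assumptions for illustration. The sketch also assumes a service account token passed as a bearer token.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_dashboard_search(base_url: str, token: str, query: str) -> Request:
    """Build a GET request for Grafana's /api/search endpoint (not sent here)."""
    # type=dash-db restricts results to dashboards (as opposed to folders).
    params = urlencode({"query": query, "type": "dash-db"})
    url = f"{base_url.rstrip('/')}/api/search?{params}"
    headers = {
        "Authorization": f"Bearer {token}",  # hypothetical token value
        "Accept": "application/json",
    }
    return Request(url, headers=headers, method="GET")


# Hypothetical local instance and placeholder token.
req = build_dashboard_search("http://localhost:3000", "example-token", "node exporter")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would execute the call against a running instance. The same search is what the dashboard browser in the UI performs.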

LGTM Stack: Full Production Architecture

flowchart TB
    subgraph Apps["Instrumented Applications"]
        App1["Service A<br/>(OTel SDK)"]
        App2["Service B<br/>(OTel SDK)"]
        App3["Service C<br/>(Prometheus client)"]
    end

    subgraph Infra["Infrastructure"]
        K8s["Kubernetes"]
        Nodes["VM / Bare Metal"]
    end

    subgraph Collection["Grafana Alloy (DaemonSet / Sidecar)"]
        Recv["Receivers<br/>OTLP, Prometheus, Syslog"]
        Proc["Processors<br/>Batch, MemoryLimiter, ResourceDetection"]
        Exp["Exporters"]
    end

    subgraph Mimir["Grafana Mimir"]
        MD["Distributor"]
        MI["Ingester"]
        MQ["Querier"]
        MSg["Store-Gateway"]
        MC["Compactor"]
    end

    subgraph Loki["Grafana Loki"]
        LD["Distributor"]
        LI["Ingester"]
        LQ["Querier"]
        LQF["Query Frontend"]
        LC["Compactor"]
    end

    subgraph Tempo["Grafana Tempo"]
        TD["Distributor"]
        TI["Ingester"]
        TQ["Querier"]
        TQF["Query Frontend"]
        TMG["Metrics Generator"]
    end

    subgraph ObjStore["Object Storage (S3 / GCS / Azure)"]
        Blocks["Metric Blocks"]
        Chunks["Log Chunks + Index"]
        Traces["Trace Blocks (Parquet)"]
    end

    subgraph Grafana["Grafana Server (HA)"]
        GF1["Grafana Pod 1"]
        GF2["Grafana Pod 2"]
        GFn["Grafana Pod N"]
    end

    subgraph Supporting
        PG["PostgreSQL<br/>(Grafana metadata DB)"]
        Redis["Redis<br/>(Session cache)"]
        LB["Load Balancer / Ingress"]
    end

    Apps --> Collection
    Infra --> Collection
    Collection -->|remote_write| MD
    Collection -->|push| LD
    Collection -->|OTLP gRPC| TD

    MD --> MI
    MI --> Blocks
    MQ --> MI
    MQ --> MSg
    MSg --> Blocks
    MC --> Blocks

    LD --> LI
    LI --> Chunks
    LQF --> LQ
    LQ --> LI
    LQ --> Chunks
    LC --> Chunks

    TD --> TI
    TI --> Traces
    TQF --> TQ
    TQ --> TI
    TQ --> Traces
    TMG --> MD

    GF1 --> PG
    GF2 --> PG
    GFn --> PG
    GF1 --> Redis
    GF2 --> Redis
    GFn --> Redis
    LB --> GF1
    LB --> GF2
    LB --> GFn

    Grafana -.->|PromQL| MQ
    Grafana -.->|LogQL| LQF
    Grafana -.->|TraceQL| TQF

    style Apps fill:#0d7377,color:#fff
    style Infra fill:#0d7377,color:#fff
    style Collection fill:#ff6600,color:#fff
    style Mimir fill:#7b42bc,color:#fff
    style Loki fill:#2a7de1,color:#fff
    style Tempo fill:#e65100,color:#fff
    style ObjStore fill:#0d1117,color:#fff
    style Grafana fill:#ff6600,color:#fff
    style Supporting fill:#1a1d2e,color:#fff

Mimir Architecture (Metrics)

flowchart LR
    subgraph Write["Write Path"]
        D["Distributor<br/>(validates, shards, replicates)"]
        I["Ingester<br/>(in-memory TSDB + WAL)"]
    end

    subgraph Read["Read Path"]
        QF["Query Frontend<br/>(splits, caches, queues)"]
        Q["Querier<br/>(executes PromQL)"]
        SG["Store-Gateway<br/>(queries blocks in object storage)"]
    end

    subgraph Background["Background"]
        C["Compactor<br/>(vertical + horizontal compaction)"]
    end

    subgraph Storage["Object Storage"]
        OS["S3 / GCS / Azure<br/>(TSDB Blocks)"]
    end

    Prom["Prometheus / Alloy"] -->|remote_write| D
    D -->|hash ring| I
    I -->|flush every 2h| OS
    QF --> Q
    Q -->|recent data| I
    Q -->|historical data| SG
    SG --> OS
    C --> OS

    style Write fill:#7b42bc,color:#fff
    style Read fill:#2a7de1,color:#fff
    style Background fill:#1a1d2e,color:#fff
    style Storage fill:#0d1117,color:#fff
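
The "hash ring" edge between the distributor and the ingesters can be illustrated with a simplified consistent-hash ring. This is a sketch of the general technique, not Mimir's actual implementation: the hash function, the token count, the series key format, and the class name `Ring` are all assumptions chosen for readability.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a 32-bit position on the ring (illustrative hash choice)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:4], "big")


class Ring:
    """Toy consistent-hash ring: each ingester owns several tokens."""

    def __init__(self, ingesters, tokens_per_ingester=64):
        self._tokens = sorted(
            (_hash(f"{name}-{i}"), name)
            for name in ingesters
            for i in range(tokens_per_ingester)
        )
        self._keys = [token for token, _ in self._tokens]

    def replicas(self, series_key: str, replication_factor: int = 3):
        """Walk clockwise from the series hash, collecting distinct ingesters."""
        start = bisect.bisect_right(self._keys, _hash(series_key))
        owners = []
        for offset in range(len(self._tokens)):
            _, name = self._tokens[(start + offset) % len(self._tokens)]
            if name not in owners:
                owners.append(name)
                if len(owners) == replication_factor:
                    break
        return owners


ring = Ring([f"ingester-{i}" for i in range(5)])
print(ring.replicas('tenant-1/http_requests_total{job="api"}'))
```

Because the mapping depends only on the series key and the token set, every distributor that sees the same ring picks the same ingesters without coordination. Adding an ingester moves only the series adjacent to its new tokens.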

Deployment Modes

| Mode | Description | Use Case |
| --- | --- | --- |
| Monolithic | All components in a single process/pod | Dev, testing, small scale |
| Read-Write | Separate read and write paths | Medium scale |
| Microservices | Each component as independent pods | Production, hyperscale |

Loki Architecture (Logs)

flowchart LR
    subgraph Write["Write Path"]
        LD["Distributor<br/>(validates, routes by label hash)"]
        LI["Ingester<br/>(compresses into chunks, indexes labels)"]
    end

    subgraph Read["Read Path"]
        LQF["Query Frontend<br/>(splits time ranges, queues)"]
        LQ["Querier<br/>(executes LogQL)"]
        LIG["Index Gateway<br/>(metadata lookups)"]
    end

    subgraph Background["Background"]
        LC["Compactor<br/>(merges index files, retention)"]
    end

    subgraph Storage["Object Storage"]
        LOS["S3 / GCS / Azure<br/>(Chunks + Index)"]
    end

    Alloy["Alloy / Promtail"] -->|push| LD
    LD --> LI
    LI -->|flush| LOS
    LQF --> LQ
    LQ -->|recent| LI
    LQ -->|historical| LIG
    LIG --> LOS
    LC --> LOS

    style Write fill:#2a7de1,color:#fff
    style Read fill:#0d7377,color:#fff
    style Background fill:#1a1d2e,color:#fff
    style Storage fill:#0d1117,color:#fff
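
The query frontend's "splits time ranges" step can be sketched as follows. This is an illustrative sketch only: the function name `split_range`, the one-hour interval, and the alignment to interval boundaries are assumptions, not Loki's exact behavior.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def split_range(start: datetime, end: datetime, interval: timedelta):
    """Split [start, end) into sub-ranges that end on multiples of `interval`."""
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    pieces = []
    cursor = start
    while cursor < end:
        # Next interval boundary strictly after the cursor.
        elapsed_intervals = (cursor - EPOCH) // interval
        boundary = EPOCH + (elapsed_intervals + 1) * interval
        piece_end = min(boundary, end)
        pieces.append((cursor, piece_end))
        cursor = piece_end
    return pieces


start = datetime(2024, 1, 1, 0, 20, tzinfo=timezone.utc)
end = datetime(2024, 1, 1, 2, 10, tzinfo=timezone.utc)
for piece_start, piece_end in split_range(start, end, timedelta(hours=1)):
    print(piece_start.isoformat(), "->", piece_end.isoformat())
```

Each sub-range can then be queued and executed by a different querier in parallel. Aligned boundaries make the partial results easier to cache and reuse.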

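Key Design Choice: Loki indexes only labels, not log content. The resulting small index makes it much cheaper to operate than full-text-indexing alternatives such as Elasticsearch (savings on the order of 10–100x are commonly cited and depend on the workload). The trade-off is that queries must scan the content of every chunk the labels select, so effective label design is essential.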
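
The label-only indexing trade-off can be sketched with a toy in-memory store. This is not Loki's storage engine: the class `TinyLogStore` and its methods are hypothetical, and the only "index" here is the mapping from label sets to streams.

```python
from collections import defaultdict


class TinyLogStore:
    """Toy log store: streams are keyed by their label set; lines are not indexed."""

    def __init__(self):
        # frozenset of (label, value) pairs -> list of raw log lines
        self._streams = defaultdict(list)

    def push(self, labels: dict, line: str) -> None:
        self._streams[frozenset(labels.items())].append(line)

    def query(self, selector: dict, contains: str = ""):
        """Select streams by labels, then brute-force scan their lines."""
        wanted = set(selector.items())
        hits = []
        for stream_labels, lines in self._streams.items():
            if wanted <= stream_labels:  # cheap label match
                hits.extend(line for line in lines if contains in line)  # content scan
        return hits


store = TinyLogStore()
store.push({"app": "api", "env": "prod"}, "GET /health 200")
store.push({"app": "api", "env": "prod"}, "error: upstream timeout")
store.push({"app": "worker", "env": "prod"}, "error: job failed")
print(store.query({"app": "api"}, contains="error"))
```

A narrow selector such as `{"app": "api"}` keeps the content scan small. A broad selector such as `{"env": "prod"}` forces the scan over every matching stream, which is why label design drives query cost.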

Tempo Architecture (Traces)

flowchart LR
    subgraph Write["Write Path"]
        TD["Distributor<br/>(OTLP, Jaeger, Zipkin)"]
        TI["Ingester<br/>(Parquet columns + bloom filters)"]
    end

    subgraph Read["Read Path"]
        TQF["Query Frontend<br/>(splits, shards)"]
        TQ["Querier<br/>(TraceQL engine)"]
    end

    subgraph SideEffects["Side Effects"]
        TMG["Metrics Generator<br/>(RED metrics → Mimir)"]
    end

    subgraph Storage["Object Storage"]
        TOS["S3 / GCS / Azure<br/>(Parquet Trace Blocks)"]
    end

    OTel["Apps (OTel SDK)"] -->|OTLP| TD
    TD --> TI
    TD --> TMG
    TI -->|flush blocks| TOS
    TQF --> TQ
    TQ --> TI
    TQ --> TOS
    TMG -->|remote_write| Mimir["Mimir"]

    style Write fill:#e65100,color:#fff
    style Read fill:#ff6600,color:#fff
    style SideEffects fill:#7b42bc,color:#fff
    style Storage fill:#0d1117,color:#fff

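Key Design Choice: Tempo keeps no traditional index. It stores traces in Parquet columnar blocks with bloom filters. Trace-ID lookups use the bloom filters to skip blocks that cannot contain the trace, and TraceQL queries load only the columns they need, which keeps large-scale trace search performant.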
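
How bloom filters let a trace-ID lookup skip blocks can be sketched as below. This is a generic bloom filter, not Tempo's on-disk format: the filter size, the hash scheme, the block names, and the helper `blocks_to_search` are all assumptions for illustration.

```python
import hashlib


class BloomFilter:
    """Minimal bloom filter: may return false positives, never false negatives."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item: str):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def blocks_to_search(blocks: dict, trace_id: str):
    """Return only the blocks whose filter says the trace might be present."""
    return [name for name, bloom in blocks.items() if bloom.might_contain(trace_id)]


# Hypothetical blocks, each with the trace IDs it holds.
blocks = {"block-a": BloomFilter(), "block-b": BloomFilter()}
blocks["block-a"].add("trace-123")
blocks["block-b"].add("trace-456")
print(blocks_to_search(blocks, "trace-123"))
```

A block that holds the trace is always returned. A block that does not may occasionally be returned as a false positive, which costs only an unnecessary read, so the filter is safe to use as a pre-check.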

Kubernetes Deployment Topology

A typical production Grafana + LGTM deployment on Kubernetes uses these Helm charts:

| Component | Helm Chart | Min Replicas | Scaling |
| --- | --- | --- | --- |
| Grafana | `grafana/grafana` | 2+ (HA) | HPA on CPU/memory |
| Mimir | `grafana/mimir-distributed` | 3+ ingesters | Per-component HPA |
| Loki | `grafana/loki` | 3+ ingesters | Per-component HPA |
| Tempo | `grafana/tempo-distributed` | 3+ ingesters | Per-component HPA |
| Alloy | `grafana/alloy` (DaemonSet) | 1 per node | Scales with node count (one pod per node) |
| PostgreSQL | External managed (RDS/CloudSQL) | HA pair | Managed service |
| Redis | External managed (ElastiCache) | HA pair | Managed service |