OpenObserve — Architecture

Component breakdown, deployment topologies, and storage architecture for OpenObserve.

System Architecture

flowchart TB
    subgraph Sources["Data Sources"]
        OTEL_S["OTel Collector\n(OTLP gRPC/HTTP)"]
        PROM_S["Prometheus\n(remote_write)"]
        ES_S["ES Bulk API\nclients"]
        FB_S["FluentBit /\nVector"]
        KF_S["Kinesis Firehose\n/ GCP Pub/Sub"]
        RUM_S["RUM SDK\n(browser)"]
    end

    subgraph O2Cluster["OpenObserve Cluster"]
        direction TB
        subgraph Stateless["Stateless Compute"]
            Router["Router\n(request dispatch)"]
            Ingester["Ingester\n(WAL → Parquet)"]
            Querier["Querier\n(DataFusion engine)"]
            Compactor["Compactor\n(file merging)"]
            AlertMgr["AlertManager\n(alerts + reports)"]
        end

        subgraph Infra["Infrastructure"]
            WAL["WAL\n(local disk\nmemtable)"]
            Cache["Disk Cache\n(querier-side)"]
        end
    end

    subgraph Storage["Storage Layer"]
        S3["Object Storage\n(S3/GCS/Azure/MinIO)"]
        PQ["Apache Parquet\n(Zstd compressed)"]
        Meta["Metadata Store\n(PostgreSQL / SQLite)"]
    end

    Sources --> Router
    Router --> Ingester
    Ingester --> WAL
    WAL -->|"flush every\n5 min or at\nsize threshold"| PQ
    PQ --> S3
    Querier -->|"scan"| S3
    Querier --> Cache
    Compactor -->|"merge"| S3
    AlertMgr --> Querier

    style Stateless fill:#e65100,color:#fff
    style Storage fill:#1565c0,color:#fff
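
As a rough illustration of the Sources → Router edge in the diagram above, the sketch below builds (but does not send) an HTTP request that posts a JSON batch to a router. The `/api/<org>/<stream>/_json` path, port 5080, and HTTP Basic auth reflect the commonly documented JSON ingestion endpoint, but treat them as assumptions to verify against your OpenObserve version. The helper name `build_ingest_request`, the stream name, and the credentials are hypothetical placeholders.

```python
import base64
import json
import urllib.request


def build_ingest_request(base_url, org, stream, user, password, records):
    """Build a POST request carrying a JSON array of log records.

    Assumption: the router accepts JSON batches at /api/<org>/<stream>/_json
    and authenticates with HTTP Basic auth.
    """
    url = f"{base_url.rstrip('/')}/api/{org}/{stream}/_json"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(records).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


# Placeholder values; nothing is sent until the request is opened.
records = [{"level": "info", "message": "service started"}]
req = build_ingest_request(
    "http://localhost:5080", "default", "app_logs", "user@example.com", "secret", records
)
```

Passing `req` to `urllib.request.urlopen` would perform the actual ingestion call. In a clustered deployment the base URL would point at the load balancer in front of the routers rather than at a single node.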

Node Role Architecture

flowchart LR
    subgraph Roles["ZO_NODE_ROLE"]
        ALL["all\n(single node)"]
        R["router"]
        I["ingester"]
        Q["querier"]
        C["compactor"]
        A["alertmanager"]
    end

    subgraph Groups["ZO_NODE_ROLE_GROUP"]
        Default["default\n(user queries)"]
        Background["background\n(alerts, reports)"]
    end

    R --> I
    R --> Q
    R --> A
    Q -.- Default
    A -.- Background
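
The roles and groups in the diagram above are selected per node through the `ZO_NODE_ROLE` and `ZO_NODE_ROLE_GROUP` environment variables. The following is a minimal sketch of validating those values before launching a node. It assumes that `ZO_NODE_ROLE` accepts a comma-separated list and that an unset value means single-node (`all`) mode. The helper names are hypothetical, and this is not OpenObserve's own parsing logic.

```python
import os

# Role and group names as they appear in the diagram above.
KNOWN_ROLES = {"all", "router", "ingester", "querier", "compactor", "alertmanager"}
KNOWN_GROUPS = {"", "default", "background"}  # "" means unset (assumption)


def parse_node_roles(value: str) -> set[str]:
    """Split a ZO_NODE_ROLE value into a validated set of concrete roles.

    Assumption: roles are comma-separated, and "all" expands to every role.
    """
    roles = {part.strip().lower() for part in value.split(",") if part.strip()}
    if not roles:
        roles = {"all"}  # assumption: an empty value behaves like single-node mode
    unknown = roles - KNOWN_ROLES
    if unknown:
        raise ValueError(f"unknown node role(s): {sorted(unknown)}")
    if "all" in roles:
        return KNOWN_ROLES - {"all"}
    return roles


def parse_role_group(value: str) -> str:
    """Validate a ZO_NODE_ROLE_GROUP value against the groups in the diagram."""
    group = value.strip().lower()
    if group not in KNOWN_GROUPS:
        raise ValueError(f"unknown role group: {group!r}")
    return group


def roles_from_env(env=os.environ) -> tuple[set[str], str]:
    """Read both variables from an environment mapping."""
    return (
        parse_node_roles(env.get("ZO_NODE_ROLE", "all")),
        parse_role_group(env.get("ZO_NODE_ROLE_GROUP", "")),
    )
```

For example, `roles_from_env({"ZO_NODE_ROLE": "querier", "ZO_NODE_ROLE_GROUP": "background"})` would describe a querier reserved for alert and report workloads, matching the dotted edges in the diagram.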

Role Responsibility

| Role | State | Scales | CPU Profile | Memory Profile |
| --- | --- | --- | --- | --- |
| Router | Stateless | Horizontal | Low | Low |
| Ingester | WAL on disk | Horizontal | Medium | Medium (memtable) |
| Querier | Cache on disk | Horizontal | High (DataFusion) | High (scan buffers) |
| Compactor | Stateless | 1–2 nodes | Medium | Low |
| AlertManager | Stateless | 1–2 nodes | Low | Low |

Storage Architecture

Data Path

sequenceDiagram
    participant Client as Client
    participant Ingester as Ingester
    participant WAL as Local WAL
    participant S3 as Object Storage
    participant Compactor as Compactor

    Client->>Ingester: JSON / OTLP / ES Bulk
    Ingester->>Ingester: Schema inference
    Ingester->>WAL: Write to memtable (Arrow batches)
    Note over WAL: Flush triggers:<br/>5 min elapsed OR<br/>file size threshold

    WAL->>S3: Write small Parquet file
    Note over S3: Small files (1-10 MB)

    loop Background compaction
        Compactor->>S3: Read small files
        Compactor->>Compactor: Sort, merge, re-partition
        Compactor->>S3: Write large Parquet file
        Compactor->>S3: Delete old small files
    end

    Note over S3: Large files (100+ MB)<br/>Sorted by time, partitioned by stream
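
The two decisions in the sequence above, when to flush a memtable and which small files to merge, can be sketched as a toy model. The 5-minute interval and the 100 MB target come from the diagram. The 10 MB flush threshold, the class and function names, and the greedy batching strategy are illustrative assumptions, not OpenObserve's actual implementation.

```python
from dataclasses import dataclass

FLUSH_INTERVAL_S = 300             # 5 minutes, from the diagram
FLUSH_SIZE_BYTES = 10 * 2**20      # hypothetical size threshold (10 MB)
TARGET_FILE_BYTES = 100 * 2**20    # "large file" target, from the diagram


@dataclass
class Memtable:
    created_at: float              # seconds, e.g. from time.monotonic()
    size_bytes: int = 0

    def write(self, n_bytes: int) -> None:
        self.size_bytes += n_bytes

    def should_flush(self, now: float) -> bool:
        """Flush a non-empty memtable when the time OR the size trigger fires."""
        too_old = now - self.created_at >= FLUSH_INTERVAL_S
        too_big = self.size_bytes >= FLUSH_SIZE_BYTES
        return self.size_bytes > 0 and (too_old or too_big)


@dataclass(frozen=True)
class SmallFile:
    key: str
    min_ts: int                    # earliest record timestamp in the file
    size_bytes: int


def plan_compaction(files: list[SmallFile]) -> list[list[SmallFile]]:
    """Group small files, in time order, into batches of about TARGET_FILE_BYTES.

    Each batch would be merged into one large file; the inputs are deleted
    only after the merged file has been written.
    """
    batches: list[list[SmallFile]] = []
    current: list[SmallFile] = []
    current_size = 0
    for f in sorted(files, key=lambda f: f.min_ts):
        current.append(f)
        current_size += f.size_bytes
        if current_size >= TARGET_FILE_BYTES:
            batches.append(current)
            current, current_size = [], 0
    if len(current) > 1:           # a lone leftover file gains nothing from merging
        batches.append(current)
    return batches
```

Ordering the inputs by timestamp before batching keeps each merged file covering a contiguous time range, which is what makes the "sorted by time" property in the final note useful for later pruning.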

Parquet File Structure

| Layer | Detail |
| --- | --- |
| Partitioning | By organization → stream → date → time window |
| Compression | Zstd (default), high compression ratio |
| Bloom filters | Per-column, configurable for high-cardinality fields |
| Row groups | Optimized for DataFusion predicate pushdown |
| Metadata | Column statistics for partition pruning |
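
To make the partitioning hierarchy concrete, the sketch below maps a record timestamp to an object-key prefix of the form organization → stream → date → time window. The `files/` root, the hourly window, the microsecond timestamp unit, and the function name are assumptions for illustration; the real key layout may differ.

```python
from datetime import datetime, timezone


def partition_prefix(org: str, stream: str, ts_micros: int) -> str:
    """Return a hypothetical object-key prefix for a record.

    Assumptions: timestamps are microseconds since the Unix epoch, windows
    are one hour wide, and partitions are computed in UTC.
    """
    dt = datetime.fromtimestamp(ts_micros // 1_000_000, tz=timezone.utc)
    return f"files/{org}/{stream}/{dt:%Y/%m/%d}/{dt:%H}/"


prefix = partition_prefix("default", "app_logs", 1_700_000_000_000_000)
print(prefix)  # files/default/app_logs/2023/11/14/22/
```

With a layout like this, a query restricted to one stream and a narrow time range only needs to list a handful of prefixes rather than every object in the bucket.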

Query Engine: DataFusion

flowchart LR
    SQL["SQL Query"] --> Parser["SQL Parser"]
    Parser --> LP["Logical Plan"]
    LP --> Opt["Optimizer\n(predicate pushdown,\nprojection pruning,\npartition pruning)"]
    Opt --> PP["Physical Plan"]
    PP --> Scan["Parquet Scanner\n(parallel, columnar)"]
    Scan --> S3_Q["Read from S3\n(only needed cols)"]
    S3_Q --> Exec["Vectorized Execution\n(Arrow batches)"]
    Exec --> Result["Query Result"]

    style Opt fill:#2e7d32,color:#fff
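
The optimizer stage above relies on file-level statistics to avoid reading data that cannot match. The following plain-Python sketch shows the idea behind partition pruning and projection pruning. It does not use the DataFusion API, and the `FileStats` type and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    key: str
    min_ts: int            # smallest timestamp recorded in the file's statistics
    max_ts: int            # largest timestamp recorded in the file's statistics
    columns: frozenset     # column names present in the file


def prune_files(files, start_ts, end_ts):
    """Partition pruning: keep files whose [min_ts, max_ts] overlaps [start_ts, end_ts)."""
    return [f for f in files if f.max_ts >= start_ts and f.min_ts < end_ts]


def project_columns(file, needed):
    """Projection pruning: read only the requested columns the file actually has."""
    return sorted(set(needed) & file.columns)


files = [
    FileStats("a.parquet", 0, 99, frozenset({"_timestamp", "level", "message"})),
    FileStats("b.parquet", 100, 199, frozenset({"_timestamp", "level", "message"})),
    FileStats("c.parquet", 200, 299, frozenset({"_timestamp", "level"})),
]

# Equivalent in spirit to: SELECT level FROM logs WHERE _timestamp >= 150 AND _timestamp < 250
to_scan = prune_files(files, 150, 250)
plan = {f.key: project_columns(f, {"_timestamp", "level"}) for f in to_scan}
```

Here `a.parquet` is skipped without being opened, and only two columns are fetched from the remaining files. This is the effect the "only needed cols" step in the diagram describes.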

HA Deployment Topology

flowchart TB
    LB["Load Balancer"]

    subgraph Routers["Router Pool"]
        R1["Router 1"]
        R2["Router 2"]
    end

    subgraph Ingesters["Ingester Pool"]
        I1["Ingester 1\n(WAL /data1)"]
        I2["Ingester 2\n(WAL /data2)"]
        I3["Ingester 3\n(WAL /data3)"]
    end

    subgraph Queriers["Querier Pool"]
        Q1["Querier 1\n(cache /cache1)"]
        Q2["Querier 2\n(cache /cache2)"]
    end

    C1["Compactor"]
    A1["AlertManager"]

    S3_HA["S3 / MinIO\n(shared storage)"]
    PG["PostgreSQL\n(metadata)"]

    LB --> Routers
    R1 --> Ingesters
    R2 --> Ingesters
    R1 --> Queriers
    R2 --> Queriers
    Ingesters --> S3_HA
    Queriers --> S3_HA
    C1 --> S3_HA
    A1 --> Queriers

    Routers --> PG
    Ingesters --> PG
    Queriers --> PG

    style S3_HA fill:#1565c0,color:#fff

Sources