Apache SkyWalking — How It Works¶
This page covers how SkyWalking's OAP server processes telemetry, the new V2 engine architecture, and BanyanDB's purpose-built storage model.
Architecture Overview¶
flowchart TB
subgraph Probes["Probes & Agents"]
JA["Java Agent\n(bytecode injection)"]
LA["Language Agents\n(.NET, Go, Python, etc.)"]
ROVER["Rover\n(eBPF network profiling)"]
OTEL["OTel Collector\n(OTLP receiver)"]
ENVOY["Envoy ALS\n(access log service)"]
SAT["Satellite\n(edge proxy)"]
end
subgraph OAP["OAP Server"]
direction TB
RECV["Receiver Layer\n(gRPC, REST, Kafka)"]
ANAL["Analysis Core"]
subgraph DSL["V2 DSL Engines"]
OAL["OAL V2\n(metric aggregation)"]
MAL["MAL V2\n(Prometheus → metrics)"]
LAL["LAL V2\n(log analysis)"]
end
ALERT["Alerting Engine"]
TOPO["Topology Builder"]
QUERY["MQE Query Engine"]
end
subgraph Storage["Storage (Pluggable)"]
BDB["BanyanDB\n(recommended)"]
ES["Elasticsearch\n/ OpenSearch"]
CHSW["ClickHouse"]
PG["PostgreSQL"]
end
Probes -->|gRPC/REST| RECV
RECV --> ANAL
ANAL --> DSL
DSL --> Storage
ALERT --> Storage
TOPO --> Storage
QUERY --> Storage
QUERY --> UI["SkyWalking UI"]
V2 Engine Architecture (v10.4.0)¶
The v10.4.0 release overhauls the DSL engines, replacing the Groovy-based runtime with an ANTLR4 parser and Javassist bytecode generation. The subsections below summarize what changes for each DSL.
OAL V2 (Observability Analysis Language)¶
| Feature | V1 (Groovy) | V2 (ANTLR4 + Javassist) |
|---|---|---|
| AST model | Mutable, Groovy closures | Immutable, type-safe |
| Thread safety | ThreadLocal-dependent | No shared mutable state |
| Error reporting | Runtime exceptions | File, line, column at parse time |
| Testability | Requires parsing | Models constructible without parsing |
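The "immutable, type-safe" and "constructible without parsing" rows can be illustrated with a minimal sketch. This is Python rather than the engine's Java, and every name in it (`SourceLocation`, `MetricDefinition`, `OalParseError`, `validate`, the small function set) is hypothetical, chosen only to show the shape of the idea: frozen model objects that tests can build directly, and errors that carry file, line, and column.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceLocation:
    """Where a definition came from, for file:line:column error messages."""
    file: str
    line: int
    column: int


@dataclass(frozen=True)
class MetricDefinition:
    """Hypothetical immutable model node; not the real OAL V2 class."""
    name: str        # metric being defined
    source: str      # source scope being aggregated
    function: str    # aggregation function name
    location: SourceLocation


class OalParseError(ValueError):
    """Error that reports its position instead of failing later at runtime."""

    def __init__(self, message, location):
        super().__init__(
            f"{location.file}:{location.line}:{location.column}: {message}"
        )
        self.location = location


# Illustrative subset only; the real engine's function registry is not shown here.
KNOWN_FUNCTIONS = frozenset({"cpm", "longAvg", "percentile"})


def validate(definition):
    """Reject unknown functions up front, pointing at the offending location."""
    if definition.function not in KNOWN_FUNCTIONS:
        raise OalParseError(
            f"unknown function '{definition.function}'", definition.location
        )
    return definition
```

Because the model is a plain frozen value, a test can construct `MetricDefinition("service_cpm", "Service", "cpm", SourceLocation("core.oal", 3, 17))` directly, with no parser involved, and any attempt to mutate it afterwards raises `dataclasses.FrozenInstanceError`.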
MAL V2 (Metric Analysis Language)¶
Converts Prometheus metrics into SkyWalking's internal metric model:
- Speedup: ~6.8x faster execution vs Groovy V1
- Compile-time validation: Syntax errors caught at startup
- Immutable AST: Thread-safe without ThreadLocal
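The "validate at startup, then execute many times" pattern behind these bullets can be sketched as follows. This is not MAL syntax or the ANTLR4/Javassist pipeline; it borrows Python's own `ast` module to parse a toy arithmetic expression once, reject anything unsupported before any sample arrives, and return a reusable callable. The rule name and metric names are hypothetical.

```python
import ast

# Node types the toy expression language accepts: names, numbers, + - * /.
ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.Name, ast.Constant, ast.Load,
    ast.Add, ast.Sub, ast.Mult, ast.Div,
)


def compile_rule(name, expression):
    """Parse and validate once; return a callable that evaluates samples."""
    try:
        tree = ast.parse(expression, mode="eval")
    except SyntaxError as exc:
        # Fail at "startup" with a position, not later during execution.
        raise ValueError(
            f"{name}: syntax error at column {exc.offset}: {exc.msg}"
        ) from exc

    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError(
                f"{name}: unsupported construct {type(node).__name__}"
            )

    code = compile(tree, filename=name, mode="eval")

    def evaluate(samples):
        # Names in the expression resolve against the sample mapping only.
        return eval(code, {"__builtins__": {}}, dict(samples))

    return evaluate


# Compiled once, reused for every scrape.
cpu_percent = compile_rule(
    "cpu_percent", "cpu_seconds_total / interval_seconds * 100"
)
```

The compiled rule holds no mutable state of its own, so the same callable can be shared across threads; a malformed expression such as `"cpu_seconds_total * "` is rejected by `compile_rule` before any data is processed.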
LAL V2 (Log Analysis Language)¶
Processes log streams for extraction, filtering, and routing:
- Compile: ~39x faster than Groovy V1
- Execute: ~2.8x faster
- Breaking Change: `slowSql {}` and `sampledTrace {}` sub-DSLs replaced with `outputType` mechanism
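The three stages named above (extraction, filtering, routing) can be sketched as a toy pipeline. This is plain Python, not LAL syntax: the log line format, the `keep` rule, and the `output_type` function are all hypothetical. In particular, `output_type` only mirrors the general idea of selecting a destination by type; the actual semantics of LAL V2's `outputType` are not described in this document.

```python
import re
from collections import defaultdict

# Hypothetical line format: "<LEVEL> <service> <message>"
LINE = re.compile(r"^(?P<level>[A-Z]+) (?P<service>\S+) (?P<message>.*)$")


def extract(raw):
    """Extraction: turn a raw line into named fields, or None if unparseable."""
    match = LINE.match(raw)
    return match.groupdict() if match else None


def keep(record):
    """Filtering: drop everything below WARN."""
    return record["level"] in {"WARN", "ERROR"}


def output_type(record):
    """Routing key (hypothetical): pick a destination based on the content."""
    return "slow_sql" if record["message"].startswith("SLOW SQL") else "log"


def process(lines):
    """Run extract -> filter -> route and return records grouped by output."""
    sinks = defaultdict(list)
    for raw in lines:
        record = extract(raw)
        if record is None or not keep(record):
            continue
        sinks[output_type(record)].append(record)
    return dict(sinks)
```

Feeding `process` a mix of `INFO`, `ERROR`, and `WARN ... SLOW SQL ...` lines yields one list per output key, with the `INFO` and unparseable lines dropped.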
BanyanDB¶
BanyanDB is SkyWalking's purpose-built observability database, combining columnar and time-series storage.
Architecture¶
flowchart LR
subgraph BanyanDB["BanyanDB Cluster"]
Liaison["Liaison Node\n(query routing)"]
Data1["Data Node 1\n(shard owner)"]
Data2["Data Node 2\n(shard owner)"]
DataN["Data Node N"]
end
OAP["OAP Server"] -->|gRPC| Liaison
Liaison --> Data1
Liaison --> Data2
Liaison --> DataN
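The diagram shows the liaison node routing requests to shard-owning data nodes. A minimal sketch of that idea follows, assuming hash-based shard selection and a fixed shard-to-node mapping; both are assumptions made for illustration, since the document does not describe BanyanDB's actual placement algorithm. The function names and node labels are hypothetical.

```python
import hashlib


def shard_for(series_key, shard_count):
    """Map a series key to a shard with a stable hash (assumed scheme)."""
    digest = hashlib.sha256(series_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def owner_of(shard, data_nodes):
    """Assign shards to data nodes round-robin (assumed scheme)."""
    return data_nodes[shard % len(data_nodes)]


def route(series_key, shard_count, data_nodes):
    """What a liaison-like router does: key -> shard -> owning data node."""
    return owner_of(shard_for(series_key, shard_count), data_nodes)
```

A stable hash (rather than Python's per-process `hash()`) matters here: every router instance must send the same series to the same shard owner, so `route("service_cpm|cart", 4, ["data-1", "data-2", "data-3"])` returns the same node on every call and in every process.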
Storage Model¶
| Concept | Description |
|---|---|
| Group | Logical namespace (e.g., sw_metric, sw_trace) |
| Measure | Metric storage — columnar format optimized for aggregation |
| Stream | Log/trace storage — append-only time-ordered |
| IndexRule | Secondary index definitions for query acceleration |
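The difference between the two storage kinds in the table can be made concrete with a toy in-memory model. This is only a sketch of the concepts, not BanyanDB's schema or API: the class and method names are hypothetical, and the stream name `segment` and tag `trace_id` are made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Measure:
    """Metric storage: one list per column, so aggregation reads one column."""
    group: str
    name: str
    columns: dict = field(default_factory=dict)  # column name -> values

    def write(self, **point):
        for column, value in point.items():
            self.columns.setdefault(column, []).append(value)

    def total(self, column):
        # Only the requested column is scanned; other columns are untouched.
        return sum(self.columns[column])


@dataclass
class Stream:
    """Log/trace storage: elements are only appended, kept in time order."""
    group: str
    name: str
    elements: list = field(default_factory=list)  # (timestamp, tags) pairs

    def append(self, timestamp, tags):
        self.elements.append((timestamp, tags))
        self.elements.sort(key=lambda item: item[0])  # stable sort by time

    def build_index(self, tag):
        """Toy IndexRule: map each value of `tag` to the matching elements."""
        index = {}
        for timestamp, tags in self.elements:
            if tag in tags:
                index.setdefault(tags[tag], []).append((timestamp, tags))
        return index


# Group names taken from the table above; everything else is illustrative.
cpm = Measure(group="sw_metric", name="service_cpm")
cpm.write(timestamp=1, value=10)
cpm.write(timestamp=2, value=15)

segments = Stream(group="sw_trace", name="segment")
segments.append(20, {"trace_id": "b"})
segments.append(10, {"trace_id": "a"})
by_trace = segments.build_index("trace_id")
```

Here `cpm.total("value")` sums a single column, while `by_trace["a"]` answers a lookup without scanning the whole stream, which is the role the table assigns to an IndexRule.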
BanyanDB vs Elasticsearch¶
| Dimension | BanyanDB | Elasticsearch |
|---|---|---|
| RAM usage | ~5x less | Baseline |
| Disk usage | ~30% less | Baseline |
| Hot/Warm/Cold | Built-in lifecycle stages | Requires ILM policies |
| Guardrails | Disk-usage thresholds, query memory protectors | External monitoring |
| Purpose | Designed for observability | General-purpose search |
CLI Tool: bydbctl¶
# List groups
bydbctl group list
# Query a measure
bydbctl measure query --group sw_metric --name service_cpm
# Create an index rule
bydbctl indexrule create -f index-rule.yaml
# Check server health
# BanyanDB serves its HTTP API and UI on port 17913
curl http://banyandb:17913/api/healthz
Virtual Thread Support (JDK 25+)¶
v10.4.0 adds virtual thread support for JDK 25+:
| Pool | JDK < 25 | JDK 25+ |
|---|---|---|
| gRPC server handlers | Cached platform (unbounded) | Virtual threads |
| HTTP blocking handlers | Cached platform (max 200) | Virtual threads |
| Total OAP threads | 150+ | ~72 (~50% reduction) |
BatchQueue (Replaces DataCarrier)¶
v10.4.0 replaces the legacy DataCarrier with BatchQueue:
| Queue | Old Threads | New Threads | Old Buffer Slots | New Buffer Slots |
|---|---|---|---|---|
| L1 Aggregation | 26 | 10 (unified OAL+MAL) | ~12.5M | ~6.6M |
| L2 Persistence | 3 | 4 (unified) | ~1.34M | ~660K |
| TopN Persistence | 4 | 1 | 4K | 4K |
| Total | 36 | 15 | ~13.9M | ~7.3M |
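As a quick arithmetic check on the table, the sketch below sums the listed rows and computes the reductions implied by the Total row. Two assumptions are made: `4K` is read as 4,000 slots, and the rounded `~` figures are taken at face value. Note that the three listed rows sum to 33 old threads, not 36, so the Total row presumably counts queues that are not broken out in this table.

```python
# Values copied from the table; "4K" is assumed to mean 4,000.
ROWS = {
    "L1 Aggregation": {
        "old_threads": 26, "new_threads": 10,
        "old_slots": 12_500_000, "new_slots": 6_600_000,
    },
    "L2 Persistence": {
        "old_threads": 3, "new_threads": 4,
        "old_slots": 1_340_000, "new_slots": 660_000,
    },
    "TopN Persistence": {
        "old_threads": 4, "new_threads": 1,
        "old_slots": 4_000, "new_slots": 4_000,
    },
}

# The Total row as reported in the table.
REPORTED_TOTAL = {
    "old_threads": 36, "new_threads": 15,
    "old_slots": 13_900_000, "new_slots": 7_300_000,
}


def column_sum(column):
    """Sum one column across the rows that the table actually lists."""
    return sum(row[column] for row in ROWS.values())


def reduction_percent(old, new):
    """Relative reduction from old to new, in percent."""
    return (old - new) / old * 100


thread_cut = reduction_percent(
    REPORTED_TOTAL["old_threads"], REPORTED_TOTAL["new_threads"]
)
slot_cut = reduction_percent(
    REPORTED_TOTAL["old_slots"], REPORTED_TOTAL["new_slots"]
)
```

With the reported totals, `thread_cut` is about 58% and `slot_cut` is about 47%. The listed buffer rows sum to roughly 13.84M and 7.26M slots, which agrees with the rounded totals of ~13.9M and ~7.3M.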