
Configuration & Runtime Engine

PADAS Core is the runtime engine of the data plane: a single process that owns runtime state, execution scheduling, and the persistence projection for streams, tasks, and connectors. It is not just an API server; the embedded control plane (padas-api on Axum + Rustls, serving /api/v1/*) orchestrates the same runtime topology the hot path executes. StreamRouter moves events through buffers, an optional WAL subscriber provides runtime durability, tasks are execution units on that graph, and connectors are its ingress/egress boundaries. Operators reason in API-driven orchestration terms (streams.json, tasks.json, connectors.json under [api.persistence].config_dir), while after a restart the engine reconciles that persisted registry with live runtime state.

Related: REST API Reference · Runtime configurations · Monitoring · Stream configuration · Security — runtime operational security reference · Troubleshooting & Logs · Glossary · Cores


Overview

Core is the long-running runtime engine that materializes an execution graph from registry objects:

| Subsystem | Implementation-aware role |
| --- | --- |
| StreamRouter | Per-stream routing fabric: buffer transitions, backpressure modes, and fan-out to subscribers, including the optional WAL subscriber path. |
| Runtime scheduling | Workers, virtual shards, and task/connector lifecycle map to OS-level execution; there is no out-of-process "pipeline scheduler" for product pipelines. |
| WAL subscriber behavior | WAL is a router subscriber writing segmented data under [core.wal].path; lagging consumers can hybrid-read from WAL per [core.subscriber.lag] (see Glossary → Consumer lag). |
| Runtime execution graph | Sources → stream buffers → (optional WAL) → task PDL → sink streams / connector sinks; one single-process runtime graph. |
| Task orchestration | Tasks subscribe to source streams, run PDL in processing or detection mode, manage aggregation state, and route outputs; execution-unit semantics. |
| Connector lifecycle | Start/stop/restart via /api/v1/connectors maps to connector runtime threads/async work; runtime lifecycle is tied to the same process. |
| Embedded API orchestration | Axum router under [api].prefix (default /api/v1) mutates runtime state and persists registry JSON without a sidecar control plane. |
| Runtime persistence projection | streams.json, tasks.json, connectors.json are the on-disk persistence projection of API-managed objects; startup registry reconciliation reloads them into the runtime engine. |

Why Core is not just an API server: the REST surface is the embedded control plane for runtime coordination; disabling or mis-sizing the engine does not leave a “headless API”—it leaves a broken streaming runtime.

Why pipelines are runtime compositions: a pipeline in product language is not a standalone service—it is realized as runtime objects (streams, tasks, connectors) under one scheduler inside this single-process runtime graph.

Runtime state synchronization: UI and automation drive desired state through /api/v1/*; the process applies changes, persists registry files, and reflects status on …/status routes—runtime state and config state (see Runtime persistence model) must stay aligned across rolling restarts and reload semantics.


Runtime role

In runtime topology terms, Core is the single-process runtime engine that executes workloads described by:

  • Streams (stream transport): named channels with buffers, retention, optional WAL, and HTTP produce/consume surfaces. Stream topology realization is "which routers exist and how they connect."
  • Tasks (execution units): subscriptions to source streams, PDL queries, concurrency limits, aggregation windows, and sinks (processing vs detection modes).
  • Connectors (ingress/egress boundaries): sources pump events into streams; sinks drain tasks or streams per connector class semantics (class-specific backpressure / drop policies; Glossary → Connector class).

Runtime graph realization: there is no separate "pipeline microservice". A pipeline is not a standalone service; it is realized as runtime objects that share runtime coordination, IO, and persistence ownership in one address space.

Runtime object ownership: the process owns buffers, WAL segments under [core.wal], task internal state, and connector handles—execution isolation is cooperative (caps, modes), not separate JVMs per pipeline.

Persistence recovery: on start, registry JSON plus padas.toml / padas.default.toml drive startup orchestration: restore routers, replay or resume consumers per offset and WAL policy, then serve /api/v1/*.

Shutdown guarantees: bounded by [core].shutdown_timeout_secs; for clean runtime lifecycle, drain or pause ingest before SIGTERM in production so sink delivery and WAL flush windows can complete.


Core responsibilities

| Responsibility | Where it lives | Operational notes |
| --- | --- | --- |
| Runtime coordination | Engine + [core] | Workers, shards, startup/shutdown timeouts; virtual shard count affects hashing and parallelism, so profile before raising. |
| Stream lifecycle | Engine + streams.json | Create/update/delete, pause/resume/restart; per-stream WAL and buffer overlays via API or persisted registry. |
| Stream durability behavior | [core.stream.wal], [core.wal], WAL subscriber | WAL segments + retention; sync_writes trades latency for durability; high-throughput profiles sometimes disable per-stream WAL, an explicit runtime durability trade-off (Glossary → WAL). |
| Consumer recovery | StreamRouter + offsets + [core.subscriber.lag] | Execution scheduling resumes from persisted offsets; hybrid WAL reads when lag exceeds threshold / timeout_ms. |
| Task execution | [core.task], tasks.json | Execution state management for PDL, aggregation cleanup/watermarks ([core.task.aggregation.*]); overlays in task JSON vs padas.toml must be reconciled before change windows (Glossary → Watermark, Tasks). |
| Connector execution | connectors.json + connector runtime | Start/stop/restart mirrors tasks; runtime lifecycle and backpressure are class-specific. |
| Execution state management | In-process task/agg state + registry | Restart clears hot state; recovery depends on replay behavior and window definitions. |
| Persistence ownership | [api.persistence].config_dir, core.data_dir, WAL path | Registry vs WAL vs instance UUID paths; stateful paths must stay stable across VM migration. |
| Embedded control plane | [api], Axum, /api/v1/* | API-driven orchestration; POST /api/v1/system/reload applies supported padas.toml deltas where implemented; reload vs full-restart semantics differ by key. |
| Runtime reload semantics | POST /api/v1/system/reload | Prefer validate_only in automation before applying; unsupported keys still need a process restart. |
| Metrics / internal streams | [observability], GET /api/v1/metrics | Internal/system streams (e.g. _padas_metrics) feed observability; high metrics levels add hot-path cost. |
| Runtime API protection | [api.auth], service-account JSON | Axum auth middleware when enabled; see Service-account authentication and Security. |

Configuration file

Hierarchy

Config layering is fixed: padas.default.toml (immutable vendor defaults) → operator padas.toml → environment overrides (e.g. PADAS_HOME). Runtime precedence: later layers override earlier ones for the same keys. Treat padas.default.toml as a read-only baseline and keep deployment-specific overrides in padas.toml beside it under $PADAS_HOME/etc/. Stateful paths (data_dir, the WAL path, [api.persistence].config_dir) belong in operator config so reload behavior and backups stay consistent.
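A minimal layering sketch; the worker override is illustrative and the paths mirror the examples later on this page:

# padas.default.toml (vendor baseline; treat as read-only)
[core]
workers = 0

# padas.toml (operator overrides; the later layer wins for the same key)
[core]
workers = 8                        # illustrative; profile before raising
data_dir = "/var/lib/padas/data"

[api.persistence]
config_dir = "/var/lib/padas/data/registry"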

Reload behavior vs restart: POST /api/v1/system/reload validates and applies a subset of changes; anything outside that set requires a process restart, so plan maintenance windows accordingly. Full doc: Runtime configurations.

Minimal lab profile (TLS and auth off)

Use only on trusted localhost. Auth-disabled operational risk: with [api.auth].enabled = false, the entire /api/v1/* surface is unauthenticated; never expose it on shared networks (Security).

[api]
host = "127.0.0.1"

[api.tls]
enabled = false

[api.auth]
enabled = false

Production-oriented API bind and transport

[api]
enabled = true
host = "0.0.0.0"
port = 8999
prefix = "/api/v1"

[api.tls]
enabled = true
cert_file = "/var/lib/padas/etc/api.crt"
key_file = "/var/lib/padas/etc/api.key"

[api.cors]
enabled = true
allowed_origins = "https://padas-ui.example.com"
allow_credentials = false
allowed_methods = "GET,POST,PUT,DELETE,OPTIONS"
allowed_headers = "Content-Type,Authorization"
preflight_max_age_secs = 3600

Service-account authentication (summary)

When [api.auth].enabled = true, Axum runtime auth middleware requires Authorization: Bearer <token> on every embedded route under prefix, including /api/v1/health and /metrics—full runtime API protection for the surface. Token material lives in JSON at [api.auth].service_account_token_file (default ./data/security/service-account.token relative to PADAS_HOME). Service token lifecycle (rotation, grace, lockout, X-Forwarded-For) and UI/Core trust model are documented in Security — runtime operational security reference. Rustls terminates TLS per [api.tls] in padas-api.
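A hedged [api.auth] sketch; the key name follows the default quoted above, and the absolute path is illustrative:

[api.auth]
enabled = true
service_account_token_file = "/var/lib/padas/data/security/service-account.token"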

Registry persistence (API-managed objects)

[api.persistence]
config_dir = "/var/lib/padas/data/registry"
streams_registry_file = "streams.json"
tasks_registry_file = "tasks.json"
connectors_registry_file = "connectors.json"
backup_enabled = true
backup_dir = "/var/lib/padas/data/registry/backups"
max_backups = 10

Runtime projection: config_dir holds the authoritative persistence projection of API-managed streams/tasks/connectors; this registry is what operators back up for durability. Reconciliation behavior: on startup the engine loads the JSON files and aligns runtime state with disk; runtime drift appears when manual file edits disagree with the last API write or when restores are partial.

Backup strategy: snapshot config_dir (backup_enabled rotation plus external volume backups); on restore, restart Core so registry reconciliation runs against a consistent trio of files. Persistence ownership is split: registry JSON for config state, WAL/data dirs for runtime durability and task windows; restore one without the other at your own risk.

Engine, data paths, and timeouts

[core]
workers = 0
virtual_shards = 4096
data_dir = "/var/lib/padas/data"
startup_timeout_secs = 30
shutdown_timeout_secs = 30

instance_uuid is normally auto-generated on first start; keep data_dir and WAL paths stable so offset persistence and segment files remain consistent across rolling restarts.

Streams, buffers, and WAL defaults

Per-stream buffers default under [core.stream.buffer] (max_events, mode = timeout | drop | block, timeout_ms)—they define stream backpressure on the hot path.

[core.stream.buffer]
max_events = 25000
max_bytes = 0
timeout_ms = 10
mode = "timeout"

[core.stream.wal]
enabled = false

[core.stream.wal.batch]
max_events = 100
max_timeout_ms = 100
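For ingest paths that must not lose events, mode = "block" applies backpressure to producers instead of timing out or dropping; a sketch reusing the buffer keys above (values illustrative):

[core.stream.buffer]
max_events = 25000
mode = "block"           # producers block when the buffer is full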

Global WAL storage and retention:

[core.wal]
path = "/var/lib/padas/data/wal"
retention_ms = 3600000
retention_bytes = 1073741824
max_segment_size = 104857600
max_segments = 20
sync_writes = false

[core.wal.compression]
enabled = true
algorithm = "zstd"
level = 1

WAL / stream relationship:

  • With [core.stream.wal].enabled = true, appended events are durably segmented under [core.wal].path subject to retention and sync_writes.
  • High-throughput modes may disable stream WAL entirely—explicit runtime durability trade-off vs latency (Glossary → WAL).
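A durability-leaning per-stream sketch combining the keys above (the retention value is illustrative; sync_writes = true trades latency for durability, as noted):

[core.stream.wal]
enabled = true

[core.wal]
sync_writes = true       # favor durability over latency (trade-off noted above)
retention_ms = 86400000  # 24 h, illustrative; size the disk for worst-case growth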

Subscriber lag and hybrid reads

Controls when consumers transition from StreamRouter buffers to WAL-backed reads (replay lag and consumer lag tuning; Glossary → Consumer lag):

[core.subscriber.lag]
threshold = 50
timeout_ms = 100

[core.subscriber.batch]
max_events = 500

[core.subscriber.offset.batch]
max_events = 1000

[core.subscriber]
enable_adaptive_capacity = true
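A hypothetical high-throughput tuning sketch using the same keys (thresholds are illustrative; profile before adopting):

[core.subscriber.lag]
threshold = 200          # consumers lagging past this switch to WAL-backed reads
timeout_ms = 250

[core.subscriber.batch]
max_events = 1000        # larger batches amortize per-event overhead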

Tasks and aggregation (runtime execution)

[core.task]
# Values from shipped defaults; raise workers/thread caps only after profiling.

[core.task.aggregation.cleanup]
# Window state retention — align with PDL windows and disk budget.

[core.task.aggregation.watermark]
# Idle / shutdown flush behaviour for tumbling windows (watermark / cleanup).

Task REST payloads may supply execution overlays mapping to these tables—resolve which layer wins (TOML vs task JSON) before production edits (Tasks).

Observability hooks

Prefer [observability.logging].format = "Json" for centralized logging; raising [observability.metrics_collection].level increases observability cost on the hot path. GET /api/v1/metrics exposes Prometheus text when enabled.
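A sketch of the two knobs named above; only the logging format value is taken from this page, and the metrics level is left as a placeholder:

[observability.logging]
format = "Json"          # structured logs for centralized collection

[observability.metrics_collection]
# level = ...            # valid values ship in padas.default.toml; higher levels add hot-path cost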


Streams, tasks, and connectors inside one runtime engine

Event lifecycle on the runtime execution graph:

  1. Ingestion — HTTP produce, connector sources, or internal generators append to a stream’s StreamRouter buffer (buffer transitions).
  2. WAL handoff — the optional WAL subscriber persists copies per stream/WAL policy; downstream visibility guarantees are defined by mode.
  3. Task execution — tasks subscribe to source streams; task execution ordering is defined by the engine’s subscription and shard assignment—task saturation shows as lag, drops, or error status.
  4. Sink routing — PDL outputs land on sink streams or connector sinks; sink delivery respects connector-specific backpressure.
  5. Consumer lag behavior — subscribers past [core.subscriber.lag] thresholds read from WAL segments (execution isolation is not cross-process; it is per-task/connector caps).

Runtime state transitions: streams move created → active → paused/stopped; tasks/connectors expose running, stopped, error—use …/status routes for runtime diagnostics.

Control plane: REST CRUD on /api/v1/streams, /tasks, /connectors updates live runtime state and atomically persists streams.json / tasks.json / connectors.json. Lifecycle start / stop / restart maps to in-process runtime orchestration—not external Kubernetes job semantics unless you wrap them yourself.


Runtime execution model

| Stage | Runtime behavior |
| --- | --- |
| Ingestion | Events enter StreamRouter buffers from HTTP, connectors, or internal producers. |
| Routing | Shards/virtual shards map keys to internal queues; fan-out to subscribers and the WAL path. |
| Buffering | [core.stream.buffer] modes (timeout, drop, block) define stream backpressure before drops or blocking producers. |
| WAL persistence | Optional subscriber writes batches to [core.wal].path with retention and compression. |
| Task execution | PDL runs per execution unit; aggregation windows hold runtime state until cleanup/watermark. |
| Sink delivery | Tasks and connector sinks consume from the router and/or WAL hybrid reads. |
| Offset management | Offset persistence enables consumer recovery; batching under [core.subscriber.offset.batch] tunes commit frequency. |
| Replay behavior | After restart or lag, consumers may replay from WAL per policy; coordinate with task execution windows to avoid duplicate side effects in sinks. |

Runtime persistence model

| Concern | Behavior |
| --- | --- |
| WAL vs registry persistence | WAL = runtime durability for stream event history (policy-dependent). Registry JSON = config state for the streams/tasks/connectors the API last wrote. |
| Runtime state vs config state | Runtime state includes hot buffers, in-flight connector handles, and aggregation windows; config state is the persisted registry plus padas.toml. |
| Recovery model | Startup reloads the registry and merges padas.toml layers; WAL segments inform replay semantics for lagging subscribers. |
| Replay semantics | Defined by offsets + WAL retention; if retention is shorter than the outage, you lose that replay range. |
| Stream durability | Tied to WAL enablement, sync_writes, and retention bytes/time; size for worst-case WAL growth. |
| Offset persistence | Drives consumer recovery after rolling restarts; validate that backups include offset stores if your deployment keeps them on disk under data_dir. |
| Registry restore expectations | Restoring only streams.json without matching tasks/connectors files yields a partial runtime topology; expect reload failures or manual cleanup. |

Runtime orchestration boundaries

| Boundary | Responsibility |
| --- | --- |
| UI | Operator desired state and Core pairing (account_token); does not execute PDL or hold StreamRouter. |
| Core runtime | Runtime engine: StreamRouter, WAL, tasks, connectors, offsets, execution scheduling. |
| Registry | streams.json, tasks.json, connectors.json; the persistence projection and runtime reconciliation input. |
| REST API | Embedded control plane: /api/v1/*, POST /api/v1/system/reload, auth middleware; the orchestration boundary for automation. |
| Connectors | External IO; translate network/files/syslog into stream events or drain to downstream systems. |
| Stream engine | In-process stream transport and buffer/WAL stack shared by all pipelines. |

Operational notes

| Topic | Guidance |
| --- | --- |
| Rolling restart implications | One node at a time still drops runtime state for that process; expect replay lag and brief connector stalls, and ensure upstream tolerates redelivery where at-least-once applies. |
| WAL sizing | Plan retention_bytes, max_segments, and disk IO for burst ingest; WAL growth without retention tuning fills disks (see the sizing sketch below). |
| State recovery | Validate startup_timeout_secs against large registries plus WAL replay; startup hangs often trace here. |
| Runtime reload safety | Use validate_only on POST /api/v1/system/reload in CI/CD; bad TOML can still cause reload failures depending on validation coverage. |
| Persistence corruption considerations | Truncated JSON in config_dir or corrupt WAL segments: restore from backup and expect persistence inconsistencies until a clean snapshot is mounted. |
| Deployment synchronization | padas.toml on disk must match what automation assumes; UI Core rows only store reachability + Bearer token, so deployment synchronization is still file + registry ops. |
| API orchestration behavior | High-frequency CRUD without backpressure can saturate the control-plane threadpool; pace automation. |
| Runtime drift | Manual edits to JSON while Core is running can diverge from in-memory runtime state; prefer the API or controlled restarts. |
| Observability implications | Verbose metrics/logging increase CPU; correlate with Monitoring during incidents. |
| Maintenance window considerations | Coordinate TLS rotation, service-account.token rotation, and secret.json-less Core changes with UI token refresh (Security). |
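A back-of-envelope WAL sizing sketch (the ingest rate is an assumed example, and the segment-count relation is an assumption to verify against shipped defaults): at roughly 30 MB/s sustained ingest with one hour of retention, the WAL holds about 30 MB/s x 3600 s ≈ 108 GB before compression:

[core.wal]
retention_ms = 3600000             # 1 h of replay range
retention_bytes = 108000000000     # ~108 GB: assumed 30 MB/s x 3600 s, pre-compression
max_segment_size = 1073741824      # 1 GiB segments (illustrative)
max_segments = 120                 # 120 x 1 GiB >= retention_bytes (assumed relation; verify)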

Runtime troubleshooting checkpoints

| Symptom | Where to look first |
| --- | --- |
| Startup hangs | startup_timeout_secs, registry size, WAL replay volume, TLS cert load (Rustls). |
| WAL growth | Retention, max_segments, ingest rate vs sink throughput; disk free space. |
| Stalled consumers | [core.subscriber.lag], offset commits, downstream sink health. |
| Replay lag | WAL read path vs buffer path; segment count; consumer lag behavior. |
| Reload failures | POST /api/v1/system/reload response body; unsupported keys still need a restart. |
| Stream backpressure | Buffer mode, max_events, drop metrics vs ingress EPS. |
| Task saturation | [core.task] worker/thread caps, PDL cost, aggregation windows. |
| Connector stalls | Connector status, external dependency timeouts, sink rate limits. |
| Persistence inconsistencies | Mtime skew across streams.json / tasks.json / connectors.json; partial restore. |

Deeper playbooks: Troubleshooting & Logs.


Performance Tuning

TBD