
Configuration & Runtime Engine

PADAS Core is the runtime engine of the data plane: a single process that owns runtime state, execution scheduling, and the persistence projection for streams, tasks, and connectors. It is not just an API server; the embedded control plane (padas-api on Axum + Rustls, serving /api/v1/*) orchestrates the same runtime topology the hot path executes. StreamRouter moves events through buffers, an optional WAL subscriber provides runtime durability, tasks are execution units on that graph, and connectors are its ingress/egress boundaries. Operators reason in API-driven orchestration terms (streams.json, tasks.json, connectors.json under [api.persistence].config_dir), while after a restart the engine reconciles that persisted registry with live runtime state.

Related: REST API Reference · Runtime configurations · Monitoring · Stream configuration · Security — runtime operational security reference · Troubleshooting & Logs · Glossary · Cores


Overview

Core is the long-running runtime engine that materializes an execution graph from registry objects:

| Subsystem | Implementation-aware role |
| --- | --- |
| StreamRouter | Per-stream routing fabric: buffer transitions, backpressure modes, and fan-out to subscribers, including the optional WAL subscriber path. |
| Runtime scheduling | Workers, virtual shards, and task/connector lifecycle map to OS-level execution; there is no out-of-process "pipeline scheduler" for product pipelines. |
| WAL subscriber behavior | WAL is a router subscriber writing segmented data under [core.wal].path; lagging consumers can hybrid-read from WAL per [core.subscriber.lag] (see Glossary → Consumer lag). |
| Runtime execution graph | Sources → stream buffers → (optional WAL) → task PDL → sink streams / connector sinks; one single-process runtime graph. |
| Task orchestration | Tasks subscribe to source streams, run PDL in processing or detection mode, manage aggregation state, and route outputs; execution-unit semantics. |
| Connector lifecycle | Start/stop/restart via /api/v1/connectors maps to connector runtime threads/async work; runtime lifecycle is tied to the same process. |
| Embedded API orchestration | Axum router under [api].prefix (default /api/v1) mutates runtime state and persists registry JSON without a sidecar control plane. |
| Runtime persistence projection | streams.json, tasks.json, connectors.json are the on-disk persistence projection of API-managed objects; startup registry reconciliation reloads them into the runtime engine. |

Why Core is not just an API server: the REST surface is the embedded control plane for runtime coordination; disabling or mis-sizing the engine does not leave a “headless API”—it leaves a broken streaming runtime.

Why pipelines are runtime compositions: a pipeline in product language is not a standalone service—it is realized as runtime objects (streams, tasks, connectors) under one scheduler inside this single-process runtime graph.

Runtime state synchronization: UI and automation drive desired state through /api/v1/*; the process applies changes, persists registry files, and reflects status on …/status routes—runtime state and config state (see Runtime persistence model) must stay aligned across rolling restarts and reload semantics.


Runtime role

In runtime topology terms, Core is the single-process runtime engine that executes workloads described by:

  • Streams (stream transport): named channels with buffers, retention, optional WAL, and HTTP produce/consume surfaces. Stream topology realization is "which routers exist and how they connect."
  • Tasks (execution units): subscriptions to source streams, PDL queries, concurrency limits, aggregation windows, and sinks (processing vs detection modes).
  • Connectors (ingress/egress boundaries): sources pump events into streams; sinks drain tasks or streams per connector class semantics (class-specific backpressure / drop policies; Glossary → Connector class).

Runtime graph realization: there is no separate "pipeline microservice". A pipeline is not a standalone service; it is realized as runtime objects that share runtime coordination, IO, and persistence ownership in one address space.

Runtime object ownership: the process owns buffers, WAL segments under [core.wal], task internal state, and connector handles—execution isolation is cooperative (caps, modes), not separate JVMs per pipeline.

Persistence recovery: on start, registry JSON plus padas.toml / padas.default.toml drive startup orchestration: restore routers, replay or resume consumers per offset and WAL policy, then serve /api/v1/*.

Shutdown guarantees: bounded by [core].shutdown_timeout_secs; for clean runtime lifecycle, drain or pause ingest before SIGTERM in production so sink delivery and WAL flush windows can complete.


Core responsibilities

| Responsibility | Where it lives | Operational notes |
| --- | --- | --- |
| Runtime coordination | Engine + [core] | Workers, shards, startup/shutdown timeouts; virtual shard count affects hashing and parallelism, so profile before raising. |
| Stream lifecycle | Engine + streams.json | Create/update/delete, pause/resume/restart; per-stream WAL and buffer overlays via API or persisted registry. |
| Stream durability behavior | [core.stream.wal], [core.wal], WAL subscriber | WAL segments + retention; sync_writes trades latency for durability; high-throughput profiles sometimes disable per-stream WAL, an explicit runtime durability trade-off (Glossary → WAL). |
| Consumer recovery | StreamRouter + offsets + [core.subscriber.lag] | Execution scheduling resumes from persisted offsets; hybrid WAL reads when lag exceeds threshold / timeout_ms. |
| Task execution | [core.task], tasks.json | Execution state management for PDL, aggregation cleanup/watermarks ([core.task.aggregation.*]); overlays in task JSON vs padas.toml must be reconciled before change windows (Glossary → Watermark, Tasks). |
| Connector execution | connectors.json + connector runtime | Start/stop/restart mirrors tasks; runtime lifecycle and backpressure are class-specific. |
| Execution state management | In-process task/agg state + registry | Restart clears hot state; recovery depends on replay behavior and window definitions. |
| Persistence ownership | [api.persistence].config_dir, core.data_dir, WAL path | Registry vs WAL vs instance UUID paths; stateful paths must stay stable across VM migration. |
| Embedded control plane | [api], Axum, /api/v1/* | API-driven orchestration; POST /api/v1/system/reload applies supported padas.toml deltas where implemented; reload vs full-restart semantics differ by key. |
| Runtime reload semantics | POST /api/v1/system/reload | Prefer validate_only in automation before applying; unsupported keys still need a process restart. |
| Metrics / internal streams | [observability], GET /api/v1/metrics | Internal/system streams (e.g. _padas_metrics) feed observability; high metrics levels add hot-path cost. |
| Runtime API protection | [api.auth], service-account JSON | Axum auth middleware when enabled; see Service-account authentication and Security. |

Configuration file

Hierarchy

Config layering is fixed: padas.default.toml (immutable vendor defaults) → operator padas.toml → environment overrides (e.g. PADAS_HOME). Runtime precedence: later layers override earlier ones for the same keys. Treat padas.default.toml as a read-only baseline and keep deployment-specific overrides in padas.toml beside it under $PADAS_HOME/etc/. Stateful paths (data_dir, the WAL path, [api.persistence].config_dir) belong in operator config so reload behavior and backups stay consistent.
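A minimal layering sketch; the worker override is illustrative and the paths mirror the examples later on this page:

# padas.default.toml (vendor baseline; treat as read-only)
[core]
workers = 0

# padas.toml (operator overrides; the later layer wins for the same key)
[core]
workers = 8                        # illustrative; profile before raising
data_dir = "/var/lib/padas/data"

[api.persistence]
config_dir = "/var/lib/padas/data/registry"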

Reload behavior vs restart: POST /api/v1/system/reload validates and applies a subset of changes; anything outside that set requires a process restart, so plan maintenance windows accordingly. Full doc: Runtime configurations.

Minimal lab profile (TLS and auth off)

Use only on trusted localhost. Auth-disabled operational risk: with [api.auth].enabled = false, the entire /api/v1/* surface is unauthenticated; never expose it on shared networks (Security).

[api]
host = "127.0.0.1"

[api.tls]
enabled = false

[api.auth]
enabled = false

Production-oriented API bind and transport

[api]
enabled = true
host = "0.0.0.0"
port = 8999
prefix = "/api/v1"

[api.tls]
enabled = true
cert_file = "/var/lib/padas/etc/api.crt"
key_file = "/var/lib/padas/etc/api.key"

[api.cors]
enabled = true
allowed_origins = "https://padas-ui.example.com"
allow_credentials = false
allowed_methods = "GET,POST,PUT,DELETE,OPTIONS"
allowed_headers = "Content-Type,Authorization"
preflight_max_age_secs = 3600

Service-account authentication (summary)

When [api.auth].enabled = true, Axum runtime auth middleware requires Authorization: Bearer <token> on every embedded route under prefix, including /api/v1/health and /metrics—full runtime API protection for the surface. Token material lives in JSON at [api.auth].service_account_token_file (default ./data/security/service-account.token relative to PADAS_HOME). Service token lifecycle (rotation, grace, lockout, X-Forwarded-For) and UI/Core trust model are documented in Security — runtime operational security reference. Rustls terminates TLS per [api.tls] in padas-api.
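A hedged [api.auth] sketch; the key name follows the default quoted above, and the absolute path is illustrative:

[api.auth]
enabled = true
service_account_token_file = "/var/lib/padas/data/security/service-account.token"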

Registry persistence (API-managed objects)

[api.persistence]
config_dir = "/var/lib/padas/data/registry"
streams_registry_file = "streams.json"
tasks_registry_file = "tasks.json"
connectors_registry_file = "connectors.json"
backup_enabled = true
backup_dir = "/var/lib/padas/data/registry/backups"
max_backups = 10

Runtime projection: config_dir holds the authoritative persistence projection of API-managed streams/tasks/connectors; this registry is what operators back up for durability. Reconciliation behavior: on startup the engine loads the JSON files and aligns runtime state with disk; runtime drift appears when manual file edits disagree with the last API write or when restores are partial.

Backup strategy: snapshot config_dir (backup_enabled rotation plus external volume backups); on restore, restart Core so registry reconciliation runs against a consistent trio of files. Persistence ownership is split: registry JSON for config state, WAL/data dirs for runtime durability and task windows; restore one without the other at your own risk.

Engine, data paths, and timeouts

[core]
workers = 0
virtual_shards = 4096
data_dir = "/var/lib/padas/data"
startup_timeout_secs = 30
shutdown_timeout_secs = 30

instance_uuid is normally auto-generated on first start; keep data_dir and WAL paths stable so offset persistence and segment files remain consistent across rolling restarts.

Streams, buffers, and WAL defaults

Per-stream buffers default under [core.stream.buffer] (max_events, mode = timeout | drop | block, timeout_ms)—they define stream backpressure on the hot path.

[core.stream.buffer]
max_events = 25000
max_bytes = 0
timeout_ms = 10
mode = "timeout"

[core.stream.wal]
enabled = false

[core.stream.wal.batch]
max_events = 100
max_timeout_ms = 100
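For ingest paths that must not lose events, mode = "block" applies backpressure to producers instead of timing out or dropping; a sketch reusing the buffer keys above (values illustrative):

[core.stream.buffer]
max_events = 25000
mode = "block"           # producers block when the buffer is full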

Global WAL storage and retention:

[core.wal]
path = "/var/lib/padas/data/wal"
retention_ms = 3600000
retention_bytes = 1073741824
max_segment_size = 104857600
max_segments = 20
sync_writes = false

[core.wal.compression]
enabled = true
algorithm = "zstd"
level = 1

WAL / stream relationship:

  • With [core.stream.wal].enabled = true, appended events are durably segmented under [core.wal].path subject to retention and sync_writes.
  • High-throughput modes may disable stream WAL entirely—explicit runtime durability trade-off vs latency (Glossary → WAL).
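A durability-leaning per-stream sketch combining the keys above (the retention value is illustrative; sync_writes = true trades latency for durability, as noted):

[core.stream.wal]
enabled = true

[core.wal]
sync_writes = true       # favor durability over latency (trade-off noted above)
retention_ms = 86400000  # 24 h, illustrative; size the disk for worst-case growth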

Subscriber lag and hybrid reads

Controls when consumers transition from StreamRouter buffers to WAL-backed reads (replay lag and consumer lag tuning; Glossary → Consumer lag):

[core.subscriber.lag]
threshold = 50
timeout_ms = 100

[core.subscriber.batch]
max_events = 500

[core.subscriber.offset.batch]
max_events = 1000

[core.subscriber]
enable_adaptive_capacity = true
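A hypothetical high-throughput tuning sketch using the same keys (thresholds are illustrative; profile before adopting):

[core.subscriber.lag]
threshold = 200          # consumers lagging past this switch to WAL-backed reads
timeout_ms = 250

[core.subscriber.batch]
max_events = 1000        # larger batches amortize per-event overhead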

Tasks and aggregation (runtime execution)

[core.task]
# Values from shipped defaults; raise workers/thread caps only after profiling.

[core.task.aggregation.cleanup]
# Window state retention — align with PDL windows and disk budget.

[core.task.aggregation.watermark]
# Idle / shutdown flush behaviour for tumbling windows (watermark / cleanup).

Task REST payloads may supply execution overlays mapping to these tables—resolve which layer wins (TOML vs task JSON) before production edits (Tasks).

Observability hooks

Prefer [observability.logging].format = "Json" for centralized logging; raising [observability.metrics_collection].level increases observability cost on the hot path. GET /api/v1/metrics exposes Prometheus text when enabled.
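A sketch of the two knobs named above; only the logging format value is taken from this page, and the metrics level is left as a placeholder:

[observability.logging]
format = "Json"          # structured logs for centralized collection

[observability.metrics_collection]
# level = ...            # valid values ship in padas.default.toml; higher levels add hot-path cost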


Streams, tasks, and connectors inside one runtime engine

Event lifecycle on the runtime execution graph:

  1. Ingestion — HTTP produce, connector sources, or internal generators append to a stream’s StreamRouter buffer (buffer transitions).
  2. WAL handoff — the optional WAL subscriber persists copies per stream/WAL policy; downstream visibility guarantees are defined by mode.
  3. Task execution — tasks subscribe to source streams; task execution ordering is defined by the engine’s subscription and shard assignment—task saturation shows as lag, drops, or error status.
  4. Sink routing — PDL outputs land on sink streams or connector sinks; sink delivery respects connector-specific backpressure.
  5. Consumer lag behavior — subscribers past [core.subscriber.lag] thresholds read from WAL segments (execution isolation is not cross-process; it is per-task/connector caps).

Runtime state transitions: streams move created → active → paused/stopped; tasks/connectors expose running, stopped, error—use …/status routes for runtime diagnostics.

Control plane: REST CRUD on /api/v1/streams, /tasks, /connectors updates live runtime state and atomically persists streams.json / tasks.json / connectors.json. Lifecycle start / stop / restart maps to in-process runtime orchestration—not external Kubernetes job semantics unless you wrap them yourself.


Runtime execution model

| Stage | Runtime behavior |
| --- | --- |
| Ingestion | Events enter StreamRouter buffers from HTTP, connectors, or internal producers. |
| Routing | Shards/virtual shards map keys to internal queues; fan-out to subscribers and the WAL path. |
| Buffering | [core.stream.buffer] modes (timeout, drop, block) define stream backpressure before drops or blocking producers. |
| WAL persistence | Optional subscriber writes batches to [core.wal].path with retention and compression. |
| Task execution | PDL runs per execution unit; aggregation windows hold runtime state until cleanup/watermark. |
| Sink delivery | Tasks and connector sinks consume from the router and/or WAL hybrid reads. |
| Offset management | Offset persistence enables consumer recovery; batching under [core.subscriber.offset.batch] tunes commit frequency. |
| Replay behavior | After restart or lag, consumers may replay from WAL per policy; coordinate with task execution windows to avoid duplicate side effects in sinks. |

Runtime persistence model

| Concern | Behavior |
| --- | --- |
| WAL vs registry persistence | WAL = runtime durability for stream event history (policy-dependent). Registry JSON = config state for the streams/tasks/connectors the API last wrote. |
| Runtime state vs config state | Runtime state includes hot buffers, in-flight connector handles, and aggregation windows; config state is the persisted registry plus padas.toml. |
| Recovery model | Startup reloads the registry and merges padas.toml layers; WAL segments inform replay semantics for lagging subscribers. |
| Replay semantics | Defined by offsets + WAL retention; if retention is shorter than the outage, you lose that replay range. |
| Stream durability | Tied to WAL enablement, sync_writes, and retention bytes/time; size for worst-case WAL growth. |
| Offset persistence | Drives consumer recovery after rolling restarts; validate that backups include offset stores if your deployment keeps them on disk under data_dir. |
| Registry restore expectations | Restoring only streams.json without matching tasks/connectors files yields a partial runtime topology; expect reload failures or manual cleanup. |

Runtime orchestration boundaries

| Boundary | Responsibility |
| --- | --- |
| UI | Operator desired state and Core pairing (account_token); does not execute PDL or hold StreamRouter. |
| Core runtime | Runtime engine: StreamRouter, WAL, tasks, connectors, offsets, execution scheduling. |
| Registry | streams.json, tasks.json, connectors.json; the persistence projection and runtime reconciliation input. |
| REST API | Embedded control plane: /api/v1/*, POST /api/v1/system/reload, auth middleware; the orchestration boundary for automation. |
| Connectors | External IO; translate network/files/syslog into stream events or drain to downstream systems. |
| Stream engine | In-process stream transport and buffer/WAL stack shared by all pipelines. |

Operational notes

| Topic | Guidance |
| --- | --- |
| Rolling restart implications | One node at a time still drops runtime state for that process; expect replay lag and brief connector stalls, and ensure upstream tolerates redelivery where at-least-once applies. |
| WAL sizing | Plan retention_bytes, max_segments, and disk IO for burst ingest; WAL growth without retention tuning fills disks (see the sizing sketch below). |
| State recovery | Validate startup_timeout_secs against large registries plus WAL replay; startup hangs often trace here. |
| Runtime reload safety | Use validate_only on POST /api/v1/system/reload in CI/CD; bad TOML can still cause reload failures depending on validation coverage. |
| Persistence corruption considerations | Truncated JSON in config_dir or corrupt WAL segments: restore from backup and expect persistence inconsistencies until a clean snapshot is mounted. |
| Deployment synchronization | padas.toml on disk must match what automation assumes; UI Core rows only store reachability + Bearer token, so deployment synchronization is still file + registry ops. |
| API orchestration behavior | High-frequency CRUD without backpressure can saturate the control-plane threadpool; pace automation. |
| Runtime drift | Manual edits to JSON while Core is running can diverge from in-memory runtime state; prefer the API or controlled restarts. |
| Observability implications | Verbose metrics/logging increase CPU; correlate with Monitoring during incidents. |
| Maintenance window considerations | Coordinate TLS rotation, service-account.token rotation, and secret.json-less Core changes with UI token refresh (Security). |
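A back-of-envelope WAL sizing sketch (the ingest rate is an assumed example, and the segment-count relation is an assumption to verify against shipped defaults): at roughly 30 MB/s sustained ingest with one hour of retention, the WAL holds about 30 MB/s x 3600 s ≈ 108 GB before compression:

[core.wal]
retention_ms = 3600000             # 1 h of replay range
retention_bytes = 108000000000     # ~108 GB: assumed 30 MB/s x 3600 s, pre-compression
max_segment_size = 1073741824      # 1 GiB segments (illustrative)
max_segments = 120                 # 120 x 1 GiB >= retention_bytes (assumed relation; verify)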

Runtime troubleshooting checkpoints

| Symptom | Where to look first |
| --- | --- |
| Startup hangs | startup_timeout_secs, registry size, WAL replay volume, TLS cert load (Rustls). |
| WAL growth | Retention, max_segments, ingest rate vs sink throughput; disk free space. |
| Stalled consumers | [core.subscriber.lag], offset commits, downstream sink health. |
| Replay lag | WAL read path vs buffer path; segment count; consumer lag behavior. |
| Reload failures | POST /api/v1/system/reload response body; unsupported keys still need a restart. |
| Stream backpressure | Buffer mode, max_events, drop metrics vs ingress EPS. |
| Task saturation | [core.task] worker/thread caps, PDL cost, aggregation windows. |
| Connector stalls | Connector status, external dependency timeouts, sink rate limits. |
| Persistence inconsistencies | Mtime skew across streams.json / tasks.json / connectors.json; partial restore. |

Deeper playbooks: Troubleshooting & Logs.


Performance Tuning

TBD