
Architecture

Big picture

PADAS is built around a simple loop:

  • Connectors ingest or deliver data
  • Streams store and fan out events
  • Tasks run PDL processing to transform, detect, and aggregate

Everything is managed via the REST API — either directly against a Core instance, or through PADAS UI, which provides a central control plane across multiple Core instances.

Main components

PADAS UI (control plane)

PADAS UI is the central management layer for one or more PADAS Core instances. It provides:

  • Multi-instance management — configure and control streams, tasks, and connectors across multiple Core nodes from a single interface
  • Metrics aggregation and monitoring — collect and visualize throughput, drop rates, and health signals from all managed instances
  • Central configuration — deploy and synchronize pipeline configuration across nodes
  • Pipeline authoring and testing — build and validate PDL queries and connector configurations before applying them to production

PADAS UI communicates with each Core instance via the REST API (/api/v1); see the REST API Reference for endpoint details.

PADAS Core (processing engine)

PADAS Core is the stream processing runtime. Each instance handles:

  • Stream lifecycle, buffering, and routing
  • Optional per-stream durability via write-ahead log (WAL)
  • Task execution (PDL processing and detection)
  • Connector runtime (sources and sinks)
  • Embedded REST API for local control and data access

PADAS Core is a single binary with no external broker, schema registry, or connector runtime dependencies. Multiple Core instances can be deployed independently and managed centrally through PADAS UI.

Component overview

Data plane (within each Core instance)

Source connectors ingest events into streams; tasks transform, detect, and aggregate; sink connectors deliver results to external systems.

REST API (Core)

Each Core instance exposes a REST API that serves as both the configuration interface and the event data interface:

  • Manage streams, tasks, and connectors (lifecycle CRUD)
  • Produce events to a stream
  • Consume events from a stream (stateless; client-managed offsets)
  • Run ad-hoc PDL queries against a stream
  • Expose Prometheus-compatible metrics for scraping

PADAS UI uses this API to manage and monitor Core instances centrally.
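
For example, producing to and consuming from a stream over this API might look like the following Python sketch. The host, port, endpoint paths, and payload shapes here are illustrative assumptions, not documented values:

    # Hypothetical sketch: produce and consume over the Core REST API.
    # Host, port, paths, and payload shapes are assumptions for illustration.
    import requests

    BASE = "http://localhost:8080/api/v1"  # assumed Core address

    # Produce a batch of JSON events to a stream (path assumed).
    events = [{"src_ip": "10.0.0.5", "action": "login"}]
    requests.post(f"{BASE}/streams/auth_events/events",
                  json=events).raise_for_status()

    # Consumption is stateless: the client tracks its own offset.
    offset = 0
    resp = requests.get(f"{BASE}/streams/auth_events/events",
                        params={"offset": offset, "limit": 100})
    batch = resp.json()
    offset += len(batch)  # client-managed offset, per the list above

Because offsets are client-managed, independent consumers can read the same stream at different positions without coordinating.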

Connectors (I/O boundary)

Connectors are the integration points to external systems. Each connector is configured with a role (source or sink), a target stream, and protocol-specific settings.

Source connectors (ingest):

  Connector    Protocol / Format
  Syslog       UDP/TCP, RFC 3164/5424, raw lines
  HTTP         REST polling or push, JSON
  Kafka        Consumer groups, SASL/SSL
  File         File tail or batch read
  PADAS TCP    MessagePack, cross-node forwarding

Sink connectors (deliver):

  Connector         Protocol / Format
  Syslog            UDP/TCP syslog forward
  HTTP              REST POST, JSON
  Kafka             Producer, SASL/SSL
  Splunk            Splunk HEC or API
  Object Storage    S3-compatible, Parquet or JSON Lines
  PADAS TCP         MessagePack, cross-node forwarding
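
To make the configuration model concrete, here is a minimal sketch of registering the Syslog source from the table above via the REST API. The endpoint path and config field names are assumptions, not documented values:

    # Hypothetical sketch: registering a syslog source connector.
    # The endpoint path and all config field names are assumptions.
    import requests

    connector = {
        "name": "edge-syslog",
        "role": "source",                # source or sink
        "stream": "raw_syslog",          # target stream for ingested events
        "type": "syslog",
        "config": {"protocol": "udp", "port": 5514},  # protocol-specific settings
    }
    requests.post("http://localhost:8080/api/v1/connectors",
                  json=connector).raise_for_status()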

Data flow

Ingestion path (source → stream)

  1. A source connector receives data from an external system.
  2. It converts the received data into PADAS events (JSON or text); one possible event shape is sketched after this list.
  3. It publishes events to a target stream.
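
For instance, a syslog source connector might keep the raw line as a text payload in an envelope like this. The envelope field names are assumptions for illustration; the actual event shape may differ:

    # Illustrative only: one possible shape for a converted PADAS event.
    # Field names are assumptions; the real envelope may differ.
    raw_line = "<34>Oct 11 22:14:15 fw01 sshd[4721]: Failed password for root"

    event = {
        "payload": raw_line,           # raw text kept as-is (schema-on-read)
        "source": "edge-syslog",       # originating connector
        "ingest_time": "2025-10-11T22:14:15Z",
    }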

Processing path (stream → task → stream)

  1. A task subscribes to a source stream.
  2. It runs a PDL pipeline (filter → transform → enrich → aggregate).
  3. It publishes results to one or more output streams.

Processing tasks and detection tasks can run concurrently on the same source stream.
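
Creating such a task over the REST API might look like the sketch below. The endpoint path and field names are assumptions, and the PDL body is a placeholder rather than real PDL syntax:

    # Hypothetical sketch: creating a processing task via the REST API.
    # Path and field names are assumptions; the PDL string is a placeholder.
    import requests

    task = {
        "name": "failed-logins",
        "source_stream": "raw_syslog",      # stream the task subscribes to
        "output_streams": ["detections"],   # one or more output streams
        "pdl": "<filter | transform | enrich | aggregate pipeline in PDL>",
    }
    requests.post("http://localhost:8080/api/v1/tasks",
                  json=task).raise_for_status()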

Delivery path (stream → sink)

  1. A sink connector subscribes to a stream.
  2. It batches and forwards events to an external system.

Multiple sink connectors can read from the same stream independently.
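
A sink is registered the same way as a source, with its role set to sink. The sketch below shows a hypothetical Splunk HEC sink with a batching knob; all field names are assumptions:

    # Hypothetical sketch: a Splunk HEC sink reading from the detections stream.
    # The endpoint path and config field names are assumptions.
    import requests

    sink = {
        "name": "splunk-out",
        "role": "sink",
        "stream": "detections",              # stream this sink subscribes to
        "type": "splunk",
        "config": {
            "hec_url": "https://splunk.example.com:8088",
            "batch_size": 500,               # events per batch (assumed knob)
        },
    }
    requests.post("http://localhost:8080/api/v1/connectors",
                  json=sink).raise_for_status()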

Durability and replay (WAL)

Streams can optionally persist data to a write-ahead log (WAL):

  • Enables crash recovery: a connector or task can resume from its last committed offset after a restart
  • Enables historical reads: clients or tasks can seek to any offset (or earliest) and replay past events
  • WAL segments are sealed, indexed with a sparse offset index, and subject to configurable retention (size or time)
  • Adds disk I/O overhead, tunable via batch sizes, compression, and retention settings

In-memory streams (no WAL) are suitable for transient fan-out where durability is not required.
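
Replay over the REST API could look like the sketch below, assuming the consume endpoint accepts an "earliest" seek and returns each event's offset (both assumptions):

    # Hypothetical sketch: replaying a WAL-backed stream from the earliest
    # retained offset. Paths, parameters, and the per-event "offset" field
    # are assumptions for illustration.
    import requests

    BASE = "http://localhost:8080/api/v1"
    offset = "earliest"                      # seek to the oldest retained event
    while True:
        batch = requests.get(f"{BASE}/streams/raw_syslog/events",
                             params={"offset": offset, "limit": 1000}).json()
        if not batch:
            break
        for event in batch:
            pass                             # process historical events here
        offset = batch[-1]["offset"] + 1     # advance the client-managed offset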

Ordering, partitioning, and backpressure

Ordering

PADAS preserves event order within a stream's routing path. For workloads that need strict ordering (for example, windowed aggregation), stream buffers, connector batching, and an optional "preserve order" flag let you trade some throughput headroom for ordering guarantees.

Backpressure

Streams use bounded buffers. When a buffer fills, PADAS applies one of three strategies (configurable per stream):

  Mode       Behavior
  Block      Producer waits until space is available
  Drop       Events are dropped immediately and counted
  Timeout    Producer waits up to a deadline, then drops

Drop and timeout modes are preferred for high-throughput ingestion where backpressure should not stall upstream connectors.
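
As a hypothetical illustration, a stream's buffer capacity, backpressure mode, and the "preserve order" flag from the Ordering section might be set at creation time like this (all field names are assumptions):

    # Hypothetical sketch: creating a stream with an explicit buffer capacity,
    # backpressure mode, and ordering flag. All field names are assumptions.
    import requests

    stream = {
        "name": "raw_syslog",
        "buffer_capacity": 65536,      # bounded buffer size, in events
        "backpressure": "timeout",     # block | drop | timeout, per the table above
        "timeout_ms": 250,             # deadline before dropping, in timeout mode
        "preserve_order": True,        # trade throughput headroom for ordering
        "wal": {"enabled": True, "retention": "24h"},   # optional durability
    }
    requests.post("http://localhost:8080/api/v1/streams",
                  json=stream).raise_for_status()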

Performance characteristics

PADAS is designed for sustained high-throughput event processing. Actual throughput depends on pipeline complexity: simple filtering and routing runs significantly faster than multi-stage windowed aggregation over high-cardinality fields. WAL persistence, connector I/O, and enrichment lookups each add overhead that varies with configuration and hardware.

Key throughput levers:

  • Stream buffer capacity
  • Connector worker counts and batch sizes
  • WAL compression and batch flushing
  • PDL query complexity and enrichment calls

See Configuration & Runtime Engine — Performance Tuning for workload-specific configuration guidance; tuning matrices live in Runtime configurations, and runtime terms are defined in the Glossary.
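
As a rough sketch of how several levers might be combined (every field name below is an assumption; consult the tuning guidance above for the real knobs):

    # Hypothetical sketch: adjusting connector and stream tuning levers.
    # Endpoints and field names are assumptions for illustration.
    import requests

    BASE = "http://localhost:8080/api/v1"

    # Connector-side levers: worker count and batch size.
    requests.put(f"{BASE}/connectors/edge-syslog", json={
        "workers": 4,
        "batch_size": 2000,
    }).raise_for_status()

    # Stream-side levers: buffer capacity, WAL compression, flush batching.
    requests.put(f"{BASE}/streams/raw_syslog", json={
        "buffer_capacity": 131072,
        "wal": {"compression": "zstd", "flush_batch": 4096},
    }).raise_for_status()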

Observability

Metrics stream and Prometheus

PADAS publishes per-component metrics to _padas_metrics at a configurable interval. The REST API exposes a Prometheus-compatible /metrics endpoint for scraping. Typical signals to monitor:

  • padas_stream_events_* — throughput and drop rates per stream
  • padas_task_events_* — processing rates per task
  • padas_connector_events_* — ingest/delivery rates per connector
  • padas_system_* — CPU, memory, and WAL I/O
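
The /metrics endpoint can also be inspected directly; a minimal Python sketch follows (the host and port are assumed, and the endpoint may sit under a different path):

    # Sketch: scraping the Prometheus-compatible metrics endpoint and picking
    # out per-stream throughput counters. Host, port, and path are assumed.
    import requests

    text = requests.get("http://localhost:8080/metrics").text
    for line in text.splitlines():
        if line.startswith("padas_stream_events_"):
            print(line)   # per-stream throughput and drop counters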

Internal events stream

_padas_internal receives structured lifecycle and runtime events (connector starts/stops, task errors, WAL segment rotations). These can be consumed like any stream — useful for automating alerts and post-mortems.
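
A minimal tailing loop might look like the sketch below; the endpoint paths, offset semantics, and the event schema (a "type" field with a "task_error" value) are all assumptions for illustration:

    # Hypothetical sketch: tailing _padas_internal for task errors.
    # Paths, offset semantics, and the event schema are assumptions.
    import time

    import requests

    BASE = "http://localhost:8080/api/v1"
    offset = "latest"                        # only new lifecycle events
    while True:
        batch = requests.get(f"{BASE}/streams/_padas_internal/events",
                             params={"offset": offset, "limit": 100}).json()
        for event in batch:
            if event.get("type") == "task_error":
                print("ALERT:", event)       # hand off to your alerting system
        if batch:
            offset = batch[-1]["offset"] + 1
        time.sleep(5)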

Schema strategy: schema-on-read

PADAS intentionally ingests data in its original shape and defers normalization:

  • Raw syslog lines stay as text payloads until a task parses them
  • Structured connector modes decode fields into a reserved ingestion namespace
  • PDL tasks can then normalize vendor-specific fields into whatever shape downstream systems expect — common targets include OCSF for security analytics, OpenTelemetry semantic conventions for observability, or internal data models
  • Enrichment (geo, asset ownership, threat intel) is applied by PDL tasks that call a lookup service

This keeps the ingest edge simple and robust: you don't need a schema contract with every data source upfront.
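
To make the schema-on-read idea concrete, the sketch below shows a raw line surviving ingest and being mapped later to an OCSF-style shape. In PADAS the mapping would be a PDL task; Python and all field names here are used purely for illustration:

    # Illustrative only: schema-on-read keeps the raw payload at ingest and
    # defers normalization. Field names are assumptions, not a real model.
    raw = {"payload": "<34>Oct 11 22:14:15 fw01 sshd[4721]: Failed password for root"}

    normalized = {
        "class_name": "Authentication",   # e.g. an OCSF-style event class
        "status": "Failure",
        "user": "root",
        "device": "fw01",
        "raw": raw["payload"],            # original line retained for audit
    }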

How PADAS extends beyond core

PADAS Core is the streaming engine. Additional platform components are layered alongside it — the core engine does not depend on any of them:

  Component                  Role
  Lookup service             Field enrichment via REST/gRPC; local RocksDB cache for low-latency resolution
  Historical search          Query past events from object storage (Parquet/S3)
  Entity / intel / vector    Planned: entity context, threat intelligence, vector similarity

Where to go next

  • Read Core concepts for definitions.
  • See the API and connector references for concrete configuration and examples.