
Architecture

Big picture

PADAS is built around a simple loop:

  • Connectors ingest or deliver data
  • Streams store and fan out events
  • Tasks run PDL processing to transform, detect, and aggregate

Everything is managed via the REST API — either directly against a Core instance, or through PADAS UI, which provides a central control plane across multiple Core instances.

Main components

PADAS UI (control plane)

PADAS UI is the central management layer for one or more PADAS Core instances. It provides:

  • Multi-instance management — configure and control streams, tasks, and connectors across multiple Core nodes from a single interface
  • Metrics aggregation and monitoring — collect and visualize throughput, drop rates, and health signals from all managed instances
  • Central configuration — deploy and synchronize pipeline configuration across nodes
  • Pipeline authoring and testing — build and validate PDL queries and connector configurations before applying them to production

PADAS UI communicates with each Core instance via the REST API (/api/v1); see the REST API Reference for endpoint details.

PADAS Core (processing engine)

PADAS Core is the stream processing runtime. Each instance handles:

  • Stream lifecycle, buffering, and routing
  • Optional per-stream durability via write-ahead log (WAL)
  • Task execution (PDL processing and detection)
  • Connector runtime (sources and sinks)
  • Embedded REST API for local control and data access

PADAS Core is a single binary with no external broker, schema registry, or connector runtime dependencies. Multiple Core instances can be deployed independently and managed centrally through PADAS UI.

Component overview

Data plane (within each Core instance)

Source connectors ingest events into streams; tasks transform, detect, and aggregate; sink connectors deliver results to external systems.

REST API (Core)

Each Core instance exposes a REST API that serves as both the configuration interface and the event data interface:

  • Manage streams, tasks, and connectors (lifecycle CRUD)
  • Produce events to a stream
  • Consume events from a stream (stateless; client-managed offsets)
  • Run ad-hoc PDL queries against a stream
  • Expose Prometheus-compatible metrics for scraping

PADAS UI uses this API to manage and monitor Core instances centrally.
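
For example, producing to and consuming from a stream over this API might look like the following Python sketch. The host, port, endpoint paths, and payload shapes here are illustrative assumptions, not documented values:

    # Hypothetical sketch: produce and consume over the Core REST API.
    # Host, port, paths, and payload shapes are assumptions for illustration.
    import requests

    BASE = "http://localhost:8080/api/v1"  # assumed Core address

    # Produce a batch of JSON events to a stream (path assumed).
    events = [{"src_ip": "10.0.0.5", "action": "login"}]
    requests.post(f"{BASE}/streams/auth_events/events",
                  json=events).raise_for_status()

    # Consumption is stateless: the client tracks its own offset.
    offset = 0
    resp = requests.get(f"{BASE}/streams/auth_events/events",
                        params={"offset": offset, "limit": 100})
    batch = resp.json()
    offset += len(batch)  # client-managed offset, per the list above

Because offsets are client-managed, independent consumers can read the same stream at different positions without coordinating.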

Connectors (I/O boundary)

Connectors are the integration points to external systems. Each connector is configured with a role (source or sink), a target stream, and protocol-specific settings.

Source connectors (ingest):

  Connector    Protocol / Format
  Syslog       UDP/TCP, RFC 3164/5424, raw lines
  HTTP         REST polling or push, JSON
  Kafka        Consumer groups, SASL/SSL
  File         File tail or batch read
  PADAS TCP    MessagePack, cross-node forwarding

Sink connectors (deliver):

  Connector         Protocol / Format
  Syslog            UDP/TCP syslog forward
  HTTP              REST POST, JSON
  Kafka             Producer, SASL/SSL
  Splunk            Splunk HEC or API
  Object Storage    S3-compatible, Parquet or JSON Lines
  PADAS TCP         MessagePack, cross-node forwarding
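
To make the configuration model concrete, here is a minimal sketch of registering the Syslog source from the table above via the REST API. The endpoint path and config field names are assumptions, not documented values:

    # Hypothetical sketch: registering a syslog source connector.
    # The endpoint path and all config field names are assumptions.
    import requests

    connector = {
        "name": "edge-syslog",
        "role": "source",                # source or sink
        "stream": "raw_syslog",          # target stream for ingested events
        "type": "syslog",
        "config": {"protocol": "udp", "port": 5514},  # protocol-specific settings
    }
    requests.post("http://localhost:8080/api/v1/connectors",
                  json=connector).raise_for_status()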

Data flow

Ingestion path (source → stream)

  1. A source connector receives data from an external system.
  2. It converts the received data into PADAS events (JSON or text); one possible event shape is sketched after this list.
  3. It publishes events to a target stream.
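
For instance, a syslog source connector might keep the raw line as a text payload in an envelope like this. The envelope field names are assumptions for illustration; the actual event shape may differ:

    # Illustrative only: one possible shape for a converted PADAS event.
    # Field names are assumptions; the real envelope may differ.
    raw_line = "<34>Oct 11 22:14:15 fw01 sshd[4721]: Failed password for root"

    event = {
        "payload": raw_line,           # raw text kept as-is (schema-on-read)
        "source": "edge-syslog",       # originating connector
        "ingest_time": "2025-10-11T22:14:15Z",
    }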

Processing path (stream → task → stream)

  1. A task subscribes to a source stream.
  2. It runs a PDL pipeline (filter → transform → enrich → aggregate).
  3. It publishes results to one or more output streams.

Processing tasks and detection tasks can run concurrently on the same source stream.
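
Creating such a task over the REST API might look like the sketch below. The endpoint path and field names are assumptions, and the PDL body is a placeholder rather than real PDL syntax:

    # Hypothetical sketch: creating a processing task via the REST API.
    # Path and field names are assumptions; the PDL string is a placeholder.
    import requests

    task = {
        "name": "failed-logins",
        "source_stream": "raw_syslog",      # stream the task subscribes to
        "output_streams": ["detections"],   # one or more output streams
        "pdl": "<filter | transform | enrich | aggregate pipeline in PDL>",
    }
    requests.post("http://localhost:8080/api/v1/tasks",
                  json=task).raise_for_status()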

Delivery path (stream → sink)

  1. A sink connector subscribes to a stream.
  2. It batches and forwards events to an external system.

Multiple sink connectors can read from the same stream independently.
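
A sink is registered the same way as a source, with its role set to sink. The sketch below shows a hypothetical Splunk HEC sink with a batching knob; all field names are assumptions:

    # Hypothetical sketch: a Splunk HEC sink reading from the detections stream.
    # The endpoint path and config field names are assumptions.
    import requests

    sink = {
        "name": "splunk-out",
        "role": "sink",
        "stream": "detections",              # stream this sink subscribes to
        "type": "splunk",
        "config": {
            "hec_url": "https://splunk.example.com:8088",
            "batch_size": 500,               # events per batch (assumed knob)
        },
    }
    requests.post("http://localhost:8080/api/v1/connectors",
                  json=sink).raise_for_status()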

Durability and replay (WAL)

Streams can optionally persist data to a write-ahead log (WAL):

  • Enables crash recovery: a connector or task can resume from its last committed offset after a restart
  • Enables historical reads: clients or tasks can seek to any offset (or earliest) and replay past events
  • WAL segments are sealed, indexed with a sparse offset index, and subject to configurable retention (size or time)
  • Adds disk I/O overhead, tunable via batch sizes, compression, and retention settings

In-memory streams (no WAL) are suitable for transient fan-out where durability is not required.
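
Replay over the REST API could look like the sketch below, assuming the consume endpoint accepts an "earliest" seek and returns each event's offset (both assumptions):

    # Hypothetical sketch: replaying a WAL-backed stream from the earliest
    # retained offset. Paths, parameters, and the per-event "offset" field
    # are assumptions for illustration.
    import requests

    BASE = "http://localhost:8080/api/v1"
    offset = "earliest"                      # seek to the oldest retained event
    while True:
        batch = requests.get(f"{BASE}/streams/raw_syslog/events",
                             params={"offset": offset, "limit": 1000}).json()
        if not batch:
            break
        for event in batch:
            pass                             # process historical events here
        offset = batch[-1]["offset"] + 1     # advance the client-managed offset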

Ordering, partitioning, and backpressure

Ordering

PADAS preserves event order within a stream's routing path. For workloads that need strict ordering (for example, windowed aggregation), stream buffers, connector batching, and an optional "preserve order" flag let you trade some throughput headroom for ordering guarantees.

Backpressure

Streams use bounded buffers. When a buffer fills, PADAS applies one of three strategies (configurable per stream):

  Mode       Behavior
  Block      Producer waits until space is available
  Drop       Events are dropped immediately and counted
  Timeout    Producer waits up to a deadline, then drops

Drop and timeout modes are preferred for high-throughput ingestion where backpressure should not stall upstream connectors.
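
As a hypothetical illustration, a stream's buffer capacity, backpressure mode, and the "preserve order" flag from the Ordering section might be set at creation time like this (all field names are assumptions):

    # Hypothetical sketch: creating a stream with an explicit buffer capacity,
    # backpressure mode, and ordering flag. All field names are assumptions.
    import requests

    stream = {
        "name": "raw_syslog",
        "buffer_capacity": 65536,      # bounded buffer size, in events
        "backpressure": "timeout",     # block | drop | timeout, per the table above
        "timeout_ms": 250,             # deadline before dropping, in timeout mode
        "preserve_order": True,        # trade throughput headroom for ordering
        "wal": {"enabled": True, "retention": "24h"},   # optional durability
    }
    requests.post("http://localhost:8080/api/v1/streams",
                  json=stream).raise_for_status()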

Performance characteristics

PADAS is designed for sustained high-throughput event processing. Actual throughput depends on pipeline complexity: simple filtering and routing runs significantly faster than multi-stage windowed aggregation over high-cardinality fields. WAL persistence, connector I/O, and enrichment lookups each add overhead that varies with configuration and hardware.

Key throughput levers:

  • Stream buffer capacity
  • Connector worker counts and batch sizes
  • WAL compression and batch flushing
  • PDL query complexity and enrichment calls

See Configuration & Runtime Engine — Performance Tuning for workload-specific configuration guidance; tuning matrices live in Runtime configurations, and runtime terms are defined in the Glossary.
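
As a rough sketch of how several levers might be combined (every field name below is an assumption; consult the tuning guidance above for the real knobs):

    # Hypothetical sketch: adjusting connector and stream tuning levers.
    # Endpoints and field names are assumptions for illustration.
    import requests

    BASE = "http://localhost:8080/api/v1"

    # Connector-side levers: worker count and batch size.
    requests.put(f"{BASE}/connectors/edge-syslog", json={
        "workers": 4,
        "batch_size": 2000,
    }).raise_for_status()

    # Stream-side levers: buffer capacity, WAL compression, flush batching.
    requests.put(f"{BASE}/streams/raw_syslog", json={
        "buffer_capacity": 131072,
        "wal": {"compression": "zstd", "flush_batch": 4096},
    }).raise_for_status()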

Observability

Metrics stream and Prometheus

PADAS publishes per-component metrics to _padas_metrics at a configurable interval. The REST API exposes a Prometheus-compatible /metrics endpoint for scraping. Typical signals to monitor:

  • padas_stream_events_* — throughput and drop rates per stream
  • padas_task_events_* — processing rates per task
  • padas_connector_events_* — ingest/delivery rates per connector
  • padas_system_* — CPU, memory, and WAL I/O
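
The /metrics endpoint can also be inspected directly; a minimal Python sketch follows (the host and port are assumed, and the endpoint may sit under a different path):

    # Sketch: scraping the Prometheus-compatible metrics endpoint and picking
    # out per-stream throughput counters. Host, port, and path are assumed.
    import requests

    text = requests.get("http://localhost:8080/metrics").text
    for line in text.splitlines():
        if line.startswith("padas_stream_events_"):
            print(line)   # per-stream throughput and drop counters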

Internal events stream

_padas_internal receives structured lifecycle and runtime events (connector starts/stops, task errors, WAL segment rotations). These can be consumed like any stream — useful for automating alerts and post-mortems.
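
A minimal tailing loop might look like the sketch below; the endpoint paths, offset semantics, and the event schema (a "type" field with a "task_error" value) are all assumptions for illustration:

    # Hypothetical sketch: tailing _padas_internal for task errors.
    # Paths, offset semantics, and the event schema are assumptions.
    import time

    import requests

    BASE = "http://localhost:8080/api/v1"
    offset = "latest"                        # only new lifecycle events
    while True:
        batch = requests.get(f"{BASE}/streams/_padas_internal/events",
                             params={"offset": offset, "limit": 100}).json()
        for event in batch:
            if event.get("type") == "task_error":
                print("ALERT:", event)       # hand off to your alerting system
        if batch:
            offset = batch[-1]["offset"] + 1
        time.sleep(5)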

Schema strategy: schema-on-read

PADAS intentionally ingests data in its original shape and defers normalization:

  • Raw syslog lines stay as text payloads until a task parses them
  • Structured connector modes decode fields into a reserved ingestion namespace
  • PDL tasks can then normalize vendor-specific fields into whatever shape downstream systems expect — common targets include OCSF for security analytics, OpenTelemetry semantic conventions for observability, or internal data models
  • Enrichment (geo, asset ownership, threat intel) is applied by PDL tasks that call a lookup service

This keeps the ingest edge simple and robust: you don't need a schema contract with every data source upfront.
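
To make the schema-on-read idea concrete, the sketch below shows a raw line surviving ingest and being mapped later to an OCSF-style shape. In PADAS the mapping would be a PDL task; Python and all field names here are used purely for illustration:

    # Illustrative only: schema-on-read keeps the raw payload at ingest and
    # defers normalization. Field names are assumptions, not a real model.
    raw = {"payload": "<34>Oct 11 22:14:15 fw01 sshd[4721]: Failed password for root"}

    normalized = {
        "class_name": "Authentication",   # e.g. an OCSF-style event class
        "status": "Failure",
        "user": "root",
        "device": "fw01",
        "raw": raw["payload"],            # original line retained for audit
    }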

How PADAS extends beyond core

PADAS Core is the streaming engine. Additional platform components are layered alongside it — the core engine does not depend on any of them:

  Component                  Role
  Lookup service             Field enrichment via REST/gRPC; local RocksDB cache for low-latency resolution
  Historical search          Query past events from object storage (Parquet/S3)
  Entity / intel / vector    Planned: entity context, threat intelligence, vector similarity

Where to go next

  • Read Core concepts for definitions.
  • See the API and connector references for concrete configuration and examples.