What is PADAS?
PADAS is a high-performance streaming platform for ingesting, transforming, detecting, and routing events. It's designed for security and operational data (syslog, APIs, Kafka topics, files), but the core model is general: events in → processing → events out.
Throughput depends on pipeline complexity (from simple filter-and-forward to multi-stage windowed detection) and on available hardware. Data moves through the system as a continuous stream: connectors bring data in, tasks transform events and detect patterns, connectors send results out.
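To make that loop concrete, here is a minimal, PADAS-independent sketch in Python: a stand-in source, a single filter step, and a stand-in output. All field names and the filter logic are invented for illustration; real pipelines are built from connectors, streams, and PDL tasks, not Python code.

```python
# Conceptual sketch only (not PADAS APIs): events in -> processing -> events out.
from typing import Iterable, Iterator

def ingest() -> Iterator[dict]:
    """Stand-in input connector: yields raw events as dicts."""
    yield {"src_ip": "10.0.0.5", "action": "login", "result": "failure"}
    yield {"src_ip": "10.0.0.9", "action": "login", "result": "success"}

def filter_failures(events: Iterable[dict]) -> Iterator[dict]:
    """Stand-in task: keep only failed logins and tag them."""
    for event in events:
        if event.get("result") == "failure":
            yield {**event, "tag": "auth_failure"}

def deliver(events: Iterable[dict]) -> None:
    """Stand-in output connector: print instead of forwarding."""
    for event in events:
        print(event)

deliver(filter_failures(ingest()))
```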
How to read these docs
If you're new, start with:
- This page (what PADAS is, where it fits)
- Core concepts (the vocabulary: events, streams, tasks, connectors)
- Architecture (how the pieces fit together)
What you can do with it
Ingest from many sources
- Syslog over UDP/TCP (RFC 3164/5424, or raw lines; a parse sketch follows this list)
- HTTP polling or push-style ingestion
- Kafka topics
- Files (tail / batch read)
- PADAS-to-PADAS TCP (MessagePack) for cross-node forwarding
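For a sense of what the syslog input above receives on the wire, the sketch below parses a single RFC 5424 line with a plain regular expression. It is a generic illustration, not PADAS's internal parser, and it ignores RFC 3164, structured data, and malformed lines.

```python
import re

# Minimal RFC 5424 parse sketch: priority, version, timestamp, host, app, procid, msgid.
# Structured data and the free-text message are lumped together in "rest".
SYSLOG_5424 = re.compile(
    r"^<(?P<pri>\d{1,3})>(?P<version>\d)\s+"
    r"(?P<timestamp>\S+)\s+(?P<host>\S+)\s+(?P<app>\S+)\s+"
    r"(?P<procid>\S+)\s+(?P<msgid>\S+)\s+(?P<rest>.*)$"
)

line = "<34>1 2024-05-01T12:00:00Z fw01 sshd 4123 ID47 - Failed password for root"
match = SYSLOG_5424.match(line)
if match:
    event = match.groupdict()
    pri = int(event.pop("pri"))
    event["facility"], event["severity"] = divmod(pri, 8)  # PRI = facility * 8 + severity
    print(event)
```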
Build pipelines with streams and tasks
- Create streams as durable (WAL-backed) or in-memory channels for events
- Attach tasks that run PDL queries to filter, transform, enrich, and aggregate
- Run detection tasks that match events against one or more patterns and emit alerts (see the sketch after this list)
- Send results to one or more downstream streams
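Conceptually, a detection task evaluates every event against a set of patterns and emits an alert for each match. The sketch below illustrates that idea in Python with two invented patterns; in PADAS the matching logic is written in PDL rather than Python.

```python
from typing import Iterable, Iterator

# Invented example patterns: each is a set of field/value conditions.
PATTERNS = {
    "brute_force_hint": {"action": "login", "result": "failure"},
    "admin_activity": {"user": "root"},
}

def detect(events: Iterable[dict]) -> Iterator[dict]:
    """Emit one alert per event per matching pattern."""
    for event in events:
        for name, conditions in PATTERNS.items():
            if all(event.get(field) == value for field, value in conditions.items()):
                yield {"alert": name, "event": event}

events = [
    {"action": "login", "result": "failure", "user": "alice"},
    {"action": "sudo", "user": "root"},
]
for alert in detect(events):
    print(alert)
```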
Deliver data to destinations
- Syslog, HTTP, Kafka, Splunk — forward to external systems and SIEMs
- Object storage (S3-compatible) — write Parquet or JSON Lines for downstream analytics (consumer sketch after this list)
- PADAS REST API — consume streams directly from applications and dashboards
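As an example of the analytics path, a downstream job might read JSON Lines objects that the object-storage output wrote to an S3-compatible bucket. The snippet below is a consumer-side sketch using boto3; the endpoint, bucket, key, and field names are placeholders, not PADAS defaults.

```python
import json
import boto3

# Downstream-consumer sketch: read JSON Lines from an S3-compatible bucket.
# Endpoint, bucket, key, and event fields are placeholders for illustration.
s3 = boto3.client("s3", endpoint_url="https://s3.example.internal")
obj = s3.get_object(Bucket="padas-archive", Key="events/2024/05/01/part-0000.jsonl")

for line in obj["Body"].iter_lines():
    event = json.loads(line)
    print(event.get("timestamp"), event.get("src_ip"))
```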
Enrich with context (optional)
- Attach a lookup service to resolve fields (IP → geo, asset → owner, hash → threat intel) without schema changes to the core engine
- PDL tasks apply enrichment inline during stream processing
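At its core, field enrichment is a keyed lookup applied to each event in flight. The sketch below shows the idea with a hard-coded IP-to-geo table and invented field names; in a real deployment the lookup service supplies the table and a PDL task applies it.

```python
from typing import Iterable, Iterator

# Invented lookup table: in practice this context would come from the lookup service.
GEO_BY_IP = {
    "10.0.0.5": {"country": "TR", "site": "istanbul-dc"},
    "10.0.0.9": {"country": "DE", "site": "frankfurt-dc"},
}

def enrich(events: Iterable[dict]) -> Iterator[dict]:
    """Attach geo context to events with a known src_ip; pass others through unchanged."""
    for event in events:
        geo = GEO_BY_IP.get(event.get("src_ip"), {})
        yield {**event, **{f"geo_{key}": value for key, value in geo.items()}}

for enriched in enrich([{"src_ip": "10.0.0.5", "action": "login"}]):
    print(enriched)
```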
How PADAS fits into a platform
PADAS Core is the streaming engine. It can run standalone or alongside optional context services:
| Layer | Purpose |
|---|---|
| PADAS Core | Ingestion, processing, detection, routing |
| Lookup service | Field enrichment (geo, asset, threat intel) |
| Historical search | Long-term query over stored Parquet data |
| Entity / intel / vector | Planned enrichment context services |
The core engine stays schema-agnostic. Context services add richness without requiring a centralized schema at the edge.
Mental model
Key design ideas
- Schema-on-read: ingest raw data first; normalize and enrich with PDL tasks later.
- Streams are the backbone: connectors and tasks all publish to and consume from named streams.
- Durability is configurable: per-stream WAL enables crash recovery and historical reads; in-memory streams minimize overhead for transient data.
- Operational visibility: metrics and lifecycle events are published as streams (_padas_metrics, _padas_internal) and can be exposed as Prometheus metrics.
- Normalize to any schema: PDL tasks map vendor-specific fields to whatever shape downstream systems expect, such as OCSF for security analytics, OpenTelemetry semantic conventions for observability pipelines, or your own internal model. Raw data stays intact upstream.
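Normalization is essentially a field-mapping step: rename or derive the fields a target schema expects while keeping the raw event. The sketch below maps an invented firewall event to a few OCSF-style field names; the names on both sides are illustrative only, and in PADAS the mapping would be expressed as a PDL task.

```python
# Schema-on-read sketch: keep the raw event intact and derive a normalized view.
# Field names on both sides are illustrative, not a complete OCSF mapping.
raw = {"srcip": "10.0.0.5", "dstip": "203.0.113.7", "act": "deny", "devname": "fw01"}

FIELD_MAP = {
    "srcip": "src_endpoint.ip",
    "dstip": "dst_endpoint.ip",
    "act": "action",
    "devname": "device.name",
}

normalized = {target: raw[source] for source, target in FIELD_MAP.items() if source in raw}
normalized["raw"] = raw  # the original event is preserved alongside the normalized view
print(normalized)
```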
A note on scope
Some ecosystems present streaming as a suite of loosely coupled products (brokers, schema registry, stream processors, connectors, governance layers). PADAS integrates the core streaming loop — ingest, process, detect, route — into a single runtime. Higher-level services (lookup/context, long-term storage and search) are optional additions, not prerequisites.
This means less operational surface area and faster time-to-value for the common case, with a clear extension path for larger deployments.
When to use PADAS (and when not to)
Good fits
- High-throughput log and event ingestion, from simple routing to complex windowed detection
- Real-time transformation, normalization, and field enrichment
- Detection and windowed aggregation with stateful processing (see the sketch after this list)
- Forwarding and fan-out to multiple destinations (Kafka, Splunk, S3, syslog)
- Replacing or augmenting heavyweight streaming stacks for security telemetry use cases
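Windowed aggregation groups events into time buckets and keeps state per bucket. The sketch below counts failed logins per source IP over one-minute tumbling windows; it is a conceptual illustration with invented fields, not PADAS's windowing engine.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # tumbling one-minute windows (illustrative choice)

# State keyed by (window start, src_ip); invented events carry an epoch timestamp.
counts: dict = defaultdict(int)

events = [
    {"ts": 1714561205, "src_ip": "10.0.0.5", "result": "failure"},
    {"ts": 1714561230, "src_ip": "10.0.0.5", "result": "failure"},
    {"ts": 1714561290, "src_ip": "10.0.0.5", "result": "failure"},  # falls in the next window
]

for event in events:
    if event["result"] != "failure":
        continue
    window_start = event["ts"] - (event["ts"] % WINDOW_SECONDS)
    counts[(window_start, event["src_ip"])] += 1

for (window_start, src_ip), count in sorted(counts.items()):
    print(window_start, src_ip, count)
```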
Not a fit by itself
- Long-term analytics over months of data — use the optional historical search layer (object storage + query service) alongside PADAS
- Full schema governance enforced at ingestion — PADAS intentionally accepts heterogeneous inputs; schema validation is a downstream or application concern
Next steps
- Read Core concepts to learn the vocabulary.
- Read Architecture to understand components and data flow.