What is PADAS?
PADAS is a high-performance streaming platform for ingesting, transforming, detecting, and routing events. It's designed for security and operational data (syslog, APIs, Kafka topics, files), but the core model is general: events in → processing → events out.
Throughput depends on pipeline complexity (from simple filter-and-forward to multi-stage windowed detection) and on available hardware. Data moves through the system as a continuous stream: connectors bring data in, tasks transform events and detect patterns, connectors send results out.
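To make that loop concrete, here is a minimal, PADAS-independent sketch in Python: a stand-in source, a single filter step, and a stand-in output. All field names and the filter logic are invented for illustration; real pipelines are built from connectors, streams, and PDL tasks, not Python code.

```python
# Conceptual sketch only (not PADAS APIs): events in -> processing -> events out.
from typing import Iterable, Iterator

def ingest() -> Iterator[dict]:
    """Stand-in input connector: yields raw events as dicts."""
    yield {"src_ip": "10.0.0.5", "action": "login", "result": "failure"}
    yield {"src_ip": "10.0.0.9", "action": "login", "result": "success"}

def filter_failures(events: Iterable[dict]) -> Iterator[dict]:
    """Stand-in task: keep only failed logins and tag them."""
    for event in events:
        if event.get("result") == "failure":
            yield {**event, "tag": "auth_failure"}

def deliver(events: Iterable[dict]) -> None:
    """Stand-in output connector: print instead of forwarding."""
    for event in events:
        print(event)

deliver(filter_failures(ingest()))
```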
How to read these docs
If you're new, start with:
- This page (what PADAS is, where it fits)
- Core concepts (the vocabulary: events, streams, tasks, connectors)
- Architecture (how the pieces fit together)
What you can do with it
Ingest from many sources
- Syslog over UDP/TCP (RFC 3164/5424, or raw lines; a parse sketch follows this list)
- HTTP polling or push-style ingestion
- Kafka topics
- Files (tail / batch read)
- PADAS-to-PADAS TCP (MessagePack) for cross-node forwarding
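For a sense of what the syslog input above receives on the wire, the sketch below parses a single RFC 5424 line with a plain regular expression. It is a generic illustration, not PADAS's internal parser, and it ignores RFC 3164, structured data, and malformed lines.

```python
import re

# Minimal RFC 5424 parse sketch: priority, version, timestamp, host, app, procid, msgid.
# Structured data and the free-text message are lumped together in "rest".
SYSLOG_5424 = re.compile(
    r"^<(?P<pri>\d{1,3})>(?P<version>\d)\s+"
    r"(?P<timestamp>\S+)\s+(?P<host>\S+)\s+(?P<app>\S+)\s+"
    r"(?P<procid>\S+)\s+(?P<msgid>\S+)\s+(?P<rest>.*)$"
)

line = "<34>1 2024-05-01T12:00:00Z fw01 sshd 4123 ID47 - Failed password for root"
match = SYSLOG_5424.match(line)
if match:
    event = match.groupdict()
    pri = int(event.pop("pri"))
    event["facility"], event["severity"] = divmod(pri, 8)  # PRI = facility * 8 + severity
    print(event)
```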
Build pipelines with streams and tasks
- Create streams as durable (WAL-backed) or in-memory channels for events
- Attach tasks that run PDL queries to filter, transform, enrich, and aggregate
- Run detection tasks that match events against one or more patterns and emit alerts (see the sketch after this list)
- Send results to one or more downstream streams
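Conceptually, a detection task evaluates every event against a set of patterns and emits an alert for each match. The sketch below illustrates that idea in Python with two invented patterns; in PADAS the matching logic is written in PDL rather than Python.

```python
from typing import Iterable, Iterator

# Invented example patterns: each is a set of field/value conditions.
PATTERNS = {
    "brute_force_hint": {"action": "login", "result": "failure"},
    "admin_activity": {"user": "root"},
}

def detect(events: Iterable[dict]) -> Iterator[dict]:
    """Emit one alert per event per matching pattern."""
    for event in events:
        for name, conditions in PATTERNS.items():
            if all(event.get(field) == value for field, value in conditions.items()):
                yield {"alert": name, "event": event}

events = [
    {"action": "login", "result": "failure", "user": "alice"},
    {"action": "sudo", "user": "root"},
]
for alert in detect(events):
    print(alert)
```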
Deliver data to destinations
- Syslog, HTTP, Kafka, Splunk — forward to external systems and SIEMs
- Object storage (S3-compatible) — write Parquet or JSON Lines for downstream analytics (consumer sketch after this list)
- PADAS REST API — consume streams directly from applications and dashboards
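As an example of the analytics path, a downstream job might read JSON Lines objects that the object-storage output wrote to an S3-compatible bucket. The snippet below is a consumer-side sketch using boto3; the endpoint, bucket, key, and field names are placeholders, not PADAS defaults.

```python
import json
import boto3

# Downstream-consumer sketch: read JSON Lines from an S3-compatible bucket.
# Endpoint, bucket, key, and event fields are placeholders for illustration.
s3 = boto3.client("s3", endpoint_url="https://s3.example.internal")
obj = s3.get_object(Bucket="padas-archive", Key="events/2024/05/01/part-0000.jsonl")

for line in obj["Body"].iter_lines():
    event = json.loads(line)
    print(event.get("timestamp"), event.get("src_ip"))
```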
Enrich with context (optional)
- Attach a lookup service to resolve fields (IP → geo, asset → owner, hash → threat intel) without schema changes to the core engine
- PDL tasks apply enrichment inline during stream processing
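At its core, field enrichment is a keyed lookup applied to each event in flight. The sketch below shows the idea with a hard-coded IP-to-geo table and invented field names; in a real deployment the lookup service supplies the table and a PDL task applies it.

```python
from typing import Iterable, Iterator

# Invented lookup table: in practice this context would come from the lookup service.
GEO_BY_IP = {
    "10.0.0.5": {"country": "TR", "site": "istanbul-dc"},
    "10.0.0.9": {"country": "DE", "site": "frankfurt-dc"},
}

def enrich(events: Iterable[dict]) -> Iterator[dict]:
    """Attach geo context to events with a known src_ip; pass others through unchanged."""
    for event in events:
        geo = GEO_BY_IP.get(event.get("src_ip"), {})
        yield {**event, **{f"geo_{key}": value for key, value in geo.items()}}

for enriched in enrich([{"src_ip": "10.0.0.5", "action": "login"}]):
    print(enriched)
```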
How PADAS fits into a platform
PADAS Core is the streaming engine. It can run standalone or alongside optional context services:
| Layer | Purpose |
|---|---|
| PADAS Core | Ingestion, processing, detection, routing |
| Lookup service | Field enrichment (geo, asset, threat intel) |
| Historical search | Long-term query over stored Parquet data |
| Entity / intel / vector | Planned enrichment context services |
The core engine stays schema-agnostic. Context services add richness without requiring a centralized schema at the edge.
Mental model
Key design ideas
- Schema-on-read: ingest raw data first; normalize and enrich with PDL tasks later.
- Streams are the backbone: connectors and tasks all publish to and consume from named streams.
- Durability is configurable: per-stream WAL enables crash recovery and historical reads; in-memory streams minimize overhead for transient data.
- Operational visibility: metrics and lifecycle events are published as streams (_padas_metrics, _padas_internal) and can be exposed as Prometheus metrics.
- Normalize to any schema: PDL tasks map vendor-specific fields to whatever shape downstream systems expect, such as OCSF for security analytics, OpenTelemetry semantic conventions for observability pipelines, or your own internal model. Raw data stays intact upstream.
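Normalization is essentially a field-mapping step: rename or derive the fields a target schema expects while keeping the raw event. The sketch below maps an invented firewall event to a few OCSF-style field names; the names on both sides are illustrative only, and in PADAS the mapping would be expressed as a PDL task.

```python
# Schema-on-read sketch: keep the raw event intact and derive a normalized view.
# Field names on both sides are illustrative, not a complete OCSF mapping.
raw = {"srcip": "10.0.0.5", "dstip": "203.0.113.7", "act": "deny", "devname": "fw01"}

FIELD_MAP = {
    "srcip": "src_endpoint.ip",
    "dstip": "dst_endpoint.ip",
    "act": "action",
    "devname": "device.name",
}

normalized = {target: raw[source] for source, target in FIELD_MAP.items() if source in raw}
normalized["raw"] = raw  # the original event is preserved alongside the normalized view
print(normalized)
```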
A note on scope
Some ecosystems present streaming as a suite of loosely coupled products (brokers, schema registry, stream processors, connectors, governance layers). PADAS integrates the core streaming loop — ingest, process, detect, route — into a single runtime. Higher-level services (lookup/context, long-term storage and search) are optional additions, not prerequisites.
This means less operational surface area and faster time-to-value for the common case, with a clear extension path for larger deployments.
When to use PADAS (and when not to)
Good fits
- High-throughput log and event ingestion, from simple routing to complex windowed detection
- Real-time transformation, normalization, and field enrichment
- Detection and windowed aggregation with stateful processing (see the sketch after this list)
- Forwarding and fan-out to multiple destinations (Kafka, Splunk, S3, syslog)
- Replacing or augmenting heavyweight streaming stacks for security telemetry use cases
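Windowed aggregation groups events into time buckets and keeps state per bucket. The sketch below counts failed logins per source IP over one-minute tumbling windows; it is a conceptual illustration with invented fields, not PADAS's windowing engine.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # tumbling one-minute windows (illustrative choice)

# State keyed by (window start, src_ip); invented events carry an epoch timestamp.
counts: dict = defaultdict(int)

events = [
    {"ts": 1714561205, "src_ip": "10.0.0.5", "result": "failure"},
    {"ts": 1714561230, "src_ip": "10.0.0.5", "result": "failure"},
    {"ts": 1714561290, "src_ip": "10.0.0.5", "result": "failure"},  # falls in the next window
]

for event in events:
    if event["result"] != "failure":
        continue
    window_start = event["ts"] - (event["ts"] % WINDOW_SECONDS)
    counts[(window_start, event["src_ip"])] += 1

for (window_start, src_ip), count in sorted(counts.items()):
    print(window_start, src_ip, count)
```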
Not a fit by itself
- Long-term analytics over months of data — use the optional historical search layer (object storage + query service) alongside PADAS
- Full schema governance enforced at ingestion — PADAS intentionally accepts heterogeneous inputs; schema validation is a downstream or application concern
Next steps
- Read Core concepts to learn the vocabulary.
- Read Architecture to understand components and data flow.