Skip to main content
Version: 2.0.0 (Latest)

Reference

Padas Domain Language (PDL) defines stream-processing expressions over JSON event records: boolean queries, parse stages, transformations (eval, rename, fields, output), partitioning, and windowed aggregations. Stages in a pipeline string are separated by | and execute sequentially in source order; each stage consumes the output projection of the previous stage and emits the next projection downstream.

Per-event stages (query, parse, eval, rename, fields, output, partition_by) evaluate against the current record (and, for aggregations, advance window state as defined by the runtime). Stateful windowed aggregations maintain buffers and partial reducers until a window closes or emits per every_event and windowing rules; their output shape differs from single-event transforms (see Aggregations).

Condensed syntax summary: Quick Reference. Worked examples: Comprehensive examples.

Table of Contents

  1. Execution model
  2. Command categories
  3. Query operations
  4. Transformation commands
  5. Parse commands
  6. Partitioning
  7. Aggregations
  8. Data types
  9. Field access
  10. Pipeline composition
  11. Examples
  12. Runtime considerations
  13. Errors

Execution model

Each input record moves left-to-right through pipeline stages; coloring matches the architecture data-plane legend (streams vs task-style processing).

ConcernSemantics
OrderingPipeline tokens apply left-to-right; the event object (projection) is threaded stage by stage.
FilteringA boolean query stage retains or discards the entire event for that branch when the expression is true or false.
Mutationeval, rename, fields, output, and parse_* materialize or remove fields on the in-flight record.
Partitioningpartition_by derives a routing key used for stream-scoped partitioning and worker affinity.
AggregationWindowed reducers consume a stream of timestamps and values, maintain state until emission, and emit aggregate rows as flat JSON per Output shape (Glossary → Aggregation).
Single evaluationFor processing tasks, each pipeline string is evaluated once per input event on the hot path; aggregation state must not advance twice for the same logical event (Glossary → Processing task).

Command categories

FamilyRoleRepresentative forms
QueryBoolean filter on the current recordfield = "x", NOT (a AND b)
ParseString field → structured fieldsparse_json, parse_csv, parse_regex, …
TransformCompute, rename, project, scalar outputeval, rename, fields, output
PartitionKey extraction for routingpartition_by f1, f2
AggregateWindowed, possibly grouped, reducerssum(x) AS s timespan=5m group_by k

Detailed syntax, parameters, and edge cases follow in the sections below.

Query operations

A query is a boolean expression over the event JSON. If it evaluates to true, the event is retained for subsequent stages (unless a later stage drops it); if false, it is discarded for that branch.

Comparison operators

OperatorSemanticsExample
=Equality; string RHS may include a single * wildcard (see Wildcards)status = "active"
!=Inequality; wildcard strings supportedtier != "free"
> < >= <=Ordered comparison on numeric or comparable scalarsscore >= 80
~=Regex match on stringpath ~= "^/api/v[0-9]+/"
?=String: RHS substring occurs in field value. Array: field array contains the scalar RHSmsg ?= "timeout", tags ?= "prod"
INField value equals any element of RHS array literal (RHS elements must be one uniform type)action IN ["login", "logout"]

?= versus IN

FormLHSRHSTrue when
?=String or arraySingle scalar / literalString: substring match. Array: membership of that value.
INAny comparable fieldArray literalField value equals one of the listed values.

Logical operators and precedence

OperatorRole
NOTUnary negation of the following comparison or parenthesized subquery.
ANDBinary conjunction.
ORBinary disjunction.

Precedence: NOT binds tightest (to its operand). AND binds tighter than OR. Thus a AND b OR c groups as (a AND b) OR c. Parentheses override defaults; mixed AND/OR without parentheses is discouraged in production definitions.

Evaluation: Parenthesized subexpressions evaluate as a unit. Function predicates (isnull, regex, …) evaluate to booleans like comparisons.

Wildcards

Wildcards apply to string comparisons with = and != only; other operators with wildcard patterns are rejected (Wildcard patterns only support = and != operators).

PatternMeaningCost note
* aloneMatches all events (true()); highest volumeAvoid on hot paths
field = "*"Field exists and is non-nullCheaper than pattern translation
field = "pre*"Prefix matchPreferred among patterns
field = "*suf"Suffix matchHigher scan cost than prefix
field = "*mid*"Substring / embedded wildcardsHighest relative cost

Patterns are translated for matching (implementation may cache compiled forms). Prefer explicit field = "*" for existence checks instead of bare *.

Regex (~=)

The RHS is a regular expression applied to the string field. Unbounded repetition and deep alternation increase backtracking risk and CPU cost; anchor patterns where possible.

Query functions

FunctionSemantics
regex(field, pattern)Regex match
cidrmatch(field, cidr)CIDR membership
isnull / isempty / isnumber / isstring / isarrayType / presence predicates
true() / false()Constant booleans

Query examples

user.role = "admin"
event.data.user.id > 1000
categories IN ["security", "monitoring"]
email ~= "^admin@.*\\.com$"
(status = "active" AND priority >= 5) OR (type = "emergency")
(user.role = "admin" OR user.role = "superuser") AND NOT (status = "disabled")
isnull(field1) AND field2 > 0

Transformation commands

eval

Purpose: Evaluate expressions and assign one or more fields on the current event.

Syntax:

eval field = expression
eval f1 = e1, f2 = e2, ...

Runtime behavior: Assignments in one eval run in source order; later assignments may read fields produced earlier in the same statement. Expressions may invoke functions (math, string, coercion, conditional). Type mismatches or invalid arity surface as runtime or validation errors per engine rules.

Arithmetic operators

OperatorDescriptionExample
+Additiontotal = price + tax
-Subtractionnet = gross - fee
*Multiplicationtotal = price * quantity
/Divisionaverage = sum / count

Conditionals and coercion

eval status = if(score > 80, "pass", "fail")
eval grade = case(score >= 90, "A", score >= 80, "B", "C")
eval display_name = coalesce(nickname, first_name, "Anonymous")
eval str_val = to_string(id)
eval num_val = to_number(price_string)

Supported eval functions (selected)

CategoryFunctions
Conditionalif, case, coalesce
Coercionto_number, to_string, to_boolean
Stringto_upper, to_lower, trim, substring, replace, split, join
Date/timenow, format_date, parse_date
Mathabs, round, floor, ceil, sqrt, pow, log, log10
Hash / encodingmd5, sha1, sha256, base64_encode, base64_decode
Arrayslength, index, slice
Utilityrandom, uuid, url_encode, url_decode

Exhaustive signatures and edge cases: Eval functions.

rename

Purpose: Map existing field paths to new names without transforming values.

Syntax:

rename old AS new
rename a AS x, b AS y

Runtime behavior: Path resolution uses the same field-access rules as queries and eval. Invalid paths or collisions follow engine validation.

rename user_id AS id
rename user.profile.first_name AS first_name, user.profile.last_name AS last_name
rename data[0] AS first_item

fields

Purpose: Project the event to a whitelist of fields, or remove listed fields and retain the rest.

Syntax:

fields f1, f2, f3
fields - f1, f2
fields remove f1, f2

Runtime behavior: Default form keeps only listed keys. - or remove drops listed keys. Order of operations matters relative to downstream sinks and aggregations.

output

Purpose: Select a single field as the pipeline result for consumers that expect a scalar or explicitly typed string payload.

Syntax:

output field
output field type=string

Runtime behavior:

AspectSemantics
SelectionExactly one field path; nested paths use dot / bracket notation.
Missing fieldIf the path does not resolve, the engine errors (no silent null emission).
Raw typeWithout type=string, the value’s JSON type is preserved (number, boolean, array, object).
type=stringCoerces the value to a string for wire formats; objects and arrays serialize as JSON text.
DownstreamThe result becomes the payload / result passed to the stage consumer (sink or task binding) per task configuration.
output score
output event.payload
output count type=string

Parse commands

Purpose: Parse a string-typed field and attach extracted keys to the current event.

Common form:

parse_<kind> <source_field> [options]

Command summary

CommandInputNotable options
parse_csvDelimited rowdelimiter, header
parse_kvk=v tokensdelimiter
parse_cefCEF string
parse_leefLEEF string
parse_xmlXML stringxpath
parse_jsonJSON text
parse_regexFree textflags, named captures (?P<name>…)

Field conflict resolution

When a parse would create a field name that already exists on the event, the runtime renames the new key by appending incrementing numeric suffixes so collisions do not overwrite existing values silently.

Per-command notes

  • parse_json: Merges parsed object keys into the projection (nested target paths supported where grammar allows).
  • parse_csv: Header row may be inferred or supplied; delimiter defaults are engine-defined unless overridden.
  • parse_kv: Supports multiple pairs per line and quoted values spanning spaces.
  • parse_regex: Named groups become field names; optional flags (e.g. i). Patterns with catastrophic backtracking are a runtime cost risk.
  • parse_xml: XPath selects extracted fragment; useful for narrow fields inside larger XML.
  • parse_cef / parse_leef: Normalise vendor encodings to first-class fields.
parse_csv log_data delimiter="|" header="ts,user,action"
parse_kv message delimiter=":"
parse_json envelope.body
parse_regex raw "(?P<ip>\\S+) \\[(?P<ts>[^\\]]+)\\]" flags="i"

Partitioning

Purpose: Derive a partition key from one or more fields for key-based routing, stream-scoped isolation, and coherent keyed execution (including alignment with grouped aggregates when combined with rekey).

Syntax:

partition_by field
partition_by f1, f2, ...

Runtime semantics:

TopicBehavior
Single fieldKey is the field’s stringified value (per engine casting rules).
Composite keyMultiple fields are joined with **`
Missing / nullMissing or null components are represented as the literal token null in the composite key string.
Stream scopeThe same key value in different streams does not share partition state.
Worker assignmentThe key drives write-queue / worker affinity so related events cohere for ordering-sensitive processing.
DownstreamPartition key influences how partitioned sinks and grouped aggregates route; pair with aggregate rekey=true when post-aggregate routing must track group_by dimensions.
parse_json | partition_by user_id | count timespan=5m group_by user_id
partition_by tenant_id, user_id | sum(amount) timespan=1h group_by user_id

Aggregations

Aggregations reduce many events within a time window to one or more emit records, optionally per group_by key. All functions in one statement share the same window, where, and grouping clauses.

Syntax

<fn1> [AS a1], <fn2> [AS a2], ...
[window=<type>]
timespan=<n><unit>
[hop=<n><unit>] [slide=<n><unit>] [gap=<n><unit>] [max_duration=<n><unit>]
[group_by f1, f2, ...]
[where <query>]
[rekey=true|false]
[every_event=true|false]

Parameters (selected):

ParameterRole
timespanWindow length (required).
windowtumbling (default), sliding, hopping, session.
slideSliding step; default is engine-defined (bounded, often ~1s or a fraction of timespan).
hopHopping period (required for hopping).
gap / max_durationSession inactivity gap and optional cap.
group_byDistinct key set; each key maintains isolated reducer state.
wherePredicate on ingested events entering the window.
rekeyWhen true, outgoing aggregate events adopt routing key from group_by fields.
every_eventfalse (default): emit when the window closes (event time passes window_end). true: emit on every contributing event (debug / high-volume warning).

Aggregation state semantics

TopicSemantics
Stateful executionWindowed aggregations maintain state (partial sums, buffers, session clocks) until emission or window close.
RetentionState lives for the lifetime of open windows; larger timespan, slide, or session gap increases retention duration.
every_event=falseReduces downstream event volume by emitting on window boundaries only.
every_event=trueMaterializes partial aggregates per event; suitable for debugging or low-rate streams; increases CPU and sink load.
group_by cardinalityEach distinct key holds separate partial state; high cardinality grows memory proportionally.
collectRetains full JSON bodies of matched events in-window → large memory footprint vs scalar reducers.

Output shape

Windowed aggregations emit flat rows (not one nested object per group under arbitrary parent keys):

  • Without group_by: one object with window_start, window_end, and metric fields at the root.
  • With group_by: one object per group per emission, each with group_key (pipe-separated composite, same order as group_by), window_start, window_end, and metrics at the root.

Multi-group results appear as a JSON array of those objects in the PDL API; processing tasks fan out one sink event per array element where configured—see Glossary → Aggregation.

{
"window_start": 1704067200000,
"window_end": 1704067500000,
"event_count": 150
}
[
{"group_key": "ok", "window_start": 1704067200000, "window_end": 1704067500000, "c": 100},
{"group_key": "err", "window_start": 1704067200000, "window_end": 1704067500000, "c": 3}
]

Multi-function aggregation

Multiple reducers may share one window specification in a single statement—one pass maintains consistent window boundaries and where filtering across metrics.

count AS n, sum(bytes) AS total, dc(user_id) AS users timespan=5m group_by tenant where status = "ok"

Aggregation functions

FunctionDescription
count / count(field)Event count / non-null count
sum, avg, min, maxNumeric reducers
dcDistinct count
median, stddev, varianceStatistical
valuesMultiset of distinct values (engine serialisation rules apply)
first / lastFirst / last non-null field value in processing order within window/group
earliest / latestField value from event with minimum / maximum canonical envelope timestamp; ties keep first stabilised winner
collect(query)Array of full events matching query inside window

collect

collect(score > 80) AS high_scores timespan=5m group_by user_id

Output: Emits window_start, window_end, and an array field (default name collected_events if no alias) containing full event objects. Empty arrays are valid when no matches occur.

Windowing strategies

TypeRequired paramsNotes
tumblingtimespanFixed, non-overlapping buckets.
slidingtimespanOptional slidetimespan; overlapping windows.
hoppingtimespan, hophoptimespan.
sessiontimespan, gapOptional max_durationtimespan when set.

Validation (typical): timespan > 0; hop > 0; slide <= timespan; gap > 0; invalid combinations fail at validate or plan time with explicit errors.

Representative aggregation examples

count timespan=5m
count AS c, sum(latency) AS total_ms timespan=1m group_by region where ok = true
sum(amount) AS rev timespan=1h group_by user_id, sku rekey=true
count AS events window=session timespan=1h gap=5m max_duration=2h

Data types

PDL expressions operate on JSON-aligned types.

TypeLiteral / formNotes
String"..."Comparisons may use wildcards with = / != only.
Number123, 45.67Arithmetic and ordered comparisons.
Booleantrue, falseLogical and equality.
Array[...]?=, IN, indexing.
ObjectVia field pathsDot and bracket access.

Coercion: Mixed-type arithmetic or concatenation may trigger implicit coercion; prefer explicit to_number / to_string for deterministic pipelines.

Field access

StyleFormExample
DotNested objectsuser.profile.id
Bracket indexArraysitems[0]
Bracket keyMaps / objectsheaders["content-type"]
Mixedrows[0].cells["id"]

Field names allow letters, digits, _, ., and may start with @ (e.g. @timestamp). Wildcard performance guidance lives under Query operations → Wildcards; avoid duplicating wildcard tables here.

Pipeline composition

query | parse_json body | eval x = y + 1 | rename x AS z | fields z
RuleSemantics
Pipe orderStages evaluate strictly in listed order.
At most one windowed aggregationA single pipeline string may contain one windowed aggregation segment (timespan= with aggregate functions) unless grammar explicitly allows otherwise; the padas-pdl parser / validator is authoritative.
Trailing queryA boolean expression after the aggregate may filter aggregate output when supported by grammar (e.g. threshold on emitted metrics).
Hot pathProcessing tasks: one PDL evaluation per input event; aggregation state advances accordingly (Glossary → Processing task).
parse_kv message
| eval ok = response = 200
| count timespan=5s group_by user, ok
| count > 5

Use validate_pdl_syntax and engine tests for version-specific acceptance.

Examples

status = "error" | parse_json detail | eval sev = to_upper(code) | fields ts, sev, detail
parse_csv line | eval total = qty * price | rename total AS order_total | fields order_id, order_total
partition_by tenant | sum(bytes) AS b timespan=5m group_by svc rekey=true

Additional patterns: Quick Reference, Comprehensive examples.

Runtime considerations

ConcernGuidance
Filter earlyApply cheap boolean queries before parse_* when predicates do not depend on parsed fields.
Project earlyUse fields to drop large blobs before heavy eval or aggregations to cut memory and serialization cost.
Parse costStructured parsers and regex extraction dominate CPU on wide or high-rate streams.
RegexFavour anchored, bounded patterns; avoid deep nesting and unbounded */+ on hot paths.
CoercionRepeated implicit string/number mixing in eval adds overhead; coerce once.
Aggregation stategroup_by cardinality, timespan, slide, collect, and every_event=true directly impact heap usage and emit rate.
WindowsLong timespan defers emission and increases per-window buffer size; short slide on sliding windows multiplies overlapping state.

Errors

Error classTypical causeRecommended action
SyntaxInvalid token order, unknown function, malformed window clauseRun validate_pdl_syntax; compare against grammar in padas-pdl; fix spelling / punctuation.
Type mismatchString compared to number without coercion, illegal operator for typeInsert to_number / to_string / to_boolean; narrow where predicates.
Missing fieldPath absent on event, or output target missingCorrect path; guard with coalesce or query predicates before use.
ParseInput string not valid for selected parse_*Inspect raw field; add upstream where; tighten pattern.
Window parametershop > timespan, slide > timespan, invalid gap / max_durationConsult Windowing strategies validation table; adjust literals.
Wildcard / operatorWildcard with > / < / ~= / etc.Use = / != only for wildcard patterns, or switch to regex ~=.
Resource pressureHigh-cardinality group_by, large collect, every_event=true on flood trafficReduce cardinality, shrink window, disable every_event, or scale consumers.

The parser and runtime implementation for your deployed engine version are authoritative for version-specific acceptance and error strings. Validate production PDL against the same engine version used at runtime.