Quick Reference
Padas Domain Language (PDL) defines stream-processing expressions over JSON events: filtering (boolean queries that retain or discard a record), parsing (string-to-field extraction), transformation (eval, type coercion, conditionals), routing (partition_by, aggregate rekey), and aggregation (windowed stateful reduction). Normative syntax and edge cases: Reference.
A pipeline is a linear chain of stages separated by | (or an equivalent stage list in the task configuration). Stages execute sequentially in source order. Each stage consumes the event projection produced by the previous stage and emits the next projection downstream; query stages filter without mutating retained rows unless combined with mutation stages in the same task definition.
Execution semantics
| Concept | Behavior |
|---|---|
| Stage chaining | Stages apply in order; there is no implicit parallelism inside a single PDL pipeline unless the runtime maps partitions independently. |
| Event flow | One inbound JSON record enters the chain; each stage reads the current field tree; parsers and eval materialize or overwrite fields; fields projects a subset; output may reduce the payload to a scalar for specialized sinks. |
| Filtering | A query stage evaluates a boolean expression; false drops the event for that branch; true forwards the unchanged projection unless a later stage mutates it. |
| Aggregation state | Windowed aggregations maintain state until the window closes and the engine emits one or more aggregate records per window (and per group_by key); see Aggregation. |
| Routing | partition_by and aggregate rekey influence how the runtime routes keyed work and sink partitions; see Partitioning. |
| Windows | timespan bounds the window lifecycle; tumbling, sliding, and session modes control overlap and gap handling; open windows retain buffers and partial aggregates until emission. |
Query expressions
Queries filter whole events: the expression evaluates to a boolean; true retains the event for subsequent stages, false discards it for that processing branch (unless the enclosing task type documents alternate behavior).
Comparison syntax
Field paths use dot notation for nested JSON. Operators combine a path with a literal or comparable value.
field = value
field != value
field > value
field >= value
field < value
field <= value
field ?= value
field ~= pattern
field IN [v1, v2, v3]
| Operator | Semantics |
|---|---|
| = / != | Equality / inequality on scalars; string = / != may use a single * wildcard in the pattern (see Wildcards). |
| > / < / >= / <= | Ordered comparison on numeric or otherwise comparable scalars; not defined for wildcard string patterns. |
| ?= | String: substring contains the right-hand literal. Array: true if the array contains the scalar element (membership). |
| ~= | Regex match on string values; pattern syntax follows the engine's regex implementation. |
| IN | True if the field value equals any element of the right-hand array literal; array elements must be a uniform type (String or Integer) per query definition rules. |
Logical operators and precedence
NOT predicate
left AND right
left OR right
(query1 AND query2) OR query3
| Construct | Semantics |
|---|---|
| NOT | Unary negation of the immediately following comparison or parenthesized subquery. |
| AND / OR | Binary conjunction / disjunction; operands are comparisons or parenthesized queries. |
Precedence: NOT binds tightest (to its operand). AND binds tighter than OR. Therefore a AND b OR c groups as (a AND b) OR c. OR chains associate left-to-right at the same precedence level. Parentheses override defaults and should be used wherever mixing AND and OR would otherwise be ambiguous.
Evaluation order: Subexpressions inside parentheses evaluate as a unit before their result participates in outer operators. For deterministic matching and auditability, prefer explicit parentheses over reliance on default precedence.
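The default grouping can be checked with a small Python sketch: Python's `not` > `and` > `or` precedence mirrors PDL's NOT > AND > OR, so the two groupings can be compared directly (`a`, `b`, `c` stand in for hypothetical predicate results; this models the grouping only, not the engine's evaluator).

```python
from itertools import product

def pdl_default(a, b, c):
    # a AND b OR c under default precedence
    return a and b or c

def explicit(a, b, c):
    # (a AND b) OR c, the grouping default precedence implies
    return (a and b) or c

# Identical truth tables across all eight input combinations.
assert all(pdl_default(a, b, c) == explicit(a, b, c)
           for a, b, c in product([True, False], repeat=3))

# Distinct from a AND (b OR c): with a=False, c=True the results differ.
assert pdl_default(False, False, True) is True
assert (False and (False or True)) is False
```

When a reader cannot reconstruct this grouping at a glance, the parenthesized form is the safer one to write.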
Boolean and null semantics
Comparisons evaluate against the resolved field value and literal; missing paths or type mismatches surface as runtime or validation errors depending on stage configuration—see Errors. AND and OR use ordinary boolean truth; short-circuiting follows typical boolean evaluation in the engine implementation.
Wildcards
With = / != on string values, a single * wildcard is permitted in the pattern. Wildcard patterns are translated internally for matching; leading and embedded * patterns can increase scan cost versus trailing * prefix forms. field = "*" denotes field existence (non-null) semantics per deployment. A standalone * predicate matches all events and should be treated as a last resort in high-volume streams.
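One plausible internal translation of the single-* wildcard is an anchored regular expression with the literal parts escaped. The sketch below illustrates the idea only; the engine's actual rewrite is not specified here.

```python
import re

def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a single-* wildcard pattern into an anchored regex.
    Everything except the one * is matched literally."""
    if pattern.count("*") > 1:
        raise ValueError("at most one * is permitted per pattern")
    parts = [re.escape(p) for p in pattern.split("*")]
    return re.compile("^" + ".*".join(parts) + "$")

# Trailing * behaves as a prefix match.
assert wildcard_to_regex("error*").match("error: disk full")
assert not wildcard_to_regex("error*").match("warn: retry")
# Leading * forces a scan of the whole string before the literal tail.
assert wildcard_to_regex("*@example.com").match("alice@example.com")
```

The translation makes the cost asymmetry visible: a trailing * compiles to a cheap anchored prefix, while a leading * compiles to `.*` followed by the literal, which must consider every position in the input.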
Regex (~=)
The right-hand side is a regular expression applied to the string field. Patterns may be cached by the runtime; unbounded quantifiers and nested alternation increase backtracking risk and CPU cost. Prefer anchored, bounded patterns for hot paths.
Query examples
user.age > 25
user.name = "Alice"
user.premium = true
scores ?= 90
scores.length > 3
scores[0] > 80
user.age > 25 AND user.premium = true
user.department = "Engineering" OR user.department = "Sales"
status IN ["active", "pending"]
email ~= "^[^@]+@example\\.com$"
Parse commands
Parse stages read a string field (raw line, embedded JSON text, CEF/LEEF, etc.), parse the payload, and attach structured fields to the current event.
Parse semantics
| Topic | Behavior |
|---|---|
| Extracted fields | Successful parses materialize new keys on the event object (or nested target where the command supports a path). |
| Collision / overwrite | New keys produced by a parse coexist with prior fields; if a generated key collides with an existing name, the effective value is last writer for that stage chain position—confirm collision rules for your engine version in Reference. |
| Output structure | parse_json merges object keys into the projection; parse_csv / parse_kv / parse_regex / parse_cef / parse_leef / parse_xml emit flat or path-scoped fields per command grammar. |
| Field attachment model | Parses transform the in-flight record in place for the remainder of the pipeline unless a later stage renames, projects (fields), or replaces the payload (output). |
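The attachment model above can be sketched as a dictionary merge in Python. This is illustrative only: `event` and `raw` are hypothetical names, and real collision rules may differ by engine version, as the table notes.

```python
import json

def parse_json_stage(event: dict, field_name: str) -> dict:
    """Sketch of parse_json: read a string field, parse it as JSON,
    and merge the resulting object keys into the in-flight event."""
    parsed = json.loads(event[field_name])
    merged = dict(event)
    merged.update(parsed)  # on key collision, the parsed value wins here
    return merged

event = {"raw": '{"level": "ERROR", "code": 500}', "source": "app-1"}
out = parse_json_stage(event, "raw")
assert out["level"] == "ERROR" and out["code"] == 500
assert out["source"] == "app-1"  # pre-existing fields survive the merge
```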
Command forms
JSON — Parses a string field as JSON and merges object fields.
parse_json field_name
parse_json field_name.subfield
CSV — Splits delimiter-separated values; optional header= defines or overrides column names.
parse_csv field_name
parse_csv field_name header="col1,col2,col3"
parse_csv field_name delimiter=","
XML — Extracts via XPath for legacy or XML-embedded payloads.
parse_xml field_name
parse_xml field_name xpath="//user/name"
Key–value — Tokenizes key=value or key:value forms.
parse_kv field_name
parse_kv field_name delimiter="="
Regex — Named capture groups become output field names.
parse_regex field_name "(?P<level>\w+) (?P<msg>.*)"
parse_regex field_name "(?P<level>\w+) (?P<msg>.*)" flags="i"
CEF / LEEF — Normalizes ArcSight-style CEF and LEEF into standard fields.
parse_cef field_name
parse_leef field_name
Transformations
eval
eval evaluates one or more expressions and materializes fields. Assignments execute in source order within a single eval statement; later assignments may read fields produced earlier in the same statement.
eval field = expression
eval field1 = expr1, field2 = expr2
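The in-order assignment rule can be modeled as sequential updates to the event's field map. A Python sketch under that reading (the field names are illustrative):

```python
def eval_stage(event: dict) -> dict:
    # Models: eval total = price * quantity, final = total * 0.9
    # Later assignments see fields materialized earlier in the same eval.
    out = dict(event)
    out["total"] = out["price"] * out["quantity"]
    out["final"] = out["total"] * 0.9  # reads total computed just above
    return out

row = eval_stage({"price": 10.0, "quantity": 3})
assert row["total"] == 30.0
assert row["final"] == 27.0
```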
Arithmetic — Numeric operators and parentheses follow conventional precedence; coercion may occur when types differ—normalize with to_number / to_string to control cost.
eval total = price * quantity
eval discount = price * 0.1
eval final = (price * quantity) * (1 - discount)
Mathematical functions — Unary/binary numeric helpers (sqrt, abs, round, floor, ceil, pow, log, log10).
eval sqrt_val = sqrt(value)
eval abs_val = abs(value)
eval round_val = round(value)
String functions — Concatenation, case, length, substring, replace.
eval full_name = name + " " + surname
eval upper_name = to_upper(name)
eval lower_name = to_lower(name)
eval name_len = length(name)
eval substr = substring(text, start, length)
eval replaced = replace(text, "old", "new")
Type conversion — Explicit coercion reduces ambiguity and downstream serialization surprises.
eval str_val = to_string(number)
eval num_val = to_number(string)
eval bool_val = to_boolean(value)
Conditionals — if, case, coalesce evaluate branches and return the first matching or non-null value per function semantics.
eval status = if(condition, true_value, false_value)
eval grade = case(age >= 65, "senior", age >= 18, "adult", "minor")
eval result = coalesce(field1, field2, "default")
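The first-match semantics of case and the first-non-null semantics of coalesce can be approximated in Python (a sketch of the semantics described above, not the engine's functions):

```python
def coalesce(*values):
    """Return the first non-null argument; the last argument doubles
    as the default, mirroring coalesce(field1, field2, "default")."""
    for v in values:
        if v is not None:
            return v
    return None

def case(*pairs_and_default):
    """case(cond1, val1, cond2, val2, ..., default):
    the first true condition selects its value."""
    *pairs, default = pairs_and_default
    for cond, val in zip(pairs[0::2], pairs[1::2]):
        if cond:
            return val
    return default

age = 70
assert case(age >= 65, "senior", age >= 18, "adult", "minor") == "senior"
assert coalesce(None, None, "Unknown") == "Unknown"
```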
Aggregation
Aggregates consume a stream of events within a time window (timespan=…) and emit summarized records. AS names output metrics. Exact JSON shapes and multi-group emission: Reference → Output shape, Glossary → Aggregation.
Runtime and state
| Topic | Semantics |
|---|---|
| Runtime state | Windowed aggregations maintain state (partial sums, counts, buffers, session clocks) until the window closes or the session expires. |
| Window lifecycle | timespan defines the window length; window=tumbling, window=sliding (with slide=), or window=session (with gap=) selects overlap and gap behavior. |
| State retention | State exists for the duration of open windows; larger timespan and higher cardinality group_by increase memory footprint. |
| Grouped output | group_by emits one aggregate row per distinct key per window; multiple groups may serialize as a JSON array; downstream tasks may fan out one sink event per row. |
| Filtering into the window | where restricts which events enter the aggregate computation. |
| rekey=true | Rewrites the routing key from group_by fields so partitioned sinks route consistently with aggregate keys. |
Forms
sum(field) AS alias timespan=5m
avg(field) AS alias timespan=5m
count AS alias timespan=5m
min(field) AS alias timespan=5m
max(field) AS alias timespan=5m
first(field) AS alias timespan=5m
last(field) AS alias timespan=5m
earliest(field) AS alias timespan=5m
latest(field) AS alias timespan=5m
dc(field) AS alias timespan=5m
sum(field1) AS total, avg(field2) AS average timespan=5m
sum(field) AS total group_by group_field timespan=5m
avg(field) AS average group_by field1, field2 timespan=5m
sum(field) AS total window=tumbling timespan=5m
sum(field) AS total window=sliding timespan=5m slide=1m
sum(field) AS total window=session timespan=5m gap=2m
sum(field) AS total where condition timespan=5m
sum(amount) AS total timespan=1h group_by user_id, department rekey=true
count AS events timespan=5m group_by user_id rekey=true
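A tumbling-window grouped sum such as `sum(amount) AS total timespan=1h group_by user_id` reduces to bucketing events by (key, window start) and summing. A minimal Python sketch under that reading, with event timestamps as epoch seconds (this models the state and emission shape loosely, not the engine's actual output format):

```python
from collections import defaultdict

def tumbling_sum(events, timespan_s, key_field, value_field):
    """Sketch: sum(value) group_by key over tumbling windows of timespan_s.
    State is one running sum per (key, window start); one row is
    emitted per bucket once all input is consumed."""
    state = defaultdict(float)
    for ev in events:
        window_start = ev["ts"] - ev["ts"] % timespan_s
        state[(ev[key_field], window_start)] += ev[value_field]
    return [{"key": k, "window_start": w, "total": total}
            for (k, w), total in sorted(state.items())]

events = [
    {"ts": 0,    "user_id": "u1", "amount": 10.0},
    {"ts": 1800, "user_id": "u1", "amount": 5.0},
    {"ts": 3700, "user_id": "u1", "amount": 1.0},  # lands in the next window
]
rows = tumbling_sum(events, 3600, "user_id", "amount")
assert rows == [
    {"key": "u1", "window_start": 0,    "total": 15.0},
    {"key": "u1", "window_start": 3600, "total": 1.0},
]
```

The sketch also makes the state-retention guidance concrete: the size of `state` grows with key cardinality times the number of open windows.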
Partitioning
partition_by extracts one or more fields that form the partition key for keyed execution, scaling, and sink routing. It routes logical work to a stable key derived from the event.
partition_by user_id
partition_by user_id, department
parse_json | partition_by user_id | count timespan=5m
partition_by tenant_id, user_id | sum(amount) timespan=1h group_by user_id
Downstream implications: The key influences which downstream operator instance consumes the event and how aggregates align with sink partitions; combine with aggregate rekey when the post-aggregate key must match the partition scheme.
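The stable-key idea can be sketched as hashing the extracted fields to a partition index. The hash scheme below is illustrative only; real engines use their own partitioners.

```python
import hashlib

def partition_for(event: dict, key_fields: list, num_partitions: int) -> int:
    """Sketch of partition_by: derive a stable partition index from the
    listed key fields. Uses md5 for cross-process determinism
    (Python's built-in hash() is salted per process)."""
    key = "|".join(str(event[f]) for f in key_fields)
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_partitions

e1 = {"tenant_id": "t1", "user_id": 42, "amount": 9.5}
e2 = {"tenant_id": "t1", "user_id": 42, "amount": 1.0}
# Same key fields -> same partition, regardless of other fields.
assert partition_for(e1, ["tenant_id", "user_id"], 8) == \
       partition_for(e2, ["tenant_id", "user_id"], 8)
```

This is what makes keyed aggregation scale: all events sharing a key reach the same operator instance, so window state for that key never needs to be merged across instances.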
Output shaping
fields
fields projects the event to a subset of keys (whitelist) or removes listed keys.
fields field1, field2, field3
fields remove field1, field2
fields - field1, field2
Reducing payload size before heavy eval or sinks lowers memory and serialization cost.
rename
rename maps existing field paths to new names without transforming values.
rename old_field AS new_field
rename field1 AS new1, field2 AS new2
output
output selects a single field and exposes its value as the pipeline result for stages that expect a scalar or explicitly typed text payload (certain sink encodings).
output field_name
output field_name type=string
| Behavior | Semantics |
|---|---|
| Scalar extraction | The engine projects one field’s value as the primary emission for the stage result. |
| Payload replacement | The downstream record serializes around that scalar (or typed string) rather than the full JSON object unless the task merges metadata separately. |
| Downstream serialization | type= hints string coercion for wire formats that require text. |
| Single-field emission | Multiple output stages in one logical pipeline are invalid or last-wins per task grammar—see Reference. |
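The payload-replacement behavior amounts to projecting a single value as the stage's emission. A Python sketch of that idea, with type=string modeled as plain text coercion (an assumption; the engine's coercion rules are not specified here):

```python
def output_stage(event: dict, field_name: str, type_hint=None):
    """Sketch of output: replace the record with one field's value,
    optionally coerced to text for string-only sink encodings."""
    value = event[field_name]
    if type_hint == "string":
        value = str(value)
    return value

assert output_stage({"order_total": 108.0}, "order_total") == 108.0
assert output_stage({"order_total": 108.0}, "order_total", "string") == "108.0"
```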
Examples
Pipeline compositions
parse_json raw_data | eval total = price * quantity | fields total
parse_csv data |
eval total = price * quantity |
eval tax = total * 0.08 |
eval final = total + tax |
rename final AS order_total |
fields order_total
user.age > 25 |
eval status = if(premium, "vip", "regular") |
fields name, status
Predicate, transform, and window patterns
user.name != null AND user.email != null AND user.age > 0
eval full_name = first_name + " " + last_name
eval age_group = case(age < 18, "minor", age < 65, "adult", "senior")
eval is_high_value = amount > 1000
sum(amount) AS revenue timespan=1d group_by date
count AS action_count timespan=1h group_by user_id where action = "purchase"
eval ratio = if(divisor != 0, dividend / divisor, 0)
eval name = coalesce(user.name, "Unknown")
Non-normative sample payloads
The JSON below is illustrative only; it does not define schema requirements. Use for manual tests or parse_json fixtures.
E-commerce order
{
"order_id": "ORD-123",
"customer": {
"name": "Alice",
"email": "alice@example.com",
"tier": "premium"
},
"items": [
{"name": "Laptop", "price": 999.99, "quantity": 1},
{"name": "Mouse", "price": 29.99, "quantity": 2}
],
"discount_code": "SAVE10"
}
Log entry
{
"timestamp": "2024-01-20T14:30:25Z",
"level": "ERROR",
"message": "Database connection failed",
"details": "timeout=30s, retries=3, host=db-prod-01"
}
User event
{
"user_id": 123,
"action": "purchase",
"timestamp": 1640995200,
"amount": 99.99,
"product": "Laptop",
"category": "Electronics"
}
Performance and runtime considerations
| Concern | Guidance |
|---|---|
| Parse cost | parse_regex, parse_xml, and large parse_json on wide strings dominate CPU; filter before parse when the predicate does not depend on parsed fields. |
| Regex backtracking | ~= and parse_regex patterns with nested quantifiers risk exponential backtracking; prefer bounded classes and anchors. |
| Memory / state | Long timespan, high-cardinality group_by, and session windows retain more in-flight state. |
| Aggregation cost | More functions per window and more keys increase merge work at emit time. |
| Projection | Early fields drops large blobs before eval and aggregations, reducing per-event memory and serialization volume. |
| Type coercion | Repeated implicit coercion in eval adds overhead; coerce once with to_number / to_string. |
Errors
Failures fall into overlapping categories below; the exact error code and message depend on the engine build.
| Category | Description |
|---|---|
| Validation failure | Pipeline or query fails static checks (syntax, unknown function, illegal token order) before execution. |
| Execution-time errors | A stage evaluates at runtime and encounters an illegal value (for example division by zero, missing path where required). |
| Parse-time errors | A parse_* stage receives input that does not match the expected format. |
| Runtime failure model | The task may drop the event, retry per connector policy, or surface the error to observability depending on task type—see task and stream documentation. |
| Message / code (typical) | Cause | Mitigation |
|---|---|---|
| FieldNotFound | Resolved path missing on the event | Correct the path; use coalesce or guards |
| InvalidSyntax | Token order or spelling | Compare with Reference |
| TypeMismatch | String vs number, etc. | Insert to_string / to_number / to_boolean |
| DivisionByZero | Divisor evaluates to zero | Guard with if |
| ParseError | Input not valid for parse_* | Inspect raw field; where before parse |