Quick Reference
Padas Domain Language (PDL) defines stream-processing expressions over JSON events: filtering (boolean queries that retain or discard a record), parsing (string-to-field extraction), transformation (eval, type coercion, conditionals), routing (partition_by, aggregate rekey), and aggregation (windowed stateful reduction). Normative syntax and edge cases: Reference.
A pipeline is a linear chain of stages separated by | (or an equivalent stage list in the task configuration). Stages execute sequentially in source order. Each stage consumes the event projection produced by the previous stage and emits the next projection downstream; query stages filter without mutating retained rows unless combined with mutation stages in the same task definition.
Execution semantics
| Concept | Behavior |
|---|---|
| Stage chaining | Stages apply in order; there is no implicit parallelism inside a single PDL pipeline unless the runtime maps partitions independently. |
| Event flow | One inbound JSON record enters the chain; each stage reads the current field tree; parsers and eval materialize or overwrite fields; fields projects a subset; output may reduce the payload to a scalar for specialized sinks. |
| Filtering | A query stage evaluates a boolean expression; false drops the event for that branch; true forwards the unchanged projection unless a later stage mutates it. |
| Aggregation state | Windowed aggregations maintain state until the window closes and the engine emits one or more aggregate records per window (and per group_by key); see Aggregation. |
| Routing | partition_by and aggregate rekey influence how the runtime routes keyed work and sink partitions; see Partitioning. |
| Windows | timespan bounds the window lifecycle; tumbling, sliding, and session modes control overlap and gap handling; open windows retain buffers and partial aggregates until emission. |
Query expressions
Queries filter whole events: the expression evaluates to a boolean; true retains the event for subsequent stages, false discards it for that processing branch (unless the enclosing task type documents alternate behavior).
Comparison syntax
Field paths use dot notation for nested JSON. Operators combine a path with a literal or comparable value.
field = value
field != value
field > value
field >= value
field < value
field <= value
field ?= value
field ~= pattern
field IN [v1, v2, v3]
| Operator | Semantics |
|---|---|
| = / != | Equality / inequality on scalars; string = / != may use a single * wildcard in the pattern (see Wildcards). |
| > / < / >= / <= | Ordered comparison on numeric or otherwise comparable scalars; not defined for wildcard string patterns. |
| ?= | String: substring contains the right-hand literal. Array: true if the array contains the scalar element (membership). |
| ~= | Regex match on string values; pattern syntax follows the engine's regex implementation. |
| IN | True if the field value equals any element of the right-hand array literal; array elements must be a uniform type (String or Integer) per query definition rules. |
Logical operators and precedence
NOT predicate
left AND right
left OR right
(query1 AND query2) OR query3
| Construct | Semantics |
|---|---|
| NOT | Unary negation of the immediately following comparison or parenthesized subquery. |
| AND / OR | Binary conjunction / disjunction; operands are comparisons or parenthesized queries. |
Precedence: NOT binds tightest (to its operand). AND binds tighter than OR. Therefore a AND b OR c groups as (a AND b) OR c. OR chains associate left-to-right at the same precedence level. Parentheses override defaults and should be used wherever mixing AND and OR would otherwise be ambiguous.
Evaluation order: Subexpressions inside parentheses evaluate as a unit before their result participates in outer operators. For deterministic matching and auditability, prefer explicit parentheses over reliance on default precedence.
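The default grouping can be checked with a small Python sketch: Python's `not` > `and` > `or` precedence mirrors PDL's NOT > AND > OR, so the two groupings can be compared directly (`a`, `b`, `c` stand in for hypothetical predicate results; this models the grouping only, not the engine's evaluator).

```python
from itertools import product

def pdl_default(a, b, c):
    # a AND b OR c under default precedence
    return a and b or c

def explicit(a, b, c):
    # (a AND b) OR c, the grouping default precedence implies
    return (a and b) or c

# Identical truth tables across all eight input combinations.
assert all(pdl_default(a, b, c) == explicit(a, b, c)
           for a, b, c in product([True, False], repeat=3))

# Distinct from a AND (b OR c): with a=False, c=True the results differ.
assert pdl_default(False, False, True) is True
assert (False and (False or True)) is False
```

When a reader cannot reconstruct this grouping at a glance, the parenthesized form is the safer one to write.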
Boolean and null semantics
Comparisons evaluate against the resolved field value and literal; missing paths or type mismatches surface as runtime or validation errors depending on stage configuration—see Errors. AND and OR use ordinary boolean truth; short-circuiting follows typical boolean evaluation in the engine implementation.
Wildcards
With = / != on string values, a single * wildcard is permitted in the pattern. Wildcard patterns are translated internally for matching; leading and embedded * patterns can increase scan cost versus trailing * prefix forms. field = "*" denotes field existence (non-null) semantics per deployment. A standalone * predicate matches all events and should be treated as a last resort in high-volume streams.
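One plausible internal translation of the single-* wildcard is an anchored regular expression with the literal parts escaped. The sketch below illustrates the idea only; the engine's actual rewrite is not specified here.

```python
import re

def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a single-* wildcard pattern into an anchored regex.
    Everything except the one * is matched literally."""
    if pattern.count("*") > 1:
        raise ValueError("at most one * is permitted per pattern")
    parts = [re.escape(p) for p in pattern.split("*")]
    return re.compile("^" + ".*".join(parts) + "$")

# Trailing * behaves as a prefix match.
assert wildcard_to_regex("error*").match("error: disk full")
assert not wildcard_to_regex("error*").match("warn: retry")
# Leading * forces a scan of the whole string before the literal tail.
assert wildcard_to_regex("*@example.com").match("alice@example.com")
```

The translation makes the cost asymmetry visible: a trailing * compiles to a cheap anchored prefix, while a leading * compiles to `.*` followed by the literal, which must consider every position in the input.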
Regex (~=)
The right-hand side is a regular expression applied to the string field. Patterns may be cached by the runtime; unbounded quantifiers and nested alternation increase backtracking risk and CPU cost. Prefer anchored, bounded patterns for hot paths.
Query examples
user.age > 25
user.name = "Alice"
user.premium = true
scores ?= 90
scores.length > 3
scores[0] > 80
user.age > 25 AND user.premium = true
user.department = "Engineering" OR user.department = "Sales"
status IN ["active", "pending"]
email ~= "^[^@]+@example\\.com$"
Parse commands
Parse stages read a string field (raw line, embedded JSON text, CEF/LEEF, etc.), parse the payload, and attach structured fields to the current event.
Parse semantics
| Topic | Behavior |
|---|---|
| Extracted fields | Successful parses materialize new keys on the event object (or nested target where the command supports a path). |
| Collision / overwrite | New keys produced by a parse coexist with prior fields; if a generated key collides with an existing name, the effective value is last writer for that stage chain position—confirm collision rules for your engine version in Reference. |
| Output structure | parse_json merges object keys into the projection; parse_csv / parse_kv / parse_regex / parse_cef / parse_leef / parse_xml emit flat or path-scoped fields per command grammar. |
| Field attachment model | Parses transform the in-flight record in place for the remainder of the pipeline unless a later stage renames, projects (fields), or replaces the payload (output). |
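The attachment model above can be sketched as a dictionary merge in Python. This is illustrative only: `event` and `raw` are hypothetical names, and real collision rules may differ by engine version, as the table notes.

```python
import json

def parse_json_stage(event: dict, field_name: str) -> dict:
    """Sketch of parse_json: read a string field, parse it as JSON,
    and merge the resulting object keys into the in-flight event."""
    parsed = json.loads(event[field_name])
    merged = dict(event)
    merged.update(parsed)  # on key collision, the parsed value wins here
    return merged

event = {"raw": '{"level": "ERROR", "code": 500}', "source": "app-1"}
out = parse_json_stage(event, "raw")
assert out["level"] == "ERROR" and out["code"] == 500
assert out["source"] == "app-1"  # pre-existing fields survive the merge
```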
Command forms
JSON — Parses a string field as JSON and merges object fields.
parse_json field_name
parse_json field_name.subfield
CSV — Splits delimiter-separated values; optional header= defines or overrides column names.
parse_csv field_name
parse_csv field_name header="col1,col2,col3"
parse_csv field_name delimiter=","
XML — Extracts via XPath for legacy or XML-embedded payloads.
parse_xml field_name
parse_xml field_name xpath="//user/name"
Key–value — Tokenizes key=value or key:value forms.
parse_kv field_name
parse_kv field_name delimiter="="
Regex — Named capture groups become output field names.
parse_regex field_name "(?P<level>\w+) (?P<msg>.*)"
parse_regex field_name "(?P<level>\w+) (?P<msg>.*)" flags="i"
CEF / LEEF — Normalizes ArcSight-style CEF and LEEF into standard fields.
parse_cef field_name
parse_leef field_name
Transformations
eval
eval evaluates one or more expressions and materializes fields. Assignments execute in source order within a single eval statement; later assignments may read fields produced earlier in the same statement.
eval field = expression
eval field1 = expr1, field2 = expr2
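The in-order assignment rule can be modeled as sequential updates to the event's field map. A Python sketch under that reading (the field names are illustrative):

```python
def eval_stage(event: dict) -> dict:
    # Models: eval total = price * quantity, final = total * 0.9
    # Later assignments see fields materialized earlier in the same eval.
    out = dict(event)
    out["total"] = out["price"] * out["quantity"]
    out["final"] = out["total"] * 0.9  # reads total computed just above
    return out

row = eval_stage({"price": 10.0, "quantity": 3})
assert row["total"] == 30.0
assert row["final"] == 27.0
```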
Arithmetic — Numeric operators and parentheses follow conventional precedence; coercion may occur when types differ—normalize with to_number / to_string to control cost.
eval total = price * quantity
eval discount = price * 0.1
eval final = (price * quantity) * (1 - discount)
Mathematical functions — Unary/binary numeric helpers (sqrt, abs, round, floor, ceil, pow, log, log10).
eval sqrt_val = sqrt(value)
eval abs_val = abs(value)
eval round_val = round(value)
String functions — Concatenation, case, length, substring, replace.
eval full_name = name + " " + surname
eval upper_name = to_upper(name)
eval lower_name = to_lower(name)
eval name_len = length(name)
eval substr = substring(text, start, length)
eval replaced = replace(text, "old", "new")
Type conversion — Explicit coercion reduces ambiguity and downstream serialization surprises.
eval str_val = to_string(number)
eval num_val = to_number(string)
eval bool_val = to_boolean(value)
Conditionals — if, case, coalesce evaluate branches and return the first matching or non-null value per function semantics.
eval status = if(condition, true_value, false_value)
eval grade = case(age >= 65, "senior", age >= 18, "adult", "minor")
eval result = coalesce(field1, field2, "default")
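The first-match semantics of case and the first-non-null semantics of coalesce can be approximated in Python (a sketch of the semantics described above, not the engine's functions):

```python
def coalesce(*values):
    """Return the first non-null argument; the last argument doubles
    as the default, mirroring coalesce(field1, field2, "default")."""
    for v in values:
        if v is not None:
            return v
    return None

def case(*pairs_and_default):
    """case(cond1, val1, cond2, val2, ..., default):
    the first true condition selects its value."""
    *pairs, default = pairs_and_default
    for cond, val in zip(pairs[0::2], pairs[1::2]):
        if cond:
            return val
    return default

age = 70
assert case(age >= 65, "senior", age >= 18, "adult", "minor") == "senior"
assert coalesce(None, None, "Unknown") == "Unknown"
```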
Aggregation
Aggregates consume a stream of events within a time window (timespan=…) and emit summarized records. AS names output metrics. Exact JSON shapes and multi-group emission: Reference → Output shape, Glossary → Aggregation.
Runtime and state
| Topic | Semantics |
|---|---|
| Runtime state | Windowed aggregations maintain state (partial sums, counts, buffers, session clocks) until the window closes or the session expires. |
| Window lifecycle | timespan defines the window length; window=tumbling, window=sliding (with slide=), or window=session (with gap=) selects overlap and gap behavior. |
| State retention | State exists for the duration of open windows; larger timespan and higher cardinality group_by increase memory footprint. |
| Grouped output | group_by emits one aggregate row per distinct key per window; multiple groups may serialize as a JSON array; downstream tasks may fan out one sink event per row. |
| Filtering into the window | where restricts which events enter the aggregate computation. |
| rekey=true | Rewrites the routing key from group_by fields so partitioned sinks route consistently with aggregate keys. |
Forms
sum(field) AS alias timespan=5m
avg(field) AS alias timespan=5m
count AS alias timespan=5m
min(field) AS alias timespan=5m
max(field) AS alias timespan=5m
first(field) AS alias timespan=5m
last(field) AS alias timespan=5m
earliest(field) AS alias timespan=5m
latest(field) AS alias timespan=5m
dc(field) AS alias timespan=5m
sum(field1) AS total, avg(field2) AS average timespan=5m
sum(field) AS total group_by group_field timespan=5m
avg(field) AS average group_by field1, field2 timespan=5m
sum(field) AS total window=tumbling timespan=5m
sum(field) AS total window=sliding timespan=5m slide=1m
sum(field) AS total window=session timespan=5m gap=2m
sum(field) AS total where condition timespan=5m
sum(amount) AS total timespan=1h group_by user_id, department rekey=true
count AS events timespan=5m group_by user_id rekey=true
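A tumbling-window grouped sum such as `sum(amount) AS total timespan=1h group_by user_id` reduces to bucketing events by (key, window start) and summing. A minimal Python sketch under that reading, with event timestamps as epoch seconds (this models the state and emission shape loosely, not the engine's actual output format):

```python
from collections import defaultdict

def tumbling_sum(events, timespan_s, key_field, value_field):
    """Sketch: sum(value) group_by key over tumbling windows of timespan_s.
    State is one running sum per (key, window start); one row is
    emitted per bucket once all input is consumed."""
    state = defaultdict(float)
    for ev in events:
        window_start = ev["ts"] - ev["ts"] % timespan_s
        state[(ev[key_field], window_start)] += ev[value_field]
    return [{"key": k, "window_start": w, "total": total}
            for (k, w), total in sorted(state.items())]

events = [
    {"ts": 0,    "user_id": "u1", "amount": 10.0},
    {"ts": 1800, "user_id": "u1", "amount": 5.0},
    {"ts": 3700, "user_id": "u1", "amount": 1.0},  # lands in the next window
]
rows = tumbling_sum(events, 3600, "user_id", "amount")
assert rows == [
    {"key": "u1", "window_start": 0,    "total": 15.0},
    {"key": "u1", "window_start": 3600, "total": 1.0},
]
```

The sketch also makes the state-retention guidance concrete: the size of `state` grows with key cardinality times the number of open windows.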
Partitioning
partition_by extracts one or more fields that form the partition key for keyed execution, scaling, and sink routing. It routes logical work to a stable key derived from the event.
partition_by user_id
partition_by user_id, department
parse_json | partition_by user_id | count timespan=5m
partition_by tenant_id, user_id | sum(amount) timespan=1h group_by user_id
Downstream implications: The key influences which downstream operator instance consumes the event and how aggregates align with sink partitions; combine with aggregate rekey when the post-aggregate key must match the partition scheme.
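The stable-key idea can be sketched as hashing the extracted fields to a partition index. The hash scheme below is illustrative only; real engines use their own partitioners.

```python
import hashlib

def partition_for(event: dict, key_fields: list, num_partitions: int) -> int:
    """Sketch of partition_by: derive a stable partition index from the
    listed key fields. Uses md5 for cross-process determinism
    (Python's built-in hash() is salted per process)."""
    key = "|".join(str(event[f]) for f in key_fields)
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_partitions

e1 = {"tenant_id": "t1", "user_id": 42, "amount": 9.5}
e2 = {"tenant_id": "t1", "user_id": 42, "amount": 1.0}
# Same key fields -> same partition, regardless of other fields.
assert partition_for(e1, ["tenant_id", "user_id"], 8) == \
       partition_for(e2, ["tenant_id", "user_id"], 8)
```

This is what makes keyed aggregation scale: all events sharing a key reach the same operator instance, so window state for that key never needs to be merged across instances.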
Output shaping
fields
fields projects the event to a subset of keys (whitelist) or removes listed keys.
fields field1, field2, field3
fields remove field1, field2
fields - field1, field2
Reducing payload size before heavy eval or sinks lowers memory and serialization cost.
rename
rename maps existing field paths to new names without transforming values.
rename old_field AS new_field
rename field1 AS new1, field2 AS new2
output
output selects a single field and exposes its value as the pipeline result for stages that expect a scalar or explicitly typed text payload (certain sink encodings).
output field_name
output field_name type=string
| Behavior | Semantics |
|---|---|
| Scalar extraction | The engine projects one field’s value as the primary emission for the stage result. |
| Payload replacement | The downstream record serializes around that scalar (or typed string) rather than the full JSON object unless the task merges metadata separately. |
| Downstream serialization | type= hints string coercion for wire formats that require text. |
| Single-field emission | Multiple output stages in one logical pipeline are invalid or last-wins per task grammar—see Reference. |
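The payload-replacement behavior amounts to projecting a single value as the stage's emission. A Python sketch of that idea, with type=string modeled as plain text coercion (an assumption; the engine's coercion rules are not specified here):

```python
def output_stage(event: dict, field_name: str, type_hint=None):
    """Sketch of output: replace the record with one field's value,
    optionally coerced to text for string-only sink encodings."""
    value = event[field_name]
    if type_hint == "string":
        value = str(value)
    return value

assert output_stage({"order_total": 108.0}, "order_total") == 108.0
assert output_stage({"order_total": 108.0}, "order_total", "string") == "108.0"
```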
Examples
Pipeline compositions
parse_json raw_data | eval total = price * quantity | fields total
parse_csv data |
eval total = price * quantity |
eval tax = total * 0.08 |
eval final = total + tax |
rename final AS order_total |
fields order_total
user.age > 25 |
eval status = if(premium, "vip", "regular") |
fields name, status
Predicate, transform, and window patterns
user.name != null AND user.email != null AND user.age > 0
eval full_name = first_name + " " + last_name
eval age_group = case(age < 18, "minor", age < 65, "adult", "senior")
eval is_high_value = amount > 1000
sum(amount) AS revenue timespan=1d group_by date
count AS action_count timespan=1h group_by user_id where action = "purchase"
eval ratio = if(divisor != 0, dividend / divisor, 0)
eval name = coalesce(user.name, "Unknown")
Non-normative sample payloads
The JSON below is illustrative only; it does not define schema requirements. Use for manual tests or parse_json fixtures.
E-commerce order
{
"order_id": "ORD-123",
"customer": {
"name": "Alice",
"email": "alice@example.com",
"tier": "premium"
},
"items": [
{"name": "Laptop", "price": 999.99, "quantity": 1},
{"name": "Mouse", "price": 29.99, "quantity": 2}
],
"discount_code": "SAVE10"
}
Log entry
{
"timestamp": "2024-01-20T14:30:25Z",
"level": "ERROR",
"message": "Database connection failed",
"details": "timeout=30s, retries=3, host=db-prod-01"
}
User event
{
"user_id": 123,
"action": "purchase",
"timestamp": 1640995200,
"amount": 99.99,
"product": "Laptop",
"category": "Electronics"
}
Performance and runtime considerations
| Concern | Guidance |
|---|---|
| Parse cost | parse_regex, parse_xml, and large parse_json on wide strings dominate CPU; filter before parse when the predicate does not depend on parsed fields. |
| Regex backtracking | ~= and parse_regex patterns with nested quantifiers risk exponential backtracking; prefer bounded classes and anchors. |
| Memory / state | Long timespan, high-cardinality group_by, and session windows retain more in-flight state. |
| Aggregation cost | More functions per window and more keys increase merge work at emit time. |
| Projection | Early fields drops large blobs before eval and aggregations, reducing per-event memory and serialization volume. |
| Type coercion | Repeated implicit coercion in eval adds overhead; coerce once with to_number / to_string. |
Errors
Failures fall into overlapping categories below; the exact error code and message depend on the engine build.
| Category | Description |
|---|---|
| Validation failure | Pipeline or query fails static checks (syntax, unknown function, illegal token order) before execution. |
| Execution-time errors | A stage evaluates at runtime and encounters an illegal value (for example division by zero, missing path where required). |
| Parse-time errors | A parse_* stage receives input that does not match the expected format. |
| Runtime failure model | The task may drop the event, retry per connector policy, or surface the error to observability depending on task type—see task and stream documentation. |
| Message / code (typical) | Cause | Mitigation |
|---|---|---|
| FieldNotFound | Resolved path missing on the event | Correct the path; use coalesce or guards |
| InvalidSyntax | Token order or spelling | Compare with Reference |
| TypeMismatch | String vs number, etc. | Insert to_string / to_number / to_boolean |
| DivisionByZero | Divisor evaluates to zero | Guard with if |
| ParseError | Input not valid for parse_* | Inspect raw field; where before parse |