Monitoring
Monitoring is the runtime operations and observability surface for a single PADAS Core: operators watch live data movement across streams, tasks, sources, and sinks, then pivot into stream diagnostics, live event inspection, PDL querying, and system telemetry without leaving one workspace. It replaces the separate Streams-only and Core-only monitoring pages and serves as the unified streaming operations console for pipeline health, throughput analysis, and incident response.
The UI combines live telemetry with event visibility: tables expose EPS, drops, and latency; Query and Monitor open runtime stream tooling; the metrics tab charts resource saturation and backpressure-related signals. Together they read like a Kafka-style stream observability layer paired with SIEM-grade runtime diagnostics—grounded in production troubleshooting, not a passive dashboard.
Runtime scope
| Rule | Operational detail |
|---|---|
| Core-local metrics | Every number and row reflects the selected Core only—runtime telemetry is not normalized across engines. |
| Live engine behavior | Readouts come from the running Core’s APIs; Monitoring shows runtime state, not registry intent. |
| Per-Core variance | Throughput, drops, and EPS can differ sharply between Cores; switch the selector to compare—there is no cluster-wide aggregation. |
| No merged topology intent | Graph placement and definitions remain authoritative under Pipelines (registry intent) and Management → Pipelines (assign/deploy); this page answers what is happening now on disk and wire. |
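Because telemetry is Core-local, comparing Cores means polling each one in turn. A minimal sketch, assuming a per-Core status endpoint in the /api/v1/* surface; the exact path, field names, and addresses below are illustrative assumptions, not a documented contract:

```python
import requests

# Hypothetical Core addresses; substitute your own deployment.
CORES = {
    "core-east": "https://core-east.example.internal:8080",
    "core-west": "https://core-west.example.internal:8080",
}

def poll_core_status(base_url: str) -> dict:
    """Fetch runtime status for one Core (assumed /api/v1/status path)."""
    resp = requests.get(f"{base_url}/api/v1/status", timeout=5)
    resp.raise_for_status()
    return resp.json()

for name, url in CORES.items():
    status = poll_core_status(url)
    # Field names are illustrative; map them to your Core's actual payload.
    print(f"{name}: uptime={status.get('uptime')} "
          f"streams={status.get('streamCount')} eps={status.get('eps')}")
```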
Monitoring workspace
The workspace is a single operational monitoring surface: Streams, Tasks, Sources, and Sinks share one scrollable context so runtime bottlenecks surface faster—ingress pressure, stream congestion, task lag, and sink stalls appear side by side.
- Unified topology visibility — User pipelines and internal streams (such as metrics and internal) sit in the same Streams lens, so platform runtime observability is visible next to business event flow.
- Core ribbon — Status, uptime, aggregate event volume, engine version, and counts (connectors, streams, tasks) give a pipeline health pulse before you drill into rows.
- Pivot without navigation — Toggle from component tables to the system telemetry view to correlate throughput swings with CPU, memory, and internal counters.

Streams visibility
Streams are the backbone of stream diagnostics: each row shows event flow health—rates, drops, subscriber geometry, and uptime—so downstream pressure and routing mistakes stand out.
- Internal streams — Platform channels appear alongside application streams; they matter for metrics fan-out, control traffic, and explaining “missing” volume during deep dives.
- Producers / consumers — Seeing who publishes and subscribes helps debug stalled consumers, orphaned sinks, or tasks not attached to the stream you expected.
- Congestion signals — Rising dropped events, flattening EPS, or widening ingress vs egress gaps often precede stream congestion or backpressure symptoms—pair with Monitor / Query on the same stream.
Tasks, sources, and sinks extend the same story: task rows expose mode, stream wiring, pool usage, and processing latency; connector rows expose class, enablement, events in/out, EPS, and last error for connector saturation and failure triage.
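The congestion signals above lend themselves to simple scripted triage. A minimal heuristic sketch, assuming stream rows have been pulled into dicts; the counter names (events_in, events_out, dropped) and thresholds are illustrative assumptions to tune against your own baselines:

```python
def flag_congestion(stream: dict,
                    gap_ratio: float = 0.5,
                    drop_threshold: int = 0) -> list[str]:
    """Return human-readable congestion warnings for one stream row.

    Counter names (events_in, events_out, dropped) are illustrative.
    """
    warnings = []
    events_in = stream.get("events_in", 0)
    events_out = stream.get("events_out", 0)
    dropped = stream.get("dropped", 0)

    # Widening ingress vs egress gap: egress lagging far behind ingress.
    if events_in > 0 and events_out / events_in < gap_ratio:
        warnings.append(
            f"egress/ingress ratio {events_out / events_in:.2f} below {gap_ratio}")
    # Any drops on a stream expected to be lossless deserve a look.
    if dropped > drop_threshold:
        warnings.append(f"{dropped} dropped events")
    return warnings

# Example row shaped like a Streams table entry (values invented).
row = {"name": "auth-events", "events_in": 120_000,
       "events_out": 41_000, "dropped": 350}
for warning in flag_congestion(row):
    print(f"{row['name']}: {warning}")
```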
Runtime metrics
Use row metrics as live telemetry for throughput analysis and runtime troubleshooting:
| Signal | Production interpretation |
|---|---|
| EPS (avg / peak) | EPS collapse after a deploy often flags a failed pipeline slice, blocked stream, or silent connector; sustained low EPS versus peers hints at stalled consumers or throttled ingress. |
| Events in / out / count | Throughput imbalance (ingress high, egress flat) suggests downstream congestion or an overloaded task; the inverse may indicate idle sinks or misrouted keys. |
| Dropped events | Abnormal drops plus flat EPS can mean buffer pressure or publish-side rejection; spikes during incidents deserve correlation with system telemetry and connector errors. |
| Avg processing (ms) (tasks) | Elevated latency with stable ingest points to overloaded tasks, hot partitions, or expensive PDL on the hot path. |
| Last error (connectors) | Shortcut to connector saturation, auth, or destination faults before log diving. |
Treat these as runtime metrics that complement—not replace—live event inspection and stream querying.
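One way to operationalize the EPS row is baseline-deviation alerting over sampled readings. A sketch in plain Python; the window size and collapse ratio are tuning assumptions, not product defaults:

```python
from collections import deque

class EpsBaseline:
    """Rolling EPS baseline that flags collapse after a deploy or blockage."""

    def __init__(self, window: int = 60, collapse_ratio: float = 0.3):
        self.samples: deque[float] = deque(maxlen=window)
        self.collapse_ratio = collapse_ratio

    def observe(self, eps: float) -> bool:
        """Record a sample; return True when EPS collapses versus baseline."""
        collapsed = False
        if len(self.samples) == self.samples.maxlen:
            baseline = sum(self.samples) / len(self.samples)
            collapsed = baseline > 0 and eps < baseline * self.collapse_ratio
        self.samples.append(eps)
        return collapsed

# Synthetic feed: steady traffic, then a post-deploy collapse.
detector = EpsBaseline(window=60, collapse_ratio=0.3)
for eps in [1000.0] * 60 + [120.0] * 5:
    if detector.observe(eps):
        print(f"EPS collapse: {eps:.0f} is well below the rolling baseline")
```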
Querying stream data
Query runs PDL directly against runtime stream data: the editor, time window, refresh cadence, and row limits target the selected stream’s live tail and, where retention allows, historical segments. That makes Query the fastest path for ad-hoc diagnostics during incidents—validate parsers, filters, and eval logic on real traffic without replay tooling.
| Control | When operators use it |
|---|---|
| Run Query | One-shot replay-style investigation over the configured window—ideal for proving a hypothesis on retained or sliding history. |
| Run & Capture | Sampling runs that retain result sets for tickets, comparisons, or follow-up; the operational distinction is structured capture of query outcomes versus ephemeral scrolling. |
WAL-backed streams with retention may expose enough history for replay investigation of recent failures; in-memory or short-retention streams emphasize near-live behavior—see Streams for durability expectations.
Results surface as event id, timestamp, and JSON payloads so teams can confirm fields before touching registry definitions.
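A scripted equivalent of Run Query is handy for attaching evidence to tickets. The endpoint shape, parameter names, and placeholder PDL below are assumptions for illustration, not the documented API:

```python
import json
import requests

CORE = "https://core-east.example.internal:8080"  # hypothetical Core address

# Illustrative request body mirroring the Query controls.
payload = {
    "stream": "auth-events",
    "query": "<your PDL here>",  # paste the PDL validated in the editor
    "window": "15m",
    "limit": 100,
}

resp = requests.post(f"{CORE}/api/v1/query", json=payload, timeout=30)
resp.raise_for_status()

# Results mirror the UI columns: event id, timestamp, JSON payload.
for event in resp.json().get("events", []):
    print(event.get("id"), event.get("timestamp"))
    print(json.dumps(event.get("payload"), indent=2))
```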

Syntax references: PDL Quick Reference, PDL Reference.
Live monitoring
Monitor delivers near real-time event visibility: a LIVE feed with refresh intervals, limits, stop/start controls, and inline filtering so operators watch bytes-on-wire behavior during rollout validation or parser troubleshooting.
Capture from Monitor (or related controls) seeds Testing with replay datasets—freeze production-shaped events, then iterate PDL in the Testing workspace without mutating live pipeline execution. That connects live stream observability to runtime-safe experimentation; Testing remains its own validation surface, not a nested mode inside Monitoring.
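In script form, a live tail reduces to polling with a refresh interval, row limit, and inline filter. The endpoint and parameters below mirror the Monitor controls but are assumptions, not the documented API:

```python
import time
import requests

CORE = "https://core-east.example.internal:8080"  # hypothetical Core address

def tail(stream: str, contains: str = "", interval: float = 2.0, limit: int = 50):
    """Poll a stream's live tail, printing events that match an inline filter."""
    while True:
        # Assumed endpoint and params echoing Monitor's interval and limit.
        resp = requests.get(
            f"{CORE}/api/v1/streams/{stream}/tail",
            params={"limit": limit},
            timeout=10,
        )
        resp.raise_for_status()
        for event in resp.json().get("events", []):
            line = str(event)
            if contains in line:  # crude inline filter on the raw event
                print(line)
        time.sleep(interval)  # refresh cadence; Ctrl-C is the stop control

tail("auth-events", contains="login_failure")
```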

Registry stream inspection
Registry opens runtime registry inspection for the stream row: canonical stream metadata—ids, status, activity timestamps, nested WAL configuration, and related fields—in tree or raw JSON, with search, copy, and download.
Use it when runtime behavior diverges from expectations: verify persistence flags, confirm WAL is enabled/disabled as designed, validate activity timestamps during deployment troubleshooting, or export evidence for change review. It complements Streams by showing what the engine currently materialized, not only what was authored.
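The same inspection can be scripted for change review or automated drift checks. The registry endpoint and field paths below are assumptions based on the tree view described above:

```python
import json
import requests

CORE = "https://core-east.example.internal:8080"  # hypothetical Core address
STREAM_ID = "auth-events"

# Assumed endpoint returning the raw registry JSON shown in the UI.
resp = requests.get(f"{CORE}/api/v1/registry/streams/{STREAM_ID}", timeout=10)
resp.raise_for_status()
registry = resp.json()

# Verify the materialized WAL configuration matches what was authored.
wal = registry.get("wal", {})  # nested WAL block; key name is illustrative
print(f"{STREAM_ID}: wal.enabled={wal.get('enabled')} "
      f"status={registry.get('status')}")

# Export evidence for change review.
with open(f"{STREAM_ID}-registry.json", "w") as fh:
    json.dump(registry, fh, indent=2)
```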

System metrics
The metrics tab is the system telemetry dashboard for the same Core: preset windows (15m–7d), selectable series (CPU, memory, connector/stream/task counters, backpressure totals, processed/dropped aggregates), and synchronized charts for runtime stability analysis.
Operators correlate throughput dips with resource saturation, validate capacity during post-deploy regression windows, and tie stream congestion symptoms to host pressure or Runtime Engine resource limits. This is enterprise-grade observability layered on top of row-level pipeline diagnostics, not "CPU-only monitoring."
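To tie a throughput dip to host pressure numerically, correlate the processed-events series against CPU over the same window. A sketch over already-fetched, time-aligned samples (the values are invented for illustration):

```python
from statistics import correlation  # Python 3.10+

# Time-aligned samples over the same preset window (values invented).
processed_eps = [980, 1010, 990, 640, 410, 430, 950, 1000]
cpu_percent = [41, 43, 44, 88, 95, 93, 47, 42]

r = correlation(processed_eps, cpu_percent)
print(f"EPS vs CPU correlation: {r:.2f}")
# Strongly negative r suggests dips coincide with CPU saturation; near-zero
# r points elsewhere (connector faults, stream congestion, task lag).
```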

Troubleshooting workflows
Playbooks are maintained centrally in Troubleshooting & Logs → Monitoring and pipeline playbooks.
Monitoring is runtime-focused: it exposes live telemetry, event visibility, and stream diagnostics—it does not author registry objects or deploy topology. Make registry edits under Pipelines and assign/deploy under Management → Pipelines; use Control Tower for graph-level execution controls. Monitoring reflects runtime state, not intended topology alone.
Related pages
- Control Tower — live pipeline graph and graph-level runtime controls
- Testing — replay datasets, PDL validation, capture hand-off from Monitor
- Advanced — Tasks, sources & sinks — runtime start / stop on the selected Core
- Streams — WAL, retention, buffering definitions
- Tasks — task modes and stream wiring
- Pipelines — pipeline topology authoring
- REST API Reference — /api/v1/*, runtime status
- PDL Quick Reference — Query editor syntax
- Core concepts — Observability — metrics and internal streams background
- Glossary — EPS, throughput, runtime diagnostics, internal stream terms