
Object storage connector

Class: object_storage. Sink-only in the UI: subscribes to a stream and writes objects to S3-compatible storage as Parquet or JSON Lines, with optional compression and batching for large uploads.

Create and edit under Sinks only. Advanced Settings may expose more runtime options depending on deployment and permissions.

Source and sink behavior

| Role | Behavior |
| --- | --- |
| Source | Not supported in the UI for this class (type must be sink). |
| Sink | Consumes the subscribed stream, batches events, and uploads objects using bucket, region, optional custom endpoint, and format settings. |
| Streams | Upstream tasks / pipelines must publish to the stream id this sink consumes (Streams). |

Required fields

Every connector row

| Field | Required | Notes |
| --- | --- | --- |
| name | Yes | Display name; id derived from it. |
| class | Yes | Must be object_storage. |
| stream | Yes | Resolved stream id (the sink reads here). |
| type | Yes | Must be sink (the UI rejects source for this class). |
| config | Yes | Class-specific object; see below. |

Class object_storage — required configuration

| Setting | Required | Notes |
| --- | --- | --- |
| bucket | Yes | Target bucket (the UI errors if empty). |
| region | Conditional | Required for default AWS-style endpoints; optional when a custom endpoint is set. Confirm for your deployment. |
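For illustration, a minimal sink definition covering the required fields might look like the sketch below. The field names come from the tables above; all values are placeholders, and the exact payload shape depends on how your deployment stores connector definitions.

```json
{
  "name": "events-to-object-storage",
  "class": "object_storage",
  "stream": "events",
  "type": "sink",
  "config": {
    "bucket": "my-data-bucket",
    "region": "us-east-1"
  }
}
```

If a custom endpoint is configured, region may be optional, as noted above; confirm for your deployment.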

UI validation

  • emit_verification_manifest requires integrity features to be enabled.
  • max_batch_bytes must not exceed max_object_bytes when both are set (the same relationship applies to the nested upload.batch.max_bytes and upload.object.max_bytes when present).
  • format: parquet must not be combined with the outer compression: gzip setting (enforced in Core and the UI).
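As a sketch of the size-limit rule, the flat keys below keep the batch limit at or below the object cap (64 MiB flushing into objects capped at 128 MiB); builds that expose the nested upload.batch.max_bytes / upload.object.max_bytes names apply the same relationship. Values are illustrative only.

```json
{
  "max_batch_bytes": 67108864,
  "max_object_bytes": 134217728
}
```

Likewise, when object_storage_format is parquet, choose a parquet_compression codec instead of the outer compression: gzip setting.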

Create connector

  1. Open Sinks → Create.
  2. Set Class to Object storage (S3-compatible), set Sink Name, stream behavior, and Enabled.
  3. Enter bucket and region (or custom endpoint per form).
  4. Choose format, compression, batch/object size limits, and credentials.
  5. Save, then ensure tasks publish into the subscribed stream.
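Putting the steps together, a saved sink might resolve to something like the following sketch. Key names follow the Configuration section below; every value, including the compression codec and the role ARN, is a placeholder to adapt to your environment and the options your form exposes.

```json
{
  "name": "orders-to-object-storage",
  "class": "object_storage",
  "stream": "orders",
  "type": "sink",
  "config": {
    "bucket": "acme-orders",
    "region": "eu-west-1",
    "prefix": "orders/",
    "object_storage_format": "parquet",
    "parquet_compression": "snappy",
    "max_batch_events": 10000,
    "max_batch_age_ms": 60000,
    "max_batch_bytes": 67108864,
    "max_object_bytes": 134217728,
    "role_arn": "arn:aws:iam::123456789012:role/object-storage-sink"
  }
}
```

After saving, confirm that upstream tasks publish into the subscribed stream; otherwise the sink has nothing to drain.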

Sink (UI)

Screenshot: the Object storage sink connector form (Create New Sink modal) showing class Object storage (S3-compatible), Auto Create Stream checked, and the Bucket and Region fields.
| UI area | Connector settings (typical) |
| --- | --- |
| Bucket / region | bucket, region |
| Endpoint | object_storage_endpoint, force_path_style |
| Prefix / layout | prefix |
| Format | object_storage_format, parquet_compression, … |
| Batch / size | max_batch_*, max_object_bytes |
| Credentials | role_arn, keys, or instance metadata patterns per build |

Configuration

Bucket and layout

  • bucket, region, prefix, object_storage_endpoint, force_path_style — Namespace and S3-compatible API targets (MinIO, ECS, etc.).
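For an S3-compatible target such as MinIO, a layout fragment might look like the sketch below; the endpoint URL, prefix, and path-style flag are placeholders, and whether region is still required with a custom endpoint depends on your deployment.

```json
{
  "bucket": "archive",
  "prefix": "prod/events/",
  "object_storage_endpoint": "https://minio.internal.example.com:9000",
  "force_path_style": true
}
```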

Format

  • object_storage_format selects json_lines or parquet; Parquet row-group and compression options apply when Parquet is chosen.
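A format fragment choosing JSON Lines with outer gzip compression (a combination the validation rules allow) could look like this sketch; for Parquet output you would instead set object_storage_format to parquet and pick a parquet_compression codec allowed by your build.

```json
{
  "object_storage_format": "json_lines",
  "compression": "gzip"
}
```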

Batching and size limits

  • max_batch_events, max_batch_bytes, max_batch_age_ms, max_object_bytes — Flush cadence and object caps.
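A batching fragment tying flush cadence to the object cap might look like the following; the numbers are placeholders (flush at 5,000 events, 32 MiB, or 30 seconds, with objects capped at 64 MiB), and the exact flush semantics should be confirmed for your build.

```json
{
  "max_batch_events": 5000,
  "max_batch_bytes": 33554432,
  "max_batch_age_ms": 30000,
  "max_object_bytes": 67108864
}
```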

Credentials and IAM

  • role_arn, role_session_name, external_id — Assume-role style access when supported.
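Where assume-role access is supported, a credentials fragment might look like this; the ARN, session name, and external ID are placeholders.

```json
{
  "role_arn": "arn:aws:iam::123456789012:role/object-storage-sink",
  "role_session_name": "object-storage-sink",
  "external_id": "example-external-id"
}
```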

Reliability

  • max_retries, retry_initial_backoff_ms, integrity_enabled, emit_verification_manifest — Retries and optional integrity features when present.
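A reliability fragment might combine retries with the integrity options; note that, per the UI validation rules above, emit_verification_manifest is only accepted when the integrity features are enabled. Values are illustrative.

```json
{
  "max_retries": 5,
  "retry_initial_backoff_ms": 500,
  "integrity_enabled": true,
  "emit_verification_manifest": true
}
```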

Timestamps

  • timestamp — How event time is reflected in object metadata or columns.

Runtime behavior

  • The sink runs after deployment when Enabled; it drains the stream and uploads asynchronously according to batch settings.
  • Disabled sinks do not write objects.

Performance and operational notes

  • Right-size batch thresholds to object size limits and storage rate policies.
  • Prefer integrity manifests when compliance requires end-to-end object verification.