roboto.experimental.topics.decode.scan_task#

Module Contents#

roboto.experimental.topics.decode.scan_task.ScanTaskDecoder#

Decodes one scan task into a stream of native-order RecordBatches.

Called with the scan task, its partition (for the timestamp designation), the window translated into the partition’s stored-time domain, the projection restricted to the scan task’s subtree. Produced timestamps are in the stored-time domain; the executor applies the partition’s time_offset_ns.

roboto.experimental.topics.decode.scan_task.make_scan_task_decoder(params, schema_fields, projection_paths)#

Bind execution inputs into a ScanTaskDecoder.

The returned decoder decodes one scan task (one file, one format), filtered and projected, dispatching on the scan task’s server-declared format.

Timestamps are keyed off the partition’s timestamp designation and produced in the stored-time domain; the caller applies the partition’s time_offset_ns. Rows are filtered to the decoder’s window (inclusive on both ends) and projected to its projection_paths.

The schema tree needed to decode MCAP is derived from schema_fields and projection_paths, but only when the plan actually contains an MCAP scan task: the first such task builds it once, under a lock, and the decode workers then share it read-only. A pure-Parquet plan never builds one (Parquet decodes column-wise from its own file footer).

Parameters:
Returns:

A decoder that, given a scan task, its plan partition, an inclusive time window, and the projected field paths the scan task covers, returns the decoded scan task; consume its batches once.

Raises:
  • The returned decoder raises

  • NotImplementedError – The scan task’s storage format has no decoder, or a Parquet scan task is designated an envelope-derived timestamp.

Return type:

ScanTaskDecoder