roboto.experimental.topics.decode.scan_task#
Module Contents#
- roboto.experimental.topics.decode.scan_task.ScanTaskDecoder#
Decodes one scan task into a stream of native-order RecordBatches.
Called with the scan task, its partition (for the timestamp designation), the window translated into the partition’s stored-time domain, the projection restricted to the scan task’s subtree. Produced timestamps are in the stored-time domain; the executor applies the partition’s
time_offset_ns.
- roboto.experimental.topics.decode.scan_task.make_scan_task_decoder(params, schema_fields, projection_paths)#
Bind execution inputs into a
ScanTaskDecoder.The returned decoder decodes one scan task (one file, one format), filtered and projected, dispatching on the scan task’s server-declared
format.Timestamps are keyed off the partition’s timestamp designation and produced in the stored-time domain; the caller applies the partition’s
time_offset_ns. Rows are filtered to the decoder’swindow(inclusive on both ends) and projected to itsprojection_paths.The schema tree needed to decode MCAP is derived from
schema_fieldsandprojection_paths, but only when the plan actually contains an MCAP scan task: the first such task builds it once, under a lock, and the decode workers then share it read-only. A pure-Parquet plan never builds one (Parquet decodes column-wise from its own file footer).- Parameters:
params (roboto.experimental.topics.decode.common.ScanTaskDecodeParams) – Execution inputs (URL resolution and cache policy).
schema_fields (collections.abc.Sequence[roboto.domain.topics.record.SchemaFieldRecord]) – Every declared field for the plan’s schema, as returned by the fields GET.
projection_paths (collections.abc.Sequence[roboto.domain.topics.record.FieldPath]) – The plan’s projection as explicit field paths; restricts the schema tree’s columns.
- Returns:
A decoder that, given a scan task, its plan partition, an inclusive time window, and the projected field paths the scan task covers, returns the decoded scan task; consume its batches once.
- Raises:
The returned decoder raises –
NotImplementedError – The scan task’s storage format has no decoder, or a Parquet scan task is designated an envelope-derived timestamp.
- Return type:
ScanTaskDecoder