roboto.experimental.topics.decode.common
Module Contents
- class roboto.experimental.topics.decode.common.DecodedScanTask(*, batches_factory)
One scan task decoded into RecordBatches.
Decoding (including the network fetch) starts when the iterator returned by
batches() is first advanced; consume it once.
- Parameters:
batches_factory (Callable[[], collections.abc.Iterator[pyarrow.RecordBatch]])
- batches()
Decode into RecordBatches, each prefixed with a metadata-marked stored-time column.
Rows come out in the scan task’s persisted order. For MCAP, that’s the reader’s native chunk order, which is itself the persisted order. For Parquet, that’s the file’s row order. Every representation of a partition must share one persisted row order.
Batch boundaries themselves carry no meaning.
- Return type:
collections.abc.Iterator[pyarrow.RecordBatch]
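The lazy-start behaviour described above can be illustrated with a minimal sketch. `LazyScanTask` and `fake_decode` are hypothetical stand-ins, not the library's classes, and plain lists of integers stand in for `pyarrow.RecordBatch`. The sketch assumes the factory is a generator function, so no work happens until the iterator is first advanced.

```python
from collections.abc import Callable, Iterator


class LazyScanTask:
    """Hypothetical stand-in for DecodedScanTask (not the library's class)."""

    def __init__(self, *, batches_factory: Callable[[], Iterator[list[int]]]) -> None:
        self._batches_factory = batches_factory

    def batches(self) -> Iterator[list[int]]:
        # Calling a generator function runs none of its body yet;
        # the work begins on the first next().
        return self._batches_factory()


events: list[str] = []


def fake_decode() -> Iterator[list[int]]:
    events.append("fetch")  # stands in for the network fetch
    yield [1, 2]
    yield [3]


task = LazyScanTask(batches_factory=fake_decode)
it = task.batches()
events_before = list(events)  # snapshot before the iterator is advanced
first = next(it)              # the first advance triggers the "fetch"
events_after = list(events)   # snapshot after the first advance
rest = list(it)               # drain the remaining batches
leftover = list(it)           # an exhausted iterator yields nothing more
```

Because the iterator is exhausted after one pass, a caller that needs the batches twice would have to materialise them (for example with `list(...)`) rather than iterate again.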
- class roboto.experimental.topics.decode.common.ScanTaskDecodeParams
Execution inputs a scan-task decode needs beyond the read plan.
The plan says what to read; these supply how to reach and cache it: a resolver that mints download URLs, plus the local-disk cache policy and directory. Caching applies to Parquet scan tasks only — MCAP always streams.
- cache_dir: pathlib.Path
Directory Parquet files are cached under.
- cache_policy: roboto.storage.CachePolicy
Whether fetched Parquet files are cached to local disk.
- signed_url_resolver: SignedUrlResolver
Mints a signed download URL for a scan task’s backing file.
- roboto.experimental.topics.decode.common.SignedUrlResolver
Resolves a file id (fs_node_id) to a signed download URL.
- roboto.experimental.topics.decode.common.disambiguated_timestamp_name(taken)
The emitted timestamp column name, suffixed with _ until it collides with no output column.
taken is the set of output-column names the timestamp must not shadow: the MCAP path passes the projected schema fields’ names; the Parquet path passes the file’s own Arrow column names.
- Parameters:
taken (collections.abc.Iterable[str])
- Return type:
str
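The suffixing rule can be sketched as follows. The function name `disambiguate` and its `base` parameter are hypothetical: the real function takes only `taken`, and the base column name it uses is not stated here, so `"timestamp"` below is purely an assumed placeholder.

```python
from collections.abc import Iterable


def disambiguate(base: str, taken: Iterable[str]) -> str:
    """Append "_" to base until the result is not in taken (illustrative sketch)."""
    taken_set = set(taken)  # materialise once; an Iterable may be single-pass
    name = base
    while name in taken_set:
        name += "_"
    return name


# "timestamp" is an assumed base name, used only for illustration.
no_clash = disambiguate("timestamp", ["x", "y"])
double_clash = disambiguate("timestamp", ["timestamp", "timestamp_", "x"])
```

The loop always terminates because `taken` is finite and each pass produces a strictly longer name.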
- roboto.experimental.topics.decode.common.leaf_most(paths)
Drop every path that has a strict descendant in the set.
A projection enumerates registered fields at every level (a struct parent and its children both appear). Decoding the parent would read its whole subtree, defeating an exclusion of one child — so decoders read only the leaf-most projected paths, which together cover exactly the projected set.
- Parameters:
paths (collections.abc.Sequence[roboto.domain.topics.record.FieldPath])
- Return type:
list[roboto.domain.topics.record.FieldPath]
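A minimal sketch of the leaf-most filter, assuming a field path can be modelled as a tuple of name segments. The `Path` alias and `leaf_most_sketch` are hypothetical; the real `FieldPath` type and the library's handling of output order may differ.

```python
from collections.abc import Sequence

# Assumption: a field path is a tuple of segments, e.g. ("pose", "x").
Path = tuple[str, ...]


def is_strict_descendant(child: Path, parent: Path) -> bool:
    """True when child extends parent by at least one segment."""
    return len(child) > len(parent) and child[: len(parent)] == parent


def leaf_most_sketch(paths: Sequence[Path]) -> list[Path]:
    """Keep only paths that have no strict descendant in the input."""
    return [
        p for p in paths
        if not any(is_strict_descendant(q, p) for q in paths)
    ]


projected = [("pose",), ("pose", "x"), ("pose", "y"), ("id",)]
leaves = leaf_most_sketch(projected)
```

Here the struct parent `("pose",)` is dropped because its children are present, so a projection that excluded, say, `("pose", "z")` would not have that child read back in through the parent. The pairwise comparison is quadratic in the number of paths, which is acceptable for a sketch over small projections.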