roboto.experimental.topics.decode.mcap#
Module Contents#
- roboto.experimental.topics.decode.mcap.BATCH_ROW_COUNT = 1024#
Row count per RecordBatch built from a decoded message stream.
- class roboto.experimental.topics.decode.mcap.McapDecodePlan#
The fixed decode shape for one MCAP scan task, settled from the schema and projection.
Everything here is independent of the message stream: the output columns and their Arrow types, the leaf-most paths to read, how the timestamp is sourced, and the disambiguated timestamp column name. No build strategy and no per-leaf classification — structure comes from the schema, so there is nothing to infer from a sample.
- columns: tuple[roboto.experimental.topics.decode.schema_tree.SchemaNode, Ellipsis]#
The output columns (top-level projected fields), in projection order.
- decode_paths: list[roboto.domain.topics.record.FieldPath]#
The leaf-most paths handed to the reader (column leaves plus the timestamp field).
- timestamp_name: str#
The emitted timestamp column name, disambiguated from any output-column collision.
- timestamp_path: roboto.domain.topics.record.FieldPath#
The schema-field timestamp’s path, or
()for an envelope timestamp.
- roboto.experimental.topics.decode.mcap.decode_mcap_batches(scan_task, timestamp, window, projection_paths, schema_tree, params)#
Build nested RecordBatches directly from decoded MCAP messages, typed from the schema.
The column set, order, and Arrow types are fixed from the schema tree restricted to the projection: one column per projected top-level field, every batch. Each column’s value is extracted per message through schema-derived accessors and transposed into a typed Arrow column. An unprojected timestamp field simply gets no column (the restricted tree excludes it); it is still decoded to key the rows.
Batches come out in the reader’s native chunk order (its cross-chunk log-time sort is skipped), which is the file’s persisted order — the order the partition overlay and cross-partition concatenation rely on (see
DecodedScanTask.batches()).- Raises:
RobotoInternalException – No schema tree was supplied; production always supplies one resolved from the topic’s schema.
- Parameters:
scan_task (roboto.experimental.topics.read_plan.ReadPlanScanTask)
timestamp (roboto.experimental.topics.read_plan.ReadPlanTimestamp)
projection_paths (collections.abc.Sequence[roboto.domain.topics.record.FieldPath])
schema_tree (Optional[roboto.experimental.topics.decode.schema_tree.SchemaTree])
params (roboto.experimental.topics.decode.common.ScanTaskDecodeParams)
- Return type:
collections.abc.Generator[pyarrow.RecordBatch, None, None]
- roboto.experimental.topics.decode.mcap.extract_row_timestamp(row, timestamp)#
Read the designated timestamp field out of a decoded row, normalized to nanoseconds.
- Parameters:
row (dict[str, Any])
timestamp (roboto.experimental.topics.read_plan.ReadPlanTimestamp)
- Return type:
Optional[int]
- roboto.experimental.topics.decode.mcap.logger#
- roboto.experimental.topics.decode.mcap.mcap_decode_paths(tree, timestamp)#
The leaf-most paths the reader must decode: every output column’s leaves, plus the timestamp field.
A schema-field timestamp outside the projection is decoded anyway (rows are keyed by it); the restricted tree has already excluded it from the output columns, so it gets no column even though it is read.
- Parameters:
- Return type:
list[roboto.domain.topics.record.FieldPath]
- roboto.experimental.topics.decode.mcap.resolve_mcap_decode_plan(tree, timestamp)#
Settle the fixed decode shape from a projection-restricted schema tree.
Computed once, before the fetch, and reused for every message and batch.
- Parameters:
- Return type:
- roboto.experimental.topics.decode.mcap.stored_timestamp_for_message(timestamp, timestamp_accessors, message, log_time, publish_time)#
The message’s stored-time timestamp, or
Nonewhen it carries no designated value.Envelope timestamps (log time, publish time) are taken straight off the reader and coerced to
int(the reader types themint | floatonly to carry itsmath.infend-of-stream sentinel, which a real message never bears); a schema-field timestamp is read through its accessors and normalized to nanoseconds. Only a schema-field message that lacks the field returnsNone— the caller skips such messages.- Parameters:
timestamp (roboto.experimental.topics.read_plan.ReadPlanTimestamp)
timestamp_accessors (Optional[_LazyAccessors])
message (Any)
log_time (Union[int, float])
publish_time (Union[int, float])
- Return type:
Optional[int]