roboto.experimental.topics.decode.mcap

Module Contents

roboto.experimental.topics.decode.mcap.BATCH_ROW_COUNT = 1024

Row count per RecordBatch built from a decoded message stream.
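As a rough illustration of what a fixed per-batch row count implies, the sketch below groups a row stream into chunks of at most 1024 rows. The `chunk_rows` helper is hypothetical and not part of this module. The sketch also assumes the final chunk may be shorter than the constant, which this reference does not state.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

# Mirrors the documented constant; redefined so the sketch is self-contained.
BATCH_ROW_COUNT = 1024


def chunk_rows(rows: Iterable[T], size: int = BATCH_ROW_COUNT) -> Iterator[list[T]]:
    """Yield lists of at most `size` rows (hypothetical helper)."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return  # stream exhausted
        yield chunk


sizes = [len(chunk) for chunk in chunk_rows(range(2500))]
print(sizes)  # [1024, 1024, 452]
```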

class roboto.experimental.topics.decode.mcap.McapDecodePlan

The fixed decode shape for one MCAP scan task, settled from the schema and projection.

Everything here is independent of the message stream: the output columns and their Arrow types, the leaf-most paths to read, how the timestamp is sourced, and the disambiguated timestamp column name. The plan carries no build strategy and no per-leaf classification: structure comes from the schema, so there is nothing to infer from a sample.

columns: tuple[roboto.experimental.topics.decode.schema_tree.SchemaNode, Ellipsis]

The output columns (top-level projected fields), in projection order.

decode_paths: list[roboto.domain.topics.record.FieldPath]

The leaf-most paths handed to the reader (column leaves plus the timestamp field).

timestamp_name: str

The emitted timestamp column name, disambiguated from any output-column collision.

timestamp_path: roboto.domain.topics.record.FieldPath

The schema-field timestamp’s path, or () for an envelope timestamp.
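The four attributes above can be pictured with a small stand-in. This is a sketch only: `PlanSketch` is hypothetical, column nodes are reduced to plain names, and `FieldPath` is assumed to behave like a tuple of field names (the reference only confirms that `()` marks an envelope timestamp).

```python
from dataclasses import dataclass

# Assumed shape of roboto.domain.topics.record.FieldPath: a tuple of field names.
FieldPath = tuple[str, ...]


@dataclass(frozen=True)
class PlanSketch:
    """Hypothetical stand-in for McapDecodePlan, not the real class."""

    columns: tuple[str, ...]       # output columns, in projection order
    decode_paths: list[FieldPath]  # leaf-most paths handed to the reader
    timestamp_name: str            # emitted timestamp column name
    timestamp_path: FieldPath      # () for an envelope timestamp

    @property
    def uses_envelope_timestamp(self) -> bool:
        # Documented convention: an empty path means an envelope timestamp.
        return self.timestamp_path == ()


envelope_plan = PlanSketch(
    columns=("pose",),
    decode_paths=[("pose", "x"), ("pose", "y")],
    timestamp_name="log_time",
    timestamp_path=(),
)
field_plan = PlanSketch(
    columns=("pose",),
    decode_paths=[("pose", "x"), ("pose", "y"), ("header", "stamp")],
    timestamp_name="timestamp",
    timestamp_path=("header", "stamp"),
)
```

In the second plan the timestamp path appears in `decode_paths` but not in `columns`, matching the case of a schema-field timestamp that is read without being projected.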

roboto.experimental.topics.decode.mcap.decode_mcap_batches(scan_task, timestamp, window, projection_paths, schema_tree, params)

Build nested RecordBatches directly from decoded MCAP messages, typed from the schema.

The column set, order, and Arrow types are fixed by the schema tree restricted to the projection: every batch has one column per projected top-level field. Each column’s value is extracted per message through schema-derived accessors and transposed into a typed Arrow column. An unprojected timestamp field simply gets no column (the restricted tree excludes it); it is still decoded to key the rows.

Batches are emitted in the reader’s native chunk order (its cross-chunk log-time sort is skipped), which is the file’s persisted order. The partition overlay and cross-partition concatenation rely on that order (see DecodedScanTask.batches()).

Raises:

RobotoInternalException – No schema tree was supplied; production always supplies one resolved from the topic’s schema.

Parameters:
Return type:

collections.abc.Generator[pyarrow.RecordBatch, None, None]
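The extract-then-transpose step described for decode_mcap_batches can be sketched in plain Python. Everything here is illustrative: the messages, the accessor table, and the `transpose` helper are hypothetical, and plain lists stand in for the typed Arrow columns the real function builds.

```python
from operator import itemgetter
from typing import Any, Callable, Mapping, Sequence

# Hypothetical decoded messages; a real reader yields richer objects.
messages = [
    {"pose": {"x": 1.0, "y": 2.0}, "status": "ok", "debug": "a"},
    {"pose": {"x": 3.0, "y": 4.0}, "status": "warn", "debug": "b"},
]

# One accessor per projected top-level field, in projection order.
# "debug" is not projected, so it has no accessor and gets no column.
accessors: dict[str, Callable[[Mapping[str, Any]], Any]] = {
    "pose": itemgetter("pose"),
    "status": itemgetter("status"),
}


def transpose(
    rows: Sequence[Mapping[str, Any]],
    column_accessors: Mapping[str, Callable[[Mapping[str, Any]], Any]],
) -> dict[str, list[Any]]:
    """Turn row-oriented messages into one value list per output column."""
    return {
        name: [access(row) for row in rows]
        for name, access in column_accessors.items()
    }


columns = transpose(messages, accessors)
print(list(columns))      # ['pose', 'status']
print(columns["status"])  # ['ok', 'warn']
```

Because the accessor table is built once from the projection, every batch produced this way has the same columns in the same order, regardless of what any individual message contains.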

roboto.experimental.topics.decode.mcap.extract_row_timestamp(row, timestamp)

Read the designated timestamp field out of a decoded row, normalized to nanoseconds.

Parameters:
Return type:

Optional[int]
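A minimal sketch of reading a nested field and scaling it to nanoseconds follows. The unit table, the `read_timestamp_ns` helper, and the dict-shaped row are all assumptions for illustration; this reference does not show how the real timestamp descriptor encodes its unit or path.

```python
from typing import Any, Mapping, Optional, Sequence

# Hypothetical unit table; the real descriptor's unit encoding is not shown here.
NANOS_PER_UNIT = {"s": 1_000_000_000, "ms": 1_000_000, "us": 1_000, "ns": 1}


def read_timestamp_ns(
    row: Mapping[str, Any], path: Sequence[str], unit: str
) -> Optional[int]:
    """Walk `path` through a nested row and scale the value to nanoseconds."""
    value: Any = row
    for key in path:
        if not isinstance(value, Mapping) or key not in value:
            return None  # the row does not carry the designated field
        value = value[key]
    if value is None:
        return None
    # Integer inputs stay exact; float inputs may lose sub-unit precision.
    return int(value * NANOS_PER_UNIT[unit])


row = {"header": {"stamp": 250}, "pose": {"x": 1.0}}
print(read_timestamp_ns(row, ("header", "stamp"), "ms"))  # 250000000
print(read_timestamp_ns(row, ("header", "missing"), "ms"))  # None
```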

roboto.experimental.topics.decode.mcap.logger

roboto.experimental.topics.decode.mcap.mcap_decode_paths(tree, timestamp)

The leaf-most paths the reader must decode: every output column’s leaves, plus the timestamp field.

A schema-field timestamp outside the projection is decoded anyway (rows are keyed by it); the restricted tree has already excluded it from the output columns, so it gets no column even though it is read.

Parameters:
Return type:

list[roboto.domain.topics.record.FieldPath]
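The path collection described for mcap_decode_paths might look roughly like the sketch below. `Node`, `leaf_paths`, and `decode_paths_sketch` are hypothetical stand-ins for the schema tree types, `FieldPath` is assumed to be a tuple of field names, and skipping an already-listed timestamp path is an assumption of this sketch rather than documented behavior.

```python
from dataclasses import dataclass, field

FieldPath = tuple[str, ...]  # assumed shape of FieldPath


@dataclass
class Node:
    """Hypothetical stand-in for a schema tree node."""

    name: str
    children: list["Node"] = field(default_factory=list)


def leaf_paths(node: Node, prefix: FieldPath = ()) -> list[FieldPath]:
    """Return the full path of every leaf beneath `node`."""
    path = prefix + (node.name,)
    if not node.children:
        return [path]
    return [leaf for child in node.children for leaf in leaf_paths(child, path)]


def decode_paths_sketch(columns: list[Node], timestamp_path: FieldPath) -> list[FieldPath]:
    paths = [leaf for column in columns for leaf in leaf_paths(column)]
    # A schema-field timestamp is read even when it is not projected;
    # an empty path marks an envelope timestamp, which needs no field read.
    if timestamp_path and timestamp_path not in paths:
        paths.append(timestamp_path)
    return paths


projected = [Node("pose", [Node("x"), Node("y")]), Node("status")]
print(decode_paths_sketch(projected, ("header", "stamp")))
# [('pose', 'x'), ('pose', 'y'), ('status',), ('header', 'stamp')]
```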

roboto.experimental.topics.decode.mcap.resolve_mcap_decode_plan(tree, timestamp)

Settle the fixed decode shape from a projection-restricted schema tree.

Computed once, before the fetch, and reused for every message and batch.

Parameters:
Return type:

McapDecodePlan

roboto.experimental.topics.decode.mcap.stored_timestamp_for_message(timestamp, timestamp_accessors, message, log_time, publish_time)

The message’s stored-time timestamp, or None when it carries no designated value.

Envelope timestamps (log time, publish time) are taken straight off the reader and coerced to int. The reader types them int | float only to carry its math.inf end-of-stream sentinel, which a real message never bears. A schema-field timestamp is read through its accessors and normalized to nanoseconds. None is returned only for a schema-field message that lacks the field; the caller skips such messages.

Parameters:
Return type:

Optional[int]
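The split between envelope and schema-field timestamps can be sketched as a small dispatch. The `source` string, the `nanos_per_unit` parameter, and the callable accessors are hypothetical simplifications; the real function takes a timestamp descriptor whose shape is not shown in this reference.

```python
from typing import Any, Callable, Mapping, Optional, Sequence, Union


def stored_timestamp_sketch(
    source: str,  # hypothetical: "log_time", "publish_time", or "field"
    accessors: Sequence[Callable[[Any], Any]],
    message: Mapping[str, Any],
    log_time: Union[int, float],
    publish_time: Union[int, float],
    nanos_per_unit: int = 1,
) -> Optional[int]:
    # Envelope timestamps come straight from the reader; int() drops the
    # float typing that exists only for the end-of-stream sentinel.
    if source == "log_time":
        return int(log_time)
    if source == "publish_time":
        return int(publish_time)
    # Schema-field timestamp: apply each accessor in turn, stopping at a gap.
    value: Any = message
    for access in accessors:
        value = access(value)
        if value is None:
            return None  # message lacks the field; the caller skips it
    return int(value * nanos_per_unit)


stamp_accessors = [lambda m: m.get("header"), lambda h: h.get("stamp")]
message = {"header": {"stamp": 12}}

print(stored_timestamp_sketch("log_time", stamp_accessors, message, 5.0, 7))  # 5
print(stored_timestamp_sketch("field", stamp_accessors, message, 5.0, 7, 1_000_000))  # 12000000
print(stored_timestamp_sketch("field", stamp_accessors, {"pose": {}}, 5.0, 7))  # None
```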