roboto.experimental.topics.decode.schema_tree

Typed schema tree for schema-driven MCAP decode.

A topic’s per-field schema (SchemaFieldRecord) declares every field’s path, canonical type, and native type before any byte is read. This module wraps those records into a navigable tree in which every column’s Arrow type and every leaf’s structural resolution are fixed statically, so the MCAP decoder never samples a message to discover its shape.

Built once per read from the topic’s field declarations and consumed by the MCAP decode path. (Parquet reads take their Arrow types from the file’s own schema and do not use this tree.)

Module Contents

class roboto.experimental.topics.decode.schema_tree.SchemaNode

One declared schema field, plus its declared child fields.

Children are held in fields-GET response order, parents before children. The node’s name is the leaf component of its path — the schema-native attribute name the decoder’s accumulator is keyed by — which is also its Arrow field name.

arrow_type(dialect=McapDialect.OTHER)

The Arrow type for this node’s column, derived entirely from the schema.

Canonical type decides the structure (struct vs. list vs. scalar); the native token decides scalar width and, for an array, list-nesting depth – a token declaring multiple dimensions (e.g. float32[3][3]) types as a correspondingly nested list (list<list<float32>>). This is the structural inverse of the Arrow-to-canonical mapping in roboto.formats.parquet.arrow_to_roboto, so an MCAP-decoded column and the Parquet-ingested one for the same field agree.

A multi-dimensional array whose schema records only a single, flattened suffix (e.g. a float[3][3] persisted as float32[9]) still types as a 1-D list; faithful nesting requires the native token to carry the full suffix chain.

dialect resolves the two framework-dependent details a canonical type cannot carry: the signedness of byte/char scalars and the subfield names of ROS time structs. It defaults to OTHER, which keeps the canonical, framework-neutral typing and naming for non-ROS data and dialect-agnostic callers.

Parameters:

dialect (roboto.formats.mcap.McapDialect)

Return type:

pyarrow.DataType
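The array/scalar half of this rule can be sketched as follows. This is a standalone illustration, not roboto's implementation. It assumes a simplified token grammar (a base scalar followed by zero or more [N]/[] suffixes) and a hypothetical scalar table. Instead of building pyarrow objects, it renders types as strings in the list<...> notation used above, and it ignores struct handling and dialect. The names sketch_arrow_type and _SCALARS are illustrative only.

```python
import re

# Hypothetical scalar table (native base token -> Arrow scalar name).
# Illustrative only; roboto's real width/signedness table is not shown here.
_SCALARS = {
    "float32": "float32", "float": "float32",
    "float64": "float64", "double": "float64",
    "int32": "int32", "uint8": "uint8",
    "bool": "bool", "string": "string",
}

# One trailing fixed-size ([N]) or unbounded ([]) array suffix.
_SUFFIX = re.compile(r"\[\d*\]$")


def sketch_arrow_type(native: str) -> str:
    """Render the Arrow type of a scalar or array-of-scalar native token."""
    base, depth = native, 0
    # Peel suffixes off the end; each one adds a level of list nesting.
    while _SUFFIX.search(base):
        base = _SUFFIX.sub("", base, count=1)
        depth += 1
    arrow = _SCALARS[base]  # the base scalar decides the element width
    for _ in range(depth):
        arrow = f"list<{arrow}>"
    return arrow


print(sketch_arrow_type("float32[3][3]"))  # list<list<float32>>
print(sketch_arrow_type("float32[9]"))     # list<float32> (a flattened suffix stays 1-D)
```

The second call shows the limitation noted above: a float[3][3] persisted as float32[9] carries only one suffix, so nothing in the token can recover the second dimension.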

property canonical: roboto.domain.topics.record.CanonicalDataType

The field’s normalized, encoding-agnostic data type.

Return type:

roboto.domain.topics.record.CanonicalDataType

children: tuple[SchemaNode, ...]

property is_array: bool

Whether this node is a sequence (its decoded value is a list).

Return type:

bool

property is_struct: bool

Whether this node is a struct with declared children (its decoded value is a mapping).

Return type:

bool

leaves()

The scalar / whole-list leaves under this node, for accessor compilation.

A struct or array-of-struct descends into its children; a scalar or array-of-scalar is itself a leaf (for an array-of-scalar, the accessor reads the whole list value at once).

Return type:

list[SchemaNode]
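The leaf rule, together with name, is_array, and is_struct, can be sketched on a toy node type. The Node class and its string canonical labels ("struct", "array", or a scalar name) are hypothetical stand-ins for SchemaNode and CanonicalDataType. The real class derives these values from its SchemaFieldRecord.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for SchemaNode."""

    path: tuple[str, ...]
    canonical: str  # "struct", "array", or a scalar label (assumed encoding)
    children: tuple[Node, ...] = ()

    @property
    def name(self) -> str:
        return self.path[-1]  # leaf component of the path

    @property
    def is_array(self) -> bool:
        return self.canonical == "array"

    @property
    def is_struct(self) -> bool:
        return self.canonical == "struct" and bool(self.children)

    def leaves(self) -> list[Node]:
        # A struct or array-of-struct descends into its declared children;
        # a scalar or array-of-scalar (no children) is itself a leaf.
        if not self.children:
            return [self]
        out: list[Node] = []
        for child in self.children:
            out.extend(child.leaves())
        return out


header = Node(("header",), "struct", (
    Node(("header", "seq"), "uint32"),
    Node(("header", "stamp"), "struct", (
        Node(("header", "stamp", "sec"), "int32"),
        Node(("header", "stamp", "nanosec"), "uint32"),
    )),
))
poses = Node(("poses",), "array", (Node(("poses", "x"), "float64"),))  # array of struct
ranges = Node(("ranges",), "array")  # array of scalar: one whole-list leaf

print([leaf.name for leaf in header.leaves()])  # ['seq', 'sec', 'nanosec']
```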

property name: str

The schema-native attribute name: the leaf component of the field’s path.

Doubles as the decode accumulator’s key and the Arrow struct/column field name, so accumulated values land under the field that declares them by construction.

Return type:

str

property native: str

The field’s native, framework-specific type token (e.g. "float32", "esc_report[8]").

Return type:

str

property path: roboto.domain.topics.record.FieldPath

This field’s path components, outermost to leaf.

Return type:

roboto.domain.topics.record.FieldPath

record: roboto.domain.topics.record.SchemaFieldRecord

class roboto.experimental.topics.decode.schema_tree.SchemaTree

A topic’s full declared schema, with a projection-restricted column view.

A topic schema has many top-level fields (e.g. header, pose, twist), each the root of its own subtree, so the top level is a forest of trees rather than a single-rooted tree; hence the attribute name forest. forest and nodes_by_path hold the whole schema and never change; columns is the subset a given read projects. restrict() re-derives columns for a narrower projection against the same shared forest.

columns: tuple[SchemaNode, ...]

Top-level output columns, restricted to the current projection, in projection order.

forest: tuple[SchemaNode, ...]

Every top-level field with its full declared subtree, in fields-GET order.

nodes_by_path: dict[roboto.domain.topics.record.FieldPath, SchemaNode]

Every node in the full forest, keyed by path, for ancestor lookup.

resolution_for(leaf)

The structural accessor resolution for leaf, derived from the schema.

Splits the leaf’s path at each enclosing list into a per-element sequence resolution, terminating in a simple attribute chain. Path components are the canonical (schema) field names; the runtime ROS time-field renaming is applied separately against a sample by remap_time_fields().

Parameters:

leaf (SchemaNode)

Return type:

roboto.formats.mcap.Resolution
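Splitting a leaf path at each enclosing list can be sketched with plain tuples. This assumes a path is a tuple of names and that the set of array-typed paths is known (in the real tree, that information comes from nodes_by_path). The sketch returns plain segments rather than a roboto.formats.mcap.Resolution, whose structure is not documented here. split_at_lists is a hypothetical helper.

```python
Path = tuple[str, ...]


def split_at_lists(path: Path, array_paths: set[Path]) -> list[Path]:
    """Split a leaf path into attribute chains, ending one chain at each enclosing list.

    Every segment except the last names a list whose elements the next segment
    is applied to; the last segment is a plain attribute chain.
    """
    segments: list[Path] = []
    current: list[str] = []
    for i, component in enumerate(path):
        current.append(component)
        # Split at an enclosing list, but not at the leaf itself: an
        # array-of-scalar leaf is read as one whole-list value.
        if path[: i + 1] in array_paths and i < len(path) - 1:
            segments.append(tuple(current))
            current = []
    segments.append(tuple(current))
    return segments


print(split_at_lists(("poses", "position", "x"), {("poses",)}))
# [('poses',), ('position', 'x')]
```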

resolution_for_path(path)

The structural accessor resolution for a bare path (e.g. a timestamp field).

Like resolution_for() but keyed by path, for fields read without an output column (the designated timestamp).

Parameters:

path (roboto.domain.topics.record.FieldPath)

Return type:

roboto.formats.mcap.Resolution

restrict(projection_paths)

A tree whose columns cover exactly projection_paths (the full forest is shared).

Parameters:

projection_paths (collections.abc.Sequence[roboto.domain.topics.record.FieldPath])

Return type:

SchemaTree

roboto.experimental.topics.decode.schema_tree.array_dimension_count(token)

The number of trailing array suffixes a native type token declares – its list-nesting depth.

float32 -> 0, float32[] -> 1, float32[3][3] -> 2, float[2][2][2] -> 3. A token that declares more than one dimension types as a correspondingly nested Arrow list (float32[3][3] -> list<list<float32>>), matching the nested value the decoder yields for such a field.

Parameters:

token (str)

Return type:

int
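The counting rule can be sketched with one anchored regular expression. This is a hypothetical reimplementation (count_array_dimensions), not the roboto function. It assumes suffixes are written exactly as [N] or [] with no whitespace.

```python
import re

# The whole trailing run of [N] / [] suffixes (possibly empty).
_TRAILING = re.compile(r"(?:\[\d*\])*$")


def count_array_dimensions(token: str) -> int:
    """Count the trailing array suffixes on a native type token."""
    # The pattern can always match the empty string at the end, so search()
    # never returns None here.
    return _TRAILING.search(token).group(0).count("[")


for t in ("float32", "float32[]", "float32[3][3]", "float[2][2][2]"):
    print(t, count_array_dimensions(t))  # 0, 1, 2, 3
```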

roboto.experimental.topics.decode.schema_tree.build_schema_tree(fields, projection_paths)

Build a SchemaTree from the fields-GET records, restricted to projection_paths.

Each record is placed under its parent (the path with its last component dropped), preserving GET order among siblings; columns is then the projection-restricted view. A projected parent pulls in its whole declared subtree. An empty projection yields no columns (decode emits timestamp-only batches).

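Parameters:

fields – the topic’s fields-GET schema records

projection_paths – the field paths the read projects

Return type: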

SchemaTree
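The placement and projection rules can be sketched using bare path tuples in place of SchemaFieldRecord. The helpers build_forest, subtree, and restrict_columns are hypothetical. They show parent lookup by dropping the last component, GET order preserved among siblings, a projected parent pulling in its subtree, and an empty projection yielding no columns.

```python
from collections import defaultdict

Path = tuple[str, ...]


def build_forest(paths: list[Path]) -> tuple[list[Path], dict[Path, list[Path]]]:
    """Place each path under its parent (the path minus its last component)."""
    roots: list[Path] = []
    children: defaultdict[Path, list[Path]] = defaultdict(list)
    for path in paths:  # GET order, so siblings keep their declared order
        if len(path) == 1:
            roots.append(path)
        else:
            children[path[:-1]].append(path)
    return roots, dict(children)


def subtree(path: Path, children: dict[Path, list[Path]]) -> list[Path]:
    """A path followed by its whole declared subtree, depth-first."""
    out = [path]
    for child in children.get(path, []):
        out.extend(subtree(child, children))
    return out


def restrict_columns(projection: list[Path], children: dict[Path, list[Path]]) -> list[list[Path]]:
    """One column per projected path, in projection order."""
    return [subtree(p, children) for p in projection]


fields = [("header",), ("header", "stamp"), ("header", "stamp", "sec"), ("pose",), ("pose", "x")]
roots, children = build_forest(fields)
print(roots)                                   # [('header',), ('pose',)]
print(restrict_columns([("header",)], children))
print(restrict_columns([], children))          # [] (timestamp-only batches)
```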

roboto.experimental.topics.decode.schema_tree.dialect_time_name(name, dialect)

The subfield name that dialect uses for a canonical ROS time field name.

Non-time names, and every name under the OTHER dialect, are returned unchanged.

Parameters:

name (str)

dialect (roboto.formats.mcap.McapDialect)

Return type:

str
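Here is a lookup-table sketch of this mapping. The ROS 1 time type names its subfields secs and nsecs, and ROS 2's builtin_interfaces/Time uses sec and nanosec. Which spelling roboto treats as canonical is not stated here. The sketch assumes the ROS 2 spelling, and both the Dialect enum and time_name_for are hypothetical stand-ins for McapDialect and dialect_time_name.

```python
from enum import Enum


class Dialect(Enum):
    """Hypothetical stand-in for roboto.formats.mcap.McapDialect."""

    ROS1 = "ros1"
    ROS2 = "ros2"
    OTHER = "other"


# Assumed canonical names (ROS 2 spelling) -> per-dialect subfield names.
_TIME_NAMES = {
    Dialect.ROS1: {"sec": "secs", "nanosec": "nsecs"},
    Dialect.ROS2: {"sec": "sec", "nanosec": "nanosec"},
}


def time_name_for(name: str, dialect: Dialect) -> str:
    """Non-time names, and every name under OTHER, pass through unchanged."""
    return _TIME_NAMES.get(dialect, {}).get(name, name)


print(time_name_for("nanosec", Dialect.ROS1))  # nsecs
```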

roboto.experimental.topics.decode.schema_tree.logger

roboto.experimental.topics.decode.schema_tree.strip_array_suffixes(token)

Strip the entire trailing chain of [N]/[] array suffixes from a native type token.

float32[3][3] -> float32, float32[] -> float32, float32 -> float32. Unlike strip_fixed_size(), which removes only the outermost dimension, this collapses a multi-dimensional token down to its base scalar so the leaf element type can be derived from it.

Parameters:

token (str)

Return type:

str
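A hypothetical one-regex version (strip_all_suffixes) shows the difference from strip_fixed_size(). The whole trailing run of suffixes is removed in one substitution. This assumes suffixes contain only digits or nothing between the brackets.

```python
import re

# One or more trailing [N] / [] suffixes, anchored at the end of the token.
_ALL_SUFFIXES = re.compile(r"(?:\[\d*\])+$")


def strip_all_suffixes(token: str) -> str:
    """Collapse a (possibly multi-dimensional) token to its base scalar."""
    return _ALL_SUFFIXES.sub("", token)


print(strip_all_suffixes("float32[3][3]"))  # float32
```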

roboto.experimental.topics.decode.schema_tree.strip_fixed_size(token)

Strip one trailing [N]/[] array suffix from a native type token.

float64[36] -> float64, uint8_t[50] -> uint8_t, char[9] -> char, uint8[] -> uint8. A token with no suffix is returned unchanged. Removes only the outermost dimension; use strip_array_suffixes() to collapse a multi-dimensional token.

Parameters:

token (str)

Return type:

str
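A hypothetical counterpart (strip_one_suffix) removes exactly one trailing suffix, so a multi-dimensional token keeps its remaining dimensions. It is a sketch under the same token-grammar assumption as above, not the roboto function.

```python
import re

# Exactly one trailing [N] / [] suffix.
_LAST_SUFFIX = re.compile(r"\[\d*\]$")


def strip_one_suffix(token: str) -> str:
    """Remove a single trailing array suffix; tokens without one are unchanged."""
    return _LAST_SUFFIX.sub("", token, count=1)


print(strip_one_suffix("float32[3][3]"))  # float32[3]
```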