roboto.experimental.topics.decode.schema_tree#
Typed schema tree for schema-driven MCAP decode.
A topic’s per-field schema (SchemaFieldRecord)
declares every field’s path, canonical type, and native type before any byte is
read. This module wraps those records into a navigable tree in which every column’s
Arrow type and per-leaf structural resolution are fixed statically, so the MCAP
decoder never samples a message to discover its shape.
Built once per read from the topic’s field declarations and consumed by the MCAP decode path. (Parquet reads take their Arrow types from the file’s own schema and do not use this tree.)
Module Contents#
- class roboto.experimental.topics.decode.schema_tree.SchemaNode#
One declared schema field, plus its declared child fields.
Children are held in fields-GET response order, parents before children. The node’s name is the leaf component of its path — the schema-native attribute name the decoder’s accumulator is keyed by — which is also its Arrow field name.
- arrow_type(dialect=McapDialect.OTHER)#
The Arrow type for this node’s column, derived entirely from the schema.
Canonical type decides the structure (struct vs. list vs. scalar); the native token decides scalar width and, for an array, list-nesting depth. A token declaring multiple dimensions (e.g. float32[3][3]) types as a correspondingly nested list (list<list<float32>>). This is the structural inverse of the Arrow-to-canonical mapping in roboto.formats.parquet.arrow_to_roboto, so an MCAP-decoded column and the Parquet-ingested one for the same field agree.
A multi-dimensional array whose schema records only a single, flattened suffix (e.g. a float[3][3] persisted as float32[9]) still types as a 1-D list; faithful nesting requires the native token to carry the full suffix chain.
dialect resolves the two framework-dependent details a canonical type cannot carry: the signedness of byte/char scalars and the subfield names of ROS time structs. It defaults to OTHER, which keeps the canonical, framework-neutral typing and naming for non-ROS data and dialect-agnostic callers.
- Parameters:
dialect (roboto.formats.mcap.McapDialect)
- Return type:
pyarrow.DataType
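The relationship between a native token's suffix chain and its list nesting can be sketched as below. This is an illustrative stand-in, not the module's implementation: `nested_list_type_name` is a hypothetical helper, it renders the type in the `list<...>` notation used on this page rather than building a `pyarrow.DataType`, and it passes the base token through unchanged instead of mapping it to an Arrow scalar width.

```python
import re

# A single "[N]" or "[]" suffix, and the whole trailing run of them.
_SUFFIX = re.compile(r"\[\d*\]")
_SUFFIX_CHAIN = re.compile(r"(?:\[\d*\])+$")


def nested_list_type_name(native: str) -> str:
    """Render a native token as a nested list type name (hypothetical helper).

    Assumption: the base token doubles as the scalar type name; the real
    native-to-Arrow scalar mapping is not reproduced here.
    """
    chain = _SUFFIX_CHAIN.search(native)
    if chain is None:
        return native  # no array suffix: a plain scalar
    depth = len(_SUFFIX.findall(chain.group(0)))
    name = native[: chain.start()]
    for _ in range(depth):
        name = f"list<{name}>"  # one list level per declared dimension
    return name
```

Under these assumptions a flattened token such as `float32[9]` yields a single list level, which is why faithful nesting depends on the token carrying every suffix.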
- property canonical: roboto.domain.topics.record.CanonicalDataType#
The field’s normalized, encoding-agnostic data type.
- Return type:
roboto.domain.topics.record.CanonicalDataType
- children: tuple[SchemaNode, Ellipsis]#
- property is_array: bool#
Whether this node is a sequence (its decoded value is a list).
- Return type:
bool
- property is_struct: bool#
Whether this node is a struct with declared children (its decoded value is a mapping).
- Return type:
bool
- leaves()#
The scalar / whole-list leaves under this node, for accessor compilation.
A struct or array-of-struct descends into its children; a scalar or array-of-scalar is itself a leaf, whose accessor reads the whole list value at once.
- Return type:
list[SchemaNode]
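The leaf rule can be illustrated with a small stand-in type. `Node` below is hypothetical and far simpler than `SchemaNode`; the sketch assumes that only structs and arrays of structs carry children, so "has children" is the sole test for descending.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a schema node (not the real SchemaNode)."""

    name: str
    is_array: bool = False
    children: tuple["Node", ...] = ()

    def leaves(self) -> list["Node"]:
        # No children: a scalar or an array of scalars, read as one value.
        if not self.children:
            return [self]
        # Struct or array of structs: collect leaves depth-first, in order.
        found: list["Node"] = []
        for child in self.children:
            found.extend(child.leaves())
        return found


pose = Node(
    "pose",
    children=(
        Node("position", children=(Node("x"), Node("y"))),
        Node("covariance", is_array=True),
    ),
)
leaf_names = [leaf.name for leaf in pose.leaves()]
```

Here `covariance` is an array of scalars, so it appears once as a whole-list leaf rather than being expanded per element.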
- property name: str#
The schema-native attribute name: the leaf component of the field’s path.
Doubles as the decode accumulator’s key and the Arrow struct/column field name, so accumulated values land under the field that declares them by construction.
- Return type:
str
- property native: str#
The field’s native, framework-specific type token (e.g. "float32", "esc_report[8]").
- Return type:
str
- property path: roboto.domain.topics.record.FieldPath#
This field’s path components, outermost to leaf.
- Return type:
roboto.domain.topics.record.FieldPath
- class roboto.experimental.topics.decode.schema_tree.SchemaTree#
A topic’s full declared schema, with a projection-restricted column view.
A topic schema has many top-level fields (e.g. header, pose, twist), each the root of its own subtree, so the top level is a forest of trees rather than a single-rooted tree, hence forest.
forest and nodes_by_path hold the whole schema and never change; columns is the subset a given read projects. restrict() re-derives columns for a narrower projection against the same shared forest.
- columns: tuple[SchemaNode, Ellipsis]#
Top-level output columns, restricted to the current projection, in projection order.
- forest: tuple[SchemaNode, Ellipsis]#
Every top-level field with its full declared subtree, in fields-GET order.
- nodes_by_path: dict[roboto.domain.topics.record.FieldPath, SchemaNode]#
Every node in the full forest, keyed by path, for ancestor lookup.
- resolution_for(leaf)#
The structural accessor resolution for leaf, derived from the schema.
Splits the leaf’s path at each enclosing list into a per-element sequence resolution, terminating in a simple attribute chain. Path components are the canonical (schema) field names; the runtime ROS time-field renaming is applied separately against a sample by remap_time_fields().
- Parameters:
leaf (SchemaNode)
- Return type:
roboto.formats.mcap.Resolution
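One way to picture splitting a path at each enclosing list is sketched below. The representation is an assumption made for illustration: segments are plain tuples rather than a `roboto.formats.mcap.Resolution`, `split_at_lists` and `read` are hypothetical helpers, and messages are nested dicts rather than decoded message objects.

```python
def split_at_lists(path, array_paths):
    """Cut `path` after every proper ancestor that is a list.

    `array_paths` is the set of path prefixes declared as arrays. The leaf
    itself is never a cut point, so the final segment is a plain chain.
    """
    segments, current = [], []
    for depth, component in enumerate(path, start=1):
        current.append(component)
        is_ancestor = depth < len(path)
        if is_ancestor and tuple(path[:depth]) in array_paths:
            segments.append(tuple(current))
            current = []
    segments.append(tuple(current))
    return segments


def read(value, segments):
    """Follow the first chain, then map the remaining chains over each element."""
    head, *rest = segments
    for name in head:
        value = value[name]
    if not rest:
        return value
    return [read(element, rest) for element in value]


segments = split_at_lists(
    ("detections", "bbox", "center", "x"), {("detections",)}
)
message = {
    "detections": [
        {"bbox": {"center": {"x": 1.0}}},
        {"bbox": {"center": {"x": 2.0}}},
    ]
}
xs = read(message, segments)
```

A path with no enclosing list collapses to a single segment, which corresponds to the simple attribute chain mentioned above.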
- resolution_for_path(path)#
The structural accessor resolution for a bare path (e.g. a timestamp field).
Like resolution_for() but keyed by path, for fields read without an output column (the designated timestamp).
- Parameters:
path (roboto.domain.topics.record.FieldPath)
- Return type:
roboto.formats.mcap.Resolution
- restrict(projection_paths)#
A tree whose columns cover exactly projection_paths (the full forest is shared).
- Parameters:
projection_paths (collections.abc.Sequence[roboto.domain.topics.record.FieldPath])
- Return type:
SchemaTree
- roboto.experimental.topics.decode.schema_tree.array_dimension_count(token)#
The number of trailing array suffixes a native type token declares – its list-nesting depth.
float32 -> 0, float32[] -> 1, float32[3][3] -> 2, float[2][2][2] -> 3. A token that declares more than one dimension types as a correspondingly nested Arrow list (float32[3][3] -> list<list<float32>>), matching the nested value the decoder yields for such a field.
- Parameters:
token (str)
- Return type:
int
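The counting rule can be reproduced with a short regular-expression sketch. `count_array_dimensions` is a hypothetical re-implementation written for illustration; the library function may handle edge cases differently.

```python
import re

# A single "[N]" or "[]" suffix, and the whole trailing run of them.
_SUFFIX = re.compile(r"\[\d*\]")
_SUFFIX_CHAIN = re.compile(r"(?:\[\d*\])+$")


def count_array_dimensions(token: str) -> int:
    """Count the trailing array suffixes of a native type token."""
    chain = _SUFFIX_CHAIN.search(token)
    if chain is None:
        return 0  # no trailing suffix: a scalar token
    return len(_SUFFIX.findall(chain.group(0)))
```

Anchoring the chain at the end of the token means only trailing suffixes count, which matches the "trailing array suffixes" wording above.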
- roboto.experimental.topics.decode.schema_tree.build_schema_tree(fields, projection_paths)#
Build a SchemaTree from the fields-GET records, restricted to projection_paths.
Each record is placed under its parent (the path with its last component dropped), preserving GET order among siblings; columns is then the projection-restricted view. A projected parent pulls in its whole declared subtree. An empty projection yields no columns (decode emits timestamp-only batches).
- Parameters:
fields (collections.abc.Sequence[roboto.domain.topics.record.SchemaFieldRecord])
projection_paths (collections.abc.Sequence[roboto.domain.topics.record.FieldPath])
- Return type:
SchemaTree
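The parent-placement step can be sketched with plain tuples and dicts. Everything here is a simplified assumption: paths are tuples of strings instead of `FieldPath` values, nodes are nested dicts instead of `SchemaNode` objects, and `build_forest` and `project` are hypothetical helpers. Records are assumed to arrive parents-first, as the fields-GET order described on this page guarantees.

```python
def build_forest(paths):
    """Nest flat, parents-first paths into {name: subtree} dicts."""
    forest = {}   # top-level fields, in arrival order
    by_path = {}  # every node, keyed by its full path
    for path in paths:
        node = {}
        by_path[path] = node
        parent = path[:-1]  # the path with its last component dropped
        siblings = by_path[parent] if parent else forest
        siblings[path[-1]] = node
    return forest, by_path


def project(by_path, projection_paths):
    """Select projected nodes in projection order; subtrees come along."""
    return [(path, by_path[path]) for path in projection_paths]


forest, by_path = build_forest(
    [("header",), ("header", "stamp"), ("pose",), ("pose", "x"), ("pose", "y")]
)
columns = project(by_path, [("pose",)])
```

Because a projected node is the same dict object held in the forest, projecting a parent brings its whole declared subtree with no copying, and an empty projection yields an empty column list.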
- roboto.experimental.topics.decode.schema_tree.dialect_time_name(name, dialect)#
The subfield name dialect carries for a canonical ROS time field name.
Non-time names, and every name under the OTHER dialect, are returned unchanged.
- Parameters:
name (str)
dialect (roboto.formats.mcap.McapDialect)
- Return type:
str
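The pass-through behavior can be illustrated with a lookup table. This sketch rests on several assumptions that this page does not confirm: the `Dialect` enum and its `ROS1` / `ROS2` members are hypothetical (only `OTHER` is named here), and the canonical names `sec` / `nsec` are placeholders. The per-framework names themselves are the standard ones: ROS 1 time uses `secs` / `nsecs`, and ROS 2 `builtin_interfaces/Time` uses `sec` / `nanosec`.

```python
from enum import Enum


class Dialect(Enum):
    """Hypothetical stand-in for McapDialect; only OTHER is documented."""

    ROS1 = "ros1"
    ROS2 = "ros2"
    OTHER = "other"


# Assumed canonical name -> name used by each framework's time struct.
_TIME_NAMES = {
    Dialect.ROS1: {"sec": "secs", "nsec": "nsecs"},
    Dialect.ROS2: {"sec": "sec", "nsec": "nanosec"},
}


def time_name(name: str, dialect: Dialect) -> str:
    """Map a canonical time subfield name to its dialect-specific name."""
    # Unknown dialects (including OTHER) and non-time names fall through.
    return _TIME_NAMES.get(dialect, {}).get(name, name)
```

The sketch renames by name alone, so a caller would apply it only to the subfields of a time struct.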
- roboto.experimental.topics.decode.schema_tree.logger#
- roboto.experimental.topics.decode.schema_tree.strip_array_suffixes(token)#
Strip the entire trailing chain of [N] / [] array suffixes from a native type token.
float32[3][3] -> float32, float32[] -> float32, float32 -> float32. Unlike strip_fixed_size(), which removes only the outermost dimension, this collapses a multi-dimensional token down to its base scalar so the leaf element type can be derived from it.
- Parameters:
token (str)
- Return type:
str
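Collapsing a token to its base scalar amounts to peeling suffixes until none remain. The loop below is a hypothetical re-implementation for illustration, written without regular expressions to make each peel explicit; it is not the library's code.

```python
def strip_all_array_suffixes(token: str) -> str:
    """Peel trailing "[N]" / "[]" suffixes until the base token is left."""
    base = token
    while base.endswith("]"):
        open_idx = base.rfind("[")
        if open_idx == -1:
            break  # unbalanced bracket: leave the token as it is
        inner = base[open_idx + 1 : -1]
        if inner and not inner.isdigit():
            break  # not an "[N]" / "[]" suffix
        base = base[:open_idx]  # drop one dimension and look again
    return base
```

Each pass removes one dimension, so a two-dimensional token takes two passes to reach its base scalar.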
- roboto.experimental.topics.decode.schema_tree.strip_fixed_size(token)#
Strip one trailing [N] / [] array suffix from a native type token.
float64[36] -> float64, uint8_t[50] -> uint8_t, char[9] -> char, uint8[] -> uint8. A token with no suffix is returned unchanged. Removes only the outermost dimension; use strip_array_suffixes() to collapse a multi-dimensional token.
- Parameters:
token (str)
- Return type:
str
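Removing exactly one suffix can be sketched with a regular expression anchored at the end of the token. `strip_one_array_suffix` is a hypothetical re-implementation for illustration only.

```python
import re

# Exactly one "[N]" or "[]" suffix at the very end of the token.
_LAST_SUFFIX = re.compile(r"\[\d*\]$")


def strip_one_array_suffix(token: str) -> str:
    """Remove a single trailing array suffix, if the token has one."""
    return _LAST_SUFFIX.sub("", token, count=1)
```

On a multi-dimensional token only the last suffix goes, so `float32[3][3]` becomes `float32[3]` rather than `float32`.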