roboto.experimental.topics.decode.parquet

Module Contents

roboto.experimental.topics.decode.parquet.CACHED_PARQUET_NAME_PATTERN = '{fs_node_id}.parquet'

Filename template for locally cached Parquet files; keyed on the stable file id.
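As an illustration of how such a template can be expanded, the sketch below formats the pattern with a file id and joins it onto a cache directory. The helper name `cached_parquet_path`, the cache directory, and the example id are hypothetical and not part of this module; the pattern value is copied locally so the snippet runs on its own.

```python
from pathlib import Path

# Copied from the module constant so the sketch is self-contained.
CACHED_PARQUET_NAME_PATTERN = "{fs_node_id}.parquet"


def cached_parquet_path(cache_dir: Path, fs_node_id: str) -> Path:
    """Hypothetical helper: build the local cache path for one file id."""
    # str.format fills the single named placeholder in the template.
    return cache_dir / CACHED_PARQUET_NAME_PATTERN.format(fs_node_id=fs_node_id)


# The id and directory below are made up for the example.
example_path = cached_parquet_path(Path("cache"), "fs_abc123")
```

Because the name depends only on the file id, repeated reads of the same file resolve to the same cached path.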

roboto.experimental.topics.decode.parquet.decode_parquet_batches(scan_task, timestamp, window, projection_paths, params)

Wrap each surviving Parquet row group into a RecordBatch, prefixed with the stored-time column.

Empty row groups are skipped. If the timestamp column name collides with a projected column, it is suffixed until the collision is resolved; the MCAP path applies the same disambiguation.

Parameters:
Return type:

collections.abc.Generator[pyarrow.RecordBatch, None, None]
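The name disambiguation described for decode_parquet_batches can be sketched as a small loop. This is a minimal sketch, not the library's implementation: the function name `disambiguate_column_name` and the choice of an underscore as the suffix are assumptions, since the actual suffix is not documented here.

```python
from collections.abc import Iterable


def disambiguate_column_name(name: str, taken: Iterable[str], suffix: str = "_") -> str:
    """Hypothetical helper: append `suffix` until `name` is not in `taken`.

    The underscore default is an assumption; the real suffix may differ.
    """
    taken_names = set(taken)
    candidate = name
    # Each pass lengthens the candidate, so the loop ends once the
    # candidate is longer than every name in the (finite) taken set.
    while candidate in taken_names:
        candidate += suffix
    return candidate


# No collision: the name is returned unchanged.
unchanged = disambiguate_column_name("log_time", ["x", "y"])
# Two collisions in a row: the suffix is applied twice.
renamed = disambiguate_column_name("log_time", ["log_time", "log_time_"])
```

Suffixing repeatedly, rather than once, covers the case where the first suffixed name is itself a projected column.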

roboto.experimental.topics.decode.parquet.parquet_filtered_row_groups(scan_task, timestamp, window, projection_paths, params)

Yield each surviving row group as (projected_table, stored_timestamps).

The table holds only the projected columns: the timestamp column is read for filtering and is dropped when the projection omits it. Rows are filtered to window, inclusive on both ends, and stored_timestamps is the int64 nanosecond timestamp column aligned with the rows of the table.

Parameters:
Return type:

collections.abc.Generator[tuple[pyarrow.Table, pyarrow.Int64Array], None, None]
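The filtering contract of parquet_filtered_row_groups can be illustrated without Parquet at all. The sketch below uses plain Python lists as stand-ins for columns; the function name `filter_rows_to_window` and the dict-of-lists layout are hypothetical, and treating null timestamps as outside the window is an assumption inferred from the non-null requirement in row_group_fully_in_window.

```python
from typing import Optional


def filter_rows_to_window(
    columns: dict[str, list],
    timestamp: str,
    projection: list[str],
    start: int,
    end: int,
) -> tuple[dict[str, list], list[int]]:
    """Hypothetical stand-in: keep rows whose timestamp is in [start, end]."""
    stamps: list[Optional[int]] = columns[timestamp]
    # Inclusive on both ends; null timestamps are dropped (assumption).
    keep = [i for i, t in enumerate(stamps) if t is not None and start <= t <= end]
    # Only projected columns are returned, so the timestamp column
    # disappears from the table unless the projection names it.
    projected = {name: [columns[name][i] for i in keep] for name in projection}
    stored_timestamps = [stamps[i] for i in keep]
    return projected, stored_timestamps


rows = {"log_time": [10, 20, None, 30, 40], "speed": [1.0, 2.0, 3.0, 4.0, 5.0]}
table, stored = filter_rows_to_window(rows, "log_time", ["speed"], start=20, end=30)
```

Returning the timestamps separately keeps them available to the caller even when the projected table no longer contains that column.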

roboto.experimental.topics.decode.parquet.row_group_fully_in_window(row_group_metadata, timestamp, start, end)

Whether column-chunk statistics prove every row’s timestamp is non-null and within the inclusive window.

Conservative: any absent or incomplete statistic returns False. Returns True only when the timestamp column chunk reports zero nulls and a [min, max] range that sits inside [start, end]. Under those conditions the row-level mask would be all-true.

Parameters:
Return type:

bool
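The conservative check described for row_group_fully_in_window can be sketched against a simplified statistics record. The `ChunkStats` dataclass and the `stats_prove_fully_in_window` function are hypothetical stand-ins for the real Parquet column-chunk statistics and are not this module's API; every field is optional so that missing statistics can be modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChunkStats:
    """Hypothetical stand-in for one column chunk's statistics."""

    min: Optional[int] = None
    max: Optional[int] = None
    null_count: Optional[int] = None


def stats_prove_fully_in_window(stats: Optional[ChunkStats], start: int, end: int) -> bool:
    """Return True only when the statistics prove every row is in [start, end]."""
    # Absent statistics prove nothing, so answer False.
    if stats is None:
        return False
    # Incomplete statistics are treated the same way.
    if stats.min is None or stats.max is None or stats.null_count is None:
        return False
    # A single null timestamp would fail the row-level mask.
    if stats.null_count != 0:
        return False
    # Inclusive bounds on both ends.
    return start <= stats.min and stats.max <= end


inside = stats_prove_fully_in_window(ChunkStats(min=100, max=200, null_count=0), 100, 200)
straddles = stats_prove_fully_in_window(ChunkStats(min=50, max=150, null_count=0), 100, 200)
```

A False result does not mean the row group lies outside the window; it only means the statistics cannot rule out a row that would be filtered, so the row-level mask still has to be evaluated.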