roboto.experimental.topics.decode.parquet#
Module Contents#
- roboto.experimental.topics.decode.parquet.CACHED_PARQUET_NAME_PATTERN = '{fs_node_id}.parquet'#
Filename template for locally cached Parquet files; keyed on the stable file id.
- roboto.experimental.topics.decode.parquet.decode_parquet_batches(scan_task, timestamp, window, projection_paths, params)#
Wrap each surviving Parquet row group into a RecordBatch, prefixed with the stored-time column.
Empty row groups are skipped. The timestamp column name is suffixed until it stops colliding with a projected column, the same disambiguation the MCAP path applies.
- Parameters:
scan_task (roboto.experimental.topics.read_plan.ReadPlanScanTask)
timestamp (roboto.experimental.topics.read_plan.ReadPlanTimestamp)
projection_paths (collections.abc.Sequence[roboto.domain.topics.record.FieldPath])
params (roboto.experimental.topics.decode.common.ScanTaskDecodeParams)
- Return type:
collections.abc.Generator[pyarrow.RecordBatch, None, None]
- roboto.experimental.topics.decode.parquet.parquet_filtered_row_groups(scan_task, timestamp, window, projection_paths, params)#
Yield each surviving row group as
(projected_table, stored_timestamps).The table holds the projected columns only (the timestamp column is read for filtering and dropped when the projection omits it); rows are filtered to
window(inclusive on both ends), andstored_timestampsis the aligned int64 nanosecond column.- Parameters:
scan_task (roboto.experimental.topics.read_plan.ReadPlanScanTask)
timestamp (roboto.experimental.topics.read_plan.ReadPlanTimestamp)
projection_paths (collections.abc.Sequence[roboto.domain.topics.record.FieldPath])
params (roboto.experimental.topics.decode.common.ScanTaskDecodeParams)
- Return type:
collections.abc.Generator[tuple[pyarrow.Table, pyarrow.Int64Array], None, None]
- roboto.experimental.topics.decode.parquet.row_group_fully_in_window(row_group_metadata, timestamp, start, end)#
Whether column-chunk statistics prove every row’s timestamp is non-null and within the inclusive window.
Conservative: any absent or incomplete statistic returns
False. ReturnsTrueonly when the timestamp column chunk reports zero nulls and a [min, max] range that sits inside[start, end]— conditions under which the row-level mask would be all-true.- Parameters:
row_group_metadata (pyarrow.parquet.RowGroupMetaData)
timestamp (roboto.formats.parquet.Timestamp)
start (int)
end (int)
- Return type:
bool