roboto.storage#

Remote file storage I/O.

Covers whole-file transfer (upload transactions, download sessions, credentials, and the object-store abstraction) for moving files in and out of Roboto storage. Also provides the range-reader, local-cache, and sparse-buffer primitives for streaming byte-range reads, which the format decoders in roboto.formats build on.

Package Contents#

class roboto.storage.AbortTransactionsRequest(/, **data)#

Bases: pydantic.BaseModel

Request payload for aborting file upload transactions.

Used to cancel ongoing file upload transactions, typically when uploads fail or are no longer needed. This cleans up any reserved resources and marks associated files as no longer pending.

Parameters:: data (Any)

transaction_ids: list[str]#: List of transaction IDs to abort.

class roboto.storage.BeginSignedUrlUploadRequest(/, **data)#

Bases: pydantic.BaseModel

Request payload to begin a single file upload with a signed URL.

Used for simpler upload scenarios where a pre-signed URL is preferred over temporary credentials. The returned URL can be used directly for uploading the file content.

Parameters:: data (Any)

association: roboto.association.Association#: The entity this file will be associated with (e.g., dataset, topic).

file_path: str#: Destination path for the file within the association.

file_size: int#: Size of the file in bytes.

origination: str | None = None#: Optional description of the upload source.

class roboto.storage.BeginSignedUrlUploadResponse(/, **data)#

Bases: pydantic.BaseModel

Response from beginning a single file upload.

Contains the upload ID for completing the transaction and a pre-signed URL that can be used to upload the file content directly.

Parameters:: data (Any)

upload_id: str#: Unique identifier for this upload transaction.

upload_url: str#: Pre-signed URL for uploading the file content.

class roboto.storage.BeginUploadRequest(/, **data)#

Bases: pydantic.BaseModel

Request payload to begin a batch file upload transaction.

Used to initiate a multi-file upload transaction for any association type (dataset, topic, etc.). Returns a transaction ID and upload mappings that specify where each file should be uploaded.

Parameters:: data (Any)

association: roboto.association.Association#: The entity these files will be associated with (e.g., dataset, topic).

device_id: str | None = None#: Optional identifier of the device that generated this data.

origination: str#: Description of the upload source (e.g., ‘roboto-sdk v1.0.0’).

resource_manifest: dict[str, int]#: Dictionary mapping destination file paths to file sizes in bytes.
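
As an illustration of the resource_manifest shape, the sketch below builds a mapping from destination paths to file sizes for a set of local files. The helper name build_resource_manifest and the fallback to the bare file name are assumptions made for this example; they are not part of the documented API.

```python
from pathlib import Path
from typing import Iterable, Mapping


def build_resource_manifest(
    files: Iterable[Path],
    destination_paths: Mapping[Path, str],
) -> dict[str, int]:
    """Map each destination path to the size in bytes of its local file."""
    manifest: dict[str, int] = {}
    for local_path in files:
        # Assumption for this sketch: fall back to the bare file name
        # when no explicit destination path is given.
        destination = destination_paths.get(local_path, local_path.name)
        manifest[destination] = local_path.stat().st_size
    return manifest
```

The resulting dictionary has the dict[str, int] shape that resource_manifest expects.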

class roboto.storage.BeginUploadResponse(/, **data)#

Bases: pydantic.BaseModel

Response from beginning a batch upload transaction.

Contains the transaction ID needed for subsequent progress reporting and completion calls, plus mappings from file paths to their upload URIs.

Parameters:: data (Any)

transaction_id: str#: Unique identifier for this upload transaction.

upload_mappings: dict[str, str]#: Dictionary mapping file paths to their S3 upload URIs.

class roboto.storage.CachePolicy#

Bases: str, enum.Enum

Governs whether a fetched data file is cached to local disk before reading.

The policy applies to formats with a disk-cache path (Parquet today); a format that always streams (MCAP) ignores it.

ADAPTIVE = 'adaptive'#: Reuse an already-cached file; otherwise download when the read projects enough columns (COLUMN_COUNT_LOCAL_CACHE_THRESHOLD) to justify it, and stream over HTTP when it does not.

ALWAYS = 'always'#: Download the file to the local cache before reading, regardless of how much of it the read projects.

NEVER = 'never'#: Always stream over HTTP; never write to local disk.
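
The three policies can be read as a small decision function. The following is a minimal sketch under stated assumptions: the enum is a local stand-in mirroring the documented members, the helper should_use_local_file is hypothetical, the threshold value is a placeholder (the real COLUMN_COUNT_LOCAL_CACHE_THRESHOLD is not given here), and "enough columns" is taken to mean "at least the threshold".

```python
import enum


class CachePolicy(str, enum.Enum):
    # Local stand-in mirroring the documented members.
    ADAPTIVE = "adaptive"
    ALWAYS = "always"
    NEVER = "never"


# Placeholder only: the real COLUMN_COUNT_LOCAL_CACHE_THRESHOLD is not documented here.
ASSUMED_COLUMN_THRESHOLD = 8


def should_use_local_file(
    policy: CachePolicy,
    already_cached: bool,
    projected_columns: int,
    threshold: int = ASSUMED_COLUMN_THRESHOLD,
) -> bool:
    """Return True to read from the local cache, False to stream over HTTP."""
    if policy is CachePolicy.NEVER:
        return False
    if policy is CachePolicy.ALWAYS:
        return True
    # ADAPTIVE: reuse a cached file; otherwise download only when the
    # read projects at least `threshold` columns (assumed comparison).
    return already_cached or projected_columns >= threshold
```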

class roboto.storage.DownloadableFile#

Bases: TypedDict

A file to be downloaded from the Roboto Platform.

bucket_name: str#: Name of the bucket where the file is stored.

destination_path: pathlib.Path#: Local path where the file should be saved.

source_uri: str#: Full URI of the file in cloud storage (e.g., ‘s3://bucket/key’).
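
Because DownloadableFile is a TypedDict, an instance is a plain dictionary with the three keys above. The sketch below uses a local stand-in for the type and a hypothetical helper, downloadable_from_uri. It assumes that bucket_name matches the authority part of source_uri and that the object key is reused as the relative local path; neither is confirmed by this reference.

```python
from pathlib import Path
from typing import TypedDict
from urllib.parse import urlsplit


class DownloadableFile(TypedDict):
    # Local stand-in mirroring the documented keys.
    bucket_name: str
    destination_path: Path
    source_uri: str


def downloadable_from_uri(source_uri: str, download_dir: Path) -> DownloadableFile:
    """Build a DownloadableFile from a URI such as 's3://bucket/key'."""
    parts = urlsplit(source_uri)
    key = parts.path.lstrip("/")
    return DownloadableFile(
        bucket_name=parts.netloc,  # assumed to equal the URI authority
        destination_path=download_dir / key,  # assumed local layout
        source_uri=source_uri,
    )
```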

class roboto.storage.FileService(roboto_client=None, object_store_registry=None)#

Application service for performing upload and download to the Roboto Platform.

Agnostic to object store provider.

Parameters:

roboto_client (Optional[roboto.http.RobotoClient])
object_store_registry (Optional[roboto.storage.object_store.StoreRegistry])

download(files, association, caller_org_id=None, on_progress=None)#

Download files from the Roboto Platform.

Parameters:

files (collections.abc.Sequence[roboto.storage.download_session.DownloadableFile]) – Sequence of files to download, each with source_uri and destination_path.
association (roboto.association.Association) – Association of the files to download.
caller_org_id (Optional[str]) – Optional organization ID for cross-org access.
on_progress (Optional[roboto.storage.object_store.OnProgress]) – Optional callback to be periodically called with the number of bytes downloaded.

Return type:

None

upload(files, association, destination_paths={}, batch_size=_DEFAULT_UPLOAD_BATCH_SIZE, device_id=None, caller_org_id=None, on_progress=None)#

Parameters:

files (collections.abc.Iterable[pathlib.Path])
association (roboto.association.Association)
destination_paths (collections.abc.Mapping[pathlib.Path, str])
batch_size (int)
device_id (Optional[str])
caller_org_id (Optional[str])
on_progress (Optional[roboto.storage.object_store.OnProgress])

Return type:

list[str]

class roboto.storage.HttpRangeReader(url, read_ahead_size=_READ_AHEAD_SIZE)#

A seekable, buffered byte-range reader backed by an HTTP URL.

Uses HTTP range requests so only the requested byte ranges are fetched, allowing efficient partial access to remote files (e.g., reading just the MCAP summary/index section at the end of a file without downloading the full data payload).

Reads are satisfied from an in-memory sparse cache. HTTP requests are issued only on a cache miss, fetching read_ahead_size bytes (_READ_AHEAD_SIZE by default) at a time. Unlike a simple single-buffer approach, this cache retains all fetched regions, so seeking back to previously read data doesn’t trigger re-fetches.

This class implements the IO[bytes] protocol methods needed by mcap.reader.

Uses urllib3 connection pooling to reuse HTTP connections across requests, reducing TCP handshake and TLS negotiation overhead.

Parameters:

url (str)
read_ahead_size (int)
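
To make the read-ahead behavior concrete, the sketch below computes the HTTP Range header for a cache miss at a given offset. The function name read_ahead_range_header is hypothetical, and clamping the request to the end of the file is an assumption about how the reader behaves. The header syntax itself (bytes=first-last, both inclusive) is standard HTTP.

```python
def read_ahead_range_header(
    offset: int, read_ahead_size: int, file_size: int
) -> dict[str, str]:
    """Return the Range header for one read-ahead fetch starting at offset."""
    if not 0 <= offset < file_size:
        raise ValueError("offset must fall inside the file")
    if read_ahead_size <= 0:
        raise ValueError("read_ahead_size must be positive")
    # HTTP byte ranges are inclusive on both ends; clamp to the last byte.
    last_byte = min(offset + read_ahead_size, file_size) - 1
    return {"Range": f"bytes={offset}-{last_byte}"}
```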

close()#

Close the reader and release resources.

Return type:: None

prefetch_range(start, end)#

Prefetch a byte range using parallel HTTP requests.

Byte spans already in the cache (e.g., placed there by the footer read-behind at open, which covers the whole file when it is small) are not re-fetched; only the uncovered gaps are requested.

Parameters:

start (int) – Start byte offset (inclusive)
end (int) – End byte offset (inclusive)

Return type:

None
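
The gap computation described above can be sketched as follows. This is an illustration, not the library's implementation: the helper uncovered_gaps is hypothetical, and it assumes cached regions are given as non-overlapping (start, end) pairs with an exclusive end, while the requested range uses an inclusive end as documented for this method.

```python
def uncovered_gaps(
    start: int, end: int, cached: list[tuple[int, int]]
) -> list[tuple[int, int]]:
    """Return inclusive (start, end) spans of [start, end] not yet cached."""
    gaps: list[tuple[int, int]] = []
    cursor = start
    stop = end + 1  # convert the inclusive end to an exclusive bound
    for region_start, region_end in sorted(cached):
        if region_end <= cursor:
            continue  # region lies entirely before the cursor
        if region_start >= stop:
            break  # region lies entirely after the request
        if region_start > cursor:
            gaps.append((cursor, region_start - 1))
        cursor = max(cursor, region_end)
        if cursor >= stop:
            break
    if cursor < stop:
        gaps.append((cursor, stop - 1))
    return gaps
```

Each returned span could then be requested independently, which is what makes parallel fetching possible.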

read(size=-1)#

Parameters:: size (int)
Return type:: bytes

readable()#

Return type:: bool

seek(offset, whence=0)#

Parameters:

offset (int)
whence (int)

Return type:

int

seekable()#

Return type:: bool

property size: int#

Get the total size of the remote file in bytes.

Return type:: int

tell()#

Return type:: int

writable()#

Return type:: bool

class roboto.storage.ReportUploadProgressRequest(/, **data)#

Bases: pydantic.BaseModel

Request payload for reporting file upload progress.

Used to notify the platform about the completion status of individual files within a batch upload transaction. This enables progress tracking and partial completion handling for large file uploads.

Parameters:: data (Any)

manifest_items: list[str]#: List of file URIs that have completed upload.

class roboto.storage.RobotoCredentials(/, **data)#

Bases: pydantic.BaseModel

Credentials returned from the Roboto Platform.

Parameters:: data (Any)

access_key_id: str#

bucket: str#

expiration: datetime.datetime#

is_expired()#

Return type:: bool

region: str#

required_prefix: str#

secret_access_key: str#

session_token: str#

to_dict()#

Return type:: dict[str, Any]

to_object_store_credentials()#

Return type:: roboto.storage.object_store.Credentials

class roboto.storage.SparseBuffer(file_size)#

A seekable, read-only file-like object backed by sparse in-memory byte regions.

Stores fetched byte regions and provides a standard IO[bytes] interface for reading from them. Regions are automatically merged when they overlap or are adjacent, keeping the internal representation compact.

This is intended to be used as:

1. The cache backend for HttpRangeReader (sparse storage with smart fetching)
2. The stream for mcap.reader.SeekingReader after bulk-fetching byte ranges

Example

>>> buf = SparseBuffer(file_size=1000)
>>> buf.add_region(0, b"MCAP_MAGIC")  # header
>>> buf.add_region(900, b"footer_data")  # footer
>>> buf.seek(0)
0
>>> buf.read(10)
b'MCAP_MAGIC'

Parameters:: file_size (int)

add_region(offset, data)#

Store a byte region at the given file offset.

Merges with any overlapping or adjacent existing regions.

Parameters:

offset (int) – Byte offset within the virtual file.
data (bytes) – Raw bytes to store at that offset.

Return type:

None
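
One way to picture the merge step is shown below. This is a minimal sketch, not the class's actual internals: the function merge_regions and the list-of-(offset, bytes) representation are assumptions. Overlapping bytes are expected to be identical because they come from the same file; here the later region in sort order simply overwrites.

```python
def merge_regions(
    regions: list[tuple[int, bytes]], offset: int, data: bytes
) -> list[tuple[int, bytes]]:
    """Return sorted regions with (offset, data) merged into any neighbors."""
    merged: list[tuple[int, bytearray]] = []
    for start, chunk in sorted([*regions, (offset, data)], key=lambda r: r[0]):
        if merged and start <= merged[-1][0] + len(merged[-1][1]):
            # Overlapping or adjacent: overlay the chunk onto the previous
            # region, extending it when the chunk runs past its end.
            prev_start, prev = merged[-1]
            rel = start - prev_start
            prev[rel:rel + len(chunk)] = chunk
        else:
            merged.append((start, bytearray(chunk)))
    return [(start, bytes(buf)) for start, buf in merged]
```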

clear()#

Remove all cached regions.

Return type:: None

find_region(start, size)#

Check if [start, start+size) is fully contained in a cached region.

Parameters:

start (int) – Start byte offset.
size (int) – Number of bytes.

Returns:

The requested bytes if fully cached, None otherwise.

Return type:

bytes | None

read(size=-1)#

Read up to size bytes from the current position.

If the current position is within a cached region, returns the available bytes (possibly fewer than requested if the region ends before size bytes). If the current position is not in any cached region (a gap), returns b"".

This allows callers to detect partial hits and fetch missing data:

- len(result) == size: fully satisfied
- 0 < len(result) < size: partial hit, more data may be needed
- len(result) == 0: gap at current position, caller should fetch

Parameters:: size (int) – Maximum number of bytes to read. -1 means read to end of file.
Returns:: Bytes read from cached regions, or b"" if at a gap or past EOF.
Return type:: bytes
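
The partial-hit contract suggests a caller-side loop: read what is cached, and fetch when the result is empty. The sketch below illustrates that loop against a deliberately simplified stand-in (TinySparseBuffer, which does no merging); both the stand-in and the helper read_fully are hypothetical, and the fetch callback stands for whatever retrieves bytes from the remote file.

```python
from typing import Callable


class TinySparseBuffer:
    """Hypothetical, simplified stand-in: stores regions without merging."""

    def __init__(self, file_size: int) -> None:
        self.size = file_size
        self._pos = 0
        self._regions: dict[int, bytes] = {}

    def add_region(self, offset: int, data: bytes) -> None:
        self._regions[offset] = data

    def tell(self) -> int:
        return self._pos

    def read(self, size: int = -1) -> bytes:
        if size < 0:
            size = self.size - self._pos
        for start, data in self._regions.items():
            if start <= self._pos < start + len(data):
                rel = self._pos - start
                chunk = data[rel:rel + size]
                self._pos += len(chunk)
                return chunk
        return b""  # gap at the current position, or past EOF


def read_fully(
    buf: TinySparseBuffer, size: int, fetch: Callable[[int, int], bytes]
) -> bytes:
    """Read up to size bytes, calling fetch(offset, length) on each gap."""
    out = bytearray()
    while len(out) < size and buf.tell() < buf.size:
        chunk = buf.read(size - len(out))
        if chunk:
            out += chunk  # full or partial hit
            continue
        # Empty result before EOF means a gap: fetch it, then retry.
        pos = buf.tell()
        data = fetch(pos, min(size - len(out), buf.size - pos))
        if not data:
            break  # avoid looping forever if the fetch yields nothing
        buf.add_region(pos, data)
    return bytes(out)
```

For simplicity this loop may re-fetch bytes that a later cached region already holds; a real reader would likely plan fetches against the cached regions first.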

readable()#

Return True - this buffer supports reading.

Return type:: bool

property regions: list[tuple[int, int]]#

List of (start, end) byte ranges currently cached.

End is exclusive. Useful for fetch planning and debugging.

Return type:: list[tuple[int, int]]

seek(offset, whence=0)#

Move the read position.

Parameters:

offset (int) – Byte offset relative to the position indicated by whence.
whence (int) – 0=SEEK_SET (start), 1=SEEK_CUR (current), 2=SEEK_END (end).

Returns:

The new absolute position.

Return type:

int
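
The whence values follow the usual file-object convention, which the sketch below spells out. The helper resolve_seek is hypothetical, and rejecting negative positions is an assumption borrowed from how io.BytesIO behaves, not a documented guarantee of this class.

```python
import os


def resolve_seek(
    position: int, file_size: int, offset: int, whence: int = os.SEEK_SET
) -> int:
    """Return the absolute position a seek(offset, whence) would produce."""
    if whence == os.SEEK_SET:  # 0: relative to the start
        new_position = offset
    elif whence == os.SEEK_CUR:  # 1: relative to the current position
        new_position = position + offset
    elif whence == os.SEEK_END:  # 2: relative to the end
        new_position = file_size + offset
    else:
        raise ValueError(f"invalid whence: {whence}")
    if new_position < 0:
        # Assumed behavior, mirroring io.BytesIO.
        raise ValueError("negative seek position")
    return new_position
```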

seekable()#

Return True - this buffer supports seeking.

Return type:: bool

property size: int#

Total size of the virtual file.

Return type:: int

tell()#

Return the current read position.

Return type:: int

writable()#

Return False - this buffer is read-only.

Return type:: bool

roboto.storage.as_io_bytes(reader)#

Cast an HttpRangeReader to typing.IO[bytes] for type-checking purposes.

Parameters:: reader (HttpRangeReader)
Return type:: IO[bytes]
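
A helper of this kind is typically a thin wrapper around typing.cast, which changes only what the type checker sees and returns the same object at runtime. The sketch below shows that pattern under the assumption that as_io_bytes works this way; the parameter is typed as object so the example stays self-contained.

```python
from typing import IO, cast


def as_io_bytes(reader: object) -> IO[bytes]:
    """Tell the type checker to treat reader as IO[bytes]; no runtime change."""
    # typing.cast performs no conversion or validation at runtime.
    return cast(IO[bytes], reader)
```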