Readers

obspec_utils.readers.BufferedStoreReader

A file-like reader with buffered on-demand reads.

This class provides a file-like interface (read, seek, tell) on top of any object store. The reader uses get_range() calls to fetch data on demand, with optional read-ahead buffering to reduce the number of requests.
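To illustrate why read-ahead buffering cuts request counts, here is a minimal sketch of the general technique. It is not the BufferedStoreReader implementation: `InMemoryStore`, its `get_range(path, start=..., end=...)` method, and `ToyBufferedReader` are hypothetical stand-ins defined only for this example, and the "fetch at least buffer_size bytes on a miss" policy is an assumption.

```python
from dataclasses import dataclass


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for an object store that counts range requests."""

    objects: dict[str, bytes]
    requests: int = 0

    def get_range(self, path: str, *, start: int, end: int) -> bytes:
        # Each call models one ranged GET against the store.
        self.requests += 1
        return self.objects[path][start:end]


class ToyBufferedReader:
    """Sketch of forward reads served from a read-ahead buffer (not the library code)."""

    def __init__(self, store: InMemoryStore, path: str, file_size: int, buffer_size: int) -> None:
        self._store = store
        self._path = path
        self._size = file_size
        self._buffer_size = buffer_size
        self._pos = 0
        self._buf = b""
        self._buf_start = 0  # file offset of self._buf[0]

    def read(self, size: int) -> bytes:
        end = min(self._pos + size, self._size)
        if end <= self._pos:
            return b""  # already at end of file
        buf_end = self._buf_start + len(self._buf)
        if not (self._buf_start <= self._pos and end <= buf_end):
            # Buffer miss: fetch at least buffer_size bytes starting at the current position.
            fetch_end = min(max(end, self._pos + self._buffer_size), self._size)
            self._buf = self._store.get_range(self._path, start=self._pos, end=fetch_end)
            self._buf_start = self._pos
        offset = self._pos - self._buf_start
        self._pos = end
        return self._buf[offset : end - self._buf_start]


data = bytes(range(256)) * 4096  # 1 MiB of sample bytes
store = InMemoryStore({"file.bin": data})
reader = ToyBufferedReader(store, "file.bin", file_size=len(data), buffer_size=256 * 1024)
chunks = [reader.read(4096) for _ in range(len(data) // 4096)]  # 256 small reads
assert b"".join(chunks) == data
print(store.requests)  # 4: one request per 256 KiB refill instead of 256
```

In this scheme, jumping back to a position before the start of the buffer would force a fresh request, which is why buffered reading suits mostly-forward access.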

When to Use

Use BufferedStoreReader when:

Sequential reading: Best for workloads that mostly read forward through a file, with only occasional backward seeks.
Simple use cases: When you need a basic file-like interface without caching or concurrent fetching.
Streaming data: Processing data as it arrives without loading the full file into memory.

Consider alternatives when:

You need to read the entire file anyway → use EagerStoreReader
You have many non-contiguous reads → use BlockStoreReader
You'll repeatedly access the same regions → use EagerStoreReader or BlockStoreReader

closed property

closed: bool

Return True if the reader has been closed.

Store

Bases: Get, GetRange, Head, Protocol

Store protocol required by BufferedStoreReader.

Combines Get, GetRange, and Head from obspec.

__enter__

__enter__() -> 'BufferedStoreReader'

Enter the context manager.

__exit__

__exit__(exc_type, exc_val, exc_tb) -> None

Exit the context manager and close the reader.
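The `__enter__`/`__exit__` pair lets a reader be used in a `with` statement, so that `close()` runs even if an exception is raised mid-read. The sketch below shows that pattern on a hypothetical `ToyReader` class; it mirrors the documented behaviour (`__exit__` closes the reader, `closed` reports it) but is not the library's code.

```python
class ToyReader:
    """Hypothetical reader showing the documented context-manager behaviour."""

    def __init__(self) -> None:
        self._closed = False

    @property
    def closed(self) -> bool:
        return self._closed

    def close(self) -> None:
        # A real reader would release its buffer or cache here.
        self._closed = True

    def __enter__(self) -> "ToyReader":
        return self

    def __exit__(self, exc_type, exc_val, exc_tb) -> None:
        # Always close; returning None lets any exception propagate.
        self.close()


reader = ToyReader()
try:
    with reader:
        raise RuntimeError("simulated failure during read")
except RuntimeError:
    pass
print(reader.closed)  # True
```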

__init__

__init__(store: Store, path: str, buffer_size: int = 1024 * 1024) -> None

Create a file-like reader for any object store.

Parameters:

store (Store) –

Any object implementing Get, GetRange, and Head.
path (str) –

The path to the file within the store.
buffer_size (int, default: 1024 * 1024 ) –

Read-ahead buffer size in bytes. When reading, up to this many bytes may be fetched ahead to reduce the number of requests.

close

close() -> None

Close the reader and release the read-ahead buffer.

read

read(size: int = -1) -> bytes

Read up to size bytes from the file.

Parameters:

size (int, default: -1 ) –

Number of bytes to read. If -1, read from the current position to the end of the file.

Returns:

bytes –

The data read from the file.

readable

readable() -> bool

Return True, indicating this reader supports reading.

readall

readall() -> bytes

Read the entire file.

Returns:

bytes –

The complete file contents.

seek

seek(offset: int, whence: int = 0) -> int

Move the file position.

Parameters:

offset (int) –

Position offset.
whence (int, default: 0 ) –

Reference point: 0=start (SEEK_SET), 1=current (SEEK_CUR), 2=end (SEEK_END).
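The `whence` values follow the standard Python file-object convention (`os.SEEK_SET`, `os.SEEK_CUR`, `os.SEEK_END`). Because the readers expose the same `read`/`seek`/`tell` interface, `io.BytesIO` can stand in here to show the expected position arithmetic; this illustrates the convention and does not exercise BufferedStoreReader itself.

```python
import io
import os

data = b"0123456789"
f = io.BytesIO(data)  # stand-in with the same read/seek/tell interface

pos_set = f.seek(3, os.SEEK_SET)   # absolute: position 3
pos_cur = f.seek(2, os.SEEK_CUR)   # relative to current: 3 + 2 = 5
chunk = f.read(2)                  # reads b"56"; position advances to 7
pos_end = f.seek(-4, os.SEEK_END)  # relative to end: 10 - 4 = 6
tail = f.read()                    # default size -1 reads to the end: b"6789"

print(pos_set, pos_cur, chunk, pos_end, tail, f.tell())  # 3 5 b'56' 6 b'6789' 10
```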

Returns:

int –

The new absolute position.

seekable

seekable() -> bool

Return True, indicating this reader supports seeking.

tell

tell() -> int

Return the current file position.

Returns:

int –

Current position in bytes from start of file.

writable

writable() -> bool

Return False, indicating this reader does not support writing.

obspec_utils.readers.EagerStoreReader

A file-like reader that eagerly loads the entire file into memory.

This reader fetches the complete file on first access and then serves all subsequent reads from the in-memory cache. Useful for files that will be read multiple times or when seeking is frequent.

By default, the file is fetched using concurrent range requests via get_ranges(), which can significantly improve load time for large files. The defaults (12 MB request size, max 18 concurrent requests) are tuned for cloud storage. The file size is determined automatically via a HEAD request.

The concurrent fetching strategy is based on Icechunk's approach: github.com/earth-mover/icechunk/blob/main/icechunk/src/storage/mod.rs

When to Use

Use EagerStoreReader when:

Reading the entire file: When you know you'll need most or all of the file's contents.
Repeated random access: After the initial load, any byte is accessible with no network latency.
Small to medium files: Files that fit comfortably in memory.
Concurrent initial fetch: The default settings use concurrent requests for faster download on cloud storage.

Consider alternatives when:

You only need a small portion of a large file → use BlockStoreReader
Memory is constrained → use BlockStoreReader (bounded cache) or BufferedStoreReader
You're streaming sequentially and won't revisit data → use BufferedStoreReader

closed property

closed: bool

Return True if the reader has been closed.

Store

Bases: Get, GetRanges, Head, Protocol

Store protocol required by EagerStoreReader.

Combines Get, GetRanges, and Head from obspec.

__enter__

__enter__() -> 'EagerStoreReader'

Enter the context manager.

__exit__

__exit__(exc_type, exc_val, exc_tb) -> None

Exit the context manager and close the reader.

__init__

__init__(
    store: Store,
    path: str,
    request_size: int = 12 * 1024 * 1024,
    file_size: int | None = None,
    max_concurrent_requests: int = 18,
) -> None

Create an eager reader that loads the entire file into memory.

The file is fetched immediately and cached in memory.

Parameters:

store (Store) –

Any object implementing Get, GetRanges, and Head.
path (str) –

The path to the file within the store.
request_size (int, default: 12 * 1024 * 1024 ) –

Target size for each concurrent range request in bytes. Default is 12 MB, tuned for cloud storage throughput. The file will be divided into parts of this size and fetched using get_ranges().
file_size (int | None, default: None ) –

File size in bytes. If not provided, the size is determined via store.head(). Pass this to skip the HEAD request if you already know the file size.
max_concurrent_requests (int, default: 18 ) –

Maximum number of concurrent range requests. Default is 18. If the file would require more requests than this, request sizes are increased to fit within this limit.
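The parameter descriptions above imply a planning step: divide the file into parts of roughly `request_size` bytes and, if that yields more than `max_concurrent_requests` parts, enlarge the parts until the count fits. The sketch below is one plausible reading of that description, not the library's actual algorithm; `plan_ranges` is a hypothetical helper, and its start/end offsets are the kind of values a batched range fetch such as get_ranges() would receive.

```python
import math

MiB = 1024 * 1024


def plan_ranges(
    file_size: int,
    request_size: int = 12 * MiB,
    max_concurrent_requests: int = 18,
) -> tuple[list[int], list[int]]:
    """Hypothetical helper: split [0, file_size) into at most max_concurrent_requests parts."""
    if file_size == 0:
        return [], []
    num_parts = math.ceil(file_size / request_size)
    if num_parts > max_concurrent_requests:
        # Too many parts: grow each part so the count fits within the limit.
        request_size = math.ceil(file_size / max_concurrent_requests)
    starts = list(range(0, file_size, request_size))
    ends = [min(start + request_size, file_size) for start in starts]
    return starts, ends


small_starts, small_ends = plan_ranges(30 * MiB)  # 2.5 x 12 MiB rounds up to 3 parts
print(len(small_starts), small_ends[-1] - small_starts[-1])  # 3 6291456 (last part is 6 MiB)

big_starts, _ = plan_ranges(1024 * MiB)  # 1 GiB would need 86 parts at 12 MiB
print(len(big_starts))  # 18: capped, with each part enlarged to about 57 MiB

# Stand-in for the fetched parts; joining them rebuilds the whole file in memory.
data = bytes(range(256)) * 1000
starts, ends = plan_ranges(len(data), request_size=10_000, max_concurrent_requests=4)
parts = [data[s:e] for s, e in zip(starts, ends)]
assert b"".join(parts) == data
```

Once the parts are joined, every later read, seek, or tell is a pure in-memory operation, which is what makes repeated random access cheap after the initial load.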

close

close() -> None

Close the reader and release the in-memory buffer.

read

read(size: int = -1) -> bytes

Read up to size bytes from the cached file.

readable

readable() -> bool

Return True, indicating this reader supports reading.

readall

readall() -> bytes

Read the entire cached file.

seek

seek(offset: int, whence: int = 0) -> int

Move the file position within the cached data.

seekable

seekable() -> bool

Return True, indicating this reader supports seeking.

tell

tell() -> int

Return the current position in the cached file.

writable

writable() -> bool

Return False, indicating this reader does not support writing.

obspec_utils.readers.BlockStoreReader

A file-like reader that uses concurrent range requests for efficient block fetching.

This reader divides the file into fixed-size blocks and uses get_ranges() to fetch multiple blocks concurrently. An LRU cache stores recently accessed blocks to avoid redundant fetches.

This is particularly efficient for workloads that access multiple non-contiguous regions of a file.

When to Use

Use BlockStoreReader when:

Sparse access patterns: Reading many non-contiguous regions of a file.
Large files with partial reads: When you only need portions of a large file and don't want to load it all into memory.
Memory-constrained environments: The LRU cache has bounded memory usage (block_size * max_cached_blocks), regardless of file size.
Unknown access patterns: When you don't know upfront which parts of the file you'll need.

Consider alternatives when:

You'll read the entire file anyway → use EagerStoreReader
Access is purely sequential → use BufferedStoreReader
You need repeated access to more data than fits in the cache → use EagerStoreReader to avoid re-fetching evicted blocks
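The last point reflects a general property of LRU caches: if a repeated scan cycles over more blocks than the cache holds, each block is evicted just before it is needed again, so every access becomes a fetch. The simulation below counts fetches for a hypothetical access pattern, using `collections.OrderedDict` as the LRU; it models the eviction policy only, not BlockStoreReader's internals.

```python
from collections import OrderedDict


def count_fetches(accesses: list[int], max_cached_blocks: int) -> int:
    """Count cache misses for a sequence of block indices under LRU eviction."""
    cache: OrderedDict[int, None] = OrderedDict()
    fetches = 0
    for block in accesses:
        if block in cache:
            cache.move_to_end(block)  # hit: mark most recently used
        else:
            fetches += 1  # miss: block must be fetched from the store
            cache[block] = None
            if len(cache) > max_cached_blocks:
                cache.popitem(last=False)  # evict least recently used
    return fetches


scan = list(range(5)) * 3  # read blocks 0..4 three times over
print(count_fetches(scan, max_cached_blocks=5))  # 5: working set fits, only the first pass fetches
print(count_fetches(scan, max_cached_blocks=4))  # 15: working set exceeds the cache, every access misses
```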

closed property

closed: bool

Return True if the reader has been closed.

Store

Bases: Get, GetRanges, Head, Protocol

Store protocol required by BlockStoreReader.

Combines Get, GetRanges, and Head from obspec.

__enter__

__enter__() -> 'BlockStoreReader'

Enter the context manager.

__exit__

__exit__(exc_type, exc_val, exc_tb) -> None

Exit the context manager and close the reader.

__init__

__init__(
    store: Store,
    path: str,
    block_size: int = 1024 * 1024,
    max_cached_blocks: int = 64,
) -> None

Create a block-based reader with LRU caching.

Parameters:

store (Store) –

Any object implementing Get, GetRanges, and Head.
path (str) –

The path to the file within the store.
block_size (int, default: 1024 * 1024 ) –

Size of each block in bytes. Default is 1 MB, tuned for cloud object stores where HTTP request overhead is significant. Smaller blocks mean more granular caching but more requests.
max_cached_blocks (int, default: 64 ) –

Maximum number of blocks to keep in the LRU cache. Default is 64, giving a 64 MB cache with the default block size.

close

close() -> None

Close the reader and release the block cache.

read

read(size: int = -1) -> bytes

Read up to size bytes from the file.

Parameters:

size (int, default: -1 ) –

Number of bytes to read. If -1, read from the current position to the end of the file.

Returns:

bytes –

The data read from the file.

readable

readable() -> bool

Return True, indicating this reader supports reading.

readall

readall() -> bytes

Read the entire file.

Returns:

bytes –

The complete file contents.

seek

seek(offset: int, whence: int = 0) -> int

Move the file position.

Parameters:

offset (int) –

Position offset.
whence (int, default: 0 ) –

Reference point: 0=start (SEEK_SET), 1=current (SEEK_CUR), 2=end (SEEK_END).

Returns:

int –

The new absolute position.

seekable

seekable() -> bool

Return True, indicating this reader supports seeking.

tell

tell() -> int

Return the current file position.

Returns:

int –

Current position in bytes from start of file.

writable

writable() -> bool

Return False, indicating this reader does not support writing.
