Readers

obspec_utils.readers.BufferedStoreReader

A file-like reader with buffered on-demand reads.

This class provides a file-like interface (read, seek, tell) on top of any object store. The reader uses get_range() calls to fetch data on-demand, with optional read-ahead buffering for efficiency.
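The read-ahead idea can be illustrated with a minimal sketch. This is not the obspec_utils implementation: InMemoryStore and SketchBufferedReader are hypothetical names, the store only mimics a get_range(path, start=..., end=...) call, and the file size is passed in by hand rather than discovered from the store.

```python
class InMemoryStore:
    """Hypothetical stand-in for an object store; counts range requests."""

    def __init__(self, objects: dict[str, bytes]) -> None:
        self._objects = objects
        self.requests = 0

    def get_range(self, path: str, *, start: int, end: int) -> bytes:
        # Return the bytes in [start, end) and record that a request happened.
        self.requests += 1
        return self._objects[path][start:end]


class SketchBufferedReader:
    """Toy read-ahead reader (assumption: not the library's actual logic)."""

    def __init__(self, store: InMemoryStore, path: str, size: int, buffer_size: int = 8) -> None:
        self._store = store
        self._path = path
        self._size = size
        self._buffer_size = buffer_size
        self._pos = 0
        self._buf = b""
        self._buf_start = 0  # absolute offset of the first buffered byte

    def read(self, n: int) -> bytes:
        n = min(n, self._size - self._pos)
        if n <= 0:
            return b""
        buf_end = self._buf_start + len(self._buf)
        if not (self._buf_start <= self._pos and self._pos + n <= buf_end):
            # Buffer miss: fetch the requested bytes plus some read-ahead.
            end = min(self._pos + max(n, self._buffer_size), self._size)
            self._buf = self._store.get_range(self._path, start=self._pos, end=end)
            self._buf_start = self._pos
        offset = self._pos - self._buf_start
        data = self._buf[offset:offset + n]
        self._pos += n
        return data


store = InMemoryStore({"data.bin": b"abcdefghijklmnopqrstuvwxyz"})
reader = SketchBufferedReader(store, "data.bin", size=26, buffer_size=8)
first = reader.read(3)   # miss: one request covering bytes 0 to 8
second = reader.read(3)  # hit: served from the buffer, no request
third = reader.read(3)   # miss: the read runs past the buffered bytes
```

In this sketch, three small reads cost two range requests instead of three. A larger buffer_size trades memory for fewer requests on forward reads.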

When to Use

Use BufferedStoreReader when:

  • Sequential reading: Best for workloads that mostly read forward through a file and seek backward only rarely.
  • Simple use cases: When you need a basic file-like interface without caching or concurrent fetching.
  • Streaming data: Processing data as it arrives without loading the full file into memory.

Consider alternatives when:

See Also

closed property

closed: bool

Return True if the reader has been closed.

Store

Bases: Get, GetRange, Head, Protocol

Store protocol required by BufferedStoreReader.

Combines Get, GetRange, and Head from obspec.

__enter__

__enter__() -> 'BufferedStoreReader'

Enter the context manager.

__exit__

__exit__(exc_type, exc_val, exc_tb) -> None

Exit the context manager and close the reader.
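The context-manager behaviour described above can be sketched with a toy class. SketchReader is a hypothetical stand-in, not BufferedStoreReader itself; it only shows the general pattern in which __enter__ returns the reader and __exit__ calls close().

```python
class SketchReader:
    """Hypothetical reader showing the enter/exit/close pattern."""

    def __init__(self) -> None:
        self._buffer = b"buffered bytes"
        self._closed = False

    @property
    def closed(self) -> bool:
        return self._closed

    def close(self) -> None:
        # Drop the buffer so its memory can be reclaimed.
        self._buffer = b""
        self._closed = True

    def __enter__(self) -> "SketchReader":
        return self

    def __exit__(self, exc_type, exc_val, exc_tb) -> None:
        # Runs on normal exit and when an exception propagates.
        self.close()


with SketchReader() as reader:
    open_inside = not reader.closed
closed_after = reader.closed
```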

__init__

__init__(store: Store, path: str, buffer_size: int = 1024 * 1024) -> None

Create a file-like reader for any object store.

Parameters:

  • store (Store) –

    Any object implementing Get, GetRange, and Head.

  • path (str) –

    The path to the file within the store.

  • buffer_size (int, default: 1024 * 1024 ) –

    Read-ahead buffer size in bytes. When reading, up to this many bytes may be fetched ahead to reduce the number of requests.

close

close() -> None

Close the reader and release the read-ahead buffer.

read

read(size: int = -1) -> bytes

Read up to size bytes from the file.

Parameters:

  • size (int, default: -1 ) –

    Number of bytes to read. If -1, read from current position to end.

Returns:

  • bytes

    The data read from the file.

readable

readable() -> bool

Return True, indicating this reader supports reading.

readall

readall() -> bytes

Read the entire file.

Returns:

  • bytes

    The complete file contents.

seek

seek(offset: int, whence: int = 0) -> int

Move the file position.

Parameters:

  • offset (int) –

    Position offset.

  • whence (int, default: 0 ) –

    Reference point: 0=start (SEEK_SET), 1=current (SEEK_CUR), 2=end (SEEK_END).

Returns:

  • int

    The new absolute position.
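The three whence modes map to an absolute position as in the sketch below. resolve_seek is a hypothetical helper, and rejecting negative results with ValueError is an assumption borrowed from Python's standard io objects, not something this page states about the reader.

```python
import io


def resolve_seek(position: int, file_size: int, offset: int, whence: int = io.SEEK_SET) -> int:
    """Return the absolute position a seek would land on."""
    if whence == io.SEEK_SET:    # 0: relative to the start of the file
        new_position = offset
    elif whence == io.SEEK_CUR:  # 1: relative to the current position
        new_position = position + offset
    elif whence == io.SEEK_END:  # 2: relative to the end of the file
        new_position = file_size + offset
    else:
        raise ValueError(f"invalid whence: {whence}")
    if new_position < 0:
        raise ValueError("negative seek position")
    return new_position


# Current position 10 in a 100-byte file.
from_start = resolve_seek(10, 100, 5)
from_current = resolve_seek(10, 100, 5, io.SEEK_CUR)
from_end = resolve_seek(10, 100, -4, io.SEEK_END)
```

With SEEK_END the offset is normally zero or negative, so seek(-4, 2) positions the reader four bytes before the end of the file.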

seekable

seekable() -> bool

Return True, indicating this reader supports seeking.

tell

tell() -> int

Return the current file position.

Returns:

  • int

    Current position in bytes from start of file.

writable

writable() -> bool

Return False, indicating this reader does not support writing.

obspec_utils.readers.EagerStoreReader

A file-like reader that eagerly loads the entire file into memory.

This reader fetches the complete file when it is created and then serves all subsequent reads from the in-memory cache. It is useful for files that are read multiple times or that require frequent seeking.

By default, the file is fetched using concurrent range requests via get_ranges(), which can significantly improve load time for large files. The defaults (12 MB request size, max 18 concurrent requests) are tuned for cloud storage. The file size is determined automatically via a HEAD request.

The concurrent fetching strategy is based on Icechunk's approach: github.com/earth-mover/icechunk/blob/main/icechunk/src/storage/mod.rs
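The general shape of a concurrent eager load can be sketched as follows. This is an illustration only: fetch_range is a hypothetical stand-in for a single range request, and a thread pool is swapped in for the store's get_ranges() call, whose concurrency is handled by the store itself.

```python
from concurrent.futures import ThreadPoolExecutor

PAYLOAD = bytes(range(256)) * 4  # 1024 bytes standing in for a remote file


def fetch_range(start: int, end: int) -> bytes:
    """Hypothetical stand-in for one range request against a store."""
    return PAYLOAD[start:end]


def eager_load(file_size: int, request_size: int, max_workers: int) -> bytes:
    # Split [0, file_size) into consecutive ranges of at most request_size bytes.
    ranges = [
        (start, min(start + request_size, file_size))
        for start in range(0, file_size, request_size)
    ]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so the parts join correctly
        # even when the requests finish out of order.
        parts = list(pool.map(lambda r: fetch_range(*r), ranges))
    return b"".join(parts)


loaded = eager_load(len(PAYLOAD), request_size=100, max_workers=4)
```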

When to Use

Use EagerStoreReader when:

  • Reading the entire file: When you know you'll need most or all of the file's contents.
  • Repeated random access: After the initial load, any byte is accessible with no network latency.
  • Small to medium files: Files that fit comfortably in memory.
  • Concurrent initial fetch: The default settings use concurrent requests for faster download on cloud storage.

Consider alternatives when:

See Also

closed property

closed: bool

Return True if the reader has been closed.

Store

Bases: Get, GetRanges, Head, Protocol

Store protocol required by EagerStoreReader.

Combines Get, GetRanges, and Head from obspec.

__enter__

__enter__() -> 'EagerStoreReader'

Enter the context manager.

__exit__

__exit__(exc_type, exc_val, exc_tb) -> None

Exit the context manager and close the reader.

__init__

__init__(
    store: Store,
    path: str,
    request_size: int = 12 * 1024 * 1024,
    file_size: int | None = None,
    max_concurrent_requests: int = 18,
) -> None

Create an eager reader that loads the entire file into memory.

The file is fetched immediately and cached in memory.

Parameters:

  • store (Store) –

    Any object implementing Get, GetRanges, and Head.

  • path (str) –

    The path to the file within the store.

  • request_size (int, default: 12 * 1024 * 1024 ) –

    Target size for each concurrent range request in bytes. Default is 12 MB, tuned for cloud storage throughput. The file will be divided into parts of this size and fetched using get_ranges().

  • file_size (int | None, default: None ) –

    File size in bytes. If not provided, the size is determined via store.head(). Pass this to skip the HEAD request if you already know the file size.

  • max_concurrent_requests (int, default: 18 ) –

    Maximum number of concurrent range requests. Default is 18. If the file would require more requests than this, request sizes are increased to fit within this limit.
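One way request_size and max_concurrent_requests could interact is sketched below. plan_ranges is a hypothetical helper written from the parameter descriptions above; the exact rounding used by EagerStoreReader is an assumption.

```python
def plan_ranges(
    file_size: int,
    request_size: int = 12 * 1024 * 1024,
    max_concurrent_requests: int = 18,
) -> list[tuple[int, int]]:
    """Split a file into (start, end) byte ranges, end exclusive."""
    if file_size <= 0:
        return []
    # Ceiling division: number of requests needed at the target size.
    num_requests = (file_size + request_size - 1) // request_size
    if num_requests > max_concurrent_requests:
        # Too many requests: grow each request so the count fits the limit.
        num_requests = max_concurrent_requests
        request_size = (file_size + num_requests - 1) // num_requests
    return [
        (start, min(start + request_size, file_size))
        for start in range(0, file_size, request_size)
    ]


small = plan_ranges(30, request_size=10, max_concurrent_requests=18)
capped = plan_ranges(100, request_size=10, max_concurrent_requests=4)
one_gib = plan_ranges(1024 * 1024 * 1024)  # uses the documented defaults
```

In the capped case, ten 10-byte requests would exceed the limit of four, so the sketch widens each request to 25 bytes.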

close

close() -> None

Close the reader and release the in-memory buffer.

read

read(size: int = -1) -> bytes

Read up to size bytes from the cached file.

readable

readable() -> bool

Return True, indicating this reader supports reading.

readall

readall() -> bytes

Read the entire cached file.

seek

seek(offset: int, whence: int = 0) -> int

Move the file position within the cached data.

seekable

seekable() -> bool

Return True, indicating this reader supports seeking.

tell

tell() -> int

Return the current position in the cached file.

writable

writable() -> bool

Return False, indicating this reader does not support writing.

obspec_utils.readers.BlockStoreReader

A file-like reader that uses concurrent range requests for efficient block fetching.

This reader divides the file into fixed-size blocks and uses get_ranges() to fetch multiple blocks with concurrency. An LRU cache stores recently accessed blocks to avoid redundant fetches.

This is particularly efficient for workloads that access multiple non-contiguous regions of a file.
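With fixed-size blocks, the blocks that a read touches follow from integer division. The sketch below shows that arithmetic; blocks_for_read is a hypothetical helper and is not part of the BlockStoreReader API.

```python
def blocks_for_read(position: int, size: int, block_size: int = 1024 * 1024) -> range:
    """Return the indices of the blocks covering [position, position + size)."""
    if size <= 0:
        return range(0)
    first = position // block_size
    last = (position + size - 1) // block_size  # block holding the final byte
    return range(first, last + 1)


# Using 4-byte blocks to keep the numbers small.
within_one = list(blocks_for_read(0, 1, block_size=4))
straddling = list(blocks_for_read(3, 2, block_size=4))
aligned = list(blocks_for_read(4, 4, block_size=4))
```

A read that straddles a block boundary needs both neighbouring blocks, which is why a reader of this kind can request several blocks in one get_ranges() call.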

When to Use

Use BlockStoreReader when:

  • Sparse access patterns: Reading many non-contiguous regions of a file.
  • Large files with partial reads: When you only need portions of a large file and don't want to load it all into memory.
  • Memory-constrained environments: The LRU cache has bounded memory usage (block_size * max_cached_blocks), regardless of file size.
  • Unknown access patterns: When you don't know upfront which parts of the file you'll need.

Consider alternatives when:

See Also

closed property

closed: bool

Return True if the reader has been closed.

Store

Bases: Get, GetRanges, Head, Protocol

Store protocol required by BlockStoreReader.

Combines Get, GetRanges, and Head from obspec.

__enter__

__enter__() -> 'BlockStoreReader'

Enter the context manager.

__exit__

__exit__(exc_type, exc_val, exc_tb) -> None

Exit the context manager and close the reader.

__init__

__init__(
    store: Store,
    path: str,
    block_size: int = 1024 * 1024,
    max_cached_blocks: int = 64,
) -> None

Create a block-based reader with LRU caching.

Parameters:

  • store (Store) –

    Any object implementing Get, GetRanges, and Head.

  • path (str) –

    The path to the file within the store.

  • block_size (int, default: 1024 * 1024 ) –

    Size of each block in bytes. Default is 1 MB, tuned for cloud object stores where HTTP request overhead is significant. Smaller blocks mean more granular caching but more requests.

  • max_cached_blocks (int, default: 64 ) –

    Maximum number of blocks to keep in the LRU cache. Default is 64, giving a 64 MB cache with the default block size.
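A bounded LRU block cache can be sketched with collections.OrderedDict. BlockLRU is a hypothetical class for illustration; how BlockStoreReader stores its blocks internally is not described on this page.

```python
from collections import OrderedDict
from typing import Optional


class BlockLRU:
    """Toy LRU cache keyed by block index (assumption: not the library's class)."""

    def __init__(self, max_cached_blocks: int) -> None:
        self._max = max_cached_blocks
        self._blocks = OrderedDict()  # block index -> block bytes

    def get(self, index: int) -> Optional[bytes]:
        block = self._blocks.get(index)
        if block is not None:
            # Mark the block as most recently used.
            self._blocks.move_to_end(index)
        return block

    def put(self, index: int, data: bytes) -> None:
        self._blocks[index] = data
        self._blocks.move_to_end(index)
        while len(self._blocks) > self._max:
            # Evict the least recently used block (front of the dict).
            self._blocks.popitem(last=False)

    def cached_indices(self) -> list:
        return list(self._blocks)


cache = BlockLRU(max_cached_blocks=2)
cache.put(0, b"block-0")
cache.put(1, b"block-1")
cache.get(0)              # block 0 becomes the most recently used
cache.put(2, b"block-2")  # evicts block 1, the least recently used
remaining = cache.cached_indices()

# Upper bound on cached bytes with the documented defaults.
max_cache_bytes = (1024 * 1024) * 64
```

The bound holds regardless of file size because the cache never holds more than max_cached_blocks blocks of at most block_size bytes each.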

close

close() -> None

Close the reader and release the block cache.

read

read(size: int = -1) -> bytes

Read up to size bytes from the file.

Parameters:

  • size (int, default: -1 ) –

    Number of bytes to read. If -1, read from current position to end.

Returns:

  • bytes

    The data read from the file.

readable

readable() -> bool

Return True, indicating this reader supports reading.

readall

readall() -> bytes

Read the entire file.

Returns:

  • bytes

    The complete file contents.

seek

seek(offset: int, whence: int = 0) -> int

Move the file position.

Parameters:

  • offset (int) –

    Position offset.

  • whence (int, default: 0 ) –

    Reference point: 0=start (SEEK_SET), 1=current (SEEK_CUR), 2=end (SEEK_END).

Returns:

  • int

    The new absolute position.

seekable

seekable() -> bool

Return True, indicating this reader supports seeking.

tell

tell() -> int

Return the current file position.

Returns:

  • int

    Current position in bytes from start of file.

writable

writable() -> bool

Return False, indicating this reader does not support writing.