Protocols and duck-typing when using obspec-utils¶
This guide describes obspec-util's philosophy around protocols and duck-typing and provides recommendations for downstream users, such as VirtualiZarr parsers.
obspec's Recommended Approach¶
obspec uses independent protocols rather than a monolithic interface. Obspec-util's adoption of that philosophy can be summarized as:
- Compose flat, independent protocols for each use case
- Don't force unnecessary capabilities — requiring fewer operations means more backend compatibility
- Avoid hierarchical tiers — they create artificial coupling between unrelated capabilities
The short summary for VirtualiZarr parsers is that we recommend each parser should define exactly the protocols it needs:
from typing import Protocol
from obspec import Get, GetAsync, GetRange, GetRangeAsync, GetRanges, GetRangesAsync, Head, HeadAsync, List, ListAsync
# Kerchunk - truly minimal
class KerchunkProtocol(Get, GetAsync, Protocol):
"""Fetch whole objects only."""
# HDF5 - range requests + file size
class HDF5Protocol(GetRange, GetRangeAsync, Head, HeadAsync, Protocol):
"""Random access with metadata."""
# Zarr - enumeration + file size
class ZarrProtocol(List, ListAsync, Head, HeadAsync, Protocol):
"""Chunk discovery and size detection."""
# COG - parallel ranges + file size
class COGProtocol(GetRange, GetRangeAsync, GetRanges, GetRangesAsync, Head, HeadAsync, Protocol):
"""Parallel tile fetching."""
obspec-utils Internal Design¶
obspec-utils uses two patterns for protocol requirements:
Readers: Nested Store Protocols¶
Each reader defines its own nested Store protocol with exactly what it needs:
class BufferedStoreReader:
class Store(Get, GetRange, Protocol):
"""Requires Get + GetRange."""
pass
class EagerStoreReader:
class Store(Get, GetRanges, Protocol):
"""Requires Get + GetRanges (+ optional Head)."""
pass
Wrappers: Internal ReadableStore¶
Transparent proxy wrappers (CachingReadableStore, TracingReadableStore, SplittingReadableStore) share an internal ReadableStore protocol since they all need the same full read interface:
# Internal to obspec-utils (not exported)
class ReadableStore(Get, GetAsync, GetRange, GetRangeAsync, GetRanges, GetRangesAsync, Protocol):
"""Full read interface for transparent store wrappers."""
External consumers should compose their own protocols from obspec.
Generic Registry Design¶
The registry is generic with Get as the bound, allowing callers to specify their exact protocol requirements:
from typing import TypeVar, Generic
from obspec import Get
T = TypeVar("T", bound=Get)
class ObjectStoreRegistry(Generic[T]):
def __init__(self, stores: dict[Url, T] | None = None) -> None: ...
def register(self, url: Url, store: T) -> None: ...
def resolve(self, url: Url) -> tuple[T, Path]: ...
Usage with parser-specific protocols:
# Zarr workflow
registry: ObjectStoreRegistry[ZarrProtocol] = ObjectStoreRegistry({
"s3://bucket": s3_store,
})
store, path = registry.resolve(url) # store: ZarrProtocol
store.list(path) # OK
store.head(path) # OK
# Kerchunk workflow - less restrictive
registry: ObjectStoreRegistry[Get] = ObjectStoreRegistry({
"https://cdn.example.com": http_store, # Only needs Get
})
Why Not Protocol Tiers?¶
A tiered approach (MinimalStore → ReadableStore → ListableStore) creates artificial coupling:
| Tier approach | Problem |
|---|---|
ReadableStore bundles GetRange + GetRanges + Head |
Some range readers don't need Head (size passed explicitly) |
ReadableStore requires GetRanges |
Some backends only support single GetRange |
ListableStore requires all of ReadableStore |
ZarrParser needs List + Head, not GetRanges |
Flat composition avoids these issues — each protocol includes only what's actually needed.
Options for downstream users¶
Runtime Validation¶
Since Protocol isinstance() checks are unreliable, parsers should validate at call time:
class ZarrParser:
def __call__(self, url: str, registry: ObjectStoreRegistry) -> ManifestStore:
store, _ = registry.resolve(url)
if not (hasattr(store, "list") and hasattr(store, "head")):
raise TypeError(
f"ZarrParser requires List + Head protocols. "
f"{type(store).__name__} is missing required methods."
)
# ... proceed
We also recommend using static type checkers.
Escape Hatches¶
Provide parameters to reduce requirements where desired:
class ZarrParser:
def __init__(self, consolidated_metadata: dict | None = None):
self.consolidated_metadata = consolidated_metadata # Skip List requirement
class HDF5Parser:
def __init__(self, file_size: int | None = None):
self.file_size = file_size # Skip Head requirement
Backwards Compatibility¶
Can VirtualiZarr depend on obspec-utils without parser changes?
At runtime, resolve() returns the actual store object (e.g., S3Store), which has all methods. Type hints only affect static analysis.
| Layer | Behavior | Parser changes needed? |
|---|---|---|
| Runtime | Stores have all methods | No |
| Static typing | Type checkers see declared protocol | Depends on approach |
Migration Path¶
- Immediate: Duck typing — no changes, works at runtime, type checkers complain
- Incremental: Type-ignore pragmas —
store.list(path) # type: ignore[attr-defined] - Full type safety: Generic registry with parser-specific protocols
VirtualiZarr Implementation Guide¶
VirtualiZarr parsers should define their protocol requirements in VirtualiZarr, not in obspec-utils. This keeps obspec-utils minimal and lets VirtualiZarr evolve its requirements independently.
Defining Parser Protocols¶
In virtualizarr/parsers/protocols.py:
from typing import Protocol
from obspec import Get, GetAsync, GetRange, GetRangeAsync, Head, HeadAsync, List, ListAsync
class KerchunkStore(Get, GetAsync, Protocol):
"""Store protocol for Kerchunk-based parsers (pre-indexed offsets)."""
pass
class HDF5Store(GetRange, GetRangeAsync, Head, HeadAsync, Protocol):
"""Store protocol for HDF5 parsing (random access + file size)."""
pass
class ZarrStore(List, ListAsync, Head, HeadAsync, Protocol):
"""Store protocol for Zarr parsing (chunk discovery + sizes)."""
pass
Using Protocols in Parsers¶
Each parser uses its protocol for type hints and validates at runtime:
# virtualizarr/parsers/zarr.py
from typing import Protocol
from obspec import List, ListAsync, Head, HeadAsync
from obspec_utils import ObjectStoreRegistry
class ZarrStore(List, ListAsync, Head, HeadAsync, Protocol):
"""Store protocol for Zarr parsing."""
pass
class ZarrParser:
def __call__(
self,
url: str,
registry: ObjectStoreRegistry[ZarrStore],
) -> ManifestStore:
store, path = registry.resolve(url)
# Runtime validation with clear error message
missing = []
if not hasattr(store, "list"):
missing.append("List")
if not hasattr(store, "head"):
missing.append("Head")
if missing:
raise TypeError(
f"ZarrParser requires {', '.join(missing)} protocols. "
f"{type(store).__name__} does not support these operations. "
"Use S3Store, LocalStore, or another store with listing support."
)
# Type checker knows store has list() and head()
chunks = store.list(path)
# ...
Creating Typed Registries¶
Users create registries with the appropriate protocol for their workflow:
# For Zarr workflows
from virtualizarr.parsers.protocols import ZarrStore
registry: ObjectStoreRegistry[ZarrStore] = ObjectStoreRegistry({
"s3://my-bucket": S3Store(bucket="my-bucket"),
})
# Type checker enforces that only ZarrStore-compatible stores are registered
# and that resolved stores have list() and head() methods
Nested Store Protocol Pattern¶
Following obspec-utils' reader pattern, parsers can define their protocol as a nested class:
class ZarrParser:
class Store(List, ListAsync, Head, HeadAsync, Protocol):
"""Store protocol required by ZarrParser."""
pass
def __call__(
self,
url: str,
registry: ObjectStoreRegistry["ZarrParser.Store"],
) -> ManifestStore:
# ...
This is self-documenting — the protocol is defined alongside the parser that requires it.
Summary¶
- Flat composition over tiers — each consumer defines exactly the protocols it needs
- Generic registry with Get bound
- obspec-utils internal patterns:
- Readers use nested
Storeprotocols (each with specific requirements) - Wrappers share internal
ReadableStore - External consumers (like VirtualiZarr) should compose protocols from obspec directly
- Runtime validation in parsers with clear error messages
- Escape hatches where feasible (
file_size,consolidated_metadata) - Backwards compatible — duck typing works immediately; generics for full type safety