# Debugging Slow Data Access

This guide shows how to inspect the network requests your code makes, so you can diagnose data access that is slower than expected.

## Tracing Xarray Operations

Wrap your store with `TracingReadableStore` to record every request made while opening a dataset:

```python
import xarray as xr
from obstore.store import HTTPStore
from obspec_utils.wrappers import TracingReadableStore, RequestTrace
from obspec_utils.readers import EagerStoreReader

# Access sample NetCDF files over HTTP
store = HTTPStore.from_url("https://raw.githubusercontent.com/pydata/xarray-data/refs/heads/master/")

# Every request made through traced_store is recorded in trace
trace = RequestTrace()
traced_store = TracingReadableStore(store, trace)

path = "air_temperature.nc"

# EagerStoreReader fetches the full file rather than many small ranges
with EagerStoreReader(traced_store, path) as reader:
    ds = xr.open_dataset(reader, engine="scipy")
    var_names = list(ds.data_vars)

# Aggregate statistics over all recorded requests
summary = trace.summary()
print("Opening dataset required:")
print(f"  {summary['total_requests']} request(s)")
print(f"  {summary['total_bytes'] / 1e6:.2f} MB transferred")
print(f"Variables found: {var_names}")
```

```text
Opening dataset required:
  2 request(s)
  7.75 MB transferred
Variables found: ['air']
```

`RequestTrace` collects information about each request, including its byte range, timing, and request method. Use `summary()` for quick statistics, or inspect individual `RequestRecord` objects via `trace.requests`.
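To make the wrapping idea concrete, here is a minimal, self-contained sketch of the pattern: a wrapper forwards each range read to an underlying store and appends a record of it to a shared trace. The `InMemoryStore`, `TracingStore`, `get_range` method, and `Record` fields are hypothetical stand-ins chosen for illustration. This is not how `obspec_utils` implements `TracingReadableStore`, and the real `RequestRecord` attributes may differ.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Record:
    """One traced request (hypothetical fields, for illustration only)."""
    path: str
    start: int
    end: int  # exclusive
    duration_s: float

    @property
    def nbytes(self) -> int:
        return self.end - self.start


@dataclass
class Trace:
    requests: list[Record] = field(default_factory=list)

    def summary(self) -> dict:
        # Same two keys as the summary used in the example above
        return {
            "total_requests": len(self.requests),
            "total_bytes": sum(r.nbytes for r in self.requests),
        }


class InMemoryStore:
    """Hypothetical store serving byte ranges from a dict of bytes objects."""

    def __init__(self, files: dict[str, bytes]):
        self.files = files

    def get_range(self, path: str, start: int, end: int) -> bytes:
        return self.files[path][start:end]


class TracingStore:
    """Hypothetical wrapper: forwards every read and records it in a Trace."""

    def __init__(self, store, trace: Trace):
        self.store = store
        self.trace = trace

    def get_range(self, path: str, start: int, end: int) -> bytes:
        t0 = time.perf_counter()
        data = self.store.get_range(path, start, end)
        elapsed = time.perf_counter() - t0
        # Record the bytes actually returned, which may be fewer than requested
        self.trace.requests.append(Record(path, start, start + len(data), elapsed))
        return data


trace = Trace()
store = TracingStore(InMemoryStore({"data.bin": bytes(1000)}), trace)
store.get_range("data.bin", 0, 100)
store.get_range("data.bin", 100, 600)
print(trace.summary())  # {'total_requests': 2, 'total_bytes': 600}
```

Because the wrapper exposes the same read method as the store it wraps, calling code does not need to change to be traced; only the object passed in does.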

## Common Patterns to Look For

When analyzing traces, watch for:

| Pattern | Symptom | Solution |
|---|---|---|
| Many small requests | High request count, low bytes per request | Use `EagerStoreReader` to fetch the full file, or `BlockStoreReader` to fetch and cache larger blocks |
| Duplicate requests | Same file or range requested multiple times | Add `CachingReadableStore` |
| Sequential tiny reads | Many requests with incrementing offsets | Increase the buffer size or use eager loading |
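These checks can be automated over a list of recorded requests. The sketch below assumes each request is a hypothetical `(path, start, end)` tuple with an exclusive `end`; to apply it to a real trace you would need to map the `RequestRecord` attributes onto that shape. The thresholds `small_bytes` and `min_count` are arbitrary illustrative values, not recommendations from `obspec_utils`.

```python
from collections import Counter


def diagnose(requests, small_bytes=64 * 1024, min_count=10):
    """Flag the patterns from the table above.

    `requests` is a list of hypothetical (path, start, end) tuples, end exclusive.
    Thresholds are illustrative and should be tuned for your workload.
    """
    findings = []
    if not requests:
        return findings

    # Many small requests: high count, low average bytes per request
    total = sum(end - start for _, start, end in requests)
    avg = total / len(requests)
    if len(requests) >= min_count and avg < small_bytes:
        findings.append(
            f"many small requests: {len(requests)} requests, avg {avg:.0f} bytes"
        )

    # Duplicate requests: the same (path, start, end) seen more than once
    dupes = {key: n for key, n in Counter(requests).items() if n > 1}
    if dupes:
        findings.append(f"duplicate requests: {len(dupes)} range(s) fetched repeatedly")

    # Sequential tiny reads: a small read that starts where the previous one ended
    sequential = sum(
        1
        for (p0, _, e0), (p1, s1, e1) in zip(requests, requests[1:])
        if p0 == p1 and s1 == e0 and e1 - s1 < small_bytes
    )
    if sequential >= min_count - 1:
        findings.append(f"sequential tiny reads: {sequential} back-to-back small reads")

    return findings


# Synthetic input: 20 consecutive 4 KiB reads of one file, then the first range again
reqs = [("air.nc", i * 4096, (i + 1) * 4096) for i in range(20)]
reqs.append(("air.nc", 0, 4096))
for line in diagnose(reqs):
    print(line)
```

For this synthetic input all three checks fire: 21 requests averaging 4096 bytes, one range fetched twice, and 19 back-to-back small reads.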