Debugging Slow Data Access
This guide shows how to inspect the network requests your code makes when data access is slower than expected.
Tracing Xarray Operations
Wrap your store with TracingReadableStore to see what requests are made when opening a dataset:
import xarray as xr
from obstore.store import HTTPStore
from obspec_utils.wrappers import TracingReadableStore, RequestTrace
from obspec_utils.readers import EagerStoreReader
# Access sample NetCDF files over HTTP
store = HTTPStore.from_url("https://raw.githubusercontent.com/pydata/xarray-data/refs/heads/master/")
trace = RequestTrace()
traced_store = TracingReadableStore(store, trace)
path = "air_temperature.nc"
with EagerStoreReader(traced_store, path) as reader:
    ds = xr.open_dataset(reader, engine="scipy")
    var_names = list(ds.data_vars)
summary = trace.summary()
print("Opening dataset required:")
print(f" {summary['total_requests']} request(s)")
print(f" {summary['total_bytes'] / 1e6:.2f} MB transferred")
print(f"Variables found: {var_names}")
Opening dataset required:
2 request(s)
7.75 MB transferred
Variables found: ['air']
The RequestTrace collects information about each request, including byte ranges, timing, and request method. Use summary() for quick statistics or access individual RequestRecord objects via trace.requests.
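To make the difference between aggregate statistics and per-request records concrete, here is a minimal sketch that works on stand-in data rather than a live trace. The `FakeRecord` dataclass and its field names (`path`, `start`, `length`, `duration`, `method`) are assumptions made for illustration, not the actual `RequestRecord` attributes; check the library's API reference for the real names before adapting it.

```python
from dataclasses import dataclass


@dataclass
class FakeRecord:
    """Hypothetical stand-in for a per-request record (field names are assumed)."""
    path: str
    start: int       # byte offset where the requested range begins
    length: int      # number of bytes requested
    duration: float  # seconds spent on the request
    method: str      # name of the request method


# Illustrative data only; these are not measurements from a real store.
records = [
    FakeRecord("air_temperature.nc", 0, 1024, 0.12, "get_range"),
    FakeRecord("air_temperature.nc", 1024, 4096, 0.30, "get_range"),
    FakeRecord("air_temperature.nc", 0, 1024, 0.11, "get_range"),
]


def summarize(records):
    """Compute aggregate statistics from individual request records."""
    total_bytes = sum(r.length for r in records)
    return {
        "total_requests": len(records),
        "total_bytes": total_bytes,
        "mean_bytes_per_request": total_bytes / len(records) if records else 0.0,
        "total_duration": sum(r.duration for r in records),
    }


stats = summarize(records)
slowest = max(records, key=lambda r: r.duration)

print(f"{stats['total_requests']} request(s), {stats['total_bytes']} bytes")
print(f"Mean bytes per request: {stats['mean_bytes_per_request']:.0f}")
print(f"Slowest request starts at byte {slowest.start} ({slowest.duration:.2f}s)")
```

Aggregates like these answer "how much was transferred", while the individual records answer "which request was the expensive one".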
Common Patterns to Look For
When analyzing traces, watch for:
| Pattern | Symptom | Solution |
|---|---|---|
| Many small requests | High request count, low bytes per request | Use EagerStoreReader to fetch the full file, or BlockStoreReader to fetch and cache larger blocks |
| Duplicate requests | Same file/range requested multiple times | Add CachingReadableStore |
| Sequential tiny reads | Many requests with incrementing offsets | Increase buffer size or use eager loading |
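The three patterns in the table can be checked mechanically once requests are reduced to `(path, start, length)` tuples. The sketch below is one possible way to do that. The tuple layout, the helper names, and the 64 KiB "small request" threshold are all assumptions chosen for illustration and are not part of the library.

```python
from collections import Counter

# Each request is (path, start_offset, length_in_bytes). Illustrative data only.
requests = [
    ("data.nc", 0, 512),
    ("data.nc", 512, 512),
    ("data.nc", 1024, 512),
    ("data.nc", 0, 512),          # repeats the first range
    ("data.nc", 1_000_000, 512),
]

SMALL_REQUEST_BYTES = 64 * 1024  # assumed threshold; tune for your workload


def small_request_fraction(requests, threshold=SMALL_REQUEST_BYTES):
    """Fraction of requests smaller than the threshold (many small requests)."""
    if not requests:
        return 0.0
    small = sum(1 for _, _, length in requests if length < threshold)
    return small / len(requests)


def duplicate_ranges(requests):
    """Map each (path, start, length) requested more than once to its count."""
    counts = Counter(requests)
    return {req: n for req, n in counts.items() if n > 1}


def sequential_pairs(requests):
    """Count consecutive requests that start exactly where the previous one ended."""
    pairs = 0
    for (p1, s1, l1), (p2, s2, _) in zip(requests, requests[1:]):
        if p1 == p2 and s2 == s1 + l1:
            pairs += 1
    return pairs


print(f"Small requests: {small_request_fraction(requests):.0%}")
print(f"Duplicate ranges: {duplicate_ranges(requests)}")
print(f"Sequential pairs: {sequential_pairs(requests)}")
```

A high small-request fraction combined with many sequential pairs suggests eager or block-based reading would help, while any entry in the duplicates mapping is a candidate for caching.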