Accessing remote Zarr arrays

In ironArray it is possible to create a proxy of a Zarr array just by specifying its path, whether the array is stored locally or remotely. The data stays in the original Zarr container and is only retrieved when it is needed.
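
The proxy idea can be illustrated with a small, hypothetical Python class. This is not ironArray's actual implementation. It keeps only the array metadata locally and calls a user-supplied `fetch_chunk` function, which stands in for a remote read, only when a chunk is requested. All names here (`LazyChunkProxy`, `fetch_chunk`, `get_chunk`) are made up for this sketch.

```python
from typing import Callable, List, Tuple

ChunkIndex = Tuple[int, ...]


class LazyChunkProxy:
    """Hypothetical read-only proxy: stores metadata only, fetches chunks on demand."""

    def __init__(self, shape: Tuple[int, ...], chunks: Tuple[int, ...],
                 fetch_chunk: Callable[[ChunkIndex], bytes]):
        self.shape = shape    # metadata is cheap to keep locally
        self.chunks = chunks
        self._fetch_chunk = fetch_chunk  # stands in for a remote read (e.g. from S3)
        self.fetched: List[ChunkIndex] = []  # chunk indices retrieved so far

    def get_chunk(self, index: ChunkIndex) -> bytes:
        # Data is only retrieved here, when a caller actually asks for it.
        self.fetched.append(index)
        return self._fetch_chunk(index)

    def __setitem__(self, key, value):
        # Mirrors the read-only nature of Zarr proxies described in this tutorial.
        raise TypeError("this proxy is read-only")


# Usage: a fake "remote store" that returns placeholder bytes.
proxy = LazyChunkProxy((744, 721, 1440), (372, 150, 150),
                       fetch_chunk=lambda idx: b"\x00" * 4)
print(proxy.fetched)   # [] : nothing is retrieved when the proxy is created
proxy.get_chunk((0, 0, 0))
print(proxy.fetched)   # [(0, 0, 0)]
```

Keeping the fetch function injectable makes the laziness easy to observe: creating the proxy never calls it, and only explicit chunk requests do.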

To see how this works, we will create an on-disk ironArray proxy of the whole Zarr array already used in the Reductions tutorial. We will then open it, slice it, compute some reductions and finally get a regular ironArray array from the slice. Let’s go!

The Zarr proxy array

We will first create our ironArray proxy from a Zarr array stored in the cloud by giving its path. Since the array lives in an S3 bucket, the path must start with s3://:

[1]:
import iarray as ia

# Build the S3 path of the ERA5 hourly precipitation array for October 1987
year = 1987
month = 10
datestring = "s3://era5-pds/zarr/{year}/{month:02d}/data/".format(year=year, month=month)
zarr_urlpath = datestring + "precipitation_amount_1hour_Accumulation.zarr/precipitation_amount_1hour_Accumulation"
# Create an on-disk proxy; the data itself stays in the remote Zarr array
precip = ia.zarr_proxy(zarr_urlpath, urlpath="precip.iarr", mode="w")
precip.info
[1]:
type      IArray
shape     (744, 721, 1440)
chunks    (372, 150, 150)
blocks    (372, 150, 150)
cratio    17.60

As can be seen, the constructor also accepts Config properties such as urlpath or mode.

Note that Zarr does not always report the compressed size of a remotely stored array; when it does not, the compression ratio shown in the info is not meaningful (it is then displayed as negative). The shape, chunks and data type are retrieved from the original array.
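
The path passed to the proxy constructor is built by plain string formatting. The same pattern can be wrapped in a small helper. The function `era5_zarr_urlpath` is hypothetical and not part of ironArray; it assumes that other variables in this bucket follow the same `<variable>.zarr/<variable>` layout as the precipitation array, which this tutorial only shows for one variable.

```python
def era5_zarr_urlpath(year: int, month: int, variable: str) -> str:
    """Hypothetical helper: S3 path of one ERA5 variable stored as Zarr."""
    # The "s3://" scheme marks the array as living in an S3 bucket.
    base = f"s3://era5-pds/zarr/{year}/{month:02d}/data/"
    # Assumed layout: <variable>.zarr/<variable>, as in the example above.
    return f"{base}{variable}.zarr/{variable}"


urlpath = era5_zarr_urlpath(1987, 10, "precipitation_amount_1hour_Accumulation")
print(urlpath)
```

For year 1987 and month 10 this reproduces exactly the zarr_urlpath used in the first cell.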

The proxy itself holds no data, as the space it takes on the filesystem shows:

[2]:
! du -sh "precip.iarr"
4,0K    precip.iarr

So although this array gives access to a large amount of data (744 × 721 × 1440 float32 values, about 3.1 GB uncompressed), the proxy only takes around 4 KB on disk.
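
The size of the data behind the proxy follows directly from the shape in precip.info and the 4-byte float32 data type. The last line takes the reported cratio of 17.60 at face value, which is an assumption given the caveat about compressed sizes of remote arrays.

```python
import math

shape = (744, 721, 1440)   # shape reported by precip.info
itemsize = 4               # np.float32 uses 4 bytes per element

n_elements = math.prod(shape)
nbytes = n_elements * itemsize   # uncompressed size in bytes

print(n_elements)                        # 772450560
print(f"{nbytes / 1e9:.2f} GB")          # 3.09 GB uncompressed
print(f"{nbytes / 17.60 / 1e6:.0f} MB")  # 176 MB if cratio 17.60 is accurate
```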

This opens the door to using external Zarr arrays as if they were native ironArray arrays: all the computing machinery in ironArray works seamlessly with Zarr proxies.

Let’s look at some operations that ironArray can perform on top of Zarr proxies. For example, here is how to open a Zarr proxy and get a slice of it:

[3]:
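# Reopen the proxy from disk; the data still lives in the Zarr array
precip2 = ia.open("precip.iarr")
print(precip2)
# Slice the proxy like any other ironArray array
precip_slice = precip2[:300, :400, :500]
print(precip_slice)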
<IArray (744, 721, 1440) np.float32>
<IArray (300, 400, 500) np.float32>
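
Assuming the proxy retrieves data chunk by chunk following the chunk layout reported by precip.info (an assumption about how retrieval is organized, not something stated in this tutorial), we can estimate how much of the remote array a slice such as precip2[:300, :400, :500] needs. The helpers `chunks_touched` and `n_chunks` are hypothetical and only handle slices that start at index 0.

```python
import math
from itertools import product


def chunks_touched(slice_stops, chunks):
    """Chunk indices overlapped by a slice [:stop0, :stop1, ...] starting at 0."""
    per_axis = [range(math.ceil(stop / c)) for stop, c in zip(slice_stops, chunks)]
    return list(product(*per_axis))


def n_chunks(shape, chunks):
    """Total number of chunks in the chunk grid (partial edge chunks included)."""
    return math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))


shape = (744, 721, 1440)
chunks = (372, 150, 150)
touched = chunks_touched((300, 400, 500), chunks)
print(len(touched), "of", n_chunks(shape, chunks))  # 12 of 100
```

Under that assumption, the slice overlaps 1 × 3 × 4 = 12 of the 100 chunks, so most of the remote array would never need to be read.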

The next sections show some other operations.

Reductions

Reductions can operate on top of Zarr proxies or slices of them:

[4]:
ia.sum(precip_slice)[()]
[4]:
6429.312
[5]:
ia.prod(precip_slice)[()]
[5]:
0.0

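The product is 0.0, most likely because at least one value in the slice is zero (or because multiplying many values smaller than 1 underflows float32). Reductions can also be computed along some axes only: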

[6]:
red_sum = ia.sum(precip_slice, axis=(0, 2))
red_sum
[6]:
<IArray (400,) np.float32>
[7]:
red_prod = ia.prod(precip_slice, axis=(1, 2))
red_prod
[7]:
<IArray (300,) np.float32>
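
The output shapes above follow the NumPy convention: the axes listed in axis are reduced away and the remaining ones are kept. A minimal NumPy sketch on a small stand-in array (not the actual precipitation data) shows the same behaviour, including a product that collapses to zero as soon as one element is zero.

```python
import numpy as np

# Small stand-in for precip_slice, with the same number of dimensions.
a = np.arange(3 * 4 * 5, dtype=np.float32).reshape(3, 4, 5)

total = a.sum()                           # reduce over all axes -> scalar
sum_keep_axis1 = a.sum(axis=(0, 2))       # axis 1 is kept -> shape (4,)
prod_keep_axis0 = a.prod(axis=(1, 2))     # axis 0 is kept -> shape (3,)

print(total, sum_keep_axis1.shape, prod_keep_axis0.shape)
# a[0] contains the value 0, so its product is 0
print(prod_keep_axis0[0])
```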

Convert proxy into an ironArray array

If you want to work with the data as a regular ironArray array, you can make a copy of the proxy: this creates a new ironArray array that holds all the data, leaving the original Zarr array untouched. Let’s do it only for the slice:

[8]:
# Copy the slice into a regular on-disk ironArray array holding the data
iarr = precip_slice.copy(urlpath="copy.iarr", mode="w")

! du -sh "copy.iarr"
26M     copy.iarr

As can be seen, the copy takes much more space than the proxy (26 MB versus 4 KB), since it actually stores the data of the slice.
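
To put the 26 MB in perspective, a back-of-the-envelope calculation compares it with the uncompressed size of the slice. It assumes that du -sh reports mebibytes (as GNU du does with -h) and ignores du's rounding, so the ratio is only approximate.

```python
import math

slice_shape = (300, 400, 500)   # shape of precip_slice
itemsize = 4                    # np.float32 uses 4 bytes per element
on_disk = 26 * 2**20            # "26M" from du -sh, taken as 26 MiB

nbytes = math.prod(slice_shape) * itemsize
print(nbytes)                        # 240000000 bytes uncompressed
print(round(nbytes / on_disk, 1))    # 8.8, a rough compression ratio
```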

Conclusions

With the Zarr proxy functionality you can access local or remote Zarr arrays as if they were native ironArray arrays, so all the machinery of ironArray can be used on top of them.

The only limitation is that Zarr proxies are read-only: writing to them is not supported. This may change in a future version.