Reductions

ironArray supports a broad range of reductions, such as sum, min, max, mean and others, and they can operate over any single dimension or group of dimensions. One interesting aspect is that the implementation leverages the multi-threading capabilities of ironArray, so reductions can be pretty fast (although sometimes they need some help from the user).

To exercise some of this functionality, let’s use precipitation data covering a period of 3 months; the uncompressed size of this dataset is about 9 GB. If the data has not been downloaded yet, fetching it should not take more than a couple of minutes. Let’s go:

[1]:
%load_ext memprofiler

import iarray as ia
[2]:
%run fetch_data.py
Dataset precip-3m.iarr is already here!
[3]:
!du -sh precip-3m*.iarr
1,1G    precip-3m-optimal.iarr
672M    precip-3m.iarr

The whole dataset is now stored in a single file of less than 1 GB, which is more than 10x smaller than the original dataset thanks to compression. That’s a big win! In addition, there is an assortment of other, smaller files for the purposes of the tutorials.

Ok. Now, let’s import this data into ironArray before proceeding with reductions:

[4]:
%%time

ia_precip = ia.load("precip-3m.iarr")
print(ia_precip)
print("cratio: ", round(ia_precip.cratio, 2))
<IArray (3, 720, 721, 1440) np.float32>
cratio:  15.43
CPU times: user 968 µs, sys: 173 ms, total: 174 ms
Wall time: 173 ms
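
The 9 GB figure quoted earlier can be checked from the shape and dtype printed above. The following is a minimal sketch using plain NumPy only for the arithmetic (it does not touch the ironArray container); the variable names are illustrative:

```python
import numpy as np

# Shape and dtype as reported by print(ia_precip) above
shape = (3, 720, 721, 1440)
itemsize = np.dtype(np.float32).itemsize  # 4 bytes per element

# Number of elements times bytes per element
nbytes = int(np.prod(shape, dtype=np.int64)) * itemsize

print(f"uncompressed size: {nbytes / 1e9:.2f} GB ({nbytes / 2**30:.2f} GiB)")
```

Dividing this figure by the size of the stored data gives a rough idea of the compression ratio, although the exact value reported by `cratio` depends on how ironArray accounts for the compressed data.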

Ok, so ironArray achieves a compression ratio of about 15x here, which greatly reduces memory consumption. Now, let’s have a look at how reductions work:

[5]:
%%mprof_run 0.mean

reduc0 = ia.mean(ia_precip, axis=(0, 2, 3)).data
memprofiler: used 37.58 MiB RAM (peak of 1218.75 MiB) in 5.7069 s, total RAM usage 929.01 MiB

Ok, so that’s pretty slow. Now, it is time to remember that ironArray uses chunked storage, even when it holds data in memory. In this case, we have been traversing the array in a very inefficient way. In general, with chunked storage it is better to start reducing along the largest dimension, and we did the opposite by starting with the smallest one. With this in mind, let’s try a more reasonable order:

[6]:
%%mprof_run 1.reorder_mean

reduc0 = ia.mean(ia_precip, axis=(3, 2, 0)).data
memprofiler: used 13.26 MiB RAM (peak of 13.72 MiB) in 0.6701 s, total RAM usage 942.36 MiB
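
Changing the axis order affects only how the data is traversed, not the values obtained. As a minimal sketch of that point, the snippet below uses plain NumPy on a small random array as a stand-in (an assumption made for illustration; it says nothing about how ironArray reduces chunks internally) and compares both orderings with a one-axis-at-a-time reduction that starts with the largest dimension:

```python
import numpy as np

# Small stand-in array with the same number of dimensions as the dataset
rng = np.random.default_rng(0)
small = rng.random((3, 8, 9, 16))

# Same set of axes, listed in the two orders used above
mean_a = small.mean(axis=(0, 2, 3))
mean_b = small.mean(axis=(3, 2, 0))

# One axis at a time, largest dimension first. Every group has the same
# number of elements, so the mean of means equals the overall mean.
mean_c = small.mean(axis=3).mean(axis=2).mean(axis=0)

print(mean_a.shape)
print(np.allclose(mean_a, mean_b), np.allclose(mean_a, mean_c))
```

In all three cases only the second dimension survives, which matches the reductions above, where the result keeps the dimension of length 720.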
[7]:
%mprof_plot 0.mean 1.reorder_mean