Reductions

ironArray supports a broad range of reductions, such as sum, min, max, mean and others, and they can operate over a single dimension or over a group of dimensions. One interesting aspect is that the implementation leverages the multi-threading capabilities of ironArray, so reductions can be pretty fast (although sometimes they need some help from the user).
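To make "a single dimension or a group of dimensions" concrete, here is a minimal sketch that uses NumPy as a stand-in. The `axis` argument follows the same convention as the `ia.mean(..., axis=(...))` calls used later in this tutorial; whether every ironArray reduction accepts exactly the same arguments is an assumption, and the small array `a` is made up for illustration.

```python
import numpy as np

# A small 3-D array standing in for a larger dataset (illustrative only).
a = np.arange(24, dtype=np.float32).reshape(2, 3, 4)

# Reduce over every dimension: a single scalar.
total = a.sum()

# Reduce over one dimension: the last axis disappears, shape becomes (2, 3).
per_row_max = a.max(axis=2)

# Reduce over a group of dimensions: axes 0 and 2 disappear, shape becomes (3,).
group_mean = a.mean(axis=(0, 2))

print(total, per_row_max.shape, group_mean)
```

The dimensions listed in `axis` are the ones that vanish from the result, so the output keeps only the dimensions that were not reduced.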

To exercise some of this functionality, let’s use precipitation data covering a period of 3 months. If the data has not been downloaded yet, the download should not take more than a couple of minutes. Let’s go:

[1]:
%load_ext memprofiler

import iarray as ia
[2]:
%run fetch_data.py
Dataset precip-3m.iarr is already here!
[3]:
!du -sh precip-3m*.iarr
1,1G    precip-3m-optimal.iarr
809M    precip-3m.iarr

The whole dataset is now stored in a single file of less than 1 GB, which is about 10x smaller than the original dataset thanks to compression. That’s a big win! In addition, there is an assortment of other, smaller files for the purposes of tutorials.

Ok. Now, let’s import this data into ironArray before proceeding with reductions:

[4]:
%%mprof_run

ia_precip = ia.load("precip-3m.iarr")
print(ia_precip)
print("cratio: ", round(ia_precip.cratio, 2))
<IArray (3, 720, 721, 1440) np.float32>
cratio:  12.82
memprofiler: used 809.81 MiB RAM (peak of 809.81 MiB) in 0.2004 s, total RAM usage 1032.87 MiB
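As a rough sanity check, the shape and dtype printed above are enough to estimate how large the array would be without compression. The sketch below is plain arithmetic, not an ironArray call; it assumes 4 bytes per `np.float32` element and treats the 809.81 MiB reported by memprofiler as the compressed in-memory size.

```python
from math import prod

shape = (3, 720, 721, 1440)   # shape printed for ia_precip
itemsize = 4                  # bytes per np.float32 element

# Size the array would occupy if stored uncompressed.
nbytes = prod(shape) * itemsize
uncompressed_mib = nbytes / 2**20

# Ratio against the memory that memprofiler reported for the load.
ratio_vs_ram = uncompressed_mib / 809.81

print(nbytes)                      # 8970393600
print(round(uncompressed_mib, 1))  # uncompressed size in MiB
print(round(ratio_vs_ram, 1))      # rough memory saving factor
```

This back-of-the-envelope figure need not match the reported `cratio` exactly, since the two may be measured differently.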

Ok, so ironArray achieves a compression ratio of more than 10x, which is a big win in terms of memory consumption. Now, let’s have a look at how reduction works:

[5]:
%%mprof_run mean

reduc0 = ia.mean(ia_precip, axis=(0, 2, 3)).data
memprofiler: used 40.75 MiB RAM (peak of 1204.70 MiB) in 5.1008 s, total RAM usage 1073.63 MiB

Ok, so that’s pretty slow. Now, it is time to remember that ironArray uses chunked storage, even when it holds data in-memory. In this case, we have been traversing the array in a very inefficient way. In general, with chunked storage it is better to start reducing along the largest dimension, and we did the opposite by starting with the smallest one. With this in mind, let’s try a more reasonable order:

[6]:
%%mprof_run reorder_mean

reduc0 = ia.mean(ia_precip, axis=(3, 2, 0)).data
memprofiler: used 1.30 MiB RAM (peak of 1.55 MiB) in 0.6539 s, total RAM usage 1075.16 MiB
[7]:
%mprof_plot mean reorder_mean
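With the reordered axes, the same mean takes about 0.65 s instead of 5.1 s and needs far less memory. One way to build intuition for this is to count how many elements are left after each step, assuming the reduction proceeds one axis at a time in the order given. That assumption is about the implementation and is not stated in this tutorial, and `intermediate_sizes` is a hypothetical helper written only for this sketch.

```python
from math import prod

def intermediate_sizes(shape, axes):
    """Element counts left after reducing each axis in turn, in the given order."""
    remaining = dict(enumerate(shape))  # axis index -> length
    sizes = []
    for ax in axes:
        del remaining[ax]               # this axis has been reduced away
        sizes.append(prod(remaining.values()))
    return sizes

shape = (3, 720, 721, 1440)

slow = intermediate_sizes(shape, (0, 2, 3))  # smallest axis first
fast = intermediate_sizes(shape, (3, 2, 0))  # largest axis first

print(slow)  # [747532800, 1036800, 720]
print(fast)  # [1557360, 2160, 720]
```

Both orders end with the same 720 values, but reducing the largest axis first shrinks the first intermediate result by a factor of 480 compared with the original order, which is consistent with the lower time and memory figures reported by memprofiler.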