ironArray supports a broad range of reduction facilities, such as mean and others, and they work on any dimension (or group of dimensions). One interesting aspect is that the implementation leverages the multi-threading capabilities of ironArray, so reductions can be pretty fast (although sometimes they need some help from the user).
In order to exercise some of this functionality, let’s use the precipitation data from a period of 3 months; the uncompressed size of this dataset is about 9 GB. In case the data has not been downloaded yet, the download should not take more than a couple of minutes. Let’s go:
%load_ext memprofiler
import iarray as ia
Dataset precip-3m.iarr is already here!
!du -sh precip-3m*.iarr
1,1G precip-3m-optimal.iarr
672M precip-3m.iarr
The whole dataset is now stored in a single file of less than 1 GB, which is more than 10x smaller than the original dataset thanks to compression. That’s a big win! In addition, there is an assortment of other, smaller files for the purposes of tutorials.
Ok. Now, let’s import this data into ironArray before proceeding with reductions:
%%time
ia_precip = ia.load("precip-3m.iarr")
print(ia_precip)
print("cratio: ", round(ia_precip.cratio, 2))
<IArray (3, 720, 721, 1440) np.float32>
cratio: 15.43
CPU times: user 968 µs, sys: 173 ms, total: 174 ms
Wall time: 173 ms
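The shape and dtype printed above are enough to check the sizes by hand. The following is a small sketch using only the Python standard library; it assumes that the compression ratio is defined as uncompressed bytes divided by compressed bytes, which is the usual convention but is not stated in this tutorial. The helper names are hypothetical and are not part of ironArray.

```python
import math

# Values taken from the output above.
shape = (3, 720, 721, 1440)
itemsize = 4      # bytes per element for np.float32
cratio = 15.43    # as reported by ia_precip.cratio


def uncompressed_nbytes(shape, itemsize):
    """Size in bytes of a dense array with this shape and element size."""
    return math.prod(shape) * itemsize


def implied_compressed_nbytes(nbytes, cratio):
    """Compressed size, ASSUMING cratio = uncompressed / compressed."""
    return nbytes / cratio


nbytes = uncompressed_nbytes(shape, itemsize)
print(f"uncompressed: {nbytes / 1e9:.2f} GB")
print(f"implied compressed: {implied_compressed_nbytes(nbytes, cratio) / 1e6:.0f} MB")
```

The first figure agrees with the "about 9 GB" quoted earlier. The second one is only an estimate, and it need not match the on-disk size reported by du exactly.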
Ok, so ironArray achieves a compression ratio of about 15x here, which is a big win in terms of memory consumption. Now, let’s have a look at how reduction works:
%%mprof_run 0.mean
reduc0 = ia.mean(ia_precip, axis=(0, 2, 3)).data
memprofiler: used 37.58 MiB RAM (peak of 1218.75 MiB) in 5.7069 s, total RAM usage 929.01 MiB
Ok, so that’s pretty slow: almost 6 seconds, with a memory peak of more than 1 GB. Now, it is time to remember that ironArray uses chunked storage, even when it holds data in-memory. In this case, we have been traversing the array in a very inefficient way. In general, in chunked storage, it is always better to start reducing by the dimension that is the largest, and we did just the opposite. With this in mind, let’s try with a more reasonable order:
%%mprof_run 1.reorder_mean
reduc0 = ia.mean(ia_precip, axis=(3, 2, 0)).data
memprofiler: used 13.26 MiB RAM (peak of 13.72 MiB) in 0.6701 s, total RAM usage 942.36 MiB
%mprof_plot 0.mean 1.reorder_mean
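The "largest dimension first" rule can be turned into a small helper. The sketch below uses plain NumPy rather than ironArray, so it does not reproduce the timings above; it only shows how the axis order could be derived from the shape, and that reducing one axis at a time gives the same mean regardless of the order. Both function names are hypothetical and are not part of the ironArray API.

```python
import numpy as np


def order_axes_largest_first(shape, axes):
    """Sort the axes to reduce so that the largest dimension comes first."""
    return tuple(sorted(axes, key=lambda ax: shape[ax], reverse=True))


def mean_one_axis_at_a_time(arr, axes):
    """Reduce sequentially in the given order; keepdims keeps axis numbers stable."""
    out = arr
    for ax in axes:
        out = out.mean(axis=ax, keepdims=True)
    # Drop the reduced (now length-1) axes at the end.
    return np.squeeze(out, axis=tuple(axes))


# Axis order for the precipitation array used in this tutorial.
precip_shape = (3, 720, 721, 1440)
print(order_axes_largest_first(precip_shape, (0, 2, 3)))

# Sanity check on a small random array: the order does not change the result.
rng = np.random.default_rng(0)
small = rng.random((3, 4, 5, 6))
ordered = order_axes_largest_first(small.shape, (0, 2, 3))
sequential = mean_one_axis_at_a_time(small, ordered)
direct = small.mean(axis=(0, 2, 3))
print(np.allclose(sequential, direct))
```

For the precipitation shape, the helper returns the same order, (3, 2, 0), that was passed to ia.mean in the faster run.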