Matrix multiplication is a binary operation that produces a matrix from two matrices. Computing matrix products is a central operation in all computational applications of linear algebra.
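As a quick refresher, the product of an (m, k) matrix with a (k, n) matrix is an (m, n) matrix, where each entry is the dot product of a row of the first operand with a column of the second. In NumPy:

```python
import numpy as np

a = np.ones((4, 3))  # 4 rows, 3 columns
b = np.ones((3, 2))  # 3 rows, 2 columns
c = a @ b            # inner dimensions (3) must match
print(c.shape)       # (4, 2)
# every entry is the dot product of a length-3 row with a length-3 column,
# so here each entry equals 3.0
```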
In this tutorial we will see how matrix multiplication works in ironArray, and how to significantly reduce its resource consumption with a simple optimization function.
First, let us create two chunked, compressed matrices using ironArray. We will choose one of the matrices to be on disk and the other in memory just to show that mixing storage in operands is not a problem:
```python
%load_ext memprofiler
import iarray as ia
import numpy as np
import os
```
```python
%%mprof_run
ia.set_config_defaults(dtype=np.float64)

nrows = 100_000  # number of rows in the first matrix
ncols = 25_000   # number of columns in the first matrix (and rows in the second)
ncols2 = 1000    # number of columns in the second matrix
ashape = (nrows, ncols)
bshape = (ncols, ncols2)

filename = "arr-gemm.iarr"
if os.path.exists(filename):
    am = ia.open(filename)
else:
    am = ia.random.normal(ashape, 3, 2, urlpath=filename, fp_mantissa_bits=20)
print(am.info)

w = np.ones(bshape)
bm = ia.numpy2iarray(w)
print(bm.info)
```
```
type   : IArray
shape  : (100000, 25000)
chunks : (4116, 4116)
blocks : (147, 147)
cratio : 2.85

type   : IArray
shape  : (25000, 1000)
chunks : (8192, 256)
blocks : (512, 32)
cratio : 461.52

memprofiler: used 199.70 MiB RAM (peak of 215.68 MiB) in 0.0779 s, total RAM usage 407.97 MiB
```
Unlike NumPy, ironArray performs the matrix multiplication block by block. Instead of decompressing all the data from the two arrays at once, only the data needed for each block operation is decompressed, which allows larger operations with less memory.
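The idea of block-wise multiplication can be sketched in plain NumPy (the decompression machinery in ironArray is internal; the tile size here is an arbitrary illustration, not ironArray's actual chunking):

```python
import numpy as np

def blocked_matmul(a, b, tile=128):
    """Compute a @ b one tile at a time, so that only three small
    sub-blocks of the operands are touched at each step."""
    m, k = a.shape
    k2, n = b.shape
    assert k == k2, "inner dimensions must match"
    out = np.zeros((m, n), dtype=np.result_type(a, b))
    for i in range(0, m, tile):
        for j in range(0, n, tile):
            for p in range(0, k, tile):
                # only these three tiles need to be in memory
                # (in ironArray's case, decompressed) at once
                out[i:i+tile, j:j+tile] += (
                    a[i:i+tile, p:p+tile] @ b[p:p+tile, j:j+tile]
                )
    return out

# sanity check against NumPy's own matmul
a = np.random.rand(300, 200)
b = np.random.rand(200, 150)
assert np.allclose(blocked_matmul(a, b), a @ b)
```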
Before proceeding, let's evict the just-created file from the OS page cache, so that we can better assess an actual out-of-core evaluation from disk.
```python
!vmtouch -e $filename
```
```
           Files: 1
     Directories: 0
   Evicted Pages: 2035023 (7G)
         Elapsed: 3.6e-05 seconds
```
If we use the `@` operator to perform the multiplication in ironArray, all the parameters of the output array will be those defined in the global configuration.
```python
%%mprof_run matmul
cm = am @ bm
cm.info
```
```
memprofiler: used 270.12 MiB RAM (peak of 3538.72 MiB) in 206.2146 s, total RAM usage 678.25 MiB
```
As mentioned, since the chunk and block shapes are not defined in the global configuration, they are determined automatically for the output array.
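If you want to control the output partitioning yourself instead of relying on the automatic choice, the chunk and block shapes can be set in the global configuration before the multiplication. A hedged sketch follows; it assumes that `ia.set_config_defaults` accepts `chunks` and `blocks` keyword arguments (as the ironArray `Config` object does in recent releases, but check this against the version you are running):

```python
# Assumed API: chunks/blocks kwargs on set_config_defaults.
# Partition shapes are illustrative, not tuned values.
ia.set_config_defaults(dtype=np.float64,
                       chunks=(4096, 512),
                       blocks=(256, 64))
cm = am @ bm  # the output now uses the configured partitioning
```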
```python
%mprof_plot matmul -t "Matrix multiplication computation"
```