Matrix multiplication#

Matrix multiplication is a binary operation that produces a matrix from two matrices. Computing matrix products is a central operation in all computational applications of linear algebra.

In this tutorial we will see how matrix multiplication works in ironArray while showing how to significantly improve its resource consumption with the use of a simple optimization function.

First, let us create two chunked, compressed matrices using ironArray. We will choose one of the matrices to be on disk and the other in memory just to show that mixing storage in operands is not a problem:

%load_ext memprofiler
import iarray as ia
import numpy as np
import os

nrows = 100_000  # number of rows in first matrix
ncols = 25_000   # number of columns in first matrix
ncols2 = 1000    # number of columns in second matrix

shape = (nrows, ncols, ncols2)
amshape = (shape[0], shape[1])
bmshape = (shape[1], shape[2])

filename = "arr-gemm.iarr"
if os.path.exists(filename):
    am =
    am = ia.random.normal(amshape, 3, 2, urlpath=filename, fp_mantissa_bits=20)

w = np.ones(bmshape)
bm = ia.numpy2iarray(w)
type   : IArray
shape  : (100000, 25000)
chunks : (4116, 4116)
blocks : (147, 147)
cratio : 2.85

type   : IArray
shape  : (25000, 1000)
chunks : (8192, 256)
blocks : (512, 32)
cratio : 461.52

memprofiler: used 199.70 MiB RAM (peak of 215.68 MiB) in 0.0779 s, total RAM usage 407.97 MiB

Unlike numpy, ironArray performs the matrix multiplication using blocks. In this way, instead of decompressing all the data from the two arrays, we only have to decompress the data needed in each block operation (allowing larger operations with less memory).

Before proceeding, let’s evict the just created file from the OS page cache, so as to better assess an actual out of core evaluation from disk.

!vmtouch -e $filename
           Files: 1
     Directories: 0
   Evicted Pages: 2035023 (7G)
         Elapsed: 3.6e-05 seconds

If we use the operator @ to perform the multiplications in ironArray, all the params of the output array will be those defined in the global configuration.

%%mprof_run matmul
cm = am @ bm
shape(100000, 1000)
chunks(16384, 128)
blocks(1024, 16)
memprofiler: used 270.12 MiB RAM (peak of 3538.72 MiB) in 206.2146 s, total RAM usage 678.25 MiB

As said, as the chunk and the block shapes are not defined in the global configuration, they are automatically defined in the output array.

%mprof_plot matmul -t "Matrix multiplication computation"