Expression Evaluation (User Defined Functions)

So far we have seen that ironArray supports evaluating expressions that are passed as strings or as simple Python statements. There is another, more flexible way of evaluating expressions: User Defined Functions, or UDFs for short.

UDFs are small functions written in a simple subset of Python. They are passed to ironArray’s internal compiler, which generates a binary optimized specifically for the CPU of the local machine. In addition, this binary uses the SIMD hardware available in the CPU to accelerate transcendental functions.

Let’s see how this works. We will use the same data used in the previous tutorial with the optimal chunks and blocks.

[1]:
%load_ext memprofiler
%matplotlib inline
import iarray as ia
[2]:
%%time
precip1 = ia.load("precip1.iarr", chunks=(360, 128, 1440), blocks=(8, 8, 720))
precip2 = ia.load("precip2.iarr", chunks=(360, 128, 1440), blocks=(8, 8, 720))
precip3 = ia.load("precip3.iarr", chunks=(360, 128, 1440), blocks=(8, 8, 720))
CPU times: user 15.6 s, sys: 4.63 s, total: 20.3 s
Wall time: 7.8 s

Note that load() and copy() behave in the same way with respect to the iarray configuration: if neither a configuration nor kwargs are passed, the resulting iarray has the same configuration as the original array.
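To get a feel for what these chunk and block shapes imply, the partition counts can be worked out with a little arithmetic. The sketch below is plain Python and does not call ironArray. It assumes the inputs have shape (720, 721, 1440), which is the shape reported for the results later in this tutorial, and that partial partitions at the array edges count as whole ones. The helper `n_parts` is a hypothetical name used only for illustration.

```python
import math

# Shapes taken from this tutorial; the array shape is an assumption based on
# the result shape printed by the evaluation cells.
shape = (720, 721, 1440)
chunks = (360, 128, 1440)
blocks = (8, 8, 720)
itemsize = 4  # bytes per float32 element


def n_parts(extent, part):
    """Number of partitions of size `part` needed to cover `extent`.

    Uses integer ceiling division per dimension, so a partial partition at
    an edge is counted as one partition (an assumption of this sketch).
    """
    return math.prod(-(-e // p) for e, p in zip(extent, part))


chunks_per_array = n_parts(shape, chunks)
blocks_per_chunk = n_parts(chunks, blocks)
block_kib = math.prod(blocks) * itemsize / 1024  # uncompressed block size

print(chunks_per_array, blocks_per_chunk, block_kib)
```

Under these assumptions each array is covered by a handful of large chunks, and each chunk is split into many small blocks.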

Now, let’s define a simple function that computes the mean for this data:

[3]:
from iarray.udf import jit, Array, float32

@jit()
def mean(out: Array(float32, 3),
         p1: Array(float32, 3),
         p2: Array(float32, 3),
         p3: Array(float32, 3)) -> int:

    l = p1.window_shape[0]
    m = p1.window_shape[1]
    n = p1.window_shape[2]

    for i in range(l):
        for j in range(m):
            for k in range(n):
                value = p1[i,j,k] + p2[i,j,k] + p3[i,j,k]
                out[i,j,k] = value / 3

    return 0

and create the ironArray expression from this User Defined Function with:

[4]:
%%time
precip_expr = ia.expr_from_udf(mean, [precip1, precip2, precip3])
CPU times: user 23.5 ms, sys: 0 ns, total: 23.5 ms
Wall time: 42.4 ms

As can be seen, converting the user defined function into a native ironArray expression is pretty fast. And as always, in order to do the actual evaluation, we have to call .eval() on the expression:

[5]:
%%mprof_run mean_UDF
precip_mean = precip_expr.eval()
precip_mean
[5]:
<IArray (720, 721, 1440) np.float32>
memprofiler: used 949.07 MiB RAM (peak of 949.07 MiB) in 0.3462 s, total RAM usage 2292.96 MiB

Let’s compare this time with the evaluation via a regular lazy expression:

[6]:
precip_expr2 = (precip1 + precip2 + precip3) / 3
[7]:
%%mprof_run mean_lazy
precip_mean2 = precip_expr2.eval()
precip_mean2
[7]:
<IArray (720, 721, 1440) np.float32>
memprofiler: used 953.45 MiB RAM (peak of 953.45 MiB) in 0.3772 s, total RAM usage 3246.43 MiB

OK, so the times are very close. It turns out that UDFs and lazy expressions are compiled and executed by the very same internal compiler in ironArray, which explains why the times are similar. It is up to the user to choose one or the other depending on their needs.
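To make the equivalence of the two formulations concrete, here is a minimal plain-Python sketch that does not use ironArray. Nested lists stand in for 3-D arrays, and the names `mean_kernel`, `mean_expr` and `make` are hypothetical. It only illustrates that the explicit-loop form and the whole-array form describe the same element-wise computation; it says nothing about how ironArray executes either one.

```python
def mean_kernel(out, p1, p2, p3):
    """Loop form: writes the element-wise mean into a preallocated `out`."""
    l, m, n = len(p1), len(p1[0]), len(p1[0][0])
    for i in range(l):
        for j in range(m):
            for k in range(n):
                value = p1[i][j][k] + p2[i][j][k] + p3[i][j][k]
                out[i][j][k] = value / 3
    return 0  # status code, as in the UDF


def mean_expr(p1, p2, p3):
    """Whole-array form: the same mean written as a single expression."""
    return [[[(a + b + c) / 3 for a, b, c in zip(r1, r2, r3)]
             for r1, r2, r3 in zip(s1, s2, s3)]
            for s1, s2, s3 in zip(p1, p2, p3)]


def make(shape, start):
    """Build a small nested-list 'array' filled with consecutive floats."""
    l, m, n = shape
    return [[[float(start + i * m * n + j * n + k) for k in range(n)]
             for j in range(m)]
            for i in range(l)]


shape = (2, 3, 4)
p1, p2, p3 = make(shape, 0), make(shape, 100), make(shape, 200)
out = make(shape, 0)  # preallocated output buffer, overwritten by the kernel

status = mean_kernel(out, p1, p2, p3)
same = out == mean_expr(p1, p2, p3)
print(status, same)
```

The loop form gives full control over each element, while the expression form is shorter when the computation is a simple combination of whole arrays.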

Transcendental functions in User Defined Functions

Now, let’s use expressions with some transcendental functions. This does not make physical sense for precipitation data, but it serves as an indication of the efficiency of the computational engine inside ironArray:

[8]:
import math

@jit()
def trans(out: Array(float32, 3),
          p1: Array(float32, 3),
          p2: Array(float32, 3),
          p3: Array(float32, 3)) -> int:

    l = p1.window_shape[0]
    m = p1.window_shape[1]
    n = p1.window_shape[2]

    for i in range(l):
        for j in range(m):
            for k in range(n):
                value = math.sin(p1[i,j,k]) * math.sin(p2[i,j,k]) + math.cos(p2[i,j,k])
                value *= math.tan(p1[i,j,k])
                value += math.cosh(p3[i,j,k]) * 2
                out[i,j,k] = value

    return 0
[9]:
%%time
precip_expr = ia.expr_from_udf(trans, [precip1, precip2, precip3])
CPU times: user 18.7 ms, sys: 483 µs, total: 19.1 ms
Wall time: 39.8 ms
[10]:
%%mprof_run trans_UDF
precip_mean = precip_expr.eval()
precip_mean
[10]:
<IArray (720, 721, 1440) np.float32>
memprofiler: used 377.08 MiB RAM (peak of 377.08 MiB) in 0.4519 s, total RAM usage 3626.08 MiB

In this case, we see that the overhead of using transcendental functions is quite low compared with plain arithmetic operations (addition, subtraction, multiplication, division…).
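When spot-checking individual values of such a result, a scalar reference written with Python’s standard `math` module can help. The sketch below mirrors the step-by-step body of the `trans` UDF and compares it with the same formula written as a single expression. The names `trans_stepwise` and `trans_single` are hypothetical, and the code does not call ironArray. Note that `math` works in double precision, so agreement with float32 results should only be expected up to float32 tolerance.

```python
import math


def trans_stepwise(a, b, c):
    """Scalar version of the UDF body, accumulated in three steps."""
    value = math.sin(a) * math.sin(b) + math.cos(b)
    value *= math.tan(a)
    value += math.cosh(c) * 2
    return value


def trans_single(a, b, c):
    """The same formula written as one expression."""
    return (math.tan(a) * (math.sin(a) * math.sin(b) + math.cos(b))
            + math.cosh(c) * 2)


# Arbitrary sample inputs, chosen only for illustration.
samples = [(0.0, 0.0, 0.0), (0.5, 1.0, 2.0), (1.2, 0.3, 0.1)]
max_diff = max(abs(trans_stepwise(*s) - trans_single(*s)) for s in samples)

# With all inputs at zero, the tan factor vanishes and only 2 * cosh(0) remains.
zero_case = trans_stepwise(0.0, 0.0, 0.0)
print(max_diff, zero_case)
```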

Let’s see how a regular lazy expression behaves:

[11]:
%%mprof_run trans_lazy
lazy_expr = ia.tan(precip1) * (ia.sin(precip1) * ia.sin(precip2) + ia.cos(precip2)) + ia.cosh(precip3) * 2
lazy_result = lazy_expr.eval()
lazy_result
[11]:
<IArray (720, 721, 1440) np.float32>
memprofiler: used 870.57 MiB RAM (peak of 870.57 MiB) in 0.5610 s, total RAM usage 4496.77 MiB

The lazy expression is somewhat slower than the UDF here (0.56 s versus 0.45 s), but both remain fast. This is expected, as ironArray evaluates transcendental functions using the existing SIMD capabilities in the CPU.
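The wall-clock times reported by memprofiler in the cells above can be put side by side with a few lines of arithmetic. This is only a sketch: the numbers are single-run measurements copied from this tutorial, so the ratios are a rough indication rather than a benchmark. The helper `ratio` is a hypothetical name.

```python
# Elapsed times in seconds, as printed by memprofiler in this tutorial.
timings = {
    "mean_UDF": 0.3462,
    "mean_lazy": 0.3772,
    "trans_UDF": 0.4519,
    "trans_lazy": 0.5610,
}


def ratio(numerator, denominator):
    """How many times longer `numerator` took than `denominator`."""
    return timings[numerator] / timings[denominator]


mean_lazy_vs_udf = ratio("mean_lazy", "mean_UDF")
trans_lazy_vs_udf = ratio("trans_lazy", "trans_UDF")
trans_vs_mean_udf = ratio("trans_UDF", "mean_UDF")

print(f"mean:  lazy / UDF  = {mean_lazy_vs_udf:.2f}")
print(f"trans: lazy / UDF  = {trans_lazy_vs_udf:.2f}")
print(f"UDF:   trans / mean = {trans_vs_mean_udf:.2f}")
```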

Resource consumption

As a summary, let’s plot the speed of the different kinds of computation. First, for a regular mean:

[12]:
%mprof_plot mean_UDF mean_lazy -t "Mean computation"