
2 posts tagged with "optimization"


Python-Blosc2 4.0: Unleashing Compute Speed with miniexpr

· 9 min read
Francesc Alted
CEO, ironArray SLU
Luke Shaw
Product Manager, ironArray SLU

We are thrilled to announce the immediate availability of Python-Blosc2 4.0. This major release represents a significant architectural leap forward: we have given new powers to the internal compute engine by adding support for miniexpr, so it is now possible to evaluate expressions on blocks rather than chunks.

The result? Python-Blosc2 is now not just a fast storage library, but a compute powerhouse that can outperform specialized in-memory engines like NumPy and even NumExpr, all while handling compressed data.

Beating the Memory Wall (Again)

In our previous post, The Surprising Speed of Compressed Data: A Roofline Story, we showed how Blosc2 outruns the competition for out-of-core workloads, but that for in-memory, low-intensity computations it often lagged behind NumExpr. Our confidence in the compression-first Blosc2 paradigm, which is optimized for cache hierarchies, motivated the development of miniexpr: a purpose-built, thread-safe evaluator with vectorization capabilities, SIMD acceleration for common math functions, and NumPy-compatible rules for type inference and promotion. We didn't just optimize existing code; we built a new engine from scratch to exploit modern CPU caches.
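To make that last point concrete, here is a minimal sketch (not taken from the post) of how NumPy-style type promotion can be checked through the public Python-Blosc2 API; the particular dtypes and array sizes are illustrative assumptions, and the lazy-expression operators and compute() call are the standard interface shown later on this page.

import numpy as np
import blosc2

# Mixed-dtype operands: float32 and int64
a = blosc2.asarray(np.linspace(0, 1, num=1_000, dtype=np.float32))
b = blosc2.asarray(np.arange(1_000, dtype=np.int64))

expr = a * b + 1        # builds a LazyExpr; nothing is computed yet
out = expr.compute()    # evaluated block by block by the compute engine

# NumPy's promotion rules give float64 for float32 * int64,
# and the compressed result should follow the same convention
assert out.dtype == np.result_type(np.float32, np.int64)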

As a result, Python-Blosc2 4.0 improves greatly on earlier Blosc2 versions for memory-bound workloads:

  • The new miniexpr path dramatically improves low-intensity performance in memory.
  • The biggest gains are in the very-low- and low-intensity kernels, where cache traffic dominates.
  • High-intensity (compute-bound) workloads remain essentially unchanged, as expected.
  • Real-world applications like Cat2Cloud see immediate speedups (up to 4.5x) for data-intensive operations.

Keep reading to learn more about the results.

Computing Expressions in Blosc2

· 7 min read
Oumaima Ech Chdig
Intern, ironArray SLU

What are expressions?

The forthcoming version of Blosc2 will bring a powerful tool for performing mathematical operations on pre-compressed arrays, that is, arrays whose data has been reduced in size with compression techniques. This functionality offers a flexible and efficient way to apply a wide range of operations, such as addition, subtraction, multiplication and other mathematical functions, directly on compressed arrays. This approach saves time and resources, especially when working with large datasets.

An example of expression computation in Blosc2 might be:

import numpy as np
import blosc2

# Compression parameters (not defined in the original snippet; any valid
# set of Blosc2 compression parameters works here)
cparams = {"codec": blosc2.Codec.LZ4, "clevel": 5}

dtype = np.float64
shape = [30_000, 4_000]
size = shape[0] * shape[1]
a = np.linspace(0, 10, num=size, dtype=dtype).reshape(shape)
b = np.linspace(0, 10, num=size, dtype=dtype).reshape(shape)
c = np.linspace(0, 10, num=size, dtype=dtype).reshape(shape)

# Convert NumPy arrays to compressed Blosc2 arrays
a1 = blosc2.asarray(a, cparams=cparams)
b1 = blosc2.asarray(b, cparams=cparams)
c1 = blosc2.asarray(c, cparams=cparams)

# Build the mathematical expression
expr = a1 + b1 * c1   # LazyExpr expression (nothing is computed yet)
expr += 2             # expressions can be modified before evaluation
output = expr.compute(cparams=cparams)  # compute! (output is compressed too)

Compressed arrays (a1, b1, c1) are created from existing NumPy arrays (a, b, c) using Blosc2, and mathematical operations are then expressed on these compressed arrays with ordinary algebraic syntax. The computation is lazy: building the expression does not evaluate anything; evaluation happens only when explicitly requested. Finally, the expression is actually computed (via .compute()) and the desired output, itself compressed, is obtained.
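As a quick sanity check, here is a short sketch building on the snippet above (not part of the original post): the compressed result can be decompressed with output[:] and compared against the equivalent NumPy computation, and slicing the lazy expression is assumed here to evaluate only the requested region and return a NumPy array.

# Reference computation with plain NumPy arrays
npres = a + b * c + 2
np.testing.assert_allclose(output[:], npres)  # output[:] decompresses to NumPy

# Slicing the LazyExpr computes only that part of the result
partial = expr[:100]
np.testing.assert_allclose(partial, npres[:100])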

How it works