
3 posts tagged with "memory"


Matrices, Blosc2 and PyTorch

7 min read
Luke Shaw
Product Manager at ironArray SLU
Francesc Alted
CEO at ironArray SLU

One of the core functions of any numerical computing library is linear algebra, which is ubiquitous in scientific and industrial applications. Much image processing can be reduced to matrix-matrix or matrix-vector operations; and it is well-known that the majority of the computational effort expended in evaluating neural networks is due to batched matrix multiplications.

At the same time, the data which provide the operands for these transformations must be appropriately handled by the library in question - being able to rapidly perform the floating-point operations (FLOPs) internal to the matrix multiplication is of little use if the data cannot be fed to the compute engine (and then whisked away after computation) with sufficient speed and without overburdening memory.

In this space, PyTorch has proven to be one of the most popular libraries, backed by high-performance compiled C++ code, optional GPU acceleration, and an extensive collection of efficient functions for matrix multiplication, creation and management. It is also one of the most array API-compliant libraries out there.
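For context, here is a minimal sketch of the kind of batched matrix multiplication referred to above (the shapes are purely illustrative):

import torch

# Batch of 32 independent matrix products: (32, 128, 64) @ (32, 64, 256) -> (32, 128, 256)
A = torch.randn(32, 128, 64)
B = torch.randn(32, 64, 256)
C = torch.matmul(A, B)  # the '@' operator does the same thing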

Blosc2 and Linear Algebra

However, PyTorch does not offer an interface for on-disk data. This means that when working with large datasets that do not fit in memory, data must be fetched into memory in batches, operated on, and then saved back to disk using another library such as h5py. This secondary library also handles compression, so as to reduce storage space (and increase the effective I/O speed with which data is sent to disk).
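Such a batched workflow might look roughly like the following sketch (the file names, dataset names and the doubling operation are all hypothetical):

import h5py

batch = 10_000
with h5py.File("data.h5", "r") as fin, h5py.File("out.h5", "w") as fout:
    x = fin["x"]  # on-disk (possibly compressed) dataset
    y = fout.create_dataset("y", shape=x.shape, dtype=x.dtype, compression="gzip")
    for i in range(0, x.shape[0], batch):
        chunk = x[i:i + batch]        # fetch a batch into memory
        y[i:i + batch] = chunk * 2.0  # compute, then write back to disk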

Blosc2 is an integrated solution which provides highly efficient storage via compression and marries it to a powerful compute engine. One can easily write and read compressed data to disk with a succinct syntax, with decompression and computation handled efficiently by the compute engine. In addition, the library automatically selects optimal chunking parameters for the data, without any of the ad-hoc experimentation required to find 'good' batch sizes.
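As a rough sketch of that syntax (the array contents and file name are made up, and keyword details may differ slightly between python-blosc2 versions):

import blosc2
import numpy as np

# Write a compressed array to disk; chunk and block shapes are chosen automatically
a = blosc2.asarray(np.linspace(0, 1, 1_000_000), urlpath="a.b2nd", mode="w")

# Reopen it later and compute, decompressing only what is needed
b = blosc2.open("a.b2nd")
res = ((b + 1) ** 2)[:]  # lazy expression, evaluated into a NumPy array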

Bringing Blosc2 to Heel

7 min read
Luke Shaw
Product Manager at ironArray SLU

There are many array libraries in the scientific and data ecosystem that provide native array types (NumPy, PyTorch, Zarr, h5py, PyTables, JAX, Blosc2) and an even larger list of those that "consume" these provided array types (Scikit-Learn, Parcels, Dask, Pillow, ScikitImage). Moreover, the division between the two groups is not very clear-cut - PyTorch tensors can wrap NumPy arrays (sharing the same memory), and thus straddle the boundary between array provider and consumer.

Such a high degree of interdependency makes it crucial that array objects are portable between libraries. This means that the array objects must be standardised between libraries, but also that the libraries are equipped with a minimal set of functions that have the same names and signatures across the ecosystem, and that know how to ingest, produce and process the arrays. The ideal would be to be able to write code that works with arrays:

import array_lib as xp  # any array-API-compliant library

# Do array things with the library, e.g.
a = xp.asarray([1.0, 2.0, 3.0])
b = xp.sum(a ** 2)

and then simply swap in any array library for array_lib and have the code run unchanged.
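For instance (a minimal sketch; NumPy largely follows the array API standard in its main namespace as of NumPy 2.0), the same operations run by changing only the import:

import numpy as xp  # swap NumPy in as the array library

a = xp.asarray([1.0, 2.0, 3.0])
b = xp.sum(a ** 2)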

From this set of concerns has sprung an open-source effort to develop the array API standard, along with an extensive associated test suite, to drive the array ecosystem towards this holy grail of interoperability.

Blosc2 and the array API

Blosc2 has been developed with the array API in mind from an early stage, but it is only now that ironArray has been able to dedicate development time to integration efforts. While the standard and test suite are still evolving (the latest version of the standard was released in December 2024), they are sufficiently stable to form the basis for ironArray's work.

Compress Better, Compute Bigger

10 min read
Francesc Alted
CEO at ironArray SLU

Have you ever experienced the frustration of not being able to analyze a dataset because it's too large to fit in memory? Or perhaps you've encountered the memory wall, where computation is hindered by slow memory access? These are common challenges in data science and high-performance computing. The developers of Blosc and Blosc2 have consistently focused on achieving compression and decompression speeds that approach or even exceed memory bandwidth limits.

Moreover, with the introduction of a new compute engine in Blosc2 3.0, the guiding principle has evolved to "Compress Better, Compute Bigger." This enhancement enables computations on datasets that are over 100 times larger than the available RAM, all while maintaining high performance. Continue reading to learn how to operate on datasets of 8 TB in human timeframes, using your own hardware.
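A minimal sketch of this out-of-core style of computation, with compressed operands that live on disk (the file names are hypothetical, and keyword details may vary between Blosc2 3.x releases):

import blosc2

# Open compressed operands stored on disk (potentially much larger than RAM)
a = blosc2.open("a.b2nd")
b = blosc2.open("b.b2nd")

# Build a lazy expression; nothing is materialized yet
expr = (a - b) * (a + b)

# Evaluate chunk by chunk, writing the compressed result straight to disk
res = expr.compute(urlpath="result.b2nd", mode="w")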

The Importance of Better Compression

Data compression typically requires a trade-off between speed and compression ratio. Blosc2 allows users to fine-tune this balance: they can select from a variety of codecs and filters to maximize compression, and even introduce custom ones via its plugin system. For optimal speed, it's crucial to understand and utilize modern CPU capabilities. Multicore processing, SIMD and cache hierarchies can significantly boost compression performance. Blosc2 leverages these features to achieve speeds close to memory bandwidth limits, and sometimes even surpassing them, particularly with contemporary CPUs.
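As an illustrative sketch of that tuning (parameter spellings may differ slightly across python-blosc2 versions, and the data is made up):

import blosc2
import numpy as np

data = np.arange(10_000_000, dtype=np.int64)

# Favour compression ratio: Zstd at a higher level, plus bit-level shuffling
a = blosc2.asarray(
    data,
    cparams={
        "codec": blosc2.Codec.ZSTD,
        "clevel": 7,
        "filters": [blosc2.Filter.BITSHUFFLE],
    },
)
print(a.schunk.cratio)  # achieved compression ratio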